History log of /linux-master/drivers/gpu/drm/i915/gt/intel_lrc.c
Revision Date Author Comments
# 03fe4b87 26-Oct-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: Add WABB blit for Wa_16018031267 / Wa_16018063123

Apply WABB blit for Wa_16018031267 / Wa_16018063123.

v3: drop unused enum definition
v4: move selftest to separate patch, use wa only on BCS0.
v5: fixed selftest caller to context_wabb

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026-wabb-v6-2-4aa7d55d0a8a@intel.com


# afddcbe4 17-Sep-2023 Alan Previn <alan.previn.teres.alexis@intel.com>

drm/i915/lrc: User PXP contexts requires runalone bit in lrc

On Meteorlake onwards, HW specs require that all user contexts that
run on render or compute engines and require PXP must enforce
run-alone bit in lrc. Add this enforcement for protected contexts.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230917211933.1407559-4-alan.previn.teres.alexis@intel.com


# 98fa06e4 14-Sep-2023 Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>

drm/i915: Add Wa_18022495364

Invalidate instruction and State cache bit using INDIRECT_CTX on
every gpu context switch for gen12.
The goal of this workaround is to actually perform an explicit
invalidation of that cache (by re-writing the register) during every GPU
context switch, which is accomplished via a "workaround batchbuffer"
that's attached to the context via INDIRECT_CTX. (Matt Roper)

Please refer [1] for more reviews and comment on the same patch

[1] https://patchwork.freedesktop.org/series/123377/

v2:
- Remove extra parentheses from the condition (Lucas)
- Align spacing and new line (Lucas)

v3:
- Fix commit message.

v4:
- Only Gen12 changes are kept and Remove DG2+ condition (Matt Roper)
- Fix the commit message for r-b (Matt Roper)
- Rename the register bit in define

v5:
- Move out this workaround from golden context init (Matt Roper)
- Use INDIRECT_CTX to set bit on each GPU context switch (Matt Roper)

v6:
- Change IP Version base condition for Gen12 (Matt Roper)
- Made imperative form of commit version messages (Suraj)
- s/Added/Add in patch header (Suraj)

v7:
- In version descriptions s/Ropper/Roper (Matt Atwood)

BSpec: 11354
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230914202000.1069884-1-dnyaneshwar.bhadane@intel.com


# c92ec508 13-Sep-2023 Dan Carpenter <dan.carpenter@linaro.org>

drm/i915/gt: Prevent error pointer dereference

Move the check for "if (IS_ERR(obj))" in front of the call to
i915_gem_object_set_cache_coherency() which dereferences "obj".
Otherwise it will lead to a crash.

Fixes: 43aa755eae2c ("drm/i915/mtl: Update cache coherency setting for context structure")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/455b2279-2e08-4d00-9784-be56d8ee42e3@moroto.mountain


# 5a213086 21-Aug-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Eliminate IS_MTL_GRAPHICS_STEP

Several workarounds are guarded by IS_MTL_GRAPHICS_STEP. However none
of these workarounds are actually tied to MTL as a platform; they only
relate to the Xe_LPG graphics IP, regardless of what platform it appears
in. At the moment MTL is the only platform that uses Xe_LPG with IP
versions 12.70 and 12.71, but we can't count on this being true in the
future. Switch these to use a new IS_GFX_GT_IP_STEP() macro instead
that is purely based on IP version. IS_GFX_GT_IP_STEP() is also
GT-based rather than device-based, which will help prevent mistakes
where we accidentally try to apply Xe_LPG graphics workarounds to the
Xe_LPM+ media GT and vice-versa.

v2:
- Switch to a more generic and shorter IS_GT_IP_STEP macro that can be
used for both graphics and media IP (and any other kind of GTs that
show up in the future).
v3:
- Switch back to long-form IS_GFX_GT_IP_STEP macro. (Jani)
- Move macro to intel_gt.h. (Andi)
v4:
- Build IS_GFX_GT_IP_STEP on top of IS_GFX_GT_IP_RANGE and
IS_GRAPHICS_STEP building blocks and name the parameters from/until
rather than begin/fixed. (Jani)
- Fix usage examples in comment.
v5:
- Tweak comment on macro. (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-15-matthew.d.roper@intel.com


# eaeb4b36 16-Aug-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915/dg2: Drop pre-production GT workarounds

DG2 first production steppings were C0 (for DG2-G10), B1 (for DG2-G11),
and A1 (for DG2-G12). Several workarounds that apply onto to
pre-production hardware can be dropped. Furthermore, several
workarounds that apply to all production steppings can have their
conditions simplified to no longer check the GT stepping.

v2:
- Keep Wa_16011777198 in place for now; it will be removed separately
in a follow-up patch to keep review easier.

Bspec: 44477
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230816214201.534095-10-matthew.d.roper@intel.com


# f17cc0f1 13-Sep-2023 Dan Carpenter <dan.carpenter@linaro.org>

drm/i915/gt: Prevent error pointer dereference

Move the check for "if (IS_ERR(obj))" in front of the call to
i915_gem_object_set_cache_coherency() which dereferences "obj".
Otherwise it will lead to a crash.

Fixes: 43aa755eae2c ("drm/i915/mtl: Update cache coherency setting for context structure")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/455b2279-2e08-4d00-9784-be56d8ee42e3@moroto.mountain
(cherry picked from commit c92ec50822fb84306d951520d81919328421acbd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# f1530f91 07-Aug-2023 Jonathan Cavitt <jonathan.cavitt@intel.com>

drm/i915/gt: Apply workaround 22016122933 correctly

WA_22016122933 was recently applied to all MeteorLake engines, which is
simultaneously too broad (should only apply to Media engines) and too
specific (should apply to all platforms that use the same media engine
as MeteorLake). Correct this in cases where coherency settings are
modified.

There were also two additional places where the workaround was applied
unconditionally. The change was confirmed as necessary for all
platforms, so the workaround label was removed.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-4-jonathan.cavitt@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-4-andi.shyti@linux.intel.com


# 115cdcca 07-Aug-2023 Jonathan Cavitt <jonathan.cavitt@intel.com>

drm/i915: Make i915_coherent_map_type GT-centric

Refactor i915_coherent_map_type to be GT-centric rather than
device-centric. Each GT may require different coherency
handling due to hardware workarounds.

Since the function now takes a GT instead of the i915, the function is
renamed and moved to the gt folder.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-3-jonathan.cavitt@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-3-andi.shyti@linux.intel.com


# 76ff7789 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Support aux invalidation on all engines

Perform some refactoring with the purpose of keeping in one
single place all the operations around the aux table
invalidation.

With this refactoring add more engines where the invalidation
should be performed.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-8-andi.shyti@linux.intel.com


# 2f0b927d 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Cleanup aux invalidation registers

Fix the 'NV' definition postfix that is supposed to be INV.

Take the chance to also order properly the registers based on
their address and call the GEN12_GFX_CCS_AUX_INV address as
GEN12_CCS_AUX_INV like all the other similar registers.

Remove also VD1, VD3 and VE1 registers that don't exist and add
BCS0 and CCS0.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-2-andi.shyti@linux.intel.com


# 43aa755e 06-Jul-2023 Zhanjun Dong <zhanjun.dong@intel.com>

drm/i915/mtl: Update cache coherency setting for context structure

As context structure is shared memory for CPU/GPU, Wa_22016122933 is
needed for this memory block as well.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
CC: Fei Yang <fei.yang@intel.com>
Reviewed-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230706174704.177929-1-zhanjun.dong@intel.com


# 6a35f22d 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Support aux invalidation on all engines

Perform some refactoring with the purpose of keeping in one
single place all the operations around the aux table
invalidation.

With this refactoring add more engines where the invalidation
should be performed.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-8-andi.shyti@linux.intel.com
(cherry picked from commit 76ff7789d6e63d1a10b3b58f5c70b2e640c7a880)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# d14560ac 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Cleanup aux invalidation registers

Fix the 'NV' definition postfix that is supposed to be INV.

Take the chance to also order properly the registers based on
their address and call the GEN12_GFX_CCS_AUX_INV address as
GEN12_CCS_AUX_INV like all the other similar registers.

Remove also VD1, VD3 and VE1 registers that don't exist and add
BCS0 and CCS0.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-2-andi.shyti@linux.intel.com
(cherry picked from commit 2f0b927d3ca3440445975ebde27f3df1c3ed6f76)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 1a365a2b 17-May-2023 Radhakrishna Sripada <radhakrishna.sripada@intel.com>

drm/i915/mtl: Extend Wa_16014892111 to MTL A-step

Like DG2, MTL a-step hardware is subject to Wa_16014892111 which
requires that any changes made to the DRAW_WATERMARK register be
done via an INDIRECT_CTX batch buffer rather than through a regular
context workaround.

The bspec gives the same non-default recommended tuning value
for DRAW_WATERMARK as DG2, so we can re-use the INDIRECT_CTX code
to apply that tuning setting on A-step hardware.

Application of the tuning setting on B-step and later does not
need INDIRECT_CTX handling and is already done in
mtl_ctx_workarounds_init() as usual.

v2: Limit the WA for A-step
v3: Update the commit message.
v4: Reorder platform checks and update commit message.

Bspec: 68331
Cc: Haridhar Kalvala <haridhar.kalvala@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517233111.297542-2-radhakrishna.sripada@intel.com


# ca54a9a3 18-Jan-2023 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/mtl: Fix bcs default context

Commit 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
added the engine context for Meteor Lake. In a second revision of the
patch it was believed the xcs offsets were wrong due to a tagging
issue in the spec. The first version was actually correct, as shown
by the intel_lrc_live_selftests/live_lrc_layout test:

i915: Running gt_lrc
i915: Running intel_lrc_live_selftests/live_lrc_layout
bcs0: LRI command mismatch at dword 1, expected 1108101d found 11081019
[drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:236:DP-1] disconnected
bcs0: HW register image:
[0000] 00000000 1108101d 00022244 ffff0008 00022034 00000088 00022030 00000088
...
bcs0: SW register image:
[0000] 00000000 11081019 00022244 00090009 00022034 00000000 00022030 00000000

The difference in the 2 additional dwords (0x1d vs 0x19) are the offsets
0x120 / 0x124 that are indeed part of the context image.

Bspec: 45585

Fixes: 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230111235531.3353815-2-radhakrishna.sripada@intel.com


# 262a6cd0 17-Jan-2023 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Move/adjust register definitions related to Wa_22011450934

The implementation of Wa_22011450934 introduced three new register
definitions in i915_reg.h that didn't get moved to the GT/engine
register headers when all the other registers moved; let's move them to
the appropriate headers and tidy up their definitions now for
consistency:

- STATE_ACK_DEBUG is moved to the engine register header and converted
to a parameterized definition; the workaround only needs the RCS
instance to be programmed, but there are instances on other engines
that could be used by other workarounds in the future.

- The two CULLBIT registers move to the GT register header. Since
they belong to MMIO ranges that became MCR starting with Xe_HP,
their definitions should be defined as MCR_REG() and use an Xe_HP
prefix to keep the register semantics clear.

Note that the MCR definition is just for consistency and to prevent
accidental misuse if other workarounds related to these registers show
up in the future. There's no functional change to today's driver since
the workaround that references these registers only accesses them via
MI_LRR engine instructions. Engine-initiated register accesses do not
utilize the same steering controls as CPU-initiated accesses; they
use a different steering control register (0x20CC) which is initialized
to a non-terminated DSS target by pre-OS firmware and never changed
thereafter (i915 does not touch it and userspace does not have
permission to change that register).

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117202627.4134579-1-matthew.d.roper@intel.com


# 75444cff 18-Jan-2023 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/mtl: Fix bcs default context

Commit 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
added the engine context for Meteor Lake. In a second revision of the
patch it was believed the xcs offsets were wrong due to a tagging
issue in the spec. The first version was actually correct, as shown
by the intel_lrc_live_selftests/live_lrc_layout test:

i915: Running gt_lrc
i915: Running intel_lrc_live_selftests/live_lrc_layout
bcs0: LRI command mismatch at dword 1, expected 1108101d found 11081019
[drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:236:DP-1] disconnected
bcs0: HW register image:
[0000] 00000000 1108101d 00022244 ffff0008 00022034 00000088 00022030 00000088
...
bcs0: SW register image:
[0000] 00000000 11081019 00022244 00090009 00022034 00000000 00022030 00000000

The difference in the 2 additional dwords (0x1d vs 0x19) are the offsets
0x120 / 0x124 that are indeed part of the context image.

Bspec: 45585

Fixes: 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230111235531.3353815-2-radhakrishna.sripada@intel.com
(cherry picked from commit ca54a9a32da0f0ef7e5cbcd111b66f3c9d78b7d2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# b1f80a5a 29-Sep-2022 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/gt: Document function to decode register state context

It's not obvious how the encode/decode of the per platform tables is
done. Document it so while adding tables for new platforms people can be
confident they right things is being done.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930050903.3479619-3-lucas.demarchi@intel.com


# c3d5cfe7 29-Sep-2022 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Fix __gen125_emit_bb_start() without WA

ce->wa_bb_page is allocated only for graphics version 12. However
__gen125_emit_bb_start() is used for any graphics version >= 12.50. For
the currently supported platforms this is not an issue, but for future
ones there's a mismatch causing the jump to
`wa_offset + DG2_PREDICATE_RESULT_BB` to be invalid since wa_offset is
not correct.

As in other places in the driver, check for graphics version "greater or
equal" to future-proof the support for new platforms.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930050903.3479619-2-lucas.demarchi@intel.com


# 0d0e7d1e 28-Sep-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/mtl: Define engine context layouts

The part of the media and blitter engine contexts that we care about for
setting up an initial state on MTL are nearly similar to DG2 (and PVC).
The difference being PRT_BB_STATE being replaced with NOP.

For render/compute engines, the part of the context images are nearly
the same, although the layout had a very slight change --- one POSH
register was removed and the placement of some LRI/noops adjusted
slightly to compensate.

v2:
- Dg2, mtl xcs offsets slightly vary. Use a separate offsets array(Bala)
- Add missing nop in xcs offsets(Bala)
v3:
- Fix the spacing for nop in xcs offset(MattR)
v4:
- Fix rcs register offset(MattR)
v4.1:
- Fix commit message(Lucas)

Bspec: 46261, 46260, 45585
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Licas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220928155511.2379663-1-radhakrishna.sripada@intel.com


# 783f6f85 07-Sep-2022 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Noop lrc_init_wa_ctx() on recent/future platforms

Except for graphics version 8 and 9, nothing is done in
lrc_init_wa_ctx(). Assume this won't be needed on future platforms as
well and remove the warning.

Note that this function is not called for anything below version 8 since
those don't use either guc or execlist, i.e. HAS_EXECLISTS() is false.

Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220907230841.1703574-1-lucas.demarchi@intel.com


# c9424fa1 13-Sep-2022 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Explicitly clear BB_OFFSET for new contexts

Even though the initial protocontext we load onto HW has the register
cleared, by the time we save it into the default image, BB_OFFSET has
had the enable bit set. Reclear BB_OFFSET for each new context.

Testcase: igt/i915_selftests/gt_lrc

v2:
Extend it for gen8.
v3:
BB_OFFSET is recorded per engine from Gen9 onwards

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/37c67abb3303852f06a570a4360addf52bf941c1.1663081418.git.karolina.drobnik@intel.com


# 29063c6a 06-Sep-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/mtl: Add gsi_offset when emitting aux table invalidation

The aux table invalidation registers are a bit unique --- they're
engine-centric registers that reside in the GSI register space rather
than within the engines' regular MMIO ranges. That means that when
issuing invalidation on engines in the standalone media GT, the GSI
offset must be added to the regular MMIO offset for the invalidation
registers.

Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220906234934.3655440-12-matthew.d.roper@intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 25bcc828 23-Aug-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/dg2: Incorporate Wa_16014892111 into DRAW_WATERMARK tuning

Although register tuning settings are generally implemented via the
workaround infrastructure, it turns out that the DRAW_WATERMARK register
is not properly saved/restored by hardware around power events (i.e.,
RC6 entry) so updates to the value cannot be applied in the usual
manner. New workaround Wa_16014892111 informs us that any tuning
updates to this register must instead be applied via an INDIRECT_CTX
batch buffer. This will ensure that the necessary value is re-applied
when a context begins running, even if an RC6 entry had wiped the
register back to hardware defaults since the last context ran.

Fixes: 6dc85721df74 ("drm/i915/dg2: Add additional tuning settings")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6642
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220823202449.83727-1-matthew.d.roper@intel.com


# 166c44e6 25-Apr-2022 Chris Wilson <chris.p.wilson@intel.com>

drm/i915/gt: Clear SET_PREDICATE_RESULT prior to executing the ring

Userspace may leave predication enabled upon return from the batch
buffer, which has the consequent of preventing all operation from the
ring from being executed, including all the synchronisation, coherency
control, arbitration and user signaling. This is more than just a local
gpu hang in one client, as the user has the ability to prevent the
kernel from applying critical workarounds and can cause a full GT reset.

We could simply execute MI_SET_PREDICATE upon return from the user
batch, but this has the repercussion of modifying the user's context
state. Instead, we opt to execute a fixup batch which by mixing
predicated operations can determine the state of the
SET_PREDICATE_RESULT register and restore it prior to the next userspace
batch. This allows us to protect the kernel's ring without changing the
uABI.

Suggested-by: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220425152317.4275-4-ramalingam.c@intel.com


# bb6287cb 01-Apr-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Track context current active time

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context
runtime reporting in conjunction with already tracked pphwsp accumulated
runtime.

The latter is only updated on context save so does not give us visibility
to any currently executing work.

As part of the patch the existing runtime tracking data is moved under the
new ce->stats member and updated under the seqlock. This provides the
ability to atomically read out accumulated plus active runtime.

v2:
* Rename and make __intel_context_get_active_time unlocked.

v3:
* Use GRAPHICS_VER.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-6-tvrtko.ursulin@linux.intel.com


# d8b93201 28-Mar-2022 Fei Yang <fei.yang@intel.com>

drm/i915: avoid concurrent writes to aux_inv

GPU hangs have been observed when multiple engines write to the
same aux_inv register at the same time. To avoid this each engine
should only invalidate its own auxiliary table. The function
gen12_emit_flush_xcs() currently invalidate the auxiliary table for
all engines because the rq->engine is not necessarily the engine
eventually carrying out the request, and potentially the engine
could even be a virtual one (with engine->instance being -1).
With the MMIO remap feature, we can actually set bit 17 of MI_LRI
instruction and let the hardware to figure out the local aux_inv
register at runtime to avoid invalidating auxiliary table for all
engines.

Bspec: 45728

v2: Invalidate AUX table for indirect context as well.

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220328171650.1900674-1-fei.yang@intel.com


# ff6b19d3 01-Mar-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/xehp: Add compute workarounds

Additional workarounds are required once we start exposing CCS engines.

Note that we have a number of workarounds that update registers in the
shared render/compute reset domain. Historically we've just added such
registers to the RCS engine's workaround list. But going forward we
should be more careful to place such workarounds on a wa_list for an
engine that definitely exists and is not fused off (e.g., a platform
with no RCS would never apply the RCS wa_list). We'll keep
rcs_engine_wa_init() focused on RCS-specific workarounds that only need
to be applied if the RCS engine is present. A separate
general_render_compute_wa_init() function will be used to define
workarounds that touch registers in the shared render/compute reset
domain and that we need to apply regardless of what render and/or
compute engines actually exist. Any workarounds defined in this new
function will internally be added to the first present RCS or CCS
engine's workaround list to ensure they get applied (and only get
applied once rather than being needlessly re-applied several times).

Co-author: Srinivasan Shanmugam
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-13-matthew.d.roper@intel.com


# c674c5b9 01-Mar-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/xehp: CCS should use RCS setup functions

The compute engine handles the same commands the render engine can
(except 3D pipeline), so it makes sense that CCS is more similar to RCS
than non-render engines.

The CCS context state (lrc) is also similar to the render one, so reuse
it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE
register.

In order to avoid having multiple RCS && CCS checks, add the following
engine flag:
- I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.

BSpec: 46260
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-6-matthew.d.roper@intel.com


# 2bb116c7 14-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915/lrc: replace include with forward declarations

Prefer forward declarations over includes if possible.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220214173810.2108975-2-jani.nikula@intel.com


# dd4821ba 14-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915/lrc: move lrc_get_runtime() to intel_lrc.c

Move the static inline next to the only caller.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220214173810.2108975-1-jani.nikula@intel.com


# 88d23eda 28-Jan-2022 Ramalingam C <ramalingam.c@intel.com>

drm/i915/dg2: Add Wa_22011450934

An indirect ctx wabb is implemented as per Wa_22011450934 to avoid rcs
restore hang during context restore of a preempted context in GPGPU mode

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Chris Wilson <chris.p.wilson@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220128185209.18077-2-ramalingam.c@intel.com


# 0d6419e9 27-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915: Move GT registers to their own header file

This is a huge, chaotic mass of registers copied over as-is without any
real cleanup. We'll come back and organize these better, align on
consistent coding style, remove dead code, etc. in separate patches
later that will be easier to review.

v2:
- Add missing include in intel_pxp_irq.c
v3:
- Correct a few indentation errors (Lucas)
- Minor conflict resolution

Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com


# 202b1f4c 10-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/gt: Move engine registers to their own header

Let's continue breaking up and cleaning up the massive i915_reg.h file
by moving all registers that are defined in relation to an engine base
to their own header.

There are probably a bunch of other "engine registers" that we haven't
moved yet (especially those that belong to the render engine in the
0x2??? range), but this is a relatively straightforward first step.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com


# a88afcfa 22-Dec-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/execlists: Weak parallel submission support for execlists

A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between each request
generated and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

v2:
(John Harrison)
- Drop siblings array as num_siblings must be 1
v3:
(John Harrison)
- Drop single submission
v4:
(John Harrison)
- Actually drop single submission
- Use IS_ERR check on return value from intel_context_create
- Set last request to NULL on unpin

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211222223532.28698-1-matthew.brost@intel.com


# 4b19f6b7 16-Nov-2021 Ramalingam C <ramalingam.c@intel.com>

drm/i915/dg2: Add Wa_16013000631

Invalidate IC cache through pipe control command as part of the ctx
restore flow through indirect ctx pointer.

v2:
- Move pipe control from xcs indirect context to the rcs indirect
context. We'll eventually need this on the CCS engines too, but
support for those hasn't landed yet.

Cc: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116174818.2128062-5-matthew.d.roper@intel.com


# c2aa552f 14-Oct-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Add multi-lrc context registration

Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

v2:
(John Harrison)
- Move GuC specific fields into sub-struct
- Clean up WQ defines
- Add comment explaining math to derive WQ / PD address
v3:
(John Harrison)
- Add PARENT_SCRATCH_SIZE define
- Update comment explaining multi-lrc register
v4:
(John Harrison)
- Move PARENT_SCRATCH_SIZE to common file

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com


# 0d8ee5ba 22-Sep-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Don't back up pinned LMEM context images and rings during suspend

Pinned context images are now reset during resume. Don't back them up,
and assuming that rings can be assumed empty at suspend, don't back them
up either.

Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
object is allowed to lose its content on suspend.

v3:
- Slight documentation clarification (Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-7-thomas.hellstrom@linux.intel.com


# ae4b0eac 05-Aug-2021 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

drm/i915/dg2: Add new LRI reg offsets

New LRI register offsets were introduced for DG2, this patch adds
those extra registers, and create new register table for setting offsets
to compare with HW generated context image - especially for gt_lrc test.
Also updates general purpose register with scratch offset for DG2, in
order to use it for live_lrc_fixed selftest.

Cc: Chris P Wilson <chris.p.wilson@intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Caz Yokoyama <caz.yokoyama@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-8-matthew.d.roper@intel.com


# 6266992c 28-Jul-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/gt: remove GRAPHICS_VER == 10

Replace all remaining handling of GRAPHICS_VER {==,>=} 10 with
{==,>=} 11. With the removal of CNL, there is no platform with graphics
version equals 10.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728220326.1578242-5-lucas.demarchi@intel.com


# 7fc37efd 21-Jul-2021 Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

drm/i915/xehp: New engine context offsets

The layout of some engine contexts has changed on Xe_HP. Define the new
offsets.

Bspec: 45585, 46256
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-10-matthew.d.roper@intel.com


# 50a9ea08 21-Jul-2021 Stuart Summers <stuart.summers@intel.com>

drm/i915/xehp: Handle new device context ID format

Xe_HP changes the format of the context ID from past platforms.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-9-matthew.d.roper@intel.com


# 74e4b909 08-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

drm/i915: Stop storing the ring size in the ring pointer (v3)

Previously, we were storing the ring size in the ring pointer before it
was actually allocated. We would then guard setting the ring size on
checking for CONTEXT_ALLOC_BIT. This is error-prone at best and really
only saves us a few bytes on something that already burns at least 4K.
Instead, this patch adds a new ring_size field and makes everything use
that.

v2 (Daniel Vetter):
- Replace 512 * SZ_4K with SZ_2M

v2 (Jason Ekstrand):
- Rebase on top of page migration code

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-3-jason@jlekstrand.net


# c816723b 05-Jun-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/gt: replace IS_GEN and friends with GRAPHICS_VER

This was done by the following semantic patch:

@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)

@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E

@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E

@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E

@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)

@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)

@@
identifier def.id;
@@
- id
+ ver

It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com


# fa85bfd1 27-Apr-2021 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915: Update the helper to set correct mapping

Determine the possible coherent map type based on object location,
and if target has llc or if user requires an always coherent
mapping.

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: CQ Tang <cq.tang@intel.com>
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-2-matthew.auld@intel.com


# ba485bc8 27-Jan-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915: allocate context from LMEM

Prefer allocating the context from LMEM on dgfx.

Based on a patch from Michel Thierry.

v2: flatten the chain

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210127131417.393872-6-matthew.auld@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5ace5e96 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Make lrc_init_wa_ctx compatible with ww locking, v3.

Make creation separate from pinning, in order to take the lock only
once, and pin the mapping with the lock held.

Changes since v1:
- Rebase on top of upstream changes.
Changes since v2:
- Fully clear wa_ctx on error.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-27-maarten.lankhorst@linux.intel.com


# 9834dfef 13-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Prune inlines

Remove all the manual inlines from non-critical sections in gt/

add/remove: 2/0 grow/shrink: 0/3 up/down: 762/-1473 (-711)
Function old new delta
mi_set_context.isra - 602 +602
write_dma_entry - 160 +160
__set_pd_entry 214 69 -145
clear_pd_entry 190 42 -148
ring_request_alloc 2021 841 -1180
Total: Before=1605086, After=1604375, chg -0.04%

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210113152224.29794-1-chris@chris-wilson.co.uk


# a42f4dd2 09-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remove unused function 'dword_in_page'

>> drivers/gpu/drm/i915/gt/intel_lrc.c:17:28: error: unused function 'dword_in_page' [-Werror,-Wunused-function]
static inline unsigned int dword_in_page(void *addr)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-1-chris@chris-wilson.co.uk


# 9a437ccb 09-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Exercise lrc_wa_ctx initialisation failure

Inject a fault into lrc_init_wa_ctx() to ensure that we can tolerate a
failure to construct the workarounds.

v2: Avoid mentioning an error for fault-injection, other CI will
complain about the dmesg spam.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210109114453.27798-1-chris@chris-wilson.co.uk


# 5b4dc95c 08-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Prevent use of engine->wa_ctx after error

On error we unpin and free the wa_ctx.vma, but do not clear any of the
derived flags. During lrc_init, we look at the flags and attempt to
dereference the wa_ctx.vma if they are set. To protect the error path
where we try to limp along without the wa_ctx, make sure we clear those
flags!

Reported-by: Matt Roper <matthew.d.roper@intel.com>
Fixes: 604a8f6f1e33 ("drm/i915/lrc: Only enable per-context and per-bb buffers if set")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210108204026.20682-1-chris@chris-wilson.co.uk


# 093a0bea 31-Dec-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Populate logical context during first pin.

This allows us to remove pin_map from state allocation, which saves
us a few retry loops. We won't need this until first pin, anyway.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201231170405.22843-1-chris@chris-wilson.co.uk


# b436a5f8 22-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Track all timelines created using the HWSP

We assume that the contents of the HWSP are lost across suspend, and so
upon resume we must restore critical values such as the timeline seqno.
Keep track of every timeline allocated that uses the HWSP as its storage
and so we can then reset all seqno values by walking that list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201222104242.10993-1-chris@chris-wilson.co.uk


# a0d3fdb6 18-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Split logical ring contexts from execlist submission

Split the definition, construction and updating of the Logical Ring
Context from the execlist submission interface. The LRC is used by the
HW, irrespective of our different submission backends.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201219020343.22681-1-chris@chris-wilson.co.uk


# f867b66e 04-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Clear the execlists timers upon reset

Across a reset, we stop the engine but not the timers. This leaves a
window where the timers have inconsistent state with the engine, but
should only result in a spurious timeout. As we cancel the outstanding
events, also cancel their timers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-4-chris@chris-wilson.co.uk


# d997e240 04-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Cancel the preemption timeout on responding to it

We currently presume that the engine reset is successful, cancelling the
expired preemption timer in the process. However, engine resets can
fail, leaving the timeout still pending and we will then respond to the
timeout again next time the tasklet fires. What we want is for the
failed engine reset to be promoted to a full device reset, which is
kicked by the heartbeat once the engine stops processing events.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk


# b9695405 04-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore repeated attempts to suspend request flow across reset

Before reseting the engine, we suspend the execution of the guilty
request, so that we can continue execution with a new context while we
slowly compress the captured error state for the guilty context. However,
if the reset fails, we will promptly attempt to reset the same request
again, and discover the ongoing capture. Ignore the second attempt to
suspend and capture the same request.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk


# a5855989 26-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Check for a completed last request once

Pull the repeated check for the last active request being completed to a
single spot, when deciding whether or not execlist preemption is
required.

In doing so, we remove the tasklet kick, introduced with the completion
checks in commit 35f3fd8182ba ("drm/i915/execlists: Workaround switching
back to a completed context"), if we find the request was completed but
have not yet seen the corresponding CS event. This was devolving into a
busy spin of the tasklet while we waited for the event as the delivery
was not as instantaneous as expected. Under load this is sufficient to
exhaust the tasklet softirq timeslice, and force ksoftirqd. Quite
noticeable overhead for no apparent improvement in latency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-2-chris@chris-wilson.co.uk


# b8e2bd98 26-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Decouple completed requests on unwind

Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.request list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. However, if the request is completed we currently leave it
on the list [until retirement], but we could just as simply remove it
and stop treating it as active. We would only have to then traverse it
once while unwinding in quick succession.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk


# 46eecfcc 23-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Free stale request on destroying the virtual engine

Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.

v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk


# b5b349b9 19-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift waiter/signaler iterators

Lift the list iteration defines for traversing the signaler/waiter lists
into i915_scheduler.h for reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-5-chris@chris-wilson.co.uk


# 562675d0 19-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Update request status flags for debug pretty-printer

We plan to expand upon the number of available statuses for when we
pretty-print the requests along the timelines, and so need a new set of
flags. We have settled upon:

Unready [U]
- initial status after being submitted, the request is not
ready for execution as it is waiting for external fences

Ready [R]
- all fences the request was waiting on have been signaled,
and the request is now ready for execution and will be
in a backend queue

- a ready request may still need to wait on semaphores
[internal fences]

Ready/virtual [V]
- same as ready, but queued over multiple backends

Executing [E]
- the request has been transferred from the backend queue and
submitted for execution on HW

- a completed request may still be regarded as executing, its
status may not be updated until it is retired and removed
from the lists

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk


# 1f0e785a 19-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift i915_request_show()

Extract i915_request_show for reuse in other request chain pretty
printers.

For a bonus point, quietly change the seqno format from %llx to %lld to
match everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk


# 45e50f48 18-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remember to free the virtual breadcrumbs

Since we allocate some breadcrumbs for the virtual engine, and the
virtual engine has a custom destructor, we also need to free the
breadcrumbs after use.

Fixes: b3786b29379c ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118133839.1783-1-chris@chris-wilson.co.uk


# d33fcd79 17-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore dt==0 for reporting underflows

The presumption was that some time would always elapse between recording
the start and the finish of a context switch. This turns out to be a
regular occurrence and emitting a debug statement superfluous.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117113103.21480-4-chris@chris-wilson.co.uk


# 488751a0 18-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Prevent use of engine->wa_ctx after error

On error we unpin and free the wa_ctx.vma, but do not clear any of the
derived flags. During lrc_init, we look at the flags and attempt to
dereference the wa_ctx.vma if they are set. To protect the error path
where we try to limp along without the wa_ctx, make sure we clear those
flags!

Reported-by: Matt Roper <matthew.d.roper@intel.com>
Fixes: 604a8f6f1e33 ("drm/i915/lrc: Only enable per-context and per-bb buffers if set")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210108204026.20682-1-chris@chris-wilson.co.uk
(cherry-picked from 5b4dc95cf7f573e927fbbd406ebe54225d41b9b2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210118095332.458813-1-chris@chris-wilson.co.uk


# 0fe8bf4d 04-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Cancel the preemption timeout on responding to it

We currently presume that the engine reset is successful, cancelling the
expired preemption timer in the process. However, engine resets can
fail, leaving the timeout still pending and we will then respond to the
timeout again next time the tasklet fires. What we want is for the
failed engine reset to be promoted to a full device reset, which is
kicked by the heartbeat once the engine stops processing events.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk
(cherry picked from commit d997e240ceecb4f732611985d3a939ad1bfc1893)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 5419d93f 04-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore repeated attempts to suspend request flow across reset

Before reseting the engine, we suspend the execution of the guilty
request, so that we can continue execution with a new context while we
slowly compress the captured error state for the guilty context. However,
if the reset fails, we will promptly attempt to reset the same request
again, and discover the ongoing capture. Ignore the second attempt to
suspend and capture the same request.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk
(cherry picked from commit b969540500bce60cf1cdfff5464388af32b9a553)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 280ffdb6 23-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Free stale request on destroying the virtual engine

Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.

v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk
(cherry picked from commit 46eecfccb4c2b0f258adbafb2e53ca3b822cd663)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# b4ca4354 18-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remember to free the virtual breadcrumbs

Since we allocate some breadcrumbs for the virtual engine, and the
virtual engine has a custom destructor, we also need to free the
breadcrumbs after use.

Fixes: b3786b29379c ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118133839.1783-1-chris@chris-wilson.co.uk
(cherry picked from commit 45e50f48b7907e650cfbbc7879abfe3a0c419c73)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# bda30024 04-Nov-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Improve record of hung engines in error state

Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.

Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.

v2:
* Stub prototype. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com


# e67d01d8 02-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Flush xcs before tgl breadcrumbs

In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.

Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
(cherry picked from commit 09212e81e5450743e5b06b27c4e344e4c45b630d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 8ce70996 22-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use the local HWSP offset during submission

We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.

Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.

Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
(cherry picked from commit c10f6019d0b2dc8a6a62b55459f3ada5bc4e5e1a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 09212e81 02-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Flush xcs before tgl breadcrumbs

In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.

Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk


# c10f6019 22-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use the local HWSP offset during submission

We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.

Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.

Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk


# 4a9bb58a 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Wait for CSB entries on Tigerlake

On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.

However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:

513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
(cherry picked from commit 233c1ae3c83f21046c6c4083da904163ece8f110)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# ca05277e 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Widen CSB pointer to u64 for the parsers

A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
(cherry picked from commit f24a44e52fbc9881fc5f3bcef536831a15a439f3)
(cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 64402570 02-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Undo forced context restores after trivial preemptions

We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.

However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)

Fixes: 8ab3a3812aa9 ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
(cherry picked from commit bb65548e3c6e299175a9e8c3e24b2b9577656a5d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 9b99e5ba 15-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Delay execlist processing for tgl

When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
(cherry picked from commit 6ca7217dffaf1abba91558e67a2efb655ac91405)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 89db9537 15-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Confirm the context survives execution

Repeat our sanitychecks from before execution to after execution. One
expects that if we were to see these, the gpu would already be on fire,
but the timing may be informative.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk


# bb65548e 02-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Undo forced context restores after trivial preemptions

We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.

However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)

Fixes: 8ab3a3812aa9 ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk


# 6ca7217d 15-Oct-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Delay execlist processing for tgl

When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk


# 5e39b4d9 30-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Signal cancelled requests

After marking the requests on an engine as cancelled upon wedging, send
any signals for their completions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-1-chris@chris-wilson.co.uk


# 29545e5c 26-Aug-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remove defunct intel_virtual_engine_get_sibling()

As the last user was eliminated in commit e21fecdcde40 ("drm/i915/gt:
Distinguish the virtual breadcrumbs from the irq breadcrumbs"), we can
remove the function. One less implementation detail creeping beyond its
scope.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200826132811.17577-16-chris@chris-wilson.co.uk


# b82a8b93 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be wary of data races when reading the active execlists

To implement preempt-to-busy (and so efficient timeslicing and best utilization
of the hardware submission ports) we let the GPU run asynchronously in respect
to the ELSP submission queue. This created challenges in keeping and accessing
the driver state mirroring the asynchronous GPU execution.

The latest occurence of this was spotted by KCSAN:

[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9

To complicate matters, we have to both avoid the read tearing of *active and
avoid any write tearing as perform the pending[] -> inflight[] promotion of the
execlists.

This is because we cannot rely on the memcpy doing u64 aligned copies on all
kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.

v2: When in doubt, write the same comment again.
v3: Expanded commit message.

Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b4d9145b0154f8c71dafc2db5fd445f1f3db9426)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4ff64bcf 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use a mmio read of the CSB in case of failure

If we find the GPU didn't update the CSB within 50us, we currently fail
and eventually reset the GPU. Lets report the value from the mmio space
as a last resort, it may just stave off an unnecessary GPU reset.

References: HSDES#22011327657
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-4-chris@chris-wilson.co.uk


# 884c4074 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Apply the CSB w/a for all

Since we expect to inline the csb_parse() routines, the w/a for the
stale CSB data on Tigerlake will be pulled into process_csb(), and so we
might as well simply reuse the logic for all, and so will hopefully
avoid any strange behaviour on Icelake that was not covered by our
previous w/a.

References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-3-chris@chris-wilson.co.uk


# 233c1ae3 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Wait for CSB entries on Tigerlake

On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.

However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:

513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk


# f24a44e5 15-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Widen CSB pointer to u64 for the parsers

A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk


# b4d9145b 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be wary of data races when reading the active execlists

To implement preempt-to-busy (and so efficient timeslicing and best utilization
of the hardware submission ports) we let the GPU run asynchronously in respect
to the ELSP submission queue. This created challenges in keeping and accessing
the driver state mirroring the asynchronous GPU execution.

The latest occurence of this was spotted by KCSAN:

[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9

To complicate matters, we have to both avoid the read tearing of *active and
avoid any write tearing as perform the pending[] -> inflight[] promotion of the
execlists.

This is because we cannot rely on the memcpy doing u64 aligned copies on all
kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.

v2: When in doubt, write the same comment again.
v3: Expanded commit message.

Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 47b08693 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin.

As a preparation step for full object locking and wait/wound handling
during pin and object mapping, ensure that we always pass the ww context
in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this
happens.

This also requires changing the order of eb_parse slightly, to ensure
we pass ww at a point where we could still handle -EDEADLK safely.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3999a708 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Rework intel_context pinning to do everything outside of pin_mutex

Instead of doing everything inside of pin_mutex, we move all pinning
outside. Because i915_active has its own reference counting and
pinning is also having the same issues vs mutexes, we make sure
everything is pinned first, so the pinning in i915_active only needs
to bump refcounts. This allows us to take pin refcounts correctly
all the time.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-14-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# b3786b29 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs

On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling, active requests are passed to a
physical engine and associated breadcrumb interrupt handler. This causes
issues for us as we need to ensure that we do not actually try and
enable interrupts and the powermanagement required for them on the
virtual engine, as they will never be disabled. Instead, let's
specify the physical engine used for interrupt handler on a particular
breadcrumb.

v2: Drop b->irq_armed = true mocking for no interrupt HW

Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 56f581ba 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Only transfer the virtual context to the new engine if active

One more complication of preempt-to-busy with respect to the virtual
engine is that we may have retired the last request along the virtual
engine at the same time as preparing to submit the completed request to
a new engine. That submit will be shortcircuited, but not before we have
updated the context with the new register offsets and marked the virtual
engine as bound to the new engine (by calling swap on ve->siblings[]).
As we may have just retired the completed request, we may also be in the
middle of calling virtual_context_exit() to turn off the power management
associated with the virtual engine, and that in turn walks the
ve->siblings[]. If we happen to call swap() on the array as we walk, we
will call intel_engine_pm_put() twice on the same engine.

In this patch, we prevent this by only updating the bound engine after a
successful submission which weeds out the already completed requests.

Alternatively, we could walk a non-volatile array for the pm, such as
using the engine->mask. The small advantage to performing the update
after the submit is that we then only have to do a swap for active
requests.

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 2854d866 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs

After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must the
act of submitting a completed requests, we can then redirect those
completed requests onto a dedicated signaled_list at the time of
construction and so eliminate intel_engine_transfer_stale_breadcrumbs().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# c18636f7 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs

Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.

The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.

On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# d1bf5dd8 30-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Support multiple pinned timelines

We may need to allocate more than one pinned context/timeline for each
engine which can utilise the per-engine HWSP, so we need to give each
a different offset within it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730183906.25422-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# a817c891 28-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Disable preparser around xcs invalidations on tgl

Unlike rcs where we have conclusive evidence from our selftesting that
disabling the preparser before performing the TLB invalidate and
relocations does impact upon the GPU execution, the evidence for the
same requirement on xcs is much more circumstantial. Let's apply the
preparser disable between batches as we invalidate the TLB as a dose of
healthy paranoia, just in case.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152110.830-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 96c5a15f 10-Aug-2020 Matt Roper <matthew.d.roper@intel.com>

drm/i915/kbl: Fix revision ID checks

We usually assume that increasing PCI device revision ID's translates to
newer steppings; macros like IS_KBL_REVID() that we use rely on this
behavior. Unfortunately this turns out to not be true on KBL; the
newer device 2 revision ID's sometimes go backward to older steppings.
The situation is further complicated by different GT and display
steppings associated with each revision ID.

Let's work around this by providing a table to map the revision ID to
specific GT and display steppings, and then perform our comparisons on
the mapped values.

v2:
- Move the kbl_revids[] array to intel_workarounds.c to avoid compiler
warnings about an unused variable in files that don't call the
macros (kernel test robot).

Bspec: 18329
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811032105.2819370-1-matthew.d.roper@intel.com
Reviewed-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# 110f9efa 13-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Only swap to a random sibling once upon creation

The danger in switching at random upon intel_context_pin is that the
context may still actually be inflight, as it will not be scheduled out
until a context switch after it is complete -- that may be a long time
after we do a final intel_context_unpin.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2118
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713160549.17344-1-chris@chris-wilson.co.uk
(cherry picked from commit 90a987205c6cf74116a102ed446d22d92cdaf915)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 858f1299 11-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore irq enabling on the virtual engines

We do not use the virtual engines for interrupts (they have physical
components), but we do use them to decouple the fence signaling during
submission. Currently, when we submit a completed request, we try to
enable the interrupt handler for the virtual engine, but we never disarm
it. A quick fix is then to mark the irq as enabled, and it will then
remain enabled -- and this prevents us from waking the device and never
letting it sleep again.

Fixes: f8db4d051b5e ("drm/i915: Initialise breadcrumb lists on the virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711203236.12330-1-chris@chris-wilson.co.uk
(cherry picked from commit 4fe6abb8f51355224808ab02a9febf65d184c40b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 90a98720 13-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Only swap to a random sibling once upon creation

The danger in switching at random upon intel_context_pin is that the
context may still actually be inflight, as it will not be scheduled out
until a context switch after it is complete -- that may be a long time
after we do a final intel_context_unpin.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2118
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713160549.17344-1-chris@chris-wilson.co.uk


# 4fe6abb8 11-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore irq enabling on the virtual engines

We do not use the virtual engines for interrupts (they have physical
components), but we do use them to decouple the fence signaling during
submission. Currently, when we submit a completed request, we try to
enable the interrupt handler for the virtual engine, but we never disarm
it. A quick fix is then to mark the irq as enabled, and it will then
remain enabled -- and this prevents us from waking the device and never
letting it sleep again.

Fixes: f8db4d051b5e ("drm/i915: Initialise breadcrumb lists on the virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711203236.12330-1-chris@chris-wilson.co.uk


# 2730055d 11-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always reset the engine, even if inactive, on execlists failure

If something has gone awry with the CSB processing, we need to pause,
unwind and restart the request submission and event processing. However,
currently we skip the engine reset if we raise an error but discover no
active context, in the mistaken belief that it was merely a glitch in
the matrix. The glitches are real enough, and we do need to unwind even
if the engine appears idle (as it has gone permanently idle!) The
simplest way to unwind and recover is simply do the engine reset, which
should be very fast and _safe_ as nothing is active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711091349.28865-1-chris@chris-wilson.co.uk


# b2295e2e 10-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Be defensive in the face of false CS events

If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

v2: Commentary and force the reset-on-error.
v3: Customised user facing message for forced resets from internal errors.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200710133125.30194-1-chris@chris-wilson.co.uk


# 89d19b2b 08-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release shortlived maps of longlived objects

Some objects we map once during their construction, and then never
access their mappings again, even if they are kept around for the
duration of the driver. Keeping those pages mapped, often vmapped, is
therefore wasteful and we should release the maps as soon as we no
longer need them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-3-chris@chris-wilson.co.uk


# 59c94b9d 08-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Replace opencoded i915_gem_object_pin_map()

As we have a pin_map interface, that knows how to flush the data to the
device, use it. The only downside is that we keep the kmap around, as
once acquired we keep the mapping cached until the object's backing
store is released.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-2-chris@chris-wilson.co.uk


# 0b6613c6 07-Jul-2020 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915/sseu: Move sseu_info under gt_info

SSEUs are a GT capability, so track them under gt_info.

Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com


# 8ab3a381 09-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Incrementally check for rewinding

In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.

However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.

Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
(cherry picked from commit e36ba817fa966f81fb1c8d16f3721b5a644b2fa9)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4178b5a6 27-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Prevent timeslicing into unpreemptable requests

We have a I915_REQUEST_NOPREEMPT flag that we set when we must prevent
the HW from preempting during the course of this request. We need to
honour this flag and protect the HW even if we have a heartbeat request,
or other maximum priority barrier, pending. As such, restrict the
timeslicing check to avoid preempting into the topmost priority band,
leaving the unpreemptable requests in blissful peace running
uninterrupted on the HW.

v2: Set the I915_PRIORITY_BARRIER to be less than
I915_PRIORITY_UNPREEMPTABLE so that we never submit a request
(heartbeat or barrier) that can legitimately preempt the current
non-premptable request.

Fixes: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527162418.24755-1-chris@chris-wilson.co.uk
(cherry picked from commit b72f02d78e4f257761ed003444ae52083f962076)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 84973767 19-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Incorporate the virtual engine into timeslicing

It was quite the oversight to only factor in the normal queue to decide
the timeslicing switch priority. By leaving out the next virtual request
from the priority decision, we would not timeslice the current engine if
there was an available virtual request.

Testcase: igt/gem_exec_balancer/sliced
Fixes: 3df2deed411e ("drm/i915/execlists: Enable timeslice on partial virtual engine dequeue")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519132046.22443-3-chris@chris-wilson.co.uk
(cherry picked from commit 6ad249ba59badc7ff157d4db1f835748f0e2c9b6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3d09677a 12-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Lift opportunistic process_csb to before engine lock

Since the process_csb() does not require us to hold the
engine->active.lock, we can move the opportunistic flush before
direction submission to outside of the lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200612221113.9129-1-chris@chris-wilson.co.uk


# e36ba817 09-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Incrementally check for rewinding

In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.

However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.

Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk


# 8733a063 07-Jun-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Adjust the sentinel assert to match implementation

Sentinels are supposed to be last requests in the elsp queue, not the
only one, so adjust the assert accordingly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200607222108.14401-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 12b67c2e 05-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always check to enable timeslicing if not submitting

We may choose not to submit for a number of reasons, yet not fill both
ELSP. In which case we must start timeslicing (there will be no ACK
event on which to hook the start) if the queue would benefit from the
currently active context being evicted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605122334.2798-2-chris@chris-wilson.co.uk


# fdd4f941 05-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Set timeslicing priority from queue

If we only submit the first port, leaving the second empty yet have
ready requests pending in the queue, use that to set the timeslicing
priority (i.e. the priority at which we will decided to enabling
timeslicing and evict the currently active context if the queue is of
equal priority after its quantum expired).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605122334.2798-1-chris@chris-wilson.co.uk


# ac533c56 04-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Track if an engine requires forcewake w/a

Sometimes an engine might need to keep forcewake active while it is busy
submitting requests for a particular workaround. Track such nuisance
with engine->fw_domain.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200604153145.21068-1-chris@chris-wilson.co.uk


# 5a833995 02-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop i915_request.i915 backpointer

We infrequently use the direct i915 backpointer from the i915_request,
so do we really need to waste the space in the struct for it? 8 bytes
from the most frequently allocated struct vs an 3 bytes and pointer
chasing in using rq->engine->i915?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk


# 2010b7f0 28-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Start timeslice on partial submission

We may choose to only submit ELSP[0], even though we have sufficient
requests to fill the whole ELSP. Normally, we only start timeslicing if
we fill more than one port, but in this case we need to start
timeslicing for the queue that we choose not to submit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200528205727.20309-1-chris@chris-wilson.co.uk


# b72f02d7 27-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Prevent timeslicing into unpreemptable requests

We have a I915_REQUEST_NOPREEMPT flag that we set when we must prevent
the HW from preempting during the course of this request. We need to
honour this flag and protect the HW even if we have a heartbeat request,
or other maximum priority barrier, pending. As such, restrict the
timeslicing check to avoid preempting into the topmost priority band,
leaving the unpreemptable requests in blissful peace running
uninterrupted on the HW.

v2: Set the I915_PRIORITY_BARRIER to be less than
I915_PRIORITY_UNPREEMPTABLE so that we never submit a request
(heartbeat or barrier) that can legitimately preempt the current
non-premptable request.

Fixes: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527162418.24755-1-chris@chris-wilson.co.uk


# 9ae6c4ef 25-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Shortcircuit queue_prio() for no internal levels

If there are no internal levels and the user priority-shift is zero, we
can help the compiler eliminate some dead code:

Function old new delta
start_timeslice 169 154 -15
__execlists_submission_tasklet 4696 4659 -37

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200525075347.582-4-chris@chris-wilson.co.uk


# 6ad249ba 19-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Incorporate the virtual engine into timeslicing

It was quite the oversight to only factor in the normal queue to decide
the timeslicing switch priority. By leaving out the next virtual request
from the priority decision, we would not timeslice the current engine if
there was an available virtual request.

Testcase: igt/gem_exec_balancer/sliced
Fixes: 3df2deed411e ("drm/i915/execlists: Enable timeslice on partial virtual engine dequeue")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519132046.22443-3-chris@chris-wilson.co.uk


# 1ee05f9e 19-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Kick virtual siblings on timeslice out

If we decide to timeslice out the current virtual request, we will
unsubmit it while it is still busy (ve->context.inflight == sibling[0]).
If the virtual tasklet and then the other sibling tasklets run before we
completely schedule out the active virtual request for the preemption,
those other tasklets will see that the virtul request is still inflight
on sibling[0] and leave it be. Therefore when we finally schedule-out
the virtual request and if we see that we have passed it back to the
virtual engine, reschedule the virtual tasklet so that it may be
resubmitted on any of the siblings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200519132046.22443-2-chris@chris-wilson.co.uk


# f5f7e790 18-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Reuse the tasklet priority for virtual as their siblings

In order to keep all the tasklets in the same execution lists and so
fifo ordered, be consistent and use the same priority for all.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200518081440.17948-3-chris@chris-wilson.co.uk


# 0f4013fb2 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Transfer old virtual breadcrumbs to irq_worker

The second try at staging the transfer of the breadcrumb. In part one,
we realised we could not simply move to the second engine as we were
only holding the breadcrumb lock on the first. So in commit 6c81e21a4742
("drm/i915/gt: Stage the transfer of the virtual breadcrumb"), we
removed it from the first engine and marked up this request to reattach
the signaling on the new engine. However, this failed to take into
account that we only attach the breadcrumb if the new request is added
at the start of the queue, which if we are transferring, it is because
we know there to be a request to be signaled (and hence we would not be
attached).

In this attempt, we try to transfer the completed requests to the
irq_worker on its rq->engine->breadcrumbs. This preserves the coupling
between the rq and its breadcrumbs, so that
i915_request_cancel_breadcrumb() does not attempt to manipulate the list
under the wrong lock.

v2: Code sharing is fun.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1862
Fixes: 6c81e21a4742 ("drm/i915/gt: Stage the transfer of the virtual breadcrumb")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513074809.18194-1-chris@chris-wilson.co.uk


# 18e4af04 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop no-semaphore boosting

Now that we have fast timeslicing on semaphores, we no longer need to
prioritise none-semaphore work as we will yield any work blocked on a
semaphore to the next in the queue. Previously with no timeslicing,
blocking on the semaphore caused extremely bad scheduling with multiple
clients utilising multiple rings. Now, there is no impact and we can
remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk


# 795d4d7f 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the addition of the initial-breadcrumb in the request

The initial-breadcrumb is used to mark the end of the awaiting and the
beginning of the user payload. We verify that we do not start the user
payload before all signaler are completed, checking our semaphore setup
by looking for the initial breadcrumb being written too early. We also
want to ensure that we do not add semaphore waits after we have already
closed the semaphore section, an issue for later deferred waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513165937.9508-2-chris@chris-wilson.co.uk


# b428d570 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Reset execlists registers before HWSP

Upon gt resume, we first poison then sanitize the engine. However, our
testing shows that gen9 will very rarely retain the poisoned value from
the HWSP mappings of the execlists status registers. This suggests that
it is reading back from the HWSP, so rejig the register reset.

v2: Maybe RING_CONTEXT_STATUS_PTR is write masked. It is.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/1812
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513100120.11617-1-chris@chris-wilson.co.uk


# a9d094dc 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark concurrent submissions with a weak-dependency

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
(cherry picked from commit 6b6cd2ebd8d071e55998e32b648bb8081f7f02bb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 1c8ee8b9 10-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Restore Cherryview back to full-ppgtt

This reverts commit 0b718ba1e884f64dce27c19311dd2859b87e56b9.

There are still some residual issues with asynchronous binding and
execution, but since commit 92581f9fb99c ("drm/i915: Immediately execute
the fenced work") we prefer not to use asynchronous binds, and the
remaining issues do not seem restricted to Cherryview [at least the ones
seen over a few dozen CI runs, less frequent issues are sure to be
discovered!]

These issues seem to be mitigated, if not eliminated entirely, by the
previous commit 84eac0c65940 ("drm/i915/gt: Force pte cacheline to main
memory").

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200510102431.21959-3-chris@chris-wilson.co.uk


# f4d49692 11-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark up the racy read of execlists->context_tag

Since we are using bitops on context_tag to allow us to reserve and
release inflight tags concurrently, the scan for the next bit is
intentionally racy.

[ 516.446854] BUG: KCSAN: data-race in execlists_schedule_in.isra.0 [i915] / execlists_schedule_out [i915]
[ 516.446874]
[ 516.446886] write (marked) to 0xffff8881f7644048 of 8 bytes by interrupt on cpu 2:
[ 516.447076] execlists_schedule_out+0x538/0x6a0 [i915]
[ 516.447263] process_csb+0x10b/0x3d0 [i915]
[ 516.447449] execlists_submission_tasklet+0x30/0x170 [i915]
[ 516.447468] tasklet_action_common.isra.0+0x42/0x90
[ 516.447484] __do_softirq+0xc8/0x206
[ 516.447498] irq_exit+0xcd/0xe0
[ 516.447516] do_IRQ+0x44/0xc0
[ 516.447535] ret_from_intr+0x0/0x1c
[ 516.447550] cpuidle_enter_state+0x199/0x400
[ 516.447572] cpuidle_enter+0x50/0x90
[ 516.447587] do_idle+0x197/0x1e0
[ 516.447600] cpu_startup_entry+0x14/0x20
[ 516.447619] start_secondary+0xf9/0x130
[ 516.447643] secondary_startup_64+0xa4/0xb0
[ 516.447655]
[ 516.447671] read to 0xffff8881f7644048 of 8 bytes by task 460 on cpu 1:
[ 516.447863] execlists_schedule_in.isra.0+0x3cf/0x5a0 [i915]
[ 516.448064] execlists_dequeue+0xf8f/0x1690 [i915]
[ 516.448252] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 516.448440] execlists_submit_request+0x2e2/0x310 [i915]
[ 516.448634] submit_notify+0x8f/0xc8 [i915]
[ 516.448820] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 516.449005] i915_sw_fence_complete+0x58/0x80 [i915]
[ 516.449208] i915_sw_fence_commit+0x16/0x20 [i915]
[ 516.449399] __i915_request_queue+0x60/0x70 [i915]
[ 516.449590] i915_gem_do_execbuffer+0x33f1/0x4a00 [i915]
[ 516.449782] i915_gem_execbuffer2_ioctl+0x2a2/0x550 [i915]
[ 516.449800] drm_ioctl_kernel+0xe9/0x130
[ 516.449814] drm_ioctl+0x27d/0x45e
[ 516.449827] ksys_ioctl+0x89/0xb0
[ 516.449842] __x64_sys_ioctl+0x42/0x60
[ 516.449864] do_syscall_64+0x6e/0x2c0
[ 516.449878] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511075722.13483-1-chris@chris-wilson.co.uk


# f1e79c7e 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

drm/i915: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507185408.GA14561@embeddedor


# e41627db 08-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Improve precision on defer_request assert

The kernel_context does not use initial-breadcrumbs, so when we ask if
its requests have started we do so by comparing against the completion
seqno of the previous request. This is very imprecise, not precise
enough for the defer_request assertion.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1847
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508104220.9872-1-chris@chris-wilson.co.uk


# 972282c4 07-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gen12: Add aux table invalidate for all engines

All engines, exception being blitter as it does not
care about the form, can access compressed surfaces.

So we need to add forced aux table invalidates
for those engines.

v2: virtual instance masking (Chris)
v3: bug on if not found (Chris)

References: d248b371f747 ("drm/i915/gen12: Invalidate aux table entries forcibly")
References bspec#43904, hsdes#1809175790
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507142045.8668-1-mika.kuoppala@linux.intel.com


# eec39e44 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove wait priority boosting

Upon waiting a request (when asked), we gave that request a small
priority boost, not enough for it to cause preemption, but enough for it
to be scheduled next before all equals. We also used that bit to give
new clients a small priority boost, similar to FQ_CODEL, such that we
favoured short interactive tasks ahead of long running streams.

However, this is causing lots of complications with timeslicing where we
both want to honour the boost and yet ignore it. Those complications
cause unexpected user behaviour (tasks not being timesliced and run
concurrently as epxected), and the easiest way to resolve that is to
remove the boost. Hopefully, we can find a compromise again if we need
to, but in theory timeslicing itself and future more advanced schedulers
should give us the interactivity boost we seek.

Testcase: igt/gem_exec_schedule/lateslice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk


# 6b6cd2eb 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark concurrent submissions with a weak-dependency

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk


# d248b371 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gen12: Invalidate aux table entries forcibly

Aux table invalidation can fail on update. So
next access may cause memory access to be into stale entry.

Proposed workaround is to invalidate entries between
all batchbuffers.

v2: correct register address (Yang)
v3: respect the order (Chris)

References bspec#43904, hsdes#1809175790
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506165310.1239-1-mika.kuoppala@linux.intel.com


# 0c7c0c8e 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gen12: Flush L3

Flush TDL,L3 and EUs

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-3-mika.kuoppala@linux.intel.com


# 32d7171e 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gen12: Fix HDC pipeline flush

HDC pipeline flush is bit on the first dword of
the PIPE_CONTROL, not the second. Make it so.

v2: function naming (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-2-mika.kuoppala@linux.intel.com


# f02ac414 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

Revert "drm/i915/tgl: Include ro parts of l3 to invalidate"

This reverts commit 62037ffff229b7d94f1db5ef8d2e2ec819832ef3.

L3 ro cache invalidation is part of the dword0 of pipe
control. Also it is not relevant to this gen.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-1-mika.kuoppala@linux.intel.com


# 1bc6a601 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track inflight CCID

The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.

However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.

Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk
(cherry picked from commit 5c4a53e3b1cbc38d0906e382f1037290658759bb)
(cherry picked from commit 134711240307d5586ae8e828d2699db70a8b74f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 53b2622e 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Avoid reusing the same logical CCID

The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.

To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
(cherry picked from commit 2632f174a2e1a5fd40a70404fa8ccfd0b1f79ebd)
(cherry picked from commit a4b70fcc587860f4b972f68217d8ebebe295ec15)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 220dcfc1 07-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Yield the timeslice if caught waiting on a user semaphore

If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the
user batch or in our own preamble, the engine raises a
GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so
respond to a semaphore wait by yielding the timeslice, if we have
another context to yield to!

The only real complication is that the interrupt is only generated for
the start of the semaphore wait, and is asynchronous to our
process_csb() -- that is, we may not have registered the timeslice before
we see the interrupt. To ensure we don't miss a potential semaphore
blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark
the interrupt and apply it to the next timeslice regardless of whether it
was active at the time.

v2: We use semaphores in preempt-to-busy, within the timeslicing
implementation itself! Ergo, when we do insert a preemption due to an
expired timeslice, the new context may start with the missed semaphore
flagged by the retired context and be yielded, ad infinitum. To avoid
this, read the context id at the time of the semaphore interrupt and
only yield if that context is still active.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk
(cherry picked from commit c4e8ba7390346a77ffe33ec3f210bc62e0b6c8c6)
(cherry picked from commit cd60e4ac4738a6921592c4f7baf87f9a3499f0e2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 977253df 04-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Stop holding onto the pinned_default_state

As we only restore the default context state upon banning a context, we
only need enough of the state to run the ring and nothing more. That is
we only need our bare protocontext.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504180745.15645-1-chris@chris-wilson.co.uk


# b68be5c6 05-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Record the active CCID from before reset

If we cannot trust the reset will flush out the CS event queue such that
process_csb() reports an accurate view of HW, we will need to search the
active and pending contexts to determine which was actually running at
the time we issued the reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200505084629.31365-1-chris@chris-wilson.co.uk


# 25fd6de3 04-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Small tidy of gen8+ breadcrumb emission

Use a local to shrink a line under 80 columns, and refactor the common
emit_xcs_breadcrumb() wrapper of ggtt-write.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504180507.6017-1-chris@chris-wilson.co.uk


# fe5a7082 01-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Make timeslicing an explicit engine property

In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having the
kernel support and HW support.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501122249.12417-1-chris@chris-wilson.co.uk
(cherry picked from commit a211da9c771bf97395a3ced83a3aa383372b13a7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# a211da9c 01-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Make timeslicing an explicit engine property

In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having the
kernel support and HW support.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501122249.12417-1-chris@chris-wilson.co.uk


# 426d0073 29-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always enable busy-stats for execlists

In the near future, we will utilize the busy-stats on each engine to
approximate the C0 cycles of each, and use that as an input to a manual
RPS mechanism. That entails having busy-stats always enabled and so we
can remove the enable/disable routines and simplify the pmu setup. As a
consequence of always having the stats enabled, we can also show the
current active time via sysfs/engine/xcs/active_time_ns.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-1-chris@chris-wilson.co.uk


# be1cb55a 29-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Keep a no-frills swappable copy of the default context state

We need to keep the default context state around to instantiate new
contexts (aka golden rendercontext), and we also keep it pinned while
the engine is active so that we can quickly reset a hanging context.
However, the default contexts are large enough to merit keeping in
swappable memory as opposed to kernel memory, so we store them inside
shmemfs. Currently, we use the normal GEM objects to create the default
context image, but we can throw away all but the shmemfs file.

This greatly simplifies the tricky power management code which wants to
run underneath the normal GT locking, and we definitely do not want to
use any high level objects that may appear to recurse back into the GT.
Though perhaps the primary advantage of the complex GEM object is that
we aggressively cache the mapping, but here we are recreating the
vm_area everytime time we unpark. At the worst, we add a lightweight
cache, but first find a microbenchmark that is impacted.

Having started to create some utility functions to make working with
shmemfs objects easier, we can start putting them to wider use, where
GEM objects are overkill, such as storing persistent error state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429172429.6054-1-chris@chris-wilson.co.uk


# f6a7c21c 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Verify we don't submit two identical CCIDs

Check that we do not submit two contexts into ELSP with the same CCID
[upper portion of the descriptor].

References: https://gitlab.freedesktop.org/drm/intel/-/issues/1793
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-3-chris@chris-wilson.co.uk


# 5c4a53e3 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track inflight CCID

The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.

However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.

Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk


# 2632f174 28-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Avoid reusing the same logical CCID

The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.

To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk


# 4243cd53 27-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Sanitize GT first

We see that if the HW doesn't actually sleep, the HW may eat the poison
we set in its write-only HWSP during sanitize:

intel_gt_resume.part.8: 0000:00:02.0
__gt_unpark: 0000:00:02.0
gt_sanitize: 0000:00:02.0 force:yes
process_csb: 0000:00:02.0 vcs0: cs-irq head=5, tail=90
process_csb: 0000:00:02.0 vcs0: csb[0]: status=0x5a5a5a5a:0x5a5a5a5a
assert_pending_valid: Nothing pending for promotion!

The CS TAIL pointer should have been reset by reset_csb_pointers(), so
in this case it is likely that we have read back from the CPU cache and
so we must clflush our control over that page. In doing so, push the
sanitisation to the start of the GT sequence so that our poisoning is
assuredly before we start talking to the HW.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/1794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200427084000.10999-1-chris@chris-wilson.co.uk


# 68ace460 26-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Check preempt-timeout target before submit_ports

We evaluate *active, which is a pointer into execlists->inflight[]
during dequeue to decide how long a preempt-timeout we need to apply.
However, as soon as we do the submit_ports, the HW may send its ACK
interrupt causing us to promote execlists->pending[] tp
execlists->inflight[], overwriting the value of *active. We know *active
is only stable until we submit (as we only submit when there is no
pending promotion).

[ 16.102328] BUG: KCSAN: data-race in execlists_dequeue+0x1449/0x1600 [i915]
[ 16.102356]
[ 16.102375] race at unknown origin, with read to 0xffff8881e9500488 of 8 bytes by task 429 on cpu 1:
[ 16.102780] execlists_dequeue+0x1449/0x1600 [i915]
[ 16.103160] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 16.103540] execlists_submit_request+0x38e/0x3c0 [i915]
[ 16.103940] submit_notify+0x8f/0xc0 [i915]
[ 16.104308] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 16.104683] i915_sw_fence_complete+0x58/0x80 [i915]
[ 16.105054] i915_sw_fence_commit+0x16/0x20 [i915]
[ 16.105457] __i915_request_queue+0x60/0x70 [i915]
[ 16.105843] i915_gem_do_execbuffer+0x2d6b/0x4230 [i915]
[ 16.106227] i915_gem_execbuffer2_ioctl+0x2b0/0x580 [i915]
[ 16.106257] drm_ioctl_kernel+0xe9/0x130
[ 16.106279] drm_ioctl+0x27d/0x45e
[ 16.106311] ksys_ioctl+0x89/0xb0
[ 16.106336] __x64_sys_ioctl+0x42/0x60
[ 16.106370] do_syscall_64+0x6e/0x2c0
[ 16.106397] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200426094231.21995-1-chris@chris-wilson.co.uk


# b8a11811 24-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use indirect ctx bb to mend CMD_BUF_CCTL

Use indirect ctx bb to load cmd buffer control value
from context image to avoid corruption.

v2: add to lrc layout (Chris)
v3: end to a cacheline (Chris)
v4: add to lrc fixed (Chris)
v5: value in offset+1

Testcase: igt/i915_selftest/gt_lrc
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230632.30333-1-mika.kuoppala@linux.intel.com


# 685d2109 24-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add per ctx batchbuffer wa for timestamp

Restoration of a previous timestamp can collide
with updating the timestamp, causing a value corruption.

Combat this issue by using indirect ctx bb to
modify the context image during restoring process.

We can preload value into scratch register. From which
we then do the actual write with LRR. LRR is faster and
thus less error prone as probability of race drops.

v2: tidying (Chris)
v3: lrr for all engines
v4: grp
v5: reg bit
v6: wa_bb_offset, virtual engines (Chris)

References: HSDES#16010904313
Testcase: igt/i915_selftest/gt_lrc
Suggested-by: Joseph Koston <joseph.koston@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230546.30271-1-mika.kuoppala@linux.intel.com


# 168c6d23 24-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add engine scratch register to live_lrc_fixed

General purpose registers are per engine and
in a fixed location. Add to live_lrc_fixed.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424214841.28076-1-mika.kuoppala@linux.intel.com


# b4892e44 23-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Make define for lrc state offset

More often than not, we need a byte offset into lrc
register state from the start of the hw state. Make it so.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423182355.21837-3-mika.kuoppala@linux.intel.com


# f1cc6acf 23-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/selftests: Add context batchbuffers registers to live_lrc_fixed

Add per ctx bb and indirect ctx bb register locations to live_lrc_fixed
for verification.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423224159.22078-1-chris@chris-wilson.co.uk


# 36fe164d 22-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Carefully order virtual_submission_tasklet

During the virtual engine's submission tasklet, we take the request and
insert into the submission queue on each of our siblings. This seems
quite simply, and so no problems with ordering. However, the sibling
execlists' submission tasklets may run concurrently with the virtual
engine's tasklet, submitting the request to HW before the virtual
finishes its task of telling all the siblings. If this happens, the
sibling tasklet may *reorder* the ve->sibling[] array that the virtual
engine tasklet is processing. This can *only* reorder within the
elements already processed by the virtual engine, nevertheless the
race is detected by KCSAN:

[ 185.580014] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915]
[ 185.580054]
[ 185.580076] write to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 2:
[ 185.580553] execlists_dequeue+0x6ad/0x1600 [i915]
[ 185.581044] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 185.581517] execlists_submission_tasklet+0xd3/0x170 [i915]
[ 185.581554] tasklet_action_common.isra.0+0x42/0x90
[ 185.581585] __do_softirq+0xc8/0x206
[ 185.581613] run_ksoftirqd+0x15/0x20
[ 185.581641] smpboot_thread_fn+0x15a/0x270
[ 185.581669] kthread+0x19a/0x1e0
[ 185.581695] ret_from_fork+0x1f/0x30
[ 185.581717]
[ 185.581736] read to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 0:
[ 185.582231] virtual_submission_tasklet+0x10e/0x5c0 [i915]
[ 185.582265] tasklet_action_common.isra.0+0x42/0x90
[ 185.582291] __do_softirq+0xc8/0x206
[ 185.582315] run_ksoftirqd+0x15/0x20
[ 185.582340] smpboot_thread_fn+0x15a/0x270
[ 185.582368] kthread+0x19a/0x1e0
[ 185.582395] ret_from_fork+0x1f/0x30
[ 185.582417]

We can prevent this race by checking for the ve->request after looking
up the sibling array.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423115315.26825-1-chris@chris-wilson.co.uk


# 15501287 22-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Drop request-before-CS assertion

When we migrated to execlists, one of the conditions we wanted to test
for was whether the breadcrumb seqno was being written before the
breadcumb interrupt was delivered. This was following on from issues
observed on previous generations which were not so strongly ordered. With
the removal of the missed interrupt detection, we have not reliable
means of detecting the out-of-order seqno/interrupt but instead tried to
assert that the relationship between the CS event interrupt and the
breadwrite should be strongly ordered. However, Icelake proves it is
possible for the HW implementation to forget about minor little details
such as write ordering and so the order between *processing* the CS
event and the breadcrumb is unreliable.

Remove the unreliable assertion, but leave a debug telltale in case we
have reason to suspect.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk


# bd3ec9e7 21-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Poison residual state [HWSP] across resume.

Since we may lose the content of any buffer when we relinquish control
of the system (e.g. suspend/resume), we have to be careful not to rely
on regaining control. A good method to detect when we might be using
garbage is by always injecting that garbage prior to first use on
load/resume/etc.

v2: Drop sanitize callback on cleanup
v3: Move seqno reset to timeline enter, so we reset all timelines.
However, this is done on every activation during runtime and not reset.
The similar level of paranoia we apply to correcting context state after
a period of inactivity.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200421092504.7416-1-chris@chris-wilson.co.uk


# 23122a4d 15-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Scrub execlists state on resume

Before we resume, we reset the HW so we restart from a known good state.
However, as a part of the reset process, we drain our pending CS event
queue -- and if we are resuming that does not correspond to internal
state. On setup, we are scrubbing the CS pointers, but alas only on
setup.

Apply the sanitization not just to setup, but to all resumes.

Reported-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200416114117.3460-1-chris@chris-wilson.co.uk


# dc483ba5 02-Apr-2020 Jani Nikula <jani.nikula@intel.com>

drm/i915/gt: prefer struct drm_device based logging

Prefer struct drm_device based logging over struct device based logging.

No functional changes.

Cc: Wambui Karuga <wambui.karugax@gmail.com>
Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-16-jani.nikula@intel.com


# c4e8ba73 07-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Yield the timeslice if caught waiting on a user semaphore

If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the
user batch or in our own preamble, the engine raises a
GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so
respond to a semaphore wait by yielding the timeslice, if we have
another context to yield to!

The only real complication is that the interrupt is only generated for
the start of the semaphore wait, and is asynchronous to our
process_csb() -- that is, we may not have registered the timeslice before
we see the interrupt. To ensure we don't miss a potential semaphore
blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark
the interrupt and apply it to the next timeslice regardless of whether it
was active at the time.

v2: We use semaphores in preempt-to-busy, within the timeslicing
implementation itself! Ergo, when we do insert a preemption due to an
expired timeslice, the new context may start with the missed semaphore
flagged by the retired context and be yielded, ad infinitum. To avoid
this, read the context id at the time of the semaphore interrupt and
only yield if that context is still active.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk


# 848862e6 03-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Free request pool from virtual engines

While extremely unlikely to be populated, we could capture a request on
the virtual engine which we should free along with the virtual engine.

Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200403203303.10903-1-chris@chris-wilson.co.uk


# 4c977837 31-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Peek at the next submission for error interrupts

If we receive the error interrupt before the CS interrupt, we may find
ourselves without an active request to reset, skipping the GPU reset.
All because the attempt to reset was too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401110435.30389-1-chris@chris-wilson.co.uk


# 91715555 31-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Pause CS flow before reset

Since we may be attempting to reset an active engine, we try to freeze
it in place before resetting -- to be on the safe side. We can go one
step further if we are using the CS flow semaphore to prevent the
context switching into the next.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200331091459.29179-2-chris@chris-wilson.co.uk


# f53ae29c 31-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Include a few tracek for timeslicing

Add a few telltales to see when timeslicing is being enabled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200331120502.14713-1-chris@chris-wilson.co.uk


# e2ccf0d0 30-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Double check breadcrumb before crying foul

process_csb: 0000:00:02.0 bcs0: cs-irq head=4, tail=5
process_csb: 0000:00:02.0 bcs0: csb[5]: status=0x00008002:0x60000020
trace_ports: 0000:00:02.0 bcs0: preempted { ff84:45154! prio 2 }
trace_ports: 0000:00:02.0 bcs0: promote { ff84:45155* prio 2 }
trace_ports: 0000:00:02.0 bcs0: submit { ff84:45156 prio 2 }

process_csb: 0000:00:02.0 bcs0: cs-irq head=5, tail=6
process_csb: 0000:00:02.0 bcs0: csb[6]: status=0x00000018:0x60000020
trace_ports: 0000:00:02.0 bcs0: completed { ff84:45155* prio 2 }
process_csb: 0000:00:02.0 bcs0: ring:{start:0x00178000, head:0928, tail:0928, ctl:00000000, mode:00000200}
process_csb: 0000:00:02.0 bcs0: rq:{start:00178000, head:08b0, tail:08f0, seqno:ff84:45155, hwsp:45156},
process_csb: 0000:00:02.0 bcs0: ctx:{start:00178000, head:e000928, tail:0928},
process_csb: GEM_BUG_ON("context completed before request")

In this sequence, we can see that although we have submitted the next
request [ff84:45156] to HW (via ELSP[]) it has not yet reported the
lite-restore. Instead, we see the completion event of the currently
active request [ff84:45155] but at the time of processing that event,
the breadcrumb has not yet been written. Though by the time we do print
out the debug info, the seqno write of ff84:45156 has landed!

Therefore there is a serialisation problem between the seqno writes and
CS events, not just between the CS buffer and its head/tail pointers as
previously observed on Icelake.

This is not a huge problem, as we don't strictly rely on the breadcrumb
to determine HW activity, but it may indicate that interrupt delivery is
before the seqno write, aka bringing back the plague of missed
interrupts from yesteryear. However, there is no indication of this
wider problem, so let's just flush the seqno read before reporting an
error. If it persists after the fresh read we can worry again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330234318.30638-1-chris@chris-wilson.co.uk


# b28b34ac 30-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Explicitly reset both reg and context runtime

Upon a GPU reset, we copy the default context image over top of the
guilty image. This will rollback the CTX_TIMESTAMP register to before
our value of ce->runtime.last. Reset both back to 0 so that we do not
encounter an underflow on the next schedule out after resume.

This should not be a huge issue in practice, as hangs should be rare in
correct code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330125827.5804-1-chris@chris-wilson.co.uk


# 8b6d457f 29-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Include priority info in trace_ports

Add some extra information into trace_ports to help with reviewing
correctness.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330113137.24425-1-chris@chris-wilson.co.uk


# 35f3fd81 27-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Workaround switching back to a completed context

In what seems remarkably similar to the w/a required to not reload an
idle context with HEAD==TAIL, it appears we must prevent the HW from
switching to an idle context in ELSP[1], while simultaneously trying to
preempt the HW to run another context and a continuation of the idle
context (which is no longer idle).

We can achieve this by preventing the context from completing while we
reload a new ELSP (by applying ring_set_paused(1) across the whole of
dequeue), except this eventually fails due to a lite-restore into a
waiting semaphore does not generate an ACK. Instead, we try to avoid
making the GPU do anything too challenging and not submit a new ELSP
while the interrupts + CSB events appear to have fallen behind the
completed contexts. We expect it to catch up shortly so we queue another
tasklet execution and hope for the best.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200327201433.21864-1-chris@chris-wilson.co.uk


# a97b786b 25-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Stage the transfer of the virtual breadcrumb

We move the virtual breadcrumb from one physical engine to the next, if
the next virtual request is scheduled on a new physical engine. Since
the virtual context can only be in one signal queue, we need it to track
the current physical engine for the new breadcrumbs. However, to move
the list we need both breadcrumb locks -- and since we cannot take both
at the same time (unless we are careful and always ensure consistent
ordering) stage the movement of the signaler via the current virtual
request.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1510
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325130059.30600-1-chris@chris-wilson.co.uk
(cherry picked from commit 6c81e21a4742385c00713137c6fdcade0412e93c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 6c81e21a 25-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Stage the transfer of the virtual breadcrumb

We move the virtual breadcrumb from one physical engine to the next, if
the next virtual request is scheduled on a new physical engine. Since
the virtual context can only be in one signal queue, we need it to track
the current physical engine for the new breadcrumbs. However, to move
the list we need both breadcrumb locks -- and since we cannot take both
at the same time (unless we are careful and always ensure consistent
ordering) stage the movement of the signaler via the current virtual
request.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1510
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325130059.30600-1-chris@chris-wilson.co.uk


# 6670b413 24-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission

We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

v2: Document the read of pending[0] from outside the tasklet with
READ_ONCE.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325120227.8044-1-chris@chris-wilson.co.uk


# 9bf7c313 25-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Drop setting sibling priority hint on virtual engines

We set the priority hint on execlists to avoid executing the tasklet for
when we know that there will be no change in execution order. However,
as we set it from the virtual engine for all siblings, but only one
physical engine may respond, we leave the hint set on the others
stopping direct submission that could take place.

If we do not set the hint, we may attempt direct submission even if we
don't expect to submit. If we set the hint, we may not do any submission
until the tasklet is run (and sometimes we may park the engine before
that has had a chance). Ergo there's only a minor ill-effect on mixed
virtual/physical engine workloads where we may try and fail to do direct
submission more often than required. (Pure virtual / engine workloads
will have redundant tasklet execution suppressed as normal.)

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1522
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325101358.12231-1-chris@chris-wilson.co.uk


# 41e4065a 23-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rely on direct submission to the queue

Drop the pretense of kicking the tasklet (used only for the defunct guc
submission backend, it should just take ownership of the submit!) and so
remove the bh-kicking from around submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-5-chris@chris-wilson.co.uk


# 91682e45 14-Mar-2020 Wambui Karuga <wambui.karugax@gmail.com>

drm/i915/lrc: convert to struct drm_device based logging macros.

Convert various instances of the printk based drm logging macros to the
struct drm_device based logging macros.

Note that this converts DRM_DEBUG_DRIVER() to drm_dbg() but does not
convert DRM_DEBUG() due to the lack of an analogous drm_device based
macro.

References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200314183344.17603-3-wambui.karugax@gmail.com


# c09f6b4d 04-Mar-2020 Caz Yokoyama <caz.yokoyama@intel.com>

Revert "drm/i915/tgl: Add extra hdc flush workaround"

This reverts commit 36a6b5d964d995b536b1925ec42052ee40ba92c4.

The commit takes care Wa_1604544889 which was fixed on a0 stepping based on
a0 replan. So no SW workaround is required on any stepping now.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Fixes: 36a6b5d964d9 ("drm/i915/tgl: Add extra hdc flush workaround")
Link: https://patchwork.freedesktop.org/patch/msgid/1c751032ce79c80c5485cae315f1a9904ce07cac.1583359940.git.caz.yokoyama@intel.com
(cherry picked from commit 175c4d9b3b9a60b4ea0b8cd034011808c6a03b05)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 9777d8b2 11-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track active elements during dequeue

Record the initial active element we use when building the next ELSP
submission, so that we can compare against it latter to see if there's
no change.

Fixes: 44d0a9c05bc0 ("drm/i915/execlists: Skip redundant resubmission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311092624.10012-2-chris@chris-wilson.co.uk
(cherry picked from commit 60ef5b7ac6a131f09d287a5f156c878c2c926a30)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 175c4d9b 04-Mar-2020 Caz Yokoyama <caz.yokoyama@intel.com>

Revert "drm/i915/tgl: Add extra hdc flush workaround"

This reverts commit 36a6b5d964d995b536b1925ec42052ee40ba92c4.

The commit takes care Wa_1604544889 which was fixed on a0 stepping based on
a0 replan. So no SW workaround is required on any stepping now.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Fixes: 36a6b5d964d9 ("drm/i915/tgl: Add extra hdc flush workaround")
Link: https://patchwork.freedesktop.org/patch/msgid/1c751032ce79c80c5485cae315f1a9904ce07cac.1583359940.git.caz.yokoyama@intel.com


# eafc2aa2 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Enable timeslice on partial virtual engine dequeue

If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.

This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-chris@chris-wilson.co.uk
(cherry picked from commit 3df2deed411e0f1b7312baf0139aab8bba4c0410)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 60ef5b7a 11-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Track active elements during dequeue

Record the initial active element we use when building the next ELSP
submission, so that we can compare against it latter to see if there's
no change.

Fixes: 44d0a9c05bc0 ("drm/i915/execlists: Skip redundant resubmission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311092624.10012-2-chris@chris-wilson.co.uk


# 3a55dc89 10-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Mark up data-races in virtual engines

The virtual engine passes tokens back and forth to its backing physical
engines.

[ 57.372993] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915]
[ 57.373012]
[ 57.373023] write to 0xffff8881f47324c0 of 4 bytes by interrupt on cpu 2:
[ 57.373241] execlists_dequeue+0x6fa/0x2150 [i915]
[ 57.373458] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 57.373677] execlists_submission_tasklet+0xd3/0x170 [i915]
[ 57.373694] tasklet_action_common.isra.0+0x42/0xa0
[ 57.373709] __do_softirq+0xd7/0x2cd
[ 57.373723] irq_exit+0xbe/0xe0
[ 57.373735] do_IRQ+0x51/0x100
[ 57.373748] ret_from_intr+0x0/0x1c
[ 57.373963] engine_retire+0x89/0xe0 [i915]
[ 57.373977] process_one_work+0x3b1/0x690
[ 57.373990] worker_thread+0x80/0x670
[ 57.374004] kthread+0x19a/0x1e0
[ 57.374017] ret_from_fork+0x1f/0x30
[ 57.374027]
[ 57.374038] read to 0xffff8881f47324c0 of 4 bytes by interrupt on cpu 3:
[ 57.374256] virtual_submission_tasklet+0x27/0x5a0 [i915]
[ 57.374273] tasklet_action_common.isra.0+0x42/0xa0
[ 57.374288] __do_softirq+0xd7/0x2cd
[ 57.374302] run_ksoftirqd+0x15/0x20
[ 57.374315] smpboot_thread_fn+0x1ab/0x300
[ 57.374329] kthread+0x19a/0x1e0
[ 57.374342] ret_from_fork+0x1f/0x30

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310141320.24149-3-chris@chris-wilson.co.uk


# f494960d 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Defend against concurrent updates to execlists->active

[ 206.875637] BUG: KCSAN: data-race in __i915_schedule+0x7fc/0x930 [i915]
[ 206.875654]
[ 206.875666] race at unknown origin, with read to 0xffff8881f7644480 of 8 bytes by task 703 on cpu 3:
[ 206.875901] __i915_schedule+0x7fc/0x930 [i915]
[ 206.876130] __bump_priority+0x63/0x80 [i915]
[ 206.876361] __i915_sched_node_add_dependency+0x258/0x300 [i915]
[ 206.876593] i915_sched_node_add_dependency+0x50/0xa0 [i915]
[ 206.876824] i915_request_await_dma_fence+0x1da/0x530 [i915]
[ 206.877057] i915_request_await_object+0x2fe/0x470 [i915]
[ 206.877287] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[ 206.877517] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 206.877535] drm_ioctl_kernel+0xe4/0x120
[ 206.877549] drm_ioctl+0x297/0x4c7
[ 206.877563] ksys_ioctl+0x89/0xb0
[ 206.877577] __x64_sys_ioctl+0x42/0x60
[ 206.877591] do_syscall_64+0x6e/0x2c0
[ 206.877606] entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: Be safe and include mb

References: https://gitlab.freedesktop.org/drm/intel/issues/1318
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309170540.10332-1-chris@chris-wilson.co.uk


# a4e648a0 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlsts: Mark up racy inspection of current i915_request priority

[ 120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915]
[ 120.176566]
[ 120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3:
[ 120.176792] __i915_schedule+0x63e/0x920 [i915]
[ 120.177007] __bump_priority+0x63/0x80 [i915]
[ 120.177220] __i915_sched_node_add_dependency+0x258/0x300 [i915]
[ 120.177438] i915_sched_node_add_dependency+0x50/0xa0 [i915]
[ 120.177654] i915_request_await_dma_fence+0x1da/0x530 [i915]
[ 120.177867] i915_request_await_object+0x2fe/0x470 [i915]
[ 120.178081] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[ 120.178292] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 120.178309] drm_ioctl_kernel+0xe4/0x120
[ 120.178322] drm_ioctl+0x297/0x4c7
[ 120.178335] ksys_ioctl+0x89/0xb0
[ 120.178348] __x64_sys_ioctl+0x42/0x60
[ 120.178361] do_syscall_64+0x6e/0x2c0
[ 120.178375] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 120.178387]
[ 120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2:
[ 120.178606] effective_prio+0x25/0xc0 [i915]
[ 120.178812] process_csb+0xe8b/0x10a0 [i915]
[ 120.179021] execlists_submission_tasklet+0x30/0x170 [i915]
[ 120.179038] tasklet_action_common.isra.0+0x42/0xa0
[ 120.179053] __do_softirq+0xd7/0x2cd
[ 120.179066] irq_exit+0xbe/0xe0
[ 120.179078] do_IRQ+0x51/0x100
[ 120.179090] ret_from_intr+0x0/0x1c
[ 120.179104] cpuidle_enter_state+0x1b8/0x5d0
[ 120.179117] cpuidle_enter+0x50/0x90
[ 120.179131] do_idle+0x1a1/0x1f0
[ 120.179145] cpu_startup_entry+0x14/0x16
[ 120.179158] start_secondary+0x120/0x180
[ 120.179172] secondary_startup_64+0xa4/0xb0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk


# fa192d90 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Mark up read of i915_request.fence.flags

[ 145.927961] BUG: KCSAN: data-race in can_merge_rq [i915] / signal_irq_work [i915]
[ 145.927980]
[ 145.927992] write (marked) to 0xffff8881e513fab0 of 8 bytes by interrupt on cpu 2:
[ 145.928250] signal_irq_work+0x134/0x640 [i915]
[ 145.928268] irq_work_run_list+0xd7/0x120
[ 145.928283] irq_work_run+0x1d/0x50
[ 145.928300] smp_irq_work_interrupt+0x21/0x30
[ 145.928328] irq_work_interrupt+0xf/0x20
[ 145.928356] _raw_spin_unlock_irqrestore+0x34/0x40
[ 145.928596] execlists_submission_tasklet+0xde/0x170 [i915]
[ 145.928616] tasklet_action_common.isra.0+0x42/0xa0
[ 145.928632] __do_softirq+0xd7/0x2cd
[ 145.928646] irq_exit+0xbe/0xe0
[ 145.928665] do_IRQ+0x51/0x100
[ 145.928684] ret_from_intr+0x0/0x1c
[ 145.928699] schedule+0x0/0xb0
[ 145.928719] worker_thread+0x194/0x670
[ 145.928743] kthread+0x19a/0x1e0
[ 145.928765] ret_from_fork+0x1f/0x30
[ 145.928784]
[ 145.928796] read to 0xffff8881e513fab0 of 8 bytes by task 738 on cpu 1:
[ 145.929046] can_merge_rq+0xb1/0x100 [i915]
[ 145.929282] __execlists_submission_tasklet+0x866/0x25a0 [i915]
[ 145.929518] execlists_submit_request+0x2a4/0x2b0 [i915]
[ 145.929758] submit_notify+0x8f/0xc0 [i915]
[ 145.929989] __i915_sw_fence_complete+0x5d/0x3e0 [i915]
[ 145.930221] i915_sw_fence_complete+0x58/0x80 [i915]
[ 145.930453] i915_sw_fence_commit+0x16/0x20 [i915]
[ 145.930698] __i915_request_queue+0x60/0x70 [i915]
[ 145.930935] i915_gem_do_execbuffer+0x3997/0x4c20 [i915]
[ 145.931175] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 145.931194] drm_ioctl_kernel+0xe4/0x120
[ 145.931208] drm_ioctl+0x297/0x4c7
[ 145.931222] ksys_ioctl+0x89/0xb0
[ 145.931238] __x64_sys_ioctl+0x42/0x60
[ 145.931260] do_syscall_64+0x6e/0x2c0
[ 145.931275] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-4-chris@chris-wilson.co.uk


# 875c3b4b 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark up racy check of last list element

[ 25.025543] BUG: KCSAN: data-race in __i915_request_create [i915] / process_csb [i915]
[ 25.025561]
[ 25.025573] write (marked) to 0xffff8881e85c1620 of 8 bytes by task 696 on cpu 1:
[ 25.025789] __i915_request_create+0x54b/0x5d0 [i915]
[ 25.026001] i915_request_create+0xcc/0x150 [i915]
[ 25.026218] i915_gem_do_execbuffer+0x2f70/0x4c20 [i915]
[ 25.026428] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 25.026445] drm_ioctl_kernel+0xe4/0x120
[ 25.026459] drm_ioctl+0x297/0x4c7
[ 25.026472] ksys_ioctl+0x89/0xb0
[ 25.026484] __x64_sys_ioctl+0x42/0x60
[ 25.026497] do_syscall_64+0x6e/0x2c0
[ 25.026510] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 25.026522]
[ 25.026532] read to 0xffff8881e85c1620 of 8 bytes by interrupt on cpu 2:
[ 25.026742] process_csb+0x8d6/0x1070 [i915]
[ 25.026949] execlists_submission_tasklet+0x30/0x170 [i915]
[ 25.026969] tasklet_action_common.isra.0+0x42/0xa0
[ 25.026984] __do_softirq+0xd7/0x2cd
[ 25.026997] irq_exit+0xbe/0xe0
[ 25.027009] do_IRQ+0x51/0x100
[ 25.027021] ret_from_intr+0x0/0x1c
[ 25.027033] poll_idle+0x3e/0x13b
[ 25.027047] cpuidle_enter_state+0x189/0x5d0
[ 25.027060] cpuidle_enter+0x50/0x90
[ 25.027074] do_idle+0x1a1/0x1f0
[ 25.027086] cpu_startup_entry+0x14/0x16
[ 25.027100] start_secondary+0x120/0x180
[ 25.027116] secondary_startup_64+0xa4/0xb0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-2-chris@chris-wilson.co.uk


# 23a44ae9 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Mark up the racy access to switch_priority_hint

[ 7534.150687] BUG: KCSAN: data-race in __execlists_submission_tasklet [i915] / process_csb [i915]
[ 7534.150706]
[ 7534.150717] write to 0xffff8881f1bc24b4 of 4 bytes by task 24404 on cpu 3:
[ 7534.150925] __execlists_submission_tasklet+0x1158/0x2780 [i915]
[ 7534.151133] execlists_submit_request+0x2e8/0x2f0 [i915]
[ 7534.151348] submit_notify+0x8f/0xc0 [i915]
[ 7534.151549] __i915_sw_fence_complete+0x5d/0x3e0 [i915]
[ 7534.151753] i915_sw_fence_complete+0x58/0x80 [i915]
[ 7534.151963] i915_sw_fence_commit+0x16/0x20 [i915]
[ 7534.152179] __i915_request_queue+0x60/0x70 [i915]
[ 7534.152388] i915_gem_do_execbuffer+0x3997/0x4c20 [i915]
[ 7534.152598] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 7534.152615] drm_ioctl_kernel+0xe4/0x120
[ 7534.152629] drm_ioctl+0x297/0x4c7
[ 7534.152642] ksys_ioctl+0x89/0xb0
[ 7534.152654] __x64_sys_ioctl+0x42/0x60
[ 7534.152667] do_syscall_64+0x6e/0x2c0
[ 7534.152681] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7534.152693]
[ 7534.152703] read to 0xffff8881f1bc24b4 of 4 bytes by interrupt on cpu 2:
[ 7534.152914] process_csb+0xe7c/0x10a0 [i915]
[ 7534.153120] execlists_submission_tasklet+0x30/0x170 [i915]
[ 7534.153138] tasklet_action_common.isra.0+0x42/0xa0
[ 7534.153153] __do_softirq+0xd7/0x2cd
[ 7534.153166] run_ksoftirqd+0x15/0x20
[ 7534.153180] smpboot_thread_fn+0x1ab/0x300
[ 7534.153194] kthread+0x19a/0x1e0
[ 7534.153207] ret_from_fork+0x1f/0x30

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309144249.10309-1-chris@chris-wilson.co.uk


# 3df2deed 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Enable timeslice on partial virtual engine dequeue

If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.

This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-chris@chris-wilson.co.uk


# 1eaa251b 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert requests within a context are submitted in order

Check the flow of requests into the hardware to verify that are
submitted in order along their timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-1-chris@chris-wilson.co.uk


# 81dcef4c 05-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Show the "switch priority hint" in dumps

Show the timeslicing priority hint in engine dumps to aide debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305135843.2760512-1-chris@chris-wilson.co.uk


# 8e9f84cf 03-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Propagate change in error status to children on unhold

As we release the head requests back into the queue, propagate any
change in error status that may have occurred while the requests were
temporarily suspended.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1277
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-2-chris@chris-wilson.co.uk


# 36e191f0 03-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply i915_request_skip() on submission

Trying to use i915_request_skip() prior to i915_request_add() causes us
to try and fill the ring upto request->postfix, which has not yet been
set, and so may cause us to memset() past the end of the ring.

Instead of skipping the request immediately, just flag the error on the
request (only accepting the first fatal error we see) and then clear the
request upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk


# 15db5fcc 02-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Check the sentinel is alone in the ELSP

We only use sentinel requests for "preempt-to-idle" passes, so assert
that they are the only request in a new submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-12-chris@chris-wilson.co.uk


# 3fc28d3e 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Reset queue_priority_hint after wedging

An odd and highly unlikely path caught us out. On delayed submission
(due to an asynchronous reset handler), we poked the priority_hint and
kicked the tasklet. However, we had already marked the device as wedged
and swapped out the tasklet for a no-op. The result was that we never
cleared the priority hint and became upset when we later checked.

<0> [574.303565] i915_sel-6278 2.... 481822445us : __i915_subtests: Running intel_execlists_live_selftests/live_error_interrupt
<0> [574.303565] i915_sel-6278 2.... 481822472us : __engine_unpark: 0000:00:02.0 rcs0:
<0> [574.303565] i915_sel-6278 2.... 481822491us : __gt_unpark: 0000:00:02.0
<0> [574.303565] i915_sel-6278 2.... 481823220us : execlists_context_reset: 0000:00:02.0 rcs0: context:f4ee reset
<0> [574.303565] i915_sel-6278 2.... 481824830us : __intel_context_active: 0000:00:02.0 rcs0: context:f51b active
<0> [574.303565] i915_sel-6278 2.... 481825258us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51b pin ring:{start:00006000, head:0000, tail:0000}
<0> [574.303565] i915_sel-6278 2.... 481825311us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51b:2, current 0
<0> [574.303565] i915_sel-6278 2d..1 481825347us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51b:2, current 0
<0> [574.303565] i915_sel-6278 2d..1 481825363us : trace_ports: 0000:00:02.0 rcs0: submit { f51b:2, 0:0 }
<0> [574.303565] i915_sel-6278 2.... 481826809us : __intel_context_active: 0000:00:02.0 rcs0: context:f51c active
<0> [574.303565] <idle>-0 7d.h2 481827326us : cs_irq_handler: 0000:00:02.0 rcs0: CS error: 1
<0> [574.303565] <idle>-0 7..s1 481827377us : process_csb: 0000:00:02.0 rcs0: cs-irq head=3, tail=4
<0> [574.303565] <idle>-0 7..s1 481827379us : process_csb: 0000:00:02.0 rcs0: csb[4]: status=0x10000001:0x00000000
<0> [574.305593] <idle>-0 7..s1 481827385us : trace_ports: 0000:00:02.0 rcs0: promote { f51b:2*, 0:0 }
<0> [574.305611] <idle>-0 7..s1 481828179us : execlists_reset: 0000:00:02.0 rcs0: reset for CS error
<0> [574.305611] i915_sel-6278 2.... 481828284us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51c pin ring:{start:00007000, head:0000, tail:0000}
<0> [574.305611] i915_sel-6278 2.... 481828345us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51c:2, current 0
<0> [574.305611] <idle>-0 7dNs2 481847823us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence f51b:2, current 1
<0> [574.305611] <idle>-0 7dNs2 481847857us : execlists_hold: 0000:00:02.0 rcs0: fence f51b:2, current 1 on hold
<0> [574.305611] <idle>-0 7.Ns1 481847863us : intel_engine_reset: 0000:00:02.0 rcs0: flags=4
<0> [574.305611] <idle>-0 7.Ns1 481847945us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-1
<0> [574.305611] <idle>-0 7.Ns1 481847946us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
<0> [574.305611] <idle>-0 7.Ns1 538584284us : intel_engine_stop_cs: 0000:00:02.0 rcs0: timed out on STOP_RING -> IDLE
<0> [574.305611] <idle>-0 7.Ns1 538584347us : __intel_gt_reset: 0000:00:02.0 engine_mask=1
<0> [574.305611] <idle>-0 7.Ns1 538584406us : execlists_reset_rewind: 0000:00:02.0 rcs0:
<0> [574.305611] <idle>-0 7dNs2 538585050us : __i915_request_reset: 0000:00:02.0 rcs0: fence f51b:2, current 1 guilty? yes
<0> [574.305611] <idle>-0 7dNs2 538585063us : __execlists_reset: 0000:00:02.0 rcs0: replay {head:0000, tail:0068}
<0> [574.306565] <idle>-0 7.Ns1 538588457us : intel_engine_cancel_stop_cs: 0000:00:02.0 rcs0:
<0> [574.306565] <idle>-0 7dNs2 538588462us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51c:2, current 0
<0> [574.306565] <idle>-0 7dNs2 538588471us : trace_ports: 0000:00:02.0 rcs0: submit { f51c:2, 0:0 }
<0> [574.306565] <idle>-0 7.Ns1 538588474us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->1
<0> [574.306565] kworker/-202 2.... 538588755us : i915_request_retire: 0000:00:02.0 rcs0: fence f51c:2, current 2
<0> [574.306565] ksoftirq-46 7..s. 538588773us : process_csb: 0000:00:02.0 rcs0: cs-irq head=11, tail=1
<0> [574.306565] ksoftirq-46 7..s. 538588774us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x10000001:0x00000000
<0> [574.306565] ksoftirq-46 7..s. 538588776us : trace_ports: 0000:00:02.0 rcs0: promote { f51c:2!, 0:0 }
<0> [574.306565] ksoftirq-46 7..s. 538588778us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x10000018:0x00000020
<0> [574.306565] ksoftirq-46 7..s. 538588779us : trace_ports: 0000:00:02.0 rcs0: completed { f51c:2!, 0:0 }
<0> [574.306565] kworker/-202 2.... 538588826us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51c unpin
<0> [574.306565] i915_sel-6278 6.... 538589663us : __intel_gt_set_wedged.part.32: 0000:00:02.0 start
<0> [574.306565] i915_sel-6278 6.... 538589667us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-0
<0> [574.306565] i915_sel-6278 6.... 538589710us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
<0> [574.306565] i915_sel-6278 6.... 538589732us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-0
<0> [574.307591] i915_sel-6278 6.... 538589733us : intel_engine_stop_cs: 0000:00:02.0 bcs0:
<0> [574.307591] i915_sel-6278 6.... 538589757us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-0
<0> [574.307591] i915_sel-6278 6.... 538589758us : intel_engine_stop_cs: 0000:00:02.0 vcs0:
<0> [574.307591] i915_sel-6278 6.... 538589771us : execlists_reset_prepare: 0000:00:02.0 vcs1: depth<-0
<0> [574.307591] i915_sel-6278 6.... 538589772us : intel_engine_stop_cs: 0000:00:02.0 vcs1:
<0> [574.307591] i915_sel-6278 6.... 538589778us : execlists_reset_prepare: 0000:00:02.0 vecs0: depth<-0
<0> [574.307591] i915_sel-6278 6.... 538589780us : intel_engine_stop_cs: 0000:00:02.0 vecs0:
<0> [574.307591] i915_sel-6278 6.... 538589786us : __intel_gt_reset: 0000:00:02.0 engine_mask=ff
<0> [574.307591] i915_sel-6278 6.... 538591175us : execlists_reset_cancel: 0000:00:02.0 rcs0:
<0> [574.307591] i915_sel-6278 6.... 538591970us : execlists_reset_cancel: 0000:00:02.0 bcs0:
<0> [574.307591] i915_sel-6278 6.... 538591982us : execlists_reset_cancel: 0000:00:02.0 vcs0:
<0> [574.307591] i915_sel-6278 6.... 538591996us : execlists_reset_cancel: 0000:00:02.0 vcs1:
<0> [574.307591] i915_sel-6278 6.... 538592759us : execlists_reset_cancel: 0000:00:02.0 vecs0:
<0> [574.307591] i915_sel-6278 6.... 538592977us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->0
<0> [574.307591] i915_sel-6278 6.N.. 538592996us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->0
<0> [574.307591] i915_sel-6278 6.N.. 538593023us : execlists_reset_finish: 0000:00:02.0 vcs0: depth->0
<0> [574.307591] i915_sel-6278 6.N.. 538593037us : execlists_reset_finish: 0000:00:02.0 vcs1: depth->0
<0> [574.307591] i915_sel-6278 6.N.. 538593051us : execlists_reset_finish: 0000:00:02.0 vecs0: depth->0
<0> [574.307591] i915_sel-6278 6.... 538593407us : __intel_gt_set_wedged.part.32: 0000:00:02.0 end
<0> [574.307591] kworker/-210 7d..1 551958381us : execlists_unhold: 0000:00:02.0 rcs0: fence f51b:2, current 2 hold release
<0> [574.307591] i915_sel-6278 0.... 559490788us : i915_request_retire: 0000:00:02.0 rcs0: fence f51b:2, current 2
<0> [574.307591] i915_sel-6278 0.... 559490793us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51b unpin
<0> [574.307591] i915_sel-6278 0.... 559490798us : __engine_park: 0000:00:02.0 rcs0: parked
<0> [574.307591] i915_sel-6278 0.... 559490982us : __intel_context_retire: 0000:00:02.0 rcs0: context:f51c retire runtime: { total:30004ns, avg:30004ns }
<0> [574.307591] i915_sel-6278 0.... 559491372us : __engine_park: __engine_park:261 GEM_BUG_ON(engine->execlists.queue_priority_hint != (-((int)(~0U >> 1)) - 1))

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-9-chris@chris-wilson.co.uk


# d3b03d8b 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Check engine-is-awake on reset later

As we drop the engine-pm on retiring, that may happen while there are
still CS events in the buffer. As such we cannot assert the engine is
still active on reset, until we know that the current request is still
in flight.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1338
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227204727.2009346-1-chris@chris-wilson.co.uk


# 88be76cd 25-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow userspace to specify ringsize on construction

No good reason why we must always use a static ringsize, so let
userspace select one during construction.

Link: https://github.com/intel/compute-runtime/pull/261
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Steve Carbonari <steven.carbonari@intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200225192206.1107336-2-chris@chris-wilson.co.uk


# 66940061 20-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Protect signaler walk with RCU

While we know that the waiters cannot disappear as we walk our list
(only that they might be added), the same cannot be said for our
signalers as they may be completed by the HW and retired as we process
this request. Ergo we need to use rcu to protect the list iteration and
remember to mark up the list_del_rcu.

v2: Mark the deps as safe-for-rcu

Fixes: 793c22617367 ("drm/i915/gt: Protect execlists_hold/unhold from new waiters")
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200220075025.1539375-1-chris@chris-wilson.co.uk


# 15de9cb5 10-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex

We manipulate ring->head while active in i915_request_retire underneath
the timeline manipulation. We cannot rely on a stable ring->head outside
of the timeline->mutex, in particular while setting up the context for
resume and reset.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1126
Fixes: 0881954965e3 ("drm/i915: Introduce intel_context.pin_mutex for pin management")
Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
References: f3c0efc9fe7a ("drm/i915/execlists: Leave resetting ring to intel_ring")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211120131.958949-1-chris@chris-wilson.co.uk
(cherry picked from commit 42827350f75c56d0fe9f15d8425a1390528958b6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# b1339eca 07-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

If we rewind the RING_TAIL on a context, due to a preemption event, we
must force the context restore for the RING_TAIL update to be properly
handled. Rather than note which preemption events may cause us to rewind
the tail, compare the new request's tail with the previously submitted
RING_TAIL, as it turns out that timeslicing was causing unexpected
rewinds.

<idle>-0 0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3
<idle>-0 0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964
<idle>-0 0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695
<idle>-0 0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
^---- Note we unwind 2 requests from the same context

<idle>-0 0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
<idle>-0 0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506
^---- But to apply the new timeslice, we have to replay the first request
before the new client can start -- the unexpected RING_TAIL rewind

<idle>-0 0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696*, 134:1508 }
synmark2-5425 2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0
synmark2-5425 2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000
^---- Preemption event for the ELSP update; note the lite-restore

synmark2-5425 2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 }
synmark2-5425 2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696*, 134:1508 }
synmark2-5425 2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695
synmark2-5425 2.... 1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695
synmark2-5425 2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2
synmark2-5425 2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000
synmark2-5425 2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 }
synmark2-5425 2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040
synmark2-5425 2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 }
synmark2-5425 2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Referenecs: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://patchwork.freedesktop.org/patch/msgid/20200207211452.2860634-1-chris@chris-wilson.co.uk
(cherry picked from commit 5ba32c7be81e53ea8a27190b0f6be98e6c6779af)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 19b5f3b4 06-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Protect defer_request() from new waiters

Mika spotted

<4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1
<4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
<4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915]
<4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05
<4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083
<4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122
<4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48
<4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe
<4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8
<4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80
<4>[17436.705532] FS: 0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000
<4>[17436.705534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0
<4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[17436.705542] Call Trace:
<4>[17436.705545] <IRQ>
<4>[17436.705603] execlists_submission_tasklet+0xc0/0x130 [i915]

which is us consuming a partially initialised new waiter in
defer_requests(). We can prevent this by initialising the i915_dependency
prior to making it visible, and since we are using a concurrent
list_add/iterator mark them up to the compiler.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
(cherry picked from commit f14f27b1663269a81ed62d3961fe70250a1a0623)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# cf274daa 17-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Rearrange code to silence compiler

Without selftests enabled, I915_SELFTEST_ONLY becomes a dummy,
generating a bare '0'. This causes the compiler to complain about a
useless line, and while we could use I915_SELFTEST_DECLARE instead, it
is a bit messier. Move the selftest-only code to a helper and make that
conditional on having selftests enabled.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200217095835.599827-1-chris@chris-wilson.co.uk


# 1883a0a4 16-Feb-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Track hw reported context runtime

GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busyness ourselves (like with GuC). Keep a copy of this in struct
intel_context from where it can be easily read even if the context is not
pinned.

v2:
(Chris)
* Do not store pphwsp address in intel_context.
* Log CS wrap-around.
* Simplify calculation by relying on integer wraparound.
v3:
* Include total/avg in traces and error state for debugging

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200216133620.394962-1-chris@chris-wilson.co.uk


# e06b8524 13-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Suppress warnings for unused debugging locals

With debugging turned off, we have to tell the compiler not to warn
about the unused debug locals.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200213081217.3107410-1-chris@chris-wilson.co.uk


# c616d238 11-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Expand bad CS completion event debug

Show the ring/request/context state if we see what we believe is an
early CS completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211230944.1203098-1-chris@chris-wilson.co.uk


# a2f90f4f 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Reclaim the hanging virtual request

If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.

v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine

Fixes: ad18ba7b5eeb ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd2abe566521e61d1aebf603eb013b7f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 317e0395 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Take a reference while capturing the guilty request

Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.

Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.

Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).

Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1d8e38d6927967ae1a3271a6ab7a927)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ad18ba7b 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Offline error capture

Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386afb235e11616098d2c7772e49776b58)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c3f1ed90 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Allow temporary suspension of inflight requests

In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd74496f0c33644125fb69ff175859b1f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 9e2750fc 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep track of request among the scheduling lists

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f16ccb64 10-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable use of hwsp_cacheline for kernel_context

Currently on execlists, we use a local hwsp for the kernel_context,
rather than the engine's HWSP, as this is the default for execlists.
However, seqno wrap requires allocating a new HWSP cacheline, and may
require pinning a new HWSP page in the GGTT. This operation requiring
pinning in the GGTT is not allowed within the kernel_context timeline,
as doing so may require re-entering the kernel_context in order to evict
from the GGTT. As we want to avoid requiring a new HWSP for the
kernel_context, we can use the permanently pinned engine's HWSP instead.
However to do so we must prevent the use of semaphores reading the
kernel_context's HWSP, as the use of semaphores do not support rollover
onto the same cacheline. Fortunately, the kernel_context is mostly
isolated, so unlikely to give benefit to semaphores.

Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200210205722.794180-5-chris@chris-wilson.co.uk


# 42827350 10-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex

We manipulate ring->head while active in i915_request_retire underneath
the timeline manipulation. We cannot rely on a stable ring->head outside
of the timeline->mutex, in particular while setting up the context for
resume and reset.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1126
Fixes: 0881954965e3 ("drm/i915: Introduce intel_context.pin_mutex for pin management")
Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
References: f3c0efc9fe7a ("drm/i915/execlists: Leave resetting ring to intel_ring")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211120131.958949-1-chris@chris-wilson.co.uk


# a754012b 15-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Leave resetting ring to intel_ring

We need to allow concurrent intel_context_unpin, which means avoiding
doing destructive operations like intel_ring_reset(). This was already
fixed for intel_ring_unpin() in commit 0725d9a31869 ("drm/i915/gt: Make
intel_ring_unpin() safe for concurrent pint"), but I overlooked that
execlists_context_unpin() also made the same mistake.

Reported-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk
(cherry picked from commit f3c0efc9fe7a4e61544034f525348a3aa86ac5aa)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# b6560007 09-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Drop live_preempt_hang

live_preempt_hang's use of hang injection has been superseded by
live_preempt_reset's use of an non-preemptible spinner. The latter does
not require intrusive hacks into the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200209230838.361154-2-chris@chris-wilson.co.uk


# 1b5af537 14-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use the BIT when checking the flags, not the index

In converting over to using set_bit()/test_bit(), when manually
inspecting the rq->fence.flags, we need to use BIT().

Fixes: e1c31fb5dde3 ("drm/i915: Merge i915_request.flags with i915_request.fence.flags")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk
(cherry picked from commit 72ff2b8d5f2dcb09bfa37b902c23311eec426496)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 7d7569ac 09-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ignore tracek for nop process_csb

Recording the frequent inspection of CSB head/tail when there is
expected to be no update adds noise to the debug trace. (Not entirely
useless, but since we know the sequence of function calls, we can
surmise the function was called -- so redundant.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200209131922.180287-2-chris@chris-wilson.co.uk


# 26208d87 09-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Fix hold/unhold recursion

In eliminating the recursion from walking the tree of signalers/waiters
for processing the hold/unhold operations, a crucial error crept in
where we looked at the parent request and not the list element when
processing the list.

Brown paper bag, much?

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1166
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Fixes: 748317386afb ("drm/i915/execlists: Offline error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200209131922.180287-1-chris@chris-wilson.co.uk


# 48d7fb18 03-Feb-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Remove lite restore defines

We have switched from tail manipulation to forced context restore
to implement WaIdleLiteRestore. Remove the old defines and comments.

Note: we still do emit the WA tail, and use it as our first attempt to
avoid forcing a full-restore instead of a lite-restore, we just have a
much stronger backup mechanism for repeated preemptions.

References: f26a9e959a7b ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203163312.15475-1-mika.kuoppala@linux.intel.com


# 5ba32c7b 07-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

If we rewind the RING_TAIL on a context, due to a preemption event, we
must force the context restore for the RING_TAIL update to be properly
handled. Rather than note which preemption events may cause us to rewind
the tail, compare the new request's tail with the previously submitted
RING_TAIL, as it turns out that timeslicing was causing unexpected
rewinds.

<idle>-0 0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3
<idle>-0 0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964
<idle>-0 0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695
<idle>-0 0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
^---- Note we unwind 2 requests from the same context

<idle>-0 0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
<idle>-0 0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506
^---- But to apply the new timeslice, we have to replay the first request
before the new client can start -- the unexpected RING_TAIL rewind

<idle>-0 0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696*, 134:1508 }
synmark2-5425 2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0
synmark2-5425 2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000
^---- Preemption event for the ELSP update; note the lite-restore

synmark2-5425 2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 }
synmark2-5425 2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696*, 134:1508 }
synmark2-5425 2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695
synmark2-5425 2.... 1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695
synmark2-5425 2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2
synmark2-5425 2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000
synmark2-5425 2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 }
synmark2-5425 2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040
synmark2-5425 2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 }
synmark2-5425 2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Referenecs: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://patchwork.freedesktop.org/patch/msgid/20200207211452.2860634-1-chris@chris-wilson.co.uk


# 793c2261 07-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Protect execlists_hold/unhold from new waiters

As we may add new waiters to a request as it is being run, we need to
mark the list iteration as being safe for concurrent addition.

v2: Mika spotted that we used the same trick for signalers_list, so warn
the compiler about the lockless walk there as well.

Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk


# f14f27b1 06-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Protect defer_request() from new waiters

Mika spotted

<4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1
<4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
<4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915]
<4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05
<4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083
<4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122
<4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48
<4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe
<4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8
<4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80
<4>[17436.705532] FS: 0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000
<4>[17436.705534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0
<4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[17436.705542] Call Trace:
<4>[17436.705545] <IRQ>
<4>[17436.705603] execlists_submission_tasklet+0xc0/0x130 [i915]

which is us consuming a partially initialised new waiter in
defer_requests(). We can prevent this by initialising the i915_dependency
prior to making it visible, and since we are using a concurrent
list_add/iterator mark them up to the compiler.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk


# faea1792 31-Jan-2020 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: extract engine WA programming to common resume function

The workarounds are a common "feature" across gens and submission
mechanisms and we already call the other WA related functions from
common engine ones (<setup/cleanup>_common), so it makes sense to
do the same with WA application. Medium-term, This will help us
reduce the duplication once the GuC resume function is added, but short
term it will also allow us to use the workaround lists for pre-gen8
engine workarounds.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131075716.2212299-2-chris@chris-wilson.co.uk


# e3793468 30-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex

On Braswell and Broxton (also known as Valleyview and Apollolake), we
need to serialise updates of the GGTT using the big stop_machine()
hammer. This has the side effect of appearing to lockdep as a possible
reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu
allocations). However, we want to use vm->mutex (including ggtt->mutex)
from within the shrinker and so must avoid such possible taints. For this
purpose, we introduced the asynchronous vma binding and we can apply it
to the PIN_GLOBAL so long as take care to add the necessary waits for
the worker afterwards.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/211
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk


# f1042cc8 29-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ignore discrepancies in pending[] across resets

When we reset the engine, we first remove the guilty request from the
active list. If it so happens that there is a pending preemption event
to process before we handle the reset, when we inspect that event we
find ourselves a little confused as we have bent the rules slightly to
perform the reset.

Just ignore any discrepancies inside reset, we know we'll start again
from scratch afterwards.

<0>[ 536.940213] <idle>-0 6..s1 537441383us : execlists_reset: 0000:00:02.0 vcs0: reset for CS error
<0>[ 536.940213] i915_sel-7302 2d..1 537441386us : trace_ports: 0000:00:02.0 vcs0: submit { 10c59:2*, 10c5a:2 }
<0>[ 536.940213] <idle>-0 6d.s2 537471320us : __i915_request_unsubmit: 0000:00:02.0 vcs0: fence 10c59:2, current 1
<0>[ 536.940213] <idle>-0 6d.s2 537471321us : execlists_hold: 0000:00:02.0 vcs0: fence 10c59:2, current 1 on hold
<0>[ 536.940213] <idle>-0 6.Ns1 537471328us : intel_engine_reset: 0000:00:02.0 vcs0: flags=10
<0>[ 536.940213] <idle>-0 6.Ns1 537471421us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-1
<0>[ 536.940213] <idle>-0 6.Ns1 537471422us : intel_engine_stop_cs: 0000:00:02.0 vcs0:
<0>[ 536.940213] <idle>-0 6.Ns1 537472424us : intel_engine_stop_cs: 0000:00:02.0 vcs0: timed out on STOP_RING -> IDLE
<0>[ 536.940213] <idle>-0 6.Ns1 537472429us : __intel_gt_reset: 0000:00:02.0 engine_mask=4
<0>[ 536.940213] <idle>-0 6.Ns1 537472442us : execlists_reset_rewind: 0000:00:02.0 vcs0:
<0>[ 536.940213] <idle>-0 6dNs2 537472443us : process_csb: 0000:00:02.0 vcs0: cs-irq head=4, tail=5
<0>[ 536.940213] <idle>-0 6dNs2 537472444us : process_csb: 0000:00:02.0 vcs0: csb[5]: status=0x00008002:0x20000060
<0>[ 536.940213] <idle>-0 6dNs2 537472464us : trace_ports: 0000:00:02.0 vcs0: preempted { 10c59:2*, 0:0 }
<0>[ 536.940213] <idle>-0 6dNs2 537472465us : trace_ports: 0000:00:02.0 vcs0: promote { 10c59:2*, 10c5a:2 }
<0>[ 536.940213] <idle>-0 6dNs2 537472706us : assert_pending_valid: assert_pending_valid:1417 GEM_BUG_ON(!i915_request_is_active(rq))

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129165935.1266132-1-chris@chris-wilson.co.uk


# 70a76a9b 28-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Hook up CS_MASTER_ERROR_INTERRUPT

Now that we have offline error capture and can reset an engine from
inside an atomic context while also preserving the GPU state for
post-mortem analysis, it is time to handle error interrupts thrown by
the command parser.

This provides a much, much faster mechanism for us to detect known
problems than using heartbeats/hangchecks, and also provides a mechanism
for when those are disabled. However, it is limited to problems the HW
can detect in the CS and so not a complete solution for detecting lockups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200128204318.4182039-2-chris@chris-wilson.co.uk


# 8a574698 28-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlist: Mark up racy read of execlists->pending[0]

We write to execlists->pending[0] in process_csb() to acknowledge the
completion of the ESLP update, outside of the main spinlock. When we
check the current status of the previous submission in
__execlists_submission_tasklet() we should therefore use READ_ONCE() to
reflect and document the unsynchronized read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200128171614.3845825-1-chris@chris-wilson.co.uk


# 6f280b13 23-Jan-2020 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Fix OA context id overlap with idle context id

Engine context pinned in perf OA was set to same context id as
the idle context. Set the context id to an unused value.

Clear the sw context id field in lrc descriptor before ORing with
ce->tag (Chris)

Closes: https://gitlab.freedesktop.org/drm/intel/issues/756
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124013701.40609-1-umesh.nerlige.ramappa@intel.com


# 989df3a7 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Reclaim the hanging virtual request

If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.

v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine

Fixes: 748317386afb ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk


# 4ba5c086 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Take a reference while capturing the guilty request

Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.

Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.

Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).

Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk


# 74831738 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Offline error capture

Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk


# 32ff621f 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Allow temporary suspension of inflight requests

In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk


# 672c368f 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep track of request among the scheduling lists

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk


# f3c0efc9 15-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Leave resetting ring to intel_ring

We need to allow concurrent intel_context_unpin, which means avoiding
doing destructive operations like intel_ring_reset(). This was already
fixed for intel_ring_unpin() in commit 0725d9a31869 ("drm/i915/gt: Make
intel_ring_unpin() safe for concurrent pint"), but I overlooked that
execlists_context_unpin() also made the same mistake.

Reported-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: 0725d9a31869 ("drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115175829.2761329-1-chris@chris-wilson.co.uk


# 72ff2b8d 14-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use the BIT when checking the flags, not the index

In converting over to using set_bit()/test_bit(), when manually
inspecting the rq->fence.flags, we need to use BIT().

Fixes: e1c31fb5dde3 ("drm/i915: Merge i915_request.flags with i915_request.fence.flags")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115122509.2673075-1-chris@chris-wilson.co.uk


# 6b7133b6 13-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always reset the timeslice after a context switch

Currently, we reset the timer after a pre-eemption event. This has the
side-effect that the timeslice runs into the second context after the
first is completed after a normal promotion event, causing the second
context to be swapped out early and switched for a third context. To be
more fair, we want to reset the clock after promotion as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113214546.1990139-1-chris@chris-wilson.co.uk


# bc8a76a1 08-Jan-2020 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

drm/i915/gen9: Clear residual context state on context switch

Intel ID: PSIRT-TA-201910-001
CVEID: CVE-2019-14615

Intel GPU Hardware prior to Gen11 does not clear EU state
during a context switch. This can result in information
leakage between contexts.

For Gen8 and Gen9, hardware provides a mechanism for
fast cleardown of the EU state, by issuing a PIPE_CONTROL
with bit 27 set. We can use this in a context batch buffer
to explicitly cleardown the state on every context switch.

As this workaround is already in place for gen8, we can borrow
the code verbatim for Gen9.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Kumar Valsan Prathap <prathap.kumar.valsan@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>
Cc: Bloomfield Jon <jon.bloomfield@intel.com>
Cc: Dutt Sudeep <sudeep.dutt@intel.com>


# b11b28ea 09-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull context activation into central intel_context_pin()

While this is encroaching on midlayer territory, having already made the
state allocation a previous step in pinning, we can now pull the common
intel_context_active_acquire() into intel_context_pin() itself. This is
a prelude to make the activation a separate step inside pinning, outside
of the ce->pin_mutex

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109085717.873326-2-chris@chris-wilson.co.uk


# 921f0c47 08-Jan-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Revert "drm/i915/tgl: Wa_1607138340"

This reverts commit 08fff7aeddc9dd72161b4c8fc27fbab12b4b9352.

For some yet unexplained reason not having this improves stability of some
media workloads.

Promise is that the media hang will be root caused properly and in the
meantime absence of this workaround is unlikely to cause problems.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Francesco Balestrieri <francesco.balestrieri@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200108161954.29739-1-tvrtko.ursulin@linux.intel.com


# d7cb6975 07-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always force restore freshly pinned contexts

It is highly unlikely, but still conceivable, that we submit a context
with the same GGTT address as last active on the HW. In this case, with
a matching LRCA, the HW would not restore the new context image causing
a potential violation of our context isolation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107172842.3315449-1-chris@chris-wilson.co.uk


# 7807a76b 07-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Take responsibility for engine->release as the last step

In order to avoid a double cleanup on error, take ownership of
engine->release past the point of no [error] return.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Fixes: e26b6d434147 ("drm/i915/gt: Pull GT initialisation under intel_gt_init()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107143118.3288995-1-chris@chris-wilson.co.uk


# 1325008f 05-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark up virtual engine uabi_instance

Be sure to initialise the uabi_instance on the virtual engine to the
special invalid value, just in case we ever peek at it from the uAPI.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106123921.2543886-1-chris@chris-wilson.co.uk
(cherry picked from commit f75fc37b5e70b75f21550410f88e2379648120e2)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# f75fc37b 05-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark up virtual engine uabi_instance

Be sure to initialise the uabi_instance on the virtual engine to the
special invalid value, just in case we ever peek at it from the uAPI.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106123921.2543886-1-chris@chris-wilson.co.uk


# ab17e6ca 06-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use memset_p to clear the ports

Put memset_p to use to clear the array of pointers used for tracking the
ELSP.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-6-chris@chris-wilson.co.uk


# e1c31fb5 06-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Merge i915_request.flags with i915_request.fence.flags

As we already have a flags field buried within i915_request, reuse it!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk


# 1d0e2c93 02-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Always poison the kernel_context image before unparking

Keep scrubbing the kernel_context image with poison before we reset it
in order to demonstrate that we will be resilient in the case where it
is accidentally overwritten on idle.

Suggested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-5-chris@chris-wilson.co.uk


# 49a24e71 02-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ignore stale context state upon resume

We leave the kernel_context on the HW as we suspend (and while idle).
There is no guarantee that is complete in memory, so we try to inhibit
restoration from the kernel_context. Reinforce the inhibition by
scrubbing the context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-3-chris@chris-wilson.co.uk


# d1813ca2 02-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Clear LRC image inline

When creating the initial LRC image, we also want to clear the MI_NOOPs
and register values. Rather than use a blanket memset beforehand, apply
the clears inline, close the context image and force inhibition of the
uninitialised reminder.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-2-chris@chris-wilson.co.uk


# 6a505e64 02-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Include a bunch more rcs image state

Empirically the minimal context image we use for rcs is insufficient to
state the engine. This is demonstrated if we poison the context image
such that any uninitialised state is invalid, and so if the engine
samples beyond our defined region, will fail to start.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200102131707.1463945-1-chris@chris-wilson.co.uk


# 2b64e616 30-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Leave RING_BB_STATE to default value

Do not reset RING_BB_STATE, leaving it to the default state value. This
prevents bdw/bsw from getting confused when executing batches from the
GGTT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191230165821.3840449-2-chris@chris-wilson.co.uk


# 7b02b23e 29-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Avoid using tag 0 for the very first submission

Assume that the HW starts off with tag 0 "active" and so avoid using tag
0 for our own first ELSP submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191229183153.3719869-2-chris@chris-wilson.co.uk


# 987281ab 29-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Ensure that all new contexts clear STOP_RING

Set up the RING_MI_MODE in new contexts to clear the STOP_RING bit, just
in case they find it still set after a reset (as they are the first
contexts to be run).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191229183153.3719869-1-chris@chris-wilson.co.uk


# 7d70a123 22-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Merge engine init/setup loops

Now that we don't need to create GEM contexts in the middle of engine
construction, we can pull the engine init/setup loops together.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222144046.1674865-2-chris@chris-wilson.co.uk


# e26b6d43 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull GT initialisation under intel_gt_init()

Begin pulling the GT setup underneath a single GT umbrella; let intel_gt
take ownership of its engines! As hinted, the complication is the
lifetime of the probed engine versus the active lifetime of the GT
backends. We need to detect the engine layout early and keep it until
the end so that we can sanitize state on takeover and release.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk


# e6ba7648 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915->kernel_context

Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).

Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.

GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk


# a5e93b42 13-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Select arb on/off around batches based on preemption

Decide whether or not we need to disable arbitration within user batches
based on our intel_engine_has_preemption() flag.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213151331.1788371-1-chris@chris-wilson.co.uk


# 9f3ccd40 20-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop GEM context as a direct link from i915_request

Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk


# d5e19353 19-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Teach veng to defer the context allocation

Since we added the context_alloc callback to intel_context_ops, we can
safely install a custom hook for the deferred virtual context allocation.
This means that all new contexts behave the same upon creation,
simplifying later code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191219232932.189197-1-chris@chris-wilson.co.uk


# 7d1ff0d9 19-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Add breadcrumb retire to physical engine

Avoid adding the retire workers to the virtual engine so that we don't
end up in the unenviable situation of trying to free the virtual engine
while its worker remains active.

Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/867
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191219221344.161523-1-chris@chris-wilson.co.uk


# dc93c9b6 18-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Schedule request retirement when signaler idles

Very similar to commit 4f88f8747fa4 ("drm/i915/gt: Schedule request
retirement when timeline idles"), but this time instead of coupling into
the execlists CS event interrupt, we couple into the breadcrumb
interrupt and queue a timeline's retirement when the last signaler is
completed. This should allow us to more rapidly park ringbuffer
submission, and so help reduce power consumption on older systems.

v2: Fixup intel_engine_add_retire() to handle concurrent callers

References: 4f88f8747fa4 ("drm/i915/gt: Schedule request retirement when timeline idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191219124353.8607-1-chris@chris-wilson.co.uk


# 54400257 17-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remove direct invocation of breadcrumb signaling

Only signal the breadcrumbs from inside the irq_work, simplifying our
interface and calling conventions. The micro-optimisation here is that
by always using the irq_work interface, we know we are always inside an
irq-off critical section for the breadcrumb signaling and can ellide
save/restore of the irq flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217095642.3124521-7-chris@chris-wilson.co.uk


# 639f2f24 13-Dec-2019 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915: Introduce new macros for tracing

New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and
GT_TRACE() are introduce to tag device name and engine
name with contexts and requests tracing in i915.

Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com


# f26a9e95 08-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Detect if we miss WaIdleLiteRestore

In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.

Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).

However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.

In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@kernel.vger.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk
(cherry picked from commit 82c69bf58650e644c61aa2bf5100b63a1070fd2f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 82c69bf5 08-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Detect if we miss WaIdleLiteRestore

In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.

Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).

However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.

In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@kernel.vger.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk


# 36deeddc 05-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Save irqstate around virtual_context_destroy

As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.

<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612] lock_acquire+0xa7/0x1c0
<4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808] irq_work_run_list+0x49/0x70
<4> [110.024824] irq_work_run+0x26/0x50
<4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855] irq_work_interrupt+0xf/0x20
<4> [110.024871] __do_softirq+0xb7/0x47f
<4> [110.024885] irq_exit+0xba/0xc0
<4> [110.024898] do_IRQ+0x83/0x160
<4> [110.024910] ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049] Possible unsafe locking scenario:

<4> [110.025065] CPU0
<4> [110.025075] ----
<4> [110.025084] lock(&(&rq->lock)->rlock);
<4> [110.025099] <Interrupt>
<4> [110.025109] lock(&(&rq->lock)->rlock);
<4> [110.025124]
*** DEADLOCK ***

<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1
<4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891] dump_stack+0x71/0x9b
<4> [110.025907] mark_lock+0x49a/0x500
<4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946] mark_held_locks+0x49/0x70
<4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995] _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376] __active_retire+0xb4/0x290 [i915]
<4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613] i915_request_retire+0x451/0x930 [i915]
<4> [110.026766] retire_requests+0x4d/0x60 [i915]
<4> [110.026919] engine_retire+0x63/0xe0 [i915]

Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk
(cherry picked from commit 6f7ac8285371fb0df58aba861eaab387f79ed04d)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 6f7ac828 05-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Save irqstate around virtual_context_destroy

As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.

<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612] lock_acquire+0xa7/0x1c0
<4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808] irq_work_run_list+0x49/0x70
<4> [110.024824] irq_work_run+0x26/0x50
<4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855] irq_work_interrupt+0xf/0x20
<4> [110.024871] __do_softirq+0xb7/0x47f
<4> [110.024885] irq_exit+0xba/0xc0
<4> [110.024898] do_IRQ+0x83/0x160
<4> [110.024910] ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049] Possible unsafe locking scenario:

<4> [110.025065] CPU0
<4> [110.025075] ----
<4> [110.025084] lock(&(&rq->lock)->rlock);
<4> [110.025099] <Interrupt>
<4> [110.025109] lock(&(&rq->lock)->rlock);
<4> [110.025124]
*** DEADLOCK ***

<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1
<4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891] dump_stack+0x71/0x9b
<4> [110.025907] mark_lock+0x49a/0x500
<4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946] mark_held_locks+0x49/0x70
<4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995] _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376] __active_retire+0xb4/0x290 [i915]
<4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613] i915_request_retire+0x451/0x930 [i915]
<4> [110.026766] retire_requests+0x4d/0x60 [i915]
<4> [110.026919] engine_retire+0x63/0xe0 [i915]

Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk


# f70de8d2 02-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Track the context validity explicitly

Rather than assume if and only if the engine->default_state is not set
that the context is invalid, instead track when we know the context has
valid state -- either because we have copied the default_state or we
have completed a context switch to save the HW state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203124155.3019926-1-chris@chris-wilson.co.uk


# 49e74c8f 03-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Skip nested spinlock for validating pending

Only along the submission path can we guarantee that the locked request
is indeed from a foreign engine, and so the nesting of engine/rq is
permissible. On the submission tasklet (process_csb()), we may find
ourselves competing with the normal nesting of rq/engine, invalidating
our nesting. As we only use the spinlock for debug purposes, skip the
debug if we cannot acquire the spinlock for safe validation - catching
99% of the bugs is better than causing a hard lockup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: c95d31c3df1b ("drm/i915/execlists: Lock the request while validating it during promotion")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-2-chris@chris-wilson.co.uk


# 80aac91b 03-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Add a couple more validity checks to assert_pending()

Check the pending request submission is valid: that it at least has a
reference for the submission and that the request is on the active list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-1-chris@chris-wilson.co.uk


# 97c16353 29-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ensure the tasklet is decoupled upon shutdown

As we only cancel the timers asynchronously, they may
still be running on another CPU as we shutdown, raising one last
softirq. So be safe and make sure the tasklet is flushed before
destroying the engine's memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191129172542.1222810-1-chris@chris-wilson.co.uk


# 31177017 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Schedule request retirement when timeline idles

The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.

As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!

v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
(cherry picked from commit 4f88f8747fa43c97c3b3712d8d87295ea757cc51)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4ec5cc78 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Fixup cancel_port_requests()

I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387
Fixes: 97f9af78f38d ("drm/i915/gt: Mark the execlists->active as the primary volatile access")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk
(cherry picked from commit da0ef77e1e0ccff703efee82406c629d5c4f4bbb)
[Joonas: Fixed Fixes: tag to match drm-intel-next-fixes]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 97f9af78 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark the execlists->active as the primary volatile access

Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.

Otherwise, we have observed execlists_active() to report 0x40.

[ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
[ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
[ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
[ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940

which is impossible!

The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.

Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk
(cherry picked from commit 331bf90591573dfe6c8e892239713ef9702f1396)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ee33baa8 19-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up the calling context for intel_wakeref_put()

Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.

Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
(cherry picked from commit 07779a76ee1f93f930cf697b22be73d16e14f50c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4f88f874 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Schedule request retirement when timeline idles

The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.

As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!

v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk


# da0ef77e 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Fixup cancel_port_requests()

I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387
Fixes: 331bf9059157 ("drm/i915/gt: Mark the execlists->active as the primary volatile access")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk


# 331bf905 25-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark the execlists->active as the primary volatile access

Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.

Otherwise, we have observed execlists_active() to report 0x40.

[ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
[ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
[ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
[ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940

which is impossible!

The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.

Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk


# 93b0e8fe 21-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark intel_wakeref_get() as a sleeper

Assume that intel_wakeref_get() may take the mutex, and perform other
sleeping actions in the course of its callbacks and so use might_sleep()
to ensure that all callers abide. Anything that cannot sleep has to use
e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the
non-atomic paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk


# c95d31c3 21-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Lock the request while validating it during promotion

Since the request is already on the HW as we perform its validation, it
and even its subsequent barrier may be concurrently retired before we
process the assertions. If it is retired already and so off the HW, our
assertions become void and we need to ignore them.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112363
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121103546.146487-1-chris@chris-wilson.co.uk


# 07779a76 19-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up the calling context for intel_wakeref_put()

Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.

Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk


# 98ae6fb3 11-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Move reset_active() from schedule-out to schedule-in

The gem_ctx_persistence/smoketest was detecting an odd coherency issue
inside the LRC context image; that the address of the ring buffer did
not match our associated struct intel_ring. As we set the address into
the context image when we pin the ring buffer into place before the
context is active, that leaves the question of where did it get
overwritten. Either the HW context save occurred after our pin which
would imply that our idle barriers are broken, or we overwrote the
context image ourselves. It is only in reset_active() where we dabble
inside the context image outside of a serialised path from schedule-out;
but we could equally perform the operation inside schedule-in which is
then fully serialised with the context pin -- and remains serialised by
the engine pulse with kill_context(). (The only downside, aside from
doing more work inside the engine->active.lock, was the plan to merge
all the reset paths into doing their context scrubbing on schedule-out
needs more thought.)

Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out")
Testcase: igt/gem_ctx_persistence/smoketest
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk
(cherry picked from commit 31b61f0ef9af62b6404d8df5dcd2cf58f80c9f53)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 31b61f0ef 11-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Move reset_active() from schedule-out to schedule-in

The gem_ctx_persistence/smoketest was detecting an odd coherency issue
inside the LRC context image; that the address of the ring buffer did
not match our associated struct intel_ring. As we set the address into
the context image when we pin the ring buffer into place before the
context is active, that leaves the question of where did it get
overwritten. Either the HW context save occurred after our pin which
would imply that our idle barriers are broken, or we overwrote the
context image ourselves. It is only in reset_active() where we dabble
inside the context image outside of a serialised path from schedule-out;
but we could equally perform the operation inside schedule-in which is
then fully serialised with the context pin -- and remains serialised by
the engine pulse with kill_context(). (The only downside, aside from
doing more work inside the engine->active.lock, was the plan to merge
all the reset paths into doing their context scrubbing on schedule-out
needs more thought.)

Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out")
Testcase: igt/gem_ctx_persistence/smoketest
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk


# 69a48c1d 10-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Reduce barrier on context switch to a wmb()

Having been forced to reduce Braswell back to using the aliasing ppgtt,
the coherency issue we previously observed cannot impact us. Reduce the
performance penalty imposed on all platforms from using the mfence to a
mere sfence.

References: cf66b8a0ba14 ("drm/i915/execlists: Apply a full mb before execution for Braswell")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191110185806.17413-10-chris@chris-wilson.co.uk


# 0a1f57b8 04-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Reset CSB pointers by mmio as well

Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
leading to it carry on updating the old tail of the buffer.

<0>[ 618.138490] i915_sel-5636 3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_reset: vecs0 flags=100
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_stop_cs: vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673425523us : __intel_gt_reset: engine_mask=40
<0>[ 618.138490] i915_sel-5636 3.... 673425568us : execlists_reset: vecs0
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
<0>[ 618.138490] i915_sel-5636 3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
<0>[ 618.138490] i915_sel-5636 3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
<0>[ 618.138490] i915_sel-5636 3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
<0>[ 618.138490] i915_sel-5636 3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648, submit:yes
<0>[ 618.138490] i915_sel-5636 3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673435549us : execlists_reset_finish: vecs0: depth->0
<0>[ 618.138490] ksoftirq-21 2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
<0>[ 618.138490] ksoftirq-21 2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
<0>[ 618.138490] ksoftirq-21 2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
<0>[ 618.138490] ksoftirq-21 2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))

After the reset, we do another clflush before checking the CSB to be
sure we see whatever was left in the CSB prior to the reset. So it is
unlikely to be an incoherent view of the CSB, and more likely that
Icelake didn't reset its pointers.

References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191104135307.21083-1-chris@chris-wilson.co.uk


# 38098750 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ignore the inactive kernel context in assert_pending_valid

Filter out warnings for the kernel context that is used to flush
inactive contexts, as they do no not pose a risk.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101082919.21122-1-chris@chris-wilson.co.uk


# b0b10248 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Verify context register state before execution

Check that the context's ring register state still matches our
expectations prior to execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191102125739.24626-1-chris@chris-wilson.co.uk


# 9f379407 30-Oct-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: drop lrc header page

Recent GuC binaries (including all the ones we're currently using)
don't require this shared area anymore, having moved the relevant
entries into the stage pool instead. i915 itself doesn't write
anything into it either, so we can safely drop it.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191031013040.25803-1-daniele.ceraolospurio@intel.com


# b79029b2 29-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Make timeslice duration configurable

Execlists uses a scheduling quantum (a timeslice) to alternate execution
between ready-to-run contexts of equal priority. This ensures that all
users (though only if they of equal importance) have the opportunity to
run and prevents livelocks where contexts may have implicit ordering due
to userspace semaphores. However, not all workloads necessarily benefit
from timeslicing and in the extreme some sysadmin may want to disable or
reduce the timeslicing granularity.

The timeslicing mechanism can be compiled out^W^W disabled (but should
DCE!) with

./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191029091632.26281-1-chris@chris-wilson.co.uk


# 19c17b76 28-Oct-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/execlists: Use vfunc to check engine submission mode

While processing CSB there is no need to look at GuC submission
settings, just check if engine is configured for execlists mode.

While today GuC submission is disabled it's settings are still
based on modparam values that might not correctly reflect actual
submission status in case of any fallback. Until that is fully
fixed, use alternate method to confirm that engine really runs in
execlists mode by comparing set_default_submission vfunc.

v2: add other immediate use of new helper

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com


# a7f328fc 27-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Simply walk back along request timeline on reset

The request's timeline will only contain requests from this context, in
order of execution. Therefore, we can simply look back along this
timeline to find the currently executing request.

If we do find that the current context has completed its last request,
that does not imply that all requests are completed in the context, so
only advance the ring->head up to the end of the known completions!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028124125.25176-1-chris@chris-wilson.co.uk


# 35865aef 26-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Adjust the location of RING_MI_MODE in the context image

The location of RING_MI_MODE (used to stop the ring across resets) moved
for Tigerlake. Fixup the new location and include a selftest to verify
the location in the default context image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191026082220.32632-1-chris@chris-wilson.co.uk


# 5932925a 25-Oct-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move intel_engine_context_in/out into intel_lrc.c

Intel_lrc.c is the only caller and so to avoid some header file ordering
issues in future patches move these two over there.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025090952.10135-1-tvrtko.ursulin@linux.intel.com


# 2871ea85 24-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Split intel_ring_submission

Split the legacy submission backend from the common CS ring buffer
handling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk


# d12acee8 23-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Cancel banned contexts on schedule-out

On schedule-out (CS completion) of a banned context, scrub the context
image so that we do not replay the active payload. The intent is that we
skip banned payloads on request submission so that the timeline
advancement continues on in the background. However, if we are returning
to a preempted request, i915_request_skip() is ineffective and instead we
need to patch up the context image so that it continues from the start
of the next request.

v2: Fixup cancellation so that we only scrub the payload of the active
request and do not short-circuit the breadcrumbs (which might cause
other contexts to execute out of order).
v3: Grammar pass

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-3-chris@chris-wilson.co.uk


# 3a7a92ab 23-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Force preemption

If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~100ms). The challenge of
lies in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.

Note that coupled with timeslicing, this will lead to rapid GPU "hang"
detection with multiple active contexts vying for GPU time.

The forced preemption mechanism can be compiled out with

./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk


# 0587152b 22-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop assertion that ce->pin_mutex guards state updates

The actual conditions are that we know the GPU is not accessing the
context, and we hold a pin on the context image to allow CPU access. We
used a fake lock on ce->pin_mutex so that we could try and use lockdep
to assert that access is serialised, but the various different
hardirq/softirq contexts where we need to *fake* holding the pin_mutex
are causing more trouble.

Still it would be nice if we did have a way to reassure ourselves that
the direct update to the context image is serialised with GPU execution.
In the meantime, stop lockdep complaining about false irq inversions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk


# 253a774b 18-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Don't merely skip submission if maybe timeslicing

Normally, we try and skip submission if ELSP[1] is filled. However, we
may desire to enable timeslicing due to the queue priority, even if
ELSP[1] itself does not require timeslicing. That is the queue is equal
priority to ELSP[0] and higher priority then ELSP[1]. Previously, we
would wait until the context switch to preempt the current ELSP[1], but
with timeslicing, we want to preempt ELSP[0] and replace it with the
queue.

In writing the test case, it become quickly apparent that we were also
suppressing the tasklet during promotion and so failing to notice when
the queue started requiring timeslicing.

Fixes: 2229adc81380 ("drm/i915/execlist: Trim immediate timeslice expiry")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk


# 0a544a2a 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request

As preempt-to-busy leaves the request on the HW as the resubmission is
processed, that request may complete in the background and even cause a
second virtual request to enter queue. This second virtual request
breaks our "single request in the virtual pipeline" assumptions.
Furthermore, as the virtual request may be completed and retired, we
lose the reference the virtual engine assumes is held. Normally, just
removing the request from the scheduler queue removes it from the
engine, but the virtual engine keeps track of its singleton request via
its ve->request. This pointer needs protecting with a reference.

v2: Drop unnecessary motion of rq->engine = owner

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk
(cherry picked from commit b647c7df01b75761b4c0b1cb6f4841088c0b1121)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 128260a4 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Refactor -EIO markup of hung requests

Pull setting -EIO on the hung requests into its own utility function.
Having allowed ourselves to short-circuit submission of completed
requests, we can now do the mark_eio() prior to submission and avoid
some redundant operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk
(cherry picked from commit 0d7cf7bc15e75bf79f2f65d61d19f896609f816a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 2229adc8 16-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlist: Trim immediate timeslice expiry

We perform timeslicing immediately upon receipt of a request that may be
put into the second ELSP slot. The idea behind this was that since we
didn't install the timer if the second ELSP slot was empty, we would not
have any idea of how long ELSP[0] had been running and so giving the
newcomer a chance on the GPU was fair. However, this causes us extra
busy work that we may be able to avoid if we wait a jiffie for the first
timeslice as normal.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016100851.4979-1-chris@chris-wilson.co.uk


# 08fff7ae 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Wa_1607138340

Avoid possible cs hang with semaphores by disabling
lite restore.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-11-mika.kuoppala@linux.intel.com


# 2e19af94 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Wa_1409600907

To avoid possible hang, we need to add depth stall if we flush the
depth cache.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-8-mika.kuoppala@linux.intel.com


# 36a6b5d9 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Add extra hdc flush workaround

In order to ensure constant caches are invalidated
properly with a0, we need extra hdc flush after invalidation.

v2: use IS_TGL_REVID (Chris)

References: HSDES#1604544889
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-4-mika.kuoppala@linux.intel.com


# 4aa0b5d4 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Add HDC Pipeline Flush

Add hdc pipeline flush to ensure memory state is coherent
in L3 when we are done.

v2: Flush also in breadcrumbs (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com


# 62037fff 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Include ro parts of l3 to invalidate

Aim for completeness and invalidate also the ro parts
in l3 cache. This might allow to get rid of the preparser
disable/enable workaround on invalidation path.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com


# 8b390c15 15-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Clear semaphore immediately upon ELSP promotion

There is no significance to our delay before clearing the semaphore the
engine is waiting on, so release it as soon as we acknowledge the CS
update following our preemption request. This should allow the GPU to
resume work earlier, if it was stuck on the semaphore at the end of a
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk


# 3c00660d 14-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Assert tasklet is locked for process_csb()

We rely on only the tasklet being allowed to call into process_csb(), so
assert that is locked when we do. As the tasklet uses a simple bitlock,
there is no strong lockdep checking so we must make do with a plain
assertion that the tasklet is running and assume that we are the
tasklet!

v2: Fixup intel_gt_sanitize() to prepare each engine for the reset so
that the locks are marked as held during the reset
v3: Check for existent function pointers for very early sanitisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014121336.30137-1-chris@chris-wilson.co.uk


# 89b6d183 13-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Tweak virtual unsubmission

Since commit e2144503bf3b ("drm/i915: Prevent bonded requests from
overtaking each other on preemption") we have restricted requests to run
on their chosen engine across preemption events. We can take this
restriction into account to know that we will want to resubmit those
requests onto the same physical engine, and so can shortcircuit the
virtual engine selection process and keep the request on the same
engine during unwind.

References: e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ramlingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk


# c3eb54aa 12-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up "sentinel" requests

Sometimes we want to emit a terminator request, a request that flushes
the pipeline and allows no request to come after it. This can be used
for a "preempt-to-idle" to ensure that upon processing the
context-switch to that request, all other active contexts have been
flushed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk


# d8ad5f52 11-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Prevent merging requests with conflicting flags

We set out-of-bound parameters inside the i915_requests.flags field,
such as disabling preemption or marking the end-of-context. We should
not coalesce consecutive requests if they have differing instructions
as we only inspect the last active request in a context. Thus if we
allow a later request to be merged into the same execution context, it
will mask any of the earlier flags.

References: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-9-chris@chris-wilson.co.uk


# cbbf2787 11-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Only mark incomplete requests as -EIO on cancelling

Only the requests that have not completed do we want to change the
status of to signal the -EIO when cancelling the inflight set of requests
upon wedging.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011103345.26013-1-chris@chris-wilson.co.uk


# c97fb526 10-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Leave tell-tales as to why pending[] is bad

Before we BUG out with bad pending state, leave a telltale as to which
test failed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010071434.31195-2-chris@chris-wilson.co.uk


# bd9bec5b 10-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Mark up expected state during reset

Move the BUG_ON around slightly and add some explanations for each to
try and capture the expected state more carefully. We want to compare
the expected active state of our bookkeeping as compared to the tracked
HW state.

References: https://bugs.freedesktop.org/show_bug.cgi?id=111937
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010083242.1387-1-chris@chris-wilson.co.uk


# 9d41318c 09-Oct-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/tgl: simplify the lrc register list for !RCS

There are small differences between the blitter and the video engines in
the xcs context image (e.g. registers 0x200 and 0x204 only exist on the
blitter). Since we never explicitly set a value for those register and
given that we don't need to update the offsets in the lrc image when we
change engine within the class for virtual engine because the HW can
handle that, instead of having a separate define for the BCS we can
just restrict the programming to the part we're interested in, which is
common across the engines.

Bspec: 45584
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-2-daniele.ceraolospurio@intel.com


# ba2c74da 09-Oct-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/tgl: the BCS engine supports relative MMIO

The specs don't mention any specific HW limitation on the blitter and
manual inspection shows that the HW does set the relative MMIO bit in
the LRI of the blitter context image, so we can remove our limitations.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-1-daniele.ceraolospurio@intel.com


# 749085a2 09-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Protect peeking at execlists->active

Now that we dropped the engine->active.lock serialisation from around
process_csb(), direct submission can run concurrently to the interrupt
handler. As such execlists->active may be advanced as we dequeue,
dropping the reference to the request. We need to employ our RCU request
protection to ensure that the request is not freed too early.

Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk
(cherry picked from commit c949ae431467764277cdd88d7c26ff963a9db40a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 68184eb7 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fixup preempt-to-busy vs reset of a virtual request

Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
(cherry picked from commit cb2377a919bbe8107af269c5a31a8d5cfb27d867)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# a8385f0c 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only enqueue already completed requests

If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
(cherry picked from commit c0bb487dc19fc45dbeede7dcf8f513df51a3cd33)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 6535a4b3 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)

Since amalgamating the queued and active lists in commit 422d7df4f090
("drm/i915: Replace engine->timeline with a plain list"), performing a
i915_request_submit() will remove the request from the execlists
priority queue.

References: 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk
(cherry picked from commit 3231f8c01121ee1febfd82398ee22f7ff9dc5d76)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# c949ae43 09-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Protect peeking at execlists->active

Now that we dropped the engine->active.lock serialisation from around
process_csb(), direct submission can run concurrently to the interrupt
handler. As such execlists->active may be advanced as we dequeue,
dropping the reference to the request. We need to employ our RCU request
protection to ensure that the request is not freed too early.

Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk


# 20af04f3 08-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Assign virtual_engine->uncore from first sibling

Copy across the engine->uncore shortcut to the virtual_engine from its
first physical engine, similar to the handling of the engine->gt
backpointer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191008070342.4045-1-chris@chris-wilson.co.uk


# 08ad9a38 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Fix annotation for decoupling virtual request

As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.

<4>[ 723.763281] WARNING: possible circular locking dependency detected
<4>[ 723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U
<4>[ 723.763288] ------------------------------------------------------
<4>[ 723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[ 723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.763378]
but task is already holding lock:
<4>[ 723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763420]
which lock already depends on the new lock.

<4>[ 723.763423]
the existing dependency chain (in reverse order) is:
<4>[ 723.763427]
-> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[ 723.763434] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 723.763478] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763513] intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[ 723.763600] cs_irq_handler+0x49/0x50 [i915]
<4>[ 723.763659] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 723.763690] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 723.763695] __handle_irq_event_percpu+0x41/0x2d0
<4>[ 723.763699] handle_irq_event_percpu+0x2b/0x70
<4>[ 723.763702] handle_irq_event+0x2f/0x50
<4>[ 723.763706] handle_edge_irq+0xee/0x1a0
<4>[ 723.763709] do_IRQ+0x7e/0x160
<4>[ 723.763712] ret_from_intr+0x0/0x1d
<4>[ 723.763717] __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[ 723.763720] kmem_cache_alloc+0x28d/0x2f0
<4>[ 723.763724] vm_area_dup+0x15/0x40
<4>[ 723.763727] dup_mm+0x2dd/0x550
<4>[ 723.763730] copy_process+0xf21/0x1ef0
<4>[ 723.763734] _do_fork+0x71/0x670
<4>[ 723.763737] __se_sys_clone+0x6e/0xa0
<4>[ 723.763741] do_syscall_64+0x4f/0x210
<4>[ 723.763744] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.763747]
-> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[ 723.763752] _raw_spin_lock+0x2a/0x40
<4>[ 723.763789] __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[ 723.763825] __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[ 723.763864] execlists_submission_tasklet+0x34/0x50 [i915]
<4>[ 723.763874] tasklet_action_common.isra.5+0x47/0xb0
<4>[ 723.763878] __do_softirq+0xd8/0x4ae
<4>[ 723.763881] irq_exit+0xa9/0xc0
<4>[ 723.763883] smp_apic_timer_interrupt+0xb7/0x280
<4>[ 723.763887] apic_timer_interrupt+0xf/0x20
<4>[ 723.763892] cpuidle_enter_state+0xae/0x450
<4>[ 723.763895] cpuidle_enter+0x24/0x40
<4>[ 723.763899] do_idle+0x1e7/0x250
<4>[ 723.763902] cpu_startup_entry+0x14/0x20
<4>[ 723.763905] start_secondary+0x15f/0x1b0
<4>[ 723.763908] secondary_startup_64+0xa4/0xb0
<4>[ 723.763911]
-> #0 (&engine->active.lock){..-.}:
<4>[ 723.763916] __lock_acquire+0x15d8/0x1ea0
<4>[ 723.763919] lock_acquire+0xa6/0x1c0
<4>[ 723.763922] _raw_spin_lock_irqsave+0x33/0x50
<4>[ 723.763956] execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.764002] submit_notify+0xa8/0x13c [i915]
<4>[ 723.764035] __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[ 723.764054] i915_sw_fence_wake+0x51/0x64 [i915]
<4>[ 723.764054] __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[ 723.764054] dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[ 723.764054] dma_fence_signal_locked+0x9e/0x1c0
<4>[ 723.764054] dma_fence_signal+0x1f/0x40
<4>[ 723.764054] vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[ 723.764054] drm_ioctl_kernel+0x83/0xf0
<4>[ 723.764054] drm_ioctl+0x2f3/0x3b0
<4>[ 723.764054] do_vfs_ioctl+0xa0/0x6f0
<4>[ 723.764054] ksys_ioctl+0x35/0x60
<4>[ 723.764054] __x64_sys_ioctl+0x11/0x20
<4>[ 723.764054] do_syscall_64+0x4f/0x210
<4>[ 723.764054] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.764054]
other info that might help us debug this:

<4>[ 723.764054] Chain exists of:
&engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1

<4>[ 723.764054] Possible unsafe locking scenario:

<4>[ 723.764054] CPU0 CPU1
<4>[ 723.764054] ---- ----
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&(&rq->lock)->rlock#2);
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&engine->active.lock);
<4>[ 723.764054]
*** DEADLOCK ***

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004194758.19679-1-chris@chris-wilson.co.uk


# 7d0eb51d 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent bonded requests from overtaking each other on preemption

Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]

We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
(cherry picked from commit e2144503bf3b22275dd33cef2880e1cb5fb200c5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 93be1bae 07-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Remove incorrect BUG_ON for schedule-out

As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)

Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."

Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk
(cherry picked from commit d810583fc2fcf139cc766eb2303500b2d9cf064d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 2935ed53 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove logical HW ID

With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines).
According to bspec, aach logical context requires a unique tag in order
for context-switching to occur correctly between them. [Simple
experiments show that it is not so easy to trick the HW into performing
a lite-restore with matching logical IDs, though my memory from early
Broadwell experiments do suggest that it should be generating
lite-restores.]

We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.

v2: Calculate required number of tags to fill ELSP.

Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk


# 44d0a9c0 03-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Skip redundant resubmission

If we unwind the active requests, and on resubmission discover that we
intend to preempt the active contexts with themselves, simply skip the
ELSP submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-1-chris@chris-wilson.co.uk


# fcde8c7e 02-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Exercise potential false lite-restore

If execlists's lite-restore is based on the common GEM context tag
rather than the per-intel_context LRCA, then a context switch between
two intel_contexts on the same engine derived from the same GEM context
will perform a lite-restore instead of a full context switch. We can
exploit this by poisoning the ringbuffer of the first context and trying
to trick a simple RING_TAIL update (i.e. lite-restore)

v2: Also check what happens if preempt ce[0] with ce[1] (both instances
on the same engine from the same parent context) [Tvrtko]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191002183459.26614-1-chris@chris-wilson.co.uk


# f8db4d05 01-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialise breadcrumb lists on the virtual engine

With deferring the breadcrumb signalling to the virtual engine (thanks
preempt-to-busy) we need to make sure the lists and irq-worker are ready
to send a signal.

[41958.710544] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41958.710553] #PF: supervisor write access in kernel mode
[41958.710556] #PF: error_code(0x0002) - not-present page
[41958.710558] PGD 0 P4D 0
[41958.710562] Oops: 0002 [#1] SMP
[41958.710565] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U 5.3.0+ #207
[41958.710568] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[41958.710602] RIP: 0010:i915_request_enable_breadcrumb+0xe1/0x130 [i915]
[41958.710605] Code: 8b 44 24 30 48 89 41 08 48 89 08 48 8b 85 98 01 00 00 48 8d 8d 90 01 00 00 48 89 95 98 01 00 00 49 89 4c 24 28 49 89 44 24 30 <48> 89 10 f0 80 4b 30 10 c6 85 88 01 00 00 00 e9 1a ff ff ff 48 83
[41958.710609] RSP: 0018:ffffc90000003de0 EFLAGS: 00010046
[41958.710612] RAX: 0000000000000000 RBX: ffff888735424480 RCX: ffff8887cddb2190
[41958.710614] RDX: ffff8887cddb3570 RSI: ffff888850362190 RDI: ffff8887cddb2188
[41958.710617] RBP: ffff8887cddb2000 R08: ffff8888503624a8 R09: 0000000000000100
[41958.710619] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8887cddb3548
[41958.710622] R13: 0000000000000000 R14: 0000000000000046 R15: ffff888850362070
[41958.710625] FS: 0000000000000000(0000) GS:ffff88885ea00000(0000) knlGS:0000000000000000
[41958.710628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41958.710630] CR2: 0000000000000000 CR3: 0000000002c09002 CR4: 00000000001606f0
[41958.710633] Call Trace:
[41958.710636] <IRQ>
[41958.710668] __i915_request_submit+0x12b/0x160 [i915]
[41958.710693] virtual_submit_request+0x67/0x120 [i915]
[41958.710720] __unwind_incomplete_requests+0x131/0x170 [i915]
[41958.710744] execlists_dequeue+0xb40/0xe00 [i915]
[41958.710771] execlists_submission_tasklet+0x10f/0x150 [i915]
[41958.710776] tasklet_action_common.isra.17+0x41/0xa0
[41958.710781] __do_softirq+0xc8/0x221
[41958.710785] irq_exit+0xa6/0xb0
[41958.710788] smp_apic_timer_interrupt+0x4d/0x80
[41958.710791] apic_timer_interrupt+0xf/0x20
[41958.710794] </IRQ>

Fixes: cb2377a919bb ("drm/i915: Fixup preempt-to-busy vs reset of a virtual request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191001103518.9113-1-chris@chris-wilson.co.uk


# e1237523 26-Sep-2019 MichaƂ Winiarski <michal.winiarski@intel.com>

drm/i915/execlists: Use per-process HWSP as scratch

Some of our commands (MI_FLUSH_DW / PIPE_CONTROL) require a post-sync write
operation to be performed. Currently we're using dedicated VMA for
PIPE_CONTROL and global HWSP for MI_FLUSH_DW.
On execlists platforms, each of our contexts has an area that can be
used as scratch space. Let's use that instead.

Signed-off-by: MichaƂ Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-2-chris@chris-wilson.co.uk


# f9d4eae2 25-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Simplify gen12_csb_parse

Having decided that we only care about the promotion predicate, we can
simplify gen12_csb_parse to simply check whether we need to jump to a
new queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190925130845.17952-1-chris@chris-wilson.co.uk


# 7dc56af5 24-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Verify the LRC register layout between init and HW

Before we submit the first context to HW, we need to construct a valid
image of the register state. This layout is defined by the HW and should
match the layout generated by HW when it saves the context image.
Asserting that this should be equivalent should help avoid any undefined
behaviour and verify that we haven't missed anything important!

Of course, having insisted that the initial register state within the
LRC should match that returned by HW, we need to ensure that it does.

v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for
constructing the lrc image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk


# e2144503 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent bonded requests from overtaking each other on preemption

Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]

We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk


# cb2377a9 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fixup preempt-to-busy vs reset of a virtual request

Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk


# b647c7df 23-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request

As preempt-to-busy leaves the request on the HW as the resubmission is
processed, that request may complete in the background and even cause a
second virtual request to enter queue. This second virtual request
breaks our "single request in the virtual pipeline" assumptions.
Furthermore, as the virtual request may be completed and retired, we
lose the reference the virtual engine assumes is held. Normally, just
removing the request from the scheduler queue removes it from the
engine, but the virtual engine keeps track of its singleton request via
its ve->request. This pointer needs protecting with a reference.

v2: Drop unnecessary motion of rq->engine = owner

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk


# 0d7cf7bc 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Refactor -EIO markup of hung requests

Pull setting -EIO on the hung requests into its own utility function.
Having allowed ourselves to short-circuit submission of completed
requests, we can now do the mark_eio() prior to submission and avoid
some redundant operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk


# c0bb487d 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only enqueue already completed requests

If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk


# 3231f8c0 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)

Since amalgamating the queued and active lists in commit 422d7df4f090
("drm/i915: Replace engine->timeline with a plain list"), performing a
i915_request_submit() will remove the request from the execlists
priority queue.

References: 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk


# ae911b23 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Relax assertion for a pinned context image on reset

A gpu hang can occur at any time, given a sufficiently angry gpu. An
example is when it forgets to perform a context-switch at the end of a
request, leaving us with a hanging GPU on a completed request. Here, we
may retire the request, only leaving its context alive via the active
barrier. When we reset the GPU on a completed request, we do not modify
its context image (just updating the ring state) and can safely defer
the assertion that we have the image pinned and ready to modify.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111639
Fixes: dffa8feb3084 ("drm/i915/perf: Assert locking for i915_init_oa_perf_state()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-1-chris@chris-wilson.co.uk


# d19d71fc 18-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark i915_request.timeline as a volatile, rcu pointer

The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.

One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.

v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk


# c45e788d 19-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Suspend pre-parser across GTT invalidations

Before we execute a batch, we must first issue any and all TLB
invalidations so that batch picks up the new page table entries.
Tigerlake's preparser is weakening our post-sync CS_STALL inside the
invalidate pipe-control and allowing the loading of the batch buffer
before we have setup its page table (and so it loads the wrong page and
executes indefinitely).

The igt_cs_tlb indicates that this issue can only be observed on rcs,
even though the preparser is common to all engines. Alternatively, we
could do TLB shootdown via mmio on updating the GTT.

By inserting the pre-parser disable inside EMIT_INVALIDATE, we will also
accidentally fixup execution that writes into subsequent batches, such
as gem_exec_whisper and even relocations performed on the GPU. We should
be careful not to allow this disable to become baked into the uABI! The
issue is that if userspace relies on our disabling of the HW
optimisation, when we are ready to enable that optimisation, userspace
will then be broken...

Testcase: igt/i915_selftests/live_gtt/igt_cs_tlb
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111753
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919151811.9526-1-chris@chris-wilson.co.uk


# c210e85b 17-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Extend MI_SEMAPHORE_WAIT

On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.

v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk


# ee73e279 12-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Disable preemption while being debugged

We see failures where the context continues executing past a
preemption event, eventually leading to situations where a request has
executed before we have event submitted it to HW! It seems like tgl is
ignoring our RING_TAIL updates, but more likely is that there is a
missing update required for our semaphore waits around preemption.

v2: And disable internal semaphore usage

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912132313.12751-1-chris@chris-wilson.co.uk


# a17592ef 12-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ensure the context is reloaded after a GPU reset

After we manipulate the context to allow replay after a GPU reset, force
that context to be reloaded. This should be a layer of paranoia, for if
the GPU was reset, the context will no longer be resident!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-2-chris@chris-wilson.co.uk


# 582a6f90 12-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset

After a GPU reset, we need to drain all the CS events so that we have an
accurate picture of the execlists state at the time of the reset. Be
paranoid and force a read of the CSB write pointer from memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-1-chris@chris-wilson.co.uk


# 198d2533 07-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Ignore lost completion events

Icelake hit an issue where it missed reporting a completion event and
instead jumped straight to a idle->active event (skipping over the
active->idle and not even hitting the lite-restore preemption).

661497511us : process_csb: rcs0 cs-irq head=11, tail=0
661497512us : process_csb: rcs0 csb[0]: status=0x10008002:0x00000020 [lite-restore]
661497512us : trace_ports: rcs0: preempted { 28cc8:11052, 0:0 }
661497513us : trace_ports: rcs0: promote { 28cc8:11054, 0:0 }
661497514us : __i915_request_submit: rcs0 fence 28cc8:11056, current 11052
661497514us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
661497515us : trace_ports: rcs0: submit { 28cc8:11056, 0:0 }
661497530us : process_csb: rcs0 cs-irq head=0, tail=1
661497530us : process_csb: rcs0 csb[1]: status=0x10008002:0x00000020 [lite-restore]
661497531us : trace_ports: rcs0: preempted { 28cc8:11054!, 0:0 }
661497535us : trace_ports: rcs0: promote { 28cc8:11056, 0:0 }
661497540us : __i915_request_submit: rcs0 fence 28cc8:11058, current 11054
661497544us : __execlists_submission_tasklet: rcs0: queue_priority_hint:-2147483648, submit:yes
661497545us : trace_ports: rcs0: submit { 28cc8:11058, 0:0 }
661497553us : process_csb: rcs0 cs-irq head=1, tail=2
661497553us : process_csb: rcs0 csb[2]: status=0x10000001:0x00000000 [idle->active]
661497574us : process_csb: process_csb:1538 GEM_BUG_ON(*execlists->active)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907084334.28952-1-chris@chris-wilson.co.uk


# fa9a09f1 10-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Clear STOP_RING bit on reset

During reset, we try to ensure no forward progress of the CS prior to
the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
register is context saved and do we end up in the odd situation where we
save the STOP_RING bit and so try to stop the engine again immediately
upon resume. This is quite unexpected and causes us to complain about an
early CS completion event!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910080208.4223-1-chris@chris-wilson.co.uk


# d810583f 07-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Remove incorrect BUG_ON for schedule-out

As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)

Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."

Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk


# 5bf05dc5 06-Sep-2019 Michel Thierry <michel.thierry@intel.com>

drm/i915/tgl: Register state context definition for Gen12

Gen12 has subtle changes in the reg state context offsets (some fields
are gone, some are in a different location), compared to previous Gens.

The simplest approach seems to be keeping Gen12 (and future platform)
changes apart from the previous gens, while keeping the registers that
are contiguous in functions we can reuse.

v2: alias, virtual engine, rpcs, prune unused regs
v3: use engine base (Daniele), take ctx_bb for all

Bspec: 46255
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Tested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[ickle: Tweaked the GEM_WARN_ON after settling on a compromise with
Daniele]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906122314.2146-2-mika.kuoppala@linux.intel.com


# cdb736fa 06-Sep-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use engine relative LRIs on context setup

Daniele pointed out that relative mmio works differently in
on context restore. Instead of adding the engine mmio base to offset,
it masks out the base and adds bits [12:2] to current engine base.

This should allow us to construct context register state to be
applicable to all instances, including virtual. And avoid the trouble
of updating the registers on virtual instances when submitting work.

v2: only enable for gen12 for now (Mika)
v3: make enabling readable (Chris)

Bspec: 20206
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906134957.25909-1-mika.kuoppala@linux.intel.com


# dffa8feb 30-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/perf: Assert locking for i915_init_oa_perf_state()

We use the context->pin_mutex to serialise updates to the OA config and
the registers values written into each new context. Document this
relationship and assert we do hold the context->pin_mutex as used by
gen8_configure_all_contexts() to serialise updates to the OA config
itself.

v2: Add a white-lie for when we call intel_gt_resume() from init.
v3: Lie while we have the context pinned inside atomic reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk


# 0b718ba1 30-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gtt: Downgrade Cherryview back to aliasing-ppgtt

With the upcoming change in timing (dramatically reducing the latency
between manipulating the ppGTT and execution), no amount of tweaking
could save Cherryview, it would always fail to invalidate its TLB.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190830180000.24608-2-chris@chris-wilson.co.uk


# 11988e39 29-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Try rearranging breadcrumb flush

The addition of the DC_FLUSH failed to ensure sanctity of the post-sync
write as CI immediately got a completion CS-event before the breadcrumb
was coherent. So let's try the other idea of moving the post-sync write
into the CS_STALL.

References: https://bugs.freedesktop.org/show_bug.cgi?id=111514
References: e8f6b4952ec5 ("drm/i915/execlists: Flush the post-sync breadcrumb write harder")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190829081150.10271-2-chris@chris-wilson.co.uk


# e8f6b495 27-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Flush the post-sync breadcrumb write harder

Quite rarely we see that the CS completion event fires before the
breadcrumb is coherent, which presumably is a result of the CS_STALL not
waiting for the post-sync operation. Try throwing in a DC_FLUSH into
the following pipecontrol to see if that makes any difference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827120615.31390-1-chris@chris-wilson.co.uk


# 8a9a9827 27-Aug-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: use a separate context for gpu relocs

The CS pre-parser can pre-fetch commands across memory sync points and
starting from gen12 it is able to pre-fetch across BB_START and BB_END
boundaries as well, so when we emit gpu relocs the pre-parser might
fetch the target location of the reloc before the memory write lands.

The parser can't pre-fetch across the ctx switch, so we use a separate
context to guarantee that the memory is synchronized before the parser
can get to it.

Note that there is no risk of the CS doing a lite restore from the reloc
context to the user context, even if the two have the same hw_id,
because since gen11 the CS also checks the LRCA when deciding if it can
lite-restore.

v2: limit new context to gen12+, release in eb_destroy, add a comment
in emit_fini_breadcrumb (Chris).

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827185805.21799-1-daniele.ceraolospurio@intel.com


# cccdce1d 27-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make engine's batch pool safe for use with virtual engines

A virtual engine itself does not have a batch pool, but we can gleefully
use any of its siblings instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827135935.3831-1-chris@chris-wilson.co.uk


# a20ab592 21-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Set priority hint prior to submission

Since we now run process_csb() outside of the engine->active.lock, we
can process a CS-event immediately upon our ELSP write. As we currently
inspect the pending queue *after* the ELSP write, there is an
opportunity for a CS-event to update the pending queue before we can
read it, making ourselves chases an invalid pointer.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111427
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821142336.21609-1-chris@chris-wilson.co.uk


# 13e53c5c 17-Aug-2019 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/tgl: Introduce initial Tiger Lake workarounds

Add empty workaround hooks for Tiger Lake. The workarounds will be added
on separate patches. We were already applying
WaRsForcewakeAddDelayForAck, which is indeed still valid, so also update
the comment.

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-21-lucas.demarchi@intel.com


# f4785682 20-Aug-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/tgl: Gen12 csb support

The CSB format has been reworked for Gen12 to include information on
both the context we're switching away from and the context we're
switching to. After the change, some of the events don't have their
own bit anymore and need to be inferred from other values in the csb.
One of the context IDs (0x7FF) has also been reserved to indicate
the invalid ctx, i.e. engine idle.

Note that the full context ID includes the SW counter as well, but since
we currently only care if the context is valid or not we can ignore that
part.

v2: fix mask size, fix and expand comments (Tvrtko),
use if-ladder (Chris)

Bspec: 45555, 46144
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820102201.29849-1-chris@chris-wilson.co.uk


# 487f471d 17-Aug-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/tgl: add Gen12 default indirect ctx offset

Gen12 uses a new indirect ctx offset.

Bspec: 11740
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-28-lucas.demarchi@intel.com


# 9559c875 17-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Check the context size

Add a redzone to our context image and check the HW does not write into
after a context save, to verify that we have the correct context size.
(This does vary with feature bits, so test with a live setup that should
match how we run userspace.)

v2: Check the redzone on every context unpin
v3: Use a kernel context to prevent loading garbage for ringbuffer
submission

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817073711.5897-1-chris@chris-wilson.co.uk


# df403069 16-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Lift process_csb() out of the irq-off spinlock

If we only call process_csb() from the tasklet, though we lose the
ability to bypass ksoftirqd interrupt processing on direct submission
paths, we can push it out of the irq-off spinlock.

The penalty is that we then allow schedule_out to be called concurrently
with schedule_in requiring us to handle the usage count (baked into the
pointer itself) atomically.

As we do kick the tasklets (via local_bh_enable()) after our submission,
there is a possibility there to see if we can pull the local softirq
processing back from the ksoftirqd.

v2: Store the 'switch_priority_hint' on submission, so that we can
safely check during process_csb().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816171608.11760-1-chris@chris-wilson.co.uk


# e5dadff4 15-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect request retirement with timeline->mutex

Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk


# 531958f6 15-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Track timeline activeness in enter/exit

Lift moving the timeline to/from the active_list on enter/exit in order
to shorten the active tracking span in comparison to the existing
pin/unpin.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-1-chris@chris-wilson.co.uk


# 845f7f7e 14-Aug-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/icl: Add gen11 specific render breadcrumbs

Flush according to what gen11 expects when writing
breadcrumbs. As only the seqnowrite + flush differs
between engine and gens, enclose the footer to
helper.

v2: avoid problem of sane local naming by not using them

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815094929.358-1-mika.kuoppala@linux.intel.com


# 8a8b540a 15-Aug-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/icl: Add command cache invalidate

On the set of invalidations, we need to add command
cache invalidate as a new domain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-2-mika.kuoppala@linux.intel.com


# cfba6bd8 15-Aug-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/icl: Implement gen11 flush including tile cache

Add tile cache flushing for gen11. To relive us from the
burden of previous obsolete workarounds, make a dedicated
flush/invalidate callback for gen11.

To fortify an independent single flush, do post
sync op as there are indications that without it
we don't flush everything. This should also make this
callback more readily usable in tgl (see l3 fabric flush).

v2: whitespacing

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-1-mika.kuoppala@linux.intel.com


# 5f15c1e6 12-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/guc: Use a local cancel_port_requests

Since execlists and the guc have diverged in their port tracking, we
cannot simply reuse the execlists cancellation code as it leads to
unbalanced reference counting. Use a local, simpler routine for the guc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812203626.3948-1-chris@chris-wilson.co.uk


# f597625d 12-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Avoid sync calls during park

Since we allow ourselves to use non-process context during parking, we
cannot allow ourselves to sleep and in particular cannot call
del_timer_sync() -- but we can use a plain del_timer().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111375
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812091045.29587-1-chris@chris-wilson.co.uk


# 75d0a7f3 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift timeline into intel_context

Move the timeline from being inside the intel_ring to intel_context
itself. This saves much pointer dancing and makes the relations of the
context to its timeline much clearer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk


# 48ae397b 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Push the ring creation flags to the backend

Push the ring creation flags from the outer GEM context to the inner
intel_context to avoid an unsightly back-reference from inside the
backend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-3-chris@chris-wilson.co.uk


# 4c60b1aa 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Make deferred context allocation explicit

Refactor the backends to handle the deferred context allocation in a
consistent manner, and allow calling it as an explicit first step in
pinning a context for the first time. This should make it easier for
backends to keep track of partially constructed contexts from
initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-2-chris@chris-wilson.co.uk


# 6cd34b10 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Backtrack along timeline

After a preempt-to-busy, we may find an active request that is caught
between execution states. Walk back along the timeline instead of the
execution list to be safe.

[ 106.417541] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 106.417659] ==================================================================
[ 106.418041] BUG: KASAN: slab-out-of-bounds in __execlists_reset+0x2f2/0x440 [i915]
[ 106.418123] Read of size 8 at addr ffff888703506b30 by task swapper/1/0
[ 106.418194]
[ 106.418267] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U 5.3.0-rc3+ #5
[ 106.418344] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 106.418434] Call Trace:
[ 106.418508] <IRQ>
[ 106.418585] dump_stack+0x5b/0x90
[ 106.418941] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419022] print_address_description+0x67/0x32d
[ 106.419376] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419731] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419810] __kasan_report.cold.6+0x1a/0x3c
[ 106.419888] ? __trace_bprintk+0xc0/0xd0
[ 106.420239] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.420318] check_memory_region+0x144/0x1c0
[ 106.420671] __execlists_reset+0x2f2/0x440 [i915]
[ 106.421029] execlists_reset+0x3d/0x50 [i915]
[ 106.421387] intel_engine_reset+0x203/0x3a0 [i915]
[ 106.421744] ? igt_reset_nop+0x2b0/0x2b0 [i915]
[ 106.421825] ? _raw_spin_trylock_bh+0xe0/0xe0
[ 106.421901] ? rcu_core+0x1b9/0x6a0
[ 106.422251] preempt_reset+0x9a/0xf0 [i915]
[ 106.422333] tasklet_action_common.isra.15+0xc0/0x1e0
[ 106.422685] ? execlists_submit_request+0x200/0x200 [i915]
[ 106.422764] __do_softirq+0x106/0x3cf
[ 106.422840] irq_exit+0xdc/0xf0
[ 106.422914] smp_apic_timer_interrupt+0x81/0x1c0
[ 106.422988] apic_timer_interrupt+0xf/0x20
[ 106.423059] </IRQ>
[ 106.423144] RIP: 0010:cpuidle_enter_state+0xc3/0x620
[ 106.423222] Code: 24 0f 1f 44 00 00 31 ff e8 da 87 9c ff 80 7c 24 10 00 74 12 9c 58 f6 c4 02 0f 85 33 05 00 00 31 ff e8 c1 77 a3 ff fb 45 85 e4 <0f> 89 bf 02 00 00 48 8d 7d 10 e8 4e 45 b9 ff c7 45 10 00 00 00 00
[ 106.423311] RSP: 0018:ffff88881c30fda8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 106.423390] RAX: 0000000000000000 RBX: ffffffff825b4c80 RCX: ffffffff810c8a00
[ 106.423465] RDX: dffffc0000000000 RSI: 0000000039f89620 RDI: ffff88881f6b00a8
[ 106.423540] RBP: ffff88881f6b5bf8 R08: 0000000000000002 R09: 000000000002ed80
[ 106.423616] R10: 0000003fdd956146 R11: ffff88881c2d1e47 R12: 0000000000000008
[ 106.423691] R13: 0000000000000008 R14: ffffffff825b4f80 R15: ffffffff825b4fc0
[ 106.423772] ? sched_idle_set_state+0x20/0x30
[ 106.423851] ? cpuidle_enter_state+0xa6/0x620
[ 106.423874] ? tick_nohz_idle_stop_tick+0x1d1/0x3f0
[ 106.423896] cpuidle_enter+0x37/0x60
[ 106.423919] do_idle+0x246/0x280
[ 106.423941] ? arch_cpu_idle_exit+0x30/0x30
[ 106.423964] ? __wake_up_common+0x46/0x240
[ 106.423986] cpu_startup_entry+0x14/0x20
[ 106.424009] start_secondary+0x1b0/0x200
[ 106.424031] ? set_cpu_sibling_map+0x990/0x990
[ 106.424054] secondary_startup_64+0xa4/0xb0
[ 106.424075]
[ 106.424096] Allocated by task 626:
[ 106.424119] save_stack+0x19/0x80
[ 106.424143] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 106.424165] kmem_cache_alloc+0xb2/0x1d0
[ 106.424277] i915_sched_lookup_priolist+0x1ab/0x320 [i915]
[ 106.424385] execlists_submit_request+0x73/0x200 [i915]
[ 106.424498] submit_notify+0x59/0x60 [i915]
[ 106.424600] __i915_sw_fence_complete+0x9b/0x330 [i915]
[ 106.424713] __i915_request_commit+0x4bf/0x570 [i915]
[ 106.424818] intel_engine_pulse+0x213/0x310 [i915]
[ 106.424925] context_close+0x22f/0x470 [i915]
[ 106.425033] i915_gem_context_destroy_ioctl+0x7b/0xa0 [i915]
[ 106.425058] drm_ioctl_kernel+0x131/0x170
[ 106.425081] drm_ioctl+0x2d9/0x4f1
[ 106.425104] do_vfs_ioctl+0x115/0x890
[ 106.425126] ksys_ioctl+0x35/0x70
[ 106.425147] __x64_sys_ioctl+0x38/0x40
[ 106.425169] do_syscall_64+0x66/0x220
[ 106.425191] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 106.425213]
[ 106.425234] Freed by task 0:
[ 106.425255] (stack is not available)
[ 106.425276]
[ 106.425297] The buggy address belongs to the object at ffff888703506a40
[ 106.425297] which belongs to the cache i915_priolist of size 104
[ 106.425321] The buggy address is located 136 bytes to the right of
[ 106.425321] 104-byte region [ffff888703506a40, ffff888703506aa8)
[ 106.425345] The buggy address belongs to the page:
[ 106.425367] page:ffffea001c0d4180 refcount:1 mapcount:0 mapping:ffff88873e1cf740 index:0xffff888703506e40 compound_mapcount: 0
[ 106.425391] flags: 0x8000000000010200(slab|head)
[ 106.425415] raw: 8000000000010200 ffffea0020192b88 ffff8888174b5450 ffff88873e1cf740
[ 106.425439] raw: ffff888703506e40 000000000010000e 00000001ffffffff 0000000000000000
[ 106.425464] page dumped because: kasan: bad access detected
[ 106.425486]
[ 106.425506] Memory state around the buggy address:
[ 106.425528] ffff888703506a00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
[ 106.425551] ffff888703506a80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
[ 106.425573] >ffff888703506b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 106.425597] ^
[ 106.425619] ffff888703506b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 106.425642] ffff888703506c00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
[ 106.425664] ==================================================================

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809073723.6593-1-chris@chris-wilson.co.uk


# db94e9f1 08-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: extract i915_perf.h from i915_drv.h

It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.

Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.

No functional changes.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com


# c7302f20 08-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer final intel_wakeref_put to process context

As we need to acquire a mutex to serialise the final
intel_wakeref_put, we need to ensure that we are in process context at
that time. However, we want to allow operation on the intel_wakeref from
inside timer and other hardirq context, which means that need to defer
that final put to a workqueue.

Inside the final wakeref puts, we are safe to operate in any context, as
we are simply marking up the HW and state tracking for the potential
sleep. It's only the serialisation with the potential sleeping getting
that requires careful wait avoidance. This allows us to retain the
immediate processing as before (we only need to sleep over the same
races as the current mutex_lock).

v2: Add a selftest to ensure we exercise the code while lockdep watches.
v3: That test was extremely loud and complained about many things!
v4: Not a whale!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295
References: https://bugs.freedesktop.org/show_bug.cgi?id=111245
References: https://bugs.freedesktop.org/show_bug.cgi?id=111256
Fixes: 18398904ca9e ("drm/i915: Only recover active engines")
Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk


# a09d9a80 06-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h

Disentangle i915_drv.h from intel_drv.h, which gets included via
i915_trace.h. This necessitates including i915_trace.h wherever it's
needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com


# a1c9ca22 30-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove lrc default desc from GEM context

We only compute the lrc_descriptor() on pinning the context, i.e.
infrequently, so we do not benefit from storing the template as the
addressing mode is also fixed for the lifetime of the intel_context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-9-chris@chris-wilson.co.uk


# 10e36489 30-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Always clear pending&inflight requests on reset

If we skip the reset as we found the engine inactive at the time of the
reset, we still need to clear the residual inflight & pending request
bookkeeping to reflect the current state of HW.

Otherwise, we may end up stuck in a loop like:

<7> [416.490346] hangcheck rcs0
<7> [416.490371] hangcheck Awake? 1
<7> [416.490376] hangcheck Hangcheck: 8003 ms ago
<7> [416.490380] hangcheck Reset count: 0 (global 0)
<7> [416.490383] hangcheck Requests:
<7> [416.491210] hangcheck RING_START: 0x0017b000
<7> [416.491983] hangcheck RING_HEAD: 0x00000048
<7> [416.491992] hangcheck RING_TAIL: 0x00000048
<7> [416.492006] hangcheck RING_CTL: 0x00000000
<7> [416.492037] hangcheck RING_MODE: 0x00000200 [idle]
<7> [416.492044] hangcheck RING_IMR: 00000000
<7> [416.492809] hangcheck ACTHD: 0x00000000_9ca00048
<7> [416.492824] hangcheck BBADDR: 0x00000000_00001004
<7> [416.492838] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [416.492845] hangcheck IPEIR: 0x00000000
<7> [416.492852] hangcheck IPEHR: 0x00000000
<7> [416.492863] hangcheck Execlist status: 0x00018001 00000000, entries 12
<7> [416.492869] hangcheck Execlist CSB read 1, write 1, tasklet queued? no (enabled)
<7> [416.492938] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 8307ms: signaled
<7> [416.492972] hangcheck Queue priority hint: -4093
<7> [416.492979] hangcheck Q 20ffa:16fd8- prio=-4093 @ 8307ms: [i915]
<7> [416.492985] hangcheck Q 20ffa:16fda prio=-4094 @ 8307ms: [i915]
<7> [416.492990] hangcheck Q 20ffa:16fdc prio=-4094 @ 8307ms: [i915]
<7> [416.492996] hangcheck Q 20ffa:16fde prio=-4094 @ 8307ms: [i915]
<7> [416.493001] hangcheck Q 20ffa:16fe0 prio=-4094 @ 8307ms: [i915]
<7> [416.493007] hangcheck Q 20ffa:16fe2 prio=-4094 @ 8307ms: [i915]
<7> [416.493013] hangcheck Q 20ffa:16fe4 prio=-4094 @ 8307ms: [i915]
<7> [416.493021] hangcheck ...skipping 21 queued requests...
<7> [416.493027] hangcheck Q 20ffa:17010 prio=-4094 @ 8307ms: [i915]
<7> [416.493081] hangcheck HWSP:
<7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493094] hangcheck *
<7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [416.493111] hangcheck *
<7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
<7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493127] hangcheck *
<7> [416.493132] hangcheck Idle? no
<6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0
<6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
<7> [424.489258] hangcheck rcs0
<7> [424.489263] hangcheck Awake? 1
<7> [424.489267] hangcheck Hangcheck: 5954 ms ago
<7> [424.489271] hangcheck Reset count: 1 (global 0)
<7> [424.489274] hangcheck Requests:
<7> [424.490128] hangcheck RING_START: 0x00000000
<7> [424.490870] hangcheck RING_HEAD: 0x00000000
<7> [424.490877] hangcheck RING_TAIL: 0x00000000
<7> [424.490887] hangcheck RING_CTL: 0x00000000
<7> [424.490897] hangcheck RING_MODE: 0x00000200 [idle]
<7> [424.490904] hangcheck RING_IMR: 00000000
<7> [424.490917] hangcheck ACTHD: 0x00000000_00000000
<7> [424.490930] hangcheck BBADDR: 0x00000000_00000000
<7> [424.490943] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [424.490950] hangcheck IPEIR: 0x00000000
<7> [424.490956] hangcheck IPEHR: 0x00000000
<7> [424.490968] hangcheck Execlist status: 0x00000001 00000000, entries 12
<7> [424.490972] hangcheck Execlist CSB read 11, write 11, tasklet queued? no (enabled)
<7> [424.490983] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 16305ms: signaled
<7> [424.490989] hangcheck Queue priority hint: -4093
<7> [424.490996] hangcheck Q 20ffa:16fd8- prio=-4093 @ 16305ms: [i915]
<7> [424.491001] hangcheck Q 20ffa:16fda prio=-4094 @ 16305ms: [i915]
<7> [424.491006] hangcheck Q 20ffa:16fdc prio=-4094 @ 16305ms: [i915]
<7> [424.491011] hangcheck Q 20ffa:16fde prio=-4094 @ 16305ms: [i915]
<7> [424.491016] hangcheck Q 20ffa:16fe0 prio=-4094 @ 16305ms: [i915]
<7> [424.491022] hangcheck Q 20ffa:16fe2 prio=-4094 @ 16305ms: [i915]
<7> [424.491048] hangcheck Q 20ffa:16fe4 prio=-4094 @ 16305ms: [i915]
<7> [424.491057] hangcheck ...skipping 21 queued requests...
<7> [424.491063] hangcheck Q 20ffa:17010 prio=-4094 @ 16305ms: [i915]
<7> [424.491095] hangcheck HWSP:
<7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491106] hangcheck *
<7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [424.491122] hangcheck *
<7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b
<7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491136] hangcheck *
<7> [424.491141] hangcheck Idle? no
<5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Where not having cleared the pending array on reset, it persists
indefinitely.

Fixes: fff8102aaed5 ("drm/i915/execlists: Process interrupted context on reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-2-chris@chris-wilson.co.uk


# f5d974f9 30-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Provide a local intel_context.vm

Track the currently bound address space used by the HW context. Minor
conversions to use the local intel_context.vm are made, leaving behind
some more surgery required to make intel_context the primary through the
selftests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730143209.4549-2-chris@chris-wilson.co.uk


# a5627721 28-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Inline engine->init_context into its caller

We only use the init_context vfunc once while recording the default
context state, and we use the same sequence in each backend (eliding
steps that do not apply). Remove the vfunc for simplicity and
de-duplication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190729113720.24830-1-chris@chris-wilson.co.uk


# ac65bdfe 19-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep rings pinned while the context is active

Remember to keep the rings pinned as well as the context image until the
GPU is no longer active.

v2: Introduce a ring->pin_count primarily to hide the
mock_ring that doesn't fit into the normal GGTT vma picture.

v3: Order is important in teardown, ringbuffer submission needs to drop
the pin count on the engine->kernel_context before it can gleefully free
its ring.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946
Fixes: ce476c80b8bf ("drm/i915: Keep contexts pinned until after the next kernel context switch")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk
(cherry picked from commit 09c5ab384f6fb30f834a5777888b4486dd7f015d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# df8cf31e 18-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Hook up intel_context_fini()

Prior to freeing the struct, call the fini function to cleanup the
common members. Currently this only calls the debug functions to mark
the structs as destroyed, but may be extended to real work in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190718070024.21781-2-chris@chris-wilson.co.uk


# 7d6b60db 16-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Cancel breadcrumb on preempting the virtual engine

As we unwind the requests for a preemption event, we return a virtual
request back to its original virtual engine (so that it is available for
execution on any of its siblings). In the process, this means that its
breadcrumb should no longer be associated with the original physical
engine, and so we are forced to decouple it. Previously, as the request
could not complete without our awareness, we would move it to the next
real engine without any danger. However, preempt-to-busy allowed for
requests to continue on the HW and complete in the background as we
unwound, which meant that we could end up retiring the request before
fixing up the breadcrumb link.

[51679.517943] INFO: trying to register non-static key.
[51679.517956] the code is fine but needs lockdep annotation.
[51679.517960] turning off the locking correctness validator.
[51679.517966] CPU: 0 PID: 3270 Comm: kworker/u8:0 Tainted: G U 5.2.0+ #717
[51679.517971] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[51679.518012] Workqueue: i915 retire_work_handler [i915]
[51679.518017] Call Trace:
[51679.518026] dump_stack+0x67/0x90
[51679.518031] register_lock_class+0x52c/0x540
[51679.518038] ? find_held_lock+0x2d/0x90
[51679.518042] __lock_acquire+0x68/0x1800
[51679.518047] ? find_held_lock+0x2d/0x90
[51679.518073] ? __i915_sw_fence_complete+0xff/0x1c0 [i915]
[51679.518079] lock_acquire+0x90/0x170
[51679.518105] ? i915_request_cancel_breadcrumb+0x29/0x160 [i915]
[51679.518112] _raw_spin_lock+0x27/0x40
[51679.518138] ? i915_request_cancel_breadcrumb+0x29/0x160 [i915]
[51679.518165] i915_request_cancel_breadcrumb+0x29/0x160 [i915]
[51679.518199] i915_request_retire+0x43f/0x530 [i915]
[51679.518232] retire_requests+0x4d/0x60 [i915]
[51679.518263] i915_retire_requests+0xdf/0x1f0 [i915]
[51679.518294] retire_work_handler+0x4c/0x60 [i915]
[51679.518301] process_one_work+0x22c/0x5c0
[51679.518307] worker_thread+0x37/0x390
[51679.518311] ? process_one_work+0x5c0/0x5c0
[51679.518316] kthread+0x116/0x130
[51679.518320] ? kthread_create_on_node+0x40/0x40
[51679.518325] ret_from_fork+0x24/0x30
[51679.520177] ------------[ cut here ]------------
[51679.520189] list_del corruption, ffff88883675e2f0->next is LIST_POISON1 (dead000000000100)

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-4-chris@chris-wilson.co.uk


# c30d5dc6 16-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Push engine stopping into reset-prepare

Push the engine stop into the back reset_prepare (where it already was!)
This allows us to avoid dangerously setting the RING registers to 0 for
logical contexts. If we clear the register on a live context, those
invalid register values are recorded in the logical context state and
replayed (with hilarious results).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-2-chris@chris-wilson.co.uk


# fff8102a 16-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Process interrupted context on reset

By stopping the rings, we may trigger an arbitration point resulting in
a premature context-switch (i.e. a completion event before the request
is actually complete). This clears the active context before the reset,
but we must remember to rewind the incomplete context for replay upon
resume.

Fixes: 1863e3020ab5 ("drm/i915/execlists: Always reset the context's RING registers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716124931.5870-3-chris@chris-wilson.co.uk


# a9877da2 16-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/oa: Reconfigure contexts on the fly

Avoid a global idle barrier by reconfiguring each context by rewriting
them with MI_STORE_DWORD from the kernel context.

v2: We only need to determine the desired register values once, they are
the same for all contexts.
v3: Don't remove the kernel context from the list of known GEM contexts;
the world is not ready for that yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk


# 09975b86 09-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Disable preemption under GVT

Preempt-to-busy uses a GPU semaphore to enforce an idle-barrier across
preemption, but mediated gvt does not fully support semaphores.

v2: Fiddle around with the flags and settle on using has-semaphores for
the core bits so that we retain the ability to preempt our own
semaphores.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Xiaolin Zhang <xiaolin.zhang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709091233.8573-1-chris@chris-wilson.co.uk


# cb823ed9 12-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use intel_gt as the primary object for handling resets

Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk


# 58d1b427 10-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Record preemption for selftests

Put back the preemption counters lost in commit 22b7a426bbe1
("drm/i915/execlists: Preempt-to-busy") so that our selftests that
assert no preemption took place continue to function.

v2: But a timeslice is only a "soft" preemption!

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710064454.682-1-chris@chris-wilson.co.uk


# 2a98f4e6 09-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: add infrastructure to hold off preemption on a request

We want to set this flag in the next commit on requests containing
perf queries so that the result of the perf query can just be a delta
of global counters, rather than doing post processing of the OA
buffer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: add basic selftest for nopreempt]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk


# 46c5847e 09-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: enumerate scratch fields

We have a bunch of offsets in the scratch buffer. As we're about to
add some more, let's group all of the offsets in a common location.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-6-lionel.g.landwerlin@intel.com


# ab9e2f77 03-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull engine w/a initialisation into common

We need to setup the workarounds on all engines, with the knowledge
about which platforms each workaround applies to kept together in the
workaround list. As such, we can pull the w/a initialisation into the
common setup and try to avoid duplicating knowledge about when to setup
the workarounds.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703135805.7310-2-chris@chris-wilson.co.uk


# 313443b1 03-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Assume we hold forcewake for execlists resume

We can assume the caller is holding a blanket forcewake for the
register writes during resume, and so we can skip taking individual
locks around each write inside execlists resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703155225.9501-3-chris@chris-wilson.co.uk


# 2006058e 04-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the renderstate setup under gt/

The render state is used to initialise the default RCS context, and only
used during early setup from within the gt code. As such, it makes a
good candidate for placing within gt/, even if it is not yet entirely
clean of our GEM heritage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190704091925.7391-1-chris@chris-wilson.co.uk


# ad9e3792 03-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Hesitate before slicing

Be a little more hesitant before injecting a timeslice, and try to take
into account any change in priority that is due for the running task
before switching to another task. This will allow us to arbitrarily
prevent switching away from a request if we deem it necessarily to
disable preemption, for instance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-9-chris@chris-wilson.co.uk


# 8759aa4c 01-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Refactor CSB state machine

Daniele pointed out that the CSB status information will change with
Tigerlake and suggested that we could rearrange our state machine to
hide the differences in generation. gcc also prefers the explicit state
machine, so make it so:

process_csb 1980 1967 -13

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190701100502.15639-4-chris@chris-wilson.co.uk


# 5f22e5b3 25-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename intel_wakeref_[is]_active

Our general rule is to use is/has as the verb for boolean functions,
rename intel_wakeref_active to intel_wakeref_is_active so the question
being asked is clear.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-6-chris@chris-wilson.co.uk


# 07bfe6bf 25-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Convert recursive defer_request() into iterative

As this engine owns the lock around rq->sched.link (for those waiters
submitted to this engine), we can use that link as an element in a local
list. We can thus replace the recursive algorithm with an iterative walk
over the ordered list of waiters.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-1-chris@chris-wilson.co.uk


# 8db7933e 24-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Always clear ring_pause if we do not submit

In the unlikely case (thank you CI!), we may find ourselves wanting to
issue a preemption but having no runnable requests left. In this case,
we set the semaphore before computing the preemption and so must unset
it before forgetting (or else we leave the machine busywaiting until the
next request comes along and so likely hang).

v2: Replace readback with only a wmb after asserting the semaphore

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190624092009.30189-1-chris@chris-wilson.co.uk


# 12c255b5 21-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Provide an i915_active.acquire callback

If we introduce a callback for i915_active that is only called the first
time we use the i915_active and is symmetrically paired with the
i915_active.retire callback, we can replace the open-coded and
non-atomic implementations -- which will be very fragile (i.e. broken)
upon removing the struct_mutex serialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk


# a93615f9 21-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Throw away the active object retirement complexity

Remove the accumulated optimisations that we have for i915_vma_retire
and reduce it to the bare essential of tracking the active object
reference. This allows us to only use atomic operations, and so will be
able to avoid the struct_mutex requirement.

The principal loss here is the shrinker MRU bumping, so now if we have
to shrink, we will do so in much more random order and more likely to
try and shrink recently used objects. That is a nuisance, but shrinking
active objects is a second step we try to avoid and will always be a
system-wide performance issue.

The other loss is here is in the automatic pruning of the
reservation_object when idling. This is not as large an issue as upon
reservation_object introduction as now adding new fences into the object
replaces already signaled fences, keeping the array compact. But we do
lose the auto-expiration of stale fences and unused arrays. That may be
a noticeable problem for which we need to re-implement autopruning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-3-chris@chris-wilson.co.uk


# db56f974 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Eliminate dual personality of i915_scratch_offset

Scratch vma lives under gt but the API used to work on i915. Make this
consistent by renaming the function to intel_gt_scratch_offset and make
it take struct intel_gt.

v2:
* Move to intel_gt. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-33-tvrtko.ursulin@linux.intel.com


# f0c02c1b 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename i915_timeline to intel_timeline and move under gt

Move all timeline code under gt and rename to intel_gt prefix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com


# 4c6d51ea 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make timelines gt centric

Our timelines are stored inside intel_gt so we can convert the interface
to take exactly that and not i915.

At the same time re-order the params to our more typical layout and
replace the backpointer to the new containing structure.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-31-tvrtko.ursulin@linux.intel.com


# ba4134a4 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Save trip via top-level i915 in a few more places

For gt related operations it makes more logical sense to stay in the realm
of gt instead of dereferencing via driver i915.

This patch handles a few of the easy ones with work requiring more
refactoring still outstanding.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-30-tvrtko.ursulin@linux.intel.com


# f937f561 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Store backpointer to intel_gt in the engine

It will come useful in the next patch.

v2:
* Do mock_engine as well.

v3:
* And the virtual engine...

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-11-tvrtko.ursulin@linux.intel.com


# 12fdaf19 21-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Keep virtual context alive until after we kick

The call to kick_siblings() dereferences the rq->context, so we should
not drop our local reference until afterwards!

v2: Stick to setting ce.inflight=NULL before kicking as this is what the
other threads will check to see if the context is ready for takeover.

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621080729.2652-1-chris@chris-wilson.co.uk


# 8ee36e04 20-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Minimalistic timeslicing

If we have multiple contexts of equal priority pending execution,
activate a timer to demote the currently executing context in favour of
the next in the queue when that timeslice expires. This enforces
fairness between contexts (so long as they allow preemption -- forced
preemption, in the future, will kick those who do not obey) and allows
us to avoid userspace blocking forward progress with e.g. unbounded
MI_SEMAPHORE_WAIT.

For the starting point here, we use the jiffie as our timeslice so that
we should be reasonably efficient wrt frequent CPU wakeups.

Testcase: igt/gem_exec_scheduler/semaphore-resolve
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk


# 22b7a426 20-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Preempt-to-busy

When using a global seqno, we required a precise stop-the-workd event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.

v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick in a timely fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk


# 09c5ab38 19-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep rings pinned while the context is active

Remember to keep the rings pinned as well as the context image until the
GPU is no longer active.

v2: Introduce a ring->pin_count primarily to hide the
mock_ring that doesn't fit into the normal GGTT vma picture.

v3: Order is important in teardown, ringbuffer submission needs to drop
the pin count on the engine->kernel_context before it can gleefully free
its ring.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946
Fixes: ce476c80b8bf ("drm/i915: Keep contexts pinned until after the next kernel context switch")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk


# 73591341 17-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Detect cross-contamination with GuC

The process_csb routine from execlists_submission is incompatible with
the GuC backend. Add a warning to detect if we accidentally end up in
the wrong spot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: MichaƂ Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618110736.31155-1-chris@chris-wilson.co.uk


# 44d89409 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make the semaphore saturation mask global

The idea behind keeping the saturation mask local to a context backfired
spectacularly. The premise with the local mask was that we would be more
proactive in attempting to use semaphores after each time the context
idled, and that all new contexts would attempt to use semaphores
ignoring the current state of the system. This turns out to be horribly
optimistic. If the system state is still oversaturated and the existing
workloads have all stopped using semaphores, the new workloads would
attempt to use semaphores and be deprioritised behind real work. The
new contexts would not switch off using semaphores until their initial
batch of low priority work had completed. Given sufficient backload load
of equal user priority, this would completely starve the new work of any
GPU time.

To compensate, remove the local tracking in favour of keeping it as
global state on the engine -- once the system is saturated and
semaphores are disabled, everyone stops attempting to use semaphores
until the system is idle again. One of the reason for preferring local
context tracking was that it worked with virtual engines, so for
switching to global state we could either do a complete check of all the
virtual siblings or simply disable semaphores for those requests. This
takes the simpler approach of disabling semaphores on virtual engines.

The downside is that the decision that the engine is saturated is a
local measure -- we are only checking whether or not this context was
scheduled in a timely fashion, it may be legitimately delayed due to user
priorities. We still have the same dilemma though, that we do not want
to employ the semaphore poll unless it will be used.

v2: Explain why we need to assume the worst wrt virtual engines.

Fixes: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk


# 422d7df4 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace engine->timeline with a plain list

To continue the onslaught of removing the assumption of a global
execution ordering, another casualty is the engine->timeline. Without an
actual timeline to track, it is overkill and we can replace it with a
much less grand plain list. We still need a list of requests inflight,
for the simple purpose of finding inflight requests (for retiring,
resetting, preemption etc).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk


# ce476c80 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep contexts pinned until after the next kernel context switch

We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

- On context activation, we preallocate a node for each physical engine
the context may operate on. This is to avoid allocations during
unpinning, which may be from inside FS_RECLAIM context (aka the
shrinker)

- On context deactivation on retirement of the last active request (which
is before we know the context has been saved), we add the
preallocated node onto a barrier list on each engine

- On engine idling, we emit a switch to kernel context. When this
switch completes, we know that all previous contexts must have been
saved, and so on retiring this request we can finally unpin all the
contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk


# ab53497b 11-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename i915_hw_ppgtt to i915_ppgtt

Keeping the _hw_ in there does not help to distinguish it from its
only brethren i915_ggtt, so drop it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190611091238.15808-2-chris@chris-wilson.co.uk


# e568ac38 11-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull kref into i915_address_space

Make the kref common to both derived structs (i915_ggtt and i915_ppgtt)
so that we can safely reference count an abstract ctx->vm address space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190611091238.15808-1-chris@chris-wilson.co.uk


# f6e903db 07-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Tidy intel_execlists_submission_init

Get to uncore from the engine for better logic organization and use
already available i915 everywhere.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190607084521.16845-2-tvrtko.ursulin@linux.intel.com


# dbc65183 07-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert some more bits to use engine mmio accessors

Remove a couple dev_priv locals as a consequence.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190607084521.16845-1-tvrtko.ursulin@linux.intel.com


# 754f7a0b 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename intel_context.active to .inflight

Rename the engine this HW context is currently active upon (that we are
flying upon) to disambiguate between the mixture of different active
terms (and prevent conflict in future patches).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-14-chris@chris-wilson.co.uk


# 10be98a7 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move more GEM objects under gem/

Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk


# 8475355f 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move shmem object setup to its own file

Split the plain old shmem object into its own file to start decluttering
i915_gem.c

v2: Lose the confusing, hysterical raisins, suffix of _gtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk


# ee113690 21-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Virtual engine bonding

Some users require that when a master batch is executed on one particular
engine, a companion batch is run simultaneously on a specific slave
engine. For this purpose, we introduce virtual engine bonding, allowing
maps of master:slaves to be constructed to constrain which physical
engines a virtual engine may select given a fence on a master engine.

For the moment, we continue to ignore the issue of preemption deferring
the master request for later. Ideally, we would like to then also remove
the slave and run something else rather than have it stall the pipeline.
With load balancing, we should be able to move workload around it, but
there is a similar stall on the master pipeline while it may wait for
the slave to be executed. At the cost of more latency for the bonded
request, it may be interesting to launch both on their engines in
lockstep. (Bubbles abound.)

Opens: Also what about bonding an engine as its own master? It doesn't
break anything internally, so allow the silliness.

v2: Emancipate the bonds
v3: Couple in delayed scheduling for the selftests
v4: Handle invalid mutually exclusive bonding
v5: Mention what the uapi does
v6: s/nbond/num_bonds/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk


# 78e41ddd 21-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply an execution_mask to the virtual_engine

Allow the user to direct which physical engines of the virtual engine
they wish to execute one, as sometimes it is necessary to override the
load balancing algorithm.

v2: Only kick the virtual engines on context-out if required

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk


# 6d06779e 21-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Load balancing across a virtual engine

Having allowed the user to define a set of engines that they will want
to only use, we go one step further and allow them to bind those engines
into a single virtual instance. Submitting a batch to the virtual engine
will then forward it to any one of the set in a manner as best to
distribute load. The virtual engine has a single timeline across all
engines (it operates as a single queue), so it is not able to concurrently
run batches across multiple engines by itself; that is left up to the user
to submit multiple concurrent batches to multiple queues. Multiple users
will be load balanced across the system.

The mechanism used for load balancing in this patch is a late greedy
balancer. When a request is ready for execution, it is added to each
engine's queue, and when an engine is ready for its next request it
claims it from the virtual engine. The first engine to do so, wins, i.e.
the request is executed at the earliest opportunity (idle moment) in the
system.

As not all HW is created equal, the user is still able to skip the
virtual engine and execute the batch on a specific engine, all within the
same queue. It will then be executed in order on the correct engine,
with execution on other virtual engines being moved away due to the load
detection.

A couple of areas for potential improvement left!

- The virtual engine always take priority over equal-priority tasks.
Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
and hopefully the virtual and real engines are not then congested (i.e.
all work is via virtual engines, or all work is to the real engine).

- We require the breadcrumb irq around every virtual engine request. For
normal engines, we eliminate the need for the slow round trip via
interrupt by using the submit fence and queueing in order. For virtual
engines, we have to allow any job to transfer to a new ring, and cannot
coalesce the submissions, so require the completion fence instead,
forcing the persistent use of interrupts.

- We only drip feed single requests through each virtual engine and onto
the physical engines, even if there was enough work to fill all ELSP,
leaving small stalls with an idle CS event at the end of every request.
Could we be greedy and fill both slots? Being lazy is virtuous for load
distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock
contention, reducing dispatch overhead, looking at direct submission
rather than bouncing around tasklets etc.

sseu: Lift the restriction to allow sseu to be reconfigured on virtual
engines composed of RENDER_CLASS (rcs).

v2: macroize check_user_mbz()
v3: Cancel virtual engines on wedging
v4: Commence commenting
v5: Replace 64b sibling_mask with a list of class:instance
v6: Drop the one-element array in the uabi
v7: Assert it is an virtual engine in to_virtual_engine()
v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)

Link: https://github.com/intel/media-driver/pull/283
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk


# 4cc79cbb 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Drop promotion on unsubmit

With the disappearance of NEWCLIENT, we no longer need to provide the
priority boost on preemption in order to prevent repeated gazumping,
and we can remove the dead code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk


# 68fc728b 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Downgrade NEWCLIENT to non-preemptive

Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

rcsA x 30, (rcsB, vcs) x 30

where as before it would have been

(rcsA, rcsB, vcs) x 30

That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
acruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk


# 519a0194 08-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/hangcheck: Replace hangcheck.seqno with RING_HEAD

After realising we need to sample RING_START to detect context switches
from preemption events that do not allow for the seqno to advance, we
can also realise that the seqno itself is just a distance along the ring
and so can be replaced by sampling RING_HEAD.

v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk


# 5a6ac10b 07-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Don't apply priority boost for resets

Do not treat reset as a normal preemption event and avoid giving the
guilty request a priority boost for simply being active at the time of
reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507122954.6299-1-chris@chris-wilson.co.uk


# 25d851ad 07-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only reschedule the submission tasklet if preemption is possible

If we couple the scheduler more tightly with the execlists policy, we
can apply the preemption policy to the question of whether we need to
kick the tasklet at all for this priority bump.

v2: Rephrase it as a core i915 policy and not an execlists foible.
v3: Pull the kick together.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk


# dc58958d 02-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert the local engine->wakeref is active

Due to the asynchronous tasklet and recursive GT wakeref, it may happen
that we submit to the engine (underneath it's own wakeref) prior to the
central wakeref being marked as taken. Switch to checking the local wakeref
for greater consistency.

Fixes: 79ffac8599c4 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-3-chris@chris-wilson.co.uk


# c34c5bca 03-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Flush the tasklet on parking

Tidy up the cleanup sequence by always ensure that the tasklet is
flushed on parking (before we cleanup). The parking provides a
convenient point to ensure that the backend is truly idle.

v2: Do the full check for idleness before parking, to be sure we flush
any residual interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503080942.30151-1-chris@chris-wilson.co.uk


# 45b9c968 01-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the engine->destroy() vfunc onto the engine

Make the engine responsible for cleaning itself up!

This removes the i915->gt.cleanup vfunc that has been annoying the
casual reader and myself for the last several years, and helps keep a
future patch to add more cleanup tidy.

v2: Assert that engine->destroy is set after the backend starts
allocating its own state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501103204.18632-1-chris@chris-wilson.co.uk


# 11334c6a 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split engine setup/init into two phases

In the next patch, we require the engine vfuncs setup prior to
initialising the pinned kernel contexts, so split the vfunc setup from
the engine initialisation and call it earlier.

v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-6-chris@chris-wilson.co.uk


# 79ffac85 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Invert the GEM wakeref hierarchy

In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)

Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.

Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk


# 6eee33e8 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce context->enter() and context->exit()

We wish to start segregating the power management into different control
domains, both with respect to the hardware and the user interface. The
first step is that at the lowest level flow of requests, we want to
process a context event (and not a global GEM operation). In this patch,
we introduce the context callbacks that in future patches will be
redirected to per-engine interfaces leading to global operations as
required.

The intent is that this will be guarded by the timeline->mutex, except
that retiring has not quite finished transitioning over from being
guarded by struct_mutex. So at the moment it is protected by
struct_mutex with a reminded to switch.

v2: Rename default handlers to intel_context_enter_engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk


# 112ed2d3 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GraphicsTechnology files under gt/

Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk