History log of /linux-master/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
Revision Date Author Comments
# d459c86f 24-Jul-2023 Jonathan Cavitt <jonathan.cavitt@intel.com>

drm/i915/gt: Poll aux invalidation register bit on invalidation

For platforms that use Aux CCS, wait for aux invalidation to
complete by checking the aux invalidation register bit is
cleared.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-7-andi.shyti@linux.intel.com


# b70df82b 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the CS

Enable the CCS_FLUSH bit 13 in the control pipe for render and
compute engines in platforms starting from Meteor Lake (BSPEC
43904 and 47112).

For the copy engine add MI_FLUSH_DW_CCS (bit 16) in the command
streamer.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Requires: 8da173db894a ("drm/i915/gt: Rename flags with bit_group_X according to the datasheet")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-6-andi.shyti@linux.intel.com


# 0fde2f23 24-Jul-2023 Jonathan Cavitt <jonathan.cavitt@intel.com>

drm/i915/gt: Poll aux invalidation register bit on invalidation

For platforms that use Aux CCS, wait for aux invalidation to
complete by checking the aux invalidation register bit is
cleared.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-7-andi.shyti@linux.intel.com
(cherry picked from commit d459c86f00aa98028d155a012c65dc42f7c37e76)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 824df77a 24-Jul-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the CS

Enable the CCS_FLUSH bit 13 in the control pipe for render and
compute engines in platforms starting from Meteor Lake (BSPEC
43904 and 47112).

For the copy engine add MI_FLUSH_DW_CCS (bit 16) in the command
streamer.

Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all engines")
Requires: 8da173db894a ("drm/i915/gt: Rename flags with bit_group_X according to the datasheet")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230725001950.1014671-6-andi.shyti@linux.intel.com
(cherry picked from commit b70df82b428774875c7c56d3808102165891547c)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 459b2606 16-Mar-2023 Suraj Kandpal <suraj.kandpal@intel.com>

drm/i915/gsc: Create GSC request submission mechanism

HDCP and PXP will require a common function to allow it to
submit commands to the gsc cs. Also adding the gsc mtl header
that needs to be added on to the existing payloads of HDCP
and PXP.

--v4
-Seprate gsc load and heci cmd submission into different
functions in different files for better scalability [Alan]
-Rename gsc address field [Alan]

--v5
-remove extra line is intel_gsc_fw.h [Uma]

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Suraj Kandpal<suraj.kandpal@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316092927.668980-2-suraj.kandpal@intel.com


# 73a6c676 30-Jan-2023 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Add selftests for TLB invalidation

Check that we invalidate the TLB cache, the updated physical addresses
are immediately visible to the HW, and there is no retention of the old
physical address for concurrent HW access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[ahajda: adjust to upstream driver, v2+]
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[tursulin: Small indentation fix.]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230130165058.1647414-1-andrzej.hajda@intel.com


# 15bd4a67 08-Dec-2022 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/gsc: GSC firmware loading

GSC FW is loaded by submitting a dedicated command via the GSC engine.
The memory area used for loading the FW is then re-purposed as local
memory for the GSC itself, so we use a separate allocation instead of
using the one where we keep the firmware stored for reload.

The GSC is not reset as part of GT reset, so we only need to load it on
first boot and S3/S4 exit.

v2: use REG_* for register fields definitions (Rodrigo), move to WQ
immediately

v3: mark worker function as static

Bspec: 63347, 65346
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221208200521.2928378-4-daniele.ceraolospurio@intel.com


# a5c3a3cb 26-Oct-2022 Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

drm/i915/perf: Determine gen12 oa ctx offset at runtime

Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.

v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does read past context image size

v3: (Ashutosh)
- Drop unnecessary use of double underscores
- fix find_reg_in_lri
- Return error if oa context offset is U32_MAX
- Error out if oa_ctx_ctrl_offset does not find offset

v4: (Ashutosh)
- Warn on odd MI LRI_LEN
- Remove unnecessary check for valid_oactxctrl_offset
- Drop valid_oactxctrl_offset macro

v5: Drop unrelated comment

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-5-umesh.nerlige.ramappa@intel.com


# 1eb31338 11-May-2022 Stuart Summers <stuart.summers@intel.com>

drm/i915/pvc: Remove additional 3D flags from PIPE_CONTROL

Although we already strip 3D-specific flags from PIPE_CONTROL
instructions when submitting to a compute engine, there are some
additional flags that need to be removed when the platform as a whole
lacks a 3D pipeline. Add those restrictions here.

v2:
- Replace LACKS_3D_PIPELINE checks with !HAS_3D_PIPELINE and add
has_3d_pipeline to all platforms except PVC. (Lucas)

Bspec: 47112
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220511060228.1179450-4-matthew.d.roper@intel.com


# d63ddca7 16-May-2022 Bommu Krishnaiah <krishnaiah.bommu@intel.com>

drm/i915: Update tiled blits selftest

Update the selftest to include Tile 4 mode and switch to Tile 4 on
platforms that supports Tile 4 but no Tile Y and vice versa.
Also switch to XY_FAST_COPY_BLT on platforms that supports it.

v4: update commit message to reflect the code changes properly.
v3: add a function to find X-tile availability for a platform.
v2: disable Tile X for iGPU in fastblit and
fix checkpath --strict warnings.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5879
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220516082015.32020-1-nirmoy.das@intel.com


# 166c44e6 25-Apr-2022 Chris Wilson <chris.p.wilson@intel.com>

drm/i915/gt: Clear SET_PREDICATE_RESULT prior to executing the ring

Userspace may leave predication enabled upon return from the batch
buffer, which has the consequent of preventing all operation from the
ring from being executed, including all the synchronisation, coherency
control, arbitration and user signaling. This is more than just a local
gpu hang in one client, as the user has the ability to prevent the
kernel from applying critical workarounds and can cause a full GT reset.

We could simply execute MI_SET_PREDICATE upon return from the user
batch, but this has the repercussion of modifying the user's context
state. Instead, we opt to execute a fixup batch which by mixing
predicated operations can determine the state of the
SET_PREDICATE_RESULT register and restore it prior to the next userspace
batch. This allows us to protect the kernel's ring without changing the
uABI.

Suggested-by: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220425152317.4275-4-ramalingam.c@intel.com


# 717f9bad 15-Apr-2022 Matthew Brost <matthew.brost@intel.com>

drm/i915/dg2: Enable Wa_14014475959 - RCS / CCS context exit

There is bug in DG2 where if the CCS contexts switches out while the RCS
is running it can cause memory corruption. To workaround this add an
atomic to a memory address with a value 1 and semaphore wait to the same
address for a value of 0. The GuC firmware is responsible for writing 0
to the memory address when it is safe for the context to switch out.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-6-umesh.nerlige.ramappa@intel.com


# 48760ffe 05-Apr-2022 Ramalingam C <ramalingam.c@intel.com>

drm/i915/gt: Clear compress metadata for Flat-ccs objects

Xe-HP and latest devices support Flat CCS which reserved a portion of
the device memory to store compression metadata, during the clearing of
device memory buffer object we also need to clear the associated
CCS buffer.

XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the
ccs surface of a lmem memory. So on Flat-CCS capable platform we use
XY_CTRL_SURF_COPY_BLT to clear the CCS meta data.

v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
Used the round_up funcs [Bob]
v4: Fixed ccs blk calculation [Ram]
Added Kdoc on flat-ccs.
v5: GENMASK is used [Matt]
mocs fix [Matt]
Comments Fix [Matt]
Flush address programming [Ram]
v6: FLUSH_DW is fixed
Few coding style fix
v7: Adopting the XY_FAST_COLOR_BLT (Thomas]
v8: XY_CTRL_SURF_COPY_BLT for ccs clearing.
v9: emit_copy_ccs is used.
v10: ctrl_surf cmds are filled in caller itself. [Thomas]
only one ctrl surf cmd is used as size of lmem is <=8M [Thomas]

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-6-ramalingam.c@intel.com


# a0ed9c95 05-Apr-2022 Ramalingam C <ramalingam.c@intel.com>

drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+

Use faster XY_FAST_COLOR_BLT cmd on graphics version of 12 and more,
for clearing (Zero out) the pages of the newly allocated object.

XY_FAST_COLOR_BLT is faster than the older XY_COLOR_BLT.

v2:
Typo fix at title [Thomas]
v3:
XY_FAST_COLOR_BLT is used only for FLAT_CCS capable gen12+

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-3-ramalingam.c@intel.com


# d8b93201 28-Mar-2022 Fei Yang <fei.yang@intel.com>

drm/i915: avoid concurrent writes to aux_inv

GPU hangs have been observed when multiple engines write to the
same aux_inv register at the same time. To avoid this each engine
should only invalidate its own auxiliary table. The function
gen12_emit_flush_xcs() currently invalidate the auxiliary table for
all engines because the rq->engine is not necessarily the engine
eventually carrying out the request, and potentially the engine
could even be a virtual one (with engine->instance being -1).
With the MMIO remap feature, we can actually set bit 17 of MI_LRI
instruction and let the hardware to figure out the local aux_inv
register at runtime to avoid invalidating auxiliary table for all
engines.

Bspec: 45728

v2: Invalidate AUX table for indirect context as well.

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220328171650.1900674-1-fei.yang@intel.com


# 803efd29 01-Mar-2022 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/xehp: compute engine pipe_control

CCS will reuse the RCS functions for breadcrumb and flush emission.
However, CCS pipe_control has additional programming restrictions:
- Command Streamer Stall Enable must be always set
- Post Sync Operations must not be set to Write PS Depth Count
- 3D-related bits must not be set

v2:
- Drop unwanted blank line. (Lucas)

Bspec: 47112
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-5-matthew.d.roper@intel.com


# 95c9e122 24-Sep-2021 Huang, Sean Z <sean.z.huang@intel.com>

drm/i915/pxp: Implement arb session teardown

Teardown is triggered when the display topology changes and no
long meets the secure playback requirement, and hardware trashes
all the encryption keys for display. Additionally, we want to emit a
teardown operation to make sure we're clean on boot and resume

v2: emit in the ring, use high prio request (Chris)
v3: better defines, stalling flush, cleaned up and renamed submission
funcs (Chris)
v12: fix uninitialized variable bug

Signed-off-by: Huang, Sean Z <sean.z.huang@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-9-alan.previn.teres.alexis@intel.com


# cf586021 17-Jun-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pipelined page migration

If we pipeline the PTE updates and then do the copy of those pages
within a single unpreemptible command packet, we can submit the copies
and leave them to be scheduled without having to synchronously wait
under a global lock. In order to manage migration, we need to
preallocate the page tables (and keep them pinned and available for use
at any time), causing a bottleneck for migrations as all clients must
contend on the limited resources. By inlining the ppGTT updates and
performing the blit atomically, each client only owns the PTE while in
use, and so we can reschedule individual operations however we see fit.
And most importantly, we do not need to take a global lock on the shared
vm, and wait until the operation is complete before releasing the lock
for others to claim the PTE for themselves.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-8-thomas.hellstrom@linux.intel.com


# e52e4a31 19-May-2021 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

gpu: drm: replace occurrences of invalid character

There are some places at drm that ended receiving a
REPLACEMENT CHARACTER U+fffd ('�'), probably because of
some bad charset conversion.

Fix them by using what it seems to be the proper
character.

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e606930c73029f16673849c57acac061dd923866.1621412009.git.mchehab+huawei@kernel.org


# 24f90d66 22-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: SPDX cleanup

Clean up the SPDX licence declarations to comply with checkpatch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122192913.4518-1-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 32d7171e 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gen12: Fix HDC pipeline flush

HDC pipeline flush is bit on the first dword of
the PIPE_CONTROL, not the second. Make it so.

v2: function naming (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-2-mika.kuoppala@linux.intel.com


# f02ac414 06-May-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

Revert "drm/i915/tgl: Include ro parts of l3 to invalidate"

This reverts commit 62037ffff229b7d94f1db5ef8d2e2ec819832ef3.

L3 ro cache invalidation is part of the dword0 of pipe
control. Also it is not relevant to this gen.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-1-mika.kuoppala@linux.intel.com


# 685d2109 24-Apr-2020 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add per ctx batchbuffer wa for timestamp

Restoration of a previous timestamp can collide
with updating the timestamp, causing a value corruption.

Combat this issue by using indirect ctx bb to
modify the context image during restoring process.

We can preload value into scratch register. From which
we then do the actual write with LRR. LRR is faster and
thus less error prone as probability of race drops.

v2: tidying (Chris)
v3: lrr for all engines
v4: grp
v5: reg bit
v6: wa_bb_offset, virtual engines (Chris)

References: HSDES#16010904313
Testcase: igt/i915_selftest/gt_lrc
Suggested-by: Joseph Koston <joseph.koston@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230546.30271-1-mika.kuoppala@linux.intel.com


# 47f8253d 05-Mar-2020 Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

drm/i915/gen7: Clear all EU/L3 residual contexts

On gen7 and gen7.5 devices, there could be leftover data residuals in
EU/L3 from the retiring context. This patch introduces workaround to clear
that residual contexts, by submitting a batch buffer with dedicated HW
context to the GPU with ring allocation for each context switching.

This security mitigation changes does not triggers any performance
regression. Performance is on par with current drm-tips.

v2: Add igt generated header file for CB kernel assembled with Mesa tool
and addressed use of Kernel macro for ptr_align comment.

v3: Resolve Sparse warnings with newly generated, and imported CB
kernel.

v4: Include new igt generated CB kernel for gen7 and gen7.5. Also
add code formatting and compiler warnings changes (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>
Cc: Bloomfield Jon <jon.bloomfield@intel.com>
Cc: Dutt Sudeep <sudeep.dutt@intel.com>
Acked-by: Chris Wilson <chris@chris-wilso.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306000957.2836150-2-chris@chris-wilson.co.uk


# 32d94048 11-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Prepare gen7 cmdparser for async execution

The gen7 cmdparser is primarily a promotion-based system to allow access
to additional registers beyond the HW validation, and allows fallback to
normal execution of the user batch buffer if valid and requires
chaining. In the next patch, we will do the cmdparser validation in the
pipeline asynchronously and so at the point of request construction we
will not know if we want to execute the privileged and validated batch,
or the original user batch. The solution employed here is to execute
both batches, one with raised privileges and one as normal. This is
because the gen7 MI_BATCH_BUFFER_START command cannot change privilege
level within a batch and must strictly use the current privilege level
(or undefined behaviour kills the GPU). So in order to execute the
original batch, we need a second non-priviledged batch buffer chain from
the ring, i.e. we need to emit two batches for each user batch. Inside
the two batches we determine which one should actually execute, we
provide a conditional trampoline to call the original batch.

Implementation-wise, we create a single buffer and write the shadow and
the trampoline inside it at different offsets; and bind the buffer into
both the kernel GGTT for the privileged execution of the shadow and into
the user ppGTT for the non-privileged execution of the trampoline and
original batch. One buffer, two batches and two vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191211230858.599030-1-chris@chris-wilson.co.uk


# 755bf8a8 11-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove redundant parameters from intel_engine_cmd_parser

Declutter the calling interface by reducing the parameters to the
i915_vma and associated offsets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191211110437.4082687-2-chris@chris-wilson.co.uk


# 4aa0b5d4 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Add HDC Pipeline Flush

Add hdc pipeline flush to ensure memory state is coherent
in L3 when we are done.

v2: Flush also in breadcrumbs (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com


# 62037fff 15-Oct-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/tgl: Include ro parts of l3 to invalidate

Aim for completeness and invalidate also the ro parts
in l3 cache. This might allow to get rid of the preparser
disable/enable workaround on invalidation path.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com


# daed3e44 12-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: implement active wait for noa configurations

NOA configuration take some amount of time to apply. That amount of
time depends on the size of the GT. There is no documented time for
this. For example, past experimentations with powergating
configuration changes seem to indicate a 60~70us delay. We go with
500us as default for now which should be over the required amount of
time (according to HW architects).

v2: Don't forget to save/restore registers used for the wait (Chris)

v3: Name used CS_GPR registers (Chris)
Fix compile issue due to rebase (Lionel)

v4: Fix save/restore helpers (Umesh)

v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel)

v6: Add missing struct declarations in i915_perf.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk


# 6a45008a 12-Oct-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915/perf: allow for CS OA configs to be created lazily

Here we introduce a mechanism by which the execbuf part of the i915
driver will be able to request that a batch buffer containing the
programming for a particular OA config be created.

We'll execute these OA configuration buffers right before executing a
set of userspace commands so that a particular user batchbuffer be
executed with a given OA configuration.

This mechanism essentially allows the userspace driver to go through
several OA configuration without having to open/close the i915/perf
stream.

v2: No need for locking on object OA config object creation (Chris)
Flush cpu mapping of OA config (Chris)

v3: Properly deal with the perf_metric lock (Chris/Lionel)

v4: Fix oa config unref/put when not found (Lionel)

v5: Allocate BOs for configurations on the stream instead of globally
(Lionel)

v6: Fix 64bit division (Chris)

v7: Store allocated config BOs into the stream (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk


# 74b2089a 25-Sep-2019 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Add definitions for MI_MATH command

We can use it in i915 for updating parts of unmasked registers from
within a batch. We're also adding Gen8+ versions of CS_GPR registers
(aka MI_MATH_REG in the coprocessor).

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926100635.9416-4-michal.winiarski@intel.com


# 7d5255e0 26-Sep-2019 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Adjust length of MI_LOAD_REGISTER_REG

Default length value of MI_LOAD_REGISTER_REG is 1.
Also move it out of cmd-parser-only registers since we're going to use
it in i915.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-3-chris@chris-wilson.co.uk


# c210e85b 17-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Extend MI_SEMAPHORE_WAIT

On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.

v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk


# cdb736fa 06-Sep-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use engine relative LRIs on context setup

Daniele pointed out that relative mmio works differently in
on context restore. Instead of adding the engine mmio base to offset,
it masks out the base and adds bits [12:2] to current engine base.

This should allow us to construct context register state to be
applicable to all instances, including virtual. And avoid the trouble
of updating the registers on virtual instances when submitting work.

v2: only enable for gen12 for now (Mika)
v3: make enabling readable (Chris)

Bspec: 20206
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906134957.25909-1-mika.kuoppala@linux.intel.com


# 8a8b540a 15-Aug-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/icl: Add command cache invalidate

On the set of invalidations, we need to add command
cache invalidate as a new domain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-2-mika.kuoppala@linux.intel.com


# cfba6bd8 15-Aug-2019 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/icl: Implement gen11 flush including tile cache

Add tile cache flushing for gen11. To relive us from the
burden of previous obsolete workarounds, make a dedicated
flush/invalidate callback for gen11.

To fortify an independent single flush, do post
sync op as there are indications that without it
we don't flush everything. This should also make this
callback more readily usable in tgl (see l3 fabric flush).

v2: whitespacing

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-1-mika.kuoppala@linux.intel.com


# 05f219d7 10-Aug-2019 Matthew Auld <matthew.auld@intel.com>

drm/i915/blt: support copying objects

We can already clear an object with the blt, so try to do the same to
support copying from one object backing store to another. Really this is
just object -> object, which is not that useful yet, what we really want
is two backing stores, but that will require some vma rework first,
otherwise we are stuck with "tmp" objects.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190810174338.19810-1-chris@chris-wilson.co.uk


# bf1315b8 10-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Ensure we don't clamp a random offset to 32b

Specify that we do want a 64b value for sizeof(u32) as we want to
compute the mask of the upper 62bits.

v2: Use round_down() for automatic type promotion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710161413.7115-1-chris@chris-wilson.co.uk


# 6501aa4e 29-May-2019 Matthew Auld <matthew.auld@intel.com>

drm/i915: add in-kernel blitter client

The plan is to use the blitter engine for async object clearing when
using local memory, but before we can move the worker to get_pages() we
have to first tame some more of our struct_mutex usage. With this in
mind we should be able to upstream the object clearing as some
selftests, which should serve as a guinea pig for the ongoing locking
rework and upcoming async get_pages() framework.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: CQ Tang <cq.tang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190529123108.24422-2-matthew.auld@intel.com


# 112ed2d3 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GraphicsTechnology files under gt/

Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk