History log of /linux-master/drivers/gpu/drm/i915/gt/intel_gtt.h
Revision Date Author Comments
# ca02a011 07-Nov-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Record which client owns a VM

To enable accounting of indirect client memory usage (such as page tables)
in the following patch, lets start recording the creator of each PPGTT.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-2-tvrtko.ursulin@linux.intel.com


# 9bb66c17 26-Oct-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: Reserve some kernel space per vm

Reserve one page in each vm for kernel space to use for things
such as workarounds.

v2: use real memory, do not decrease vm.total
v4: reserve only one page and explain flag
v5: remove allocated object on ppgtt cleanup
v6: decrease vm->total by reservation size

Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026-wabb-v6-1-4aa7d55d0a8a@intel.com


# 3f5f6288 26-Sep-2023 Nirmoy Das <nirmoy.das@intel.com>

drm/i915: Parameterize binder context creation

Add i915_ggtt_require_binder() to indicate that i915
needs to create binder context which will be used
by subsequent patch to enable i915_address_space vfuncs
that will use GPU commands to update GGTT.

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230926083742.14740-5-nirmoy.das@intel.com


# 0e514878 26-Sep-2023 Nirmoy Das <nirmoy.das@intel.com>

drm/i915: Implement for_each_sgt_daddr_next

Implement a way to iterate over sgt with pre-initialized
sgt_iter state.

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230926083742.14740-4-nirmoy.das@intel.com


# 6b711386 11-Sep-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915/gt: rename DBG() to GTT_TRACE()

intel_gtt.h is indirectly included absolutely everywhere in the
driver. DBG() is too short a name. Rename it GTT_TRACE() after
GEM_TRACE().

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230911123305.1682554-1-jani.nikula@intel.com


# 9275277d 09-May-2023 Fei Yang <fei.yang@intel.com>

drm/i915: use pat_index instead of cache_level

Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.

From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.

For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For Pre-gen12 platforms PAT can
have 1:1 mapping to i915_cache_level, so these two are interchangeable. see
the use of LEGACY_CACHELEVEL.

One consequence of this change is that gen8_pte_encode is no longer working
for gen12 platforms due to the fact that gen12 platforms has different PAT
definitions. In the meantime the mtl_pte_encode introduced specfically for
MTL becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function into gen12_pte_encode and apply it to all gen12. Even
though this change looks unrelated, but separating them would temporarily
break gen12 PTE encoding, thus squash them in one patch.

Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings.There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checkings by i915_gem_object_has_cache_level. For
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.

Bspec: 63019

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com


# e798a3b3 02-May-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915/gtt: fix i915_vm_resv_put() kernel-doc parameter name

drivers/gpu/drm/i915/gt/intel_gtt.h:515: warning: Function parameter or member 'vm' not described in 'i915_vm_resv_put'
drivers/gpu/drm/i915/gt/intel_gtt.h:515: warning: Excess function parameter 'resv' description in 'i915_vm_resv_put'

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c22e58e770019667980b3617f6e963b76d7e79a7.1683041799.git.jani.nikula@intel.com


# 341ad0e8 24-Apr-2023 Fei Yang <fei.yang@intel.com>

drm/i915/mtl: Add PTE encode function

PTE encode functions are platform dependent. This patch implements
PTE functions for MTL, and ensures the correct PTE encode function
is used by calling pte_encode function pointer instead of the
hardcoded gen8 version of PTE encode.

Fixes: b76c0deef627 ("drm/i915/mtl: Define MOCS and PAT tables for MTL")
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230424182902.3663500-2-fei.yang@intel.com


# b76c0dee 21-Apr-2023 Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>

drm/i915/mtl: Define MOCS and PAT tables for MTL

On MTL, GT can no longer allocate on LLC - only the CPU can.
This, along with programming new register bits that MTL
requires calls for a MOCS/PAT table update.
Also the PAT index registers are multicasted for primary GT,
and there is an address jump from index 7 to 8. This patch
makes sure that these registers are programmed in the proper
way.

BSpec: 44509, 45101, 44235

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230421182535.292670-2-andi.shyti@linux.intel.com


# b288d740 10-Mar-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915/gt: introduce vm->scratch_range callback

The callback will be responsible for setting scratch page PTEs for
specified range. In contrast to clear_range it cannot be optimized to nop.
It will be used by code adding guard pages.

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308-guard_error_capture-v6-1-1b5f31422563@intel.com


# de3a9ab9 30-Nov-2022 Andi Shyti <andi.shyti@linux.intel.com>

Revert "drm/i915: Improve on suspend / resume time with VT-d enabled"

This reverts commit 2ef6efa79fecd5e3457b324155d35524d95f2b6b.

Checking the presence if the IRST (Intel Rapid Start Technology)
through the ACPI to decide whether to rebuild or not the GGTT
puts us at the mercy of the boot firmware and we need to
unnecessarily rely on third parties.

Because now we avoid adding scratch pages to the entire GGTT we
don't need this hack anymore.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-6-andi.shyti@linux.intel.com


# f2053d34 10-Nov-2022 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/guc: enable GuC GGTT invalidation from the start

Invalidating the GuC TLBs while GuC is not loaded does not have negative
consequences, so if we're starting the driver with GuC enabled we can
use the GGTT invalidation function from the get-go, instead of switching
to it when we initialize the GuC objects.

In MTL, this fixes and issue where we try to overwrite the invalidation
function twice (once for each GuC), due to the GGTT being shared between
the primary and media GTs

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110175823.3867135-1-daniele.ceraolospurio@intel.com


# 0f857158 21-Nov-2022 Aravind Iddamsetty <aravind.iddamsetty@intel.com>

drm/i915/mtl: Media GT and Render GT share common GGTT

On XE_LPM+ platforms the media engines are carved out into a separate
GT but have a common GGTMMADR address range which essentially makes
the GGTT address space to be shared between media and render GT. As a
result any updates in GGTT shall invalidate TLB of GTs sharing it and
similarly any operation on GGTT requiring an action on a GT will have to
involve all GTs sharing it. setup_private_pat was being done on a per
GGTT based as that doesn't touch any GGTT structures moved it to per GT
based.

BSPEC: 63834

v2:
1. Add details to commit msg
2. includes fix for failure to add item to ggtt->gt_list, as suggested
by Lucas
3. as ggtt_flush() is used only for ggtt drop i915_is_ggtt check within
it.
4. setup_private_pat moved out of intel_gt_tiles_init

v3:
1. Move out for_each_gt from i915_driver.c (Jani Nikula)

v4: drop using RCU primitives on ggtt->gt_list as it is not an RCU list
(Matt Roper)

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221122070126.4813-1-aravind.iddamsetty@intel.com


# 77fa9efc 14-Oct-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/xehp: Create separate reg definitions for new MCR registers

Starting in Xe_HP, several registers our driver works with have been
converted from singleton registers into replicated registers with
multicast behavior. Although the registers are still located at the
same MMIO offsets as on previous platforms, let's duplicate the register
definitions in preparation for upcoming patches that will handle
multicast registers in a special manner.

The registers that are now replicated on Xe_HP are:
* PAT_INDEX (mslice replication)
* FF_MODE2 (gslice replication)
* COMMON_SLICE_CHICKEN3 (gslice replication)
* SLICE_COMMON_ECO_CHICKEN1 (gslice replication)
* SLICE_UNIT_LEVEL_CLKGATE (gslice replication)
* LNCFCMOCS (lncf replication)

Note that there are a couple places in selftest_mocs.c where the
gen9 version of LNCFCMOCS is still used without regards for which
platform we're on. Those cases are just doing an offset lookup and not
issuing any CPU reads/writes of the register, so the potentially
multicast nature of the register doesn't come into play.

v2:
- Add commit message note about the unconditional GEN9_LNCFCMOCS usage
in selftest_mocs. (Bala)
- Include some additional TLB registers.

Bspec: 66534
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-3-matthew.d.roper@intel.com


# 8133a6da 03-Oct-2022 Matthew Auld <matthew.auld@intel.com>

drm/i915: enable PS64 support for DG2

It turns out that on production DG2/ATS HW we should have support for
PS64. This feature allows to provide a 64K TLB hint at the PTE level,
which is a lot more flexible than the current method of enabling 64K GTT
pages for the entire page-table, since that leads to all kinds of
annoying restrictions, as documented in:

commit caa574ffc4aaf4f29b890223878c63e2e7772f62
Author: Matthew Auld <matthew.auld@intel.com>
Date: Sat Feb 19 00:17:49 2022 +0530

drm/i915/uapi: document behaviour for DG2 64K support

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

With PS64, we can now drop the 2M GTT alignment restriction, and instead
only require 64K or larger when dealing with lmem. We still use the
compact-pt layout when possible, but only when we are certain that this
doesn't interfere with userspace.

Note that this is a change in uAPI behaviour, but hopefully shouldn't be
a concern (IGT is at least able to autodetect the alignment), since we
are only making the GTT alignment constraint less restrictive.

Based on a patch from CQ Tang.

v2: update the comment wrt scratch page
v3: (Nirmoy)
- Fix the selftest to actually use the random size, plus some comment
improvements, also drop the rem stuff.

Reported-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004114915.221708-1-matthew.auld@intel.com


# f5e92d23 13-Sep-2022 Anshuman Gupta <anshuman.gupta@intel.com>

drm/i915: Refactor userfault_wakeref to re-use

Refactor userfault_wakeref to re-use for discrete lmem mmap mapping
as well, as on discrete GTT mmap are not supported. Moving
userfault_wakeref from ggtt to gt structure.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-2-anshuman.gupta@intel.com


# a0696856 24-Jun-2022 Nirmoy Das <nirmoy.das@intel.com>

drm/i915: Fix a lockdep warning at error capture

For some platfroms we use stop_machine version of
gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a
concurrent GGTT access bug but this causes a circular locking
dependency warning:

Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ggtt->error_mutex);
lock(dma_fence_map);
lock(&ggtt->error_mutex);
lock(cpu_hotplug_lock);

Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries
directly at error capture which is concurrent GGTT access safe because
reset path make sure of that.

v2: Fix rebase conflict and added a comment.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5595
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220624110821.29190-1-nirmoy.das@intel.com


# 9ce07d94 17-Jun-2022 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915/gt: Re-do the intel-gtt split

Re-do what was attempted in commit 7a5c922377b4 ("drm/i915/gt: Split
intel-gtt functions by arch"). The goal of that commit was to split the
handlers for older hardware that depend on intel-gtt.ko so i915 can
be built for non-x86 archs, after some more patches. Other archs do not
need intel-gtt.ko.

Main issue with the previous approach: it moved all the hooks, including
the gen8, which is used by all platforms gen8 and newer. Re-do the
split moving only the handlers for gen < 6, which are the only ones
calling out to the separate module.

While at it do some minor cleanups:
- Rename the prefix s/gen5_/gmch_/ to be more accurate what platforms
are covered by intel_ggtt_gmch.c
- Remove dead code for gen12 out of needs_idle_maps()
- Remove TODO comment leftover
- Re-order if/else ladder in ggtt_probe_hw() to keep newest platforms
first

v2: Add minor cleanups (Matt Roper)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220617230559.2109427-2-lucas.demarchi@intel.com


# 2ef6efa7 17-Jun-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Improve on suspend / resume time with VT-d enabled

When DMAR / VT-d is enabled, the display engine uses overfetching,
presumably to deal with the increased latency. To avoid display engine
errors and DMAR faults, as a workaround the GGTT is populated with scatch
PTEs when VT-d is enabled. However starting with gen10, Write-combined
writing of scratch PTES is no longer possible and as a result, populating
the full GGTT with scratch PTEs like on resume becomes very slow as
uncached access is needed.

Therefore, on integrated GPUs utilize the fact that the PTEs are stored in
stolen memory which retain content across S3 suspend. Don't clear the PTEs
on suspend and resume. This improves on resume time with around 100 ms.
While 100+ms might appear like a short time it's 10% to 20% of total resume
time and important in some applications.

One notable exception is Intel Rapid Start Technology which may cause
stolen memory to be lost across what the OS percieves as S3 suspend.
If IRST is enabled or if we can't detect whether IRST is enabled, retain
the old workaround, clearing and re-instating PTEs.

As an additional measure, if we detect that the last ggtt pte was lost
during suspend, print a warning and re-populate the GGTT ptes

On discrete GPUs, the display engine scans out from LMEM which isn't
subject to DMAR, and presumably the workaround is therefore not needed,
but that needs to be verified and disabling the workaround for dGPU,
if possible, will be deferred to a follow-up patch.

v2:
- Rely on retained ptes to also speed up suspend and resume re-binding.
- Re-build GGTT ptes if Intel rst is enabled.
v3:
- Re-build GGTT ptes also if we can't detect whether Intel rst is enabled,
and if the guard page PTE and end of GGTT was lost.
v4:
- Fix some kerneldoc issues (Matthew Auld), rebase.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220617152856.249295-1-thomas.hellstrom@linux.intel.com


# 7a5c9223 30-Mar-2022 Casey Bowman <casey.g.bowman@intel.com>

drm/i915/gt: Split intel-gtt functions by arch

Some functions defined in the intel-gtt module are used in several
areas, but is only supported on x86 platforms.

By separating these calls and their static underlying functions to
another area, we are able to compile out these functions for
non-x86 builds and provide stubs for the non-x86 implementations.

In addition to the problematic calls, we are moving the gmch-related
functions to the new area.

Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220330234809.1218210-2-casey.g.bowman@intel.com


# a7f46d5b 29-Mar-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move intel_vtd_active and run_as_guest to i915_utils

Continuation of the effort to declutter i915_drv.h.

Also, component specific helpers which consult the iommu/virtualization
helpers moved to respective component source/header files as appropriate.

v2:
* s/dev_priv/i915/ in intel_scanout_needs_vtd_wa. (Lucas)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220329090204.2324499-1-tvrtko.ursulin@linux.intel.com
[tursulin: fixup conflict in i915_drv.h]


# e1a7ab4f 04-Mar-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Remove the vm open count

vms are not getting properly closed. Rather than fixing that,
Remove the vm open count and instead rely on the vm refcount.

The vm open count existed solely to break the strong references the
vmas had on the vms. Now instead make those references weak and
ensure vmas are destroyed when the vm is destroyed.

Unfortunately if the vm destructor and the object destructor both
wants to destroy a vma, that may lead to a race in that the vm
destructor just unbinds the vma and leaves the actual vma destruction
to the object destructor. However in order for the object destructor
to ensure the vma is unbound it needs to grab the vm mutex. In order
to keep the vm mutex alive until the object destructor is done with
it, somewhat hackishly grab a vm_resv refcount that is released late
in the vma destruction process, when the vm mutex is no longer needed.

v2: Address review-comments from Niranjana
- Clarify that the struct i915_address_space::skip_pte_rewrite is a hack
and should ideally be replaced in an upcoming patch.
- Remove an unneeded continue in clear_vm_list and update comment.

v3:
- Documentation update
- Commit message formatting

Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-2-thomas.hellstrom@linux.intel.com


# 2cff4b9e 18-Feb-2022 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt: allow overriding the pt alignment

On some platforms we have alignment restrictions when accessing LMEM
from the GTT. In the next few patches we need to be able to modify the
page-tables directly via the GTT itself.

Suggested-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-10-ramalingam.c@intel.com


# 5189e312 18-Feb-2022 Matthew Auld <matthew.auld@intel.com>

drm/i915: support 64K GTT pages for discrete cards

discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-8-ramalingam.c@intel.com


# 87bd701e 18-Feb-2022 Matthew Auld <matthew.auld@intel.com>

drm/i915: enforce min GTT alignment for discrete cards

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

v3:
* use needs_compact_pt flag to discriminate between
64K and 64K with compact-pt
* add i915_vm_obj_min_alignment
* use i915_vm_obj_min_alignment to round up vma reservation
if compact-pt instead of hard coding
v5:
* fix i915_vm_obj_min_alignment for internal objects which
have no memory region
v6:
* tiled_blits_create correctly pick largest required alignment
v8:
* i915_vm_min_alignment protect against array overflow for mock region

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-7-ramalingam.c@intel.com


# 2f6b90da 10-Jan-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Use vma resources for async unbinding

Implement async (non-blocking) unbinding by not syncing the vma before
calling unbind on the vma_resource.
Add the resulting unbind fence to the object's dma_resv from where it is
picked up by the ttm migration code.
Ideally these unbind fences should be coalesced with the migration blit
fence to avoid stalling the migration blit waiting for unbind, as they
can certainly go on in parallel, but since we don't yet have a
reasonable data structure to use to coalesce fences and attach the
resulting fence to a timeline, we defer that for now.

Note that with async unbinding, even while the unbind waits for the
preceding bind to complete before unbinding, the vma itself might have been
destroyed in the process, clearing the vma pages. Therefore we can
only allow async unbinding if we have a refcounted sg-list and keep a
refcount on that for the vma resource pages to stay intact until
binding occurs. If this condition is not met, a request for an async
unbind is diverted to a sync unbind.

v2:
- Use a separate kmem_cache for vma resources for now to isolate their
memory allocation and aid debugging.
- Move the check for vm closed to the actual unbinding thread. Regardless
of whether the vm is closed, we need the unbind fence to properly wait
for capture.
- Clear vma_res::vm on unbind and update its documentation.
v4:
- Take cache coloring into account when searching for vma resources
pending unbind. (Matthew Auld)
v5:
- Fix timeout and error check in i915_vma_resource_bind_dep_await().
- Avoid taking a reference on the object for async binding if
async unbind capable.
- Fix braces around a single-line if statement.
v6:
- Fix up the cache coloring adjustment. (Kernel test robot <lkp@intel.com>)
- Don't allow async unbinding if the vma_res pages are not the same as
the object pages. (Matthew Auld)
v7:
- s/unsigned long/u64/ in a number of places (Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-5-thomas.hellstrom@linux.intel.com


# 39a2bd34 10-Jan-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Use the vma resource as argument for gtt binding / unbinding

When introducing asynchronous unbinding, the vma itself may no longer
be alive when the actual binding or unbinding takes place.

Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource
instead of a struct i915_vma for the bind_vma() and unbind_vma() ops.
Similarly change the insert_entries() op for struct i915_address_space.

Replace a couple of i915_vma_snapshot members with their newly introduced
i915_vma_resource counterparts, since they have the same lifetime.

Also make sure to avoid changing the struct i915_vma_flags (in particular
the bind flags) async. That should now only be done sync under the
vm mutex.

v2:
- Update the vma_res::bound_flags when binding to the aliased ggtt
v6:
- Remove I915_VMA_ALLOC_BIT (Matthew Auld)
- Change some members of struct i915_vma_resource from unsigned long to u64
(Matthew Auld)
v7:
- Fix vma resource size parameters to be u64 rather than unsigned long
(Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-3-thomas.hellstrom@linux.intel.com


# 0b4d1f0e 16-Dec-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Remove pages_mutex and intel_gtt->vma_ops.set/clear_pages members, v3.

Big delta, but boils down to moving set_pages to i915_vma.c, and removing
the special handling, all callers use the defaults anyway. We only remap
in ggtt, so default case will fall through.

Because we still don't require locking in i915_vma_unpin(), handle this by
using xchg in get_pages(), as it's locked with obj->mutex, and cmpxchg in
unpin, which only fails if we race a against a new pin.

Changes since v1:
- aliasing gtt sets ZERO_SIZE_PTR, not -ENODEV, remove special case
from __i915_vma_get_pages(). (Matt)
Changes since v2:
- Free correct old pages in __i915_vma_get_pages(). (Matt)
Remove race of clearing vma->pages accidentally from put,
free it but leave it set, as only get has the lock.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216142749.1966107-4-maarten.lankhorst@linux.intel.com
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# fef53be0 08-Dec-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt/xehpsdv: move scratch page to system memory

On some platforms the hw has dropped support for 4K GTT pages when
dealing with LMEM, and due to the design of 64K GTT pages in the hw, we
can only mark the *entire* page-table as operating in 64K GTT mode,
since the enable bit is still on the pde, and not the pte. And since we
we still need to allow 4K GTT pages for SMEM objects, we can't have a
"normal" 4K page-table with scratch pointing to LMEM, since that's
undefined from the hw pov. The simplest solution is to just move the 64K
scratch page to SMEM on such platforms and call it a day, since that
should work for all configurations.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211208141613.7251-4-ramalingam.c@intel.com


# 5f978167 06-Dec-2021 Michael Cheng <michael.cheng@intel.com>

drm/i915: Introduce new macros for i915 PTE

Certain functions within i915 uses macros that are defined for
specific architectures by the mmu, such as _PAGE_RW and _PAGE_PRESENT
(Some architectures don't even have these macros defined, like ARM64).

Instead of re-using bits defined for the CPU, we should use bits
defined for i915. This patch introduces two new 64 bit macros,
GEN8_PAGE_PRESENT and GEN8_PAGE_RW, to check for bits 0 and 1 and, to
replace all occurrences of _PAGE_RW and _PAGE_PRESENT within i915.

v2(Michael Cheng): Use GEN8_ instead of I915_

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
[ Move defines together with other GEN8 defines ]
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211206215245.513677-2-michael.cheng@intel.com


# 8d2f683f 01-Nov-2021 Imre Deak <imre.deak@intel.com>

drm/i915: Factor out i915_ggtt_suspend_vm/i915_ggtt_resume_vm()

Factor out functions that are needed by the next patch to suspend/resume
the memory mappings for DPT FBs.

No functional change, except reordering during suspend the
ggtt->invalidate(ggtt) call wrt. atomic_set(&ggtt->vm.open, open) and
mutex_unlock(&ggtt->vm.mutex). This shouldn't matter due to the i915
suspend sequence being single threaded.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211101183551.3580546-1-imre.deak@intel.com


# a259cc14 22-Sep-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Reduce the number of objects subject to memcpy recover

We really only need memcpy restore for objects that affect the
operability of the migrate context. That is, primarily the page-table
objects of the migrate VM.

Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
restores using memcpy and a way to assign LMEM page-table object flags
to be used by the vms.

Restore objects without this flag with the gpu blitter and only objects
carrying the flag using TTM memcpy.

Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
defer for a later audit which vms actually need it. Most importantly, user-
allocated vms with pinned page-table objects can be restored using the
blitter.

Performance-wise memcpy restore is probably as fast as gpu restore if not
faster, but using gpu restore will help tackling future restrictions in
mappable LMEM size.

v4:
- Don't mark the aliasing ppgtt page table flags for early resume, but
rather the ggtt page table flags as intended. (Matthew Auld)
- The check for user buffer objects during early resume is pointless, since
they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld)
v5:
- Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored
before we fire up the migrate context.

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com


# dcc5d820 02-Sep-2021 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Stop rcu support for i915_address_space

The full audit is quite a bit of work:

- i915_dpt has very simple lifetime (somehow we create a display pagetable vm
per object, so its _very_ simple, there's only ever a single vma in there),
and uses i915_vm_close(), which internally does a i915_vm_put(). No rcu.

Aside: wtf is i915_dpt doing in the intel_display.c garbage collector as a new
feature, instead of added as a separate file with some clean-ish interface.

Also, i915_dpt unfortunately re-introduces some coding patterns from
pre-dma_resv_lock conversion times.

- i915_gem_proto_ctx is fully refcounted and no rcu, all protected by
fpriv->proto_context_lock.

- i915_gem_context is itself rcu protected, and that might leak to anything it
points at. Before

commit cf977e18610e66e48c31619e7e0cfa871be9eada
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Dec 2 11:21:40 2020 +0000

drm/i915/gem: Spring clean debugfs

and

commit db80a1294c231b6ac725085f046bb2931e00c9db
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 18 11:08:54 2021 +0000

drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects

we had a bunch of debugfs files that relied on rcu protecting everything, but
those are gone now. The main one was removed even earlier with

There doesn't seem to be anything left that's actually protecting
stuff now that the ctx->vm itself is invariant. See

commit ccbc1b97948ab671335e950271e39766729736c3
Author: Jason Ekstrand <jason@jlekstrand.net>
Date: Thu Jul 8 10:48:30 2021 -0500

drm/i915/gem: Don't allow changing the VM on running contexts (v4)

Note that we drop the vm refcount before the final release of the gem context
refcount, so this is all very dangerous even without rcu. Note that aside from
later on creating new engines (a defunct feature) and debug output we're never
looked at gem_ctx->vm for anything functional, hence why this is ok.
Fingers crossed.

Preceeding patches removed all vestiges of rcu use from gem_ctx->vm
derferencing to make it clear it's really not used.

The gem_ctx->rcu protection was introduced in

commit a4e7ccdac38ec8335d9e4e2656c1a041c77feae1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 4 14:40:09 2019 +0100

drm/i915: Move context management under GEM

The commit message is somewhat entertaining because it fails to
mention this fact completely, and compensates that by an in-commit
changelog entry that claims that ctx->vm is protected by ctx->mutex.
Which was the case _before_ this commit, but no longer after it.

- intel_context holds a full reference. Unfortunately intel_context is also rcu
protected and the reference to the ->vm is dropped before the
rcu barrier - only the kfree is delayed. So again we need to check
whether that leaks anywhere on the intel_context->vm. RCU is only
used to protect intel_context sitting on the breadcrumb lists, which
don't look at the vm anywhere, so we are fine.

Nothing else relies on rcu protection of intel_context and hence is
fully protected by the kref refcount alone, which protects
intel_context->vm in turn.

The breadcrumbs rcu usage was added in

commit c744d50363b714783bbc88d986cc16def13710f7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Nov 26 14:04:06 2020 +0000

drm/i915/gt: Split the breadcrumb spinlock between global and contexts

its parent commit added the intel_context rcu protection:

commit 14d1eaf08845c534963c83f754afe0cb14cb2512
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Nov 26 14:04:05 2020 +0000

drm/i915/gt: Protect context lifetime with RCU

given some credence to my claim that I've actually caught them all.

- drm_i915_gem_object's shares_resv_from pointer has a full refcount to the
dma_resv, which is a sub-refcount that's released after the final
i915_vm_put() has been called. Safe.

Aside: Maybe we should have a struct dma_resv_shared which is just dma_resv +
kref as a stand-alone thing. It's a pretty useful pattern which other drivers
might want to copy.

For a bit more context see

commit 4d8151ae5329cf50781a02fd2298a909589a5bab
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date: Tue Jun 1 09:46:41 2021 +0200

drm/i915: Don't free shared locks while shared

- the fpriv->vm_xa was relying on rcu_read_lock for lookup, but that
was updated in a prep patch too to just be a spinlock-protected
lookup.

- intel_gt->vm is set at driver load in intel_gt_init() and released
in intel_gt_driver_release(). There seems to be some issue that
in some error paths this is called twice, but otherwise no rcu to be
found anywhere. This was added in the below commit, which
unfortunately doesn't explain why this complication exists.

commit e6ba76480299a0d77c51d846f7467b1673aad25b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Dec 21 16:03:24 2019 +0000

drm/i915: Remove i915->kernel_context

The proper fix most likely for this is to start using drmm_ at large
scale, but that's also huge amounts of work.

- i915_vma->vm is some real pain, because rcu is rcu protected, at
least in the vma lookup in the context lookup cache in
eb_lookup_vma(). This was added in

commit 4ff4b44cbb70c269259958cbcc48d7b8a2cb9ec8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 16 15:05:16 2017 +0100

drm/i915: Store a direct lookup from object handle to vma

This was changed to a radix tree from the hashtable in, but with the
locking unchanged, in

commit d1b48c1e7184d9bc4ae6d7f9fe2eed9efed11ffc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 16 09:52:08 2017 +0100

drm/i915: Replace execbuf vma ht with an idr

In

commit 93159e12353c2a47e5576d642845a91fa00530bf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Mar 23 09:28:41 2020 +0000

drm/i915/gem: Avoid gem_context->mutex for simple vma lookup

the locking was changed from dev->struct_mutex to rcu, which added
the requirement to rcu protect i915_vma. Somehow this was missed in
review (or I'm completely blind).

Irrespective of all that the vma lookup cache rcu_read_lock grabs a
full reference of the vma and the rcu doesn't leak further. So no
impact on i915_address_space from that.

I have not found any other rcu use for i915_vma, but given that it
seems broken I also didn't bother to do a careful in-depth audit.

Alltogether there's nothing left in-tree anymore which requires that a
pointer deref to an i915_address_space is safe undre rcu_read_lock
only.

rcu protection of i915_address_space was introduced in

commit b32fa811156328aea5a3c2ff05cc096490382456
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 20 19:37:05 2019 +0100

drm/i915/gtt: Defer address space cleanup to an RCU worker

by mixing up a bugfixing (i915_address_space needs to be released from
a worker) with enabling rcu support. The commit message also seems
somewhat confused, because it talks about cleanup of WC pages
requiring sleep, while the code and linked bugzilla are about a
requirement to take dev->struct_mutex (which yes sleeps but it's a
much more specific problem). Since final kref_put can be called from
pretty much anywhere (including hardirq context through the
scheduler's i915_active cleanup) we need a worker here. Hence that
part must be kept.

Ideally all these reclaim workers should have some kind of integration
with our shrinkers, but for some of these it's rather tricky. Anyway,
that's a preexisting condition in the codeebase that we wont fix in
this patch here.

We also remove the rcu_barrier in ggtt_cleanup_hw added in

commit 60a4233a4952729089e4df152e730f8f4d0e82ce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jul 29 14:24:12 2019 +0100

drm/i915: Flush the i915_vm_release before ggtt shutdown

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-11-daniel.vetter@ffwll.ch


# 8579d37e 08-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

drm/i915/gt: Drop i915_address_space::file (v2)

There's a big comment saying how useful it is but no one is using this
for anything anymore.

It was added in 2bfa996e031b ("drm/i915: Store owning file on the
i915_address_space") and used for debugfs at the time as well as telling
the difference between the global GTT and a PPGTT. In f6e8aa387171
("drm/i915: Report the number of closed vma held by each context in
debugfs") we removed one use of it by switching to a context walk and
comparing with the VM in the context. Finally, VM stats for debugfs
were entirely nuked in db80a1294c23 ("drm/i915/gem: Remove per-client
stats from debugfs/i915_gem_objects")

v2 (Daniel Vetter):
- Delete a struct drm_i915_file_private pre-declaration
- Add a comment to the commit message about history

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-24-jason@jlekstrand.net


# 3607e1e9 17-Jun-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Add a routine to iterate over the pagetables of a GTT

In the next patch, we will want to look at the dma addresses of
individual page tables, so add a routine to iterate over them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-6-thomas.hellstrom@linux.intel.com


# 4d8151ae 01-Jun-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Don't free shared locks while shared

We are currently sharing the VM reservation locks across a number of
gem objects with page-table memory. Since TTM will individiualize the
reservation locks when freeing objects, including accessing the shared
locks, make sure that the shared locks are not freed until that is done.
For PPGTT we add an additional refcount, for GGTT we take additional
measures to make sure objects sharing the GGTT reservation lock are
freed at GGTT takedown

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210601074654.3103-3-thomas.hellstrom@linux.intel.com


# 74862d4c 24-May-2021 Imre Deak <imre.deak@intel.com>

drm/i915/adlp: Fix GEM VM asserts for DPT VMs

An object mapped via DPT can have remapped and rotated VMA instances
besides the normal VMA instance, similarly to GGTT VMA instances.
Adjust the corresponding VMA lookup asserts.

While at it also check if a DPT VM is passed incorrectly to
i915_vm_to_ppgtt().

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210524172703.2113058-2-imre.deak@intel.com


# 33e7a975 06-May-2021 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915/xelpd: First stab at DPT support

Add support for DPT (display page table). DPT is a
slightly peculiar two level page table scheme used for
tiled scanout buffers (linear uses direct ggtt mapping
still). The plane surface address will point at a page
in the DPT which holds the PTEs for 512 actual pages.
Thus we require 1/512 of the ggttt address space
compared to a direct ggtt mapping.

We create a new DPT address space for each framebuffer and
track two vmas (one for the DPT, another for the ggtt).

TODO:
- Is the i915_address_space approaach sane?
- Maybe don't map the whole DPT to write the PTEs?
- Deal with remapping/rotation? Need to create a
separate DPT for each remapped/rotated plane I
guess. Or else we'd need to make the per-fb DPT
large enough to support potentially several
remapped/rotated vmas. How large should that be?

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Wilson Chris P <Chris.P.Wilson@intel.com>
Cc: Tang CQ <cq.tang@intel.com>
Cc: Auld Matthew <matthew.auld@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Wilson Chris P <Chris.P.Wilson@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210506161930.309688-5-imre.deak@intel.com


# 6aed5673 27-Apr-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt/dgfx: place the PD in LMEM

It's a requirement that for dgfx we place all the paging structures in
device local-memory.

v2: use i915_coherent_map_type()
v3: improve the shared dma-resv object comment

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-4-matthew.auld@intel.com


# 529b9ec8 27-Apr-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt: map the PD up front

We need to generalise our accessor for the page directories and tables from
using the simple kmap_atomic to support local memory, and this setup
must be done on acquisition of the backing storage prior to entering
fence execution contexts. Here we replace the kmap with the object
mapping code that for simple single page shmemfs object will return a
plain kmap, that is then kept for the lifetime of the page directory.

Note that keeping the mapping around is a potential concern here, since
while the vma is pinned the mapping remains there for the PDs
underneath, or at least until the used_count reaches zero, at which
point we can safely destroy the mapping. For 32b this will be even worse
since the address space is more limited, but since this change mostly
impacts full ppGTT platforms, the justification is that for modern
platforms we shouldn't care too much about 32b.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-3-matthew.auld@intel.com


# e762bdf5 03-Feb-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt/dg1: add PTE_LM plumbing for GGTT

For the PTEs we get an LM bit, to signal whether the page resides in
SMEM or LMEM.

Based on a patch from Michel Thierry.

BSpec: 45015

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210203171231.551338-3-matthew.auld@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 11724eea 03-Feb-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt/dg1: add PTE_LM plumbing for ppGTT

For the PTEs we get an LM bit, to signal whether the page resides in
SMEM or LMEM.

BSpec: 45040

v2: just use gen8_pte_encode for dg1

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210203171231.551338-2-matthew.auld@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 26ad4f8b 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Use a single page table lock for each gtt.

We may create page table objects on the fly, but we may need to
wait with the ww lock held. Instead of waiting on a freed obj
lock, ensure we have the same lock for each object to keep
-EDEADLK working. This ensures that i915_vma_pin_ww can lock
the page tables when required.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-41-maarten.lankhorst@linux.intel.com


# 2a665968 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Move pinning to inside engine_wa_list_verify()

This should be done as part of the ww loop, in order to remove a
i915_vma_pin that needs ww held.

Now only i915_ggtt_pin() callers remaining.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-25-maarten.lankhorst@linux.intel.com


# a4d86249 18-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Provide a utility to create a scratch buffer

Primarily used by selftests, but also by runtime debugging of engine
w/a, is a routine to create a temporarily bound buffer for readback.
Almagamate the duplicated routines into one.

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201219020343.22681-2-chris@chris-wilson.co.uk


# 8a473dba 06-Oct-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Fix DMA mapped scatterlist walks

When walking DMA mapped scatterlists sg_dma_len has to be used since it
can be different (coalesced) from the backing store entry.

This also means we have to end the walk when encountering a zero length
DMA entry and cannot rely on the normal sg list end marker.

Both issues were there in theory for some time but were hidden by the fact
Intel IOMMU driver was never coalescing entries. As there are ongoing
efforts to change this we need to start handling it.

v2:
* Use unsigned int for local storing sg_dma_len. (Logan)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: 85d1225ec066 ("drm/i915: Introduce & use new lightweight SGL iterators")
References: b31144c0daa8 ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()")
Reported-by: Tom Murphy <murphyt7@tcd.ie>
Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter
Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com


# 82adf901 29-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Shrink i915_page_directory's slab bucket

kmalloc uses power-of-two slab buckets for small allocations (up to a
few pages). Since i915_page_directory is a page of pointers, plus a
couple more, this is rounded up to 8K, and we waste nearly 50% of that
allocation. Long terms this leads to poor memory utilisation, bloating
the kernel footprint, but the problem is exacerbated by our conservative
preallocation scheme for binding VMA. As we are required to allocate all
levels for each vma just in case we need to insert them upon binding,
this leads to a large multiplication factor for a single page vma. By
halving the allocation we need for the page directory structure, we
halve the impact of that factor, bringing workloads that once fitted into
memory, hopefully back to fitting into memory.

We maintain the split between i915_page_directory and i915_page_table as
we only need half the allocation for the lowest, most populous, level.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 89351925 29-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Switch to object allocations for page directories

The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and that we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.

In the process, we drop the WC mapping as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. Though in a future step, we need to drop the kmap_atomic
approach which suggests we need to pre-map all the pages and keep them
mapped.

v2: Verify our large scratch page is suitably DMA aligned; and manually
clear the scratch since we are allocating plain struct pages full of
prior content.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# cd0452aa 29-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preallocate stashes for vma page-directories

We need to make the DMA allocations used for page directories to be
performed up front so that we can include those allocations in our
memory reservation pass. The downside is that we have to assume the
worst case, even before we know the final layout, and always allocate
enough page directories for this object, even when there will be overlap.
This unfortunately can be quite expensive, especially as we have to
clear/reset the page directories and DMA pages, but it should only be
required during early phases of a workload when new objects are being
discovered, or after memory/eviction pressure when we need to rebind.
Once we reach steady state, the objects should not be moved and we no
longer need to preallocating the pages tables.

It should be noted that the lifetime for the page directories DMA is
more or less decoupled from individual fences as they will be shared
across objects across timelines.

v2: Only allocate enough PD space for the PTE we may use, we do not need
to allocate PD that will be left as scratch.
v3: Store the shift unto the first PD level to encapsulate the different
PTE counts for gen6/gen8.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 12b07256 03-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export ppgtt_bind_vma

Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can
reduce some code near-duplication. The catch is that we need to then
pass along the i915_address_space and not rely on vma->vm, as they
differ with the aliasing-ppgtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200703102519.26539-1-chris@chris-wilson.co.uk


# 0b6bc81d 16-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Allocate i915_fence_reg array

Since the number of fence regs can vary dramactically between platforms,
allocate the array on demand so we don't waste as much space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-4-chris@chris-wilson.co.uk


# f899f786 16-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GGTT fence registers under gt/

Since the fence registers control HW detiling through the GGTT
aperture, make them a part of the intel_ggtt under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-1-chris@chris-wilson.co.uk


# ad2f9bc9 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull marking vm as closed underneath the vm->mutex

Pull the final atomic_dec of vm->open (marking the vm as closed)
underneath the same vm->mutex as used to close it. This is required to
correctly serialise with attempting to reuse the vma as the vm is closed
by a second thread.

References: 00de702c6c6f ("drm/i915: Check that the vma hasn't been closed before we insert it")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-10-chris@chris-wilson.co.uk


# 69edc390 26-Feb-2020 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/ggtt: do not set bits 1-11 in gen12 ptes

On TGL, bits 2-4 in the GGTT PTE are not ignored anymore and are
instead used for some extra VT-d capabilities. We don't (yet?) have
support for those capabilities, but, given that we shared the pte_encode
function betweed GGTT and PPGTT, we still set those bits to the PPGTT
PPAT values. The DMA engine gets very confused when those bits are
set while the iommu is enabled, leading to errors. E.g. when loading
the GuC we get:

[ 9.796218] DMAR: DRHD: handling fault status reg 2
[ 9.796235] DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr 0 [fault reason 02] Present bit in context entry is clear
[ 9.899215] [drm:intel_guc_fw_upload [i915]] *ERROR* GuC firmware signature verification failed

To fix this, just have dedicated gen8_pte_encode function per type of
gtt. Also, explicitly set vm->pte_encode for gen8_ppgtt, even if we
don't use it, to make sure we don't accidentally assign it to the GGTT
one, like we do for gen6_ppgtt, in case we need it in the future.

Reported-by: "Sodhi, Vunny" <vunny.sodhi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200226185657.26445-1-daniele.ceraolospurio@intel.com


# e986209c 30-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Rename i915_gem_restore_ggtt_mappings() for its new placement

The i915_ggtt now sits beneath gt/ outside of the auspices of gem/ and
should be given a fresh name to reflect that. We also want to give it a
name that reflects its role in the system suspend/resume, with the
intention of pulling together all the GGTT operations (e.g. restoring
the fence registers once they are pulled under gt/intel_ggtt_detiler.c)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Rreviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-2-chris@chris-wilson.co.uk


# 742379c0 09-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Start chopping up the GPU error capture

In the near future, we will want to start a GPU error capture from a new
context, from inside the softirq region of a forced preemption. To do
so requires us to break up the monolithic error capture to provide new
entry points with finer control; in particular focusing on one
engine/gt, and being able to compose an error state from little pieces
of HW capture.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110123059.1348712-1-chris@chris-wilson.co.uk


# 2c86e55d 07-Jan-2020 Matthew Auld <matthew.auld@intel.com>

drm/i915/gtt: split up i915_gem_gtt

Attempt to split i915_gem_gtt.[ch] into more manageable chunks.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107134009.3255354-1-chris@chris-wilson.co.uk