#
86ceaaae |
|
28-Nov-2023 |
Jonathan Cavitt <jonathan.cavitt@intel.com> |
drm/i915/gem: Atomically invalidate userptr on mmu-notifier Never block for outstanding work on userptr object upon receipt of a mmu-notifier. The reason we originally did so was to immediately unbind the userptr and unpin its pages, but since that has been dropped in commit b4b9731b02c3c ("drm/i915: Simplify userptr locking"), we never return the pages to the system i.e. never drop our page->mapcount and so do not allow the page and CPU PTE to be revoked. Based on this history, we know we are safe to drop the wait entirely. Upon return from mmu-notifier, we will still have the userptr pages pinned preventing the following PTE operation (such as try_to_unmap) adjusting the vm_area_struct, so it is safe to keep the pages around for as long as we still have i/o pending. We do not have any means currently to asynchronously revalidate the userptr pages, that is always prior to next use. Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231128162505.3493942-1-jonathan.cavitt@intel.com
|
#
31accc37 |
|
03-Dec-2023 |
Zhao Liu <zhao1.liu@intel.com> |
drm/i915: Use kmap_local_page() in gem/i915_gem_execbuffer.c The use of kmap_atomic() is being deprecated in favor of kmap_local_page()[1], and this patch converts the calls from kmap_atomic() to kmap_local_page(). The main difference between atomic and local mappings is that local mappings doesn't disable page faults or preemption (the preemption is disabled for !PREEMPT_RT case, otherwise it only disables migration). With kmap_local_page(), we can avoid the often unwanted side effect of unnecessary page faults and preemption disables. In i915_gem_execbuffer.c, eb->reloc_cache.vaddr is mapped by kmap_atomic() in eb_relocate_entry(), and is unmapped by kunmap_atomic() in reloc_cache_reset(). And this mapping/unmapping occurs in two places: one is in eb_relocate_vma(), and another is in eb_relocate_vma_slow(). The function eb_relocate_vma() or eb_relocate_vma_slow() doesn't need to disable pagefaults and preemption during the above mapping/ unmapping. So it can simply use kmap_local_page() / kunmap_local() that can instead do the mapping / unmapping regardless of the context. Convert the calls of kmap_atomic() / kunmap_atomic() to kmap_local_page() / kunmap_local(). [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com Suggested-by: Ira Weiny <ira.weiny@intel.com> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231203132947.2328805-10-zhao1.liu@linux.intel.com
|
#
786b96d0 |
|
22-Nov-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
drm/i915: Include <drm/drm_auth.h> One of the source files includes <drm/drm_auth.h> via <drm/drm_legacy.h>, which will be removed. Include drm_auth.h directly. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: David Airlie <airlied@gmail.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231122122449.11588-5-tzimmermann@suse.de
|
#
5e4e06e4 |
|
30-Oct-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: Track gt pm wakerefs Track every intel_gt_pm_get() until its corresponding release in intel_gt_pm_put() by returning a cookie to the caller for acquire that must be passed by on released. When there is an imbalance, we can see who either tried to free a stale wakeref, or who forgot to free theirs. v2: track recently added calls in gen8_ggtt_bind_get_ce and destroyed_worker_func Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231030-ref_tracker_i915-v1-2-006fe6b96421@intel.com
|
#
bae9fca9 |
|
07-Nov-2023 |
Sam James <sam@gentoo.org> |
drm: i915: Adapt to -Walloc-size GCC 14 introduces a new -Walloc-size included in -Wextra which errors out like: ``` drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c: In function ‘eb_copy_relocations’: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1681:24: error: allocation of insufficient size ‘1’ for type ‘struct drm_i915_gem_relocation_entry’ with size ‘32’ [-Werror=alloc-size] 1681 | relocs = kvmalloc_array(size, 1, GFP_KERNEL); | ^ ``` So, just swap the number of members and size arguments to match the prototype, as we're initialising 1 element of size `size`. GCC then sees we're not doing anything wrong. Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107215538.1891359-1-sam@gentoo.org
|
#
2fc37c0c |
|
21-Sep-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915/gem: remove inlines from i915_gem_execbuffer.c Just let the compiler decide what's best. Turns out absolutely nothing changes in the output with the inlines removed. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230921160637.3862597-1-jani.nikula@intel.com
|
#
18367034 |
|
03-Sep-2023 |
Jim Cromie <jim.cromie@gmail.com> |
drm/i915: add trailing newlines to msgs By at least strong convention, a print-buffer's trailing newline says "message complete, send it". The exception (no TNL, followed by a call to pr_cont) proves the general rule. Most DRM.debug calls already comport with this: 207 DRM_DEV_DEBUG, 1288 drm_dbg. Clean up the remainders, in maintainer sized chunks. No functional changes. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo changed subject while pushing] Link: https://patchwork.freedesktop.org/patch/msgid/20230903184607.272198-4-jim.cromie@gmail.com
|
#
d3f23ab9 |
|
20-Jul-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: use direct alias for i915 in requests i915_request contains direct alias to i915, there is no point to go via rq->engine->i915. v2: added missing rq.i915 initialization in measure_breadcrumb_dw. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720113002.1541572-1-andrzej.hajda@intel.com
|
#
f56fe3e9 |
|
08-Jun-2023 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915: Fix a VMA UAF for multi-gt platform Ensure correct handling of closed VMAs on multi-gt platforms to prevent Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are exclusively added to GT0's closed_vma link (gt->closed_vma) and subsequently freed by i915_vma_parked(), which assumes the entire GPU is idle. However, on platforms with multiple GTs, such as MTL, GT1 may remain active while GT0 is idle. This causes GT0 to mistakenly consider the closed VMAs in its closed_vma list as unnecessary, potentially leading to Use-After-Free issues if a job for GT1 attempts to access a freed VMA. Although we do take a wakeref for GT0 but it happens later, after evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref early. v2: Use gt id to detect multi-tile(Andi) Fix the incorrect error path. v3: Add more comment(Andi) Use the new gt var when possible(Andrzej) Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Tested-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230608110103.777594-1-andi.shyti@linux.intel.com
|
#
927fc4a0 |
|
26-May-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915/gem: drop unused but set variable unpinned Prepare for re-enabling -Wunused-but-set-variable. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6831c21567e8e84da424f32a8b7b48932803ab7b.1685119007.git.jani.nikula@intel.com
|
#
9275277d |
|
09-May-2023 |
Fei Yang <fei.yang@intel.com> |
drm/i915: use pat_index instead of cache_level Currently the KMD is using enum i915_cache_level to set caching policy for buffer objects. This is flaky because the PAT index which really controls the caching behavior in PTE has far more levels than what's defined in the enum. In addition, the PAT index is platform dependent, having to translate between i915_cache_level and PAT index is not reliable, and makes the code more complicated. From UMD's perspective there is also a necessity to set caching policy for performance fine tuning. It's much easier for the UMD to directly use PAT index because the behavior of each PAT index is clearly defined in Bspec. Having the abstracted i915_cache_level sitting in between would only cause more ambiguity. PAT is expected to work much like MOCS already works today, and by design userspace is expected to select the index that exactly matches the desired behavior described in the hardware specification. For these reasons this patch replaces i915_cache_level with PAT index. Also note, the cache_level is not completely removed yet, because the KMD still has the need of creating buffer objects with simple cache settings such as cached, uncached, or writethrough. For kernel objects, cache_level is used for simplicity and backward compatibility. For Pre-gen12 platforms PAT can have 1:1 mapping to i915_cache_level, so these two are interchangeable. see the use of LEGACY_CACHELEVEL. One consequence of this change is that gen8_pte_encode is no longer working for gen12 platforms due to the fact that gen12 platforms has different PAT definitions. In the meantime the mtl_pte_encode introduced specfically for MTL becomes generic for all gen12 platforms. This patch renames the MTL PTE encode function into gen12_pte_encode and apply it to all gen12. Even though this change looks unrelated, but separating them would temporarily break gen12 PTE encoding, thus squash them in one patch. Special note: this patch changes the way caching behavior is controlled in the sense that some objects are left to be managed by userspace. For such objects we need to be careful not to change the userspace settings.There are kerneldoc and comments added around obj->cache_coherent, cache_dirty, and how to bypass the checkings by i915_gem_object_has_cache_level. For full understanding, these changes need to be looked at together with the two follow-up patches, one disables the {set|get}_caching ioctl's and the other adds set_pat extension to the GEM_CREATE uAPI. Bspec: 63019 Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
|
#
ae1da08f |
|
16-Mar-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Simplify vcs/bsd engine selection No need to look at the mask of present engines when we already have a count stored ever since e2d0ff3525b9 ("drm/i915: Count engine instances per uabi class"). Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230316142728.1335239-1-tvrtko.ursulin@linux.intel.com [tursulin: fixup typo in patch title]
|
#
960dafa3 |
|
03-Feb-2023 |
Rob Clark <robdclark@chromium.org> |
drm/i915: Move fd_install after last use of fence Because eb_composite_fence_create() drops the fence_array reference after creation of the sync_file, only the sync_file holds a ref to the fence. But fd_install() makes that reference visable to userspace, so it must be the last thing we do with the fence. Signed-off-by: Rob Clark <robdclark@chromium.org> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: <stable@vger.kernel.org> # v5.15+ [tursulin: Added stable tag.] Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230203164937.4035503-1-robdclark@gmail.com
|
#
801fa7a8 |
|
16-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: improve the catch-all evict to handle lock contention The catch-all evict can fail due to object lock contention, since it only goes as far as trylocking the object, due to us already holding the vm->mutex. Doing a full object lock here can deadlock, since the vm->mutex is always our inner lock. Add another execbuf pass which drops the vm->mutex and then tries to grab the object will the full lock, before then retrying the eviction. This should be good enough for now to fix the immediate regression with userspace seeing -ENOSPC from execbuf due to contended object locks during GTT eviction. v2 (Mani) - Also revamp the docs for the different passes. Testcase: igt@gem_ppgtt@shrink-vs-evict-* Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.") References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627 References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570 References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Mani Milani <mani@chromium.org> Cc: <stable@vger.kernel.org> # v5.18+ Reviewed-by: Mani Milani <mani@chromium.org> Tested-by: Mani Milani <mani@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com
|
#
f67986b0 |
|
08-Dec-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/pxp: Promote pxp subsystem to top-level of i915 Starting with MTL, there will be two GT-tiles, a render and media tile. PXP as a service for supporting workloads with protected contexts and protected buffers can be subscribed by process workloads on any tile. However, depending on the platform, only one of the tiles is used for control events pertaining to PXP operation (such as creating the arbitration session and session tear-down). PXP as a global feature is accessible via batch buffer instructions on any engine/tile and the coherency across tiles is handled implicitly by the HW. In fact, for the foreseeable future, we are expecting this single-control-tile for the PXP subsystem. In MTL, it's the standalone media tile (not the root tile) because it contains the VDBOX and KCR engine (among the assets PXP relies on for those events). Looking at the current code design, each tile is represented by the intel_gt structure while the intel_pxp structure currently hangs off the intel_gt structure. Keeping the intel_pxp structure within the intel_gt structure makes some internal functionalities more straight forward but adds code complexity to code readability and maintainibility to many external-to-pxp subsystems which may need to pick the correct intel_gt structure. An example of this would be the intel_pxp_is_active or intel_pxp_is_enabled functionality which should be viewed as a global level inquiry, not a per-gt inquiry. That said, this series promotes the intel_pxp structure into the drm_i915_private structure making it a top-level subsystem and the PXP subsystem will select the control gt internally and keep a pointer to it for internal reference. This promotion comes with two noteworthy changes: 1. Exported pxp functions that are called by external subsystems (such as intel_pxp_enabled/active) will have to check implicitly if i915->pxp is valid as that structure will not be allocated for HW that doesn't support PXP. 2. Since GT is now considered a soft-dependency of PXP we are ensuring that GT init happens before PXP init and vice versa for fini. This causes a minor ordering change whereby we previously called intel_pxp_suspend after intel_uc_suspend but now is before i915_gem_suspend_late but the change is required for correct dependency flows. Additionally, this re-order change doesn't have any impact because at that point in either case, the top level entry to i915 won't observe any PXP events (since the GPU was quiesced during suspend_prepare). Also, any PXP event doesn't really matter when we disable the PXP HW (global GT irqs are already off anyway, so even if there was a bug that generated spurious events we wouldn't see it and we would just clean it up on resume which is okay since the default fallback action for PXP would be to keep the sessions off at this suspend stage). Changes from prior revs: v11: - Reformat a comment (Tvrtko). v10: - Change the code flow for intel_pxp_init to make it more cleaner and readible with better comments explaining the difference between full-PXP-feature vs the partial-teelink inits depending on the platform. Additionally, only do the pxp allocation when we are certain the subsystem is needed. (Tvrtko). v9: - Cosmetic cleanups in supported/enabled/active. (Daniele). - Add comments for intel_pxp_init and pxp_get_ctrl_gt that explain the functional flow for when PXP is not supported but the backend-assets are needed for HuC authentication (Daniele and Tvrtko). - Fix two remaining functions that are accessible outside PXP that need to be checking pxp ptrs before using them: intel_pxp_irq_handler and intel_pxp_huc_load_and_auth (Tvrtko and Daniele). - User helper macro in pxp-debugfs (Tvrtko). v8: - Remove pxp_to_gt macro (Daniele). - Fix a bug in pxp_get_ctrl_gt for the case of MTL and we don't support GSC-FW on it. (Daniele). - Leave i915->pxp as NULL if we dont support PXP and in line with that, do additional validity check on i915->pxp for intel_pxp_is_supported/enabled/active (Daniele). - Remove unncessary include header from intel_gt_debugfs.c and check drm_minor i915->drm.primary (Daniele). - Other cosmetics / minor issues / more comments on suspend flow order change (Daniele). v7: - Drop i915_dev_to_pxp and in intel_pxp_init use 'i915->pxp' through out instead of local variable newpxp. (Rodrigo) - In the case intel_pxp_fini is called during driver unload but after i915 loading failed without pxp being allocated, check i915->pxp before referencing it. (Alan) v6: - Remove HAS_PXP macro and replace it with intel_pxp_is_supported because : [1] introduction of 'ctrl_gt' means we correct this for MTL's upcoming series now. [2] Also, this has little impact globally as its only used by PXP-internal callers at the moment. - Change intel_pxp_init/fini to take in i915 as its input to avoid ptr-to-ptr in init/fini calls.(Jani). - Remove the backpointer from pxp->i915 since we can use pxp->ctrl_gt->i915 if we need it. (Rodrigo). v5: - Switch from series to single patch (Rodrigo). - change function name from pxp_get_kcr_owner_gt to pxp_get_ctrl_gt. - Fix CI BAT failure by removing redundant call to intel_pxp_fini from driver-remove. - NOTE: remaining open still persists on using ptr-to-ptr and back-ptr. v4: - Instead of maintaining intel_pxp as an intel_gt structure member and creating a number of convoluted helpers that takes in i915 as input and redirects to the correct intel_gt or takes any intel_gt and internally replaces with the correct intel_gt, promote it to be a top-level i915 structure. v3: - Rename gt level helper functions to "intel_pxp_is_enabled/ supported/ active_on_gt" (Daniele) - Upgrade _gt_supports_pxp to replace what was intel_gtpxp_is supported as the new intel_pxp_is_supported_on_gt to check for PXP feature support vs the tee support for huc authentication. Fix pxp-debugfs-registration to use only the former to decide support. (Daniele) - Couple minor optimizations. v2: - Avoid introduction of new device info or gt variables and use existing checks / macros to differentiate the correct GT->PXP control ownership (Daniele Ceraolo Spurio) - Don't reuse the updated global-checkers for per-GT callers (such as other files within PXP) to avoid unnecessary GT-reparsing, expose a replacement helper like the prior ones. (Daniele). v1: - Add one more patch to the series for the intel_pxp suspend/resume for similar refactoring References: https://patchwork.freedesktop.org/patch/msgid/20221202011407.4068371-1-alan.previn.teres.alexis@intel.com Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221208180542.998148-1-alan.previn.teres.alexis@intel.com
|
#
8e4ee5e8 |
|
30-Nov-2022 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap all access to i915_vma.node.start|size We already wrap i915_vma.node.start for use with the GGTT, as there we can perform additional sanity checks that the node belongs to the GGTT and fits within the 32b registers. In the next couple of patches, we will introduce guard pages around the objects _inside_ the drm_mm_node allocation. That is we will offset the vma->pages so that the first page is at drm_mm_node.start + vma->guard (not 0 as is currently the case). All users must then not use i915_vma.node.start directly, but compute the guard offset, thus all users are converted to use a i915_vma_offset() wrapper. The notable exceptions are the selftests that are testing exact behaviour of i915_vma_pin/i915_vma_insert. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-3-andi.shyti@linux.intel.com
|
#
251e8c5b |
|
03-Feb-2023 |
Rob Clark <robdclark@chromium.org> |
drm/i915: Move fd_install after last use of fence Because eb_composite_fence_create() drops the fence_array reference after creation of the sync_file, only the sync_file holds a ref to the fence. But fd_install() makes that reference visable to userspace, so it must be the last thing we do with the fence. Signed-off-by: Rob Clark <robdclark@chromium.org> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: <stable@vger.kernel.org> # v5.15+ [tursulin: Added stable tag.] Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230203164937.4035503-1-robdclark@gmail.com (cherry picked from commit 960dafa30455450d318756a9896a02727f2639e0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
3f882f2d |
|
16-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: improve the catch-all evict to handle lock contention The catch-all evict can fail due to object lock contention, since it only goes as far as trylocking the object, due to us already holding the vm->mutex. Doing a full object lock here can deadlock, since the vm->mutex is always our inner lock. Add another execbuf pass which drops the vm->mutex and then tries to grab the object will the full lock, before then retrying the eviction. This should be good enough for now to fix the immediate regression with userspace seeing -ENOSPC from execbuf due to contended object locks during GTT eviction. v2 (Mani) - Also revamp the docs for the different passes. Testcase: igt@gem_ppgtt@shrink-vs-evict-* Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.") References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627 References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570 References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Mani Milani <mani@chromium.org> Cc: <stable@vger.kernel.org> # v5.18+ Reviewed-by: Mani Milani <mani@chromium.org> Tested-by: Mani Milani <mani@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com (cherry picked from commit 801fa7a81f6da533cc5442fc40e32c72b76cd42a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
8032bf12 |
|
09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_below() instead of deprecated function This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
2a76fc89 |
|
19-Oct-2022 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: call i915_request_await_object from _i915_vma_move_to_active Since almost all calls to i915_vma_move_to_active are prepended with i915_request_await_object, let's call the latter from _i915_vma_move_to_active by default and add flag allowing bypassing it. Adjust all callers accordingly. The patch should not introduce functional changes. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221019215906.295296-2-andrzej.hajda@intel.com
|
#
801543b2 |
|
09-Nov-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: stop including i915_irq.h from i915_trace.h Turns out many of the files that need i915_reg.h get it implicitly via {display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h -> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h, makes sense to drop it, but that requires adding quite a few new includes all over the place. Prefer including i915_reg.h where needed instead of adding another implicit include, because eventually we'll want to split up i915_reg.h and only include the specific registers at each place. Also some places actually needed i915_irq.h too. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
|
#
a10234fd |
|
09-Nov-2022 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Partial abandonment of legacy DRM logging macros Convert some usages of legacy DRM logging macros into versions which tell us on which device have the events occurred. v2: * Don't have struct drm_device as local. (Jani, Ville) v3: * Store gt, not i915, in workaround list. (John) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221109104633.2579245-1-tvrtko.ursulin@linux.intel.com
|
#
b801d714 |
|
27-Sep-2022 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
drm/i915: Remove unwanted pointer unpacking In await_fence_array(), unpacking syncobj pointer is not needed. Remove it. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220927181346.1187-1-niranjana.vishwanathapura@intel.com
|
#
81895a65 |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use prandom_u32_max() when possible, part 1 Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
3890f749 |
|
11-Jul-2022 |
Lu Baolu <baolu.lu@linux.intel.com> |
drm/i915: Remove unnecessary include intel-iommu.h is not needed in drm/i915 anymore. Remove its include. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20220514014322.2927339-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
71b1669e |
|
29-Jun-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/uapi: tweak error capture on recoverable contexts A non-recoverable context must be used if the user wants proper error capture on discrete platforms. In the future the kernel may want to blit the contents of some objects when later doing the capture stage. Also extend to newer integrated platforms. v2(Thomas): - Also extend to newer integrated platforms, for capture buffer memory allocation purposes. v3 (Reported-by: kernel test robot <lkp@intel.com>): - Fix build on !CONFIG_DRM_I915_CAPTURE_ERROR Testcase: igt@gem_exec_capture@capture-recoverable Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-8-matthew.auld@intel.com
|
#
420a07b8 |
|
25-May-2022 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915: Individualize fences before adding to dma_resv obj _i915_vma_move_to_active() can receive > 1 fences for multiple batch buffers submission. Because dma_resv_add_fence() can only accept one fence at a time, change _i915_vma_move_to_active() to be aware of multiple fences so that it can add individual fences to the dma resv object. v6: fix multi-line comment. v5: remove double fence reservation for batch VMAs. v4: Reserve fences for composite_fence on multi-batch contexts and also reserve fence slots to composite_fence for each VMAs. v3: dma_resv_reserve_fences is not cumulative so pass num_fences. v2: make sure to reserve enough fence slots before adding. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Cc: <stable@vger.kernel.org> # v5.16+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com
|
#
451374ee |
|
11-May-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap When removing short term pins, I've changed the the batch buffer pinning for relocation to use __i915_vma_pin, because i915_gem_object_ggtt_pin_ww was destroying the old vma. This caused regressions, because the functions are not identical. Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again on ggtt-only platforms, but only if the batch can be pinned without being moved. Fixes: b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com
|
#
842d9346 |
|
25-May-2022 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915: Individualize fences before adding to dma_resv obj _i915_vma_move_to_active() can receive > 1 fences for multiple batch buffers submission. Because dma_resv_add_fence() can only accept one fence at a time, change _i915_vma_move_to_active() to be aware of multiple fences so that it can add individual fences to the dma resv object. v6: fix multi-line comment. v5: remove double fence reservation for batch VMAs. v4: Reserve fences for composite_fence on multi-batch contexts and also reserve fence slots to composite_fence for each VMAs. v3: dma_resv_reserve_fences is not cumulative so pass num_fences. v2: make sure to reserve enough fence slots before adding. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Cc: <stable@vger.kernel.org> # v5.16+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com (cherry picked from commit 420a07b841d03f6a436d8c06571c69aa5c783897) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
c8d4c18b |
|
16-Nov-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf/drivers: make reserving a shared slot mandatory v4 Audit all the users of dma_resv_add_excl_fence() and make sure they reserve a shared slot also when only trying to add an exclusive fence. This is the next step towards handling the exclusive fence like a shared one. v2: fix missed case in amdgpu v3: and two more radeon, rename function v4: add one more case to TTM, fix i915 after rebase Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220406075132.3263-2-christian.koenig@amd.com
|
#
89754df8 |
|
21-Mar-2022 |
Michael Cheng <michael.cheng@intel.com> |
drm/i915/: Re-work clflush_write32 Use drm_clflush_virt_range instead of clflushopt and remove the memory barrier, since drm_clflush_virt_range takes care of that. v2(Michael Cheng): Use sizeof(*addr) instead of sizeof(addr) to get the actual size of the page. Thanks to Matt Roper for pointing this out. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321223819.72833-5-michael.cheng@intel.com
|
#
e1a7ab4f |
|
04-Mar-2022 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Remove the vm open count vms are not getting properly closed. Rather than fixing that, Remove the vm open count and instead rely on the vm refcount. The vm open count existed solely to break the strong references the vmas had on the vms. Now instead make those references weak and ensure vmas are destroyed when the vm is destroyed. Unfortunately if the vm destructor and the object destructor both wants to destroy a vma, that may lead to a race in that the vm destructor just unbinds the vma and leaves the actual vma destruction to the object destructor. However in order for the object destructor to ensure the vma is unbound it needs to grab the vm mutex. In order to keep the vm mutex alive until the object destructor is done with it, somewhat hackishly grab a vm_resv refcount that is released late in the vma destruction process, when the vm mutex is no longer needed. v2: Address review-comments from Niranjana - Clarify that the struct i915_address_space::skip_pte_rewrite is a hack and should ideally be replaced in an upcoming patch. - Remove an unneeded continue in clear_vm_list and update comment. v3: - Documentation update - Commit message formatting Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-2-thomas.hellstrom@linux.intel.com
|
#
e9b67ec2 |
|
03-Mar-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: include linux/highmem.h and linux/swap.h where needed Include linux/highmem.h and linux/swap.h explicitly where needed so we can drop the linux/i2c.h include from i915_drv.h where it pulled in the dependencies implicitly. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220303181931.1661767-5-jani.nikula@intel.com
|
#
7b1d6924 |
|
11-May-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap When removing short term pins, I've changed the the batch buffer pinning for relocation to use __i915_vma_pin, because i915_gem_object_ggtt_pin_ww was destroying the old vma. This caused regressions, because the functions are not identical. Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again on ggtt-only platforms, but only if the batch can be pinned without being moved. Fixes: b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com (cherry picked from commit 451374eef622fca6f00eeeda89aaccb45a30a149) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
5472b3f2 |
|
10-Feb-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: split out i915_file_private.h from i915_drv.h Limit the scope of struct drm_i915_file_private to the files that actually need it. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e375859dc1729a1b988036e4103e5b1bd48caa00.1644507885.git.jani.nikula@intel.com
|
#
cb935c46 |
|
11-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline Don't use the interruptable version of the timeline mutex lock in the error path of eb_pin_timeline as the cleanup must always happen. v2: (John Harrison) - Don't check for interrupt during mutex lock v3: (Tvrtko) - A comment explaining why lock helper isn't used Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111163929.14017-1-matthew.brost@intel.com
|
#
b5cfe6f7 |
|
14-Jan-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Remove short-term pins from execbuf, v6. Add a flag PIN_VALIDATE, to indicate we don't need to pin and only protected by the object lock. This removes the need to unpin, which is done by just releasing the lock. eb_reserve is slightly reworked for readability, but the same steps are still done: - First pass pins with NONBLOCK. - Second pass unbinds all objects first, then pins. - Third pass is only called when not all objects are softpinned, and unbinds all objects, then calls i915_gem_evict_vm(), then pins. Changes since v1: - Split out eb_reserve() into separate functions for readability. Changes since v2: - Make batch buffer mappable on platforms where only GGTT is available, to prevent moving the batch buffer during relocations. Changes since v3: - Preserve current behavior for batch buffer, instead be cautious when calling i915_gem_object_ggtt_pin_ww, and re-use the current batch vma if it's inside ggtt and map-and-fenceable. - Remove impossible condition check from eb_reserve. (Matt) Changes since v5: - Do not even temporarily pin, just call i915_gem_evict_vm() and mark all vma's as unpinned. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-7-maarten.lankhorst@linux.intel.com
|
#
6945c53b |
|
17-Jan-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add locking to i915_gem_evict_vm(), v3. i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Changes since v1: - Handle evicting dead objects better. Changes since v2: - Use for_i915_gem_ww in igt_evict_vm. (Thomas) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> [mlankhorst: Fix up doc warning.] Link: https://patchwork.freedesktop.org/patch/msgid/20220117075604.131477-1-maarten.lankhorst@linux.intel.com
|
#
60dc43d1 |
|
10-Jan-2022 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Use struct vma_resource instead of struct vma_snapshot There is always a struct vma_resource guaranteed to be alive when we access a corresponding struct vma_snapshot. So ditch the latter and instead of allocating vma_snapshots, reference the already existning vma_resource. This requires a couple of extra members in struct vma_resource but that's a small price to pay for the simplification. v2: - Fix a missing include and declaration (kernel test robot <lkp@intel.com>) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-7-thomas.hellstrom@linux.intel.com
|
#
e1a4bbb6 |
|
10-Jan-2022 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Initial introduction of vma resources Introduce vma resources, sort of similar to TTM resources, needed for asynchronous bind management. Initially we will use them to hold completion of unbinding when we capture data from a vma, but they will be used extensively in upcoming patches for asynchronous vma unbinding. v6: - Some documentation updates Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-2-thomas.hellstrom@linux.intel.com
|
#
2ef97818 |
|
07-Jan-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: split out i915_gem_evict.h from i915_drv.h We already have the i915_gem_evict.c file. v2: Fixed commit message (Tvrtko) Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ec666853171d04daeb21a93083940df36907c343.1641561552.git.jani.nikula@intel.com
|
#
23d639d7 |
|
07-Jan-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: split out i915_cmd_parser.h from i915_drv.h We already have the i915_cmd_parser.c file. Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1a02b8788266f4f2fd4de12808b55c4a66179e98.1641561552.git.jani.nikula@intel.com
|
#
5c24c9d2 |
|
19-Dec-2021 |
Michał Winiarski <michal.winiarski@intel.com> |
drm/i915/gem: Use to_gt() helper for GGTT accesses GGTT is currently available both through i915->ggtt and gt->ggtt, and we eventually want to get rid of the i915->ggtt one. Use to_gt() for all i915->ggtt accesses to help with the future refactoring. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211219212500.61432-4-andi.shyti@linux.intel.com
|
#
62eeb9ae |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Increment composite fence seqno Increment composite fence seqno on each fence creation. Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214195913.35735-1-matthew.brost@intel.com
|
#
5ae13c30 |
|
11-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Lock timeline mutex directly in error path of eb_pin_timeline Don't use the interruptable version of the timeline mutex lock in the error path of eb_pin_timeline as the cleanup must always happen. v2: (John Harrison) - Don't check for interrupt during mutex lock v3: (Tvrtko) - A comment explaining why lock helper isn't used Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111163929.14017-1-matthew.brost@intel.com (cherry picked from commit cb935c4618bd2ff9058feee4af7088446da6a763) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
c2ea703d |
|
21-Dec-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Require the vm mutex for i915_vma_bind() Protect updates of struct i915_vma flags and async binding / unbinding with the vm::mutex. This means that i915_vma_bind() needs to assert vm::mutex held. In order to make that possible drop the caching of kmap_atomic() maps around i915_vma_bind(). An alternative would be to use kmap_local() but since we block cpu unplugging during sleeps inside kmap_local() sections this may have unwanted side-effects. Particularly since we might wait for gpu while holding the vm mutex. This change may theoretically increase execbuf cpu-usage on snb, but at least on non-highmem systems that increase should be very small. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-5-thomas.hellstrom@linux.intel.com
|
#
1a9c4db4 |
|
14-Dec-2021 |
Michał Winiarski <michal.winiarski@intel.com> |
drm/i915/gem: Use to_gt() helper Use to_gt() helper consistently throughout the codebase. Pure mechanical s/i915->gt/to_gt(i915). No functional changes. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-6-andi.shyti@linux.intel.com
|
#
8722ded4 |
|
01-Dec-2021 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/i915: Fix error pointer dereference in i915_gem_do_execbuffer() Originally "out_fence" was set using out_fence = sync_file_create() but which returns NULL, but now it is set with out_fence = eb_requests_create() which returns error pointers. The error path needs to be modified to avoid an Oops in the "goto err_request;" path. Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211202044831.29583-1-matthew.brost@intel.com
|
#
ff20afc4 |
|
29-Nov-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Update error capture code to avoid using the current vma state With asynchronous migrations, the vma state may be several migrations ahead of the state that matches the request we're capturing. Address that by introducing an i915_vma_snapshot structure that can be used to snapshot relevant state at request submission. In order to make sure we access the correct memory, the snapshots take references on relevant sg-tables and memory regions. Also move the capture list allocation out of the fence signaling critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to avoid compiling in members and functions used for error capture when they're not used. Finally, Introduce lockdep annotation. v4: - Break out the capture allocation mode change to a separate patch. v5: - Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case (kernel test robot) v6: - Use #if IS_ENABLED() instead of #ifdef to match driver style. - Move yet another change of allocation mode to the separate patch. - Commit message rework due to patch reordering. v7: - Adjust for removal of region refcounting. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com
|
#
95c3d275 |
|
17-Nov-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Remove resv from i915_vma It's just an alias to vma->obj->base.resv, no need to duplicate it. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-5-matthew.auld@intel.com
|
#
86752bd6 |
|
25-Oct-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
drm/i915: Use ERR_CAST instead of ERR_PTR(PTR_ERR()) Fix following coccicheck warning: ./drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3117:15-22: WARNING: ERR_CAST can be used with eb->requests[i]. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025113316.24424-1-wanjiabing@vivo.com
|
#
d46f329a |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Increment composite fence seqno Increment composite fence seqno on each fence creation. Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214195913.35735-1-matthew.brost@intel.com (cherry picked from commit 62eeb9ae1364cd96991ccc6e3c5c69d66b8c64df) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
9cdb54be |
|
01-Dec-2021 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/i915: Fix error pointer dereference in i915_gem_do_execbuffer() Originally "out_fence" was set using out_fence = sync_file_create() but which returns NULL, but now it is set with out_fence = eb_requests_create() which returns error pointers. The error path needs to be modified to avoid an Oops in the "goto err_request;" path. Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211202044831.29583-1-matthew.brost@intel.com (cherry picked from commit 8722ded49ce8a0c706b373e8087eb810684962ff) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
df94fd05 |
|
18-Oct-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: expand on the kernel-doc for cache_dirty Add some details around non-LLC platforms and cflushing, when dealing with the flush-on-acquire, which is potentially security sensitive. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-7-matthew.auld@intel.com
|
#
7647f009 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Update I915_GEM_BUSY IOCTL to understand composite fences Parallel submission create composite fences (dma_fence_array) for excl / shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to determine the busyness of the object. Prior to patch it only check if the fence in the slot was a i915_request. Update the check to understand composite fences and correctly report the busyness. v2: (Tvrtko) - Remove duplicate BUILD_BUG_ON Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com
|
#
544460c3 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Multi-BB execbuf Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number batches is implicit based on the contexts configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was done because the code is much simpler to wait while holding the locks and we believe there isn't a huge benefit of dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining different in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed if via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
|
#
ef6ba31d |
|
24-Sep-2021 |
Anshuman Gupta <anshuman.gupta@intel.com> |
drm/i915/pxp: Add plane decryption support Add support to enable/disable PLANE_SURF Decryption Request bit. It requires only to enable plane decryption support when following condition met. 1. PXP session is enabled. 2. Buffer object is protected. v2: - Used gen fb obj user_flags instead gem_object_metadata. [Krishna] v3: - intel_pxp_gem_object_status() API changes. v4: use intel_pxp_is_active (Daniele) v5: rebase and use the new protected object status checker (Daniele) v6: used plane state for plane_decryption to handle async flip as suggested by Ville. v7: check pxp session while plane decrypt state computation. [Ville] removed pointless code. [Ville] v8 (Daniele): update PXP check v9: move decrypt check after icl_check_nv12_planes() when overlays have fb set (Juston) v10 (Daniele): update PXP check again to match rework in earlier patches and don't consider protection valid if the object has not been used in an execbuf beforehand. Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com> Cc: Huang Sean Z <sean.z.huang@intel.com> Cc: Gaurav Kumar <kumar.gaurav@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Juston Li <juston.li@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> #v9 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-14-alan.previn.teres.alexis@intel.com
|
#
d3ac8d42 |
|
24-Sep-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/pxp: interfaces for using protected objects This api allow user mode to create protected buffers and to mark contexts as making use of such objects. Only when using contexts marked in such a way is the execution guaranteed to work as expected. Contexts can only be marked as using protected content at creation time (i.e. the parameter is immutable) and they must be both bannable and not recoverable. Given that the protected session gets invalidated on suspend, contexts created this way hold a runtime pm wakeref until they're either destroyed or invalidated. All protected objects and contexts will be considered invalid when the PXP session is destroyed and all new submissions using them will be rejected. All intel contexts within the invalidated gem contexts will be marked banned. Userspace can detect that an invalidation has occurred via the RESET_STATS ioctl, where we report it the same way as a ban due to a hang. v5: squash patches, rebase on proto_ctx, update kerneldoc v6: rebase on obj create_ext changes v7: Use session counter to check if an object it valid, hold wakeref in context, don't add a new flag to RESET_STATS (Daniel) v8: don't increase guilty count for contexts banned during pxp invalidation (Rodrigo) v9: better comments, avoid wakeref put race between pxp_inval and context_close, add usage examples (Rodrigo) v10: modify internal set/get-protected-context functions to not return -ENODEV when setting PXP param to false or getting param when running on pxp-unsupported hw or getting param when i915 was built with CONFIG_PXP off Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-11-alan.previn.teres.alexis@intel.com
|
#
a82a9979 |
|
02-Sep-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Add i915_gem_context_is_full_ppgtt And use it anywhere we have open-coded checks for ctx->vm that really only check for full ppgtt. Plus for paranoia add a GEM_BUG_ON that checks it's really only set when we have full ppgtt, just in case. gem_context->vm is different since it's NULL in ggtt mode, unlike intel_context->vm or gt->vm, which is always set. v2: 0day found a testcase that I missed. v3: Repaint shed (Jon, Tvrtko) Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-7-daniel.vetter@ffwll.ch
|
#
e1068a9e |
|
02-Sep-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Drop code to handle set-vm races from execbuf Changing the vm from a finalized gem ctx is no longer possible, which means we don't have to check for that anymore. I was pondering whether to keep the check as a WARN_ON, but things go boom real bad real fast if the vm of a vma is wrong. Plus we'd need to also get the ggtt vm for !full-ppgtt platforms. Ditching it all seemed like a better idea. Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> References: ccbc1b97948a ("drm/i915/gem: Don't allow changing the VM on running contexts (v4)") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-4-daniel.vetter@ffwll.ch
|
#
8e02cceb |
|
03-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: delete gpu reloc code It's already removed, this just garbage collects it all. v2: Rebase over s/GEN/GRAPHICS_VER/ v3: Also ditch eb.reloc_pool and eb.reloc_context (Maarten) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210803124833.3817354-2-daniel.vetter@ffwll.ch
|
#
ce13c78f |
|
03-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Disable gpu relocations Media userspace was the last userspace to still use them, and they converted now too: https://github.com/intel/media-driver/commit/144020c37770083974bedf59902b70b8f444c799 This means no reason anymore to make relocations faster than they've been for the first 9 years of gem. This code was added in commit 7dd4f6729f9243bd7046c6f04c107a456bda38eb Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:24 2017 +0100 drm/i915: Async GPU relocation processing Furthermore there's pretty strong indications it's buggy, since the code to use it by default as the only option had to be reverted: commit ad5d95e4d538737ed3fa25493777decf264a3011 Author: Dave Airlie <airlied@redhat.com> Date: Tue Sep 8 15:41:17 2020 +1000 Revert "drm/i915/gem: Async GPU relocations only" This code just disables gpu relocations, leaving the garbage collection for later patches and more importantly, much less confusing diff. Also given how much headaches this code has caused in the past, letting this soak for a bit seems justified. Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210803124833.3817354-1-daniel.vetter@ffwll.ch
|
#
93b71330 |
|
14-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" This reverts 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser"). The justification for this commit in the git history was a vague comment about getting it out from under the struct_mutex. While this may improve perf for some workloads on Gen7 platforms where we rely on the command parser for features such as indirect rendering, no numbers were provided to prove such an improvement. It claims to closed two gitlab/bugzilla issues but with no explanation whatsoever as to why or what bug it's fixing. Meanwhile, by moving command parsing off to an async callback, it leaves us with a problem of what to do on error. When things were synchronous, EXECBUFFER2 would fail with an error code if parsing failed. When moving it to async, we needed another way to handle that error and the solution employed was to set an error on the dma_fence and then trust that said error gets propagated to the client eventually. Moving back to synchronous will help us untangle the fence error propagation mess. This also reverts most of 0edbb9ba1bfe ("drm/i915: Move cmd parser pinning to execbuffer") which is a refactor of some of our allocation paths for asynchronous parsing. Now that everything is synchronous, we don't need it. v2 (Daniel Vetter): - Add stabel Cc and Fixes tag Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v5.6+ Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-2-jason@jlekstrand.net
|
#
046d1660 |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/gem: Return an error ptr from context_lookup We're about to start doing lazy context creation which means contexts get created in i915_gem_context_lookup and we may start having more errors than -ENOENT. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-23-jason@jlekstrand.net
|
#
5ac545b8 |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/request: Remove the hook from await_execution This was only ever used for FENCE_SUBMIT automatic engine selection which was removed in the previous commit. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-12-jason@jlekstrand.net
|
#
dd4f1bba |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/gem: Remove engine auto-magic with FENCE_SUBMIT (v2) Even though FENCE_SUBMIT is only documented to wait until the request in the in-fence starts instead of waiting until it completes, it has a bit more magic than that. If FENCE_SUBMIT is used to submit something to a balanced engine, we would wait to assign engines until the primary request was ready to start and then attempt to assign it to a different engine than the primary. There is an IGT test (the bonded-slice subtest of gem_exec_balancer) which exercises this by submitting a primary batch to a specific VCS and then using FENCE_SUBMIT to submit a secondary which can run on any VCS and have i915 figure out which VCS to run it on such that they can run in parallel. However, this functionality has never been used in the real world. The media driver (the only user of FENCE_SUBMIT) always picks exactly two physical engines to bond and never asks us to pick which to use. v2 (Daniel Vetter): - Mention the exact IGT test this breaks Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-11-jason@jlekstrand.net
|
#
00dae4d3 |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) This API is entirely unnecessary and I'd love to get rid of it. If userspace wants a single timeline across multiple contexts, they can either use implicit synchronization or a syncobj, both of which existed at the time this feature landed. The justification given at the time was that it would help GL drivers which are inherently single-timeline. However, neither of our GL drivers actually wanted the feature. i965 was already in maintenance mode at the time and iris uses syncobj for everything. Unfortunately, as much as I'd love to get rid of it, it is used by the media driver so we can't do that. We can, however, do the next-best thing which is to embed a syncobj in the context and do exactly what we'd expect from userspace internally. This isn't an entirely identical implementation because it's no longer atomic if userspace races with itself by calling execbuffer2 twice simultaneously from different threads. It won't crash in that case; it just doesn't guarantee any ordering between those two submits. It also means that sync files exported from different engines on a SINGLE_TIMELINE context will have different fence contexts. This is visible to userspace if it looks at the obj_name field of sync_fence_info. Moving SINGLE_TIMELINE to a syncobj emulation has a couple of technical advantages beyond mere annoyance. One is that intel_timeline is no longer an api-visible object and can remain entirely an implementation detail. This may be advantageous as we make scheduler changes going forward. Second is that, together with deleting the CLONE_CONTEXT API, we should now have a 1:1 mapping between intel_context and intel_timeline which may help us reduce locking. v2 (Tvrtko Ursulin): - Update the comment on i915_gem_context::syncobj to mention that it's an emulation and the possible race if userspace calls execbuffer2 twice on the same context concurrently. v2 (Jason Ekstrand): - Wrap the checks for eb.gem_context->syncobj in unlikely() - Drop the dma_fence reference - Improved commit message v3 (Jason Ekstrand): - Move the dma_fence_put() to before the error exit v4 (Tvrtko Ursulin): - Add a comment about fence contexts to the commit message Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-8-jason@jlekstrand.net
|
#
6ff6d61d |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP The idea behind this param is to support OpenCL drivers with relocations because OpenCL reserves 0x0 for NULL and, if we placed memory there, it would confuse CL kernels. It was originally sent out as part of a patch series including libdrm [1] and Beignet [2] support. However, the libdrm and Beignet patches never landed in their respective upstream projects so this API has never been used. It's never been used in Mesa or any other driver, either. Dropping this API allows us to delete a small bit of code. [1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html [2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-4-jason@jlekstrand.net
|
#
ca319ee9 |
|
18-Jun-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915/eb: Fix pagefault disabling in the first slowpath In commit ebc0808fa2da0548a78e715858024cb81cd732bc Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Oct 18 13:02:51 2016 +0100 drm/i915: Restrict pagefault disabling to just around copy_from_user() we entirely missed that there's a slow path call to eb_relocate_entry (or i915_gem_execbuffer_relocate_entry as it was called back then) which was left fully wrapped by pagefault_disable/enable() calls. Previously any issues with blocking calls where handled by the following code: /* we can't wait for rendering with pagefaults disabled */ if (pagefault_disabled() && !object_is_idle(obj)) return -EFAULT; Now at this point the prefaulting was still around, which means in normal applications it was very hard to hit this bug. No idea why the regressions in igts weren't caught. Now this all changed big time with 2 patches merged closely together. First commit 2889caa9232109afc8881f29a2205abeb5709d0c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:19 2017 +0100 drm/i915: Eliminate lots of iterations over the execobjects array removes the prefaulting from the first relocation path, pushing it into the first slowpath (of which this patch added a total of 3 escalation levels). This would have really quickly uncovered the above bug, were it not for immediate adding a duct-tape on top with commit 7dd4f6729f9243bd7046c6f04c107a456bda38eb Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 16 15:05:24 2017 +0100 drm/i915: Async GPU relocation processing by pushing all all the relocation patching to the gpu if the buffer was busy, which avoided all the possible blocking calls. The entire slowpath was then furthermore ditched in commit 7dc8f1143778a35b190f9413f228b3cf28f67f8d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Mar 11 16:03:10 2020 +0000 drm/i915/gem: Drop relocation slowpath and resurrected in commit fd1500fcd4420eee06e2c7f3aa6067b78ac05871 Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Wed Aug 19 16:08:43 2020 +0200 Revert "drm/i915/gem: Drop relocation slowpath". but this did not further impact what's going on. Since pagefault_disable/enable is an atomic section, any sleeping in there is prohibited, and we definitely do that without gpu relocations since we have to wait for the gpu usage to finish before we can patch up the relocations. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20210618214503.1773805-1-daniel.vetter@ffwll.ch
|
#
5cd57f67 |
|
15-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Perform execbuffer object locking as a separate step To help avoid evicting already resident buffers from the batch we're processing, perform locking as a separate step. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210615113600.30660-1-thomas.hellstrom@linux.intel.com
|
#
440d0f12 |
|
05-May-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf: add dma_fence_chain_alloc/free v3 Add a common allocation helper. Cleaning up the mix of kzalloc/kmalloc and some unused code in the selftest. v2: polish kernel doc a bit v3: polish kernel doc even a bit more Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210611120301.10595-3-christian.koenig@amd.com
|
#
b4b9731b |
|
10-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Simplify userptr locking Use an rwlock instead of spinlock for the global notifier lock to reduce risk of contention in execbuf. Protect object state with the object lock whenever possible rather than with the global notifier lock Don't take an explicit page_ref in userptr_submit_init() but rather call get_pages() after obtaining the page list so that get_pages() holds the page_ref. This means we don't need to call userptr_submit_fini(), which is needed to avoid awkward locking in our upcoming VM_BIND code. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210610143525.624677-1-thomas.hellstrom@linux.intel.com
|
#
c9d9fdbc |
|
14-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" This reverts 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser"). The justification for this commit in the git history was a vague comment about getting it out from under the struct_mutex. While this may improve perf for some workloads on Gen7 platforms where we rely on the command parser for features such as indirect rendering, no numbers were provided to prove such an improvement. It claims to closed two gitlab/bugzilla issues but with no explanation whatsoever as to why or what bug it's fixing. Meanwhile, by moving command parsing off to an async callback, it leaves us with a problem of what to do on error. When things were synchronous, EXECBUFFER2 would fail with an error code if parsing failed. When moving it to async, we needed another way to handle that error and the solution employed was to set an error on the dma_fence and then trust that said error gets propagated to the client eventually. Moving back to synchronous will help us untangle the fence error propagation mess. This also reverts most of 0edbb9ba1bfe ("drm/i915: Move cmd parser pinning to execbuffer") which is a refactor of some of our allocation paths for asynchronous parsing. Now that everything is synchronous, we don't need it. v2 (Daniel Vetter): - Add stabel Cc and Fixes tag Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v5.6+ Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-2-jason@jlekstrand.net (cherry picked from commit 93b713304188844b8514074dc13ffd56d12235d3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
d3fae3b3 |
|
02-Jun-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf: drop the _rcu postfix on function names v3 The functions can be called both in _rcu context as well as while holding the lock. v2: add some kerneldoc as suggested by Daniel v3: fix indentation Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-7-christian.koenig@amd.com
|
#
40e1956e |
|
05-Jun-2021 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915/gem: replace IS_GEN and friends with GRAPHICS_VER This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-4-lucas.demarchi@intel.com
|
#
8802190f |
|
12-Apr-2021 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915: eliminate remaining uses of intel_device_info->gen Replace gen with the new graphics_ver value and use GRAPHICS_VER() in those places. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210413051002.92589-10-lucas.demarchi@intel.com
|
#
c9398775 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Defer pin calls in buffer pool until first use by caller. We need to take the obj lock to pin pages, so wait until the callers have done so, before making the object unshrinkable. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-30-maarten.lankhorst@linux.intel.com
|
#
ed29c269 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7. Instead of doing what we do currently, which will never work with PROVE_LOCKING, do the same as AMD does, and something similar to relocation slowpath. When all locks are dropped, we acquire the pages for pinning. When the locks are taken, we transfer those pages in .get_pages() to the bo. As a final check before installing the fences, we ensure that the mmu notifier was not called; if it is, we return -EAGAIN to userspace to signal it has to start over. Changes since v1: - Unbinding is done in submit_init only. submit_begin() removed. - MMU_NOTFIER -> MMU_NOTIFIER Changes since v2: - Make i915->mm.notifier a spinlock. Changes since v3: - Add WARN_ON if there are any page references left, should have been 0. - Return 0 on success in submit_init(), bug from spinlock conversion. - Release pvec outside of notifier_lock (Thomas). Changes since v4: - Mention why we're clearing eb->[i + 1].vma in the code. (Thomas) - Actually check all invalidations in eb_move_to_gpu. (Thomas) - Do not wait when process is exiting to fix gem_ctx_persistence.userptr. Changes since v5: - Clarify why check on PF_EXITING is (temporarily) required. Changes since v6: - Ensure userptr validity is checked in set_domain through a special path. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> [danvet: s/kfree/kvfree/ in i915_gem_object_userptr_drop_ref in the previous review round, but which got lost. The other open questions around page refcount are imo better discussed in a separate series, with amdgpu folks involved]. Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-17-maarten.lankhorst@linux.intel.com
|
#
20ee27bd |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Make compilation of userptr code depend on MMU_NOTIFIER. Now that unsynchronized mappings are removed, the only time userptr works is when the MMU notifier is enabled. Put all of the userptr code behind a mmu notifier ifdef. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-16-maarten.lankhorst@linux.intel.com
|
#
bfaae47d |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: make lockdep slightly happier about execbuf. As soon as we install fences, we should stop allocating memory in order to prevent any potential deadlocks. This is required later on, when we start adding support for dma-fence annotations. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-11-maarten.lankhorst@linux.intel.com
|
#
237647f4 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add missing -EDEADLK handling to execbuf pinning, v2. i915_vma_pin may fail with -EDEADLK when we start locking page tables, so ensure we handle this correctly. Changes since v1: - Drop -EDEADLK todo, this commit handles it. - Change eb_pin_vma from sort-of-bool + -EDEADLK to a proper int. (Matt) Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-5-maarten.lankhorst@linux.intel.com
|
#
0edbb9ba |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Move cmd parser pinning to execbuffer We need to get rid of allocations in the cmd parser, because it needs to be called from a signaling context, first move all pinning to execbuf, where we already hold all locks. Allocate jump_whitelist in the execbuffer, and add annotations around intel_engine_cmd_parser(), to ensure we only call the command parser without allocating any memory, or taking any locks we're not supposed to. Because i915_gem_object_get_page() may also allocate memory, add a path to i915_gem_object_get_sg() that prevents memory allocations, and walk the sg list manually. It should be similarly fast. This has the added benefit of being able to catch all memory allocation errors before the point of no return, and return -ENOMEM safely to the execbuf submitter. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-4-maarten.lankhorst@linux.intel.com
|
#
2eb8e1a6 |
|
17-Mar-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/gem: Drop relocation support on all new hardware (v6) The Vulkan driver in Mesa for Intel hardware never uses relocations if it's running on a version of i915 that supports at least softpin which all versions of i915 supporting Gen12 do. On the OpenGL side, Gen12+ is only supported by iris which never uses relocations. The older i965 driver in Mesa does use relocations but it only supports Intel hardware through Gen11 and has been deprecated for all hardware Gen9+. The compute driver also never uses relocations. This only leaves the media driver which is supposed to be switching to softpin going forward. Making softpin a requirement for all future hardware seems reasonable. There is one piece of hardware enabled by default in i915: RKL which was enabled by e22fa6f0a976 which has not yet landed in drm-next so this almost but not really a userspace API change for RKL. If it becomes a problem, we can always add !IS_ROCKETLAKE(eb->i915) to the condition. Rejecting relocations starting with newer Gen12 platforms has the benefit that we don't have to bother supporting it on platforms with local memory. Given how much CPU touching of memory is required for relocations, not having to do so on platforms where not all memory is directly CPU-accessible carries significant advantages. v2 (Jason Ekstrand): - Allow TGL-LP platforms as they've already shipped v3 (Jason Ekstrand): - WARN_ON platforms with LMEM support in case the check is wrong v4 (Jason Ekstrand): - Call out Rocket Lake in the commit message v5 (Jason Ekstrand): - Drop the HAS_LMEM check as it's already covered by the version check v6 (Jason Ekstrand): - Move the check to eb_validate_vma() with all the other exec_object validation checks. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210317234014.2271006-3-jason@jlekstrand.net
|
#
b5b6f6a6 |
|
17-Mar-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/gem: Drop legacy execbuffer support (v2) libdrm has supported the newer execbuffer2 ioctl and using it by default when it exists since libdrm commit b50964027bef which landed Mar 2, 2010. The i915 and i965 drivers in Mesa at the time both used libdrm and so did the Intel X11 back-end. The SNA back-end for X11 has always used execbuffer2. v2 (Jason Ekstrand): - Add a comment saying what Linux version it's being removed in. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Keith Packard <keithp@keithp.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210317234014.2271006-2-jason@jlekstrand.net
|
#
8f47c8c3 |
|
19-Jan-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/pool: constrain pool objects by mapping type In a few places we always end up mapping the pool object with the FORCE constraint(to prevent hitting -EBUSY) which will destroy the cached mapping if it has a different type. As a simple first step, make the mapping type part of the pool interface, where the behaviour is to only give out pool objects which match the requested mapping type. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210119133106.66294-4-matthew.auld@intel.com
|
#
9b3a8f55 |
|
08-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Disable arbitration on no-preempt requests If a request is submitted and known to require no preemption, disable arbitration around the batch which prevents the HW from handling a preemption request during the payload. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210108204026.20682-7-chris@chris-wilson.co.uk
|
#
bb80d878 |
|
03-Jan-2021 |
Arnd Bergmann <arnd@arndb.de> |
drm/i915: fix shift warning Randconfig builds on 32-bit machines show lots of warnings for the i915 driver for passing a 32-bit value into __const_hweight64(): drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2584:9: error: shift count >= width of type [-Werror,-Wshift-count-overflow] return hweight64(VDBOX_MASK(&i915->gt)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: expanded from macro 'hweight64' #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) Change it to hweight_long() to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210103135158.3591442-1-arnd@kernel.org
|
#
26ebc511 |
|
24-Dec-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: clear the gpu reloc batch The reloc batch is short lived but can exist in the user visible ppGTT, and since it's backed by an internal object, which lacks page clearing, we should take care to clear it upfront. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201224151358.401345-2-matthew.auld@intel.com Cc: stable@vger.kernel.org
|
#
45233ab2 |
|
16-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move gen8 CS emitters into gen8_engine_cs.h Reduce the pollution of intel_engine.h by moving gen8_emit_pipe_control and friends to gen8_engine_cs.h Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201216135452.6063-1-chris@chris-wilson.co.uk
|
#
5f22cc0b |
|
16-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix mismatch between misplaced vma check and vma insert When inserting a VMA, we restrict the placement to the low 4G unless the caller opts into using the full range. This was done to allow usersapce the opportunity to transition slowly from a 32b address space, and to avoid breaking inherent 32b assumptions of some commands. However, for insert we limited ourselves to 4G-4K, but on verification we allowed the full 4G. This causes some attempts to bind a new buffer to sporadically fail with -ENOSPC, but at other times be bound successfully. commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page") suggests that there is a genuine problem with stateless addressing that cannot utilize the last page in 4G and so we purposefully excluded it. This means that the quick pin pass may cause us to utilize a buggy placement. Reported-by: CQ Tang <cq.tang@intel.com> Testcase: igt/gem_exec_params/larger-than-life-batch Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Reviewed-by: CQ Tang <cq.tang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v4.5+ Link: https://patchwork.freedesktop.org/patch/msgid/20201216092951.7124-1-chris@chris-wilson.co.uk
|
#
e9f4829f |
|
07-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Drop false !i915_vma_is_closed assertion Closed vma are protected by the GT wakeref held as we lookup the vma, so we know that the vma will not be freed as we process it for the execbuf. Instead we expect to catch the closed status of the context, and simply allow the close-race on an individual vma to be washed away. Longer term, the GT wakeref protection will be removed by explicit vma.kref tracking. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2245 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201207193824.18114-1-chris@chris-wilson.co.uk
|
#
ba38b79e |
|
03-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Propagate error from cancelled submit due to context closure In the course of discovering and closing many races with context closure and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup") we started checking that the context was not closed by another userspace thread during the execbuf ioctl. In doing so we cancelled the inflight request (by telling it to be skipped), but kept reporting success since we do submit a request, albeit one that doesn't execute. As the error is known before we return from the ioctl, we can report the error we detect immediately, rather than leave it on the fence status. With the immediate propagation of the error, it is easier for userspace to handle. Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup") Testcase: igt/gem_ctx_exec/basic-close-race Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201203103432.31526-1-chris@chris-wilson.co.uk
|
#
641382e9 |
|
24-Dec-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: clear the gpu reloc batch The reloc batch is short lived but can exist in the user visible ppGTT, and since it's backed by an internal object, which lacks page clearing, we should take care to clear it upfront. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201224151358.401345-2-matthew.auld@intel.com Cc: stable@vger.kernel.org (cherry picked from commit 26ebc511e799f621357982ccc37a7987a56a00f4) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
0e53656a |
|
16-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix mismatch between misplaced vma check and vma insert When inserting a VMA, we restrict the placement to the low 4G unless the caller opts into using the full range. This was done to allow usersapce the opportunity to transition slowly from a 32b address space, and to avoid breaking inherent 32b assumptions of some commands. However, for insert we limited ourselves to 4G-4K, but on verification we allowed the full 4G. This causes some attempts to bind a new buffer to sporadically fail with -ENOSPC, but at other times be bound successfully. commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page") suggests that there is a genuine problem with stateless addressing that cannot utilize the last page in 4G and so we purposefully excluded it. This means that the quick pin pass may cause us to utilize a buggy placement. Reported-by: CQ Tang <cq.tang@intel.com> Testcase: igt/gem_exec_params/larger-than-life-batch Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Reviewed-by: CQ Tang <cq.tang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v4.5+ Link: https://patchwork.freedesktop.org/patch/msgid/20201216092951.7124-1-chris@chris-wilson.co.uk (cherry picked from commit 5f22cc0b134ab702d7f64b714e26018f7288ffee) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
0e124e19 |
|
03-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Propagate error from cancelled submit due to context closure In the course of discovering and closing many races with context closure and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup") we started checking that the context was not closed by another userspace thread during the execbuf ioctl. In doing so we cancelled the inflight request (by telling it to be skipped), but kept reporting success since we do submit a request, albeit one that doesn't execute. As the error is known before we return from the ioctl, we can report the error we detect immediately, rather than leave it on the fence status. With the immediate propagation of the error, it is easier for userspace to handle. Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup") Testcase: igt/gem_ctx_exec/basic-close-race Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201203103432.31526-1-chris@chris-wilson.co.uk (cherry picked from commit ba38b79eaeaeed29d2383f122d5c711ebf5ed3d1) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
d5e87821 |
|
14-Oct-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Support parsing of oversize batches Matthew Auld noted that on more recent systems (such as the parser for gen9) we may have objects that are larger than expected by the GEM uAPI (i.e. greater than u32). These objects would have incorrect implicit batch lengths, causing the parser to reject them for being incomplete, or worse. Based on a patch by Matthew Auld. Reported-by: Matthew Auld <matthew.auld@intel.com> Fixes: 435e8fc059db ("drm/i915: Allow parsing of unsized batches") Testcase: igt/gem_exec_params/larger-than-life-batch Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk (cherry picked from commit 57b2d834bf235daab388c3ba12d035c820ae09c6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
57b2d834 |
|
14-Oct-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Support parsing of oversize batches Matthew Auld noted that on more recent systems (such as the parser for gen9) we may have objects that are larger than expected by the GEM uAPI (i.e. greater than u32). These objects would have incorrect implicit batch lengths, causing the parser to reject them for being incomplete, or worse. Based on a patch by Matthew Auld. Reported-by: Matthew Auld <matthew.auld@intel.com> Fixes: 435e8fc059db ("drm/i915: Allow parsing of unsized batches") Testcase: igt/gem_exec_params/larger-than-life-batch Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
|
#
c60b93cd |
|
28-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Avoid mixing integer types during batch copies Be consistent and use unsigned long throughout the chunk copies to avoid the inherent clumsiness of mixing integer types of different widths and signs. Failing to take acount of a wider unsigned type when using min_t can lead to treating it as a negative, only for it flip back to a large unsigned value after passing a boundary check. Fixes: ed13033f0287 ("drm/i915/cmdparser: Only cache the dst vmap") Testcase: igt/gen9_exec_parse/bb-large Reported-by: "Candelaria, Jared" <jared.candelaria@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Candelaria, Jared" <jared.candelaria@intel.com> Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com> Cc: <stable@vger.kernel.org> # v4.9+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928215942.31917-1-chris@chris-wilson.co.uk (cherry picked from commit b7eeb2b4132ccf1a7d38f434cde7043913d1ed3c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
b7eeb2b4 |
|
28-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Avoid mixing integer types during batch copies Be consistent and use unsigned long throughout the chunk copies to avoid the inherent clumsiness of mixing integer types of different widths and signs. Failing to take acount of a wider unsigned type when using min_t can lead to treating it as a negative, only for it flip back to a large unsigned value after passing a boundary check. Fixes: ed13033f0287 ("drm/i915/cmdparser: Only cache the dst vmap") Testcase: igt/gen9_exec_parse/bb-large Reported-by: "Candelaria, Jared" <jared.candelaria@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Candelaria, Jared" <jared.candelaria@intel.com> Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com> Cc: <stable@vger.kernel.org> # v4.9+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928215942.31917-1-chris@chris-wilson.co.uk
|
#
166774a2 |
|
10-Sep-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Fix slightly botched merge in __reloc_entry_gpu This function should be an int, not a bool. Presumably because we had the same 2 reverts in a slightly different way, git got confused. Thanks to Dan for reporting. :) The conflict is between the 3 reverts in drm-fixes: 4993a8a37808 ("Revert "drm/i915: Remove i915_gem_object_get_dirty_page()"") ad5d95e4d538 ("Revert "drm/i915/gem: Async GPU relocations only"") 20561da3a2e1 ("Revert "drm/i915/gem: Delete unused code"") And the slightly different combined revert in drm-intel-gt-next, but with the same goal: 102a0a9051f4 ("Revert "drm/i915/gem: Async GPU relocations only"") In the merge commit 1f4b2aca794f ("Merge tag 'drm-intel-gt-next-2020-09-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-next") things went wrong, but the merge commit view now doesn't show any conflict anymore (as git tends to do when the resolution picks one or the other branch). The need to handle other than just true/false error codes in __reloc_entry_gpu was added in the dma_resv locking changes in c43ce12328df ("drm/i915: Use per object locking in execbuf, v12.") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dave Airlie <airlied@redhat.com> [danvet: Explain this entire saga a lot better, adding tons of commit references. Also note that this was merged before full intel-gfx-CI results, only after BAT, since the breakage at the BAT run is already severe enough to block all pre-merge testing.] Fixes: 1f4b2aca794f ("Merge tag 'drm-intel-gt-next-2020-09-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-next") Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200910111225.2184193-1-maarten.lankhorst@linux.intel.com
|
#
20561da3 |
|
07-Sep-2020 |
Dave Airlie <airlied@redhat.com> |
Revert "drm/i915/gem: Delete unused code" These commits caused a regression on Lenovo t520 sandybridge machine belonging to reporter. We are reverting them for 5.10 for other reasons, so just do it for 5.9 as well. This reverts commit 7ac2d2536dfa71c275a74813345779b1e7522c91. Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
|
#
ad5d95e4 |
|
07-Sep-2020 |
Dave Airlie <airlied@redhat.com> |
Revert "drm/i915/gem: Async GPU relocations only" These commits caused a regression on Lenovo t520 sandybridge machine belonging to reporter. We are reverting them for 5.10 for other reasons, so just do it for 5.9 as well. This reverts commit 9e0f9464e2ab36b864359a59b0e9058fdef0ce47. Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
|
#
47b08693 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin. As a preparation step for full object locking and wait/wound handling during pin and object mapping, ensure that we always pass the ww context in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this happens. This also requires changing the order of eb_parse slightly, to ensure we pass ww at a point where we could still handle -EDEADLK safely. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
2bf541ff |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Pin engine before pinning all objects, v5. We want to lock all gem objects, including the engine context objects, rework the throttling to ensure that we can do this. Now we only throttle once, but can take eb_pin_engine while acquiring objects. This means we will have to drop the lock to wait. If we don't have to throttle we can still take the fastpath, if not we will take the slowpath and wait for the throttle request while unlocked. The engine has to be pinned as first step, otherwise gpu relocations won't work. Changes since v1: - Only need to get a throttled request in the fastpath, no need for a global flag any more. - Always free the waited request correctly. Changes since v2: - Use intel_engine_pm_get()/put() to keeep engine pool alive during EDEADLK handling. Changes since v3: - Fix small rq leak. Changes since v4: - Use a single reloc_context, for intel_context_pin_ww(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-13-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
b49a7d51 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Nuke arguments to eb_pin_engine Those arguments are already set as eb.file and eb.args, so kill off the extra arguments. This will allow us to move eb_pin_engine() to after we reserved all BO's. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-12-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
c43ce123 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Use per object locking in execbuf, v12. Now that we changed execbuf submission slightly to allow us to do all pinning in one place, we can now simply add ww versions on top of struct_mutex. All we have to do is a separate path for -EDEADLK handling, which needs to unpin all gem bo's before dropping the lock, then starting over. This finally allows us to do parallel submission, but because not all of the pinning code uses the ww ctx yet, we cannot completely drop struct_mutex yet. Changes since v1: - Keep struct_mutex for now. :( Changes since v2: - Make sure we always lock the ww context in slowpath. Changes since v3: - Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be done on normal unlock path. - Unconditionally release vmas and context. Changes since v4: - Rebased on top of struct_mutex reduction. Changes since v5: - Remove training wheels. Changes since v6: - Fix accidentally broken -ENOSPC handling. Changes since v7: - Handle gt buffer pool better. Changes since v8: - Properly clear variables, to make -EDEADLK handling not BUG. Change since v9: - Fix unpinning fence on pnv and below. Changes since v10: - Make relocation gpu chaining working again. Changes since v11: - Remove relocation chaining, pain to make it work. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
8e4ba491 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Parse command buffer earlier in eb_relocate(slow) We want to introduce backoff logic, but we need to lock the pool object as well for command parsing. Because of this, we will need backoff logic for the engine pool obj, move the batch validation up slightly to eb_lookup_vmas, and the actual command parsing in a separate function which can get called from execbuf relocation fast and slowpath. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-8-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
1af343cd |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Remove locking from i915_gem_object_prepare_read/write Execbuffer submission will perform its own WW locking, and we cannot rely on the implicit lock there. This also makes it clear that the GVT code will get a lockdep splat when multiple batchbuffer shadows need to be performed in the same instance, fix that up. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
80f0b679 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2. i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but lets start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Changes since v1: - Change ww_ctx and obj order in locking functions (Jonas Lahtinen) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
8ae275c2 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Revert "drm/i915/gem: Split eb_vma into its own allocation" This reverts commit 0f1dd02295f3 ("drm/i915/gem: Split eb_vma into its own allocation") and also moves all unreserving to a single place at the end, which is a minor simplification. With the WW locking, we will drop all references only at the end when unlocking, so refcounting can now be removed. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-5-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
fd1500fc |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Revert "drm/i915/gem: Drop relocation slowpath". This reverts commit 7dc8f1143778 ("drm/i915/gem: Drop relocation slowpath"). We need the slowpath relocation for taking ww-mutex inside the page fault handler, and we will take this mutex when pinning all objects. We also functionally revert ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve"), as we need the struct_mutex in the slowpath as well, and a tiny part of 003d8b9143a6 ("drm/i915/gem: Only call eb_lookup_vma once during execbuf ioctl"). Specifically, we make the -EAGAIN handling part of fallback to slowpath again. With this, we have a proper working slowpath again, which will allow us to do fault handling with WW locks held. [mlankhorst: Adjusted for reloc_gpu_flush() changes] Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [mlankhorst: Removed extra reloc_gpu_flush()] Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-4-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
50ae6c61 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Revert relocation chaining commits. This reverts commit 964a9b0f611ee ("drm/i915/gem: Use chained reloc batches") and commit 0e97fbb080553 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf"). When adding ww locking to execbuf, it's hard enough to deal with a single BO that is part of relocation execution. Chaining is hard to get right, and with GPU relocation deprecated, it's best to drop this altogether, instead of trying to fix something we will remove. This is not a completely 1:1 revert, we reset rq_size to 0 in reloc_cache_init, this was from e3d291301f99 ("drm/i915/gem: Implement legacy MI_STORE_DATA_IMM"), because we don't want to break the selftests. (Daniel) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-3-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
102a0a90 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Revert "drm/i915/gem: Async GPU relocations only" This reverts commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), and related commit 7ac2d2536dfa7 ("drm/i915/gem: Delete unused code"). Async GPU relocations are not the path forward, we want to remove GPU accelerated relocation support eventually when userspace is fixed to use VM_BIND, and this is the first step towards that. We will keep async gpu relocations around for now, until userspace is fixed. Relocation support will be disabled completely on platforms where there was never any userspace that depends on it, as the hardware doesn't require it from at least gen9+ onward. For older platforms, the plan is to use cpu relocations only. The igt side is fixed in igt commit 39e9aa1032a4e ("tests/i915: Remove subtests that rely on async relocation behavior"). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-2-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
da1ea128 |
|
06-Aug-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Free the fence after a fence-chain lookup failure If dma_fence_chain_find_seqno() reports an error, it does so in its preamble before it disposes of the input fence. On handling the error, we need to drop the reference to the fence. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2292 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 13149e8bafc4 ("drm/i915: add syncobj timeline support") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806161056.17593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
5d934137 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export a preallocate variant of i915_active_acquire() Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
27a5dcfe |
|
28-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Remove disordered per-file request list for throttling I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file. That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request. It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
13149e8b |
|
04-Aug-2020 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: add syncobj timeline support Introduces a new parameters to execbuf so that we can specify syncobj handles as well as timeline points. v2: Reuse i915_user_extension_fn v3: Check that the chained extension is only present once (Chris) v4: Check that dma_fence_chain_find_seqno returns a non NULL fence (Lionel) v5: Use BIT_ULL (Chris) v6: Fix issue with already signaled timeline points, dma_fence_chain_find_seqno() setting fence to NULL (Chris) v7: Report ENOENT with invalid syncobj handle (Lionel) v8: Check for out of order timeline point insertion (Chris) v9: After explanations on https://lists.freedesktop.org/archives/dri-devel/2019-August/229287.html drop the ordering check from v8 (Lionel) v10: Set first extension enum item to 1 (Jason) v11: Rebase v12: Allow multiple extension nodes of timeline syncobj (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-authored-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v11) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200804085954.350343-3-lionel.g.landwerlin@intel.com Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
cda9edd0 |
|
04-Aug-2020 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: introduce a mechanism to extend execbuf2 We're planning to use this for a couple of new feature where we need to provide additional parameters to execbuf. v2: Check for invalid flags in execbuffer2 (Lionel) v3: Rename I915_EXEC_EXT -> I915_EXEC_USE_EXTENSIONS (Chris) v4: Rebase Move array fence parsing in i915_gem_do_execbuffer() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200804085954.350343-2-lionel.g.landwerlin@intel.com Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
792592e7 |
|
07-Jul-2020 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915: Move the engine mask to intel_gt_info Since the engines belong to the GT, move the runtime-updated list of available engines to the intel_gt struct. The original mask has been renamed to indicate it contains the maximum engine list that can be found on a matching device. In preparation for other info being moved to the gt in follow up patches (sseu), introduce an intel_gt_info structure to group all gt-related runtime info. v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
|
#
f7ce8639 |
|
02-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Split the context's obj:vma lut into its own mutex Rather than reuse the common ctx->mutex for locking the execbuffer LUT, split it into its own lock to avoid being taken [as part of ctx->mutex] at inappropriate times. In particular to avoid the inversion from taking the timeline->mutex for the whole execbuf submission in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200703004306.11117-1-chris@chris-wilson.co.uk
|
#
096a42dd |
|
01-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Move obj->lut_list under its own lock The obj->lut_list is traversed when the object is closed as the file table is destroyed during process termination. As this occurs before we kill any outstanding context if, due to some bug or another, the closure is blocked, then we fail to shootdown any inflight operations potentially leaving the GPU spinning forever. As we only need to guard the list against concurrent closures and insertions, the hold is short and merits being treated as a simple spinlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200701084439.17025-1-chris@chris-wilson.co.uk
|
#
d7466a5a |
|
04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Mark the buffer pool as active for the cmdparser If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would attempt to clean up the cmdparser early. While we held active references to the vma being parsed and constructed, we did not hold an active reference for the buffer pool itself. The result was that an interrupted execbuf could still have run the cmdparser pipeline, but since the buffer pool was idle, its target vma could have been recycled. Note this problem only occurs if the cmdparser is running async due to pipelined waits on busy fences, and the execbuf is interrupted. Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Fixes: 16e87459673a ("drm/i915/gt: Move the batch buffer pool from the engine to the gt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604103751.18816-1-chris@chris-wilson.co.uk (cherry picked from commit 57a78ca4eceab1ecb0299fba8a10211289329889) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
7ac2d253 |
|
05-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Delete unused code Unused as of commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), but left behind. >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:933:21: error: unused function 'unmask_page' [-Werror,-Wunused-function] static inline void *unmask_page(unsigned long p) ^ >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:938:28: error: unused function 'unmask_flags' [-Werror,-Wunused-function] static inline unsigned int unmask_flags(unsigned long p) ^ >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:945:33: error: unused function 'cache_to_ggtt' [-Werror,-Wunused-function] static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache) Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200605200357.13069-1-chris@chris-wilson.co.uk
|
#
9e0f9464 |
|
04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Async GPU relocations only Reduce the 3 relocation paths down to the single path that accommodates all. The primary motivation for this is to guard the relocations with a natural fence (derived from the i915_request used to write the relocation from the GPU). The tradeoff in using async gpu relocations is that it increases latency over using direct CPU relocations, for the cases where the target is idle and accessible by the CPU. The benefit is greatly reduced lock contention and improved concurrency by pipelining. Note that forcing the async gpu relocations does reveal a few issues they have. Firstly, is that they are visible as writes to gem_busy, causing to mark some buffers are being to written to by the GPU even though userspace only reads. Secondly is that, in combination with the cmdparser, they can cause priority inversions. This should be the case where the work is being put into a common workqueue losing our priority information and so being executed in FIFO from the worker, denying us the opportunity to reorder the requests afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604211457.19696-1-chris@chris-wilson.co.uk
|
#
57a78ca4 |
|
04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Mark the buffer pool as active for the cmdparser If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would attempt to clean up the cmdparser early. While we held active references to the vma being parsed and constructed, we did not hold an active reference for the buffer pool itself. The result was that an interrupted execbuf could still have run the cmdparser pipeline, but since the buffer pool was idle, its target vma could have been recycled. Note this problem only occurs if the cmdparser is running async due to pipelined waits on busy fences, and the execbuf is interrupted. Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Fixes: 16e87459673a ("drm/i915/gt: Move the batch buffer pool from the engine to the gt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604103751.18816-1-chris@chris-wilson.co.uk
|
#
5a833995 |
|
02-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop i915_request.i915 backpointer We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the most frequently allocated struct vs an 3 bytes and pointer chasing in using rq->engine->i915? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
|
#
ea97c4ca |
|
25-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Suppress some random warnings Leave the error propagation in place, but limit the warnings to only show up in CI if the unlikely errors are hit. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200525141957.3061-2-chris@chris-wilson.co.uk
|
#
6db20e27 |
|
04-May-2020 |
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> |
drm/i915/gem: Prefer drm_WARN* over WARN* struct drm_device specific drm_WARN* macros include device information in the backtrace, so we know what device the warnings originate from. Prefer drm_WARN* over WARN* at places where struct drm_device pointer can be extracted. Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504181600.18503-6-pankaj.laxminarayan.bharadiya@intel.com
|
#
18e4af04 |
|
13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop no-semaphore boosting Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
|
#
889333c7 |
|
13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Remove redundant exec_fence Since there can only be one of in_fence/exec_fence, just use the single in_fence local. v2: Consolidate lookup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513180937.28992-1-chris@chris-wilson.co.uk
|
#
eec39e44 |
|
07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove wait priority boosting Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
|
#
e3d29130 |
|
04-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Implement legacy MI_STORE_DATA_IMM The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but left them writing to a physical address. The notes suggest that the primary reason would be so that the writes were cache coherent, as the CPU cache uses physical tagging. As such we did not implement the legacy variant of MI_STORE_DATA_IMM and so left all the relocations synchronous -- but with a small function to convert from the vma address into the physical address, we can implement asynchronous relocs on these older arches, fixing up a few tests that require them. In order to be able to test the legacy paths, refactor the gpu relocations so that we can hook them up to a selftest. v2: Use an array of offsets not enum labels for the selftest v3: Refactor the common igt_hexdump() Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504140629.28240-1-chris@chris-wilson.co.uk
|
#
f5b62bdb |
|
04-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Specify address type for chained reloc batches It is required that a chained batch be in the same address domain as its parent, and also that must be specified in the command for earlier gen as it is not inferred from the chaining until gen6. Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504125149.4396-1-chris@chris-wilson.co.uk
|
#
6f576d62 |
|
01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Try an alternate engine for relocations If at first we don't succeed, try try again. Not all engines may support the MI ops we need to perform asynchronous relocation patching, and so we end up falling back to a synchronous operation that has a liability of blocking. However, Tvrtko pointed out we don't need to use the same engine to perform the relocations as we are planning to execute the execbuf on, and so if we switch over to a working engine, we can perform the relocation asynchronously. The user execbuf will be queued after the relocations by virtue of fencing. This patch creates a new context per execbuf requiring asynchronous relocations on an unusable engines. This is perhaps a bit excessive and can be ameliorated by a small context cache, but for the moment we only need it for working around a little used engine on Sandybridge, and only if relocations are actually required to an active batch buffer. Now we just need to teach the relocation code to handle physical addressing for gen2/3, and we should then have universal support! Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Testcase: igt/gem_exec_reloc/basic-spin # snb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-3-chris@chris-wilson.co.uk
|
#
0e97fbb0 |
|
01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Use a single chained reloc batches for a single execbuf As we can now keep chaining together a relocation batch to process any number of relocations, we can keep building that relocation batch for all of the target vma. This avoiding emitting a new request into the ring for each target, consuming precious ring space and a potential stall. v2: Propagate the failure from submitting the relocation batch. Testcase: igt/gem_exec_reloc/basic-wide-active Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-2-chris@chris-wilson.co.uk
|
#
964a9b0f |
|
01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Use chained reloc batches The ring is a precious resource: we anticipate to only use a few hundred bytes for a request, and only try to reserve that before we start. If we go beyond our guess in building the request, then instead of waiting at the start of execbuf before we hold any locks or other resources, we may trigger a wait inside a critical region. One example is in using gpu relocations, where currently we emit a new MI_BB_START from the ring every time we overflow a page of relocation entries. However, instead of insert the command into the precious ring, we can chain the next page of relocation entries as MI_BB_START from the end of the previous. v2: Delay the emit_bb_start until after all the chained vma synchronisation is complete. Since the buffer pool batches are idle, this _should_ be a no-op, but one day we may some fancy async GPU bindings for new vma! v3: Use pool/batch consitently, once we start thinking in terms of the batch vma, use batch->obj. v4: Explain the magic number 4. Tvrtko spotted that we lose propagation of the error for failing to submit the relocation request; that's easier to fix up in the next patch. Testcase: igt/gem_exec_reloc/basic-many-active Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-1-chris@chris-wilson.co.uk
|
#
b44f6873 |
|
03-Apr-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
drm/i915/gem: Replace user_access_begin by user_write_access_begin When i915_gem_execbuffer2_ioctl() is using user_access_begin(), that's only to perform unsafe_put_user() so use user_write_access_begin() in order to only open write access. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ebf1250b6d4f351469fb339e5399d8b92aa8a1c1.1585898438.git.christophe.leroy@c-s.fr
|
#
16e87459 |
|
29-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move the batch buffer pool from the engine to the gt Since the introduction of 'soft-rc6', we aim to park the device quickly and that results in frequent idling of the whole device. Currently upon idling we free the batch buffer pool, and so this renders the cache ineffective for many workloads. If we want to have an effective cache of recently allocated buffers available for reuse, we need to decouple that cache from the engine powermanagement and make it timer based. As there is no reason then to keep it within the engine (where it once made retirement order easier to track), we can move it up the hierarchy to the owner of the memory allocations. v2: Hook up to debugfs/drop_caches to clear the cache on demand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200430111819.10262-2-chris@chris-wilson.co.uk
|
#
50689771 |
|
22-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only close vma we open The history of i915_vma_close() is confusing, as is its use. As the lifetime of the i915_vma is currently bounded by the object it is attached to, we needed a means of identify when a vma was no longer in use by userspace (via the user's fd). This is further complicated by that only ppgtt vma should be closed at the user's behest, as the ggtt were always shared. Now that we attach the vma to a lut on the user's context, the open count does indicate how many unique and open context/vm are referencing this vma from the user. As such, we can and should just use the open_count to track when the vma is still in use by userspace. It's a poor man's replacement for reference counting. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1193 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422190558.30509-1-chris@chris-wilson.co.uk
|
#
e94f7856 |
|
07-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Promote 'remain' to unsigned long Tidy the code by casting remain to unsigned long once for the duration of eb_relocate_vma() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200407085930.19421-1-chris@chris-wilson.co.uk
|
#
1aaea847 |
|
05-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Flush all the reloc_gpu batch __i915_gem_object_flush_map() takes a byte range, so feed it the written bytes and do not mistake the u32 index as bytes! Fixes: a679f58d0510 ("drm/i915: Flush pages on acquisition") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406114821.10949-1-chris@chris-wilson.co.uk (cherry picked from commit 30c88a47f1abd5744908d3681f54dcf823fe2a12) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
721017cf |
|
31-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Ignore readonly failures when updating relocs If the user passes in a readonly reloc[], by the time we notice we have already committed to modifying the execobjects, or have indeed done so already. Reporting the failure just compounds the issue as we have no second pass to fall back to anymore. "Be damned if you do, and damned if you don't." Testcase: igt/gem_exec_reloc/readonly Fixes: 7dc8f1143778 ("drm/i915/gem: Drop relocation slowpath") References: fddcd00a49e9 ("drm/i915: Force the slow path after a user-write error") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331162150.3635-1-chris@chris-wilson.co.uk (cherry picked from commit 97a37c919f6262fe75afc4a4eb838093bf18b032) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
39d571d1 |
|
06-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Take DBG_FORCE_RELOC into account prior to using reloc_gpu If we set the debug flag to force ourselves not to relocate via the gpu, do not relocate via the gpu. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406123616.7334-1-chris@chris-wilson.co.uk
|
#
30c88a47 |
|
05-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Flush all the reloc_gpu batch __i915_gem_object_flush_map() takes a byte range, so feed it the written bytes and do not mistake the u32 index as bytes! Fixes: a679f58d0510 ("drm/i915: Flush pages on acquisition") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406114821.10949-1-chris@chris-wilson.co.uk
|
#
8a338f4b |
|
01-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Try allocating va from free space If the current node/entry location is occupied, and the object is not pinned, try assigning it some free space. We cannot wait here, so if in doubt, we unreserve and try to grab all at once. v2: Use the final pin_flags so that we won't have to move the object if we find the wrong free space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200401194135.5442-1-chris@chris-wilson.co.uk
|
#
97a37c91 |
|
31-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Ignore readonly failures when updating relocs If the user passes in a readonly reloc[], by the time we notice we have already committed to modifying the execobjects, or have indeed done so already. Reporting the failure just compounds the issue as we have no second pass to fall back to anymore. "Be damned if you do, and damned if you don't." Testcase: igt/gem_exec_reloc/readonly Fixes: 7dc8f1143778 ("drm/i915/gem: Drop relocation slowpath") References: fddcd00a49e9 ("drm/i915: Force the slow path after a user-write error") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331162150.3635-1-chris@chris-wilson.co.uk
|
#
0f1dd022 |
|
30-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Split eb_vma into its own allocation Use a separate array allocation for the execbuf vma, so that we can track their lifetime independently from the copy of the user arguments. With luck, this has a secondary benefit of splitting the malloc size to within reason and avoid vmalloc. The downside is that we might require two separate vmallocs -- but much less likely. In the process, this prevents a memory leak on the ww_mutex error unwind. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1390 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200330133710.14385-1-chris@chris-wilson.co.uk
|
#
7bf03e75 |
|
13-Feb-2020 |
Nathan Chancellor <nathan@kernel.org> |
drm/i915: Cast remain to unsigned long in eb_relocate_vma A recent commit in clang added -Wtautological-compare to -Wall, which is enabled for i915 after -Wtautological-compare is disabled for the rest of the kernel so we see the following warning on x86_64: ../drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1433:22: warning: result of comparison of constant 576460752303423487 with expression of type 'unsigned int' is always false [-Wtautological-constant-out-of-range-compare] if (unlikely(remain > N_RELOC(ULONG_MAX))) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ ../include/linux/compiler.h:78:42: note: expanded from macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ 1 warning generated. It is not wrong in the case where ULONG_MAX > UINT_MAX but it does not account for the case where this file is built for 32-bit x86, where ULONG_MAX == UINT_MAX and this check is still relevant. Cast remain to unsigned long, which keeps the generated code the same (verified with clang-11 on x86_64 and GCC 9.2.0 on x86 and x86_64) and the warning is silenced so we can catch more potential issues in the future. Closes: https://github.com/ClangBuiltLinux/linux/issues/778 Suggested-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200214054706.33870-1-natechancellor@gmail.com
|
#
2e46a2a0 |
|
19-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use explicit flag to mark unreachable intel_context I need to keep the GEM context around a bit longer so adding an explicit flag for syncing execbuf with closed/abandonded contexts. v2: * Use already available context flags. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200319170707.8262-1-chris@chris-wilson.co.uk (cherry picked from commit 207e4a71fb53e761be72daaeb78a49225bc31c69) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
92581f9f |
|
24-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Immediately execute the fenced work If the caller allows and we do not have to wait for any signals, immediately execute the work within the caller's process. By doing so we avoid the overhead of scheduling a new task, and the latency in executing it, at the cost of pulling that work back into the immediate context. (Sometimes we still prefer to offload the task to another cpu, especially if we plan on executing many such tasks in parallel for this client.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200325120227.8044-2-chris@chris-wilson.co.uk
|
#
41e4065a |
|
23-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rely on direct submission to the queue Drop the pretense of kicking the tasklet (used only for the defunct guc submission backend, it should just take ownership of the submit!) and so remove the bh-kicking from around submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-5-chris@chris-wilson.co.uk
|
#
93159e12 |
|
23-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Avoid gem_context->mutex for simple vma lookup As we store the handle lookup inside a radix tree, we do not need the gem_context->mutex except until we need to insert our lookup into the common radix tree. This takes a small bit of rearranging to ensure that the lut we insert into the tree is ready prior to actually inserting it (as soon as it is exposed via the radixtree, it is visible to any other submission). v2: For brownie points, remove the goto spaghetti. v3: Tighten up the closed-handle checks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-8-chris@chris-wilson.co.uk
|
#
207e4a71 |
|
19-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use explicit flag to mark unreachable intel_context I need to keep the GEM context around a bit longer so adding an explicit flag for syncing execbuf with closed/abandonded contexts. v2: * Use already available context flags. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200319170707.8262-1-chris@chris-wilson.co.uk
|
#
7dc8f114 |
|
11-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Drop relocation slowpath Since the relocations are no longer performed under a global struct_mutex, or any other lock, that is also held by pagefault handlers, we can relax and allow our fast path to take a fault. As we no longer need to abort the fast path for lock avoidance, we no longer need the slow path handling at all. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200311160310.26711-1-chris@chris-wilson.co.uk
|
#
1d61c5d7 |
|
05-Mar-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: be more solid in checking the alignment The alignment is u64, and yet is_power_of_2() assumes unsigned long, which might give different results between 32b and 64b kernel. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200305203534.210466-1-matthew.auld@intel.com Cc: stable@vger.kernel.org (cherry picked from commit 2920516b2f719546f55079bc39a7fe409d9e80ab) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
ef398881 |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Limit struct_mutex to eb_reserve We only need to serialise the multiple pinning during the eb_reserve phase. Ideally this would be using the vm->mutex as an outer lock, or using a composite global mutex (ww_mutex), but at the moment we are using struct_mutex for the group. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1381 Fixes: 003d8b9143a6 ("drm/i915/gem: Only call eb_lookup_vma once during execbuf ioctl") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-3-chris@chris-wilson.co.uk
|
#
2920516b |
|
05-Mar-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: be more solid in checking the alignment The alignment is u64, and yet is_power_of_2() assumes unsigned long, which might give different results between 32b and 64b kernel. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200305203534.210466-1-matthew.auld@intel.com Cc: stable@vger.kernel.org
|
#
36e191f0 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Apply i915_request_skip() on submission Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
|
#
003d8b91 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Only call eb_lookup_vma once during execbuf ioctl As we no longer stash anything inside i915_vma under the exclusive protection of struct_mutex, we do not need to revoke the i915_vma stashes before dropping struct_mutex to handle pagefaults. Knowing that we must drop the struct_mutex while keeping the eb->vma around, means that we are required to hold onto to the object reference until we have marked the vma as active. Fixes: 155ab8836caa ("drm/i915: Move object close under its own lock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-3-chris@chris-wilson.co.uk
|
#
7d6236bb |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Extract transient execbuf flags from i915_vma For our convenience, and to avoid frequent allocations, we placed some lists we use for execbuf inside the common i915_vma struct. As we look to parallelise execbuf, such fields guarded by the struct_mutex BKL must be pulled under local control. Instead of using the i915_vma as our primary means of tracking the user's list of objects and their virtual mappings, we use a local eb_vma with the same lists as before (just now local not global). This should allow us to only perform the lookup of vma used for execution once during the execbuf ioctl, as currently we need to remove our secrets from inside i915_vma everytime we drop the struct_mutex as another execbuf may use the shared locations. Once potential user visible consequence is that we can remove the requirement that the execobj[] be unique, and only require that they do not conflict (i.e. you cannot softpin the same object into two locations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-2-chris@chris-wilson.co.uk
|
#
2920bb94 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop inspection of execbuf flags during evict With the goal of removing the serialisation from around execbuf, we will no longer have the privilege of there being a single execbuf in flight at any time and so will only be able to inspect the user's flags within the carefully controlled execbuf context. i915_gem_evict_for_node() is the only user outside of execbuf that currently peeks at the flag to convert an overlapping softpinned request from ENOSPC to EINVAL. Retract this nicety and only report ENOSPC if the location is in current use, either due to this execbuf or another. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-1-chris@chris-wilson.co.uk
|
#
61231f6b |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Check that the context wasn't closed during setup As setup takes a long time, the user may close the context during the construction of the execbuf. In order to make sure we correctly track all outstanding work with non-persistent contexts, we need to serialise the submission with the context closure and mop up any leaks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303080546.1140508-3-chris@chris-wilson.co.uk
|
#
83d2bdb6 |
|
25-Feb-2020 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: significantly reduce the use of <drm/i915_drm.h> The #include has been splattered all over the place, but there are precious few places, all .c files, that actually need it. v2: remove leftover double newlines Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225133131.3301-1-jani.nikula@intel.com
|
#
cb4d5dc3 |
|
25-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Honour O_NONBLOCK before throttling execbuf submissions Check the user's flags on the struct file before deciding whether or not to stall before submitting a request. This allows us to reasonably cheaply honour O_NONBLOCK without checking at more critical phases during request submission. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Steve Carbonari <steven.carbonari@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225192206.1107336-3-chris@chris-wilson.co.uk
|
#
2c59fd06 |
|
25-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Cleanup shadow batch after I915_EXEC_SECURE Tidy up after a call to eb_parse() if a later bind fails. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1312 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225082233.274530-4-chris@chris-wilson.co.uk
|
#
01c1b2cb |
|
13-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Take local vma references for the parser Take and hold a reference to each of the vma (and their objects) as we process them with the cmdparser. This stops them being freed during the work if the GEM execbuf is interrupted and the request we expected to keep the objects alive is incomplete. Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Closes: https://gitlab.freedesktop.org/drm/intel/issues/970 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200113154555.1909639-1-chris@chris-wilson.co.uk (cherry picked from commit 36c8e356a76e147f0b631fd29838147c01b50d04) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
34ffabe3 |
|
24-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove 'prefault_disable' modparam The 'prefault_disable' modparam was used by IGT to prevent a few prefaulting operations to make fault handling under struct_mutex more prominent. With the removal of struct_mutex, this is not as important any more and we have almost completely stopped using the parameter. The remaining use in execbuf is now immaterial and can be dropped without affecting coverage. We must re-address the idea of fault injection though. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200124230656.687503-1-chris@chris-wilson.co.uk
|
#
d0bf4582 |
|
22-Jan-2020 |
Wambui Karuga <wambui.karugax@gmail.com> |
drm/i915/gem: manual conversion to struct drm_device logging macros. Convert most of the remaining uses of the printk based logging macros to the new struct drm_device based logging macros in drm/i915/gem. This also involves extracting the struct drm_i915_private device from various types, and using it in the various macros. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122125750.9737-3-wambui.karugax@gmail.com
|
#
baa89ba3 |
|
22-Jan-2020 |
Wambui Karuga <wambui.karugax@gmail.com> |
drm/i915/gem: initial conversion to new logging macros using coccinelle First pass of conversion to the new struct drm_based device logging macros in the drm/i915/gem directory. This conversion was achieved using the following coccinelle script that transforms based on the existence of a straightforward struct drm_i915_private device: @rule1@ identifier fn, T; @@ fn(struct drm_i915_private *T,...) { <+... ( -DRM_INFO( +drm_info(&T->drm, ...) | -DRM_ERROR( +drm_err(&T->drm, ...) | -DRM_WARN( +drm_warn(&T->drm, ...) | -DRM_DEBUG( +drm_dbg(&T->drm, ...) | -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) | -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) | -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } @rule2@ identifier fn, T; @@ fn(...) { ... struct drm_i915_private *T = ...; <+... ( -DRM_INFO( +drm_info(&T->drm, ...) | -DRM_ERROR( +drm_err(&T->drm, ...) | -DRM_WARN( +drm_warn(&T->drm, ...) | -DRM_DEBUG( +drm_dbg(&T->drm, ...) | -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) | -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) | -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } Checkpatch warnings were addressed manually. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122125750.9737-2-wambui.karugax@gmail.com
|
#
36c8e356 |
|
13-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Take local vma references for the parser Take and hold a reference to each of the vma (and their objects) as we process them with the cmdparser. This stops them being freed during the work if the GEM execbuf is interrupted and the request we expected to keep the objects alive is incomplete. Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Closes: https://gitlab.freedesktop.org/drm/intel/issues/970 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200113154555.1909639-1-chris@chris-wilson.co.uk
|
#
e1c31fb5 |
|
06-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Merge i915_request.flags with i915_request.fence.flags As we already have a flags field buried within i915_request, reuse it! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk
|
#
9f3ccd40 |
|
20-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
|
#
a76cf569 |
|
17-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Keep request alive while attaching fences Since commit e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex"), the request retirement can happen outside of the struct_mutex serialised only by the timeline->mutex. We drop the timeline->mutex on submitting the request (i915_request_add) so after that point, it is liable to be freed. Make sure our local reference is kept alive until we have finished attaching it to the signalers. (Note that this erodes the argument that i915_request_add should consume the reference, but that is a slightly larger patch!) Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217134729.3297818-1-chris@chris-wilson.co.uk (cherry picked from commit e14177f19739d74839eb496a27f5f5d958beaa5b) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
e14177f1 |
|
17-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Keep request alive while attaching fences Since commit e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex"), the request retirement can happen outside of the struct_mutex serialised only by the timeline->mutex. We drop the timeline->mutex on submitting the request (i915_request_add) so after that point, it is liable to be freed. Make sure our local reference is kept alive until we have finished attaching it to the signalers. (Note that this erodes the argument that i915_request_add should consume the reference, but that is a slightly larger patch!) Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217134729.3297818-1-chris@chris-wilson.co.uk
|
#
686c7c35 |
|
11-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Asynchronous cmdparser Execute the cmdparser asynchronously as part of the submission pipeline. Using our dma-fences, we can schedule execution after an asynchronous piece of work, so we move the cmdparser out from under the struct_mutex inside execbuf as run it as part of the submission pipeline. The same security rules apply, we copy the user batch before validation and userspace cannot touch the validation shadow. The only caveat is that we will do request construction before we complete cmdparsing and so we cannot know the outcome of the validation step until later -- so the execbuf ioctl does not report -EINVAL directly, but we must cancel execution of the request and flag the error on the out-fence. Closes: https://gitlab.freedesktop.org/drm/intel/issues/611 Closes: https://gitlab.freedesktop.org/drm/intel/issues/412 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211230858.599030-2-chris@chris-wilson.co.uk
|
#
32d94048 |
|
11-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Prepare gen7 cmdparser for async execution The gen7 cmdparser is primarily a promotion-based system to allow access to additional registers beyond the HW validation, and allows fallback to normal execution of the user batch buffer if valid and requires chaining. In the next patch, we will do the cmdparser validation in the pipeline asynchronously and so at the point of request construction we will not know if we want to execute the privileged and validated batch, or the original user batch. The solution employed here is to execute both batches, one with raised privileges and one as normal. This is because the gen7 MI_BATCH_BUFFER_START command cannot change privilege level within a batch and must strictly use the current privilege level (or undefined behaviour kills the GPU). So in order to execute the original batch, we need a second non-priviledged batch buffer chain from the ring, i.e. we need to emit two batches for each user batch. Inside the two batches we determine which one should actually execute, we provide a conditional trampoline to call the original batch. Implementation-wise, we create a single buffer and write the shadow and the trampoline inside it at different offsets; and bind the buffer into both the kernel GGTT for the privileged execution of the shadow and into the user ppGTT for the non-privileged execution of the trampoline and original batch. One buffer, two batches and two vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211230858.599030-1-chris@chris-wilson.co.uk
|
#
51696691 |
|
11-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Tidy up error handling for eb_parse() As the caller no longer uses the i915_vma result, stop returning it and just return the error code instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211110437.4082687-4-chris@chris-wilson.co.uk
|
#
755bf8a8 |
|
11-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove redundant parameters from intel_engine_cmd_parser Declutter the calling interface by reducing the parameters to the i915_vma and associated offsets. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211110437.4082687-2-chris@chris-wilson.co.uk
|
#
00aff3f6 |
|
08-Dec-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Improve execbuf debug Convert i915_gem_check_execbuffer to return the error code instead of a boolean so our neat EINVAL debugging trick works within this function. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191209122314.16289-1-tvrtko.ursulin@linux.intel.com
|
#
05975cd9 |
|
04-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove vestigal i915_gem_context locals from cmdparser The use GEM context itself was removed in commit cd30a5031704 ("drm/i915/gem: Excise the per-batch whitelist from the context"), but the locals were left in place as an oversight. Remove the parameters and clean up. References: cd30a5031704 ("drm/i915/gem: Excise the per-batch whitelist from the context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191204232616.94397-1-chris@chris-wilson.co.uk
|
#
d92f77de |
|
28-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
Revert "drm/i915: use a separate context for gpu relocs" Since commit c45e788d95b4 ("drm/i915/tgl: Suspend pre-parser across GTT invalidations"), we now disable the advanced preparser on Tigerlake for the invalidation phase at the start of the batch, we no longer need to emit the GPU relocations from a second context as they are now flushed inlined. References: 8a9a982767b7 ("drm/i915: use a separate context for gpu relocs") References: c45e788d95b4 ("drm/i915/tgl: Suspend pre-parser across GTT invalidations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129124846.949100-1-chris@chris-wilson.co.uk
|
#
b291ce0a |
|
15-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Purge the sudden reappearance of i915_gem_object_pin() This died many years ago as we now use i915_vma first and foremost. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191115170835.1367869-1-chris@chris-wilson.co.uk
|
#
f8c08d8f |
|
20-Sep-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915/cmdparser: Add support for backward jumps To keep things manageable, the pre-gen9 cmdparser does not attempt to track any form of nested BB_START's. This did not prevent usermode from using nested starts, or even chained batches because the cmdparser is not strictly enforced pre gen9. Instead, the existence of a nested BB_START would cause the batch to be emitted in insecure mode, and any privileged capabilities would not be available. For Gen9, the cmdparser becomes mandatory (for BCS at least), and so not providing any form of nested BB_START support becomes overly restrictive. Any such batch will simply not run. We make heavy use of backward jumps in igt, and it is much easier to add support for this restricted subset of nested jumps, than to rewrite the whole of our test suite to avoid them. Add the required logic to support limited backward jumps, to instructions that have already been validated by the parser. Note that it's not sufficient to simply approve any BB_START that jumps backwards in the buffer because this would allow an attacker to embed a rogue instruction sequence within the operand words of a harmless instruction (say LRI) and jump to that. We introduce a bit array to track every instr offset successfully validated, and test the target of BB_START against this. If the target offset hits, it is re-written to the same offset in the shadow buffer and the BB_START cmd is allowed. Note: This patch deliberately ignores checkpatch issues in the cmdtables, in order to match the style of the surrounding code. We'll correct the entire file in one go in a later patch. v2: set dispatch secure late (Mika) v3: rebase (Mika) v4: Clear whitelist on each parse Minor review updates (Chris) v5: Correct backward jump batching v6: fix compilation error due to struct eb shuffle (Mika) Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
435e8fc0 |
|
01-Aug-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915: Allow parsing of unsized batches In "drm/i915: Add support for mandatory cmdparsing" we introduced the concept of mandatory parsing. This allows the cmdparser to be invoked even when user passes batch_len=0 to the execbuf ioctl's. However, the cmdparser needs to know the extents of the buffer being scanned. Refactor the code to ensure the cmdparser uses the actual object size, instead of the incoming length, if user passes 0. Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
4f7af194 |
|
22-May-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915: Support ro ppgtt mapped cmdparser shadow buffers For Gen7, the original cmdparser motive was to permit limited use of register read/write instructions in unprivileged BB's. This worked by copying the user supplied bb to a kmd owned bb, and running it in secure mode, from the ggtt, only if the scanner finds no unsafe commands or registers. For Gen8+ we can't use this same technique because running bb's from the ggtt also disables access to ppgtt space. But we also do not actually require 'secure' execution since we are only trying to reduce the available command/register set. Instead we will copy the user buffer to a kmd owned read-only bb in ppgtt, and run in the usual non-secure mode. Note that ro pages are only supported by ppgtt (not ggtt), but luckily that's exactly what we need. Add the required paths to map the shadow buffer to ppgtt ro for Gen8+ v2: IS_GEN7/IS_GEN (Mika) v3: rebase v4: rebase v5: rebase Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
311a50e7 |
|
01-Aug-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915: Add support for mandatory cmdparsing The existing cmdparser for gen7 can be bypassed by specifying batch_len=0 in the execbuf call. This is safe because bypassing simply reduces the cmd-set available. In a later patch we will introduce cmdparsing for gen9, as a security measure, which must be strictly enforced since without it we are vulnerable to DoS attacks. Introduce the concept of 'required' cmd parsing that cannot be bypassed by submitting zero-length bb's. v2: rebase (Mika) v2: rebase (Mika) v3: fix conflict on engine flags (Mika) Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
66d8aba1 |
|
08-Jun-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915: Remove Master tables from cmdparser The previous patch has killed support for secure batches on gen6+, and hence the cmdparsers master tables are now dead code. Remove them. Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
44157641 |
|
08-Jun-2018 |
Jon Bloomfield <jon.bloomfield@intel.com> |
drm/i915: Disable Secure Batches for gen6+ Retroactively stop reporting support for secure batches through the api for gen6+ so that older binaries trigger the fallback path instead. Older binaries use secure batches pre gen6 to access resources that are not available to normal usermode processes. However, all known userspace explicitly checks for HAS_SECURE_BATCHES before relying on the secure batch feature. Since there are no known binaries relying on this for newer gens we can kill secure batches from gen6, via I915_PARAM_HAS_SECURE_BATCHES. v2: rebase (Mika) v3: rebase (Mika) Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
|
#
2871ea85 |
|
24-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
#
9cd20ef7 |
|
14-Oct-2019 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915/perf: allow holding preemption on filtered ctx We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function v5: Add nopreempt flag on gem context and always flag requests appropriately, regarless of OA reconfiguration. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk
|
#
a4e7ccda |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move context management under GEM Keep track of the GEM contexts underneath i915->gem.contexts and assign them their own lock for the purposes of list management. v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex v3: Correct split with removal of logical HW ID Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk
|
#
2850748e |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull i915_vma_pin under the vm->mutex Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
|
#
b290a78b |
|
03-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use helpers for drm_mm_node booleans A subset of 71724f708997 ("drm/mm: Use helpers for drm_mm_node booleans") in order to prepare drm-intel-next-queued for subsequent patches before we can backmerge 71724f708997 itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004142226.13711-1-chris@chris-wilson.co.uk
|
#
4ee92c71 |
|
03-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/mm: Convert drm_mm_node booleans to bitops A straightforward conversion of assignment and checking of the boolean state flags (allocated, scanned) into non-atomic bitops. The caller remains responsible for all locking around the drm_mm and its nodes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-4-chris@chris-wilson.co.uk
|
#
71724f70 |
|
03-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/mm: Use helpers for drm_mm_node booleans In preparation for rearranging the booleans into a flags field, ensure all the current users are using the inline helpers and not directly accessing the members. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-3-chris@chris-wilson.co.uk
|
#
8a9a9827 |
|
27-Aug-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915: use a separate context for gpu relocs The CS pre-parser can pre-fetch commands across memory sync points and starting from gen12 it is able to pre-fetch across BB_START and BB_END boundaries as well, so when we emit gpu relocs the pre-parser might fetch the target location of the reloc before the memory write lands. The parser can't pre-fetch across the ctx switch, so we use a separate context to guarantee that the memory is synchronized before the parser can get to it. Note that there is no risk of the CS doing a lite restore from the reloc context to the user context, even if the two have the same hw_id, because since gen11 the CS also checks the LRCA when deciding if it can lite-restore. v2: limit new context to gen12+, release in eb_destroy, add a comment in emit_fini_breadcrumb (Chris). Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190827185805.21799-1-daniele.ceraolospurio@intel.com
|
#
cccdce1d |
|
27-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make engine's batch pool safe for use with virtual engines A virtual engine itself does not have a batch pool, but we can gleefully use any of its siblings instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190827135935.3831-1-chris@chris-wilson.co.uk
|
#
1f7fd484 |
|
22-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace i915_vma_put_fence() Avoid calling i915_vma_put_fence() by using our alternate paths that bind a secondary vma avoiding the original fenced vma. For the few instances where we need to release the fence (i.e. on binding when the GGTT range becomes invalid), replace the put_fence with a revoke_fence. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190822061557.18402-1-chris@chris-wilson.co.uk
|
#
6846895f |
|
21-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace PIN_NONFAULT with calls to PIN_NOEVICT When under severe stress for GTT mappable space, the LRU eviction model falls off a cliff. We spend all our time scanning the much larger non-mappable area searching for something within the mappable zone we can evict. Turn this on its head by only using the full vma for the object if it is already pinned in the mappable zone or there is sufficient *free* space to accommodate it (prioritizing speedy reuse). If there is not, immediately fall back to using small chunks (tilerow for GTT mmap, single pages for pwrite/relocation) and using random eviction before doing a full search. Testcase: igt/gem_concurrent_blt References: https://bugs.freedesktop.org/show_bug.cgi?id=110848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
|
#
44c22f3f |
|
20-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialize insertion into the file->mm.request_list Currently, we remove the from per-file request list for throttling and retirement under a dedicated spinlock, but insertion is governed by struct_mutex. This needs to be the same lock so that the retirement/insertion of neighbouring requests (at the tail) doesn't break the list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190820080907.4665-1-chris@chris-wilson.co.uk
|
#
70d6894d |
|
18-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialize against vma moves Make sure that when submitting requests, we always serialize against potential vma moves and clflushes. Time for a i915_request_await_vma() interface! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190819112033.30638-1-chris@chris-wilson.co.uk
|
#
e5dadff4 |
|
15-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect request retirement with timeline->mutex Forgo the struct_mutex requirement for request retirement as we have been transitioning over to only using the timeline->mutex for controlling the lifetime of a request on that timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk
|
#
52791eee |
|
11-Aug-2019 |
Christian König <christian.koenig@amd.com> |
dma-buf: rename reservation_object to dma_resv Be more consistent with the naming of the other DMA-buf objects. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/323401/
|
#
e6a9522a |
|
25-Jul-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
drm/i915: Remove redundant user_access_end() from __copy_from_user() error path Objtool reports: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x36: redundant UACCESS disable __copy_from_user() already does both STAC and CLAC, so the user_access_end() in its error path adds an extra unnecessary CLAC. Fixes: 0b2c8f8b6b0c ("i915: fix missing user_access_end() in page fault exception case") Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://github.com/ClangBuiltLinux/linux/issues/617 Link: https://lkml.kernel.org/r/51a4155c5bc2ca847a9cbe85c1c11918bb193141.1564086017.git.jpoimboe@redhat.com
|
#
75d0a7f3 |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Lift timeline into intel_context Move the timeline from being inside the intel_ring to intel_context itself. This saves much pointer dancing and makes the relations of the context to its timeline much clearer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk
|
#
1a07e86c |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Generalise BSD default selection For the default I915_EXEC_BSD round robin selector, it may select any available VCS engine. Make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809091010.23281-3-chris@chris-wilson.co.uk
|
#
6b86f900 |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace global bsd_dispatch_index with random seed We keep a global seed for the legacy BSD round-robin selector, but in our testing of multiple simultaneous client workloads, a random seed spreads the load more evenly. (As even as an initial round-robin selector can be!) Removing the global is one less variable we have to find a home for! We can simulate multi-client (both same and mixed workloads) using igt/gem_wsim to work out optimal strategies and then compare our simulation with the actual transcoder on multi-engine machines. This fixed round-robin turns out to be one of the worst methods. No user is advised to use this method; the current suggestion is to use a virtual engine for agnostic batches, randomised submission or using the busyness tracking to select the most idle engine at the time of dispatch. At the present time, intel-media is explicit, but libva still seems to use it, with the exception of batches that must execute on vcs0. Oh well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809091010.23281-2-chris@chris-wilson.co.uk
|
#
d5b2a3a4 |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check for a second VCS engine more carefully To use the legacy BSD selector, you must have a second VCS engine, or else the ABI simply maps the request for another engine onto VCS0. However, we only checked a single VCS1 location and overlooking the possibility of a sparse VCS set being mapped to the dense ABI. v2: num_vcs_engines() turns out to be reusable and futureproof it so we never have to worry about this silly bit of ABI again! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809123153.20574-1-chris@chris-wilson.co.uk
|
#
6da4a2c4 |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: remove unnecessary includes of intel_display_types.h header With its original name intel_drv.h the intel_display_types.h header was superfluously cargo-cult included all over the place, while it's really mostly about display internals. Remove the unnecessary includes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e3d737f0ab87c55969e62c1e077e15c04c238297.1565085692.git.jani.nikula@intel.com
|
#
1d455f8d |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: rename intel_drv.h to display/intel_display_types.h Everything about the file is about display, and mostly about types related to display. Move under display/ as intel_display_types.h to reflect the facts. There's still plenty to clean up, but start off with moving the file where it logically belongs and naming according to contents. v2: fix the include guard name in the renamed file Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190806113933.11799-1-jani.nikula@intel.com
|
#
b40d7378 |
|
04-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace struct_mutex for batch pool serialisation Switch to tracking activity via i915_active on individual nodes, only keeping a list of retired objects in the cache, and reaping the cache when the engine itself idles. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190804124826.30272-2-chris@chris-wilson.co.uk
|
#
a4e57f90 |
|
04-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Teach execbuffer to take the engine wakeref not GT In the next patch, we would like to couple into the engine wakeref to free the batch pool on idling. The caveat here is that we therefore want to track the engine wakeref more precisely and to hold it instead of the broader GT wakeref as we process the ioctl. v2: Avoid introducing odd semantics for a shortlived timeline->mutex acquisition interface. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190804124826.30272-1-chris@chris-wilson.co.uk
|
#
576f0586 |
|
29-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush extra hard after writing relocations through the GTT Recently discovered in commit bdae33b8b82b ("drm/i915: Use maximum write flush for pwrite_gtt") was that we needed to our full write barrier before changing the GGTT PTE to ensure that our indirect writes through the GTT landed before the PTE changed (and the writes end up in a different page). That also applies to our GGTT relocation path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730112151.5633-4-chris@chris-wilson.co.uk
|
#
f5d974f9 |
|
30-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Provide a local intel_context.vm Track the currently bound address space used by the HW context. Minor conversions to use the local intel_context.vm are made, leaving behind some more surgery required to make intel_context the primary through the selftests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730143209.4549-2-chris@chris-wilson.co.uk
|
#
cb823ed9 |
|
12-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Use intel_gt as the primary object for handling resets Having taken the first step in encapsulating the functionality by moving the related files under gt/, the next step is to start encapsulating by passing around the relevant structs rather than the global drm_i915_private. In this step, we pass intel_gt to intel_reset.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
|
#
0c91621c |
|
25-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Pass intel_gt to pm routines Switch from passing the i915 container to newly named struct intel_gt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-2-chris@chris-wilson.co.uk
|
#
baea429d |
|
21-Jun-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Move i915_gem_chipset_flush to intel_gt This aligns better with the rest of restructuring. v2: * Move call out of line. (Chris) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-24-tvrtko.ursulin@linux.intel.com
|
#
df0566a6 |
|
13-Jun-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: move modesetting core code under display/ Now that we have a new subdirectory for display code, continue by moving modesetting core code. display/intel_frontbuffer.h sticks out like a sore thumb, otherwise this is, again, a surprisingly clean operation. v2: - don't move intel_sideband.[ch] (Ville) - use tabs for Makefile file lists and sort them Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190613084416.6794-3-jani.nikula@intel.com
|
#
e568ac38 |
|
11-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull kref into i915_address_space Make the kref common to both derived structs (i915_ggtt and i915_ppgtt) so that we can safely reference count an abstract ctx->vm address space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190611091238.15808-1-chris@chris-wilson.co.uk
|
#
155ab883 |
|
05-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move object close under its own lock Use i915_gem_object_lock() to guard the LUT and active reference to allow us to break free of struct_mutex for handling GEM_CLOSE. Testcase: igt/gem_close_race Testcase: igt/gem_exec_parallel Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190606112320.9704-1-chris@chris-wilson.co.uk
|
#
6951e589 |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move GEM object domain management from struct_mutex to local Use the per-object local lock to control the cache domain of the individual GEM objects, not struct_mutex. This is a huge leap forward for us in terms of object-level synchronisation; execbuffers are coordinated using the ww_mutex and pread/pwrite is finally fully serialised again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-10-chris@chris-wilson.co.uk
|
#
10be98a7 |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
|