#
801fa7a8 |
|
16-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: improve the catch-all evict to handle lock contention The catch-all evict can fail due to object lock contention, since it only goes as far as trylocking the object, due to us already holding the vm->mutex. Doing a full object lock here can deadlock, since the vm->mutex is always our inner lock. Add another execbuf pass which drops the vm->mutex and then tries to grab the object with the full lock, before then retrying the eviction. This should be good enough for now to fix the immediate regression with userspace seeing -ENOSPC from execbuf due to contended object locks during GTT eviction. v2 (Mani) - Also revamp the docs for the different passes. Testcase: igt@gem_ppgtt@shrink-vs-evict-* Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.") References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627 References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570 References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Mani Milani <mani@chromium.org> Cc: <stable@vger.kernel.org> # v5.18+ Reviewed-by: Mani Milani <mani@chromium.org> Tested-by: Mani Milani <mani@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com
|
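The extra pass described in 801fa7a8 can be modelled outside the kernel. The sketch below is a simplified illustration in Python, not the driver code: `Obj`, `evict_trylock` and `evict_all` are hypothetical names, `threading.Lock` stands in for both the vm->mutex and the object lock, and "eviction" is reduced to clearing a `bound` flag. It only shows the ordering idea: trylock while holding the inner lock, then drop the inner lock before taking the contended outer lock in full.

```python
import threading


class Obj:
    """Hypothetical stand-in for a GEM object with its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.bound = True  # True while the object still occupies address space


def evict_trylock(vm_mutex, objects):
    """Evict what we can while holding the (inner) vm mutex.

    Object locks are only trylocked here, because blocking on an outer
    lock while holding the inner one could deadlock.
    Returns the first contended object, or None if nothing was contended.
    """
    busy = None
    with vm_mutex:
        for obj in objects:
            if not obj.bound:
                continue
            if obj.lock.acquire(blocking=False):
                try:
                    obj.bound = False
                finally:
                    obj.lock.release()
            elif busy is None:
                busy = obj
    return busy


def evict_all(vm_mutex, objects, max_retries=8):
    """Retry eviction, taking a full lock on the contended object each time."""
    for _ in range(max_retries):
        busy = evict_trylock(vm_mutex, objects)
        if busy is None:
            return True
        # The vm mutex is no longer held here, so a blocking acquire of the
        # object lock respects the outer-then-inner lock order.
        with busy.lock:
            with vm_mutex:
                busy.bound = False
    return all(not obj.bound for obj in objects)
```

In this model the contended object is evicted directly once its full lock is held; the commit message instead describes retrying the eviction after the lock has been taken, which this sketch does not attempt to reproduce.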
#
0f857158 |
|
21-Nov-2022 |
Aravind Iddamsetty <aravind.iddamsetty@intel.com> |
drm/i915/mtl: Media GT and Render GT share common GGTT On XE_LPM+ platforms the media engines are carved out into a separate GT but have a common GGTMMADR address range, which essentially makes the GGTT address space shared between media and render GT. As a result any updates in GGTT shall invalidate TLB of GTs sharing it and similarly any operation on GGTT requiring an action on a GT will have to involve all GTs sharing it. setup_private_pat was being done on a per-GGTT basis; as that doesn't touch any GGTT structures, move it to a per-GT basis. BSPEC: 63834 v2: 1. Add details to commit msg 2. includes fix for failure to add item to ggtt->gt_list, as suggested by Lucas 3. as ggtt_flush() is used only for ggtt drop i915_is_ggtt check within it. 4. setup_private_pat moved out of intel_gt_tiles_init v3: 1. Move out for_each_gt from i915_driver.c (Jani Nikula) v4: drop using RCU primitives on ggtt->gt_list as it is not an RCU list (Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122070126.4813-1-aravind.iddamsetty@intel.com
|
#
3f882f2d |
|
16-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: improve the catch-all evict to handle lock contention The catch-all evict can fail due to object lock contention, since it only goes as far as trylocking the object, due to us already holding the vm->mutex. Doing a full object lock here can deadlock, since the vm->mutex is always our inner lock. Add another execbuf pass which drops the vm->mutex and then tries to grab the object with the full lock, before then retrying the eviction. This should be good enough for now to fix the immediate regression with userspace seeing -ENOSPC from execbuf due to contended object locks during GTT eviction. v2 (Mani) - Also revamp the docs for the different passes. Testcase: igt@gem_ppgtt@shrink-vs-evict-* Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.") References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627 References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570 References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Mani Milani <mani@chromium.org> Cc: <stable@vger.kernel.org> # v5.18+ Reviewed-by: Mani Milani <mani@chromium.org> Tested-by: Mani Milani <mani@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com (cherry picked from commit 801fa7a81f6da533cc5442fc40e32c72b76cd42a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
7e00897b |
|
14-Jan-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2. Because we will start to require the obj->resv lock for unbinding, ensure these vma eviction utility functions also take the lock. This requires some function signature changes, to ensure that the ww context is passed around, but is mostly straightforward. Previously this was split up into several patches, but reworking should allow for easier bisection. Changes since v1: - Handle evicting dead objects better. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-4-maarten.lankhorst@linux.intel.com
|
#
6945c53b |
|
17-Jan-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add locking to i915_gem_evict_vm(), v3. i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Changes since v1: - Handle evicting dead objects better. Changes since v2: - Use for_i915_gem_ww in igt_evict_vm. (Thomas) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> [mlankhorst: Fix up doc warning.] Link: https://patchwork.freedesktop.org/patch/msgid/20220117075604.131477-1-maarten.lankhorst@linux.intel.com
|
#
2ef97818 |
|
07-Jan-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: split out i915_gem_evict.h from i915_drv.h We already have the i915_gem_evict.c file. v2: Fixed commit message (Tvrtko) Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ec666853171d04daeb21a93083940df36907c343.1641561552.git.jani.nikula@intel.com
|
#
b97060a9 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero. v2: rtimeout -> remaining_timeout v3: Drop unnecessary includes, guc_submission_busy_loop -> guc_submission_send_busy_loop, drop negative timeout trick, move a refactor of guc_context_unpin to earlier path (John H) v4: Add stddef.h back into intel_gt_requests.h, short circuit idle function if not in GuC submission mode Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-16-matthew.brost@intel.com
|
#
e956996c |
|
19-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Protect used framebuffers from casual eviction In the shrinker, we protect framebuffers from light reclaim as we typically expect framebuffers to be reused in the near future (and with low latency requirements). We can apply the same logic to the GGTT eviction and defer framebuffers to the second pass only used if the caller is desperate enough to wait for space to become available. In most cases, the caller will use a smaller partial vma instead of trying to force the object into the GGTT if doing so will cause other users to be evicted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-5-chris@chris-wilson.co.uk
|
#
e9d2871f |
|
16-Nov-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
drm: fix some kernel-doc markups Some identifiers have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format: identifier - description Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org
|
#
955da9d7 |
|
08-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Handle idling during i915_gem_evict_something busy loops i915_gem_evict_something() is charged with finding a slot within the GTT that we may reuse. Since our goal is not to stall, we first look for a slot that only overlaps idle vma. To this end, on the first pass we move any active vma to the end of the search list. However, we only stopped moving active vma after we see the first active vma twice. If during the search, that first active vma completed, we would not notice and keep on extending the search list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1746 Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200509115217.26853-1-chris@chris-wilson.co.uk (cherry picked from commit 73e28cc40bf00b5d168cb8f5cff1ae63e9097446) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
73e28cc4 |
|
08-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Handle idling during i915_gem_evict_something busy loops i915_gem_evict_something() is charged with finding a slot within the GTT that we may reuse. Since our goal is not to stall, we first look for a slot that only overlaps idle vma. To this end, on the first pass we move any active vma to the end of the search list. However, we only stopped moving active vma after we see the first active vma twice. If during the search, that first active vma completed, we would not notice and keep on extending the search list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1746 Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200509115217.26853-1-chris@chris-wilson.co.uk
|
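The search-list behaviour described in 73e28cc4 can be illustrated with a small model. This is a sketch under stated assumptions, not the driver's implementation: `scan_order` is a hypothetical function, vma are plain names, and `is_active` is a caller-supplied callable whose answer may change between calls. Active entries are deferred to the tail on the first pass, and deferral stops as soon as the first deferred entry comes round again, whether or not it is still active at that point.

```python
from collections import deque


def scan_order(vmas, is_active):
    """Return the order in which entries are offered to the eviction scan.

    vmas: iterable of hashable vma identifiers, in list order.
    is_active: callable(vma) -> bool; the result may change over time.
    """
    queue = deque(vmas)
    first_deferred = None  # first entry that was moved to the tail
    deferring = True       # still on the "prefer idle" first pass
    order = []

    while queue:
        vma = queue.popleft()

        # Seeing the first deferred entry again means the list has wrapped.
        # Stop deferring regardless of its current state, so a vma that went
        # idle mid-scan cannot keep the list growing.
        if vma == first_deferred:
            deferring = False

        if deferring and is_active(vma):
            if first_deferred is None:
                first_deferred = vma
            queue.append(vma)
            continue

        order.append(vma)

    return order
```

For example, with entries `a` (active), `b` (idle) and `c` (active), the idle entry is considered first and the active ones follow in their deferred order. The wrap-around check is made before asking whether the entry is active, which is what lets the loop terminate even if `a` completes during the scan.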
#
53dd7028 |
|
08-Apr-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/evict: watch out for unevictable nodes In an address space there can be a sprinkling of I915_COLOR_UNEVICTABLE nodes, which lack a parent vma. For platforms with cache coloring we might be very unlucky and abut with such a node thinking we can simply unbind the vma. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200408170456.399604-1-matthew.auld@intel.com
|
#
2920bb94 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop inspection of execbuf flags during evict With the goal of removing the serialisation from around execbuf, we will no longer have the privilege of there being a single execbuf in flight at any time and so will only be able to inspect the user's flags within the carefully controlled execbuf context. i915_gem_evict_for_node() is the only user outside of execbuf that currently peeks at the flag to convert an overlapping softpinned request from ENOSPC to EINVAL. Retract this nicety and only report ENOSPC if the location is in current use, either due to this execbuf or another. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303204345.1859734-1-chris@chris-wilson.co.uk
|
#
83d2bdb6 |
|
25-Feb-2020 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: significantly reduce the use of <drm/i915_drm.h> The #include has been splattered all over the place, but there are precious few places, all .c files, that actually need it. v2: remove leftover double newlines Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225133131.3301-1-jani.nikula@intel.com
|
#
a725d711 |
|
05-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Ignore most failures during evict-vm Removing all vma from the VM is best effort -- we only remove all those ready to be removed, so forgive any VMA that becomes pinned. While forgiving those that become pinned, also take a second look for any that became unpinned as we waited. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205113726.413351-1-chris@chris-wilson.co.uk
|
#
66101975 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move request runtime management onto gt Requests are run from the gt and are tied into the gt runtime power management, so pull the runtime request management under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk
|
#
f33a8a51 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Merge wait_for_timelines with retire_request wait_for_timelines is essentially the same loop as retiring requests (with an extra timeout), so merge the two into one routine. v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
|
#
2850748e |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull i915_vma_pin under the vm->mutex Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to allocate and apply the PTE updates after we have reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is, the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is owned by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
|
#
b290a78b |
|
03-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use helpers for drm_mm_node booleans A subset of 71724f708997 ("drm/mm: Use helpers for drm_mm_node booleans") in order to prepare drm-intel-next-queued for subsequent patches before we can backmerge 71724f708997 itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004142226.13711-1-chris@chris-wilson.co.uk
|
#
71724f70 |
|
03-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/mm: Use helpers for drm_mm_node booleans In preparation for rearranging the booleans into a flags field, ensure all the current users are using the inline helpers and not directly accessing the members. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-3-chris@chris-wilson.co.uk
|
#
33dd8899 |
|
09-Sep-2019 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: cleanup cache-coloring Try to tidy up the cache-coloring such that we rid the code of any mm.color_adjust assumptions, this should hopefully make it more obvious in the code when we need to actually use the cache-level as the color, and as a bonus should make adding a different color-scheme simpler. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190909124052.22900-3-matthew.auld@intel.com
|
#
6846895f |
|
21-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace PIN_NONFAULT with calls to PIN_NOEVICT When under severe stress for GTT mappable space, the LRU eviction model falls off a cliff. We spend all our time scanning the much larger non-mappable area searching for something within the mappable zone we can evict. Turn this on its head by only using the full vma for the object if it is already pinned in the mappable zone or there is sufficient *free* space to accommodate it (prioritizing speedy reuse). If there is not, immediately fall back to using small chunks (tilerow for GTT mmap, single pages for pwrite/relocation) and using random eviction before doing a full search. Testcase: igt/gem_concurrent_blt References: https://bugs.freedesktop.org/show_bug.cgi?id=110848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
|
#
6da4a2c4 |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: remove unnecessary includes of intel_display_types.h header With its original name intel_drv.h the intel_display_types.h header was superfluously cargo-cult included all over the place, while it's really mostly about display internals. Remove the unnecessary includes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e3d737f0ab87c55969e62c1e077e15c04c238297.1565085692.git.jani.nikula@intel.com
|
#
1d455f8d |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: rename intel_drv.h to display/intel_display_types.h Everything about the file is about display, and mostly about types related to display. Move under display/ as intel_display_types.h to reflect the facts. There's still plenty to clean up, but start off with moving the file where it logically belongs and naming according to contents. v2: fix the include guard name in the renamed file Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190806113933.11799-1-jani.nikula@intel.com
|
#
10be98a7 |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
|
#
79ffac85 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Invert the GEM wakeref hierarchy In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.) Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more power management control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate. Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect, is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
|
#
7d6ce558 |
|
08-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove has-kernel-context We can no longer assume execution ordering, and in particular we cannot assume which context will execute last. One side-effect of this is that we cannot determine if the kernel-context is resident on the GPU, so remove the routines that claimed to do so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-4-chris@chris-wilson.co.uk
|
#
c6eeb479 |
|
08-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce presumption of request ordering for barriers Currently we assume that we know the order in which requests run and so can determine if we need to reissue a switch-to-kernel-context prior to idling. That assumption does not hold for the future, so instead of tracking which barriers have been used, simply determine if we have ever switched away from the kernel context by using the engine and before idling ensure that all engines that have been used since the last idle are synchronously switched back to the kernel context for safety (and ease of shrinking memory while idle). v2: Use intel_engine_mask_t and ALL_ENGINES Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
|
#
09d7e46b |
|
28-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull VM lists under the VM mutex. A starting point to counter the pervasive struct_mutex. For the goal of avoiding (or at least blocking under them!) global locks during user request submission, a simple but important step is being able to manage each client's GTT separately. For which, we want to replace using the struct_mutex as the guard for all things GTT/VM and switch instead to a specific mutex inside i915_address_space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk
|
#
499197dc |
|
28-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop tracking MRU activity on VMA Our goal is to remove struct_mutex and replace it with fine grained locking. One of the thorny issues is our eviction logic for reclaiming space for an execbuffer (or GTT mmaping, among a few other examples). While eviction itself is easy to move under a per-VM mutex, performing the activity tracking is less agreeable. One solution is not to do any MRU tracking and do a simple coarse evaluation during eviction of active/inactive, with a loose temporal ordering of last insertion/evaluation. That keeps all the locking constrained to when we are manipulating the VM itself, neatly avoiding the tricky handling of possible recursive locking during execbuf and elsewhere. Note that discarding the MRU (currently implemented as a pair of lists, to avoid scanning the active list for a NONBLOCKING search) is unlikely to impact upon our efficiency to reclaim VM space (where we think a LRU model is best) as our current strategy is to use random idle replacement first before doing a search, and over time the use of softpinned 48b per-ppGTT is growing (thereby eliminating any need to perform any eviction searches, in theory at least) with the remaining users being found on much older devices (gen2-gen6). v2: Changelog and commentary rewritten to elaborate on the duality of a single list being both an inactive and active list. v3: Consolidate bool parameters into a single set of flags; don't comment on the duality of a single variable being a multiplicity of bits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk
|
#
2f80d7bd |
|
08-Jan-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: drop all drmP.h includes Needs just a few additional includes here and there. Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190108082709.3748-1-jani.nikula@intel.com
|
#
ec625fb9 |
|
09-Jul-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Provide a timeout to i915_gem_wait_for_idle() Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtail two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where we do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. This is a case where any gpu hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
|
#
e61e0f51 |
|
21-Feb-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rename drm_i915_gem_request to i915_request We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space is of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
20ccd4d3 |
|
24-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use same test for eviction and submitting kernel context During evict, we wish to idle the GPU if we see that the GGTT is full. However, our test for idle in i915_gem_evict_something() and in i915_gem_switch_to_kernel_context() do not match leading to disappointment - we never believe that we are idle and keep trying to flush the GGTT ad infinitum. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103438 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024220855.30155-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
753bdbd0 |
|
24-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Call cond_resched() before repeating i915_gem_evict_something() Insert a breakpoint, a chance to escape back to the scheduler and run something else for a bit, if we find that the GGTT is full and needs to be idled in order to make some room. In practice, this should only be an issue in stress tests as the wait itself will normally give the chance for the scheduler to intervene and make progress. References: https://bugs.freedesktop.org/show_bug.cgi?id=103438 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024205053.7845-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
99b169d3 |
|
12-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix eviction when the GGTT is idle but full In the full-ppgtt world, we can fill the GGTT full of context objects. These context objects are currently implicitly tracked by the requests that pin them i.e. they are only unpinned when the request is completed and retired, but we do not have the link from the vma to the request (anymore). In order to unpin those contexts, we have to issue another request and wait upon the switch to the kernel context. The bug during eviction was that we assumed that a full GGTT meant we would have requests on the GGTT timeline, and so we missed situations where those requests were merely in flight (and even when they have not yet been submitted to hw). The fix employed here is to change the already-is-idle test to not look at the execution timeline, but count the outstanding requests and then check that we have switched to the kernel context. Erring on the side of overkill here just means that we stall a little longer than may be strictly required, but we only expect to hit this path in extreme corner cases where returning an erroneous error is worse than the delay. v2: Logical inversion when swapping over branches. Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-1-chris@chris-wilson.co.uk (cherry picked from commit 55b4f1ce2f23692c57205b9974fba61baa4b9321) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9c1477e8 |
|
12-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/selftests: Exercise adding requests to a full GGTT A bug recently encountered involved the issue where, as we were submitting requests to different ppGTT, each would pin a segment of the GGTT for its logical context and ring. However, this is invisible to eviction as we do not tie the context/ring VMA to a request and so do not automatically wait upon them (instead they are marked as pinned, preventing eviction entirely). Instead the eviction code must flush those contexts by switching to the kernel context. This selftest tries to fill the GGTT with contexts to exercise a path where the switch-to-kernel-context failed to make forward progress and we fail with ENOSPC. v2: Make the hole in the filled GGTT explicit. v3: Swap out the arbitrary timeout for a private notification from i915_gem_evict_something() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
55b4f1ce |
|
12-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix eviction when the GGTT is idle but full In the full-ppgtt world, we can fill the GGTT full of context objects. These context objects are currently implicitly tracked by the requests that pin them i.e. they are only unpinned when the request is completed and retired, but we do not have the link from the vma to the request (anymore). In order to unpin those contexts, we have to issue another request and wait upon the switch to the kernel context. The bug during eviction was that we assumed that a full GGTT meant we would have requests on the GGTT timeline, and so we missed situations where those requests were merely in flight (and even when they have not yet been submitted to hw). The fix employed here is to change the already-is-idle test to not look at the execution timeline, but count the outstanding requests and then check that we have switched to the kernel context. Erring on the side of overkill here just means that we stall a little longer than may be strictly required, but we only expect to hit this path in extreme corner cases where returning an erroneous error is worse than the delay. v2: Logical inversion when swapping over branches. Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-1-chris@chris-wilson.co.uk
|
#
f34a93bb |
|
09-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check PIN_NONFAULT overlaps in evict_for_node If the caller says that he doesn't want to evict any other faulting vma, honour that flag. The logic was used in evict_something, but not the more specific evict_for_node, now being used as a preliminary probe since commit 606fec956c0e ("drm/i915: Prefer random replacement before eviction search"). Fixes: 606fec956c0e ("drm/i915: Prefer random replacement before eviction search") Fixes: 821188778b9b ("drm/i915: Choose not to evict faultable objects from the GGTT") References: https://bugs.freedesktop.org/show_bug.cgi?id=102490 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-4-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
a65adaf8 |
|
09-Oct-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Track user GTT faulting per-vma We don't wish to refault the entire object (other vma) when unbinding one partial vma. To do this track which vma have been faulted into the user's address space. v2: Use a local vma_offset to tidy up a multiline unmap_mapping_range(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
c7c6e46f |
|
16-Aug-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Convert execbuf to use struct-of-array packing for critical fields When userspace is doing most of the work, avoiding relocs (using NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we still spend a lot of time processing the arrays in execbuf, even though we now should have nothing to do most of the time. One issue that becomes readily apparent in profiling anv is that iterating over the large execobj[] is unfriendly to the loop prefetchers of the CPU and it much prefers iterating over a pair of arrays rather than one big array. v2: Clear vma[] on construction to handle errors during vma lookup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-3-chris@chris-wilson.co.uk
|
#
2889caa9 |
|
16-Jun-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Eliminate lots of iterations over the execobjects array The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we look up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this sped up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
8c45cec4 |
|
15-Jun-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split vma exec_link/evict_link Currently the vma has one link member that is used for both holding its place in the execbuf reservation list, and in any eviction list. This dual property is quite tricky and error prone. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk
|
#
d55495b4 |
|
15-Jun-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use vma->exec_entry as our double-entry placeholder This has the benefit of not requiring us to manipulate the vma->exec_link list when tearing down the execbuffer, and is a marginally cheaper test to detect the user error. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-2-chris@chris-wilson.co.uk
|
#
72022a70 |
|
30-Mar-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move retire-requests into i915_gem_wait_for_idle() As we now distinguish everywhere that can call i915_gem_retire_requests() following a successful wait_for_idle, we can remove the duplication by moving that call into i915_gem_wait_for_idle() itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
aac66bf5 |
|
06-Mar-2017 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: use correct node for handling cache domain eviction It looks like we were incorrectly comparing vma->node against itself instead of the target node, when evicting for a node on systems where we need guard pages between regions with different cache domains. As a consequence we can end up trying to needlessly evict neighbouring nodes, even if they have the same cache domain, and if they were pinned we would fail the eviction. Fixes: 625d988acc28 ("drm/i915: Extract reserving space in the GTT to a helper") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170306235414.23407-3-matthew.auld@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit fe65cbdbc97929e4a522716ed279a36783656142) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
fe65cbdb |
|
06-Mar-2017 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: use correct node for handling cache domain eviction It looks like we were incorrectly comparing vma->node against itself instead of the target node, when evicting for a node on systems where we need guard pages between regions with different cache domains. As a consequence we can end up trying to needlessly evict neighbouring nodes, even if they have the same cache domain, and if they were pinned we would fail the eviction. Fixes: 625d988acc28 ("drm/i915: Extract reserving space in the GTT to a helper") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170306235414.23407-3-matthew.auld@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
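The self-comparison described above can be illustrated with a small standalone sketch. It is not the driver code: `struct sketch_node` and both helpers are hypothetical names, and the only assumption carried over from the message is that an abutting neighbour with the same colour (cache domain) does not need to be evicted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a GTT node; not the real struct drm_mm_node. */
struct sketch_node {
	uint64_t start;
	uint64_t size;
	unsigned long color;	/* cache domain */
};

/*
 * Intended check: a neighbour that abuts the target and shares its
 * colour needs no guard page, so it can be left in place.
 */
static bool sketch_can_skip(const struct sketch_node *node,
			    const struct sketch_node *target)
{
	bool abuts_before = node->start + node->size == target->start;
	bool abuts_after = node->start == target->start + target->size;

	return (abuts_before || abuts_after) && node->color == target->color;
}

/*
 * Shape of the bug: the node is compared against itself. For any
 * non-empty node the adjacency test can never hold, so the neighbour
 * is never skipped and ends up being evicted needlessly.
 */
static bool sketch_can_skip_buggy(const struct sketch_node *node,
				  const struct sketch_node *target)
{
	(void)target;	/* the target is never consulted */

	bool abuts_before = node->start + node->size == node->start;
	bool abuts_after = node->start == node->start + node->size;

	return (abuts_before || abuts_after) && node->color == node->color;
}
```

With a 4096-byte neighbour at offset 0 and a same-coloured target at offset 4096, the first helper reports that the neighbour can stay while the second does not, which matches the needless eviction the message describes.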
|
#
381b943b |
|
15-Feb-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915_address_space.start Once upon a time, back in the UMS days, we supported userspace initialising the GTT and sharing portions of the GTT with other users. Now, we own the GTT (both global and per-process) and the tables always start at 0 - so we can remove i915_address_space.start and forget about this old complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-20-chris@chris-wilson.co.uk
|
#
f40a7b75 |
|
13-Feb-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Initial selftests for exercising eviction Very simple tests to just ask eviction to find some free space in a full GTT and one with some available space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-41-chris@chris-wilson.co.uk
|
#
a6508ded |
|
06-Feb-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use page coloring to provide the guard page at the end of the GTT As we now mark the reserved hole (drm_mm.head_node) with the special UNEVICTABLE color, we can use the page coloring to avoid prefetching of the CS beyond the end of the GTT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-3-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
|
#
4e64e553 |
|
02-Feb-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Improve drm_mm search (and fix topdown allocation) with rbtrees The drm_mm range manager claimed to support top-down insertion, but it was neither searching for the top-most hole that could fit the allocation request nor fitting the request to the hole correctly. In order to search the range efficiently, we create a secondary index for the holes using either their size or their address. This index allows us to find the smallest hole or the hole at the bottom or top of the range efficiently, whilst keeping the hole stack to rapidly service evictions. v2: Search for holes both high and low. Rename flags to mode. v3: Discover rb_entry_safe() and use it! v4: Kerneldoc for enum drm_mm_insert_mode. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Alexandre Courbot <gnurou@gmail.com> Cc: Eric Anholt <eric@anholt.net> Cc: Sinclair Yeh <syeh@vmware.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
|
#
e88893fe |
|
05-Jan-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Clear ret before unbinding in i915_gem_evict_something() Missed when rebasing patches, I failed to set ret to zero before starting the unbind loop (which depends upon ret being zero). Reported-by: Matthew Auld <matthew.william.auld@gmail.com> Fixes: 9332f3b1b99a ("drm/i915: Combine loops within i915_gem_evict_something") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105155940.10033-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Cc: <stable@vger.kernel.org> # v4.9+ (cherry picked from commit 121dfbb2a2ef1c5f49e15c38ccc47ff0beb59446) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
16ee2061 |
|
11-Jan-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Detect vma reserved for execbuf in evict-for-node The vma->exec_list is still the only means we have for both reserving an object in execbuf, and for constructing the eviction list. So during the construction of the eviction list, we must treat anything already on the exec_list as being pinned. Yes, this sharing of two semantically different lists will be fixed! But in the meantime, we have the issue that this is tripping up CI since we started using i915_gem_gtt_reserve_node() + i915_gem_evict_for_node() from the regular execbuf reservation path in commit 606fec956c0e ("drm/i915: Prefer random replacement before eviction search"): [ 108.424063] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:254! [ 108.424072] invalid opcode: 0000 [#1] PREEMPT SMP [ 108.424079] Modules linked in: snd_hda_intel i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me snd_pcm lpc_ich mei sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915] [ 108.424132] CPU: 1 PID: 6865 Comm: gem_cs_tlb Tainted: G U 4.10.0-rc3-CI-CI_DRM_2049+ #1 [ 108.424143] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. 
F.24 09/13/2013 [ 108.424154] task: ffff88012ae22600 task.stack: ffffc90000a14000 [ 108.424220] RIP: 0010:i915_gem_evict_for_node+0x237/0x410 [i915] [ 108.424229] RSP: 0018:ffffc90000a17a58 EFLAGS: 00010202 [ 108.424237] RAX: 0000000000005871 RBX: ffff88012d1ad778 RCX: 0000000000000000 [ 108.424246] RDX: 000000007ffff000 RSI: ffffc90000a17a68 RDI: ffff880127e694d8 [ 108.424255] RBP: ffffc90000a17aa0 R08: ffffc90000a17a68 R09: 0000000000000000 [ 108.424264] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000080000000 [ 108.424273] R13: ffffc90000a17a68 R14: ffff880127e694d8 R15: ffffffffa0387330 [ 108.424283] FS: 00007f8236e3d8c0(0000) GS:ffff880137c40000(0000) knlGS:0000000000000000 [ 108.424293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.424305] CR2: 00007f82347a2000 CR3: 000000012c866000 CR4: 00000000000006e0 [ 108.424317] Call Trace: [ 108.424368] i915_gem_gtt_reserve+0x67/0x80 [i915] [ 108.424424] __i915_vma_do_pin+0x248/0x620 [i915] [ 108.424487] ? __i915_vma_do_pin+0x162/0x620 [i915] [ 108.424540] i915_gem_execbuffer_reserve_vma.isra.8+0x153/0x1f0 [i915] [ 108.424591] i915_gem_execbuffer_reserve.isra.9+0x40e/0x440 [i915] [ 108.424643] i915_gem_do_execbuffer.isra.15+0x6d9/0x1b20 [i915] [ 108.424696] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 108.424712] drm_ioctl+0x200/0x450 [ 108.424760] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 108.424776] do_vfs_ioctl+0x90/0x6e0 [ 108.424789] ? up_read+0x1a/0x40 [ 108.424800] ? 
trace_hardirqs_on_caller+0x122/0x1b0 [ 108.424813] SyS_ioctl+0x3c/0x70 [ 108.424828] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 108.424839] RIP: 0033:0x7f8235867357 [ 108.424848] RSP: 002b:00007ffdc14504c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 108.424866] RAX: ffffffffffffffda RBX: 00007ffdc1450600 RCX: 00007f8235867357 [ 108.424878] RDX: 00007ffdc14505a0 RSI: 0000000040406469 RDI: 0000000000000003 [ 108.424890] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000022 [ 108.424903] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000002 [ 108.424915] R13: 0000000000419101 R14: 00007ffdc1450600 R15: 00007ffdc14505f0 [ 108.424928] Code: 45 b8 8b 4d c0 4c 89 f2 48 89 de ff d0 49 8b 07 4c 8b 45 b8 48 85 c0 75 dd 65 ff 0d d4 a1 c8 5f 0f 84 47 01 00 00 e9 0d fe ff ff <0f> 0b 45 31 f6 4c 8b 65 c8 49 8b 04 24 4d 39 ec 49 8d 9c 24 28 [ 108.425055] RIP: i915_gem_evict_for_node+0x237/0x410 [i915] RSP: ffffc90000a17a58 Fixes: 172ae5b4c8c1 ("drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)") Fixes: 606fec956c0e ("drm/i915: Prefer random replacement before eviction search") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111182132.19174-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
625d988a |
|
11-Jan-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Extract reserving space in the GTT to a helper Extract drm_mm_reserve_node + calling i915_gem_evict_for_node into its own routine so that it can be shared rather than duplicated. v2: Kerneldoc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: igvt-g-dev@lists.01.org Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-2-chris@chris-wilson.co.uk
|
#
f51455d4 |
|
10-Jan-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
121dfbb2 |
|
05-Jan-2017 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Clear ret before unbinding in i915_gem_evict_something() Missed when rebasing patches, I failed to set ret to zero before starting the unbind loop (which depends upon ret being zero). Reported-by: Matthew Auld <matthew.william.auld@gmail.com> Fixes: 9332f3b1b99a ("drm/i915: Combine loops within i915_gem_evict_something") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105155940.10033-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Cc: <stable@vger.kernel.org> # v4.9+
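A minimal sketch of why the stale value matters, assuming an unbind loop that only calls the unbind helper while ret is still zero. The types and function names are hypothetical and the loop is simplified from whatever the driver actually does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma on the eviction list. */
struct sketch_vma {
	bool bound;
};

/* Pretend unbind that always succeeds. */
static int sketch_unbind(struct sketch_vma *vma)
{
	vma->bound = false;
	return 0;
}

/*
 * Walk the list and unbind each entry, but only while no error has
 * been recorded. If ret arrives non-zero from an earlier phase and is
 * not cleared first, nothing is unbound at all.
 */
static int sketch_unbind_list(struct sketch_vma *vmas, size_t count,
			      int ret, bool clear_ret)
{
	if (clear_ret)
		ret = 0;

	for (size_t i = 0; i < count; i++) {
		if (ret == 0)
			ret = sketch_unbind(&vmas[i]);
	}

	return ret;
}
```

Calling the helper with a leftover non-zero ret and clear_ret false leaves every entry bound and returns the stale error; with clear_ret true, all entries are unbound and zero is returned.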
|
#
3fa489da |
|
22-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Apply tight eviction scanning to color_adjust Using mm->color_adjust makes the eviction scanner much trickier since we don't know the actual neighbours of the target hole until after it is created (after scanning is complete). To work out whether we need to evict the neighbours because they impact upon the hole, we have to then check the hole afterwards - requiring an extra step in the user of the eviction scanner when they apply color_adjust. v2: Massage kerneldoc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-34-chris@chris-wilson.co.uk
|
#
0b04d474 |
|
22-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Compute tight evictions for drm_mm_scan Compute the minimal required hole during scan and only evict those nodes that overlap. This enables us to reduce the number of nodes we need to evict to the bare minimum. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-31-chris@chris-wilson.co.uk
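The idea of evicting only the nodes that overlap the minimal hole can be sketched as a plain interval test. This is an illustration under assumptions, not the drm_mm scanner: `struct sketch_node`, `sketch_overlaps()` and `sketch_mark_tight()` are hypothetical, and ranges are treated as half-open [start, end).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scanned node; not the real struct drm_mm_node. */
struct sketch_node {
	uint64_t start;
	uint64_t size;
	bool evict;
};

/* Half-open interval overlap between the node and [hole_start, hole_end). */
static bool sketch_overlaps(const struct sketch_node *node,
			    uint64_t hole_start, uint64_t hole_end)
{
	return node->start < hole_end &&
	       node->start + node->size > hole_start;
}

/*
 * Mark for eviction only those scanned nodes that intersect the
 * minimal hole, and return how many were marked. Nodes that were
 * scanned but fall outside the hole are left alone.
 */
static size_t sketch_mark_tight(struct sketch_node *nodes, size_t count,
				uint64_t hole_start, uint64_t hole_end)
{
	size_t marked = 0;

	for (size_t i = 0; i < count; i++) {
		nodes[i].evict = sketch_overlaps(&nodes[i],
						 hole_start, hole_end);
		if (nodes[i].evict)
			marked++;
	}

	return marked;
}
```

For four contiguous 4096-byte nodes and a required hole spanning the second and third, only those two are marked; the outer nodes that merely touch the hole boundary are kept.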
|
#
2c4b3895 |
|
22-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Unconditionally do the range check in drm_mm_scan_add_block() Doing the check is trivial (low cost in comparison to overall eviction) and helps simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-29-chris@chris-wilson.co.uk
|
#
9a71e277 |
|
22-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Extract struct drm_mm_scan from struct drm_mm The scan state occupies a large proportion of the struct drm_mm and is rarely used and only contains temporary state. That makes it suitable to moving to its struct and onto the stack of the callers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Fix up etnaviv to compile, was missing a BUG_ON.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
7155b057 |
|
09-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Retire before attempting to evict from the active lists Some objects retain an extra pin whilst they are active (e.g. contexts). This excludes them from being considered for eviction unless we idle the GPU. If we retire before we look at the active list, we can hopefully remove a few excess pins and reduce the amount of searching required. v2: Similar principle applies to evict_for_vma Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161209150555.602-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
172ae5b4 |
|
05-Dec-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Soft-pinning depends upon being able to check for availability of an interval and evict overlapping objects from a drm_mm range manager very quickly. Currently it uses a linear list, and so performance is dire and not suitable as a general replacement. Worse, the current code will oops if it tries to evict an active buffer. It also helps if the routine reports the correct error codes as expected by its callers and emits a tracepoint upon use. For posterity since the wrong patch was pushed (i.e. that missed these key points and had known bugs), this is the changelog that should have been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer"): Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following: * if the user supplies a virtual address via the execobject->offset *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then that object is placed at that offset in the address space selected by the context specifier in execbuffer. * the location must be aligned to the GTT page size, 4096 bytes * as the object is placed exactly as specified, it may be used by this execbuffer call without relocations pointing to it It may fail to do so if: * EINVAL is returned if the object does not have a 4096 byte aligned address * the object conflicts with another pinned object (either pinned by hardware in that address space, e.g. scanouts in the aliasing ppgtt) or within the same batch. EBUSY is returned if the location is pinned by hardware EINVAL is returned if the location is already in use by the batch * EINVAL is returned if the object conflicts with its own alignment (as meets the hardware requirements) or if the placement of the object does not fit within the address space All other execbuffer errors apply. Presence of this execbuf extension may be queried by passing I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for a reported value of 1 (or greater). v2: Combine the hole/adjusted-hole ENOSPC checks v3: More color, more splitting, more blurb. Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
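Two of the EINVAL conditions listed above (misaligned offset, placement that does not fit the address space) can be sketched as a standalone check. The macro and function names are hypothetical and this is not the execbuffer code path; the 4096-byte GTT page size is taken from the message, everything else is an assumption for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed GTT page size, per the changelog above. */
#define SKETCH_GTT_PAGE_SIZE 4096ull

/*
 * Validate a user-supplied soft-pin placement against an address
 * space of vm_total bytes. Returns 0 if acceptable, -EINVAL if the
 * offset is not page aligned or the object would not fit.
 */
static int sketch_validate_softpin(uint64_t offset, uint64_t size,
				   uint64_t vm_total)
{
	/* The location must be aligned to the GTT page size. */
	if (offset & (SKETCH_GTT_PAGE_SIZE - 1))
		return -EINVAL;

	/* Written to avoid overflow in offset + size. */
	if (size > vm_total || offset > vm_total - size)
		return -EINVAL;

	return 0;
}
```

The conflict cases (EBUSY for a location pinned by hardware, EINVAL for a location already used by the batch) depend on the state of the address space and are deliberately left out of this sketch.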
|
#
49d73912 |
|
29-Nov-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Convert vm->dev backpointer to vm->i915 99% of the time we access i915_address_space->dev we want the i915 device and not the drm device, so let's store the drm_i915_private backpointer instead. The only real complication here are the inlines in i915_vma.h where drm_i915_private is not yet defined and so we have to choose an alternate path for our asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
80b204bc |
|
28-Oct-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Enable multiple timelines With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
|
#
4c7d62c6 |
|
28-Oct-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Markup GEM API with lockdep asserts Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
|
#
275f039d |
|
24-Oct-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move user fault tracking to a separate list We want to decouple RPM and struct_mutex, but currently RPM has to walk the list of bound objects and remove userspace mmapping before we suspend (otherwise userspace may continue to access the GTT whilst it is powered down). This currently requires the struct_mutex to walk the bound_list, but if we move that to a separate list and lock we can take the first step towards removing the struct_mutex. v2: Split runtime suspend unmapping vs regular unmapping, to make the locking (and barriers) clearer. Add the object to the userfault_list prior to inserting the first PTE, the race between add/revoke depends upon struct_mutex for regular unmappings and rpm for runtime-suspend. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1 Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk
|
#
3b3f1650 |
|
13-Oct-2016 |
Akash Goel <akash.goel@intel.com> |
drm/i915: Allocate intel_engine_cs structure only for the enabled engines With the possible addition of many more rings in future, the drm_i915_private structure could bloat, as an array of type intel_engine_cs is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; This is still fine, as generally only a single instance of the drm_i915_private structure is used, but not all of the possible rings would be enabled or active on most platforms. Some memory can be saved by allocating the intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs *engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for the i915.o file (but for the i915.ko file the text size remains the same at 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine** macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. 
(Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
|
#
22dd3bb9 |
|
09-Sep-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up all locked waiters In the next patch we want to handle reset directly by a locked waiter in order to avoid issues with returning before the reset is handled. To handle the reset, we must first know whether we hold the struct_mutex. If we do not hold the struct_mutex we cannot perform the reset, but we do not block the reset worker either (and so we can just continue to wait for request completion) - otherwise we must relinquish the mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
|
#
ea746f36 |
|
09-Sep-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Expand bool interruptible to pass flags to i915_wait_request() We need finer control over wakeup behaviour during i915_wait_request(), so expand the current bool interruptible to a bitmask. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
|
#
82118877 |
|
18-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Choose not to evict faultable objects from the GGTT Often times we do not want to evict mapped objects from the GGTT as these are quite expensive to teardown and frequently reused (causing an equally, if not more so, expensive setup). In particular, when faulting in a new object we want to avoid evicting an active object, or else we may trigger a page-fault-of-doom as we ping-pong between evicting two objects. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk
|
#
dcff85c8 |
|
05-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex currently for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
|
#
20dfbde4 |
|
04-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap vma->pin_count accessors with small inline helpers In the next few patches, the VMA pinning API is overhauled and to reduce the churn we pull out the update to the accessors into a prep patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
|
#
2ffffd0f |
|
04-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix up vma alignment to be u64 This is not the full fix, as we are required to percolate the u64 nature down through the drm_mm stack, but this is required now to prevent explosions due to mismatch between execbuf (eb_vma_misplaced) and vma binding (i915_vma_misplaced) - and reduces the risk of spurious changes as we adjust the vma interface in the next patches. v2: long long casts not required for u64 printk (%llx) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
|
#
e522ac23 |
|
04-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something() Eviction is VM local, so we can ignore the significance of the drm_device in the caller, and leave it to i915_gem_evict_something() to manage itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk
|
#
9332f3b1 |
|
04-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Combine loops within i915_gem_evict_something A slight micro-optimisation: combine the loops so that gcc is able to optimise the inner loops concisely. Since we are reviewing the loops, we can update the comments to describe the current state of affairs, in particular the distinction between evicting from the global GTT (which may contain untracked items and transient global pins) and the per-process GTT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-1-git-send-email-chris@chris-wilson.co.uk
|
#
b1f788c6 |
|
04-Aug-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Release vma when the handle is closed In order to prevent a leak of the vma on shared objects, we need to hook into the object_close callback to destroy the vma on the object for this file. However, if we destroyed that vma immediately we may cause unexpected application stalls as we try to unbind a busy vma - hence we defer the unbind to when we retire the vma. v2: Keep vma allocated until closed. This is useful for a later optimisation, but it is required now in order to handle potential recursion of i915_vma_unbind() by retiring itself. v3: Comments are important. Testcase: igt/gem_ppggtt/flink-and-close-vma-leak Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
|
#
f8c417cd |
|
20-Jul-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Ultimately wraps kref_put(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_unreference/i915_gem_object_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk
|
#
25dc556a |
|
20-Jul-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get Ultimately wraps kref_get(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_reference/i915_gem_object_get/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk
|
#
945657b4 |
|
15-Jul-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/evict: Always switch away from the current context Currently execlists is exempt from emitting a request to switch each ring away from the current context over to the dev_priv->kernel_context (for whatever reason, just under execlists the GGTT is unlikely to be as fragmented, however the switch may help in some extreme cases). Extract the switcher and enable it for execlists as well, as we need to do so in a later patch to force the context switch before suspend. (And since for that switch we explicitly require the disposable kernel context, rename the extracted function.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-1-git-send-email-chris@chris-wilson.co.uk
|
#
883445d4 |
|
24-Jun-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only switch to default context when evicting from GGTT The contexts only pin space within the global GTT. Therefore forcing the switch to the perma-pinned kernel context only has an effect when trying to evict from and find room within the global GTT. We can then restrict the switch to only when operating on the default context. This is mostly a no-op as full-ppgtt only exists with execlists at present which skips the context switch anyway. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-7-git-send-email-chris@chris-wilson.co.uk
|
#
6e5a5beb |
|
24-Jun-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split idling from forcing context switch We only need to force a switch to the kernel context placeholder during eviction. All other uses of i915_gpu_idle() just want to wait until existing work on the GPU is idle. Rename i915_gpu_idle() to i915_gem_wait_for_idle() to avoid any implications about "parking" the context first. v2: Tweak an error message if the wait fails for the ilk vtd w/a Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
|
#
c033666a |
|
06-May-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Store a i915 backpointer from engine, and use it text data bss dec hex filename 6309351 3578714 696320 10584385 a18141 vmlinux 6308391 3578714 696320 10583425 a17d81 vmlinux Almost 1KiB of code reduction. v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions text data bss dec hex filename 6304579 3578778 696320 10579677 a16edd vmlinux 6303427 3578778 696320 10578525 a16a5d vmlinux Now over 1KiB! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
|
#
1c7f4bca |
|
26-Feb-2016 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rename vma->*_list to *_link for consistency Elsewhere we have adopted the convention of using '_link' to denote elements in the list (and '_list' for the actual list_head itself), and that the name should indicate which list the link belongs to (and preferably not just where the link is being stored). s/vma_link/obj_link/ (we iterate over obj->vma_list) s/mm_list/vm_link/ (we iterate over vm->[in]active_list) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
506a8e87 |
|
08-Dec-2015 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Add soft-pinning API for execbuffer Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski. (Not published externally) v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction to allow eviction of soft-pinned objects when another soft-pinned object used by a subsequent execbuffer overlaps reported by Michal Winiarski. (Not published externally) v4: Moved soft-pinned objects to the front of ordered_vmas so that they are pinned first after an address conflict happens to avoid repeated conflicts in rare cases (Suggested by Chris Wilson). Expanded comment on drm_i915_gem_exec_object2.offset to cover this new API. v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure buffers are not considered misplaced without the user specifying EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any addressing workarounds. Updated object2.offset field comment again to clarify NO_RELOC case (Chris). checkpatch cleanup. v6: Trivial rebase on latest drm-intel-nightly v7: Catch attempts to pin above the max virtual address size and return EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin something at an offset above 4GB (Chris, Daniel Vetter). 
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Zou Nanhai <nanhai.zou@intel.com> Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Acked-by: PDT Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
|
#
ce8daef3 |
|
30-Sep-2015 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove dead i915_gem_evict_everything() With UMS gone, we no longer use it during suspend. And with the last user removed from the shrinker, we can remove the dead code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
eb0b44ad |
|
18-Mar-2015 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: kerneldoc for i915_gem_shrinker.c And remove one bogus * from i915_gem_gtt.c since that's not a kerneldoc there. v2: Review from Chris: - Clarify memory space to better distinguish from address space. - Add note that shrink doesn't guarantee the freed memory and that users must fall back to shrink_all. - Explain how pinning ties in with eviction/shrinker. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
b9b5dce5 |
|
23-Dec-2014 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915: Add some extra guards in evict_vm v2: Use WARN_ONs (Daniel) Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
7838a63a |
|
05-Jan-2015 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Include i915_gem_evict.c kerneldoc into the drm docbook I've written these long before we've had a reasonable docbook structure, and naturally they've gone stale. Fix this up asap. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
#
cf303626 |
|
09-Sep-2014 |
Michel Thierry <michel.thierry@intel.com> |
drm/i915: fix another use-after-free in i915_gem_evict_everything Also here, i915_gem_evict_vm causes an unbind, which can end up dropping the last ref to the ppgtt. Triggered by igt gem_evict_everything test. Testcase: igt/gem_evict_everything Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Chris Wilson <chris@cris-wilsonc.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
d23db88c |
|
23-May-2014 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Prevent negative relocation deltas from wrapping This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocation deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. 
v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
50227e1c |
|
31-Mar-2014 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: prefer struct drm_i915_private to drm_i915_private_t Remove the rest of the references to drm_i915_private_t. No functional changes. Signed-off-by: Jani Nikula <jani.nikula@intel.com> [danvet: Drop hunk in i915_cmd_parser.c] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
1ec9e26d |
|
14-Feb-2014 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Consolidate binding parameters into flags Anything more than just one bool parameter is just a pain to read, symbolic constants are much better. Split out from Chris' vma-binding rework patch. v2: Undo the behaviour change in object_pin that Chris spotted. v3: Split out misplaced hunk to handle set_cache_level errors, spotted by Jani. v4: Keep the current over-zealous binding logic in the execbuffer code working with a quick hack while the overall binding code gets shuffled around. v5: Reorder the PIN_ flags for more natural patch splitup. v6: Pull out the PIN_GLOBAL split-up again. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
c2c1d491 |
|
29-Jan-2014 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Kerneldoc for i915_gem_evict.c Request by Ben Widawsky in his review of a patch touching this code. v2: Clarify the distinction between evicting vmas (to free up virtual address space) and evicting objects (to free up actual system memory). Suggested by Ben. Cc: Ben Widawsky <benjamin.widawsky@intel.com> Acked-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
3036537d |
|
28-Jan-2014 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: VM eviction only targets address space not physical pages During eviction, we are only considering how to free up space within the current address space and are not concerned with freeing up physical memory. As such we need only skip nodes that are pinned in the current VM and not globally. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
74e21ac2 |
|
20-Jan-2014 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Repeat evictions whilst pageflip completions are outstanding Since an old pageflip will keep its scanout buffer object pinned until it has executed its unpin task on the common workqueue, we can clog up our GGTT with stale pinned objects. As we cannot flush those workqueues without dropping our locks, we have to resort to falling back to userspace and telling them to repeat the operation in order to have a chance to run our workqueues and free up the required memory. If we fail, then we are forced to report ENOSPC back to userspace causing the operation to fail and best-case scenario is that it introduces temporary corruption. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
d7f46fc4 |
|
06-Dec-2013 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915: Make pin count per VMA Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
ad071acb |
|
09-Dec-2013 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Repeat eviction search after idling the GPU With the advent of hw context support, we gained some objects that are pinned for the duration of their request. That is we can make aperture space available by idling the GPU and in the process performing a context switch back to the always-pinned default context. As such, we should not conclude that there is no space in the aperture for the current object until we have unpinned any such context objects. Note that we also have the problem of outstanding pageflips preventing eviction of their framebuffer objects to resolve. Testcase: igt/gem_ctx_exec/eviction Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72507 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
bcccff84 |
|
24-Sep-2013 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915: trace vm eviction instead of everything Tracing vm eviction is really the event we care about. For the cases we evict everything, we still will get the trace. v2: Add the drm device to the trace since we might not be the only device in the system. (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
68c8c17f |
|
11-Sep-2013 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915: evict VM instead of everything When reserving objects during execbuf, it is possible to come across an object which will not fit given the current fragmentation of the address space. We do not have any defragment in drm_mm, so the strategy is to instead evict everything, and reallocate objects. With the upcoming addition of multiple VMs, there is no point to evict everything since doing so is overkill for the specific case mentioned above. Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: One additional s/evict_everything/evict_vm/ to update a comment in the code.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
7b796122 |
|
11-Sep-2013 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915: Extract vm specific part of eviction As we'll see in the next patch, being able to evict for just 1 VM is handy. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
b93dab6e |
|
26-Aug-2013 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: More vma fixups around unbind/destroy The important bugfix here is that we must not unlink the vma when we keep it around as a placeholder for the execbuf code. Otherwise we won't find it again when execbuf gets interrupted and restarted, and we create a 2nd vma. And since the code as-is isn't fit yet to deal with more than one vma, hilarity ensues. Specifically the dma map/unmap of the sg table isn't adjusted for multiple vmas yet and will blow up like this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] PGD 56bb5067 PUD ad3dd067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957 Hardware name: Dell Inc. 
OptiPlex 9010/03JR84, BIOS A01 05/04/2012 task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000 RIP: 0010:[<ffffffffa008fb37>] [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] RSP: 0018:ffff88004bdf5958 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0 RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780 RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000 R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780 R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780 FS: 00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0 Stack: ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000 ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026 Call Trace: [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915] [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915] [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915] [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915] [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915] [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915] [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915] [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915] [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915] [<ffffffff810fc823>] ? might_fault+0x40/0x90 [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915] [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm] [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915] [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481 [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39 [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451 [<ffffffff817deda7>] ? 
sysret_check+0x1b/0x56 [<ffffffff8113073c>] SyS_ioctl+0x57/0x87 [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00 RIP [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] RSP <ffff88004bdf5958> CR2: 0000000000000008 As a consequence we need to change the "only one vma for now" check in vma_unbind - since vma_destroy isn't always called the obj->vma_list might not be empty. Instead check that the vma list is singular at the beginning of vma_unbind. This is also more symmetric with bind_to_vm. This fixes the igt/gem_evict_everything|alignment testcases. v2: - Add a paranoid WARN to mark_free in the eviction code to make sure we never try to evict a vma used by the execbuf code right now. - Move the check for a temporary execbuf vma into vma_destroy - otherwise the failure path cleanup in bind_to_vm will blow up. Our first attempting at fixing this was commit 1be81a2f2cfd8789a627401d470423358fba2d76 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Aug 20 12:56:40 2013 +0100 drm/i915: Don't destroy the vma placeholder during execbuffer reservation Squash with this when merging! v3: Improvements suggested in Chris' review: - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm allocation before the early return. - Bail out if we hit the WARN in mark_free to hopefully make the kernel survive for long enough to capture it. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171 Tested-by: lu hua <huax.lu@intel.com> (v2) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
8637b407 |
|
16-Aug-2013 |
Ben Widawsky <benjamin.widawsky@intel.com> |
drm/i915/vma: Correct use after free in eviction The vma will [possibly] be destroyed during unbind in eviction. Immediately after this, we try to delete the list entry. Chris and Ville did the debug on this before I woke up, I just get to take credit for the fix :p For future reference the Oops that Mika reported: [ 403.472448] BUG: unable to handle kernel paging request at 6b6b6b6b [ 403.472473] IP: [<c12c1500>] __list_del_entry+0x20/0xe0 [ 403.472514] *pdpt = 000000002e89c001 *pde = 0000000000000000 [ 403.472556] Oops: 0000 [#1] SMP [ 403.472582] Modules linked in: mxm_wmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi psmouse snd_seq_midi_event snd_seq serio_raw snd_timer snd_seq_device snd soundcore snd_page_alloc wmi bnep rfcomm bluetooth mac_hid parport_pc ppdev lp parport usbhid dm_crypt firewire_ohci firewire_core crc_itu_t i915 drm_kms_helper e1000e ptp drm i2c_algo_bit pps_core xhci_hcd video [ 403.472895] CPU: 2 PID: 1940 Comm: Xorg Not tainted 3.11.0-rc2+ #827 [ 403.472938] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0070.2012.0416.2117 04/16/2012 [ 403.473002] task: ec866c00 ti: ee6a2000 task.ti: ee6a2000 [ 403.473039] EIP: 0060:[<c12c1500>] EFLAGS: 00013202 CPU: 2 [ 403.473078] EIP is at __list_del_entry+0x20/0xe0 [ 403.473109] EAX: f016d9bc EBX: f016d9bc ECX: 6b6b6b6b EDX: 6b6b6b6b [ 403.473151] ESI: 00000000 EDI: ee6a3c90 EBP: ee6a3c60 ESP: ee6a3c48 [ 403.473193] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 403.473230] CR0: 80050033 CR2: 6b6b6b6b CR3: 2ec43000 CR4: 001407f0 [ 403.473271] Stack: [ 403.473285] f63b2ff0 f61f98c0 f61f8000 f016d9bc 00000000 f016d9bc ee6a3cac f8519a4a [ 403.473347] 00000000 00000000 10000000 f61f8000 0100a000 10000000 00000001 008ca000 [ 403.473410] f64ee840 f61f98c0 f016d9bc f016dcec ee6a3c98 ee6a3c98 f61f98c0 dcc58f00 [ 403.473472] Call Trace: [ 403.473509] [<f8519a4a>] i915_gem_evict_something+0x17a/0x2d0 [i915] [ 403.473567] [<f8516ed1>] 
i915_gem_object_pin+0x271/0x660 [i915] [ 403.473622] [<f851c740>] ? i915_ggtt_clear_range+0x20/0x20 [i915] [ 403.473676] [<f8517afa>] i915_gem_object_pin_to_display_plane+0xda/0x190 [i915] [ 403.473742] [<f852d9fa>] intel_pin_and_fence_fb_obj+0xba/0x140 [i915] [ 403.473800] [<f852db40>] intel_gen7_queue_flip+0x30/0x1c0 [i915] [ 403.473856] [<f85337b0>] intel_crtc_page_flip+0x1a0/0x320 [i915] [ 403.473911] [<f847b549>] ? drm_framebuffer_reference+0x39/0x80 [drm] [ 403.473965] [<f847f9fb>] drm_mode_page_flip_ioctl+0x28b/0x320 [drm] [ 403.474018] [<f846fec8>] drm_ioctl+0x4b8/0x560 [drm] [ 403.474064] [<f847f770>] ? drm_mode_gamma_get_ioctl+0xd0/0xd0 [drm] [ 403.474113] [<c1140f8a>] ? do_sync_read+0x6a/0xa0 [ 403.474154] [<f846fa10>] ? drm_copy_field+0x80/0x80 [drm] [ 403.474193] [<c115134c>] do_vfs_ioctl+0x7c/0x5b0 [ 403.474228] [<c1141d2f>] ? vfs_read+0xef/0x160 [ 403.474263] [<c108dcbb>] ? ktime_get_ts+0x4b/0x120 [ 403.474298] [<c1151917>] SyS_ioctl+0x97/0xa0 [ 403.474330] [<c1590bc1>] sysenter_do_call+0x12/0x22 [ 403.474364] Code: 55 f4 8b 45 f8 e9 75 ff ff ff 90 55 89 e5 53 83 ec 14 8b 08 8b 50 04 81 f9 00 01 10 00 74 24 81 fa 00 02 20 00 0f 84 8e 00 00 00 <8b> 1a 39 d8 75 62 8b 59 04 39 d8 75 35 89 51 04 89 0a 83 c4 14 [ 403.474566] EIP: [<c12c1500>] __list_del_entry+0x20/0xe0 SS:ESP 0068:ee6a3c48 [ 403.476513] CR2: 000000006b6b6b6b v2: Missed the drm_object_unreference use after free (Ville) Daniel Vetter <daniel@ffwll.ch> writes: Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add the Oops from Mika to the commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
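The ordering problem described in this commit message (unbind may free the vma, after which its list entry must not be touched) can be illustrated with a small userspace sketch. This is not the driver's code: `toy_vma`, `toy_unbind` and the `toy_list_*` helpers are hypothetical stand-ins for the kernel's `i915_vma`, unbind path and `list_head` API, and the sketch simply assumes that unbinding always frees the vma.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal stand-in for the kernel's struct list_head. */
struct toy_list {
    struct toy_list *prev, *next;
};

/* Hypothetical stand-in for a vma queued for eviction. */
struct toy_vma {
    struct toy_list exec_link; /* must stay the first member (see cast below) */
    int id;
};

static void toy_list_init(struct toy_list *head)
{
    head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_list *head, struct toy_list *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void toy_list_del(struct toy_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Assumption: unbinding always destroys the vma in this toy model. */
static void toy_unbind(struct toy_vma *vma)
{
    free(vma);
}

/* Evict every vma on the list; returns how many were evicted. */
static int toy_evict_all(struct toy_list *head)
{
    int evicted = 0;

    while (head->next != head) {
        /* exec_link is the first member, so the pointers coincide. */
        struct toy_vma *vma = (struct toy_vma *)head->next;

        toy_list_del(&vma->exec_link); /* unlink while the vma is still valid */
        toy_unbind(vma);               /* the vma must not be touched after this */
        evicted++;
    }
    return evicted;
}
```

Swapping the two calls inside the loop would reproduce the pattern the Oops shows: `__list_del_entry` reading poisoned (0x6b6b6b6b) memory from an already freed entry.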
|
#
82a55ad1 |
|
14-Aug-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Switch eviction code to use vmas The execbuf wants to do relocations using vmas, so we need a vma->exec_list. The eviction code also uses the old obj execbuf list for its own book-keeping, but would really prefer to deal in vmas only. So switch it over to the new list. Again this is just a prep patch for the big execbuf vma conversion. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out from Ben's big execbuf vma patch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
ca191b13 |
|
31-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: mm_list is per VMA formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMA. Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actually kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discussed with Ben. Also drop a needless BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
f6cd1f15 |
|
31-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Use new bind/unbind in eviction code Eviction code, like the rest of the converted code, needs to be aware of the address space for which it is evicting (or, in the evict-everything case, all address spaces). With the updated bind/unbind interfaces of the last patch, we can now safely move the eviction code over. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
07fe0b12 |
|
31-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: plumb VM into bind/unbind code As alluded to in several patches, and it will be reiterated later... A VMA is an abstraction for a GEM BO bound into an address space. Therefore it stands to reason, that the existing bind, and unbind are the ones which will be the most impacted. This patch implements this, and updates all callers which weren't already updated in the series (because it was too messy). This patch represents the bulk of an earlier, larger patch. I've pulled out a bunch of things by the request of Daniel. The history is preserved for posterity with the email convention of ">" One big change from the original patch aside from a bunch of cropping is I've created an i915_vma_unbind() function. That is because we always have the VMA anyway, and doing an extra lookup is useless. There is a caveat, we retain an i915_gem_object_ggtt_unbind, for the global cases which might not talk in VMAs. > drm/i915: plumb VM into object operations > > This patch was formerly known as: > "drm/i915: Create VMAs (part 3) - plumbing" > > This patch adds a VM argument, bind/unbind, and the object > offset/size/color getters/setters. It preserves the old ggtt helper > functions because things still need, and will continue to need them. > > Some code will still need to be ported over after this. > > v2: Fix purge to pick an object and unbind all vmas > This was doable because of the global bound list change. > > v3: With the commit to actually pin/unpin pages in place, there is no > longer a need to check if unbind succeeded before calling put_pages(). > Make put_pages only BUG() after checking pin count. > > v4: Rebased on top of the new hangcheck work by Mika > plumbed eb_destroy also > Many checkpatch related fixes > > v5: Very large rebase > > v6: > Change BUG_ON to WARN_ON (Daniel) > Rename vm to ggtt in preallocate stolen, since it is always ggtt when > dealing with stolen memory. 
(Daniel) > list_for_each will short-circuit already (Daniel) > remove superfluous space (Daniel) > Use per object list of vmas (Daniel) > Make obj_bound_any() use obj_bound for each vm (Ben) > s/bind_to_gtt/bind_to_vm/ (Ben) > > Fixed up the inactive shrinker. As Daniel noticed the code could > potentially count the same object multiple times. While it's not > possible in the current case, since 1 object can only ever be bound into > 1 address space thus far - we may as well try to get something more > future proof in place now. With a prep patch before this to switch over > to using the bound list + inactive check, we're now able to carry that > forward for every address space an object is bound into. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA in destroy".] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
a70a3148 |
|
31-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Make proper functions for VMs Earlier in the conversion sequence we attempted to quickly wedge in the transitional interface as static inlines. Now that we're sure these interfaces are sane, for easier debug and to decrease code size (since many of these functions may be called quite a bit), make them real functions. While at it, kill off the set_color interface. We'll always have the VMA, or easily get to it. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
2f633156 |
|
17-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Create VMAs Formerly: "drm/i915: Create VMAs (part 1)" In a previous patch, the notion of a VM was introduced. A VMA describes an area of part of the VM address space. A VMA is similar to the concept in the linux mm. However, instead of representing regular memory, a VMA is backed by a GEM BO. There may be many VMAs for a given object, one for each VM the object is to be used in. This may occur through flink, dma-buf, or a number of other transient states. Currently the code depends on only 1 VMA per object, for the global GTT (and aliasing PPGTT). The following patches will address this and make the rest of the infrastructure more suited. v2: s/i915_obj/i915_gem_obj (Chris) v3: Only move an object to the now global unbound list if there are no more VMAs for the object which are bound into a VM (ie. the list is empty). v4: killed obj->gtt_space some reworks due to rebase v5: Free vma on error path (Imre) v6: Another missed vma free in i915_gem_object_bind_to_gtt error path (Imre) Fixed vma freeing in stolen preallocation (Imre) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Squash in fixup from Ben to not deref a non-existing vma in set_cache_level, reported by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
5cef07e1 |
|
16-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Move active/inactive lists to new mm Shamelessly manipulated out of Daniel :-) "When moving the lists around explain that the active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actually kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: Leave the bound list as a global one. (Chris, indirectly) v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local, since it will eventually be replaced by a vm argument. Put comment back inline, since it no longer makes sense to do otherwise. v4: Rebased on hangcheck/error state movement Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
93bd8649 |
|
16-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Put the mm in the parent address space Every address space should support object allocation. It therefore makes sense to have the allocator be part of the "superclass" which GGTT and PPGTT will derive. Since our maximum address space size is only 2GB we're not yet able to avoid doing allocation/eviction; but we'd hope one day this becomes almost irrelevant. v2: Rebased Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
c6cfb325 |
|
05-Jul-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Embed drm_mm_node in i915 gem obj Embedding the node in the obj is more natural in the transition to VMAs which will also have embedded nodes. This change also helps transition away from put_block to remove node. Though it's quite an uncommon occurrence, it's somewhat convenient to not fail at bind time because we cannot allocate the node. Though in practice there are other allocations (like the request structure) which would probably make this point not terribly useful. Quoting Daniel: Note that the only difference between put_block and remove_node is that the former fills up the preallocation cache. Which we don't need anyway and hence is just wasted space. v2: Clean up the stolen preallocation code. Rebased on the reserve_node patches renames ggtt_ stuff to gtt_ stuff WARN_ON if the object is already bound (which doesn't mean it's in the bound list, tricky) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
5d4545ae |
|
17-Jan-2013 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: Create a gtt structure The purpose of the gtt structure is to help isolate our gtt specific properties from the rest of the code (in doing so it helps us finish the isolation from the AGP connection). The following members are pulled out (and renamed): gtt_start gtt_total gtt_mappable_end gtt_mappable gtt_base_addr gsm The gtt structure will serve as a nice place to put gen specific gtt routines in upcoming patches. As far as what else I feel belongs in this structure: it is meant to encapsulate the GTT's physical properties. This is why I've not added fields which track various drm_mm properties, or things like gtt_mtrr (which is itself a pretty transient field). Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [Ben modified commit messages] Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
760285e7 |
|
02-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ Convert #include "..." to #include <path/...> in drivers/gpu/. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
|
#
4126d5d6 |
|
02-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding patch. Without this patch and the patch to make include the UAPI headers from the core headers, after the UAPI split, the DRM C sources cannot find these UAPI headers because the DRM code relies on specific -I flags to make #include "..." work on headers in include/drm/ - but that does not work after the UAPI split without adding more -I flags. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
|
#
86a1ee26 |
|
11-Aug-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only pwrite through the GTT if there is space in the aperture Avoid stalling and waiting for the GPU by checking to see if there is sufficient inactive space in the aperture for us to bind the buffer prior to writing through the GTT. If there is inadequate space we will have to stall waiting for the GPU, and incur overheads moving objects about. Instead, only incur the clflush overhead on the target object by writing through shmem. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
6c085a72 |
|
20-Aug-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Track unbound pages When dealing with a working set larger than the GATT, or even the mappable aperture when touching through the GTT, we end up with evicting objects only to rebind them at a new offset again later. Moving an object into and out of the GTT requires clflushing the pages, thus causing a double-clflush penalty for rebinding. To avoid having to clflush on rebinding, we can track the pages as they are evicted from the GTT and only relinquish those pages on memory pressure. As usual, if it were not for the handling of out-of-memory condition and having to manually shrink our own bo caches, it would be a net reduction of code. Alas. Note: The patch also contains a few changes to the last-hope evict_everything logic in i915_gem_execbuffer.c - we no longer try to only evict the purgeable stuff in a first try (since that's superfluous and only helps in OOM corner-cases, not fragmented-gtt thrashing situations). Also, the extraction of the get_pages retry loop from bind_to_gtt (and other callsites) to get_pages should imo have been a separate patch. v2: Ditch the newly added put_pages (for unbound objects only) in i915_gem_reset. A quick irc discussion hasn't revealed any important reason for this, so if we need this, I'd like to have a git blame'able explanation for it. v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Split out code movements and rant a bit in the commit message with a few Notes. Done v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
42d6ab48 |
|
26-Jul-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Segregate memory domains in the GTT using coloring Several functions of the GPU have the restriction that differing memory domains cannot be placed next to each other (as the GPU may prefetch beyond the end of one domain and hang as it crosses into the other domain). We use the facility of the drm_mm to mark ranges with a particular color that corresponds to the cache attributes of those pages in order to prevent allocating adjacent blocks of differing memory types. v2: Rebase on top of drm_mm coloring v2. v3: Fix rebinding existing gtt_space and add a verification routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
65ce3027 |
|
19-Jul-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove the defunct flushing list As we guarantee to emit a flush before emitting the breadcrumb or the next batchbuffer, there is no further need for the flushing list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
6b9d89b4 |
|
10-Jul-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm: Add colouring to the range allocator In order to support snoopable memory on non-LLC architectures (so that we can bind vgem objects into the i915 GATT for example), we have to avoid the prefetcher on the GPU from crossing memory domains and so prevent allocation of a snoopable PTE immediately following an uncached PTE. To do that, we need to extend the range allocator with support for tracking and segregating different node colours. This will be used by i915 to segregate memory domains within the GTT. v2: Now with more drm_mm helpers and less driver interference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@gmail.com>
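The colouring idea above (never let nodes of differing colour sit directly next to each other) can be sketched as a hole-size adjustment. This is an illustrative userspace model only: `toy_hole`, `toy_usable_size` and the one-page `TOY_GUARD_SIZE` are assumptions made for the example and are not the drm_mm interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GUARD_SIZE 4096u /* assumption: one 4 KiB page of separation */

/* Hypothetical description of a free hole and its allocated neighbours. */
struct toy_hole {
    uint64_t start, end;      /* the hole covers [start, end) */
    bool has_prev, has_next;  /* is there an allocated node on that side? */
    unsigned long prev_color; /* colour of the node just below the hole */
    unsigned long next_color; /* colour of the node just above the hole */
};

/*
 * Return how many bytes of the hole a node of `color` may use once a guard
 * gap is reserved next to every neighbour whose colour differs.
 */
static uint64_t toy_usable_size(const struct toy_hole *h, unsigned long color)
{
    uint64_t size = h->end > h->start ? h->end - h->start : 0;
    uint64_t guard = 0;

    if (h->has_prev && h->prev_color != color)
        guard += TOY_GUARD_SIZE;
    if (h->has_next && h->next_color != color)
        guard += TOY_GUARD_SIZE;

    return size > guard ? size - guard : 0;
}
```

An allocator using such a check would reject a hole whose usable size is smaller than the request, even though its raw size might be large enough.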
|
#
b4519513 |
|
11-May-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Introduce for_each_ring() macro In many places we wish to iterate over the rings associated with the GPU, so refactor them to use a common macro. Along the way, there are a few code removals that should be side-effect free and some rearrangement which should only have a cosmetic impact, such as error-state. Note that this slightly changes the semantics in the hangcheck code: We now always cycle through all enabled rings instead of short-circuiting the logic. v2: Pull in a couple of suggestions from Ben and Daniel for intel_ring_initialized() and not removing the warning (just moving them to a new home, closer to the error). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Added note to commit message about the small behaviour change, suggested by Ben Widawsky.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
b2da9fe5 |
|
26-Apr-2012 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: remove do_retire from i915_wait_request This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior except i915_gem_inactive_shrink (Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
1b50247a |
|
24-Apr-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove the list of pinned inactive objects Simplify object tracking by removing the inactive but pinned list. The only place where this was used is for counting the available memory, which is just as easily performed by checking all objects on the rare occasions it is required (application startup). For ease of debugging, we keep the reporting of pinned objects through the error-state and debugfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
a39d7efc |
|
24-Apr-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915_gem_evict_inactive() This was only used by one external caller who would just be as happy with evict-everything, so perform the replacement and make the function private. In the process we note that unbinding the inactive list should not fail, and make it a warning instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
70424970 |
|
23-Feb-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: No need to search again after retiring requests Retiring requests does not typically free up space in the aperture, so the additional search is pointless. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
b6708242 |
|
23-Feb-2012 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only bump refcnt on objects scheduled for eviction Incrementing the reference count on all objects walked when searching for space in the aperture is a non-negligible amount of overhead. In fact, we only need to hold on to a reference for objects that we will evict, so we can therefore delay the referencing until we find a suitable hole and only add those objects that fall inside. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
b93f9cf1 |
|
25-Jan-2012 |
Ben Widawsky <ben@bwidawsk.net> |
drm/i915: argument to control retiring behavior Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: Keith Packard <keithp@keithp.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
0206e353 |
|
16-Aug-2011 |
Akshay Joshi <me@akshayjoshi.com> |
Drivers: i915: Fix all space related issues. Various issues involved with the space character were generating warnings in the checkpatch.pl file. This patch removes most of those warnings. Signed-off-by: Akshay Joshi <me@akshayjoshi.com> Signed-off-by: Keith Packard <keithp@keithp.com>
|
#
db53a302 |
|
03-Feb-2011 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Refine tracepoints A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
092de6f2 |
|
10-Jan-2011 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/evict: Ensure we completely cleanup on failure ... and not leave the objects in an inconsistent state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
|
#
432e58ed |
|
25-Nov-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Avoid allocation for execbuffer object list Besides the minimal improvement in reducing the execbuffer overhead, the real benefit is clarifying a few routines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
05394f39 |
|
08-Nov-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use drm_i915_gem_object as the preferred type A glorified s/obj_priv/obj/ with a net reduction of over a 100 lines and many characters! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
5eac3ab4 |
|
31-Oct-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Evict just the purgeable GTT entries on the first pass Take two passes to evict everything whilst searching for sufficient free space to bind the batchbuffer. After searching for sufficient free space using LRU eviction, evict everything that is purgeable and try again. Only then if there is insufficient free space (or the GTT is too badly fragmented) evict everything from the aperture and try one last time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
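The escalation order described above can be sketched as follows. This is a deliberately simplified model under stated assumptions: `toy_gtt` and `toy_reserve` are hypothetical names, space is tracked as plain byte counters, fragmentation is ignored, and the initial LRU-evicting search is collapsed into a simple "does it already fit" check.

```c
#include <assert.h>

/* Hypothetical byte counters standing in for the state of the aperture. */
struct toy_gtt {
    unsigned long free;      /* bytes free right now */
    unsigned long purgeable; /* bytes held by purgeable objects */
    unsigned long other;     /* bytes held by all other evictable objects */
};

/*
 * Try to reserve `need` bytes, escalating only as far as necessary.
 * Returns the pass that succeeded (0, 1 or 2), or -1 if nothing helped.
 */
static int toy_reserve(struct toy_gtt *gtt, unsigned long need)
{
    int pass = 0;

    if (gtt->free < need) {
        /* Pass 1: drop only what userspace marked as purgeable. */
        gtt->free += gtt->purgeable;
        gtt->purgeable = 0;
        pass = 1;
    }
    if (gtt->free < need) {
        /* Pass 2: last resort, evict everything from the aperture. */
        gtt->free += gtt->other;
        gtt->other = 0;
        pass = 2;
    }
    if (gtt->free < need)
        return -1;

    gtt->free -= need;
    return pass;
}
```

The point of the ordering is that the cheapest remedy is tried first, so objects that are still useful stay bound unless the request cannot be satisfied any other way.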
|
#
395b70be |
|
28-Oct-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush read-only buffers from the active list upon idle as well It is possible for the active list to only contain a read-only buffer so that the ring->gpu_write_list remains empty. This leads to an inconsistency between i915_gpu_is_active() and i915_gpu_idle() causing an infinite spin during the shrinker and an assertion failure that i915_gpu_idle() does indeed flush all buffers from the active lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
a6e0aa42 |
|
16-Sep-2010 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: range-restricted eviction support Add a mappable parameter to i915_gem_evict_something to distinguish the two cases (non-restricted vs. mappable gtt allocations). No functional changes because the mappable limit is set to the end of the gtt currently. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
549f7365 |
|
19-Oct-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Enable SandyBridge blitter ring Based on an original patch by Zhenyu Wang, this initializes the BLT ring for SandyBridge and enables support for user execbuffers. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
69dc4987 |
|
19-Oct-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Track objects in global active list (as well as per-ring) To handle retirements, we need per-ring tracking of active objects. To handle evictions, we need global tracking of active objects. As we enable more rings, rebuilding the global list from the individual per-ring lists quickly grows tiresome and overly complicated. Tracking the active objects in two lists is the lesser of two evils. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
87acb0a5 |
|
19-Oct-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Simplify most HAS_BSD() checks ... by always initialising the empty ringbuffer it is always then safe to check whether it is active. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
e39a0150 |
|
29-Sep-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix refleak during eviction. Now that we hold onto a reference whilst evicting objects, we need to be sure that we drop all the references taken -- even on the error paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
97d1ebaf |
|
29-Sep-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/debug: Remove defunct WATCH_LRU This has bitrotted through disuse and has been superseded by tracing and debugfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
af626103 |
|
20-Sep-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold a reference to the object whilst unbinding the eviction list During heavy aperture thrashing we may be forced to wait upon several active objects during eviction. The active list may be the last reference to these objects and so the action of waiting upon one of them may cause another to be freed (and itself unbound). To prevent the object disappearing underneath us, we need to acquire and hold a reference whilst unbinding. This should fix the reported page refcount OOPS: kernel BUG at drivers/gpu/drm/i915/i915_gem.c:1444! ... RIP: 0010:[<ffffffffa0093026>] [<ffffffffa0093026>] i915_gem_object_put_pages+0x25/0xf5 [i915] Call Trace: [<ffffffffa009481d>] i915_gem_object_unbind+0xc5/0x1a7 [i915] [<ffffffffa0098ab2>] i915_gem_evict_something+0x3bd/0x409 [i915] [<ffffffffa0027923>] ? drm_gem_object_lookup+0x27/0x57 [drm] [<ffffffffa0093bc3>] i915_gem_object_bind_to_gtt+0x1d3/0x279 [i915] [<ffffffffa0095b30>] i915_gem_object_pin+0xa3/0x146 [i915] [<ffffffffa0027948>] ? drm_gem_object_lookup+0x4c/0x57 [drm] [<ffffffffa00961bc>] i915_gem_do_execbuffer+0x50d/0xe32 [i915] Reported-by: Shawn Starr <shawn.starr@rogers.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18902 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
de227ef0 |
|
03-Jul-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Kill the active list spinlock This spinlock only served debugging purposes in a time when we could not be sure of the mutex ever being released upon a GPU hang. As we now should be able to rely on hangcheck to do the job for us (and that error reporting should not itself require the struct mutex) we can kill the incomplete attempt at protection. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
cd377ea9 |
|
07-Aug-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Implement fair lru eviction across both rings. (v2) Based in a large part upon Daniel Vetter's implementation and adapted for handling multiple rings in a single pass. This should lead to better gtt usage and fixes the page-fault-of-doom triggered. The fairness is provided by scanning through the GTT space amalgamating space in rendering order. As soon as we have a contiguous space in the GTT large enough for the new object (and its alignment), evict any object which lies within that space. This should keep more objects resident in the GTT. Doing throughput testing on a PineView machine with cairo-perf-trace indicates that there is very little difference with the new LRU scan, perhaps a small improvement... Except oddly for the poppler trace. Reference: Bug 15911 - Intermittent X crash (freeze) https://bugzilla.kernel.org/show_bug.cgi?id=15911 Bug 20152 - cannot view JPG in firefox when running UXA https://bugs.freedesktop.org/show_bug.cgi?id=20152 Bug 24369 - Hang when scrolling firefox page with window in front https://bugs.freedesktop.org/show_bug.cgi?id=24369 Bug 28478 - Intermittent graphics lockups due to overflow/loop https://bugs.freedesktop.org/show_bug.cgi?id=28478 v2: Attempt to clarify the logic and order of eviction through the use of comments and macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Eric Anholt <eric@anholt.net>
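The scan described above (walk objects in rendering order, accumulate space, and evict only what lies inside the first sufficiently large contiguous hole) can be sketched in userspace. This is a simplified model, not the driver's or drm_mm's implementation: `toy_block`, `toy_find_run` and `toy_evict_something` are hypothetical, the address space is an address-ordered array, alignment and pinning are ignored, and each allocated block is assumed to carry a unique LRU rank.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry of an address-ordered, gap-free description of the GTT. */
struct toy_block {
    size_t size;
    int lru;     /* 0 = least recently used; -1 = free space */
    bool marked; /* scratch flag: candidate for eviction during the scan */
};

/* Find the first run of free-or-marked blocks totalling at least `need`. */
static bool toy_find_run(const struct toy_block *b, size_t n, size_t need,
                         size_t *first, size_t *last)
{
    size_t i, run_start = 0, run_size = 0;

    for (i = 0; i < n; i++) {
        if (b[i].lru >= 0 && !b[i].marked) {
            run_size = 0; /* a resident, unmarked block breaks the run */
            run_start = i + 1;
            continue;
        }
        run_size += b[i].size;
        if (run_size < need)
            continue;
        /* Drop leading blocks that are not needed to reach `need`. */
        while (run_size - b[run_start].size >= need)
            run_size -= b[run_start++].size;
        *first = run_start;
        *last = i;
        return true;
    }
    return false;
}

/*
 * Mark blocks oldest-first until a large enough contiguous run exists,
 * then evict only the marked blocks inside that run.
 * Returns the number of blocks evicted, or -1 if no hole can be made.
 */
static int toy_evict_something(struct toy_block *b, size_t n, size_t need)
{
    size_t first = 0, last = 0, i;
    int rank, evicted = 0;
    bool found;

    for (i = 0; i < n; i++)
        b[i].marked = false;

    found = toy_find_run(b, n, need, &first, &last);
    for (rank = 0; !found && rank < (int)n; rank++) {
        for (i = 0; i < n; i++)
            if (b[i].lru == rank)
                b[i].marked = true;
        found = toy_find_run(b, n, need, &first, &last);
    }

    for (i = 0; i < n; i++) {
        if (found && i >= first && i <= last && b[i].marked) {
            b[i].lru = -1; /* evicted: the block is now free space */
            evicted++;
        }
        b[i].marked = false; /* blocks outside the hole stay resident */
    }
    return found ? evicted : -1;
}
```

Blocks that were marked during the scan but fall outside the chosen hole are left untouched, which is the property the commit message credits with keeping more objects resident in the GTT.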
|
#
b47eb4a2 |
|
07-Aug-2010 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move the eviction logic to its own file. The eviction code is the gnarly underbelly of memory management, and is clearer if kept separated from the normal domain management in GEM. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
|