#
946e047a |
|
20-Jul-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Fix premature release of request's reusable memory Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity. After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care of keeping the old request still in memory until we use its attributes for setting up that await dependency, or we can happen to set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, already released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading do infinite waits. We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory. We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consequently, as proposed in v1. However, that approach leads to significant (up to 10 times) increase of SLAB utilization by i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence. When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request. Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed. v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead) v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211 Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
|
#
a337b64f |
|
20-Jul-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Fix premature release of request's reusable memory Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity. After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care of keeping the old request still in memory until we use its attributes for setting up that await dependency, or we can happen to set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, already released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading do infinite waits. We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory. We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consequently, as proposed in v1. However, that approach leads to significant (up to 10 times) increase of SLAB utilization by i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence. When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request. Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed. v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead) v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211 Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
bfad380c |
|
14-Mar-2023 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915/active: Fix missing debug object activation debug_active_activate() expected ref->count to be zero which is not true anymore as __i915_active_activate() calls debug_active_activate() after incrementing the count. v2: No need to check for "ref->count == 1" as __i915_active_activate() already make sure of that(Janusz). References: https://gitlab.freedesktop.org/drm/intel/-/issues/6733 Fixes: 04240e30ed06 ("drm/i915: Skip taking acquire mutex for no ref->active callback") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230313114613.9874-1-nirmoy.das@intel.com
|
#
50600605 |
|
02-Mar-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915/active: Fix misuse of non-idle barriers as fence trackers Users reported oopses on list corruptions when using i915 perf with a number of concurrently running graphics applications. Root cause analysis pointed at an issue in barrier processing code -- a race among perf open / close replacing active barriers with perf requests on kernel context and concurrent barrier preallocate / acquire operations performed during user context first pin / last unpin. When adding a request to a composite tracker, we try to reuse an existing fence tracker, already allocated and registered with that composite. The tracker we obtain may already track another fence, may be an idle barrier, or an active barrier. If the tracker we get occurs a non-idle barrier then we try to delete that barrier from a list of barrier tasks it belongs to. However, while doing that we don't respect return value from a function that performs the barrier deletion. Should the deletion ever fail, we would end up reusing the tracker still registered as a barrier task. Since the same structure field is reused with both fence callback lists and barrier tasks list, list corruptions would likely occur. Barriers are now deleted from a barrier tasks list by temporarily removing the list content, traversing that content with skip over the node to be deleted, then populating the list back with the modified content. Should that intentionally racy concurrent deletion attempts be not serialized, one or more of those may fail because of the list being temporary empty. Related code that ignores the results of barrier deletion was initially introduced in v5.4 by commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"). However, all users of the barrier deletion routine were apparently serialized at that time, then the issue didn't exhibit itself. Results of git bisect with help of a newly developed igt@gem_barrier_race@remote-request IGT test indicate that list corruptions might start to appear after commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), introduced in v5.5. Respect results of barrier deletion attempts -- mark the barrier as idle only if successfully deleted from the list. Then, before proceeding with setting our fence as the one currently tracked, make sure that the tracker we've got is not a non-idle barrier. If that check fails then don't use that tracker but go back and try to acquire a new, usable one. v3: use unlikely() to document what outcome we expect (Andi), - fix bad grammar in commit description. v2: no code changes, - blame commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), v5.5, not commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"), v5.4, - reword commit description. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333 Fixes: 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org # v5.5 Cc: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com
|
#
e92eb246 |
|
14-Mar-2023 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915/active: Fix missing debug object activation debug_active_activate() expected ref->count to be zero which is not true anymore as __i915_active_activate() calls debug_active_activate() after incrementing the count. v2: No need to check for "ref->count == 1" as __i915_active_activate() already make sure of that(Janusz). References: https://gitlab.freedesktop.org/drm/intel/-/issues/6733 Fixes: 04240e30ed06 ("drm/i915: Skip taking acquire mutex for no ref->active callback") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230313114613.9874-1-nirmoy.das@intel.com (cherry picked from commit bfad380c542438a9b642f8190b7fd37bc77e2723) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
e0e6b416 |
|
02-Mar-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915/active: Fix misuse of non-idle barriers as fence trackers Users reported oopses on list corruptions when using i915 perf with a number of concurrently running graphics applications. Root cause analysis pointed at an issue in barrier processing code -- a race among perf open / close replacing active barriers with perf requests on kernel context and concurrent barrier preallocate / acquire operations performed during user context first pin / last unpin. When adding a request to a composite tracker, we try to reuse an existing fence tracker, already allocated and registered with that composite. The tracker we obtain may already track another fence, may be an idle barrier, or an active barrier. If the tracker we get occurs a non-idle barrier then we try to delete that barrier from a list of barrier tasks it belongs to. However, while doing that we don't respect return value from a function that performs the barrier deletion. Should the deletion ever fail, we would end up reusing the tracker still registered as a barrier task. Since the same structure field is reused with both fence callback lists and barrier tasks list, list corruptions would likely occur. Barriers are now deleted from a barrier tasks list by temporarily removing the list content, traversing that content with skip over the node to be deleted, then populating the list back with the modified content. Should that intentionally racy concurrent deletion attempts be not serialized, one or more of those may fail because of the list being temporary empty. Related code that ignores the results of barrier deletion was initially introduced in v5.4 by commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"). However, all users of the barrier deletion routine were apparently serialized at that time, then the issue didn't exhibit itself. Results of git bisect with help of a newly developed igt@gem_barrier_race@remote-request IGT test indicate that list corruptions might start to appear after commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), introduced in v5.5. Respect results of barrier deletion attempts -- mark the barrier as idle only if successfully deleted from the list. Then, before proceeding with setting our fence as the one currently tracked, make sure that the tracker we've got is not a non-idle barrier. If that check fails then don't use that tracker but go back and try to acquire a new, usable one. v3: use unlikely() to document what outcome we expect (Andi), - fix bad grammar in commit description. v2: no code changes, - blame commit 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles"), v5.5, not commit d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests"), v5.4, - reword commit description. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333 Fixes: 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org # v5.5 Cc: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com (cherry picked from commit 506006055769b10d1b2b4e22f636f3b45e0e9fc7) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
1ea7fe77 |
|
08-Jul-2022 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Bump GT idling delay to 2 jiffies In monitoring a transcode pipeline that is latency sensitive (it waits between submitting frames, and each frame requires work on rcs/vcs/vecs engines), it is found that it took longer than a single jiffy for it to sustain its workload. Allowing an extra jiffy headroom for the userspace prevents us from prematurely parking and having to exit powersaving immediately. Link: https://gitlab.freedesktop.org/drm/intel/-/issues/6284 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e37911ec087a9ce50630d6faf61fa2c0d5f96d44.1657289332.git.karolina.drobnik@intel.com
|
#
ad5c99e0 |
|
16-Dec-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Remove unused bits of i915_vma/active api When reworking the code to move the eviction fence to the object, the best code is removed code. Remove some functions that are unused, and change the function definition if it's only used in 1 place. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> [mlankhorst: Remove new use of i915_active_has_exclusive] Link: https://patchwork.freedesktop.org/patch/msgid/20211216142749.1966107-2-maarten.lankhorst@linux.intel.com
|
#
512ba03e |
|
27-Jul-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: move i915_active slab to direct module init/exit With the global kmem_cache shrink infrastructure gone there's nothing special and we can convert them over. I'm doing this split up into each patch because there's quite a bit of noise with removing the static global.slab_cache to just a slab_cache. v2: Make slab static (Jason, 0day) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-2-daniel.vetter@ffwll.ch
|
#
4f62a7e0 |
|
21-Jul-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Ditch i915 globals shrink infrastructure This essentially reverts commit 84a1074920523430f9dc30ff907f4801b4820072 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 24 11:36:08 2018 +0000 drm/i915: Shrink the GEM kmem_caches upon idling mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it then we need to fix that there, not hand-roll our own slab shrinking code in i915. Also when this was added there was only one other caller of kmem_cache_shrink (added 2005 to the acpi code). Now there's a 2nd one outside of i915 code in a kunit test, which seems legit since that wants to very carefully control what's in the kmem_cache. This out of a total of over 500 calls to kmem_cache_create. This alone should have been warning sign enough that we're doing something silly. Noticed while reviewing a patch set from Jason to fix up some issues in our i915_init() and i915_exit() module load/cleanup code. Now that i915_globals.c isn't any different than normal init/exit functions, we should convert them over to one unified table and remove i915_globals.[hc] entirely. v2: Improve commit message (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: David Airlie <airlied@linux.ie> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch
|
#
c3b14760 |
|
04-May-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: drop the __i915_active_call pointer packing We use some of the lower bits of the retire function pointer for potential flags, which is quite thorny, since the caller needs to remember to give the function the correct alignment with __i915_active_call, otherwise we might incorrectly unpack the pointer and jump to some garbage address later. Instead of all this let's just pass the flags along as a separate parameter. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Suggested-by: Daniel Vetter <daniel@ffwll.ch> References: ca419f407b43 ("drm/i915: Fix crash in auto_retire") References: d8e44e4dd221 ("drm/i915/overlay: Fix active retire callback alignment") References: fd5f262db118 ("drm/i915/selftests: Fix active retire callback alignment") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210504164136.96456-1-matthew.auld@intel.com
|
#
ca419f40 |
|
28-Apr-2021 |
Stéphane Marchesin <marcheu@chromium.org> |
drm/i915: Fix crash in auto_retire The retire logic uses the 2 lower bits of the pointer to the retire function to store flags. However, the auto_retire function is not guaranteed to be aligned to a multiple of 4, which causes crashes as we jump to the wrong address, for example like this: 2021-04-24T18:03:53.804300Z WARNING kernel: [ 516.876901] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI 2021-04-24T18:03:53.804310Z WARNING kernel: [ 516.876906] CPU: 7 PID: 146 Comm: kworker/u16:6 Tainted: G U 5.4.105-13595-g3cd84167b2df #1 2021-04-24T18:03:53.804311Z WARNING kernel: [ 516.876907] Hardware name: Google Volteer2/Volteer2, BIOS Google_Volteer2.13672.76.0 02/22/2021 2021-04-24T18:03:53.804312Z WARNING kernel: [ 516.876911] Workqueue: events_unbound active_work 2021-04-24T18:03:53.804313Z WARNING kernel: [ 516.876914] RIP: 0010:auto_retire+0x1/0x20 2021-04-24T18:03:53.804314Z WARNING kernel: [ 516.876916] Code: e8 01 f2 ff ff eb 02 31 db 48 89 d8 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 c8 00 00 00 0f 88 ab 47 4a 00 31 c0 5d c3 0f <1f> 44 00 00 55 48 89 e5 f0 ff 8f c8 00 00 00 0f 88 9a 47 4a 00 74 2021-04-24T18:03:53.804319Z WARNING kernel: [ 516.876918] RSP: 0018:ffff9b4d809fbe38 EFLAGS: 00010286 2021-04-24T18:03:53.804320Z WARNING kernel: [ 516.876919] RAX: 0000000000000007 RBX: ffff927915079600 RCX: 0000000000000007 2021-04-24T18:03:53.804320Z WARNING kernel: [ 516.876921] RDX: ffff9b4d809fbe40 RSI: 0000000000000286 RDI: ffff927915079600 2021-04-24T18:03:53.804321Z WARNING kernel: [ 516.876922] RBP: ffff9b4d809fbe68 R08: 8080808080808080 R09: fefefefefefefeff 2021-04-24T18:03:53.804321Z WARNING kernel: [ 516.876924] R10: 0000000000000010 R11: ffffffff92e44bd8 R12: ffff9279150796a0 2021-04-24T18:03:53.804322Z WARNING kernel: [ 516.876925] R13: ffff92791c368180 R14: ffff927915079640 R15: 000000001c867605 2021-04-24T18:03:53.804323Z WARNING kernel: [ 516.876926] FS: 0000000000000000(0000) GS:ffff92791ffc0000(0000) knlGS:0000000000000000 2021-04-24T18:03:53.804323Z WARNING kernel: [ 516.876928] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2021-04-24T18:03:53.804324Z WARNING kernel: [ 516.876929] CR2: 0000239514955000 CR3: 00000007f82da001 CR4: 0000000000760ee0 2021-04-24T18:03:53.804325Z WARNING kernel: [ 516.876930] PKRU: 55555554 2021-04-24T18:03:53.804325Z WARNING kernel: [ 516.876931] Call Trace: 2021-04-24T18:03:53.804326Z WARNING kernel: [ 516.876935] __active_retire+0x77/0xcf 2021-04-24T18:03:53.804326Z WARNING kernel: [ 516.876939] process_one_work+0x1da/0x394 2021-04-24T18:03:53.804327Z WARNING kernel: [ 516.876941] worker_thread+0x216/0x375 2021-04-24T18:03:53.804327Z WARNING kernel: [ 516.876944] kthread+0x147/0x156 2021-04-24T18:03:53.804335Z WARNING kernel: [ 516.876946] ? pr_cont_work+0x58/0x58 2021-04-24T18:03:53.804335Z WARNING kernel: [ 516.876948] ? kthread_blkcg+0x2e/0x2e 2021-04-24T18:03:53.804336Z WARNING kernel: [ 516.876950] ret_from_fork+0x1f/0x40 2021-04-24T18:03:53.804336Z WARNING kernel: [ 516.876952] Modules linked in: cdc_mbim cdc_ncm cdc_wdm xt_cgroup rfcomm cmac algif_hash algif_skcipher af_alg xt_MASQUERADE uinput snd_soc_rt5682_sdw snd_soc_rt5682 snd_soc_max98373_sdw snd_soc_max98373 snd_soc_rl6231 regmap_sdw snd_soc_sof_sdw snd_soc_hdac_hdmi snd_soc_dmic snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_hda_common intel_ipu6_psys snd_sof_xtensa_dsp soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof snd_soc_hdac_hda snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core intel_ipu6_isys videobuf2_dma_contig videobuf2_v4l2 videobuf2_common videobuf2_memops mei_hdcp intel_ipu6 ov2740 ov8856 at24 sx9310 dw9768 v4l2_fwnode cros_ec_typec intel_pmc_mux roles acpi_als typec fuse iio_trig_sysfs cros_ec_light_prox cros_ec_lid_angle cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer cros_ec_sensors_ring kfifo_buf industrialio cros_ec_sensorhub 2021-04-24T18:03:53.804337Z WARNING kernel: [ 516.876972] cdc_ether usbnet iwlmvm lzo_rle lzo_compress iwl7000_mac80211 iwlwifi zram cfg80211 r8152 mii btusb btrtl btintel btbcm bluetooth ecdh_generic ecc joydev 2021-04-24T18:03:53.804337Z EMERG kernel: [ 516.879169] gsmi: Log Shutdown Reason 0x03 This change fixes this by aligning the function. Signed-off-by: Stéphane Marchesin <marcheu@chromium.org> Fixes: 229007e02d69 ("drm/i915: Wrap i915_active in a simple kreffed struct") Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210429031021.1218091-1-marcheu@chromium.org
|
#
402be8a1 |
|
28-Apr-2021 |
Stéphane Marchesin <marcheu@chromium.org> |
drm/i915: Fix crash in auto_retire The retire logic uses the 2 lower bits of the pointer to the retire function to store flags. However, the auto_retire function is not guaranteed to be aligned to a multiple of 4, which causes crashes as we jump to the wrong address, for example like this: 2021-04-24T18:03:53.804300Z WARNING kernel: [ 516.876901] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI 2021-04-24T18:03:53.804310Z WARNING kernel: [ 516.876906] CPU: 7 PID: 146 Comm: kworker/u16:6 Tainted: G U 5.4.105-13595-g3cd84167b2df #1 2021-04-24T18:03:53.804311Z WARNING kernel: [ 516.876907] Hardware name: Google Volteer2/Volteer2, BIOS Google_Volteer2.13672.76.0 02/22/2021 2021-04-24T18:03:53.804312Z WARNING kernel: [ 516.876911] Workqueue: events_unbound active_work 2021-04-24T18:03:53.804313Z WARNING kernel: [ 516.876914] RIP: 0010:auto_retire+0x1/0x20 2021-04-24T18:03:53.804314Z WARNING kernel: [ 516.876916] Code: e8 01 f2 ff ff eb 02 31 db 48 89 d8 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 c8 00 00 00 0f 88 ab 47 4a 00 31 c0 5d c3 0f <1f> 44 00 00 55 48 89 e5 f0 ff 8f c8 00 00 00 0f 88 9a 47 4a 00 74 2021-04-24T18:03:53.804319Z WARNING kernel: [ 516.876918] RSP: 0018:ffff9b4d809fbe38 EFLAGS: 00010286 2021-04-24T18:03:53.804320Z WARNING kernel: [ 516.876919] RAX: 0000000000000007 RBX: ffff927915079600 RCX: 0000000000000007 2021-04-24T18:03:53.804320Z WARNING kernel: [ 516.876921] RDX: ffff9b4d809fbe40 RSI: 0000000000000286 RDI: ffff927915079600 2021-04-24T18:03:53.804321Z WARNING kernel: [ 516.876922] RBP: ffff9b4d809fbe68 R08: 8080808080808080 R09: fefefefefefefeff 2021-04-24T18:03:53.804321Z WARNING kernel: [ 516.876924] R10: 0000000000000010 R11: ffffffff92e44bd8 R12: ffff9279150796a0 2021-04-24T18:03:53.804322Z WARNING kernel: [ 516.876925] R13: ffff92791c368180 R14: ffff927915079640 R15: 000000001c867605 2021-04-24T18:03:53.804323Z WARNING kernel: [ 516.876926] FS: 0000000000000000(0000) GS:ffff92791ffc0000(0000) knlGS:0000000000000000 2021-04-24T18:03:53.804323Z WARNING kernel: [ 516.876928] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2021-04-24T18:03:53.804324Z WARNING kernel: [ 516.876929] CR2: 0000239514955000 CR3: 00000007f82da001 CR4: 0000000000760ee0 2021-04-24T18:03:53.804325Z WARNING kernel: [ 516.876930] PKRU: 55555554 2021-04-24T18:03:53.804325Z WARNING kernel: [ 516.876931] Call Trace: 2021-04-24T18:03:53.804326Z WARNING kernel: [ 516.876935] __active_retire+0x77/0xcf 2021-04-24T18:03:53.804326Z WARNING kernel: [ 516.876939] process_one_work+0x1da/0x394 2021-04-24T18:03:53.804327Z WARNING kernel: [ 516.876941] worker_thread+0x216/0x375 2021-04-24T18:03:53.804327Z WARNING kernel: [ 516.876944] kthread+0x147/0x156 2021-04-24T18:03:53.804335Z WARNING kernel: [ 516.876946] ? pr_cont_work+0x58/0x58 2021-04-24T18:03:53.804335Z WARNING kernel: [ 516.876948] ? kthread_blkcg+0x2e/0x2e 2021-04-24T18:03:53.804336Z WARNING kernel: [ 516.876950] ret_from_fork+0x1f/0x40 2021-04-24T18:03:53.804336Z WARNING kernel: [ 516.876952] Modules linked in: cdc_mbim cdc_ncm cdc_wdm xt_cgroup rfcomm cmac algif_hash algif_skcipher af_alg xt_MASQUERADE uinput snd_soc_rt5682_sdw snd_soc_rt5682 snd_soc_max98373_sdw snd_soc_max98373 snd_soc_rl6231 regmap_sdw snd_soc_sof_sdw snd_soc_hdac_hdmi snd_soc_dmic snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_hda_common intel_ipu6_psys snd_sof_xtensa_dsp soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof snd_soc_hdac_hda snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core intel_ipu6_isys videobuf2_dma_contig videobuf2_v4l2 videobuf2_common videobuf2_memops mei_hdcp intel_ipu6 ov2740 ov8856 at24 sx9310 dw9768 v4l2_fwnode cros_ec_typec intel_pmc_mux roles acpi_als typec fuse iio_trig_sysfs cros_ec_light_prox cros_ec_lid_angle cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer cros_ec_sensors_ring kfifo_buf industrialio cros_ec_sensorhub 2021-04-24T18:03:53.804337Z WARNING kernel: [ 516.876972] cdc_ether usbnet iwlmvm lzo_rle lzo_compress iwl7000_mac80211 iwlwifi zram cfg80211 r8152 mii btusb btrtl btintel btbcm bluetooth ecdh_generic ecc joydev 2021-04-24T18:03:53.804337Z EMERG kernel: [ 516.879169] gsmi: Log Shutdown Reason 0x03 This change fixes this by aligning the function. Signed-off-by: Stéphane Marchesin <marcheu@chromium.org> Fixes: 229007e02d69 ("drm/i915: Wrap i915_active in a simple kreffed struct") Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210429031021.1218091-1-marcheu@chromium.org (cherry picked from commit ca419f407b43cc89942ebc297c7a63d94abbcae4) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
bfaae47d |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: make lockdep slightly happier about execbuf. As soon as we install fences, we should stop allocating memory in order to prevent any potential deadlocks. This is required later on, when we start adding support for dma-fence annotations. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-11-maarten.lankhorst@linux.intel.com
|
#
5e963508 |
|
11-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use cmpxchg64 for 32b compatilibity By using the double wide cmpxchg64 on 32bit, we can use the same algorithm on both 32/64b systems. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201211110310.22740-1-chris@chris-wilson.co.uk
|
#
f6e98a18 |
|
21-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Always flush the active worker before returning from the wait The first thing the active retirement worker does is decrement the i915_active count. The first thing we do during i915_active_wait is try to increment the i915_active count, but only if already active [non-zero]. The wait may see that the retirement is already started and so marked the i915_active as idle, and skip waiting for the retirement handler. However, the caller of i915_active_wait may immediately free the i915_active upon returning (e.g. i915_vma_destroy) so we must not return before the concurrent access from the worker is completed. We must always flush the worker. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2473 Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210121232807.16618-1-chris@chris-wilson.co.uk (cherry picked from commit 977a372e972cb42799746c284035a33c64ebace9) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
f9e62f31 |
|
14-Aug-2020 |
Stephen Boyd <swboyd@chromium.org> |
treewide: Make all debug_obj_descriptors const This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by moving the structure to read-only memory. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org
|
#
9ff33bbc |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier() As the conversion between idle-barrier and full i915_active_fence is already serialised by explicit memory barriers, we can reduce the spinlock in i915_active_acquire_preallocate_barrier() for finding an idle-barrier to reuse to an RCU read lock to ensure the fence remains valid, only taking the spinlock for the update of the rbtree itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-6-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
e28860ae |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make the stale cached active node available for any timeline Rather than require the next timeline after idling to match the MRU before idling, reset the index on the node and allow it to match the first request. However, this requires cmpxchg(u64) and so is not trivial on 32b, so for compatibility we just fallback to keeping the cached node pointing to the MRU timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-5-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
99a7f4da |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep the most recently used active-fence upon discard Whenever an i915_active idles, we prune its tree of old fence slots to prevent a gradual leak should it be used to track many, many timelines. The downside is that we then have to frequently reallocate the rbtree. A compromise is that we keep the most recently used fence slot, and reuse that for the next active reference as that is the most likely timeline to be reused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
5d934137 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export a preallocate variant of i915_active_acquire() Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
04240e30 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Skip taking acquire mutex for no ref->active callback If no active callback is defined for i915_active, we do not need to serialise its enabling with the mutex. We still do only want to call the debug activate once, and must still serialise with a concurrent retire. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
e714977e |
|
01-Aug-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
drm/i915: Fix wrong return value In function i915_active_acquire_preallocate_barrier(), not all paths have the return value set correctly, and in case of memory allocation failure, a negative error code should be returned. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200802115655.25568-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
3b0a0579 |
|
06-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allow asynchronous waits on the i915_active barriers Allow the caller to also wait upon the barriers stored in i915_active. v2: Hook up i915_request_await_active(I915_ACTIVE_AWAIT_BARRIER) as well for completeness, and avoid the lazy GEM_BUG_ON()! v3: Pull flush_lazy_signals() under the active-ref protection as it too walks the rbtree and so we must be careful that we do not free it as we iterate. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406155840.1728-2-chris@chris-wilson.co.uk
|
#
442dbc5c |
|
06-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make exclusive awaits on i915_active optional Later use will require asynchronous waits on the active timelines, but will not utilize an async wait on the exclusive channel. Make the await on the exclusive fence explicit in the selection flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406155840.1728-1-chris@chris-wilson.co.uk
|
#
229007e0 |
|
27-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap i915_active in a simple kreffed struct For conveniences of callers that just want to use an i915_active to track a wide array of concurrent timelines, wrap the base i915_active struct inside a kref. This i915_active will self-destruct after use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200327112212.16046-2-chris@chris-wilson.co.uk
|
#
d75a92a8 |
|
27-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allow for different modes of interruptible i915_active_wait Allow some users the discretion to not immediately return on a normal signal. Hopefully, they will opt to use TASK_KILLABLE instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200327112212.16046-1-chris@chris-wilson.co.uk
|
#
edee52c9 |
|
23-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Delay release of engine-pm after last retirement Keep the engine-pm awake until the next jiffie, to avoid immediate ping-pong under moderate load. (Forcing the idle barrier excerbates the moderate load, dramatically increasing the driver overhead.) On the other hand, delaying the idle-barrier slightly incurs longer rc6-off and so more power consumption. Closes: https://gitlab.freedesktop.org/drm/intel/issues/848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-4-chris@chris-wilson.co.uk
|
#
29e6ecf3 |
|
11-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Extend i915_request_await_active to use all timelines Extend i915_request_await_active() to be able to asynchronously wait on all the tracked timelines simultaneously. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200311092044.16353-1-chris@chris-wilson.co.uk
|
#
c0e31018 |
|
27-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Skip barriers inside waits Attaching to the i915_active barrier is a two stage process, and a flush is only effective when the barrier is activation. Thus it is possible for us to see a barrier, and attempt to flush, only for our flush to have no effect. As such, before attempting to activate signaling on the fence we need to double check it is a fence! Fixes: d13a31770077 ("drm/i915: Flush idle barriers when waiting") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1333 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-1-chris@chris-wilson.co.uk
|
#
d13a3177 |
|
25-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush idle barriers when waiting If we do find ourselves with an idle barrier inside our active while waiting, attempt to flush it by emitting a pulse using the kernel context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Steve Carbonari <steven.carbonari@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225192206.1107336-1-chris@chris-wilson.co.uk
|
#
143d9c3e |
|
25-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop assertion that active->fence is unchanged We cannot assert the fence is not yet changed as the next thread may change it prior to acquiring our lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200225082233.274530-1-chris@chris-wilson.co.uk
|
#
52144db1 |
|
29-Jan-2020 |
José Roberto de Souza <jose.souza@intel.com> |
drm/i915: Fix preallocated barrier list append Only the first and the last nodes were being added to ref->preallocated_barriers. Renaming variables to make it more easy to read. Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com (cherry picked from commit d4c3c0b8221a72107eaf35c80c40716b81ca463e) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
7c34bb03 |
|
26-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release As we use a mutex to serialise the first acquire (as it may be a lengthy operation), but only an atomic decrement for the release, we have to be careful in case a second thread races and completes both acquire/release as the first finishes its acquire. Thread A Thread B i915_active_acquire i915_active_acquire atomic_read() == 0 atomic_read() == 0 mutex_lock() mutex_lock() atomic_read() == 0 ref->active(); atomic_inc() mutex_unlock() atomic_read() == 1 i915_active_release atomic_dec_and_test() -> 0 ref->retire() atomic_inc() -> 1 mutex_unlock() So thread A has acquired the ref->active_count but since the ref was still active at the time, it did not initialise it. By switching the check inside the mutex to an atomic increment only if already active, we close the race. Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk (cherry picked from commit ac0e331a628b5ded087eab09fad2ffb082ac61ba) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
30ca04e1 |
|
03-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold reference to previous active fence as we queue Take a reference to the previous exclusive fence on the i915_active, as we wish to add an await to it in the caller (and so must prevent it from being freed until we have completed that task). Fixes: e3793468b466 ("drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-1-chris@chris-wilson.co.uk
|
#
e3793468 |
|
30-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex On Braswell and Broxton (also known as Valleyview and Apollolake), we need to serialise updates of the GGTT using the big stop_machine() hammer. This has the side effect of appearing to lockdep as a possible reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu allocations). However, we want to use vm->mutex (including ggtt->mutex) from within the shrinker and so must avoid such possible taints. For this purpose, we introduced the asynchronous vma binding and we can apply it to the PIN_GLOBAL so long as take care to add the necessary waits for the worker afterwards. Closes: https://gitlab.freedesktop.org/drm/intel/issues/211 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk
|
#
d4c3c0b8 |
|
29-Jan-2020 |
José Roberto de Souza <jose.souza@intel.com> |
drm/i915: Fix preallocated barrier list append Only the first and the last nodes were being added to ref->preallocated_barriers. Renaming variables to make it more easy to read. Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
|
#
ac0e331a |
|
26-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release As we use a mutex to serialise the first acquire (as it may be a lengthy operation), but only an atomic decrement for the release, we have to be careful in case a second thread races and completes both acquire/release as the first finishes its acquire. Thread A Thread B i915_active_acquire i915_active_acquire atomic_read() == 0 atomic_read() == 0 mutex_lock() mutex_lock() atomic_read() == 0 ref->active(); atomic_inc() mutex_unlock() atomic_read() == 1 i915_active_release atomic_dec_and_test() -> 0 ref->retire() atomic_inc() -> 1 mutex_unlock() So thread A has acquired the ref->active_count but since the ref was still active at the time, it did not initialise it. By switching the check inside the mutex to an atomic increment only if already active, we close the race. Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
|
#
416d3838 |
|
17-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Satisfy smatch that a loop has at least one iteration Smatch worries that the engine->mask may be 0 leading to the loop being shortcircuited leaving the next pointer unset, drivers/gpu/drm/i915/i915_active.c:667 i915_active_acquire_preallocate_barrier() error: uninitialized symbol 'next'. Assert that mask is not 0 and smatch can then verify that next must be initialised before use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200117110603.2982286-1-chris@chris-wilson.co.uk
|
#
84135022 |
|
06-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Drop mutex serialisation between context pin/unpin The last remaining reason for serialising the pin/unpin of the intel_context is to ensure that our preallocated wakerefs are not consumed too early (i.e. the unpin of the previous phase does not emit the idle barriers for this phase before we even submit). All of the other operations within the context pin/unpin are supposed to be atomic... Therefore, we can reduce the serialisation to being just on the i915_active.preallocated_barriers itself and drop the nested pin_mutex from intel_context_unpin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-5-chris@chris-wilson.co.uk
|
#
e6ba7648 |
|
21-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915->kernel_context Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
|
#
bbca083d |
|
05-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialise i915_active_acquire() with __active_retire() As __active_retire() does it's final atomic_dec() under the ref->tree_lock spinlock, in order to prevent ourselves from reusing the ref->cache and ref->tree as they are being destroyed, we need to serialise with the retirement during i915_active_acquire(). [ +0.000005] kernel BUG at drivers/gpu/drm/i915/i915_active.c:157! [ +0.000011] invalid opcode: 0000 [#1] SMP [ +0.000004] CPU: 7 PID: 188 Comm: kworker/u16:4 Not tainted 5.4.0-rc8-03070-gac5e57322614 #89 [ +0.000002] Hardware name: Razer Razer Blade Stealth 13 Late 2019/LY320, BIOS 1.02 09/10/2019 [ +0.000082] Workqueue: events_unbound active_work [i915] [ +0.000059] RIP: 0010:__active_retire+0x115/0x120 [i915] [ +0.000003] Code: 75 28 48 8b 3d 8c 6e 1a 00 48 89 ee e8 e4 5f a5 c0 48 8b 44 24 10 65 48 33 04 25 28 00 00 00 75 0f 48 83 c4 18 5b 5d 41 5c c3 <0f> 0b 0f 0b 0f 0b e8 a0 90 87 c0 0f 1f 44 00 00 48 8b 3d 54 6e 1a [ +0.000002] RSP: 0018:ffffb833003f7e48 EFLAGS: 00010286 [ +0.000003] RAX: ffff8d6e8d726d00 RBX: ffff8d6f9db4e840 RCX: 0000000000000000 [ +0.000001] RDX: ffffffff82605930 RSI: ffff8d6f9adc4908 RDI: ffff8d6e96cefe28 [ +0.000002] RBP: ffff8d6e96cefe00 R08: 0000000000000000 R09: ffff8d6f9ffe9a50 [ +0.000002] R10: 0000000000000048 R11: 0000000000000018 R12: ffff8d6f9adc4930 [ +0.000001] R13: ffff8d6f9e04fb00 R14: 0000000000000000 R15: ffff8d6f9adc4988 [ +0.000002] FS: 0000000000000000(0000) GS:ffff8d6f9ffc0000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 000055eb5a34cf10 CR3: 000000018d609002 CR4: 0000000000760ee0 [ +0.000002] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000010] process_one_work+0x1aa/0x350 [ +0.000004] worker_thread+0x4d/0x3a0 [ +0.000004] kthread+0xfb/0x130 [ +0.000004] ? process_one_work+0x350/0x350 [ +0.000003] ? kthread_park+0x90/0x90 [ +0.000005] ret_from_fork+0x1f/0x40 Reported-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Matthew Auld <matthew.auld@intel.com> Tested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191205183332.801237-1-chris@chris-wilson.co.uk
|
#
e1cda6a5 |
|
02-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialise i915_active_wait() with its retirement As the i915_active.retire() may be running on another CPU as we detect that the i915_active is idle, we may not wait for the retirement itself. Wait for the remote callback by waiting for the retirement worker. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112424 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191202140133.2444217-2-chris@chris-wilson.co.uk
|
#
ae303004 |
|
02-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Specialise i915_active.work lock classes Similar to for i915_active.mutex, we require each class of i915_active to have distinct lockdep chains as some, but by no means all, i915_active are used within the shrinker and so have much more severe usage constraints. By using a lockclass local to i915_active_init() all i915_active workers have the same lock class, and we may generate false positives when waiting for the i915_active. If we push the lockclass into the caller, each class of i915_active will have distinct lockdep chains. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191202140133.2444217-1-chris@chris-wilson.co.uk
|
#
df9f85d8 |
|
27-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialise i915_active_fence_set() with itself The expected downside to commit 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context() to a trylock") was that it would need to return -EAGAIN to userspace in order to resolve potential mutex inversion. Such an unsightly round trip is unnecessary if we could atomically insert a barrier into the i915_active_fence, so make it happen. Currently, we use the timeline->mutex (or some other named outer lock) to order insertion into the i915_active_fence (and so individual nodes of i915_active). Inside __i915_active_fence_set, we only need then serialise with the interrupt handler in order to claim the timeline for ourselves. However, if we remove the outer lock, we need to ensure the order is intact between not only multiple threads trying to insert themselves into the timeline, but also with the interrupt handler completing the previous occupant. We use xchg() on insert so that we have an ordered sequence of insertions (and each caller knows the previous fence on which to wait, preserving the chain of all fences in the timeline), but we then have to cmpxchg() in the interrupt handler to avoid overwriting the new occupant. The only nasty side-effect is having to temporarily strip off the RCU-annotations to apply the atomic operations, otherwise the rules are much more conventional! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402 Fixes: 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context() to a trylock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
|
#
ee33baa8 |
|
19-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up the calling context for intel_wakeref_put() Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk (cherry picked from commit 07779a76ee1f93f930cf697b22be73d16e14f50c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
07779a76 |
|
19-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up the calling context for intel_wakeref_put() Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
|
#
093b9228 |
|
14-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree As we want to be able to run inside atomic context for retiring the i915_active, and we are no longer allowed to abuse mutex_trylock, split the tree management portion of i915_active.mutex into an irq-safe spinlock. References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191114172535.1116-1-chris@chris-wilson.co.uk (cherry picked from commit c9ad602feabe4271d2adf1bdae5d8b20c2dc84f1) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
c9ad602f |
|
14-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree As we want to be able to run inside atomic context for retiring the i915_active, and we are no longer allowed to abuse mutex_trylock, split the tree management portion of i915_active.mutex into an irq-safe spinlock. References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191114172535.1116-1-chris@chris-wilson.co.uk
|
#
2871ea85 |
|
24-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
#
b5e8e954 |
|
21-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Introduce barrier pulses along engines To flush idle barriers, and even inflight requests, we want to send a preemptive 'pulse' along an engine. We use a no-op request along the pinned kernel_context at high priority so that it should run or else kick off the stuck requests. We can use this to ensure idle barriers are immediately flushed, as part of a context cancellation mechanism, or as part of a heartbeat mechanism to detect and reset a stuck GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191021174339.5389-1-chris@chris-wilson.co.uk
|
#
a50134b1 |
|
17-Oct-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Make for_each_engine_masked work on intel_gt Medium term goal is to eliminate the i915->engine[] array and to get there we have recently introduced equivalent array in intel_gt. Now we need to migrate the code further towards this state. This next step is to eliminate usage of i915->engines[] from the for_each_engine_masked iterator. For this to work we also need to use engine->id as index when populating the gt->engine[] array and adjust the default engine set indexing to use engine->legacy_idx instead of assuming gt->engines[] indexing. v2: * Populate gt->engine[] earlier. * Check that we don't duplicate engine->legacy_idx v3: * Work around the initialization order issue between default_engines() and intel_engines_driver_register() which sets engine->legacy_idx for now. It will be fixed properly later. v4: * Merge with forgotten v2.5. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017161852.8836-1-tvrtko.ursulin@linux.intel.com
|
#
b7234840 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move idle barrier cleanup into engine-pm Now that we now longer need to guarantee that the active callback is under the struct_mutex, we can lift it out of the i915_gem_park() and into the engine parking itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-7-chris@chris-wilson.co.uk
|
#
b1e3177b |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Coordinate i915_active with its own mutex Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree. Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. v2: Add some comments that barely elucidate anything :( Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
|
#
274cbf20 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Push the i915_active.retire into a worker As we need to use a mutex to serialise i915_active activation (because we want to allow the callback to sleep), we need to push the i915_active.retire into a worker callback in case we get need to retire from an atomic context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-5-chris@chris-wilson.co.uk
|
#
2850748e |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull i915_vma_pin under the vm->mutex Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
|
#
d19d71fc |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark i915_request.timeline as a volatile, rcu pointer The request->timeline is only valid until the request is retired (i.e. before it is completed). Upon retiring the request, the context may be unpinned and freed, and along with it the timeline may be freed. We therefore need to be very careful when chasing rq->timeline that the pointer does not disappear beneath us. The vast majority of users are in a protected context, either during request construction or retirement, where the timeline->mutex is held and the timeline cannot disappear. It is those few off the beaten path (where we access a second timeline) that need extra scrutiny -- to be added in the next patch after first adding the warnings about dangerous access. One complication, where we cannot use the timeline->mutex itself, is during request submission onto hardware (under spinlocks). Here, we want to check on the timeline to finalize the breadcrumb, and so we need to impose a second rule to ensure that the request->timeline is indeed valid. As we are submitting the request, it's context and timeline must be pinned, as it will be used by the hardware. Since it is pinned, we know the request->timeline must still be valid, and we cannot submit the idle barrier until after we release the engine->active.lock, ergo while submitting and holding that spinlock, a second thread cannot release the timeline. v2: Don't be lazy inside selftests; hold the timeline->mutex for as long as we need it, and tidy up acquiring the timeline with a bit of refactoring (i915_active_add_request) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
|
#
f52c6d0d |
|
27-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only activate i915_active debugobject once The point of debug_object_activate is to mark the first, and only the first, acquisition. The object then remains active until the last release. However, we marked up all successful first acquires even though we allowed concurrent parties to try and acquire the i915_active simultaneously (serialised by the i915_active.mutex). Testcase: igt/gem_mmap_gtt/fault-concurrent Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190827132631.18627-1-chris@chris-wilson.co.uk
|
#
e1d7b66b |
|
19-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: i915_active.retire() is optional Check that i915_active.retire() exists before calling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190819075835.20065-6-chris@chris-wilson.co.uk
|
#
25ffd4b1 |
|
16-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Markup expected timeline locks for i915_active As every i915_active_request should be serialised by a dedicated lock, i915_active consists of a tree of locks; one for each node. Markup up the i915_active_request with what lock is supposed to be guarding it so that we can verify that the serialised updated are indeed serialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-2-chris@chris-wilson.co.uk
|
#
f130b712 |
|
13-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialise read/write of the barrier's engine We use the request pointer inside the i915_active_node as the indicator of the barrier's status; we mark it as used during i915_request_add_active_barriers(), and search for an available barrier in reuse_idle_barrier(). That check must be carefully serialised to ensure we do use an engine for the barrier and not just a random pointer. (Along the other reuse path, we are fully serialised by the timeline->mutex.) The acquisition of the barrier itself is ordered through the strong memory barrier in llist_del_all(). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111397 Fixes: d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813200905.11369-1-chris@chris-wilson.co.uk
|
#
75d0a7f3 |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Lift timeline into intel_context Move the timeline from being inside the intel_ring to intel_context itself. This saves much pointer dancing and makes the relations of the context to its timeline much clearer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk
|
#
d8af05ff |
|
02-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allow sharing the idle-barrier from other kernel requests By placing our idle-barriers in the i915_active fence tree, we expose those for reuse by other components that are issuing requests along the kernel_context. Reusing the proto-barrier active_node is perfectly fine as the new request implies a context-switch, and so an opportune point to run the idle-barrier. However, the proto-barrier is not equivalent to a normal active_node and care must be taken to avoid dereferencing the ERR_PTR used as its request marker. v2: Comment the more egregious cheek v3: A glossary! Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: ce476c80b8bf ("drm/i915: Keep contexts pinned until after the next kernel context switch") Fixes: a9877da2d629 ("drm/i915/oa: Reconfigure contexts on the fly") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190802100015.1281-1-chris@chris-wilson.co.uk
|
#
3f99a614 |
|
25-Jul-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Do not rely on for loop caching the mask for_each_engine_masked caches the engine mask but what does the caller know. Cache it explicitly for clarity and while at it correct the type to match. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190725125056.11942-1-tvrtko.ursulin@linux.intel.com
|
#
79c7a28e |
|
25-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Capture vma contents outside of spinlock Currently we use the engine->active.lock to ensure that the request is not retired as we capture the data. However, we only need to ensure that the vma are not removed prior to use acquiring their contents, and since we have already relinquished our stop-machine protection, we assume that the user will not be overwriting the contents before we are able to record them. In order to capture the vma outside of the spinlock, we acquire a reference and mark the vma as active to prevent it from being unbound. However, since it is tricky allocate an entry in the fence tree (doing so would require taking a mutex) while inside the engine spinlock, we use an atomic bit and special case the handling for i915_active_wait. The core benefit is that we can use some non-atomic methods for mapping the device pages, we can remove the slow compression phase out of atomic context (i.e. stop antagonising the nmi-watchdog), and no we longer need large reserves of atomic pages. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111215 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190725223843.8971-1-chris@chris-wilson.co.uk
|
#
d650d1f5 |
|
03-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Markup potential lock for i915_active Make the lockchains more deterministic via i915_active by flagging the potential lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-7-chris@chris-wilson.co.uk
|
#
afd1bcd4 |
|
02-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Report if i915_active is still busy upon waiting If we try to wait on an i915_active from within a critical section, it will remain busy (such as if we are shrinking from within i915_active_ref). Report the failure so that we do not proceed thinking it is idle. Extracted from a future patch "drm/i915: Coordinate i915_active with its own mutex". Fixes: 12c255b5dad1 ("drm/i915: Provide an i915_active.acquire callback") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190702092117.1707-1-chris@chris-wilson.co.uk
|
#
12c255b5 |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Provide an i915_active.acquire callback If we introduce a callback for i915_active that is only called the first time we use the i915_active and is symmetrically paired with the i915_active.retire callback, we can replace the open-coded and non-atomic implementations -- which will be very fragile (i.e. broken) upon removing the struct_mutex serialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk
|
#
5361db1a |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Track i915_active using debugobjects Provide runtime asserts and tracking of i915_active via debugobjects. For example, this should allow us to check that the i915_active is only active when we expect it to be and is never freed too early. One consequence is that, for simplicity, we no longer allow i915_active to be on-stack which only affected the selftests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-2-chris@chris-wilson.co.uk
|
#
7009db14 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep engine alive as we retire the context Though we pin the context first before taking the pm wakeref, during retire we need to unpin before dropping the pm wakeref (breaking the "natural" onion). During the unpin, we may need to attach a cleanup operation on to the engine wakeref, ergo we want to keep the engine awake until after the unpin. v2: Push the engine wakeref into the barrier so we keep the onion unwind ordering in the request itself Fixes: ce476c80b8bf ("drm/i915: Keep contexts pinned until after the next kernel context switch") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-1-chris@chris-wilson.co.uk
|
#
ce476c80 |
|
14-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep contexts pinned until after the next kernel context switch We need to keep the context image pinned in memory until after the GPU has finished writing into it. Since it continues to write as we signal the final breadcrumb, we need to keep it pinned until the request after it is complete. Currently we know the order in which requests execute on each engine, and so to remove that presumption we need to identify a request/context-switch we know must occur after our completion. Any request queued after the signal must imply a context switch, for simplicity we use a fresh request from the kernel context. The sequence of operations for keeping the context pinned until saved is: - On context activation, we preallocate a node for each physical engine the context may operate on. This is to avoid allocations during unpinning, which may be from inside FS_RECLAIM context (aka the shrinker) - On context deactivation on retirement of the last active request (which is before we know the context has been saved), we add the preallocated node onto a barrier list on each engine - On engine idling, we emit a switch to kernel context. When this switch completes, we know that all previous contexts must have been saved, and so on retiring this request we can finally unpin all the contexts that were marked as deactivated prior to the switch. We can enhance this in future by flushing all the idle contexts on a regular heartbeat pulse of a switch to kernel context, which will also be used to check for hung engines. v2: intel_context_active_acquire/_release Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
|
#
df069367 |
|
08-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect i915_active iterators from the shrinker If we allocate while iterating the rbtree of active nodes, we may hit the shrinker and so retire the i915_active, reaping the rbtree. Modifying the rbtree as we iterate is not good behaviour, so acquire the i915_active first to keep the tree intact whenever we allocate. Fixes: a42375af0a30 ("drm/i915: Release the active tracker tree upon idling") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190208134704.23039-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (cherry picked from commit 312c4ba1bb71d666f924f84afd5bdc775b71278f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
103b76ee |
|
05-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use i915_global_register() Rather than manually add every new global into each hook, use i915_global_register() function and keep a list of registered globals to invoke instead. However, I haven't found a way for random drivers to add an .init table to avoid having to manually add ourselves to i915_globals_init() each time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
32eb6bcf |
|
28-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make request allocation caches global As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potential has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
|
#
312c4ba1 |
|
08-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect i915_active iterators from the shrinker If we allocate while iterating the rbtree of active nodes, we may hit the shrinker and so retire the i915_active, reaping the rbtree. Modifying the rbtree as we iterate is not good behaviour, so acquire the i915_active first to keep the tree intact whenever we allocate. Fixes: a42375af0a30 ("drm/i915: Release the active tracker tree upon idling") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190208134704.23039-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
21950ee7 |
|
05-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull i915_gem_active into the i915_active family Looking forward, we need to break the struct_mutex dependency on i915_gem_active. In the meantime, external use of i915_gem_active is quite beguiling, little do new users suspect that it implies a barrier as each request it tracks must be ordered wrt the previous one. As one of many, it can be used to track activity across multiple timelines, a shared fence, which fits our unordered request submission much better. We need to steer external users away from the singular, exclusive fence imposed by i915_gem_active to i915_active instead. As part of that process, we move i915_gem_active out of i915_request.c into i915_active.c to start separating the two concepts, and rename it to i915_active_request (both to tie it to the concept of tracking just one request, and to give it a longer, less appealing name). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
|
#
5f5c139d |
|
05-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allocate active tracking nodes from a slabcache Wrap the active tracking for a GPU references in a slabcache for faster allocations, and hopefully better fragmentation reduction. v3: Nothing device specific left, it's just a slabcache that we can make global. v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-4-chris@chris-wilson.co.uk
|
#
a42375af |
|
05-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Release the active tracker tree upon idling As soon as we detect that the active tracker is idle and we prepare to call the retire callback, release the storage for our tree of per-timeline nodes. We expect these to be infrequently used and quick to allocate, so there is little benefit in keeping the tree cached and we would prefer to return the pages back to the system in a timely fashion. This also means that when we finalize the struct as a whole, we know as the activity tracker must be idle, the tree has already been released. Indeed we can reduce i915_active_fini() just to the assertions that there is nothing to do. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-3-chris@chris-wilson.co.uk
|
#
64d6c500 |
|
05-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Generalise GPU activity tracking We currently track GPU memory usage inside VMA, such that we never release memory used by the GPU until after it has finished accessing it. However, we may want to track other resources aside from VMA, or we may want to split a VMA into multiple independent regions and track each separately. For this purpose, generalise our request tracking (akin to struct reservation_object) so that we can embed it into other objects. v2: Tweak error handling during selftest setup. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-2-chris@chris-wilson.co.uk
|