Cross Reference: /linux-master/drivers/gpu/drm/i915/gt/uc/intel_guc

History log of /linux-master/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
Revision	Date	Author	Comments
# be5bcc4b	06-Dec-2023	Andi Shyti <andi.shyti@linux.intel.com>	drm/i915/guc: Create the guc_to_i915() wrapper Given a reference to "guc", the guc_to_i915() returns the pointer to "i915" private data. Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
# 706785c1	13-Nov-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Add a selftest for FAST_REQUEST errors There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. v2: Fix some dumb over-complications and a couple of typos - review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231114010016.234570-3-John.C.Harrison@Intel.com
# af58ee22	17-Oct-2023	Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>	drm/i915: Define and use GuC and CTB TLB invalidation routines The GuC firmware had defined the interface for Translation Look-Aside Buffer (TLB) invalidation. We should use this interface when invalidating the engine and GuC TLBs. Add additional functionality to intel_gt_invalidate_tlb, invalidating the GuC TLBs and falling back to GT invalidation when the GuC is disabled. The invalidation is done by sending a request directly to the GuC tlb_lookup that invalidates the table. The invalidation is submitted as a wait request and is performed in the CT event handler. This means we cannot perform this TLB invalidation path if the CT is not enabled. If the request isn't fulfilled in two seconds, this would constitute an error in the invalidation as that would constitute either a lost request or a severe GuC overload. With this new invalidation routine, we can perform GuC-based GGTT invalidations. GuC-based GGTT invalidation is incompatible with MMIO invalidation so we should not perform MMIO invalidation when GuC-based GGTT invalidation is expected. The additional complexity incurred in this patch will be necessary for range-based tlb invalidations, which will be platformed in the future. Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> CC: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-4-jonathan.cavitt@intel.com
# ff0dac08	17-Oct-2023	Jonathan Cavitt <jonathan.cavitt@intel.com>	drm/i915/guc: Add CT size delay helper As of now, there is no mechanism for tracking a given request's progress through the queue. Instead, add a helper that returns an estimated maximum time the queue should take to drain if completely full. Suggested-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-3-jonathan.cavitt@intel.com
# 3e78f771	06-Oct-2023	Kees Cook <keescook@chromium.org>	drm/i915/guc: Annotate struct ct_incoming_msg with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ct_incoming_msg. Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-hardening@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231006201744.work.135-kees@kernel.org
# b2edc414	15-Aug-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Force a reset on internal GuC error If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually kick in and trigger a reset (assuming the heartbeat had not been disabled). Instead, force the reset immediately to guarantee that it happens and to eliminate the very long heartbeat delay. The captured error state is also more likely to be useful if captured at the time of the error rather than many seconds later. Note that it is not possible to trigger a reset from with the G2H handler itself. The reset prepare process involves flushing outstanding G2H contents. So a deadlock could result. Instead, the G2H handler queues a worker thread to do the reset asynchronously. v2: Flush the worker on suspend and shutdown. Add rate limiting to prevent spam from a totally dead system (review feedback from Daniele). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230816003957.3572654-1-John.C.Harrison@Intel.com
# f1530f91	07-Aug-2023	Jonathan Cavitt <jonathan.cavitt@intel.com>	drm/i915/gt: Apply workaround 22016122933 correctly WA_22016122933 was recently applied to all MeteorLake engines, which is simultaneously too broad (should only apply to Media engines) and too specific (should apply to all platforms that use the same media engine as MeteorLake). Correct this in cases where coherency settings are modified. There were also two additional places where the workaround was applied unconditionally. The change was confirmed as necessary for all platforms, so the workaround label was removed. Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801153242.2445478-4-jonathan.cavitt@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230807121957.598420-4-andi.shyti@linux.intel.com
# 84596e1a	09-May-2023	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Drop legacy CTB definitions We've already switched to new HXG definitions some time ago, drop legacy CTB definitions to avoid mistakes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230509201103.538-1-michal.wajdeczko@intel.com
# a5606b94	26-May-2023	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Track all sent actions to GuC For easier debug of any unexpected error responses from GuC that might be related to non-blocking fast requests, track action code (and stack if under DEBUG_GUC config) for every H2G request. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-4-John.C.Harrison@Intel.com
# d9911020	26-May-2023	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Update log for unsolicited CTB response Instead of printing message fence twice, include HXG header of the unexpected message and its len. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-3-John.C.Harrison@Intel.com
# edfd93e6	26-May-2023	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Use FAST_REQUEST for non-blocking H2G calls In addition to the already defined REQUEST HXG message format, which is used when sender expects some confirmation or data, HXG protocol includes definition of the FAST REQUEST message, that may be used when sender does not expect any useful data to be returned. Using this instead of GUC_HXG_TYPE_EVENT for non-blocking CTB requests will allow GuC to send back GUC_HXG_TYPE_RESPONSE_FAILURE in case of errors. Note that it is not possible to return such errors to the caller, since this is for non-blocking calls and the related fence is not stored. Instead such messages are treated as unexpected, which will give an indication of potential GuC misprogramming that warrants extra debugging effort. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230526235538.2230780-2-John.C.Harrison@Intel.com
# f6eeea8d	18-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Dump error capture to dmesg on CTB error In the past, There have been sporadic CTB failures which proved hard to reproduce manually. The most effective solution was to dump the GuC log at the point of failure and let the CI system do the repro. It is preferable not to dump the GuC log via dmesg for all issues as it is not always necessary and is not helpful for end users. But rather than trying to re-invent the code to do this each time it is wanted, commit the code but for DEBUG_GUC builds only. v2: Use IS_ENABLED for testing config options. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230418181744.3251240-3-John.C.Harrison@Intel.com
# a161b6db	24-Apr-2023	Fei Yang <fei.yang@intel.com>	drm/i915/mtl: workaround coherency issue for Media This patch implements Wa_22016122933. In MTL, memory writes initiated by the Media tile update the whole cache line, even for partial writes. This creates a coherency problem for cacheable memory if both CPU and GPU are writing data to different locations within a single cache line. This patch circumvents the issue by making CPU/GPU shared memory uncacheable (WC on CPU side, and PAT index 2 for GPU). Additionally, it ensures that CPU writes are visible to the GPU with an intel_guc_write_barrier(). While fixing the CTB issue, we noticed some random GSC firmware loading failure because the share buffers are cacheable (WB) on CPU side but uncached on GPU side. To fix these issues we need to map such shared buffers as WC on CPU side. Since such allocations are not all done through GuC allocator, to avoid too many code changes, the i915_coherent_map_type() is now hard coded to return WC for MTL. v2: Simplify the commit message(Matt). BSpec: 45101 Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230424182902.3663500-3-fei.yang@intel.com
# 7388acb2	28-Jan-2023	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Update GuC messages in intel_guc_ct.c Use new macros to have common prefix that also include GT#. v2: drop unused helpers Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-5-michal.wajdeczko@intel.com
# dd9d3cbe	27-Jul-2022	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Don't abort on CTB_UNUSED status When the KMD sends a CLIENT_RESET request to GuC (as part of the suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the KMD then checked the CTB queue, it would see a non-zero status value and report the buffer as corrupted. Technically, no G2H messages should be received once the CLIENT_RESET has been sent. However, if a context was outstanding on an engine then it would get reset and a reset notification would be sent. So, don't actually treat UNUSED as a catastrophic error. Just flag it up as unexpected and keep going. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-7-John.C.Harrison@Intel.com
# 22645976	15-Jul-2022	Zhanjun Dong <zhanjun.dong@intel.com>	drm/i915/guc: Check for ct enabled while waiting for response We are seeing error message of "No response for request". Some cases happened while waiting for response and reset/suspend action was triggered. In this case, no response is not an error, active requests will be cancelled. This patch will handle this condition and change the error message into debug message. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220715211313.143645-1-zhanjun.dong@intel.com
# ff9fbe7c	25-Feb-2022	Lucas De Marchi <lucas.demarchi@intel.com>	drm/i915: Use str_enabled_disabled() Remove the local enableddisabled() implementation and adopt the str_enabled_disabled() from linux/string_helpers.h. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-3-lucas.demarchi@intel.com
# 707c3a7d	25-Feb-2022	Lucas De Marchi <lucas.demarchi@intel.com>	drm/i915: Use str_enable_disable() Remove the local enabledisable() implementation and adopt the str_enable_disable() from linux/string_helpers.h. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-2-lucas.demarchi@intel.com
# cec49bce	24-Jan-2022	Gustavo A. R. Silva <gustavoars@kernel.org>	drm/i915/guc: Use struct_size() helper in kmalloc() Make use of the struct_size() helper instead of an open-coded version, in order to avoid any potential type mistakes or integer overflows that, in the worst scenario, could lead to heap overflows. Also, address the following sparse warnings: drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:792:23: warning: using sizeof on a flexible structure Link: https://github.com/KSPP/linux/issues/174 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220125180726.GA68646@embeddedor
# 77b6f79d	06-Jan-2022	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Update to GuC version 69.0.3 Update to the latest GuC release. The latest GuC firmware introduces a number of interface changes: GuC may return NO_RESPONSE_RETRY message for requests sent over CTB. Add support for this reply and try resending the request again as a new CTB message. A KLV (key-length-value) mechanism is now used for passing configuration data such as CTB management. With the new KLV scheme, the old CTB management actions are no longer used and are removed. Register capture on hang is now supported by GuC. Full i915 support for this will be added by a later patch. A minimum support of providing capture memory and register lists is required though, so add that in. The device id of the current platform needs to be provided at init time. The 'poll CS' w/a (Wa_22012773006) was blanket enabled by previous versions of GuC. It must now be explicitly requested by the KMD. So, add in the code to turn it on when relevant. The GuC log entry format has changed. This requires adding a new field to the log header structure to mark the wrap point at the end of the buffer (as the buffer size is no longer a multiple of the log entry size). New CTB notification messages are now sent for some things that were previously only sent via MMIO notifications. Of these, the crash dump notification was not really being handled by i915. It called the log flush code but that only flushed the regular debug log and then only if relay logging was enabled. So just report an error message instead. The 'exception' notification was just being ignored completely. So add an error message for that as well. Note that in either the crash dump or the exception case, the GuC is basically dead. The KMD will detect this via the heartbeat and trigger both an error log (which will include the crash dump as part of the GuC log) and a GT reset. So no other processing is really required. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220107000622.292081-3-John.C.Harrison@Intel.com
# 2aa9f833	14-Dec-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Kick G2H tasklet if no credits Let's be paranoid and kick the G2H tasklet, which dequeues messages, if G2H credits are exhausted. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-7-matthew.brost@intel.com
# 6e94d539	14-Dec-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Add extra debug on CT deadlock Print CT state (H2G + G2H head / tail pointers, credits) on CT deadlock. v2: (John Harrison) - Add units to debug messages Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-6-matthew.brost@intel.com
# 6b540bf6	14-Oct-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Implement multi-lrc submission Implement multi-lrc submission via a single workqueue entry and single H2G. The workqueue entry contains an updated tail value for each request, of all the contexts in the multi-lrc submission, and updates these values simultaneously. As such, the tasklet and bypass path have been updated to coalesce requests into a single submission. v2: (John Harrison) - s/wqe/wqi - Use FIELD_PREP macros - Add GEM_BUG_ONs ensures length fits within field - Add comment / white space to intel_guc_write_barrier (Kernel test robot) - Make need_tasklet a static function v3: (Docs) - A comment for submission_stall_reason v4: (Kernel test robot) - Initialize return value in bypass tasklt submit function (John Harrison) - Add comment near work queue defs - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2 - Update write_barrier comment to talk about work queue v5: (John Harrison) - Fix typo in work queue comment Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
# fb2d2de3	26-Sep-2021	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Move and improve error message for missed CTB reply If we timeout waiting for a CT reply we print very simple error message. Improve that and by moving error reporting to the caller we can use CT_ERROR instead of DRM_ERROR and report just fence as error code will be reported later anyway. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-5-michal.wajdeczko@intel.com
# 0e9deac5	26-Sep-2021	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Print error name on CTB send failure Instead of plain error value (%d) print more user friendly error name (%pe). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-4-michal.wajdeczko@intel.com
# 0de9765d	26-Sep-2021	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Print error name on CTB (de)registration failure Instead of plain error value (%d) print more user friendly error name (%pe). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-3-michal.wajdeczko@intel.com
# 217ecd31	26-Sep-2021	Michal Wajdeczko <michal.wajdeczko@intel.com>	drm/i915/guc: Verify result from CTB (de)register action In commit b839a869dfc9 ("drm/i915/guc: Add support for data reporting in GuC responses") we missed the hypothetical case that GuC might return positive non-zero value as success data. While that would be lucky treated as error case, and at the end will result in reporting valid -EIO, in the meantime this value will be passed to ERR_PTR that could be misleading. v2: rebased Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-2-michal.wajdeczko@intel.com
# d67e3d5a	09-Sep-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Process all G2H message at once in work queue Rather than processing 1 G2H at a time and re-queuing the work queue if more messages exist, process all the G2H in a single pass of the work queue. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-6-matthew.brost@intel.com
# d75dc57f	26-Jul-2021	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Don't complain about reset races It is impossible to seal all race conditions of resets occurring concurrent to other operations. At least, not without introducing excesive mutex locking. Instead, don't complain if it occurs. In particular, don't complain if trying to send a H2G during a reset. Whatever the H2G was about should get redone once the reset is over. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-17-matthew.brost@intel.com
# f7957e60	26-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Handle engine reset failure notification GuC will notify the driver, via G2H, if it fails to reset an engine. We recover by resorting to a full GPU reset. v2: (John Harrison): - s/drm_dbg/drm_err Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-14-matthew.brost@intel.com
# 1e0fd2b5	26-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Handle context reset notification GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely. v2: (John Harrison) - Move msg[0] lookup after length check v3: (John Harrison) - s/drm_dbg/drm_err Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-13-matthew.brost@intel.com
# 28ff6520	21-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Update GuC debugfs to support new GuC Update GuC debugfs to support the new GuC structures. v2: (John Harrison) - Remove intel_lrc_reg.h include from i915_debugfs.c (Michal) - Rename GuC debugfs functions Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-17-matthew.brost@intel.com
# b97060a9	21-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero. v2: rtimeout -> remaining_timeout v3: Drop unnecessary includes, guc_submission_busy_loop -> guc_submission_send_busy_loop, drop negatie timeout trick, move a refactor of guc_context_unpin to earlier path (John H) v4: Add stddef.h back into intel_gt_requests.h, sort circuit idle function if not in GuC submission mode Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-16-matthew.brost@intel.com
# f4eb1f3f	21-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Ensure G2H response has space in buffer Ensure G2H response has space in the buffer before sending H2G CTB as the GuC can't handle any backpressure on the G2H interface. v2: (Matthew) - s/INTEL_GUC_SEND/INTEL_GUC_CT_SEND v3: (Matthew) - Add G2H credit accounting to blocking path, add g2h_release_space helper (John H) - CTB_G2H_BUFFER_SIZE / 4 == G2H_ROOM_BUFFER_SIZE Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-15-matthew.brost@intel.com
# e0717063	21-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Defer context unpin until scheduling is disabled With GuC scheduling, it isn't safe to unpin a context while scheduling is enabled for that context as the GuC may touch some of the pinned state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is done, a call back is added to intel_context_unpin when pin count == 1 to disable scheduling for that context. When the response CTB is received it is safe to do the final unpin. Future patches may add a heuristic / delay to schedule the disable call back to avoid thrashing on schedule enable / disable. v2: (John H) - s/drm_dbg/drm_err (Daneiel) - Clean up sched state function Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-9-matthew.brost@intel.com
# 3a4cdf19	21-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Implement GuC context operations for new inteface Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy. v2: (Daniel Vetter) - Use msleep_interruptible rather than cond_resched in busy loop (Michal) - Remove C++ style comment v3: (Matthew Brost) - Drop GUC_ID_START (John Harrison) - Fix a bunch of typos - Use drm_err rather than drm_dbg for G2H errors (Daniele) - Fix ;; typo - Clean up sched state functions - Add lockdep for guc_id functions - Don't call __release_guc_id when guc_id is invalid - Use MISSING_CASE - Add comment in guc_context_pin - Use shorter path to rpm (Daniele / CI) - Don't call release_guc_id on an invalid guc_id in destroy v4: (Daniel Vetter) - Add FIXME comment Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
# 3101e995	08-Jul-2021	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Module load failure test for CT buffer creation Add several module failure load inject points in the CT buffer creation code path. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-8-matthew.brost@intel.com
# 75452167	08-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Optimize CTB writes and reads CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally. v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value v4: (John H) - Drop redundant checks of tail / head values v5: (Michal) - Address more nits v6: (Michal) - Add GEM_BUG_ON sanity check on ctb->space Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-7-matthew.brost@intel.com
# b43b9950	08-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Add stall timer to non blocking CTB send function Implement a stall timer which fails H2G CTBs once a period of time with no forward progress is reached to prevent deadlock. v2: (Michal) - Improve error message in ct_deadlock() - Set broken when ct_deadlock() returns true - Return -EPIPE on ct_deadlock() v3: (Michal) - Add ms to stall timer comment (Matthew) - Move broken check to intel_guc_ct_send() Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-6-matthew.brost@intel.com
# 1681924d	08-Jul-2021	Matthew Brost <matthew.brost@intel.com>	drm/i915/guc: Add non blocking CTB send function Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function. The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer. The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function. v2: (Michal) - Use define for H2G room calculations - Move INTEL_GUC_SEND_NB define (Daniel Vetter) - Use msleep_interruptible rather than cond_resched v3: (Michal) - Move includes to following patch - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g v4: (John H) - Update comment, add type local variable Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-5-matthew.brost@intel.com