Cross Reference: /linux-master/drivers/gpu/drm/i915/gt/uc/intel_guc

History log of /linux-master/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
Revision	Date	Author	Comments
# be5bcc4b	06-Dec-2023	Andi Shyti <andi.shyti@linux.intel.com>	drm/i915/guc: Create the guc_to_i915() wrapper Given a reference to "guc", the guc_to_i915() returns the pointer to "i915" private data. Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
# d3110f07	12-Oct-2023	Ville Syrjälä <ville.syrjala@linux.intel.com>	drm/i915/guc: Clean up zero initializers Just use a simple {} to zero initialize arrays/structs instead of the hodgepodge of stuff we are using currently. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231012122442.15718-4-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com>
# b3343230	26-May-2023	Jani Nikula <jani.nikula@intel.com>	drm/i915/gt/uc: drop unused but set variable sseu Prepare for re-enabling -Wunused-but-set-variable. Apparently sseu is leftover from commit 9a92732f040a ("drm/i915/gt: Add general DSS steering iterator to intel_gt_mcr"). Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d542f25bffd5a50ff621bee93415a972c7768a2a.1685119007.git.jani.nikula@intel.com
# 8ba3ba99	11-May-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Fix confused register capture list creation The GuC has a completely separate engine class enum when referring to register capture lists, which combines render and compute. The driver was using the 'normal' GuC specific engine class enum instead. That meant that it thought it was defining a capture list for compute engines, the list was actually being applied to the GSC engine. And if a platform didn't have a render engine, then it would get no compute register captures at all. Fix that. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230512013544.3367606-1-John.C.Harrison@Intel.com
# 275dac1f	28-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Don't capture Gen8 regs on Xe devices A pair of pre-Xe registers were being included in the Xe capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-2-John.C.Harrison@Intel.com (cherry picked from commit b049132d61336f643d8faf2f6574b063667088cf) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
# e4730ae4	28-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Fix error capture for virtual engines GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D [drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388] [IGT] gem_exec_balancer: exiting, ret=0 The test causes a hang on both engines of a virtual engine context. The engine instance zero hang gets a valid error capture but the non-instance-zero hang does not. Fix that by scanning through the list of pending register captures when a hang notification for a virtual engine is received. That way, the hang can be assigned to the correct physical engine prior to starting the error capture process. So later on, when the error capture handler tries to find the engine register list, it looks for one on the correct engine. Also, sneak in a missing blank line before a comment in the node search code. v2: Fix null pointer deref on non-GuC platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com
# 44e36855	28-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Capture list naming clean up Don't use 'xe_lp*' prefixes for register lists that are common with Gen8. Don't add Xe only GSC registers to pre-Xe devices that don't even have a GSC engine. Fix Xe_LP name. Don't use GEN9 as a prefix for register lists that contain all GEN8 registers. Rename the 'default_' register list prefix to 'gen8_' as that is the more accurate name. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-4-John.C.Harrison@Intel.com
# 684ee005	28-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Consolidate duplicated capture list code Remove 99% duplicated steered register list code. Also, include the pre-Xe steered registers in the pre-Xe list generation. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-3-John.C.Harrison@Intel.com
# b049132d	28-Apr-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Don't capture Gen8 regs on Xe devices A pair of pre-Xe registers were being included in the Xe capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-2-John.C.Harrison@Intel.com
# 53c4e64c	10-Mar-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Clean up of register capture search The comparison in the search for a matching register capture node was not the most readable. It was also assuming that a zero GuC id means invalid, which it does not. So remove one invalid term, one redundant term and re-format to keep each term on a single line, and only one term per line. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-3-John.C.Harrison@Intel.com
# 9724ecdb	10-Mar-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Fix missing ecodes Error captures are tagged with an 'ecode'. This is a pseduo-unique magic number that is meant to distinguish similar seeming bugs with different underlying signatures. It is a combination of two ring state registers. Unfortunately, the register state being used is only valid in execlist mode. In GuC mode, the register state exists in a separate list of arbitrary register address/value pairs rather than the named entry structure. So, search through that list to find the two exciting registers and copy them over to the structure's named members. v2: if else if instead of if if (Alan) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-2-John.C.Harrison@Intel.com
# 96eecf9b	06-Feb-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: More debug print updates - GuC reg capture Update a bunch more debug prints to use the new GT based scheme. v2: Upgrade the no node found message to a warning on the grounds of it being quite important if the error capture can't find any register state information. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230207050717.1833718-4-John.C.Harrison@Intel.com
# 8df23e4c	10-Mar-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Fix missing ecodes Error captures are tagged with an 'ecode'. This is a pseduo-unique magic number that is meant to distinguish similar seeming bugs with different underlying signatures. It is a combination of two ring state registers. Unfortunately, the register state being used is only valid in execlist mode. In GuC mode, the register state exists in a separate list of arbitrary register address/value pairs rather than the named entry structure. So, search through that list to find the two exciting registers and copy them over to the structure's named members. v2: if else if instead of if if (Alan) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Fixes: a6f0f9cf330a ("drm/i915/guc: Plumb GuC-capture into gpu_coredump") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-2-John.C.Harrison@Intel.com (cherry picked from commit 9724ecdbb9ddd6da3260e4a442574b90fc75188a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
# 583ebae7	26-Jan-2023	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Rename GuC register state capture node to be more obvious The GuC specific register state entry in the error capture object was just called 'capture'. Although the companion 'node' entry was called 'guc_capture_node'. Rename the base entry to be 'guc_capture' instead so that it is a) more consistent and b) more obvious what it is. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-9-John.C.Harrison@Intel.com
# 8b7f7a9b	22-Nov-2022	Jani Nikula <jani.nikula@intel.com>	drm/i915/guc: make default_lists const data The default_lists array should be in rodata. Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122141616.3469214-1-jani.nikula@intel.com
# dfa5e6ef	22-Nov-2022	Jani Nikula <jani.nikula@intel.com>	drm/i915/guc: make default_lists const data The default_lists array should be in rodata. Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122141616.3469214-1-jani.nikula@intel.com (cherry picked from commit 8b7f7a9b10b704ba7d73199ff0f01354e0bad7a5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
# b186b2d9	10-Nov-2022	Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>	drm/i915/guc: add the GSC CS to the GuC capture list For the GSC engine we only want to capture the instance regs. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221111001832.4144910-1-daniele.ceraolospurio@intel.com
# befb231d	26-Oct-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Fix GuC error capture sizing estimation and reporting During GuC error capture initialization, we estimate the amount of size we need for the error-capture-region of the shared GuC-log-buffer. This calculation was incorrect so fix that. With the fixed calculation we can reduce the allocation of error-capture region from 4MB to 1MB (see note2 below for reasoning). Additionally, switch from drm_notice to drm_debug for the 3X spare size check since that would be impossible to hit without redesigning gpu_coredump framework to hold multiple captures. NOTE1: Even for 1x the min size estimation case, actually running out of space is a corner case because it can only occur if all engine instances get reset all at once and i915 isn't able extract the capture data fast enough within G2H handler worker. NOTE2: With the corrected calculation, a DG2 part required ~77K and a PVC required ~115K (1X min-est-size that is calculated as one-shot all-engine- reset scenario). Fixes: d7c15d76a554 ("drm/i915/guc: Check sizing of guc_capture output") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221026060506.1007830-2-alan.previn.teres.alexis@intel.com
# 5f8a3f65	19-Oct-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add compute reglist for guc err capture We missed this at initial upstream because at that time none of the GuC enabled platforms had a compute engine. Add this now. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221019072930.17755-3-alan.previn.teres.alexis@intel.com
# a8940778	19-Oct-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add error-capture init warnings when needed If GuC is being used and we initialized GuC-error-capture, we need to be warning if we don't provide an error-capture register list in the firmware ADS, for valid GT engines. A warning makes sense as this would impact debugability without realizing why a reglist wasn't retrieved and reported by GuC. However, depending on the platform, we might have certain engines that have a register list for engine instance error state but not for engine class. Thus, add a check only to warn if the register list was non existent vs an empty list (use the empty lists to skip the warning). NOTE: if a future platform were to introduce new registers in place of what was an empty list on existing / legacy hardware engines no warning is provided as the empty list is meant to be used intentionally. As an example, if a future hardware were to add blitter engine-class-registers (new) on top of the legacy blitter engine-instance-register (HEAD, TAIL, etc.), no warning is generated. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221019072930.17755-2-alan.previn.teres.alexis@intel.com
# 58bc2453	14-Oct-2022	Matt Roper <matthew.d.roper@intel.com>	drm/i915: Define multicast registers as a new type Rather than treating multicast registers as 'i915_reg_t' let's define them as a completely new type. This will allow the compiler to help us make sure we're using multicast-aware functions to operate on multicast registers. This plan does break down a bit in places where we're just maintaining heterogeneous lists of registers (e.g., various MMIO whitelists used by perf, GVT, etc.) rather than performing reads/writes. We only really care about the offset in those cases, so for now we can "cast" the registers as non-MCR, leaving us with a list of i915_reg_t's, but we may want to look for better ways to store mixed collections of i915_reg_t and i915_mcr_reg_t in the future. v2: - Add TLB invalidation registers v3: - Make type checking of i915_mmio_reg_offset() stricter. It will accept either i915_reg_t or i915_mcr_reg_t, but will now raise a compile error if any other type is passed, even if that type contains a 'reg' field. (Jani) - Drop a ton of GVT changes; allowing i915_mmio_reg_offset() to take either an i915_reg_t or an i915_mcr_reg_t means that the huge lists of MMIO_D*() macros used in GVT will continue to work without modification. We need only make changes to structures that have an explicit i915_reg_t in them now. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-13-matthew.d.roper@intel.com
# dfa13f1b	14-Oct-2022	Matt Roper <matthew.d.roper@intel.com>	drm/i915/gen8: Create separate reg definitions for new MCR registers Gen8 was the first time our hardware had multicast registers (or at least the first time the multicast nature was exposed and MMIO accesses could be steered). There are some registers that transitioned from singleton behavior to multicast during the gen7 -> gen8 transition; let's duplicate the register definitions for those registers in preparation for upcoming patches that will handle MCR registers in a special manner. The registers adjusted are: * MISCCPCTL * SAMPLER_INSTDONE * ROW_INSTDONE * ROW_CHICKEN2 * HALF_SLICE_CHICKEN1 * HALF_SLICE_CHICKEN3 v2: - Use the gen8 version of HALF_SLICE_CHICKEN3 in GVT's gen9 engine MMIO list. (Bala) - Update to the gen8 version of MISCCPCTL in a couple new workarounds that were recently added for DG2/PVC. (Bala) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-2-matthew.d.roper@intel.com
# b092e4a9	27-Jul-2022	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Reduce spam from error capture Some debug code got left in when the GuC based register save for error capture was added. Remove that. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-8-John.C.Harrison@Intel.com
# 8ad0152a	27-Jul-2022	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Make GuC log sizes runtime configurable The GuC log buffer sizes had to be configured statically at compile time. This can be quite troublesome when needing to get larger logs out of a released driver. So re-organise the code to allow a boot time module parameter override. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-7-John.C.Harrison@Intel.com
# 56c7f0e2	27-Jul-2022	John Harrison <John.C.Harrison@Intel.com>	drm/i915/guc: Fix capture size warning and bump the size There was a size check to warn if the GuC error state capture buffer allocation would be too small to fit a reasonable amount of capture data for the current platform. Unfortunately, the test was done too early in the boot sequence and was actually testing 'if(-ENODEV > size)'. Move the check to be later. The check is only used to print a warning message, so it doesn't really matter how early or late it is done. Note that it is not possible to dynamically size the buffer because the allocation needs to be done before the engine information is available (at least, it would be in the intended two-phase GuC init process). Now that the check works, it is reporting size too small for newer platforms. The check includes a 3x oversample multiplier to allow for multiple error captures to be bufferd by GuC before i915 has a chance to read them out. This is less important than simply being big enough to fit the first capture. So a) bump the default size to be large enough for one capture minimum and b) make the warning only if one capture won't fit, instead use a notice for the 3x size. Note that the size estimate is a worst case scenario. Actual captures will likely be smaller. Lastly, use drm_warn istead of DRM_WARN as the former provides more infmration and the latter is deprecated. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-3-John.C.Harrison@Intel.com
# 9a92732f	01-Jul-2022	Matt Roper <matthew.d.roper@intel.com>	drm/i915/gt: Add general DSS steering iterator to intel_gt_mcr Although all DSS belong to a single pool on Xe_HP platforms (i.e., they're not organized into slices from a topology point of view), we do still need to pass 'group' and 'instance' targets when steering register accesses to a specific instance of a per-DSS multicast register. The rules for how to determine group and instance IDs (which previously used legacy terms "slice" and "subslice") varies by platform. Some platforms determine steering by gslice membership, some platforms by cslice membership, and future platforms may have other rules. Since looping over each DSS and performing steered unicast register accesses is a relatively common pattern, let's add a dedicated iteration macro to handle this (and replace the platform-specific "instdone" loop we were using previously. This will avoid the calling code needing to figure out the details about how to obtain steering IDs for a specific DSS. Most of the places where we use this new loop are in the GPU errorstate code at the moment, but we do have some additional features coming in the future that will also need to loop over each DSS and steer some register accesses accordingly. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220701232006.1016135-1-matthew.d.roper@intel.com
# b94a1a20	06-Jun-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Asynchronous flush of GuC log regions Both error-capture and relay-logging mechanism use the GuC log infrastructure. That means the KMD must send a log flush complete notification back to GuC after reading the data out. This call is currently being sent synchronously. However, synchronous H2Gs cause problems when the system is backed up. There is no need for this to be synchronous. The KMD wasn't even looking at the return status from it. So make it asynchronous and then there is no issue about time outs. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220607002314.1451656-2-alan.previn.teres.alexis@intel.com
# e180a7b2	06-May-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Remove unnecessary GuC err capture noise GuC error capture blurts some debug messages about empty register lists for certain register types on engines during firmware initialization. These are not errors or warnings, so get rid of them. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220507045847.862261-2-alan.previn.teres.alexis@intel.com
# c0c73850	23-Mar-2022	Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>	drm/i915/guc: Correctly free guc capture struct on error On error the "new" allocation is not freed, so add the required kfree. Fixes: 247f8071d5893 ("drm/i915/guc: Pre-allocate output nodes for extraction") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220324000439.2370440-1-daniele.ceraolospurio@intel.com
# a0f1f7b4	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Print the GuC error capture output register list. Print the GuC captured error state register list (string names and values) when gpu_coredump_state printout is invoked via the i915 debugfs for flushing the gpu error-state that was captured prior. Since GuC could have reported multiple engine register dumps in a single notification event, parse the captured data (appearing as a stream of structures) to identify each dump as a different 'engine-capture-group-output'. Finally, for each 'engine-capture-group-output' that is found, verify if the engine register dump corresponds to the engine_coredump content that was previously populated by the i915_gpu_coredump function. That function would have copied the context's vma's including the bacth buffer during the G2H-context-reset notification that occurred earlier. Perform this verification check by comparing guc_id, lrca and engine- instance obtained from the 'engine-capture-group-output' vs a copy of that same info taken during i915_gpu_coredump. If they match, then print those vma's as well (such as the batch buffers). NOTE: the output format was verified using the gem_exec_capture IGT test. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-14-alan.previn.teres.alexis@intel.com
# a6f0f9cf	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Plumb GuC-capture into gpu_coredump Add a flags parameter through all of the coredump creation functions. Add a bitmask flag to indicate if the top level gpu_coredump event is triggered in response to a GuC context reset notification. Using that flag, ensure all coredump functions that read or print mmio-register values related to work submission or command-streamer engines are skipped and replaced with a calls guc-capture module equivalent functions to retrieve or print the register dump. While here, split out display related register reading and printing into its own function that is called agnostic to whether GuC had triggered the reset. For now, introduce an empty printing function that can filled in on a subsequent patch just to handle formatting. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com
# 247f8071	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Pre-allocate output nodes for extraction In the rare but possible scenario where we are in the midst of multiple GuC error-capture (and engine reset) events and the user also triggers a forced full GT reset or the internal watchdog triggers the same, intel_guc_submission_reset_prepare's call to flush_work(&guc->ct.requests.worker) can cause the G2H message handler to trigger intel_guc_capture_store_snapshot upon receiving new G2H error-capture notifications. This can happen despite the prior call to disable_submission(guc);. However, there's no race-free way for intel_guc_capture_store_snapshot to know that we are in the midst of a reset. That said, we can never dynamically allocate the output nodes in this handler. Thus, we shall pre-allocate a fixed number of empty nodes up front (at the time of ADS registration) that we can consume from or return to an internal cached list of nodes. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-12-alan.previn.teres.alexis@intel.com
# f5718a72	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Extract GuC error capture lists on G2H notification. - Upon the G2H Notify-Err-Capture event, parse through the GuC Log Buffer (error-capture-subregion) and generate one or more capture-nodes. A single node represents a single "engine- instance-capture-dump" and contains at least 3 register lists: global, engine-class and engine-instance. An internal link list is maintained to store one or more nodes. - Because the link-list node generation happen before the call to i915_gpu_codedump, duplicate global and engine-class register lists for each engine-instance register dump if we find dependent-engine resets in a engine-capture-group. - When i915_gpu_coredump calls into capture_engine, (in a subsequent patch) we detach the matching node (guc-id, LRCA, etc) from the link list above and attach it to i915_gpu_coredump's intel_engine_coredump structure when have matching LRCA/guc-id/engine-instance. Additional notes to be aware of: - GuC generates the error capture dump into the GuC log buffer but this buffer is one big log buffer with 3 independent subregions within it. Each subregion is populated with different content and used in different ways and timings but all regions operate behave as independent ring buffers. Each guc-log subregion (general-logs, crash-dump and error- capture) has it's own guc_log_buffer_state that contain independent read and write pointers. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-11-alan.previn.teres.alexis@intel.com
# d7c15d76	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Check sizing of guc_capture output Add intel_guc_capture_output_min_size_est function to provide a reasonable minimum size for error-capture region before allocating the shared buffer. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-10-alan.previn.teres.alexis@intel.com
# dce2bd54	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add Gen9 registers for GuC error state capture. Abstract out a Gen9 register list as the default for all other platforms we don't yet formally support GuC submission on. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-6-alan.previn.teres.alexis@intel.com
# 33a220f6	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add DG2 registers for GuC error state capture. Add additional DG2 registers for GuC error state capture. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-5-alan.previn.teres.alexis@intel.com
# 193be3f4	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add XE_LP steered register lists support Add the ability for runtime allocation and freeing of steered register list extentions that depend on the detected HW config fuses. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-4-alan.previn.teres.alexis@intel.com
# 8b72c216	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Add XE_LP static registers for GuC error capture. Add device specific tables and register lists to cover different engines class types for GuC error state capture for XE_LP products. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-3-alan.previn.teres.alexis@intel.com
# 24492514	21-Mar-2022	Alan Previn <alan.previn.teres.alexis@intel.com>	drm/i915/guc: Update GuC ADS size for error capture lists Update GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware. Ensure we allocate a persistent store for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. NOTE: Start with a sample static table of register lists to layout the framework before adding real registers in subsequent patch. This static register tables are a different format from the ADS populated list. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-2-alan.previn.teres.alexis@intel.com