History log of /linux-master/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
Revision Date Author Comments
# d3bbc4df 26-Mar-2024 Miguel Ojeda <ojeda@kernel.org>

drm/msm: fix the `CRASHDUMP_READ` target of `a6xx_get_shader_block()`

Clang 14 in an (essentially) defconfig arm64 build for next-20240326
reports [1]:

drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:843:6: error:
variable 'out' set but not used [-Werror,-Wunused-but-set-variable]

The variable `out` in these functions is meant to compute the `target` of
`CRASHDUMP_READ()`, but in this case only the initial value (`dumper->iova
+ A6XX_CD_DATA_OFFSET`) was being passed.

Thus use `out` as it was intended by Connor [2].

There was an alternative patch at [3] that removed the variable
altogether, but that would only use the initial value.

Fixes: 64d6255650d4 ("drm/msm: More fully implement devcoredump for a7xx")
Closes: https://lore.kernel.org/lkml/CANiq72mjc5t4n25SQvYSrOEhxxpXYPZ4pPzneSJHEnc3qApu2Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/lkml/CACu1E7HhCKMJd6fixZSPiNAz6ekoZnkMTHTcLFVmbZ-9VoLxKg@mail.gmail.com/ [2]
Link: https://lore.kernel.org/lkml/20240307093727.1978126-1-colin.i.king@gmail.com/ [3]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/584955/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 64d62556 25-Jan-2024 Connor Abbott <cwabbott0@gmail.com>

drm/msm: More fully implement devcoredump for a7xx

Use the vendor-provided snapshot headers to dump the contextless
registers, shader blocks, and cluster registers. Still unimplemented are
the GMU registers and "external core" registers, which would require
more work because they use register spaces we don't have described in
devicetree and dump registers from multiple spaces in a single list.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/575919/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# d98c220f 25-Jan-2024 Connor Abbott <cwabbott0@gmail.com>

drm/msm: Fix snapshotting a7xx indexed regs

We were overwriting the last indexed reg (CP_ROQ) and we were
snapshotting the same CP_MEMPOOL block twice instead of snapshotting
CP_BV_MEMPOOL as intended.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/575920/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# b08d26da 11-Oct-2023 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

drm/msm/a7xx: actually use a7xx state registers

Make a6xx_get_registers() use a7xx registers instead of a6xx ones if the
detected Adreno is from the A7xx family.

Fixes: e997ae5f45ca ("drm/msm/a6xx: Mostly implement A7xx gpu_state")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/562233/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# e997ae5f 25-Sep-2023 Konrad Dybcio <konrad.dybcio@linaro.org>

drm/msm/a6xx: Mostly implement A7xx gpu_state

Provide the necessary alternations to mostly support state dumping on
A7xx. Newer GPUs will probably require more changes here. Crashdumper
and debugbus remain untested.

Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # sm8450
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/559289/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 5a903a44 15-Jun-2023 Konrad Dybcio <konrad.dybcio@linaro.org>

drm/msm/a6xx: Introduce GMU wrapper support

Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
but don't implement the associated GMUs. This is due to the fact that
the GMU directly pokes at RPMh. Sadly, this means we have to take care
of enabling & scaling power rails, clocks and bandwidth ourselves.

Reuse existing Adreno-common code and modify the deeply-GMU-infused
A6XX code to facilitate these GPUs. This involves if-ing out lots
of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
the actual name that Qualcomm uses in their downstream kernels).

This is essentially a register region which is convenient to model
as a device. We'll use it for managing the GDSCs. The register
layout matches the actual GMU_CX/GX regions on the "real GMU" devices
and lets us reuse quite a bit of gmu_read/write/rmw calls.

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/542766/
Signed-off-by: Rob Clark <robdclark@chromium.org>


# f73343fa 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Update generated headers

It's been a bit overdue. Regen headers to pull in a2xx perfcntr
updates, etc.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527926/
Link: https://lore.kernel.org/r/20230320185416.938842-2-robdclark@gmail.com


# 3cba4a2c 21-Dec-2022 Akhil P Oommen <quic_akhilpo@quicinc.com>

drm/msm/a6xx: Update ROQ size in coredump

Since RoQ size differs between generations, calculate dynamically the
RoQ size while capturing coredump.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/515610/
Link: https://lore.kernel.org/r/20221221203925.v2.4.I07f22966395eb045f6b312710f53890d5d7e69d4@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 1e05bba5 21-Dec-2022 Akhil P Oommen <quic_akhilpo@quicinc.com>

drm/msm/a6xx: Update a6xx gpu coredump

Update gpu coredump for a660/a650 family of gpus with the extra
information available.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/515608/
Link: https://lore.kernel.org/r/20221221203925.v2.3.Ifbfce6d693b202dac92006345bb825e7c5aee9c6@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>


# cade05b2 14-Nov-2022 Rob Clark <robdclark@chromium.org>

drm/msm/adreno: Simplify read64/write64 helpers

The _HI reg is always following the _LO reg, so no need to pass these
offsets seprately.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/511581/
Link: https://lore.kernel.org/r/20221114193049.1533391-2-robdclark@gmail.com


# ec4fbd79 13-Oct-2022 Rob Clark <robdclark@chromium.org>

drm/msm/a6xx: Remove state objects from list before freeing

Technically it worked as it was before, only because it was using the
_safe version of the iterator. But it is sloppy practice to leave
dangling pointers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/507017/
Link: https://lore.kernel.org/r/20221013225520.371226-4-robdclark@gmail.com


# fab384c4 13-Oct-2022 Rob Clark <robdclark@chromium.org>

drm/msm/a6xx: Skip snapshotting unused GMU buffers

Some buffers are unused on certain sub-generations of a6xx. So just
skip them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/507013/
Link: https://lore.kernel.org/r/20221013225520.371226-3-robdclark@gmail.com


# 83d18e9d 13-Oct-2022 Rob Clark <robdclark@chromium.org>

drm/msm/a6xx: Fix kvzalloc vs state_kcalloc usage

adreno_show_object() is a trap! It will re-allocate the pointer it is
passed on first call, when the data is ascii85 encoded, using kvmalloc/
kvfree(). Which means the data *passed* to it must be kvmalloc'd, ie.
we cannot use the state_kcalloc() helper.

This partially reverts commit ec8f1813bf8d ("drm/msm/a6xx: Replace
kcalloc() with kvzalloc()"), but adds the missing kvfree() to fix the
memory leak that was present previously. And adds a warning comment.

Fixes: ec8f1813bf8d ("drm/msm/a6xx: Replace kcalloc() with kvzalloc()")
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/20
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/507014/
Link: https://lore.kernel.org/r/20221013225520.371226-2-robdclark@gmail.com


# ec8f1813 27-Sep-2022 Akhil P Oommen <quic_akhilpo@quicinc.com>

drm/msm/a6xx: Replace kcalloc() with kvzalloc()

In order to reduce chance of allocation failure while capturing a6xx
gpu state, use kvzalloc() instead of kcalloc() in state_kcalloc().

Indirectly, this patch helps to fix leaking memory allocated for
gmu_debug object.

Fixes: b859f9b009b (drm/msm/gpu: Snapshot GMU debug buffer)
Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/505074/
Link: https://lore.kernel.org/r/20220928124830.1.I8ea24a8d586b4978823b848adde000f92f74d5c2@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 08c4aa3e 09-Dec-2021 Rob Clark <robdclark@chromium.org>

drm/msm/a6xx: Skip crashdumper state if GPU needs_hw_init

I am seeing some crash logs which imply that we are trying to use
crashdumper hw to read back GPU state when the GPU isn't initialized.
This doesn't go well (for example, GPU could be in 32b address mode
and ignoring the upper bits of buffer that it is trying to dump state
to).

I'm not *quite* sure how we get into this state in the first place,
but lets not make a bad situation worse by triggering iova fault
crashes.

While we're at it, also add the information about whether the GPU is
initialized to the devcore dump to make this easier to see in the
logs (which makes the WARN_ON() redundant and even harmful because
it fills up the small bit of dmesg we get with the crash report).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211209193118.1163248-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# b859f9b0 24-Nov-2021 Rob Clark <robdclark@chromium.org>

drm/msm/gpu: Snapshot GMU debug buffer

It appears to be a GMU fw build option whether it does anything with
debug and log buffers, but if they are all zeros it won't add anything
to the devcore size.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 1691e005 24-Nov-2021 Rob Clark <robdclark@chromium.org>

drm/msm/gpu: Also snapshot GMU HFI buffer

This also includes a history of start index of the last 8 messages on
each queue, since parsing backwards to decode recently sent HFI messages
is hard(ish).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-9-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 203dcd5e 24-Nov-2021 Rob Clark <robdclark@chromium.org>

drm/msm/gpu: Make a6xx_get_gmu_log() more generic

Turn it into a thing we can use to snapshot other GMU buffers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-8-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 518380cb 24-Nov-2021 Akhil P Oommen <quic_akhilpo@quicinc.com>

drm/msm/a6xx: Capture gmu log in devcoredump

Capture gmu log in coredump to enhance debugging.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211124214151.1427022-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# b4d25abf 03-Nov-2021 Douglas Anderson <dianders@chromium.org>

drm/msm/a6xx: Allocate enough space for GMU registers

In commit 142639a52a01 ("drm/msm/a6xx: fix crashstate capture for
A650") we changed a6xx_get_gmu_registers() to read 3 sets of
registers. Unfortunately, we didn't change the memory allocation for
the array. That leads to a KASAN warning (this was on the chromeos-5.4
kernel, which has the problematic commit backported to it):

BUG: KASAN: slab-out-of-bounds in _a6xx_get_gmu_registers+0x144/0x430
Write of size 8 at addr ffffff80c89432b0 by task A618-worker/209
CPU: 5 PID: 209 Comm: A618-worker Tainted: G W 5.4.156-lockdep #22
Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0x128/0x1ec
print_address_description+0x88/0x4a0
__kasan_report+0xfc/0x120
kasan_report+0x10/0x18
__asan_report_store8_noabort+0x1c/0x24
_a6xx_get_gmu_registers+0x144/0x430
a6xx_gpu_state_get+0x330/0x25d4
msm_gpu_crashstate_capture+0xa0/0x84c
recover_worker+0x328/0x838
kthread_worker_fn+0x32c/0x574
kthread+0x2dc/0x39c
ret_from_fork+0x10/0x18

Allocated by task 209:
__kasan_kmalloc+0xfc/0x1c4
kasan_kmalloc+0xc/0x14
kmem_cache_alloc_trace+0x1f0/0x2a0
a6xx_gpu_state_get+0x164/0x25d4
msm_gpu_crashstate_capture+0xa0/0x84c
recover_worker+0x328/0x838
kthread_worker_fn+0x32c/0x574
kthread+0x2dc/0x39c
ret_from_fork+0x10/0x18

Fixes: 142639a52a01 ("drm/msm/a6xx: fix crashstate capture for A650")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20211103153049.1.Idfa574ccb529d17b69db3a1852e49b580132035c@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 1c8e5748 02-Oct-2021 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

drm/msm/a6xx: correct cx_debugbus_read arguments

First argument of cx_debugbus_read() should be 'void __iomem *' rather
than 'void * __iomem' to make sparse happy.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20211002183118.748841-1-dmitry.baryshkov@linaro.org
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 030af2b0 27-Jul-2021 Rob Clark <robdclark@chromium.org>

drm/msm: drop drm_gem_object_put_locked()

No idea why we were still using this. It certainly hasn't been needed
for some time. So drop the pointless twin codepaths.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# e25e92e0 10-Jun-2021 Rob Clark <robdclark@chromium.org>

drm/msm: devcoredump iommu fault support

Wire up support to stall the SMMU on iova fault, and collect a devcore-
dump snapshot for easier debugging of faults.

Currently this is a6xx-only, but mostly only because so far it is the
only one using adreno-smmu-priv.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210610214431.539029-6-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# a5fc7aa9 23-Apr-2021 Jonathan Marek <jonathan@marek.ca>

drm/msm: replace MSM_BO_UNCACHED with MSM_BO_WC for internal objects

msm_gem_get_vaddr() currently always maps as writecombine, so use the right
flag instead of relying on broken behavior (things don't actually work if
they are mapped as uncached).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210423190833.25319-3-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 692bdf97 24-Nov-2020 Lee Jones <lee.jones@linaro.org>

drm/msm/adreno/a6xx_gpu_state: Make some local functions static

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:83:7: warning: no previous prototype for ‘state_kcalloc’ [-Wmissing-prototypes]
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:95:7: warning: no previous prototype for ‘state_kmemdup’ [-Wmissing-prototypes]
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:947:6: warning: no previous prototype for ‘a6xx_gpu_state_destroy’ [-Wmissing-prototypes]

Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 08d3ab4b 12-Sep-2020 Zhenzhong Duan <zhenzhong.duan@gmail.com>

drm/msm/a6xx: fix a potential overflow issue

It's allocating an array of a6xx_gpu_state_obj structure rathor than
its pointers.

This patch fix it.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 6f7cd6e4 11-Aug-2020 Rob Clark <robdclark@chromium.org>

drm/msm/a6xx: add module param to enable debugbus snapshot

For production devices, the debugbus sections will typically be fused
off and empty in the gpu device coredump. But since this may contain
data like cache contents, don't capture it by default.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 142639a5 29-Jun-2020 Jonathan Marek <jonathan@marek.ca>

drm/msm/a6xx: fix crashstate capture for A650

A650 has a separate RSCC region, so dump RSCC registers separately, reading
them from the RSCC base. Without this change a GPU hang will cause a system
reset if CONFIG_DEV_COREDUMP is enabled.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# a5ab3176 03-Dec-2019 Sharat Masetty <smasetty@codeaurora.org>

drm: msm: a6xx: Dump GBIF registers, debugbus in gpu state

Add the relevant GBIF registers and the debug bus to the a6xx gpu
state. This comes in pretty handy when debugging GPU bus related
issues.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 7f4009c4 06-Nov-2019 Sharat Masetty <smasetty@codeaurora.org>

drm: msm: a6xx: fix debug bus register configuration

Fix the cx debugbus related register configuration, to collect accurate
bus data during gpu snapshot. This helps with complete snapshot dump
and also complete proper GPU recovery.

Fixes: 1707add81551 ("drm/msm/a6xx: Add a6xx gpu state")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/339165


# e400b9ed 03-Dec-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm/a6xx: Add a name for the crashdumper buffer

Add a buffer object name for the a6xx crashdumper so it can be
seen with the changes introduced by 7799a98edd
("drm/msm: Add a name field for gem objects").

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# d135c7eb 03-Dec-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm/a6xx: Use new kernel API free function for gpu state

dadb36b7ec42 ("drm/msm: Add a common function to free kernel buffer objects")
missed freeing the crashdumper state for a6xx.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 7ad0e8cf 07-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Count how many times iova memory is pinned

Add a reference count to track how many times a particular
chunk of iova memory is pinned (mapped) in the iomu and
add msm_gem_unpin_iova to give up references.

It is important to note that msm_gem_unpin_iova replaces
msm_gem_put_iova because the new implicit behavior
that an assigned iova in a given vma is now valid for the
life of the buffer and what we are really focusing on is
the use of that iova.

For now the unmappings are lazy; once the reference counts
go to zero they *COULD* be unmapped dynamically but that
will require an outside force such as a shrinker or
mm_notifiers. For now, we're just focusing on getting
the counting right and setting ourselves up to be ready
for the future.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# d6852b4b 02-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm/a6xx: Track and manage a6xx state memory

The a6xx GPU state allocates a LOT of memory. Add a bit of
infrastructure to track the memory allocations in the GPU structure
and delete them when the state is destroyed much the same way
that devm works with the device model as a whole. This protects
against the developer accidentally forgetting to add a kfree() to
an ever growing list.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 1707add8 02-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm/a6xx: Add a6xx gpu state

Add support for gathering and dumping the a6xx GPU state including
registers, GMU registers, indexed registers, shader blocks,
context clusters and debugbus.

v2: Fix bugs discovered by Sharat Masetty

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>