History log of /linux-master/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
Revision Date Author Comments
# a3a4c0b1 21-Mar-2024 shaoyunl <shaoyun.liu@amd.com>

drm/amdgpu : Add mes_log_enable to control mes log feature

The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 93d64097 22-Feb-2024 Tim Huang <Tim.Huang@amd.com>

drm/amdgpu: reserve more memory for MES runtime DRAM

This patch fixes a MES firmware boot failure issue
when backdoor loading the MES firmware.

MES firmware runtime DRAM size is changed to 512k,
the driver needs to reserve this amount of memory in
FB, otherwise adjacent memory will be overwritten by
the MES firmware startup code.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 846f7385 03-Jan-2024 Yifan Zhang <yifan1.zhang@amd.com>

drm/amdgpu: add mes firmware support for GC 11.5.1

This patch to add MES PIPE0 and PIPE1 firmware support for gc_11_5_1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 47c45335 23-Nov-2023 shaoyunl <shaoyun.liu@amd.com>

drm/amdgpu: Enable event log on MES 11

Enable event log through the HW specific FW API

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 28ab9a02 11-Oct-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mes11: remove aggregated doorbell code

It's not enabled in hardware so the code is dead.
Remove it.

Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 75792058 04-Oct-2023 Jay Cornwall <jay.cornwall@amd.com>

drm/amdgpu: Improve MES responsiveness during oversubscription

When MES is oversubscribed it may not frequently check for new
command submissions from driver if the scheduling load is high.
Response latency as high as 5 seconds has been observed.

Enable a flag which adds a check for new commands between
scheduling quantums.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Cc: Alexandru Tudor <alexandru.tudor@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4e8303cf 11-Sep-2023 Lijo Lazar <lijo.lazar@amd.com>

drm/amdgpu: Use function for IP version check

Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 10c9d869 24-May-2023 Aaron Liu <aaron.liu@amd.com>

drm/amdgpu: add mes firmware support for gc_11_5_0

Add scheduler and kiq firmware support for gc_11_5_0.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 91aafa3c 02-Aug-2023 Ran Sun <sunran001@208suo.com>

drm/amdgpu: Clean up errors in mes_v11_0.c

Fix the following errors reported by checkpatch:

ERROR: else should follow close brace '}'

Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8a92e867 23-Jul-2023 Bob Zhou <bob.zhou@amd.com>

drm/amdgpu: remove repeat code for mes_add_queue_pkt

The setting of mes_add_queue_pkt is repeated, so remove it.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 50fbe0cc 23-Jul-2023 Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

drm/amdgpu: Add -ENOMEM error handling when there is no memory

Return -ENOMEM, when there is no sufficient dynamically allocated memory

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7a1c5c67 12-Jul-2023 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: enable cooperative groups for gfx11

MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS. Similar to the F32 HWS, MES will manage
quality of service for these queues.
Use this for cooperative groups since cooperative groups are device
occupancy limited.

Since some GFX11 devices can only be debugged with partial CUs, do not
allow the debugging of cooperative groups on these devices as the CU
occupancy limit will change on attach.

In addition, zero initialize the MES add queue submission vector for MES
initialization tests as we do not want these to be cooperative
dispatches.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 09d49e14 23-May-2023 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: fix and enable debugging for gfx11

There are a couple of fixes required to enable gfx11 debugging.

First, ADD_QUEUE.trap_en is an inappropriate place to toggle
a per-process register so move it to SET_SHADER_DEBUGGER.trap_en.
When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize
the SET_SHADER_DEBUGGER.trap_en setting.

Second, to preserve correct save/restore priviledged wave states
in coordination with the trap enablement setting, resume suspended
waves early in the disable call.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 69a8c3ae 01-Sep-2022 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: apply trap workaround for gfx11

Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.

Make user CU masking and debugging mutual exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.

In addition, like any other mutli-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a9818854 26-Aug-2022 Jonathan Kim <jonathan.kim@amd.com>

drm/amdgpu: expose debug api for mes

Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.

The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore. The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 93ab59ac 12-May-2023 Guchun Chen <guchun.chen@amd.com>

drm/amdgpu: switch to unified amdgpu_ring_test_helper

This will simplify code.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e602157e 17-May-2023 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: fix S3 issue if MQD in VRAM

1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f4caf584 14-Sep-2022 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: introduce vmhub definition for multi-partition cases (v3)

v1: Each partition has its own gfxhub or mmhub. adjust
the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le)

v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le)

v3: apply the gfxhub/mmhub layout to new IPs (Hawking)

v4: fix up gmc11 (Alex)

v5: rebase (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1cfb4d61 26-Apr-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: put MQDs in VRAM

Reduces preemption latency.
Only enable this for gfx10 and 11 for now
to avoid changing behavior on gfx 8 and 9.

v2: move MES MQDs into VRAM as well (YuBiao)
v3: enable on gfx10, 11 only (Alex)
v4: minor style changes, document why gfx10/11 only (Alex)

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 277bd337 23-May-2022 Le Ma <le.ma@amd.com>

drm/amdgpu: convert gfx.kiq to array type (v3)

v1: more kiq instances are a available in SOC (Le)
v2: squash commits to avoid breaking the build (Le)
v3: make the conversion for gfx/mec v11_0 (Hawking)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8855818c 12-Apr-2023 Li Ma <li.ma@amd.com>

drm/amdgpu: reserve the old gc_11_0_*_mes.bin

Reserve the MOUDLE_FIRMWARE declaration of gc_11_0_*_mes.bin
to fix falling back to old mes bin on failure via autoload.

Fixes: 97998b893c30 ("drm/amd/amdgpu: introduce gc_*_mes_2.bin v2")
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 554836cc 29-Mar-2023 Yifan Zha <Yifan.Zha@amd.com>

drm/amdgpu: Add MES KIQ clear to tell RLC that KIQ is dequeued

[Why]
As MES KIQ is dequeued, tell RLC that KIQ is inactive

[How]
Clear the RLC_CP_SCHEDULERS Active bit which RLC checks KIQ status
In addition, driver can halt MES under SRIOV when unloading driver

v2:
Use scheduler0 mask to clear KIQ portion of RLC_CP_SCHEDULERS

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9eb28ac1 29-Mar-2023 Yifan Zha <Yifan.Zha@amd.com>

drm/amdgpu: Add MES KIQ dequeue in MES hw fini

[Why]
Need dequeue MES KIQ under SRIOV when unloading driver

[How]
Modify mes_v11_0_kiq_dequeue_sched which was used to dequeue MES SCHED
to support veriable pipe.
Add MES KIQ dequeue in hw fini

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 97998b89 24-Mar-2023 Jack Xiao <Jack.Xiao@amd.com>

drm/amd/amdgpu: introduce gc_*_mes_2.bin v2

To avoid new mes fw running with old driver, rename
mes schq fw to gc_*_mes_2.bin.

v2: add MODULE_FIRMWARE declaration
v3: squash in fixup patch

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# dc907c9d 09-Feb-2023 Jack Xiao <Jack.Xiao@amd.com>

drm/amd/amdgpu: fix warning during suspend

Freeing memory was warned during suspend.
Move the self test out of suspend.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a462ef87 20-Jan-2023 Li Ma <li.ma@amd.com>

drm/amdgpu: declare firmware for new MES 11.0.4

To support new mes ip block

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 601ff522 19-Jan-2023 Jonathan Kim <jonathan.kim@amd.com>

drm/amdgpu: remove unconditional trap enable on add gfx11 queues

Rebase of driver has incorrect unconditional trap enablement
for GFX11 when adding mes queues.

Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e78105c8 02-Jan-2023 Mario Limonciello <mario.limonciello@amd.com>

drm/amd: Remove superfluous assignment for `adev->mes.adev`

`amdgpu_mes_init` already sets `adev->mes.adev`, so there is no need
to also set it in the IP specific versions.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 11e0b006 03-Jan-2023 Mario Limonciello <mario.limonciello@amd.com>

drm/amd: Use `amdgpu_ucode_*` helpers for MES

The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper provides symmetry for releasing firmware.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cc42e76e 28-Dec-2022 Mario Limonciello <mario.limonciello@amd.com>

drm/amd: Load MES microcode during early_init

Add an early_init phase to MES for fetching and validating microcode
from the filesystem.

If MES microcode is required but not available during early init, the
firmware framebuffer will have already been released and the screen will
freeze.

Move the request for MES microcode into the early_init phase
so that if it's not available, early_init will fail.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 58ab2c08 14-Jan-2022 Christian König <christian.koenig@amd.com>

drm/amdgpu: use VRAM|GTT for a bunch of kernel allocations

Technically all of those can use GTT as well, no need to force things
into VRAM.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e3bf7e96 19-Dec-2022 Tim Huang <tim.huang@amd.com>

drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0

MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.

Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.

[ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[ 81.679435] Call Trace:
[ 81.679726] <TASK>
[ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
[ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[ 81.684818] pci_pm_resume+0x5c/0xa0
[ 81.685247] ? pci_pm_thaw+0x90/0x90
[ 81.685658] dpm_run_callback+0x4e/0x160
[ 81.686110] device_resume+0xad/0x210
[ 81.686529] async_resume+0x1e/0x40
[ 81.686931] async_run_entry_fn+0x33/0x120
[ 81.687405] process_one_work+0x21d/0x3f0
[ 81.687869] worker_thread+0x4a/0x3c0
[ 81.688293] ? process_one_work+0x3f0/0x3f0
[ 81.688777] kthread+0xff/0x130
[ 81.689157] ? kthread_complete_and_exit+0x20/0x20
[ 81.689707] ret_from_fork+0x22/0x30
[ 81.690118] </TASK>
[ 81.690380] ---[ end trace 0000000000000000 ]---

v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8f323789 09-Feb-2023 Jack Xiao <Jack.Xiao@amd.com>

drm/amd/amdgpu: fix warning during suspend

Freeing memory was warned during suspend.
Move the self test out of suspend.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x


# f0f77436 20-Jan-2023 Li Ma <li.ma@amd.com>

drm/amdgpu: declare firmware for new MES 11.0.4

To support new mes ip block

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2de37698 19-Jan-2023 Jonathan Kim <jonathan.kim@amd.com>

drm/amdgpu: remove unconditional trap enable on add gfx11 queues

Rebase of driver has incorrect unconditional trap enablement
for GFX11 when adding mes queues.

Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x


# 8660495a 19-Dec-2022 Tim Huang <tim.huang@amd.com>

drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0

MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.

Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.

[ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[ 81.679435] Call Trace:
[ 81.679726] <TASK>
[ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
[ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[ 81.684818] pci_pm_resume+0x5c/0xa0
[ 81.685247] ? pci_pm_thaw+0x90/0x90
[ 81.685658] dpm_run_callback+0x4e/0x160
[ 81.686110] device_resume+0xad/0x210
[ 81.686529] async_resume+0x1e/0x40
[ 81.686931] async_run_entry_fn+0x33/0x120
[ 81.687405] process_one_work+0x21d/0x3f0
[ 81.687869] worker_thread+0x4a/0x3c0
[ 81.688293] ? process_one_work+0x3f0/0x3f0
[ 81.688777] kthread+0xff/0x130
[ 81.689157] ? kthread_complete_and_exit+0x20/0x20
[ 81.689707] ret_from_fork+0x22/0x30
[ 81.690118] </TASK>
[ 81.690380] ---[ end trace 0000000000000000 ]---

v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1


# a6b3b618 28-Nov-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes11: enable reg active poll

Enable reg active poll in mes11.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-and-acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f3416dc8 09-Nov-2022 YuBiao Wang <YuBiao.Wang@amd.com>

drm/amdgpu: Stop clearing kiq position during unload

Do not clear kiq position in RLC_CP_SCHEDULER so that CP could perform
IDLE-SAVE after VF fini. CPG also needs to be active in save command.

v2: drop unused variable (Alex)

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9a1662f5 29-Sep-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: extend halt_if_hws_hang to MES

Hang on MES timeout if halt_if_hws_hang is set to 1.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6040517e 25-Oct-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: remove deprecated MES version vars

MES scheduler and kiq versions are stored in mes.sched_version and
mes.kiq_version, respectively, which are read from a register after
their queues are initialized. Remove mes.ucode_fw_version and
mes.data_fw_version which tried to read this versioning info from the
firmware headers (which don't contain this information).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# bb3c846a 18-Oct-2022 Yiqing Yao <yiqing.yao@amd.com>

drm/amdgpu: Adjust MES polling timeout for sriov

[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c4dfad81 12-Oct-2022 YuBiao Wang <YuBiao.Wang@amd.com>

drm/amdgpu: dequeue mes scheduler during fini

[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# bbce8cdb 25-Jul-2022 Likun Gao <Likun.Gao@amd.com>

drm/amdgpu: skip mes self test for gc 11.0.3

Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 226dcfad 18-Oct-2022 Yiqing Yao <yiqing.yao@amd.com>

drm/amdgpu: Adjust MES polling timeout for sriov

[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2abe92c7 12-Oct-2022 YuBiao Wang <YuBiao.Wang@amd.com>

drm/amdgpu: dequeue mes scheduler during fini

[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b7a76a29 25-Jul-2022 Likun Gao <Likun.Gao@amd.com>

drm/amdgpu: skip mes self test for gc 11.0.3

Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 3e9cf234 19-Sep-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: pass queue size and is_aql_queue to MES

Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 585a8261 25-Aug-2022 David Belanger <david.belanger@amd.com>

drm/amdgpu: Enable SA software trap.

Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 701a4ad9 25-Aug-2022 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: declare firmware for new MES 11.0.3

To support new mes ip block

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2aefa9a3 15-Aug-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: Update mes_v11_api_def.h

New GFX11 MES FW adds the trap_en bit. For now hardcode to 1 (traps
enabled).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 91ef6cfd 19-Sep-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: pass queue size and is_aql_queue to MES

Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 47e04eed 15-Aug-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: Update mes_v11_api_def.h

New GFX11 MES FW adds the trap_en bit. For now hardcode to 1 (traps
enabled).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ed67f729 20-Jul-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: move mes self test after drm sched re-started

mes self test rely on vm mapping, move it after
drm sched re-started so that vm mapping can work
during gpu reset.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-and-tested-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b7320117 06-Jul-2021 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes11: initialize aggregated doorbell

Allocate and enable aggregated doorbell.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 63677486 08-Jul-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes: set correct mes ring ready flag

Set corresponding ready flag for mes ring when enable or disable
mes ring.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 35ba8850 07-Jul-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes: fix mes submission in atomic context

For some cases (accessing registers, unmap legacy queue), it needs
access mes in atomic context. Use spinlock to protect agaist mes
ring buffer race condition.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7acd7ab0 21-Jun-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes11: fix to unmap legacy queue

MES fw updated to support unmapping legacy gfx/compute queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cf606729 16-Jun-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: enable mes to access registers v2

Enable mes to access registers.

v2: squash mes sched ring enablement flag

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7d4705b3 16-Jun-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes11: add mes11 misc op

Add misc op commands in mes11.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# fe4e9ff9 02-Jun-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: add mc wptr addr support for mes

MES requires mc wptr address for usermode queues.
Export bo gart address for mc wptr address.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a9579956 28-Apr-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: Update mes_v11_api_def.h

Update MES API to support oversubscription without aggregated doorbell
for usermode queues.

v2: Change oversubscription_no_aggregated_en to is_kfd_process (align
with MES)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ff83e6e7 03-Jun-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdgpu: Fetch MES scheduler/KIQ versions

Store MES scheduler and MES KIQ version numbers in amdgpu_mes for GFX11.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8728df26 02-Jun-2022 Yifan Zhang <yifan1.zhang@amd.com>

drm/amdgpu/mes: only invalid/prime icache when finish loading both pipe MES FWs.

invalid/prime icahce operation takes effect both pipes cuconrrently,
therefore CP_MES_IC_BASE_LO/HI and CP_MES_MDBASE_LO/HI both have to be
set before prime icache. Otherwise MES hardware gets garbage data in
above regsters and causes page fault

[ 470.873200] amdgpu 0000:33:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:217 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 470.873222] amdgpu 0000:33:00.0: amdgpu: in page starting at address 0x000092cb89b00000 from client 10
[ 470.873234] amdgpu 0000:33:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000BB3
[ 470.873242] amdgpu 0000:33:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 470.873247] amdgpu 0000:33:00.0: amdgpu: MORE_FAULTS: 0x1
[ 470.873251] amdgpu 0000:33:00.0: amdgpu: WALKER_ERROR: 0x1
[ 470.873256] amdgpu 0000:33:00.0: amdgpu: PERMISSION_FAULTS: 0xb
[ 470.873260] amdgpu 0000:33:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 470.873264] amdgpu 0000:33:00.0: amdgpu: RW: 0x0

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 431d0712 02-Jun-2022 Yifan Zhang <yifan1.zhang@amd.com>

drm/amdgpu/mes: only invalid/prime icache when finish loading both pipe MES FWs.

invalid/prime icahce operation takes effect both pipes cuconrrently,
therefore CP_MES_IC_BASE_LO/HI and CP_MES_MDBASE_LO/HI both have to be
set before prime icache. Otherwise MES hardware gets garbage data in
above regsters and causes page fault

[ 470.873200] amdgpu 0000:33:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:217 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 470.873222] amdgpu 0000:33:00.0: amdgpu: in page starting at address 0x000092cb89b00000 from client 10
[ 470.873234] amdgpu 0000:33:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000BB3
[ 470.873242] amdgpu 0000:33:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 470.873247] amdgpu 0000:33:00.0: amdgpu: MORE_FAULTS: 0x1
[ 470.873251] amdgpu 0000:33:00.0: amdgpu: WALKER_ERROR: 0x1
[ 470.873256] amdgpu 0000:33:00.0: amdgpu: PERMISSION_FAULTS: 0xb
[ 470.873260] amdgpu 0000:33:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 470.873264] amdgpu 0000:33:00.0: amdgpu: RW: 0x0

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7bd3114b 11-May-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/gfx11: fix mes mqd settings

Use the correct Memory Queue Descriptor (MQD)
structure for GC 11.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b0abae7d 14-Dec-2021 Huang Rui <ray.huang@amd.com>

drm/amdgpu: add GC v11.0.1 into mes v11

Add GC v11.0.1 support into mes v11.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 32697fea 15-Apr-2022 Flora Cui <flora.cui@amd.com>

drm/amdgpu: add mes 11 firmware for mes 11.0.2

Define firmware for MES 11.0.2.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 98bae896 04-May-2022 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx11: remove some register fields that no longer exist

Some copy paste leftovers for older asics. They were protected
by __BIG_ENDIAN, so we didn't notice them initially.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d81d75c9 04-Aug-2020 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/gfx11: enable kiq to map mes ring

Enable KIQ to map MES ring:
1). add MES queue mapping support in MAP_QUEUES packet.
2). use correct MQD settings for MES queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 028c3fb3 13-Apr-2022 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu/mes11: initiate mes v11 support

Initiate mes v11 code base from mes v10, rename function
and register names.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>