History log of /linux-master/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
Revision Date Author Comments
# b8f67b9d 18-Jan-2024 Shashank Sharma <shashank.sharma@amd.com>

drm/amdgpu: change vm->task_info handling

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

V3: Fixed review comments from Felix
- Fix wrong indentation
- No debug message for -ENOMEM
- Add NULL check for task_info
- Do not duplicate the debug messages (ti vs no ti)
- Get first reference of task_info in vm_init(), put last
in vm_fini()

V4: Fixed review comments from Felix
- fix double reference increment in create_task_info
- change amdgpu_vm_get_task_info_pasid
- additional changes in amdgpu_gem.c while porting

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9749c868 04-Jan-2024 Ma Jun <Jun.Ma2@amd.com>

drm/amdgpu: Fix the warning info in mode1 reset

Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# bb34bc2c 04-Jan-2024 Ma Jun <Jun.Ma2@amd.com>

drm/amdgpu: Fix the warning info in mode1 reset

Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5eb8094a 19-Dec-2023 Mangesh Gadre <Mangesh.Gadre@amd.com>

drm/amdgpu: Add register read/write debugfs support for AID's

SMN address is larger than 32 bits for registers on different AID's
Updating existing interface to support access to such registers.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# afe58346 27-Nov-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/debugfs: fix error code when smc register accessors are NULL

Should be -EOPNOTSUPP.

Fixes: 5104fdf50d32 ("drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b2662d4c 23-Nov-2023 shaoyunl <shaoyun.liu@amd.com>

drm/amdgpu: SW part of MES event log enablement

This is the generic SW part, prepare the event log buffer and dump it through debugfs

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7b194fdc 22-Nov-2023 Lu Yao <yaolu@kylinos.cn>

drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer

For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init
'didt_rreg' and 'didt_wreg' to 'NULL'. But in func
'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT'
lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use
these two definitions won't be added for 'AMDGPU_FAMILY_SI'.

So, add null pointer judgment before calling.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 35963cf2 30-Oct-2023 Matthew Brost <matthew.brost@intel.com>

drm/sched: Add drm_sched_wqueue_* helpers

Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
- s/sched_wqueue/sched_wqueue (Luben)
- Remove the extra white line after the return-statement (Luben)
- update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>


# 2161e09c 22-Nov-2023 Lu Yao <yaolu@kylinos.cn>

drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer

For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init
'didt_rreg' and 'didt_wreg' to 'NULL'. But in func
'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT'
lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use
these two definitions won't be added for 'AMDGPU_FAMILY_SI'.

So, add null pointer judgment before calling.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5104fdf5 22-Oct-2023 Qu Huang <qu.huang@linux.dev>

drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL

In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log:

1. Navigate to the directory: /sys/kernel/debug/dri/0
2. Execute command: cat amdgpu_regs_smc
3. Exception Log::
[4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000
[4005007.702562] #PF: supervisor instruction fetch in kernel mode
[4005007.702567] #PF: error_code(0x0010) - not-present page
[4005007.702570] PGD 0 P4D 0
[4005007.702576] Oops: 0010 [#1] SMP NOPTI
[4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u
[4005007.702590] RIP: 0010:0x0
[4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0
[4005007.702633] Call Trace:
[4005007.702636] <TASK>
[4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu]
[4005007.703002] full_proxy_read+0x5c/0x80
[4005007.703011] vfs_read+0x9f/0x1a0
[4005007.703019] ksys_read+0x67/0xe0
[4005007.703023] __x64_sys_read+0x19/0x20
[4005007.703028] do_syscall_64+0x5c/0xc0
[4005007.703034] ? do_user_addr_fault+0x1e3/0x670
[4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0
[4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20
[4005007.703052] ? irqentry_exit+0x19/0x30
[4005007.703057] ? exc_page_fault+0x89/0x160
[4005007.703062] ? asm_exc_page_fault+0x8/0x30
[4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae
[4005007.703075] RIP: 0033:0x7f5e07672992
[4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24
[4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992
[4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003
[4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010
[4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[4005007.703105] </TASK>
[4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca
[4005007.703184] CR2: 0000000000000000
[4005007.703188] ---[ end trace ac65a538d240da39 ]---
[4005007.800865] RIP: 0010:0x0
[4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0

Signed-off-by: Qu Huang <qu.huang@linux.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2d6a2a28 15-Sep-2023 André Almeida <andrealmeid@igalia.com>

drm/amdgpu: Encapsulate all device reset info

To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ad19c200 26-Jul-2023 Praful Swarnakar <Praful.Swarnakar@amd.com>

drm/amdgpu: Fix style issues in amdgpu_debugfs.c

Fixes the following to align to linux coding style:

WARNING: Missing a blank line after declarations
WARNING: sizeof *rd should be sizeof(*rd)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Praful Swarnakar <Praful.Swarnakar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8ed49dd1 16-Jun-2023 Victor Lu <victorchengchi.lu@amd.com>

drm/amdgpu: Add RLCG interface driver implementation for gfx v9.4.3 (v3)

Add RLCG interface support for gfx v9.4.3 and multiple XCCs.
Do not enable it yet.

v2: Fix amdgpu_rlcg_reg_access_ctrl init, add support for multiple XCCs
in amdgpu_mm_wreg_mmio_rlc

v3: Use GET_INST() when indexing amdgpu_rlcg_reg_access_ctrl

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9c9d501b 25-May-2023 Dan Carpenter <dan.carpenter@linaro.org>

drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl()

There are two bugs here.
1) Drop the lock if copy_from_user() fails.
2) If the copy fails then the correct error code is -EFAULT instead of
-EINVAL.

I also broke up the long line and changed "sizeof rd->id" to
"sizeof(rd->id)".

Fixes: 553f973a0d7b ("drm/amd/amdgpu: Update debugfs for XCC support (v3)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 109b4d8c 14-May-2023 Su Hui <suhui@nfschina.com>

drm/amdgpu: remove unnecessary (void*) conversions

No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 553f973a 11-Oct-2022 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Update debugfs for XCC support (v3)

This patch updates the 'regs2' interface for MMIO
registers to add a new IOCTL command for a 'v2' state
data that includes the XCC ID.

This patch then updates amdgpu_gfx_select_se_sh()
and amdgpu_gfx_select_me_pipe_q() (and the implementations
in the gfx drivers) to support an additional parameter.

This patch then creates a new debugfs interface "gprwave"
which is a merge of shader GPR and wave status access. This
new inteface uses an IOCTL to select banks as well as XCC identity.

(v2) Fix missing xcc_id in wave_ind function

(v3) Fix pm runtime calls and mutex locking

(v4) Fix bad label

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8fa76350 28-Apr-2023 Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.c

Fix the following issues reported by checkpatch:

WARNING: please, no space before tabs
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: sizeof *rd should be sizeof(*rd)
WARNING: Missing a blank line after declarations
WARNING: sizeof rd->id should be sizeof(rd->id)
WARNING: static const char * array should probably be static const char * const
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.
WARNING: Prefer seq_puts to seq_printf
ERROR: space prohibited after that open parenthesis '('

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d51ac6d0 23-May-2022 Le Ma <le.ma@amd.com>

drm/amdgpu: add xcc index argument to select_sh_se function v2

v1: To support multiple XCD case (Le)
v2: introduce xcc index to gfx_v11_0_select_sh_se (Hawking)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cd3a8a59 18-Nov-2022 Christian König <christian.koenig@amd.com>

drm/ttm: remove ttm_bo_(un)lock_delayed_workqueue

Those functions never worked correctly since it is still perfectly
possible that a buffer object is released and the background worker
restarted even after calling them.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-2-christian.koenig@amd.com


# d09ef243 19-Jul-2022 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clarify DC checks

There are several places where we don't want to check
if a particular asic could support DC, but rather, if
DC is enabled. Set a flag if DC is enabled and check
for that rather than if a device supports DC or not.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6aa58939 12-Oct-2022 Victor Zhao <Victor.Zhao@amd.com>

Revert "drm/amdgpu: add debugfs amdgpu_reset_level"

This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# afbaa155 12-Oct-2022 Victor Zhao <Victor.Zhao@amd.com>

Revert "drm/amdgpu: add debugfs amdgpu_reset_level"

This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 0ad7347a 10-Aug-2022 André Almeida <andrealmeid@igalia.com>

drm/amd: Add detailed GFXOFF stats to debugfs

Add debugfs interface to log GFXOFF statistics:

- Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the
time of query since system power-up

- Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop.
Read it to get average GFXOFF residency % multiplied by 100
during the last logging interval.

Both features are designed to be keep the values persistent between
suspends.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5bd8d53f 13-Jun-2022 Victor Zhao <Victor.Zhao@amd.com>

drm/amdgpu: add debugfs amdgpu_reset_level

Introduce amdgpu_reset_level debugfs in order to help debug and
test specific type of reset. Also helps blocking unwanted type of
resets.

By default, mode2 reset will not be enabled

v2: make this debugfs in adev and use debugfs_create_u32

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ad2feebd 29-Jul-2022 Sebin Sebastian <mailmesebin00@gmail.com>

drm/amdgpu: double free error and freeing uninitialized null pointer

Fix a double free and an uninitialized pointer read error. Both tmp and
new are pointing at same address and both are freed which leads to
double free. Adding a check to verify if new and tmp are free in the
error_free label fixes the double free issue. new is not initialized to
null which also leads to a free on an uninitialized pointer.

Reviewed-by: André Almeida <andrealmeid@igalia.com>
Suggested by: S. Amaranath <Amaranath.Somalapuram@amd.com>
Signed-off-by: Sebin Sebastian <mailmesebin00@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4686177f 14-Jul-2022 André Almeida <andrealmeid@igalia.com>

drm/amd/debugfs: Expose GFXOFF state to userspace

GFXOFF has two different "state" values: one to define if the GPU is
allowed/disallowed to enter GFXOFF, usually called state; and another
one to define if currently GFXOFF is being used, usually called status.
Even when GFXOFF is allowed, GPU firmware can decide to not used it
accordingly to the GPU load.

Userspace can allow/disallow GPUs to enter into GFXOFF via debugfs. The
kernel maintains a counter of requests for GFXOFF (gfx_off_req_count)
that should be decreased to allow GFXOFF and increased to disallow.

The issue with this interface is that userspace can't be sure if GFXOFF
is currently allowed. Even by checking amdgpu_gfxoff file, one might get
an ambiguous 2, that means that GPU is currently out of GFXOFF, but that
can be either because it's currently disallowed or because it's allowed
but given the current GPU load it's enabled. Then, userspace needs to
rely on the fact that GFXOFF is enabled by default on boot and to track
this information.

To make userspace life easier and GFXOFF more reliable, return the
current state of GFXOFF to userspace when reading amdgpu_gfxoff with the
same semantics of writing: 0 means not allowed, not 0 means allowed.

Expose the current status of GFXOFF through a new file,
amdgpu_gfxoff_status.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# edadd6fc 04-Jul-2022 André Almeida <andrealmeid@igalia.com>

drm/amdpgu/debugfs: Simplify some exit paths

To avoid code repetition, unify the function exit path when possible. No
functional changes.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 651d7ee6 01-Jun-2022 Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>

drm/amdgpu: save the reset dump register value for devcoredump

Allocate memory for register value and use the same values for devcoredump.
v1 -> v2: Change krealloc_array() to kmalloc_array()
v2 -> v3: Fix alignment

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e50d9ba0 17-Apr-2022 Candice Li <candice.li@amd.com>

drm/amdgpu: Add debugfs TA load/unload/invoke support

v1:
Add debugfs support to load/unload/invoke TA in runtime.

v2:
1. Update some variables to static.
2. Use PAGE_ALIGN to calculate shared buf size directly.
3. Remove fp check.
4. Update debugfs from read to write.

Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# dc2947b3 08-Apr-2022 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Update debugfs GCA data

The data revision was not changed to 5 from 4 when the CG flags
were extended to 64-bits. Since this was missed I took
the opportunity to add future upper 64-bits of PG flags
as well so we don't need to bump it again when that comes.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 25faeddc 25-Mar-2022 Evan Quan <evan.quan@amd.com>

drm/amdgpu: expand cg_flags from u32 to u64

With this, we can support more CG flags.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 11eb648d 02-Mar-2022 Ruijing Dong <ruijing.dong@amd.com>

drm/amdgpu/vcn: Add vcn firmware log

vcn fwlog is for debugging purpose only,
by default, it is disabled.

Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ca6fcfa8 27-Feb-2022 Tom Rix <trix@redhat.com>

drm/amdgpu: Fix realloc of ptr

Clang static analysis reports this error
amdgpu_debugfs.c:1690:9: warning: 1st function call
argument is an uninitialized value
tmp = krealloc_array(tmp, i + 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~

realloc uses tmp, so tmp can not be garbage.
And the return needs to be checked.

Fixes: 5ce5a584cb82 ("drm/amdgpu: add debugfs for reset registers list")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5ce5a584 23-Feb-2022 Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>

drm/amdgpu: add debugfs for reset registers list

List of register populated for dump collection during the GPU reset.

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e7c47231 18-Feb-2022 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: expose benchmarks via debugfs

They provide a nice smoke test of transfer performance
using SDMA. Allow the user to run these at runtime
rather than only at init time.

v2: fix permissions (Alex)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8f74f68d 15-Feb-2022 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Add APU flag to gca_config debugfs data (v3)

Needed by umr to detect if ip discovered ASIC is an APU or not.

(v2): Remove asic type from packet it's not strictly needed
(v3): Correct comment

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d0fb18b5 19-Jan-2022 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Move reset sem into reset_domain

We want single instance of reset sem across all
reset clients because in case of XGMI we should stop
access cross device MMIO because any of them could be
in a reset in the moment.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html


# 4bd8dd0d 18-Jan-2022 Yongzhi Liu <lyz_cs@pku.edu.cn>

drm/amdgpu: Add missing pm_runtime_put_autosuspend

pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code, thus a matching decrement is needed
on the error handling path to keep the counter balanced.

Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# bc143d8b 21-Nov-2021 Evan Quan <evan.quan@amd.com>

drm/amd/pm: do not expose implementation details to other blocks out of power

Those implementation details(whether swsmu supported, some ppt_funcs supported,
accessing internal statistics ...)should be kept internally. It's not a good
practice and even error prone to expose implementation details.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7e31a858 12-Dec-2021 Evan Quan <evan.quan@amd.com>

drm/amdgpu: move smu_debug_mask to a more proper place

As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6ff7fddb 29-Nov-2021 Lang Yu <lang.yu@amd.com>

drm/amdgpu: add support for SMU debug option

SMU firmware expects the driver maintains error context
and doesn't interact with SMU any more when SMU errors
occurred. That will aid in debugging SMU firmware issues.

Add SMU debug option support for this request, it can be
enabled or disabled via amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate corresponding debug modes.
Currently, only one mode(HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the envent of SMU
errors.

The dirver interacts with SMU via sending messages. And
threre are three ways to sending messages to SMU in current
implementation. Handle them respectively as following:

1, smu_cmn_send_smc_msg_with_param() for normal timeout cases

Halt on any error.

2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases

Halt on errors apart from ETIME. Otherwise this way won't work.
Let the user handle ETIME error in such a case.

3, smu_cmn_send_msg_without_waiting() for no waiting cases

Halt on errors apart from ETIME. Otherwise second way won't work.

== Command Guide ==

1, enable HALT_ON_ERROR mode

# echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable HALT_ON_ERROR mode

# echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v5:
- Use bit mask to allow more debug features.(Evan)
- Use WRAN() instead of BUG().(Evan)

v4:
- Set to halt state instead of a simple hang.(Christian)

v3:
- Use debugfs_create_bool().(Christian)
- Put variable into smu_context struct.
- Don't resend command when timeout.

v2:
- Resend command when timeout.(Lijo)
- Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 58144d28 06-Oct-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: unify BO evicting method in amdgpu_ttm

Unify BO evicting functionality for possible memory
types in amdgpu_ttm.c.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5b9581df 06-Oct-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: return early if debugfs is not initialized

Check first if debugfs is initialized before creating
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c8365dbd 30-Sep-2021 Christian König <christian.koenig@amd.com>

drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8"

This reverts commit 728e7e0cd61899208e924472b9e641dbeb0775c4.

Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.

GPU reset can never be delayed by userspace even for debugging or
otherwise we can run into in kernel deadlocks.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 81d1bf01 20-Jul-2021 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add debugfs access to the IP discovery table

Useful for debugging and new asic validation.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 0fcfb300 10-Sep-2021 xinhui pan <xinhui.pan@amd.com>

drm/amdgpu: Fix a race of IB test

Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 62d266b2 02-Sep-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: cleanup debugfs for amdgpu rings

Use debugfs_create_file_size API for creating ring debugfs, and as its a
NULL returning API, change the return type for amdgpu_debugfs_ring_init
API as well. Also cleanup surrounding code.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 59715cff 02-Sep-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: use IS_ERR for debugfs APIs

debugfs APIs returns encoded error so use
IS_ERR for checking return value.

v2: return PTR_ERR(ent)

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 37df9560 20-Aug-2021 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: New debugfs interface for MMIO registers (v5)

This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range which
the previous didn't. With this new design we have room to grow
the flexibility of the file as need be.

(v2): Move read/write to .read/.write, fix style, add comment
for IOCTL data structure

(v3): C style comments

(v4): use u32 in struct and remove offset variable

(v5): Drop flag clearing in op function, use 0xFFFFFFFF for broadcast
instead of 0x3FF, use mutex for op/ioctl.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b04ce53e 02-Sep-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: use IS_ERR for debugfs APIs

debugfs APIs returns encoded error so use
IS_ERR for checking return value.

v2: return PTR_ERR(ent)

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org


# c530b02f 12-May-2021 Jack Zhang <Jack.Zhang1@amd.com>

drm/amd/amdgpu embed hw_fence into amdgpu_job

Why: Previously hw fence is alloced separately with job.
It caused historical lifetime issues and corner cases.
The ideal situation is to take fence to manage both job
and fence's lifetime, and simplify the design of gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 391629bd 15-Jun-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: remove amdgpu_vm_pt

Page table entries are now in embedded in VM BO, so
we do not need struct amdgpu_vm_pt. This patch replaces
struct amdgpu_vm_pt with struct amdgpu_vm_bo_base.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e72d4a8b 20-May-2021 Lee Jones <lee.jones@linaro.org>

drm/amd/amdgpu/amdgpu_debugfs: Fix a couple of misnamed functions

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1004: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_write(). Prototype was for amdgpu_debugfs_gfxoff_write() instead
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1053: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_status(). Prototype was for amdgpu_debugfs_gfxoff_read() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 43fb6c19 02-Mar-2021 Kevin Wang <kevin1.wang@amd.com>

drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie

the register offset isn't needed division by 4 to pass RREG32_PCIE()

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7271a5c2 25-Feb-2021 Yang Li <yang.lee@linux.alibaba.com>

drm/amdgpu: Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE

Fix the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1589:0-23: WARNING:
fops_ib_preempt should be defined with DEFINE_DEBUGFS_ATTRIBUTE
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1592:0-23: WARNING:
fops_sclk_set should be defined with DEFINE_DEBUGFS_ATTRIBUTE

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 98d28ac2 15-Feb-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: do not use drm middle layer for debugfs

Use debugfs API directly instead of drm middle layer.

This also includes following debugfs file output changes:
1 amdgpu_evict_vram/amdgpu_evict_gtt output will not contain any braces.
e.g. (0) --> 0
2 amdgpu_gpu_recover output will print return value of
amdgpu_device_gpu_recover() instead of not so important "gpu recover"
message.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
* remove S_IFREG from mode.
* remove mode variable.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 373720f7 14-Feb-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amd/pm: do not use drm middle layer for debugfs

Use debugfs API directly instead of drm middle layer.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
* remove S_IFREG from mode.
* remove mode variable.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# afd3a359 14-Feb-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amd/display: do not use drm middle layer for debugfs

Use debugfs API directly instead of drm middle layer.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
* remove S_IFREG from mode.
* remove mode variable.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 88293c03 10-Feb-2021 Nirmoy Das <nirmoy.das@amd.com>

drm/amdgpu: do not keep debugfs dentry

Cleanup unnecessary debugfs dentries and surrounding functions.

v3: remove return value check for debugfs_create_file()
v2: remove ttm_debugfs_entries array.
do not init variables.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1aa46901 02-Mar-2021 Kevin Wang <kevin1.wang@amd.com>

drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie

the register offset isn't needed division by 4 to pass RREG32_PCIE()

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org


# ecaafb7b 08-Dec-2020 Jinzhou Su <Jinzhou.Su@amd.com>

drm/amdgpu: Add secure display TA interface

Add interface to load, unload, invoke command for
secure display TA.

v2: Add debugfs interface for secure display TA
v3: fix warning in copy_from_user (Alex)

Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6efa4b46 03-Dec-2020 Luben Tuikov <luben.tuikov@amd.com>

gpu/drm: ring_mirror_list --> pending_list

Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.

This also abstracts the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives, ring, CAM,
etc., shouldn't be exposed to DRM.

The pending_list keeps jobs submitted, which are
out of our control. Usually this means they are
pending execution status in hardware, but the
latter definition is a more general (inclusive)
definition.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/405573/

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>


# 8935ff00 03-Dec-2020 Luben Tuikov <luben.tuikov@amd.com>

drm/scheduler: "node" --> "list"

Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/403515/

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>


# 20ed491b 13-Nov-2020 Lee Jones <lee.jones@linaro.org>

drm/amd/amdgpu/amdgpu_debugfs: Demote obvious abuse of kernel-doc formatting

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_write'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f3729f7b 02-Nov-2020 Deepak R Varma <mh12gx2825@gmail.com>

drm/amdgpu/amdgpu: improve code indentation and alignment

General code indentation and alignment changes such as replace spaces
by tabs or align function arguments as per the coding style
guidelines. The patch corrects issues for various amdgpu_*.c files
for this driver. Issue reported by checkpatch script.

Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 19ae3330 26-Oct-2020 John Clements <john.clements@amd.com>

drm/amdgpu: added support for psp fw attestation

loaded fw can be queried from sys fs interface

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ff72bc40 08-Oct-2020 Mihir Bhogilal Patel <Mihir.Patel@amd.com>

drm/amdgpu: Add debugfs entry for printing VM info

Create new debugfs entry to print memory info using VM buffer
objects.

V2: Added Common function for printing BO info.
Dump more VM lists for evicted, moved, relocated, invalidated.
Removed dumping VM mapped BOs.
V3: Fixed coding style comments, renamed print API and variables.
V4: Fixed coding style comments.

Signed-off-by: Mihir Bhogilal Patel <Mihir.Patel@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4ce032d6 01-Oct-2020 Christian König <christian.koenig@amd.com>

drm/ttm: nuke ttm_bo_evict_mm and rename mgr function v3

Make it more clear what the resource manager function
does and nuke the wrapper function.

v2: nuke the wrapper
v3: fix typo in radeon, rebased

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Link: https://patchwork.freedesktop.org/patch/393914/


# f7ee1874 18-Sep-2020 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: support indirect access reg outside of mmio bar (v2)

support both direct and indirect accessor in unified
helper functions.

v2: Retire indirect mmio access via mm_index/data

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4a580877 23-Aug-2020 Luben Tuikov <luben.tuikov@amd.com>

drm/amdgpu: Get DRM dev from adev by inline-f

Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1348969a 23-Aug-2020 Luben Tuikov <luben.tuikov@amd.com>

drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)

Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
instead of an invisible macro.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6049db43 19-Aug-2020 Dennis Li <Dennis.Li@amd.com>

drm/amdgpu: change reset lock from mutex to rw_semaphore

clients don't need reset-lock for synchronization when no
GPU recovery.

v2:
change to return the return value of down_read_killable.

v3:
if GPU recovery begin, VF ignore FLR notification.

Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 53b3f8f4 19-Aug-2020 Dennis Li <Dennis.Li@amd.com>

drm/amdgpu: refine codes to avoid reentering GPU recovery

if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f1403342 12-Aug-2020 Christian König <christian.koenig@amd.com>

drm/amdgpu: revert "fix system hang issue during GPU reset"

The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a4322e18 10-Aug-2020 Wenhui Sheng <Wenhui.Sheng@amd.com>

drm/amdgpu: add debugfs interface for RAP test

After amdgpu driver loading successfully, we can use
RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test
to trigger RAP test.

Currently only L0 validate test is supported.

v2: refine amdgpu_rap.h

Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# df9c8d1a 08-Jul-2020 Dennis Li <Dennis.Li@amd.com>

drm/amdgpu: fix system hang issue during GPU reset

when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 443c7f3c 07-Jul-2020 Jinzhou.Su <Jinzhou.Su@amd.com>

drm/amdgpu: add read amdgpu_gfxoff status in debugfs

Add interface for SMU12 device, used by UMR.

v2: fix code style

Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d845a205 09-Jul-2020 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: fix preemption unit test

Remove signaled jobs from job list and ensure the
job was indeed preempted.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7bdb0899 09-Jul-2020 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: fix preemption unit test

Remove signaled jobs from job list and ensure the
job was indeed preempted.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e5ef784b 09-Jun-2020 Evan Quan <evan.quan@amd.com>

drm/amd/powerplay: revise calling chain on retrieving frequency range

This helps to maintain clear code layers and drop unnecessary
parameter.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c98f31d1 09-Jun-2020 Evan Quan <evan.quan@amd.com>

drm/amd/powerplay: revise calling chain on setting soft limit

This helps to maintain clear code layers and drop unnecessary
parameter.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9eee152a 17-Jun-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/debugfs: fix ref count leak when pm_runtime_get_sync fails

The call to pm_runtime_get_sync increments the counter even in case of
failure, leading to incorrect ref count.
In case of failure, decrement the ref count before returning.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 888e32d7 17-Jun-2020 Chen Tao <chentao107@huawei.com>

drm/amdgpu/debugfs: fix memory leak when amdgpu_virt_enable_access_debugfs failed

Fix memory leak in amdgpu_debugfs_gpr_read not freeing data when
amdgpu_virt_enable_access_debugfs failed.

Fixes: 95a2f917387a2 ("drm/amdgpu: restrict debugfs register access under SR-IOV")
Signed-off-by: Chen Tao <chentao107@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 3e4aeff3 17-Jun-2020 Chen Tao <chentao107@huawei.com>

drm/amdgpu/debugfs: fix memory leak when pm_runtime_get_sync failed

Fix memory leak in amdgpu_debugfs_gpr_read not freeing data when
pm_runtime_get_sync failed.

Fixes: a9ffe2a983383 ("drm/amdgpu/debugfs: properly handle runtime pm")
Signed-off-by: Chen Tao <chentao107@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 728e7e0c 26-Apr-2020 Jiange Zhao <Jiange.Zhao@amd.com>

drm/amdgpu: Add autodump debugfs node for gpu reset v8

When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
rename debugfs file to amdgpu_autodump,
provide autodump_read as well,
style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
add 'app_state_mutex' for race conditions:
(1)Only 1 user can open this file node
(2)wait_dump() can only take effect after poll() executed.
(3)eliminated the race condition between release() and
wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
removed state checking in amdgpu_debugfs_wait_dump
Improve on top of version 3 so that the node can be reopened.

v7: move reinit_completion into open() so that only one user
can open it.

v8: remove complete_all() from amdgpu_debugfs_wait_dump().

Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 95a2f917 07-Apr-2020 Yintian Tao <yttao@amd.com>

drm/amdgpu: restrict debugfs register access under SR-IOV

Under bare metal, there is no more else to take
care of the GPU register access through MMIO.
Under Virtualization, to access GPU register is
implemented through KIQ during run-time due to
world-switch.

Therefore, under SR-IOV user can only access
debugfs to r/w GPU registers when meets all
three conditions below.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
and directly return result

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2e0cc4d4 10-Mar-2020 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu: revise RLCG access path

what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help

now even debugfs's reg_op can used to dump wave.

tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6397ec58 10-Mar-2020 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Fix GPR read from debugfs (v2)

The offset into the array was specified in bytes but should
be in terms of 32-bit words. Also prevent large reads that
would also cause a buffer overread.

v2: Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 17cb04f2 10-Mar-2020 Stanley.Yang <Stanley.Yang@amd.com>

drm/amdgpu: use amdgpu_ras.h in amdgpu_debugfs.c

include amdgpu_ras.h head file instead of use extern
ras_debugfs_create_all function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5bbc6604 10-Mar-2020 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Fix GPR read from debugfs (v2)

The offset into the array was specified in bytes but should
be in terms of 32-bit words. Also prevent large reads that
would also cause a buffer overread.

v2: Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org


# 204eaac6 05-Mar-2020 Tao Zhou <tao.zhou1@amd.com>

drm/amdgpu: call ras_debugfs_create_all in debugfs_init

and remove each ras IP's own debugfs creation

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 0cf64555 26-Feb-2020 Chengming Gui <Jack.Gui@amd.com>

drm/amdgpu: Add debugfs interface to set arbitrary sclk for navi14 (v2)

add debugfs interface amdgpu_force_sclk
to set arbitrary sclk for navi14

v2: Add lock

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d2790e10 27-Feb-2020 Yintian Tao <yttao@amd.com>

drm/amdgpu: no need to clean debugfs at amdgpu

drm_minor_unregister will invoke drm_debugfs_cleanup
to clean all the child node under primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean agian.
Otherwise, it will raise the NULL pointer like below.
[ 45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 45.047256] PGD 0 P4D 0
[ 45.047713] Oops: 0002 [#1] SMP PTI
[ 45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu
[ 45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 45.050651] RIP: 0010:down_write+0x1f/0x40
[ 45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[ 45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[ 45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[ 45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[ 45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[ 45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[ 45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[ 45.059221] FS: 00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[ 45.060809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[ 45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.065897] Call Trace:
[ 45.066426] debugfs_remove+0x36/0xa0
[ 45.067131] amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[ 45.068019] amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[ 45.068756] amdgpu_pci_remove+0x49/0x70 [amdgpu]
[ 45.069439] pci_device_remove+0x3e/0xc0
[ 45.070037] device_release_driver_internal+0x18a/0x260
[ 45.070842] driver_detach+0x3f/0x80
[ 45.071325] bus_remove_driver+0x59/0xd0
[ 45.071850] driver_unregister+0x2c/0x40
[ 45.072377] pci_unregister_driver+0x22/0xa0
[ 45.073043] amdgpu_exit+0x15/0x57c [amdgpu]
[ 45.073683] __x64_sys_delete_module+0x146/0x280
[ 45.074369] do_syscall_64+0x5a/0x120
[ 45.074916] entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d090e7db 25-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/display: move debugfs init into core amdgpu debugfs (v2)

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for display.

v2: add config guard for DC

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Harry Wentland <harry.wentland@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# fd23cfcc 25-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/ring: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for rings.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cd9e29e7 04-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/firmware: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for firmware.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f9d64e6c 04-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/regs: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for register access files.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 3f5cea67 04-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gem: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for gem.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 24038d58 03-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/fence: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for fence handling.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 15997544 03-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sa: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for SA (sub allocator).

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a4c5b1bb 03-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for pm.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c5820361 03-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/ttm: move debugfs init into core amdgpu debugfs

In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for ttm.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 923ffa6b 03-Feb-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename amdgpu_debugfs_preempt_cleanup

to amdgpu_debugfs_fini. It will be used for other things in
the future.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 669e2f91 21-Feb-2020 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Add gfxoff debugfs entry

Write a 32-bit value of zero to disable GFXOFF and write a 32-bit
value of non-zero to enable GFXOFF.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a9ffe2a9 10-Jan-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/debugfs: properly handle runtime pm

If driver debugfs files are accessed, power up the GPU
when necessary.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c5b2bd5d 23-Dec-2019 zhengbin <zhengbin13@huawei.com>

drm/amdgpu: use true, false for bool variable in amdgpu_debugfs.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:132:2-10: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:140:2-10: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:142:13-21: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a28fda31 05-Nov-2019 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Avoid accidental thread reactivation.

Problem:
During GPU reset we call the GPU scheduler to suspend it's
thread, those two functions in amdgpu also suspend and resume
the sceduler for their needs but this can collide with GPU
reset in progress and accidently restart a suspended thread
before time.

Fix:
Serialize with GPU reset.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 44b582b3 21-Oct-2019 Geert Uytterhoeven <geert+renesas@glider.be>

drm/amdgpu: Remove superfluous void * cast in debugfs_create_file() call

There is no need to cast a typed pointer to a void pointer when calling
a function that accepts the latter. Remove it, as the cast prevents
further compiler checks.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 20323246 03-Sep-2019 zhong jiang <zhongjiang@huawei.com>

drm/amdgpu: remove the redundant null checks

debugfs_remove and kfree has taken the null check in account.
hence it is unnecessary to check it. Just remove the condition.
No functional change.

This issue was detected by using the Coccinelle software.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 929e571c 27-Jul-2019 Wang Xiayang <xywang.sjtu@sjtu.edu.cn>

drm/amdgpu: fix a potential information leaking bug

Coccinelle reports a path that the array "data" is never initialized.
The path skips the checks in the conditional branches when either
of callback functions, read_wave_vgprs and read_wave_sgprs, is not
registered. Later, the uninitialized "data" array is read
in the while-loop below and passed to put_user().

Fix the path by allocating the array with kcalloc().

The patch is simplier than adding a fall-back branch that explicitly
calls memset(data, 0, ...). Also it does not need the multiplication
1024*sizeof(*data) as the size parameter for memset() though there is
no risk of integer overflow.

Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1a2c29bc 27-Jul-2019 Wang Xiayang <xywang.sjtu@sjtu.edu.cn>

drm/amdgpu: fix a potential information leaking bug

Coccinelle reports a path that the array "data" is never initialized.
The path skips the checks in the conditional branches when either
of callback functions, read_wave_vgprs and read_wave_sgprs, is not
registered. Later, the uninitialized "data" array is read
in the while-loop below and passed to put_user().

Fix the path by allocating the array with kcalloc().

The patch is simplier than adding a fall-back branch that explicitly
calls memset(data, 0, ...). Also it does not need the multiplication
1024*sizeof(*data) as the size parameter for memset() though there is
no risk of integer overflow.

Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 88891430 16-Jul-2019 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Fix offset for vmid selection in debugfs interface

The register debugfs interface was using the wrong bitmask for vmid
selection for GFX_CNTL.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 0fa4246e 12-Jul-2019 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Add VMID to SRBM debugfs bank selection

Add 5 bits to the offset for SRBM selection to handle VMIDs. Also
update the select_me_pipe_q() callback to also select VMID.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 80f8fb91 22-Jan-2019 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: mark the partial job as preempted in mcbp unit test

In mcbp unit test, the test should detect the preempted job which may
be a partial execution ib and mark it as preempted; so that the gfx
block can correctly generate PM4 frame.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6698a3d0 20-Jun-2019 Jack Xiao <Jack.Xiao@amd.com>

drm/amdgpu: add mcbp unit test in debugfs (v3)

The MCBP unit test is used to test the functionality of MCBP.
It emualtes to send preemption request and resubmit the unfinished
jobs.

v2: squash in fixes (Alex)
v3: squash in memory leak fix (Jack)

Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# fdf2f6c5 09-Jun-2019 Sam Ravnborg <sam@ravnborg.org>

drm/amd: drop use of drmP.h in amdgpu/amdgpu*

Drop use of drmP.h in all files named amdgpu*
in drm/amd/amdgpu/

Fix fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org


# 4a5a2de6 10-Jan-2019 Kevin Wang <Kevin1.Wang@amd.com>

drm/amd/powerplay: implement sysfs of amdgpu_get_busy_percent for smu11

add interface amdgpu_get_busy_percent for smu11

v2: convert data pointer type to uint32_t *.

Signed-off-by: Kevin Wang <Kevin1.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7a5e0d9a 11-Feb-2019 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: don't clamp debugfs register access to the BAR size

This prevents us from accessing extended registers in tools like
umr. The register access functions already check if the offset
is beyond the BAR size and use the indirect accessors with locking
so this is safe.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d344b21b 14-Jun-2018 Dan Carpenter <dan.carpenter@oracle.com>

drm/amd/amdgpu: Fix debugfs error handling

The error handling is wrong and "ent" could be NULL we when dereference
it to get "ent->d_inode".

The thing is that normally debugfs_create_file() is not supposed to
require (or have) any error handling. That function does return error
pointers if debugfs is turned off but we know it's enable here. When
it's enabled, then it returns NULL on error.

So what I did was I stripped out all the error handling except around
the i_size_write(). I could have just used a NULL check instead of an
IS_ERR_OR_NULL() but I figured this was more clear because that way you
don't have to look at the surrounding code to see whether debugfs is
enabled or not.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7e4237db 02-May-2018 Tom St Denis <tom.stdenis@amd.com>

drm/amd/amdgpu: Add some documentation to the debugfs entries

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b13aa109 26-Mar-2018 Rex Zhu <Rex.Zhu@amd.com>

drm/amdgpu: Use dpm_enabled as dpm state flag

driver will set dpm_enabled to true only when
module parameter amdgpu_dpm not equal to 0 and
smu hw initialize successfully.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f7a9ee81 29-Mar-2018 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Add support for SRBM selection v3

Also remove code duplication in write and read regs functions.
This also fixes potential missing unlock in amdgpu_debugfs_regs_write
in case get_user would fail.

v2: Add SRBM mutex locking.
v3: Fix TO counter and fix comment location.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 87e90c76 19-Feb-2018 Christian König <christian.koenig@amd.com>

drm/amdgpu: add amdgpu_evict_gtt debugfs entry

Allow evicting all BOs from the GTT domain.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexdeucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 75758255 14-Dec-2017 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move debugfs functions to their own file

amdgpu_device.c was getting pretty cluttered.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>