#
b8f67b9d |
|
18-Jan-2024 |
Shashank Sharma <shashank.sharma@amd.com> |
drm/amdgpu: change vm->task_info handling This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
9749c868 |
|
04-Jan-2024 |
Ma Jun <Jun.Ma2@amd.com> |
drm/amdgpu: Fix the warning info in mode1 reset Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
bb34bc2c |
|
04-Jan-2024 |
Ma Jun <Jun.Ma2@amd.com> |
drm/amdgpu: Fix the warning info in mode1 reset Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5eb8094a |
|
19-Dec-2023 |
Mangesh Gadre <Mangesh.Gadre@amd.com> |
drm/amdgpu: Add register read/write debugfs support for AID's SMN address is larger than 32 bits for registers on different AID's Updating existing interface to support access to such registers. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
afe58346 |
|
27-Nov-2023 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/debugfs: fix error code when smc register accessors are NULL Should be -EOPNOTSUPP. Fixes: 5104fdf50d32 ("drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
b2662d4c |
|
23-Nov-2023 |
shaoyunl <shaoyun.liu@amd.com> |
drm/amdgpu: SW part of MES event log enablement This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7b194fdc |
|
22-Nov-2023 |
Lu Yao <yaolu@kylinos.cn> |
drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
35963cf2 |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Add drm_sched_wqueue_* helpers Add scheduler wqueue ready, stop, and start helpers to hide the implementation details of the scheduler from the drivers. v2: - s/sched_wqueue/sched_wqueue (Luben) - Remove the extra white line after the return-statement (Luben) - update drm_sched_wqueue_ready comment (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
2161e09c |
|
22-Nov-2023 |
Lu Yao <yaolu@kylinos.cn> |
drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5104fdf5 |
|
22-Oct-2023 |
Qu Huang <qu.huang@linux.dev> |
drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log: 1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 Signed-off-by: Qu Huang <qu.huang@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
2d6a2a28 |
|
15-Sep-2023 |
André Almeida <andrealmeid@igalia.com> |
drm/amdgpu: Encapsulate all device reset info To better organize struct amdgpu_device, keep all reset information related fields together in a separated struct. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
ad19c200 |
|
26-Jul-2023 |
Praful Swarnakar <Praful.Swarnakar@amd.com> |
drm/amdgpu: Fix style issues in amdgpu_debugfs.c Fixes the following to align to linux coding style: WARNING: Missing a blank line after declarations WARNING: sizeof *rd should be sizeof(*rd) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Praful Swarnakar <Praful.Swarnakar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8ed49dd1 |
|
16-Jun-2023 |
Victor Lu <victorchengchi.lu@amd.com> |
drm/amdgpu: Add RLCG interface driver implementation for gfx v9.4.3 (v3) Add RLCG interface support for gfx v9.4.3 and multiple XCCs. Do not enable it yet. v2: Fix amdgpu_rlcg_reg_access_ctrl init, add support for multiple XCCs in amdgpu_mm_wreg_mmio_rlc v3: Use GET_INST() when indexing amdgpu_rlcg_reg_access_ctrl Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
9c9d501b |
|
25-May-2023 |
Dan Carpenter <dan.carpenter@linaro.org> |
drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl() There are two bugs here. 1) Drop the lock if copy_from_user() fails. 2) If the copy fails then the correct error code is -EFAULT instead of -EINVAL. I also broke up the long line and changed "sizeof rd->id" to "sizeof(rd->id)". Fixes: 553f973a0d7b ("drm/amd/amdgpu: Update debugfs for XCC support (v3)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
109b4d8c |
|
14-May-2023 |
Su Hui <suhui@nfschina.com> |
drm/amdgpu: remove unnecessary (void*) conversions No need cast (void*) to (struct amdgpu_device *). Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
553f973a |
|
11-Oct-2022 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Update debugfs for XCC support (v3) This patch updates the 'regs2' interface for MMIO registers to add a new IOCTL command for a 'v2' state data that includes the XCC ID. This patch then updates amdgpu_gfx_select_se_sh() and amdgpu_gfx_select_me_pipe_q() (and the implementations in the gfx drivers) to support an additional parameter. This patch then creates a new debugfs interface "gprwave" which is a merge of shader GPR and wave status access. This new inteface uses an IOCTL to select banks as well as XCC identity. (v2) Fix missing xcc_id in wave_ind function (v3) Fix pm runtime calls and mutex locking (v4) Fix bad label Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8fa76350 |
|
28-Apr-2023 |
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> |
drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.c Fix the following issues reported by checkpatch: WARNING: please, no space before tabs WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: sizeof *rd should be sizeof(*rd) WARNING: Missing a blank line after declarations WARNING: sizeof rd->id should be sizeof(rd->id) WARNING: static const char * array should probably be static const char * const WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Prefer seq_puts to seq_printf ERROR: space prohibited after that open parenthesis '(' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d51ac6d0 |
|
23-May-2022 |
Le Ma <le.ma@amd.com> |
drm/amdgpu: add xcc index argument to select_sh_se function v2 v1: To support multiple XCD case (Le) v2: introduce xcc index to gfx_v11_0_select_sh_se (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
cd3a8a59 |
|
18-Nov-2022 |
Christian König <christian.koenig@amd.com> |
drm/ttm: remove ttm_bo_(un)lock_delayed_workqueue Those functions never worked correctly since it is still perfectly possible that a buffer object is released and the background worker restarted even after calling them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-2-christian.koenig@amd.com
|
#
d09ef243 |
|
19-Jul-2022 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: clarify DC checks There are several places where we don't want to check if a particular asic could support DC, but rather, if DC is enabled. Set a flag if DC is enabled and check for that rather than if a device supports DC or not. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6aa58939 |
|
12-Oct-2022 |
Victor Zhao <Victor.Zhao@amd.com> |
Revert "drm/amdgpu: add debugfs amdgpu_reset_level" This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
afbaa155 |
|
12-Oct-2022 |
Victor Zhao <Victor.Zhao@amd.com> |
Revert "drm/amdgpu: add debugfs amdgpu_reset_level" This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
0ad7347a |
|
10-Aug-2022 |
André Almeida <andrealmeid@igalia.com> |
drm/amd: Add detailed GFXOFF stats to debugfs Add debugfs interface to log GFXOFF statistics: - Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the time of query since system power-up - Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop. Read it to get average GFXOFF residency % multiplied by 100 during the last logging interval. Both features are designed to be keep the values persistent between suspends. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5bd8d53f |
|
13-Jun-2022 |
Victor Zhao <Victor.Zhao@amd.com> |
drm/amdgpu: add debugfs amdgpu_reset_level Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
ad2feebd |
|
29-Jul-2022 |
Sebin Sebastian <mailmesebin00@gmail.com> |
drm/amdgpu: double free error and freeing uninitialized null pointer Fix a double free and an uninitialized pointer read error. Both tmp and new are pointing at same address and both are freed which leads to double free. Adding a check to verify if new and tmp are free in the error_free label fixes the double free issue. new is not initialized to null which also leads to a free on an uninitialized pointer. Reviewed-by: André Almeida <andrealmeid@igalia.com> Suggested by: S. Amaranath <Amaranath.Somalapuram@amd.com> Signed-off-by: Sebin Sebastian <mailmesebin00@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
4686177f |
|
14-Jul-2022 |
André Almeida <andrealmeid@igalia.com> |
drm/amd/debugfs: Expose GFXOFF state to userspace GFXOFF has two different "state" values: one to define if the GPU is allowed/disallowed to enter GFXOFF, usually called state; and another one to define if currently GFXOFF is being used, usually called status. Even when GFXOFF is allowed, GPU firmware can decide to not used it accordingly to the GPU load. Userspace can allow/disallow GPUs to enter into GFXOFF via debugfs. The kernel maintains a counter of requests for GFXOFF (gfx_off_req_count) that should be decreased to allow GFXOFF and increased to disallow. The issue with this interface is that userspace can't be sure if GFXOFF is currently allowed. Even by checking amdgpu_gfxoff file, one might get an ambiguous 2, that means that GPU is currently out of GFXOFF, but that can be either because it's currently disallowed or because it's allowed but given the current GPU load it's enabled. Then, userspace needs to rely on the fact that GFXOFF is enabled by default on boot and to track this information. To make userspace life easier and GFXOFF more reliable, return the current state of GFXOFF to userspace when reading amdgpu_gfxoff with the same semantics of writing: 0 means not allowed, not 0 means allowed. Expose the current status of GFXOFF through a new file, amdgpu_gfxoff_status. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
edadd6fc |
|
04-Jul-2022 |
André Almeida <andrealmeid@igalia.com> |
drm/amdpgu/debugfs: Simplify some exit paths To avoid code repetition, unify the function exit path when possible. No functional changes. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
651d7ee6 |
|
01-Jun-2022 |
Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> |
drm/amdgpu: save the reset dump register value for devcoredump Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
e50d9ba0 |
|
17-Apr-2022 |
Candice Li <candice.li@amd.com> |
drm/amdgpu: Add debugfs TA load/unload/invoke support v1: Add debugfs support to load/unload/invoke TA in runtime. v2: 1. Update some variables to static. 2. Use PAGE_ALIGN to calculate shared buf size directly. 3. Remove fp check. 4. Update debugfs from read to write. Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
dc2947b3 |
|
08-Apr-2022 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Update debugfs GCA data The data revision was not changed to 5 from 4 when the CG flags were extended to 64-bits. Since this was missed I took the opportunity to add future upper 64-bits of PG flags as well so we don't need to bump it again when that comes. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
25faeddc |
|
25-Mar-2022 |
Evan Quan <evan.quan@amd.com> |
drm/amdgpu: expand cg_flags from u32 to u64 With this, we can support more CG flags. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
11eb648d |
|
02-Mar-2022 |
Ruijing Dong <ruijing.dong@amd.com> |
drm/amdgpu/vcn: Add vcn firmware log vcn fwlog is for debugging purpose only, by default, it is disabled. Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
ca6fcfa8 |
|
27-Feb-2022 |
Tom Rix <trix@redhat.com> |
drm/amdgpu: Fix realloc of ptr Clang static analysis reports this error amdgpu_debugfs.c:1690:9: warning: 1st function call argument is an uninitialized value tmp = krealloc_array(tmp, i + 1, ^~~~~~~~~~~~~~~~~~~~~~~~~~~ realloc uses tmp, so tmp can not be garbage. And the return needs to be checked. Fixes: 5ce5a584cb82 ("drm/amdgpu: add debugfs for reset registers list") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5ce5a584 |
|
23-Feb-2022 |
Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> |
drm/amdgpu: add debugfs for reset registers list List of register populated for dump collection during the GPU reset. Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
e7c47231 |
|
18-Feb-2022 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: expose benchmarks via debugfs They provide a nice smoke test of transfer performance using SDMA. Allow the user to run these at runtime rather than only at init time. v2: fix permissions (Alex) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8f74f68d |
|
15-Feb-2022 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Add APU flag to gca_config debugfs data (v3) Needed by umr to detect if ip discovered ASIC is an APU or not. (v2): Remove asic type from packet it's not strictly needed (v3): Correct comment Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d0fb18b5 |
|
19-Jan-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Move reset sem into reset_domain We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
|
#
4bd8dd0d |
|
18-Jan-2022 |
Yongzhi Liu <lyz_cs@pku.edu.cn> |
drm/amdgpu: Add missing pm_runtime_put_autosuspend pm_runtime_get_sync() increments the runtime PM usage counter even when it returns an error code, thus a matching decrement is needed on the error handling path to keep the counter balanced. Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
bc143d8b |
|
21-Nov-2021 |
Evan Quan <evan.quan@amd.com> |
drm/amd/pm: do not expose implementation details to other blocks out of power Those implementation details(whether swsmu supported, some ppt_funcs supported, accessing internal statistics ...)should be kept internally. It's not a good practice and even error prone to expose implementation details. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7e31a858 |
|
12-Dec-2021 |
Evan Quan <evan.quan@amd.com> |
drm/amdgpu: move smu_debug_mask to a more proper place As the smu_context will be invisible from outside(of power). Also, the smu_debug_mask can be shared around all power code instead of some specific framework(swSMU) only. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6ff7fddb |
|
29-Nov-2021 |
Lang Yu <lang.yu@amd.com> |
drm/amdgpu: add support for SMU debug option SMU firmware expects the driver maintains error context and doesn't interact with SMU any more when SMU errors occurred. That will aid in debugging SMU firmware issues. Add SMU debug option support for this request, it can be enabled or disabled via amdgpu_smu_debug debugfs file. Use a 32-bit mask to indicate corresponding debug modes. Currently, only one mode(HALT_ON_ERROR) is supported. When enabled, it brings hardware to a kind of halt state so that no one can touch it any more in the envent of SMU errors. The dirver interacts with SMU via sending messages. And threre are three ways to sending messages to SMU in current implementation. Handle them respectively as following: 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases Halt on any error. 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response() for longer timeout cases Halt on errors apart from ETIME. Otherwise this way won't work. Let the user handle ETIME error in such a case. 3, smu_cmn_send_msg_without_waiting() for no waiting cases Halt on errors apart from ETIME. Otherwise second way won't work. == Command Guide == 1, enable HALT_ON_ERROR mode # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug 2, disable HALT_ON_ERROR mode # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug v5: - Use bit mask to allow more debug features.(Evan) - Use WRAN() instead of BUG().(Evan) v4: - Set to halt state instead of a simple hang.(Christian) v3: - Use debugfs_create_bool().(Christian) - Put variable into smu_context struct. - Don't resend command when timeout. v2: - Resend command when timeout.(Lijo) - Use debugfs file instead of module parameter. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
58144d28 |
|
06-Oct-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: unify BO evicting method in amdgpu_ttm Unify BO evicting functionality for possible memory types in amdgpu_ttm.c. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5b9581df |
|
06-Oct-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: return early if debugfs is not initialized Check first if debugfs is initialized before creating amdgpu debugfs files. References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
c8365dbd |
|
30-Sep-2021 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8" This reverts commit 728e7e0cd61899208e924472b9e641dbeb0775c4. Further discussion reveals that this feature is severely broken and needs to be reverted ASAP. GPU reset can never be delayed by userspace even for debugging or otherwise we can run into in kernel deadlocks. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
81d1bf01 |
|
20-Jul-2021 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: add debugfs access to the IP discovery table Useful for debugging and new asic validation. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
0fcfb300 |
|
10-Sep-2021 |
xinhui pan <xinhui.pan@amd.com> |
drm/amdgpu: Fix a race of IB test Direct IB submission should be exclusive. So use write lock. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
62d266b2 |
|
02-Sep-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: cleanup debugfs for amdgpu rings Use debugfs_create_file_size API for creating ring debugfs, and as its a NULL returning API, change the return type for amdgpu_debugfs_ring_init API as well. Also cleanup surrounding code. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
59715cff |
|
02-Sep-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: use IS_ERR for debugfs APIs debugfs APIs returns encoded error so use IS_ERR for checking return value. v2: return PTR_ERR(ent) References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-By: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
37df9560 |
|
20-Aug-2021 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: New debugfs interface for MMIO registers (v5) This new debugfs interface uses an IOCTL interface in order to pass along state information like SRBM and GRBM bank switching. This new interface also allows a full 32-bit MMIO address range which the previous didn't. With this new design we have room to grow the flexibility of the file as need be. (v2): Move read/write to .read/.write, fix style, add comment for IOCTL data structure (v3): C style comments (v4): use u32 in struct and remove offset variable (v5): Drop flag clearing in op function, use 0xFFFFFFFF for broadcast instead of 0x3FF, use mutex for op/ioctl. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
b04ce53e |
|
02-Sep-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: use IS_ERR for debugfs APIs debugfs APIs returns encoded error so use IS_ERR for checking return value. v2: return PTR_ERR(ent) References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-By: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
|
#
c530b02f |
|
12-May-2021 |
Jack Zhang <Jack.Zhang1@amd.com> |
drm/amd/amdgpu embed hw_fence into amdgpu_job Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
391629bd |
|
15-Jun-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: remove amdgpu_vm_pt Page table entries are now in embedded in VM BO, so we do not need struct amdgpu_vm_pt. This patch replaces struct amdgpu_vm_pt with struct amdgpu_vm_bo_base. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
e72d4a8b |
|
20-May-2021 |
Lee Jones <lee.jones@linaro.org> |
drm/amd/amdgpu/amdgpu_debugfs: Fix a couple of misnamed functions Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1004: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_write(). Prototype was for amdgpu_debugfs_gfxoff_write() instead drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1053: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_status(). Prototype was for amdgpu_debugfs_gfxoff_read() instead Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
43fb6c19 |
|
02-Mar-2021 |
Kevin Wang <kevin1.wang@amd.com> |
drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie the register offset isn't needed division by 4 to pass RREG32_PCIE() Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7271a5c2 |
|
25-Feb-2021 |
Yang Li <yang.lee@linux.alibaba.com> |
drm/amdgpu: Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE Fix the following coccicheck warning: ./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1589:0-23: WARNING: fops_ib_preempt should be defined with DEFINE_DEBUGFS_ATTRIBUTE ./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1592:0-23: WARNING: fops_sclk_set should be defined with DEFINE_DEBUGFS_ATTRIBUTE Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
98d28ac2 |
|
15-Feb-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: do not use drm middle layer for debugfs Use debugfs API directly instead of drm middle layer. This also includes following debugfs file output changes: 1 amdgpu_evict_vram/amdgpu_evict_gtt output will not contain any braces. e.g. (0) --> 0 2 amdgpu_gpu_recover output will print return value of amdgpu_device_gpu_recover() instead of not so important "gpu recover" message. v2: * checkpatch.pl: use '0444' instead of S_IRUGO. * remove S_IFREG from mode. * remove mode variable. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
373720f7 |
|
14-Feb-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amd/pm: do not use drm middle layer for debugfs Use debugfs API directly instead of drm middle layer. v2: * checkpatch.pl: use '0444' instead of S_IRUGO. * remove S_IFREG from mode. * remove mode variable. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
afd3a359 |
|
14-Feb-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amd/display: do not use drm middle layer for debugfs Use debugfs API directly instead of drm middle layer. v2: * checkpatch.pl: use '0444' instead of S_IRUGO. * remove S_IFREG from mode. * remove mode variable. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
88293c03 |
|
10-Feb-2021 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: do not keep debugfs dentry Cleanup unnecessary debugfs dentries and surrounding functions. v3: remove return value check for debugfs_create_file() v2: remove ttm_debugfs_entries array. do not init variables. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1aa46901 |
|
02-Mar-2021 |
Kevin Wang <kevin1.wang@amd.com> |
drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie the register offset isn't needed division by 4 to pass RREG32_PCIE() Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
|
#
ecaafb7b |
|
08-Dec-2020 |
Jinzhou Su <Jinzhou.Su@amd.com> |
drm/amdgpu: Add secure display TA interface Add interface to load, unload, invoke command for secure display TA. v2: Add debugfs interface for secure display TA v3: fix warning in copy_from_user (Alex) Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6efa4b46 |
|
03-Dec-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
gpu/drm: ring_mirror_list --> pending_list Rename "ring_mirror_list" to "pending_list", to describe what something is, not what it does, how it's used, or how the hardware implements it. This also abstracts the actual hardware implementation, i.e. how the low-level driver communicates with the device it drives, ring, CAM, etc., shouldn't be exposed to DRM. The pending_list keeps jobs submitted, which are out of our control. Usually this means they are pending execution status in hardware, but the latter definition is a more general (inclusive) definition. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/405573/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
8935ff00 |
|
03-Dec-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: "node" --> "list" Rename "node" to "list" in struct drm_sched_job, in order to make it consistent with what we see being used throughout gpu_scheduler.h, for instance in struct drm_sched_entity, as well as the rest of DRM and the kernel. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/403515/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
20ed491b |
|
13-Nov-2020 |
Lee Jones <lee.jones@linaro.org> |
drm/amd/amdgpu/amdgpu_debugfs: Demote obvious abuse of kernel-doc formatting Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_read' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_read' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_read' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_read' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_write' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_write' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_write' drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_write' Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
f3729f7b |
|
02-Nov-2020 |
Deepak R Varma <mh12gx2825@gmail.com> |
drm/amdgpu/amdgpu: improve code indentation and alignment General code indentation and alignment changes such as replace spaces by tabs or align function arguments as per the coding style guidelines. The patch corrects issues for various amdgpu_*.c files for this driver. Issue reported by checkpatch script. Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
19ae3330 |
|
26-Oct-2020 |
John Clements <john.clements@amd.com> |
drm/amdgpu: added support for psp fw attestation loaded fw can be queried from sys fs interface Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
ff72bc40 |
|
08-Oct-2020 |
Mihir Bhogilal Patel <Mihir.Patel@amd.com> |
drm/amdgpu: Add debugfs entry for printing VM info Create new debugfs entry to print memory info using VM buffer objects. V2: Added Common function for printing BO info. Dump more VM lists for evicted, moved, relocated, invalidated. Removed dumping VM mapped BOs. V3: Fixed coding style comments, renamed print API and variables. V4: Fixed coding style comments. Signed-off-by: Mihir Bhogilal Patel <Mihir.Patel@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
4ce032d6 |
|
01-Oct-2020 |
Christian König <christian.koenig@amd.com> |
drm/ttm: nuke ttm_bo_evict_mm and rename mgr function v3 Make it more clear what the resource manager function does and nuke the wrapper function. v2: nuke the wrapper v3: fix typo in radeon, rebased Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2) Link: https://patchwork.freedesktop.org/patch/393914/
|
#
f7ee1874 |
|
18-Sep-2020 |
Hawking Zhang <Hawking.Zhang@amd.com> |
drm/amdgpu: support indirect access reg outside of mmio bar (v2) support both direct and indirect accessor in unified helper functions. v2: Retire indirect mmio access via mm_index/data Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
4a580877 |
|
23-Aug-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/amdgpu: Get DRM dev from adev by inline-f Add a static inline adev_to_drm() to obtain the DRM device pointer from an amdgpu_device pointer. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1348969a |
|
23-Aug-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6049db43 |
|
19-Aug-2020 |
Dennis Li <Dennis.Li@amd.com> |
drm/amdgpu: change reset lock from mutex to rw_semaphore clients don't need reset-lock for synchronization when no GPU recovery. v2: change to return the return value of down_read_killable. v3: if GPU recovery begin, VF ignore FLR notification. Reviewed-by: Monk Liu <monk.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
53b3f8f4 |
|
19-Aug-2020 |
Dennis Li <Dennis.Li@amd.com> |
drm/amdgpu: refine codes to avoid reentering GPU recovery if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
f1403342 |
|
12-Aug-2020 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: revert "fix system hang issue during GPU reset" The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a4322e18 |
|
10-Aug-2020 |
Wenhui Sheng <Wenhui.Sheng@amd.com> |
drm/amdgpu: add debugfs interface for RAP test After amdgpu driver loading successfully, we can use RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test to trigger RAP test. Currently only L0 validate test is supported. v2: refine amdgpu_rap.h Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com> Reviewed-by: Guchun Chen <Guchun.Chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
df9c8d1a |
|
08-Jul-2020 |
Dennis Li <Dennis.Li@amd.com> |
drm/amdgpu: fix system hang issue during GPU reset when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Suggested-by: Luben Tukov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
443c7f3c |
|
07-Jul-2020 |
Jinzhou.Su <Jinzhou.Su@amd.com> |
drm/amdgpu: add read amdgpu_gfxoff status in debugfs Add interface for SMU12 device, used by UMR. v2: fix code style Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d845a205 |
|
09-Jul-2020 |
Jack Xiao <Jack.Xiao@amd.com> |
drm/amdgpu: fix preemption unit test Remove signaled jobs from job list and ensure the job was indeed preempted. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7bdb0899 |
|
09-Jul-2020 |
Jack Xiao <Jack.Xiao@amd.com> |
drm/amdgpu: fix preemption unit test Remove signaled jobs from job list and ensure the job was indeed preempted. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
e5ef784b |
|
09-Jun-2020 |
Evan Quan <evan.quan@amd.com> |
drm/amd/powerplay: revise calling chain on retrieving frequency range This helps to maintain clear code layers and drop unnecessary parameter. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
c98f31d1 |
|
09-Jun-2020 |
Evan Quan <evan.quan@amd.com> |
drm/amd/powerplay: revise calling chain on setting soft limit This helps to maintain clear code layers and drop unnecessary parameter. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
9eee152a |
|
17-Jun-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/debugfs: fix ref count leak when pm_runtime_get_sync fails The call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
888e32d7 |
|
17-Jun-2020 |
Chen Tao <chentao107@huawei.com> |
drm/amdgpu/debugfs: fix memory leak when amdgpu_virt_enable_access_debugfs failed Fix memory leak in amdgpu_debugfs_gpr_read not freeing data when amdgpu_virt_enable_access_debugfs failed. Fixes: 95a2f917387a2 ("drm/amdgpu: restrict debugfs register access under SR-IOV") Signed-off-by: Chen Tao <chentao107@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
3e4aeff3 |
|
17-Jun-2020 |
Chen Tao <chentao107@huawei.com> |
drm/amdgpu/debugfs: fix memory leak when pm_runtime_get_sync failed Fix memory leak in amdgpu_debugfs_gpr_read not freeing data when pm_runtime_get_sync failed. Fixes: a9ffe2a983383 ("drm/amdgpu/debugfs: properly handle runtime pm") Signed-off-by: Chen Tao <chentao107@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
728e7e0c |
|
26-Apr-2020 |
Jiange Zhao <Jiange.Zhao@amd.com> |
drm/amdgpu: Add autodump debugfs node for gpu reset v8 When GPU got timeout, it would notify an interested part of an opportunity to dump info before actual GPU reset. A usermode app would open 'autodump' node under debugfs system and poll() for readable/writable. When a GPU reset is due, amdgpu would notify usermode app through wait_queue_head and give it 10 minutes to dump info. After usermode app has done its work, this 'autodump' node is closed. On node closure, amdgpu gets to know the dump is done through the completion that is triggered in release(). There is no write or read callback because necessary info can be obtained through dmesg and umr. Messages back and forth between usermode app and amdgpu are unnecessary. v2: (1) changed 'registered' to 'app_listening' (2) add a mutex in open() to prevent race condition v3 (chk): grab the reset lock to avoid race in autodump_open, rename debugfs file to amdgpu_autodump, provide autodump_read as well, style and code cleanups v4: add 'bool app_listening' to differentiate situations, so that the node can be reopened; also, there is no need to wait for completion when no app is waiting for a dump. v5: change 'bool app_listening' to 'enum amdgpu_autodump_state' add 'app_state_mutex' for race conditions: (1)Only 1 user can open this file node (2)wait_dump() can only take effect after poll() executed. (3)eliminated the race condition between release() and wait_dump() v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex' removed state checking in amdgpu_debugfs_wait_dump Improve on top of version 3 so that the node can be reopened. v7: move reinit_completion into open() so that only one user can open it. v8: remove complete_all() from amdgpu_debugfs_wait_dump(). Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
95a2f917 |
|
07-Apr-2020 |
Yintian Tao <yttao@amd.com> |
drm/amdgpu: restrict debugfs register access under SR-IOV Under bare metal, there is no more else to take care of the GPU register access through MMIO. Under Virtualization, to access GPU register is implemented through KIQ during run-time due to world-switch. Therefore, under SR-IOV user can only access debugfs to r/w GPU registers when meets all three conditions below. - amdgpu_gpu_recovery=0 - TDR happened - in_gpu_reset=0 v2: merge amdgpu_virt_can_access_debugfs() into amdgpu_virt_enable_access_debugfs() v3: drop ret variable in amdgpu_virt_enable_access_debugfs() and directly return result Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
2e0cc4d4 |
|
10-Mar-2020 |
Monk Liu <Monk.Liu@amd.com> |
drm/amdgpu: revise RLCG access path what changed: 1)provide new implementation interface for the rlcg access path 2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op function can access reg that need RLCG path help now even debugfs's reg_op can used to dump wave. tested-by: Monk Liu <monk.liu@amd.com> tested-by: Zhou pengju <pengju.zhou@amd.com> Signed-off-by: Zhou pengju <pengju.zhou@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6397ec58 |
|
10-Mar-2020 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Fix GPR read from debugfs (v2) The offset into the array was specified in bytes but should be in terms of 32-bit words. Also prevent large reads that would also cause a buffer overread. v2: Read from correct offset from internal storage buffer. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
17cb04f2 |
|
10-Mar-2020 |
Stanley.Yang <Stanley.Yang@amd.com> |
drm/amdgpu: use amdgpu_ras.h in amdgpu_debugfs.c include amdgpu_ras.h head file instead of use extern ras_debugfs_create_all function Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
5bbc6604 |
|
10-Mar-2020 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Fix GPR read from debugfs (v2) The offset into the array was specified in bytes but should be in terms of 32-bit words. Also prevent large reads that would also cause a buffer overread. v2: Read from correct offset from internal storage buffer. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
|
#
204eaac6 |
|
05-Mar-2020 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdgpu: call ras_debugfs_create_all in debugfs_init and remove each ras IP's own debugfs creation this is required to fix ras when the driver does not use the drm load and unload callbacks due to ordering issues with the drm device node. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
0cf64555 |
|
26-Feb-2020 |
Chengming Gui <Jack.Gui@amd.com> |
drm/amdgpu: Add debugfs interface to set arbitrary sclk for navi14 (v2) add debugfs interface amdgpu_force_sclk to set arbitrary sclk for navi14 v2: Add lock Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d2790e10 |
|
27-Feb-2020 |
Yintian Tao <yttao@amd.com> |
drm/amdgpu: no need to clean debugfs at amdgpu drm_minor_unregister will invoke drm_debugfs_cleanup to clean all the child node under primary minor node. We don't need to invoke amdgpu_debugfs_fini and amdgpu_debugfs_regs_cleanup to clean agian. Otherwise, it will raise the NULL pointer like below. [ 45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 45.047256] PGD 0 P4D 0 [ 45.047713] Oops: 0002 [#1] SMP PTI [ 45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu [ 45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 45.050651] RIP: 0010:down_write+0x1f/0x40 [ 45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01 [ 45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246 [ 45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814 [ 45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8 [ 45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00 [ 45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300 [ 45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860 [ 45.059221] FS: 00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000 [ 45.060809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0 [ 45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 45.065897] Call Trace: [ 45.066426] debugfs_remove+0x36/0xa0 [ 45.067131] amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu] [ 45.068019] amdgpu_debugfs_fini+0x2c/0x50 [amdgpu] [ 45.068756] amdgpu_pci_remove+0x49/0x70 [amdgpu] [ 45.069439] pci_device_remove+0x3e/0xc0 [ 45.070037] device_release_driver_internal+0x18a/0x260 [ 45.070842] driver_detach+0x3f/0x80 [ 45.071325] bus_remove_driver+0x59/0xd0 [ 45.071850] driver_unregister+0x2c/0x40 [ 45.072377] pci_unregister_driver+0x22/0xa0 [ 45.073043] amdgpu_exit+0x15/0x57c [amdgpu] [ 45.073683] __x64_sys_delete_module+0x146/0x280 [ 45.074369] do_syscall_64+0x5a/0x120 [ 45.074916] entry_SYSCALL_64_after_hwframe+0x44/0xa9 v2: remove all debugfs cleanup/fini code at amdgpu v3: squash in unused variable removal Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d090e7db |
|
25-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/display: move debugfs init into core amdgpu debugfs (v2) In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for display. v2: add config guard for DC Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Harry Wentland <harry.wentland@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
fd23cfcc |
|
25-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/ring: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for rings. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
cd9e29e7 |
|
04-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/firmware: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for firmware. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
f9d64e6c |
|
04-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/regs: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for register access files. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
3f5cea67 |
|
04-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/gem: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for gem. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
24038d58 |
|
03-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/fence: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for fence handling. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
15997544 |
|
03-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/sa: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for SA (sub allocator). Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a4c5b1bb |
|
03-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/pm: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for pm. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
c5820361 |
|
03-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/ttm: move debugfs init into core amdgpu debugfs In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for ttm. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
923ffa6b |
|
03-Feb-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: rename amdgpu_debugfs_preempt_cleanup to amdgpu_debugfs_fini. It will be used for other things in the future. Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
669e2f91 |
|
21-Feb-2020 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Add gfxoff debugfs entry Write a 32-bit value of zero to disable GFXOFF and write a 32-bit value of non-zero to enable GFXOFF. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a9ffe2a9 |
|
10-Jan-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/debugfs: properly handle runtime pm If driver debugfs files are accessed, power up the GPU when necessary. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
c5b2bd5d |
|
23-Dec-2019 |
zhengbin <zhengbin13@huawei.com> |
drm/amdgpu: use true, false for bool variable in amdgpu_debugfs.c Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:132:2-10: WARNING: Assignment of 0/1 to bool variable drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:140:2-10: WARNING: Assignment of 0/1 to bool variable drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:142:13-21: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a28fda31 |
|
05-Nov-2019 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Avoid accidental thread reactivation. Problem: During GPU reset we call the GPU scheduler to suspend it's thread, those two functions in amdgpu also suspend and resume the sceduler for their needs but this can collide with GPU reset in progress and accidently restart a suspended thread before time. Fix: Serialize with GPU reset. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
44b582b3 |
|
21-Oct-2019 |
Geert Uytterhoeven <geert+renesas@glider.be> |
drm/amdgpu: Remove superfluous void * cast in debugfs_create_file() call There is no need to cast a typed pointer to a void pointer when calling a function that accepts the latter. Remove it, as the cast prevents further compiler checks. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
20323246 |
|
03-Sep-2019 |
zhong jiang <zhongjiang@huawei.com> |
drm/amdgpu: remove the redundant null checks debugfs_remove and kfree has taken the null check in account. hence it is unnecessary to check it. Just remove the condition. No functional change. This issue was detected by using the Coccinelle software. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
929e571c |
|
27-Jul-2019 |
Wang Xiayang <xywang.sjtu@sjtu.edu.cn> |
drm/amdgpu: fix a potential information leaking bug Coccinelle reports a path that the array "data" is never initialized. The path skips the checks in the conditional branches when either of callback functions, read_wave_vgprs and read_wave_sgprs, is not registered. Later, the uninitialized "data" array is read in the while-loop below and passed to put_user(). Fix the path by allocating the array with kcalloc(). The patch is simplier than adding a fall-back branch that explicitly calls memset(data, 0, ...). Also it does not need the multiplication 1024*sizeof(*data) as the size parameter for memset() though there is no risk of integer overflow. Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1a2c29bc |
|
27-Jul-2019 |
Wang Xiayang <xywang.sjtu@sjtu.edu.cn> |
drm/amdgpu: fix a potential information leaking bug Coccinelle reports a path that the array "data" is never initialized. The path skips the checks in the conditional branches when either of callback functions, read_wave_vgprs and read_wave_sgprs, is not registered. Later, the uninitialized "data" array is read in the while-loop below and passed to put_user(). Fix the path by allocating the array with kcalloc(). The patch is simplier than adding a fall-back branch that explicitly calls memset(data, 0, ...). Also it does not need the multiplication 1024*sizeof(*data) as the size parameter for memset() though there is no risk of integer overflow. Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
88891430 |
|
16-Jul-2019 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Fix offset for vmid selection in debugfs interface The register debugfs interface was using the wrong bitmask for vmid selection for GFX_CNTL. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
0fa4246e |
|
12-Jul-2019 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Add VMID to SRBM debugfs bank selection Add 5 bits to the offset for SRBM selection to handle VMIDs. Also update the select_me_pipe_q() callback to also select VMID. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
80f8fb91 |
|
22-Jan-2019 |
Jack Xiao <Jack.Xiao@amd.com> |
drm/amdgpu: mark the partial job as preempted in mcbp unit test In mcbp unit test, the test should detect the preempted job which may be a partial execution ib and mark it as preempted; so that the gfx block can correctly generate PM4 frame. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6698a3d0 |
|
20-Jun-2019 |
Jack Xiao <Jack.Xiao@amd.com> |
drm/amdgpu: add mcbp unit test in debugfs (v3) The MCBP unit test is used to test the functionality of MCBP. It emualtes to send preemption request and resubmit the unfinished jobs. v2: squash in fixes (Alex) v3: squash in memory leak fix (Jack) Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
fdf2f6c5 |
|
09-Jun-2019 |
Sam Ravnborg <sam@ravnborg.org> |
drm/amd: drop use of drmP.h in amdgpu/amdgpu* Drop use of drmP.h in all files named amdgpu* in drm/amd/amdgpu/ Fix fallout. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
|
#
4a5a2de6 |
|
10-Jan-2019 |
Kevin Wang <Kevin1.Wang@amd.com> |
drm/amd/powerplay: implement sysfs of amdgpu_get_busy_percent for smu11 add interface amdgpu_get_busy_percent for smu11 v2: convert data pointer type to uint32_t *. Signed-off-by: Kevin Wang <Kevin1.Wang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7a5e0d9a |
|
11-Feb-2019 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: don't clamp debugfs register access to the BAR size This prevents us from accessing extended registers in tools like umr. The register access functions already check if the offset is beyond the BAR size and use the indirect accessors with locking so this is safe. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d344b21b |
|
14-Jun-2018 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/amd/amdgpu: Fix debugfs error handling The error handling is wrong and "ent" could be NULL we when dereference it to get "ent->d_inode". The thing is that normally debugfs_create_file() is not supposed to require (or have) any error handling. That function does return error pointers if debugfs is turned off but we know it's enable here. When it's enabled, then it returns NULL on error. So what I did was I stripped out all the error handling except around the i_size_write(). I could have just used a NULL check instead of an IS_ERR_OR_NULL() but I figured this was more clear because that way you don't have to look at the surrounding code to see whether debugfs is enabled or not. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7e4237db |
|
02-May-2018 |
Tom St Denis <tom.stdenis@amd.com> |
drm/amd/amdgpu: Add some documentation to the debugfs entries Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
b13aa109 |
|
26-Mar-2018 |
Rex Zhu <Rex.Zhu@amd.com> |
drm/amdgpu: Use dpm_enabled as dpm state flag driver will set dpm_enabled to true only when module parameter amdgpu_dpm not equal to 0 and smu hw initialize successfully. Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
f7a9ee81 |
|
29-Mar-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Add support for SRBM selection v3 Also remove code duplication in write and read regs functions. This also fixes potential missing unlock in amdgpu_debugfs_regs_write in case get_user would fail. v2: Add SRBM mutex locking. v3: Fix TO counter and fix comment location. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
87e90c76 |
|
19-Feb-2018 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: add amdgpu_evict_gtt debugfs entry Allow evicting all BOs from the GTT domain. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexdeucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
75758255 |
|
14-Dec-2017 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu: move debugfs functions to their own file amdgpu_device.c was getting pretty cluttered. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|