Cross Reference: /linux-master/drivers/gpu/drm/amd/amdgpu/amdgpu

Revision	Date	Author	Comments
# f88e295e	19-Apr-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: add VM generation token Instead of using the VRAM lost counter add a 64bit token which indicates if a context or job is still valid to use. Should the VRAM be lost or the page tables need re-creation the token will change indicating that userspace needs to act and re-create the contexts and re-submit the work. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# ac928705	09-Mar-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: add gfx shadow CS IOCTL support Add support for submitting the shadow update packet when submitting an IB. Needed for MCBP on GFX11. v2: update API for CSA (Alex) v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in amdgpu_cs_pass1() Only initialize shadow on first use (Alex) v4: Pass parameters rather than job to new ring callback (Alex) v5: squash in change to call SET_Q_PREEMPTION_MODE/COND_EXEC before RELEASE_MEM to complete the UMDs use of the shadow (Alex) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 5f3c40e9	25-Nov-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: cleanup SPM support a bit This should probably not access job->vm and also emit the SPM switch under the conditional execute. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 56b0989e	25-Nov-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: fix GDS/GWS/OA switch handling Bas pointed out that this isn't working as expected and could cause crashes. Fix the handling by storing the marker that a switch is needed inside the job instead. Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 1728baa7	29-Sep-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: use scheduler dependencies for CS Entirely remove the sync obj in the job. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-11-christian.koenig@amd.com
# 1b2d5eda	29-Sep-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: move explicit sync check into the CS This moves the memory allocation out of the critical code path. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-8-christian.koenig@amd.com
# f7d66fb2	28-Sep-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: cleanup scheduler job initialization v2 Init the DRM scheduler base class while allocating the job. This makes the whole handling much more cleaner. v2: fix coding style Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-7-christian.koenig@amd.com
# 68ce8b24	02-Mar-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: add gang submit backend v2 Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c2b08e7a	05-Sep-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: move entity selection and job init earlier during CS Initialize the entity for the CS and scheduler job much earlier. v2: fix job initialisation order and use correct scheduler instance Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 736ec9fa	01-Mar-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: move setting the job resources Move setting the job resources into amdgpu_job.c Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# f6a3f660	12-Jul-2022	Andrey Grodzovsky <andrey.grodzovsky@amd.com>	drm/amdgpu: Get rid of amdgpu_job->external_hw_fence This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/ Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 6103b2f2	01-Mar-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: properly embed the IBs into the job We now have standard macros for that. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# a190f8dc	21-Feb-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: header cleanup No function change, just move a bunch of definitions from amdgpu.h into separate header files. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c530b02f	12-May-2021	Jack Zhang <Jack.Zhang1@amd.com>	drm/amd/amdgpu embed hw_fence into amdgpu_job Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 0bb5d5b0	22-Apr-2020	Luben Tuikov <luben.tuikov@amd.com>	drm/amdgpu: Move to a per-IB secure flag (TMZ) Move from a per-CS secure flag (TMZ) to a per-IB secure flag. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# cb5fae14	08-Aug-2019	Huang Rui <ray.huang@amd.com>	drm/amdgpu: job is secure iff CS is secure (v5) Mark a job as secure, if and only if the command submission flag has the secure flag set. v2: fix the null job pointer while in vmid 0 submission. v3: Context --> Command submission. v4: filling cs parser with cs->in.flags v5: move the job secure flag setting out of amdgpu_cs_submit() Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c8e42d57	25-Mar-2020	xinhui pan <xinhui.pan@amd.com>	drm/amdgpu: implement more ib pools (v2) We have three ib pools, they are normal, VM, direct pools. Any jobs which schedule IBs without dependence on gpu scheduler should use DIRECT pool. Any jobs schedule direct VM update IBs should use VM pool. Any other jobs use NORMAL pool. v2: squash in coding style fix Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 971fe555	16-Dec-2019	Christian König <christian.koenig@amd.com>	drm/amdgpu: drop amdgpu_job.owner Entirely unused. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 7c6e68c7	13-Sep-2019	Andrey Grodzovsky <andrey.grodzovsky@amd.com>	drm/amdgpu: Avoid HW GPU reset for RAS. Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# d8780dc7	17-Jan-2019	Jack Xiao <Jack.Xiao@amd.com>	drm/amdgpu: add ib preemption status in amdgpu_job (v2) Add ib preemption status in amdgpu_job, so that ring level function can detect preemption and program for resuming it. v2: squash in fix to restore job->preamble_status back to status value (Jack) Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 34955e03	23-Oct-2018	Rex Zhu <Rex.Zhu@amd.com>	drm/amdgpu: Modify the argument of emit_ib interface use the point of struct amdgpu_job as the function argument instand of vmid, so the other members of struct amdgpu_job can be visit in emit_ib function. v2: add a wrapper for getting the VMID add the job before the ib on the parameter list. v3: refine the wrapper name Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# a1917b73	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: remove job->adev (v2) We can get that from the ring. v2: squash in "drm/amdgpu: always initialize job->base.sched" (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# ee913fd9	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: add amdgpu_job_submit_direct helper Make sure that we properly initialize at least the sched member. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 3320b8d2	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: remove job->ring We can easily get that from the scheduler. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 0e28b10f	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: remove ring parameter from amdgpu_job_submit We know the ring through the entity anyway. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# eb3961a5	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: remove fence context from the job Can be obtained directly from the fence as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 050d9d43	13-Jul-2018	Christian König <christian.koenig@amd.com>	drm/amdgpu: cleanup job header Move job related defines, structure and function declarations to amdgpu_job.h Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>