Cross Reference: /linux-master/drivers/gpu/drm/amd/amdgpu/amdgpu

History log of /linux-master/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
Revision	Date	Author	Comments
# 6fef2d4c	10-Apr-2024	xinhui pan <xinhui.pan@amd.com>	drm/amdgpu: validate the parameters of bo mapping operations more clearly Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: stable@vger.kernel.org Reported-by: Vlad Stolyarov <hexed@google.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c68cbbfd	15-Aug-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: cleanup conditional execution First of all calculating the number of dw to patch into a conditional execution is not something HW generation specific. This is just standard ring buffer calculations. While at it also reduce the BUG_ON() into WARN_ON(). Then instead of a random bit pattern use 0 as default value for the number of dw skipped, this way it's not mandatory any more to patch the conditional execution. And last make the address to check a parameter of the conditional execution instead of getting this from the ring. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# b8f67b9d	18-Jan-2024	Shashank Sharma <shashank.sharma@amd.com>	drm/amdgpu: change vm->task_info handling This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# feb13f52	28-Feb-2024	Jesse Zhang <Jesse.Zhang@amd.com>	Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven fix the issue: "amdgpu: Failed to create process VM object". [Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table. But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean. So they have a conflict, and clinfo always failed. v1: - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# b5387349	11-Jan-2024	YuanShang <YuanShang.Mao@amd.com>	drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest Submit command of wreg in GFX and COMPUTE ring to update RLC_SPM_MC_CNT in guest machine during runtime. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 5394fb2a	15-Jan-2024	Felix Kuehling <felix.kuehling@amd.com>	drm/amdgpu: Remove unnecessary NULL check A static checker pointed out, that bo_va->base.bo was already derefenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 50661eb1	03-Jan-2024	Felix Kuehling <felix.kuehling@amd.com>	drm/amdgpu: Auto-validate DMABuf imports in compute VMs DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dymanic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 87825c86	18-Dec-2023	ZhenGuo Yin <zhenguo.yin@amd.com>	drm/amdgpu: re-create idle bo's PTE during VM state machine reset Idle bo's PTE needs to be re-created when resetting VM state machine. Set idle bo's vm_bo as moved to mark it as invalid. Fixes: 55bf196f60df ("drm/amdgpu: reset VM when an error is detected") Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 699d3929	11-Nov-2023	Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	drm/amdgpu: Add function parameter 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 94e2dae0	24-Feb-2023	Felix Kuehling <Felix.Kuehling@amd.com>	drm/amdkfd: Move TLB flushing logic into amdgpu This will make it possible for amdgpu GEM ioctls to flush TLBs on compute VMs. This removes VMID-based TLB flushing and always uses PASID-based flushing. This still works because it scans the VMID-PASID mapping registers to find the right VMID. It's only slightly less efficient. This is not a production use case. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 4a0057af	18-Dec-2023	ZhenGuo Yin <zhenguo.yin@amd.com>	drm/amdgpu: re-create idle bo's PTE during VM state machine reset Idle bo's PTE needs to be re-created when resetting VM state machine. Set idle bo's vm_bo as moved to mark it as invalid. Fixes: 55bf196f60df ("drm/amdgpu: reset VM when an error is detected") Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 8473bfdc	31-Oct-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: fix error handling in amdgpu_vm_init When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
# 25650307	31-Oct-2023	Felix Kuehling <Felix.Kuehling@amd.com>	drm/amdgpu: Fix possible null pointer dereference mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
# 5a104cb9	16-Mar-2022	Felix Kuehling <Felix.Kuehling@amd.com>	drm/amdkfd: Improve amdgpu_vm_handle_moved Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. v2: rebase against drm_exec changes (Alex) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 142262a1	12-Oct-2023	David Francis <David.Francis@amd.com>	drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and MTYPE_UC for nonlocal memory. On NUMA systems, local memory gets the local mtype, set by an override callback. If EXT_COHERENT is set, memory will be set as MTYPE_UC by default, with local memory MTYPE_CC. Add an option in the override function for this case, and add a check to ensure it is not used on UNCACHED memory. V2: Combined APU and NUMA code into one patch V3: Fixed a potential nullptr in amdgpu_vm_bo_update Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# e6f85887	17-Oct-2023	Felix Kuehling <Felix.Kuehling@amd.com>	drm/amdgpu: Fix possible null pointer dereference abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 7d3f1d76	26-Jul-2023	Alex Deucher <alexander.deucher@amd.com>	drm/amdgpu: refine fault cache updates Don't update the fault cache if status is 0. In the multiple fault case, subsequent faults will return a 0 status which is useless for userspace and replaces the useful fault status, so only update if status is non-0. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 7a41ed8b	06-Oct-2020	Alex Deucher <alexander.deucher@amd.com>	drm/amdgpu: add new INFO ioctl query for the last GPU page fault Add a interface to query the last GPU page fault for the process. Useful for debugging context lost errors. v2: split vmhub representation between kernel and userspace v3: add locking when fetching fault info in INFO IOCTL Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 libdrm MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Cc: samuel.pitoiset@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 2e8ef6a5	06-Oct-2020	Alex Deucher <alexander.deucher@amd.com>	drm/amdgpu: add cached GPU fault structure to vm struct When we get a GPU page fault, cache the fault for later analysis. Cc: samuel.pitoiset@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 4e8303cf	11-Sep-2023	Lijo Lazar <lijo.lazar@amd.com>	drm/amdgpu: Use function for IP version check Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 887db1e4	11-Sep-2023	André Almeida <andrealmeid@igalia.com>	drm/amdgpu: Merge debug module parameters Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# eb3b214c	02-Sep-2023	Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	drm/amdgpu: Use min_t to replace min Use min_t to replace min, min_t is a bit fast because min use twice typeof. And using min_t is cleaner here since the min/max macros do a typecheck while min_t()/max_t() to an explicit type cast. Fixes the below checkpatch warning: WARNING: min() should probably be min_t() Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 51b79f33	17-Oct-2023	Felix Kuehling <Felix.Kuehling@amd.com>	drm/amdgpu: Fix possible null pointer dereference abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 5003ca63	13-Jul-2023	Guchun Chen <guchun.chen@amd.com>	drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create Recent code set xcp_id stored from file private data when opening device to amdgpu bo for accounting memory usage etc, but not all VMs are attached to this fpriv structure like the vm cases in amdgpu_mes_self_test, otherwise, KASAN will complain below out of bound access. And more importantly, VM code should not touch fpriv structure, so drop fpriv code handling from amdgpu_vm_pt. [ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069 [ 77.294146] Call Trace: [ 77.294178] <TASK> [ 77.294208] dump_stack_lvl+0x49/0x63 [ 77.294260] print_report+0x16f/0x4a6 [ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200 [ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.297556] kasan_report+0xb4/0x130 [ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.299202] __asan_load4+0x6f/0x90 [ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu] [ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu] [ 77.303721] ? preempt_count_sub+0x18/0xc0 [ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu] [ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu] [ 77.306683] ? kasan_set_track+0x25/0x30 [ 77.306737] ? kasan_save_alloc_info+0x1b/0x30 [ 77.306795] ? __kasan_kmalloc+0x87/0xa0 [ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu] v2: without specifying xcp partition for PD/PT bo, the xcp id is -1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686 Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 52b82609	12-Jul-2023	Luben Tuikov <luben.tuikov@amd.com>	drm/amdgpu: Rename to amdgpu_vm_tlb_seq_struct Rename struct amdgpu_vm_tlb_seq_cb {...} to struct amdgpu_vm_tlb_seq_struct {...}, so as to not conflict with documentation processing tools. Of course, C has no problem with this. Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/b5ebc891-ee63-1638-0377-7b512d34b823@infradead.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# ca6c1e21	16-Apr-2022	Christian König <christian.koenig@amd.com>	drm/amdgpu: use the new drm_exec object for CS v3 Use the new component here as well and remove the old handling. v2: drop dupplicate handling v3: fix memory leak pointed out by Tatsuyuki Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-7-christian.koenig@amd.com
# 8abc1eb2	18-Apr-2022	Christian König <christian.koenig@amd.com>	drm/amdkfd: switch over to using drm_exec v3 Avoids quite a bit of logic and kmalloc overhead. v2: fix multiple problems pointed out by Felix v3: two more nit picks from Felix fixed Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-4-christian.koenig@amd.com
# eb58ad14	30-Jun-2023	Xiaogang Chen <xiaogang.chen@amd.com>	drm/amdgpu: have bos for PDs/PTS cpu accessible when kfd uses cpu to update vm When kfd uses cpu to update vm iterates all current PDs/PTs bos, adds AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag and kmap them to kernel virtual address space before kfd updates the vm that was created by gfx. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# e77673d1	09-Jun-2023	Mukul Joshi <mukul.joshi@amd.com>	drm/amdgpu: Update invalid PTE flag setting Update the invalid PTE flag setting with TF enabled. This is to ensure, in addition to transitioning the retry fault to a no-retry fault, it also causes the wavefront to enter the trap handler. With the current setting, the fault only transitions to a no-retry fault. Additionally, have 2 sets of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 8ecee4cb	13-Jul-2023	Guchun Chen <guchun.chen@amd.com>	drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create Recent code set xcp_id stored from file private data when opening device to amdgpu bo for accounting memory usage etc, but not all VMs are attached to this fpriv structure like the vm cases in amdgpu_mes_self_test, otherwise, KASAN will complain below out of bound access. And more importantly, VM code should not touch fpriv structure, so drop fpriv code handling from amdgpu_vm_pt. [ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069 [ 77.294146] Call Trace: [ 77.294178] <TASK> [ 77.294208] dump_stack_lvl+0x49/0x63 [ 77.294260] print_report+0x16f/0x4a6 [ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200 [ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.297556] kasan_report+0xb4/0x130 [ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.299202] __asan_load4+0x6f/0x90 [ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu] [ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu] [ 77.303721] ? preempt_count_sub+0x18/0xc0 [ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu] [ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu] [ 77.306683] ? kasan_set_track+0x25/0x30 [ 77.306737] ? kasan_save_alloc_info+0x1b/0x30 [ 77.306795] ? __kasan_kmalloc+0x87/0xa0 [ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu] v2: without specifying xcp partition for PD/PT bo, the xcp id is -1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686 Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 1d7776cc	19-Jun-2023	Xiaogang Chen <xiaogang.chen@amd.com>	drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute Since we allow kfd and graphic operate on same GPU VM to have interoperation between them GPU VM may have been used by graphic vm operations before kfd turns a GPU VM into a compute VM. Remove vm clean checking at amdgpu_vm_make_compute. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# ea2c3c08	16-Jun-2023	Samuel Pitoiset <samuel.pitoiset@gmail.com>	drm/amdgpu: fix clearing mappings for BOs that are always valid in VM Per VM BOs must be marked as moved or otherwise their ranges are not updated on use which might be necessary when the replace operation splits mappings. This fixes random GPU hangs when replacing sparse mappings from the userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs are correctly handled there. Cc: stable@vger.kernel.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# f88e295e	19-Apr-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: add VM generation token Instead of using the VRAM lost counter add a 64bit token which indicates if a context or job is still valid to use. Should the VRAM be lost or the page tables need re-creation the token will change indicating that userspace needs to act and re-create the contexts and re-submit the work. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 55bf196f	18-Apr-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: reset VM when an error is detected When some problem with the updates of page tables is detected reset the state machine of the VM and re-create all page tables from scratch. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# e2ad8e2d	05-Jun-2023	Christian König <christian.koenig@amd.com>	drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memory We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 80e709ee	07-Jun-2023	Chong Li <chongli2@amd.com>	drm/amdgpu: add option params to enforce process isolation between graphics and compute enforce process isolation between graphics and compute via using the same reserved vmid. v2: remove params "struct amdgpu_vm *vm" from amdgpu_vmid_alloc_reserved and amdgpu_vmid_free_reserved. Signed-off-by: Chong Li <chongli2@amd.com> Reviewed-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 9f0bcf49	01-Jun-2023	Chia-I Wu <olvaffe@gmail.com>	amdgpu: validate offset_in_bo of drm_amdgpu_gem_va This is motivated by OOB access in amdgpu_vm_update_range when offset_in_bo+map_size overflows. v2: keep the validations in amdgpu_vm_bo_map v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map rather than to amdgpu_gem_va_ioctl Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# a2b30804	13-May-2023	Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	drm/amdgpu: Validate VM ioctl flags. None have been defined yet, so reject anybody setting any. Mesa sets it to 0 anyway. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
# f5fe7edf	30-Sep-2022	Mukul Joshi <mukul.joshi@amd.com>	drm/amdkfd: Update interrupt handling for GFX9.4.3 Update interrupt handling in CPX mode for GFX9.4.3 by using the VMID space instead of SDMA client id to determine if an interrupt should be processed by a KFD node. This is especially needed for handling retry faults from MMHUB. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 5fb34bd9	24-May-2022	Alex Sierra <alex.sierra@amd.com>	drm/amdkfd: pass kfd_node ref to svm migration api This work is required for GC 9.4.3, previous to support memory partitions per node at SVM. When multiple partition is configured, every BO should be allocated inside one specific partition which corresponds to the current amdgpu_device and kfd_node. v2: squash in compilation fix (Alex) v3: squash in fix for pre-gfx 9.4.3 (Alex) v4: squash in best_loc fix (Alex) Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>