History log of /linux-master/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
Revision Date Author Comments
# a546a276 01-Dec-2023 Xiaogang Chen <xiaogang.chen@amd.com>

drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM

This patch implements partial migration/mapping for GPU/CPU page faults in SVM
according to the migration granularity (default 2MB). An svm range may now include
pages from both system RAM and the VRAM of one GPU. These changes are expected to
improve migration performance and reduce MMU callback and TLB flush workloads.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
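A minimal sketch of the granularity clamping the change above describes: the faulting page is rounded down to its migration-granularity block and the block is clipped to the range bounds. Helper and parameter names here are hypothetical, not the kfd_svm.c ones.

#include <linux/align.h>
#include <linux/minmax.h>

/* Clamp a faulting page to the containing migration-granularity block.
 * granularity is log2 of the block size in pages (9 -> 512 pages = 2MB).
 */
static void fault_block_bounds(unsigned long start, unsigned long last,
			       unsigned long fault_page, unsigned int granularity,
			       unsigned long *block_start, unsigned long *block_last)
{
	unsigned long npages = 1UL << granularity;

	*block_start = max(start, ALIGN_DOWN(fault_page, npages));
	*block_last  = min(last, ALIGN_DOWN(fault_page, npages) + npages - 1);
}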


# 541c341d 23-Oct-2023 Philip Yang <Philip.Yang@amd.com>

Revert "drm/amdkfd: Use partial migrations in GPU page faults"

This reverts commit dc427a473e5d119232ddb27530920d9796cdea70.

The change prevents migrating the entire range to VRAM because the retry
fault restore_pages path maps the remaining system memory range to GPUs. It
will work correctly when submitted together with the later partial
mapping to GPU patch.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# afaec204 23-Oct-2023 Philip Yang <Philip.Yang@amd.com>

Revert "drm/amdkfd:remove unused code"

This reverts commit f9caf6cdd5cc1f4006fd7b6b113658c0b0159f23.

Needed for the next revert patch.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f9caf6cd 19-Oct-2023 Jesse Zhang <jesse.zhang@amd.com>

drm/amdkfd:remove unused code

The function svm_range_split_by_granularity is not used,
so remove it.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# dc427a47 04-Oct-2023 Xiaogang Chen <xiaogang.chen@amd.com>

drm/amdkfd: Use partial migrations in GPU page faults

This patch implements partial migration in the GPU page fault handler according to
the migration granularity (default 2MB), and does not split the svm range in CPU
page fault handling. An svm range may now include pages from both system RAM and
the VRAM of one GPU. These changes are expected to improve migration performance
and reduce MMU callback and TLB flush workloads.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# eb3c357b 13-Sep-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Handle errors from svm validate and map

If a new range is split into multiple pranges with max_svm_range_pages
alignment and added to update_list, svm validate and map should keep
going after an error to make sure the prange->mapped_to_gpu flag is up to
date for the whole range.

svm validate and map sets prange->mapped_to_gpu after mapping to
GPUs successfully, otherwise it clears the prange->mapped_to_gpu flag (for
the update mapping case) instead of setting an error flag; we can remove
the redundant error flag to simplify the code.

Refactor to remove the goto and update the prange->mapped_to_gpu flag inside
svm_range_lock, to guarantee we always evict queues or unmap from GPUs if
there are invalid ranges.

After svm validate and map returns error -EAGAIN, the caller's retry will
update the mapping for the whole range again.

Fixes: c22b04407097 ("drm/amdkfd: flag added to handle errors from svm validate and map")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: James Zhu <james.zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c99b1612 01-Aug-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Remove svm range validated_once flag

The validated_once flag is no longer used after the prefault was removed. The
prefault was needed to ensure all system memory pages were validated at least
once before mapping or migrating the range to GPU.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# df954b69 15-Sep-2023 Xiaogang Chen <xiaogang.chen@amd.com>

drm/amdkfd: Separate dma unmap and free of dma address array operations

We do not need to free the DMA address array of an svm_range each time we
DMA-unmap the pages in the svm_range, as we can reuse the same array. Only free
it when the svm_range is freed. Separate these two operations and use them
accordingly.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
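A hedged sketch of the split described above: unmapping clears entries but keeps the array for reuse, and the array itself is only freed when the range goes away. Structure and function names are illustrative stand-ins for the svm_range code.

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/slab.h>

struct demo_range {
	dma_addr_t *dma_addr;		/* one entry per page in the range */
	unsigned long npages;
};

/* DMA-unmap the pages but keep the dma_addr array allocated for reuse. */
static void demo_range_dma_unmap(struct device *dev, struct demo_range *r)
{
	unsigned long i;

	for (i = 0; i < r->npages; i++) {
		if (!r->dma_addr[i])
			continue;
		dma_unmap_page(dev, r->dma_addr[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
		r->dma_addr[i] = 0;
	}
}

/* Free the array only when the range itself is destroyed. */
static void demo_range_dma_free(struct demo_range *r)
{
	kvfree(r->dma_addr);
	r->dma_addr = NULL;
}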


# ab3400eb 27-Jul-2023 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: avoid unmap dma address when svm_ranges are split

DMA address references within svm_ranges should be unmapped only after
the memory has been released from the system. In the case of range
splitting, the DMA address information should be copied to the
corresponding range after the split, leaving the DMA mapping intact.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c22b0440 29-May-2023 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: flag added to handle errors from svm validate and map

If a return error is raised during validation and mapping of a
prange, this flag is set. It is a rare occurrence, but it could happen
when `amdgpu_hmm_range_get_pages_done` returns true. In such cases,
the caller should retry. However, it is important to ensure that the
prange is updated correctly during the retry.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 84b4dd3f 31-Mar-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Refactor migrate init to support partition switch

Rename svm_migrate_init to the better name kgd2kfd_init_zone_device
because it sets up the zone device pgmap for page migration, and keep it in
kfd_migrate.c to access the static svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after the adev IP blocks are initialized,
but before amdgpu_amdkfd_device_init initializes the KFD nodes which enable
SVM support based on the pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init every time after
switching the compute partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 610dab11 31-Mar-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Move pgmap to amdgpu_kfd_dev structure

The VRAM pgmap resource is allocated every time compute partitions are
switched because kfd_dev is re-initialized by post_partition_switch.
As a result, this causes memory region resource leaks and unbalanced
system memory usage accounting.

The pgmap resource should be allocated and registered only once when loading
the driver and freed when unloading the driver, so move it from kfd_dev to
amdgpu_kfd_dev.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f4d8b6f5 31-Jan-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Enable SVM on Native mode

This patch enables SVM capability on GFX9.4.3 when
run in Native mode. It also sets best_prefetch and
best_restore locations to CPU as there is no VRAM.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f5fe7edf 30-Sep-2022 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Update interrupt handling for GFX9.4.3

Update interrupt handling in CPX mode for GFX9.4.3 by using the
VMID space instead of SDMA client id to determine if an interrupt
should be processed by a KFD node. This is especially needed for
handling retry faults from MMHUB.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5fb34bd9 24-May-2022 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: pass kfd_node ref to svm migration api

This work is required for GC 9.4.3, in preparation for supporting memory
partitions per node in SVM. When multiple partitions are configured,
every BO should be allocated inside one specific partition which
corresponds to the current amdgpu_device and kfd_node.

v2: squash in compilation fix (Alex)
v3: squash in fix for pre-gfx 9.4.3 (Alex)
v4: squash in best_loc fix (Alex)

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8a7c3ce1 07-Sep-2022 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Track unified memory when switching xnack mode

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
subsequently freed ranges allocated with xnack off will not unreserve
memory. When switching xnack mode from on to off, subsequently freed ranges
allocated with xnack on will unreserve memory. Both cases cause unbalanced
memory accounting.

When switching xnack mode from on to off, we need to reserve the already
allocated svm range memory. When switching xnack mode from off to on, we need
to unreserve the already allocated svm range memory.

v6: Take prange lock to access range child list
v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c969c5fd 26-Jul-2022 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Remove prefault before migrating to VRAM

Prefaulting potentially allocates system memory pages before a
migration. This adds unnecessary overhead. Instead we can skip
unallocated pages in the migration and just point migrate->dst to a
0-initialized VRAM page directly. Then the VRAM page will be inserted
into the PTE. A subsequent CPU page fault will migrate the page back to
system memory.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c0289557 08-Aug-2022 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Fix mm reference in SVM eviction worker

Use the mm reference from the fence. This allows removing the
svm_bo->svms pointer, which was problematic because we cannot assume
that the struct kfd_process containing the svms is still allocated
without holding a refcount on the process.

Use mmget_not_zero to ensure the mm is still valid, and drop the svm_bo
reference if it isn't.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
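A sketch of the mm-handling pattern this change describes: the eviction worker takes the mm saved from the fence and proceeds only if the process is still alive. Names other than mmget_not_zero()/mmput() are hypothetical.

#include <linux/sched/mm.h>
#include <linux/workqueue.h>

struct demo_svm_bo {
	struct work_struct eviction_work;
	struct mm_struct *eviction_mm;	/* mm taken from the eviction fence */
};

static void demo_svm_bo_evict_worker(struct work_struct *work)
{
	struct demo_svm_bo *svm_bo =
		container_of(work, struct demo_svm_bo, eviction_work);
	struct mm_struct *mm = svm_bo->eviction_mm;

	/* If the mm is already gone, drop the svm_bo reference and bail out. */
	if (!mmget_not_zero(mm))
		return;

	/* ... evict the pranges attached to this svm_bo under mm ... */

	mmput(mm);
}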


# 4959e609 25-Jul-2022 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Set svm range max pages

This will be used to split a giant svm range into smaller ranges, to
support VRAM overcommitment by giant ranges and improve GPU retry fault
recovery on giant ranges.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e0f1e65b 13-Jan-2022 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Add GPU recoverable fault SMI event

Use ktime_get_boottime_ns() as the timestamp to correlate with other
APIs. Output the timestamp when a GPU recoverable fault starts and ends
recovery, whether migration happened or only the GPU page table was updated
to recover, the fault address, and whether it was a read or write fault.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 6b9c63a6 18-Apr-2022 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Add SVM range mapped_to_gpu flag

Avoid unnecessarily unmapping an SVM range from GPUs if the range is not
mapped on GPUs when migrating the range. This flag will also be used to
flush TLBs when updating an existing mapping on GPUs.

It is protected by prange->migrate_mutex and mmap read lock in MMU
notifier callback.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
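An illustrative guard for the flag described above: skip the GPU unmap and TLB flush when nothing is mapped. Field and function names are stand-ins; the real check is done under prange->migrate_mutex.

#include <linux/lockdep.h>
#include <linux/mutex.h>

struct demo_prange {
	struct mutex migrate_mutex;
	bool mapped_to_gpu;
};

static void demo_unmap_from_gpus(struct demo_prange *prange)
{
	lockdep_assert_held(&prange->migrate_mutex);

	/* Nothing was ever mapped: no page table update or TLB flush needed. */
	if (!prange->mapped_to_gpu)
		return;

	/* ... unmap from GPU page tables and flush TLBs ... */
	prange->mapped_to_gpu = false;
}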


# 2a909ae7 08-Nov-2021 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: CRIU resume shared virtual memory ranges

In CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming kfd process during CRIU restore
phase. Also setup xnack mode and free up the resources.

KFD_IOCTL_SVM_ATTR_CLR_FLAGS is not available for querying via the get_attr
interface, but we must clear the flags during restore as there might be
some default flags set when the prange is created. Also handle the
invalid PREFETCH attribute values saved during checkpoint by replacing
them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute.

(rajneesh: Fixed the checkpatch reported problems)
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c2db32ce 08-Nov-2021 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: CRIU prepare for svm resume

During CRIU restore phase, the VMAs for the virtual address ranges are
not at their final location yet so in this stage, only cache the data
required to successfully resume the svm ranges during an imminent CRIU
resume phase.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9d5dabfe 03-Nov-2021 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: CRIU Save Shared Virtual Memory ranges

During the checkpoint stage, save the shared virtual memory ranges and
attributes for the target process. A process may contain a number of svm
ranges and each range might contain a number of attributes. While not
all attributes may be applicable for a given prange, during checkpoint
we store all possible values for the maximum possible attribute types.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 08a987a8 02-Nov-2021 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: CRIU Discover svm ranges

A KFD process may contain a number of virtual address ranges for shared
virtual memory management, and each such range can have many SVM
attributes spanning various nodes within the process boundary.
This change reports the total number of such SVM ranges and
their total private data size by extending the PROCESS_INFO op of the
CRIU IOCTL to discover the svm ranges in the target process; future
patches bring in the required support for checkpoint and restore of
SVM ranges.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b121862c 08-Dec-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Use prange->update_list head for remove_list

The remove_list head was only used for keeping track of existing ranges
that are to be removed from the svms->list. The update_list was used for
new or existing ranges that need updated attributes. These two cases are
mutually exclusive (i.e. the same range will never be on both lists).
Therefore we can use the update_list head to track the remove_list and
save another 16 bytes in the svm_range struct.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
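A small sketch of the list_head reuse described above: because a range is never on the update list and the remove list at the same time, a single embedded list_head can link into either one. demo_range stands in for struct svm_range.

#include <linux/list.h>

struct demo_range {
	struct list_head update_list;	/* doubles as the remove_list link */
};

static void demo_queue_for_removal(struct demo_range *r,
				   struct list_head *remove_list)
{
	/* The range must not be queued for an attribute update here. */
	list_add_tail(&r->update_list, remove_list);
}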


# ef3b4137 08-Dec-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Use prange->list head for insert_list

There are seven list_heads in struct svm_range: list, update_list,
remove_list, insert_list, svm_bo_list, deferred_list, child_list. This
patch and the next one remove two of them that are redundant.

The insert_list head was only used for new ranges that are not on the
svms->list yet. So we can use that list head for keeping track of
new ranges before they get added, and use list_move_tail to move them
to the svms->list when ready.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 69879b30 24-Nov-2021 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: fix svm_bo release invalid wait context warning

Add svm_range_bo_unref_async to schedule work that waits for the svm_bo
eviction work to finish and then frees the svm_bo. __do_munmap put_page
runs in atomic context, so call svm_range_bo_unref_async there to avoid the
invalid-wait-context warning. Other, non-atomic contexts call
svm_range_bo_unref.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
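A hedged sketch of the async-unref idea above: the final put from atomic context only schedules a worker, and the worker waits for the eviction work before freeing the object. All names are illustrative, not the kfd ones.

#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_svm_bo {
	struct kref kref;
	struct work_struct eviction_work;
	struct work_struct release_work;
};

static void demo_release_worker(struct work_struct *work)
{
	struct demo_svm_bo *bo =
		container_of(work, struct demo_svm_bo, release_work);

	flush_work(&bo->eviction_work);		/* wait for eviction to finish */
	kfree(bo);
}

static void demo_release(struct kref *kref)
{
	struct demo_svm_bo *bo = container_of(kref, struct demo_svm_bo, kref);

	schedule_work(&bo->release_work);	/* safe from atomic context */
}

/* Drop a reference without sleeping; may be called from atomic context. */
static void demo_svm_bo_unref_async(struct demo_svm_bo *bo)
{
	kref_put(&bo->kref, demo_release);
}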


# 6bdfc37b 17-Aug-2021 Yifan Zhang <yifan1.zhang@amd.com>

drm/amdkfd: export svm_range_list_lock_and_flush_work

Export svm_range_list_lock_and_flush_work so that other KFD parts are
able to synchronize with the svm_range_list.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ff891a2e 15-Aug-2021 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: check access permisson to restore retry fault

Check range access permission when restoring a GPU retry fault. If the GPU
retry fault is on an address that belongs to a VMA, and the VMA does not have
the read or write permission requested by the GPU, fail to restore the
address. The VM fault event is then passed back to user space.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7981ec65 09-Mar-2021 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: Maintain svm_bo reference in page->zone_device_data

Each zone-device page holds a reference to the SVM BO that manages its
backing storage. This is necessary to correctly hold on to the BO in
case zone_device pages are shared with a child-process.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1d5dbfe6 05-May-2021 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: classify and map mixed svm range pages in GPU

[Why]
svm ranges can have mixed pages from device or system memory.
A good example is when, after a prange has been allocated in VRAM, a
copy-on-write is triggered by a fork. This invalidates some pages
inside the prange, ending up with mixed pages.

[How]
Classify each page inside a prange based on its type, device or
system memory, during the DMA mapping call. If a page corresponds
to the VRAM domain, a flag is set in its dma_addr entry for each GPU.
Then, at GPU page table mapping time, all groups of contiguous pages of
the same type are mapped with their proper PTE flags.

v2:
Instead of using ttm_res to calculate VRAM pfns in the svm_range, this is now
done by setting the real VRAM physical address into the drm_addr array.
This makes VRAM management more flexible, plus it removes the need to have
a BO reference in the svm_range.

v3:
Remove mapping member from svm_range

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a010d98a 06-May-2021 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: set owner ref to svm range prefault

svm_range_prefault is called right before migrations to VRAM,
to make sure pages are resident in system memory before the migration.
With partial migrations, this reference is used by hmm range get pages
to avoid migrating pages that are already in the same VRAM domain.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 3a61dae8 04-May-2021 Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: device pgmap owner at the svm migrate init

GPUs in the same XGMI hive have direct access to all
members' VRAM. When mapping memory to a GPU, we don't need
hmm_range_fault to fault device-private pages in the same
hive back to the host. Identifying the page owner as the hive,
rather than the individual GPU, accomplishes this.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d4ebc200 21-Jun-2021 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: implement counters for vm fault and migration

Add a helper function to get the process device data structure from adev to
update counters.

Updates of the vm faults, page_in and page_out counters will not be executed
in parallel; use WRITE_ONCE to avoid any form of compiler optimization.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
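A sketch of the counter-update pattern mentioned above: updates never run in parallel, so a plain read-modify-write wrapped in WRITE_ONCE is enough to keep the compiler from tearing or optimizing away the store. The structure is a hypothetical stand-in for the per-process device data.

#include <linux/compiler.h>
#include <linux/types.h>

struct demo_pdd {
	u64 faults;
	u64 page_in;
	u64 page_out;
};

static void demo_account_fault(struct demo_pdd *pdd, u64 migrated_pages,
			       bool migrated_to_vram)
{
	WRITE_ONCE(pdd->faults, pdd->faults + 1);
	if (migrated_to_vram)
		WRITE_ONCE(pdd->page_in, pdd->page_in + migrated_pages);
	else
		WRITE_ONCE(pdd->page_out, pdd->page_out + migrated_pages);
}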


# 5a75ea56 10-Jun-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Disable SVM per GPU, not per process

When some GPUs don't support SVM, don't disable it for the entire process.
That would be inconsistent with the information the process got from the
topology, which indicates SVM support per GPU.

Instead disable SVM support only for the unsupported GPUs. This is done
by checking any per-device attributes against the bitmap of supported
GPUs. Also use the supported GPU bitmap to initialize access bitmaps for
new SVM address ranges.

Don't handle recoverable page faults from unsupported GPUs. (I don't
think there will be unsupported GPUs that can generate recoverable page
faults. But better safe than sorry.)

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4ab159d2 29-Mar-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Add CONFIG_HSA_AMD_SVM

Control whether to build SVM support into amdgpu with a Kconfig option.
This makes it easier to disable it in production kernels if this new
feature causes problems in production environments.

Use "depends on" instead of "select" for DEVICE_PRIVATE, as is
recommended for visible options.

Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 564d2b92 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: add svm range validate timestamp

With xnack on, add a validate timestamp in order to handle GPU VM faults
from multiple GPUs.

If a GPU retry fault needs to migrate the range to the best restore location,
use the range validate timestamp to record the system timestamp after the
range is restored and the GPU page table updated.

Because multiple pages of the same range generate multiple retry faults,
define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING as the time period during which
pending retry faults may still arrive after the page table update, to skip
duplicate retry faults for the same range.

If the difference between the system timestamp and the range's last validate
timestamp is bigger than AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING, the retry
fault is from another GPU, so continue to handle the retry fault recovery.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
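A sketch of the timestamp check described above, with an assumed value for the pending window (the real AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING value may differ): a retry fault that arrives within the window after the last validation is treated as a stale duplicate and skipped.

#include <linux/ktime.h>

#define DEMO_RETRY_FAULT_PENDING_NS	(2ULL * NSEC_PER_MSEC)	/* assumed window */

/* Returns true if the fault is a duplicate still pending from the last
 * page table update and can be skipped.
 */
static bool demo_retry_fault_is_stale(u64 range_validate_timestamp)
{
	u64 now = ktime_get_boottime_ns();

	return now - range_validate_timestamp < DEMO_RETRY_FAULT_PENDING_NS;
}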


# b41896e3 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: add svm_bo eviction mechanism support

The svm_bo eviction mechanism is different from that of regular BOs.
Every SVM_BO created contains one eviction fence and one
worker item for the eviction process.
SVM_BOs can be attached to one or more pranges.
For the SVM_BO eviction mechanism, TTM will start to call the
enable_signaling callback for every SVM_BO until VRAM space
is available.
Here, all the ttm_evict calls are synchronous; this guarantees
that each eviction has completed and the fence has signaled before
it returns.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
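A hedged sketch of the fence-plus-worker shape described above: when TTM needs the VRAM back it enables signaling on the eviction fence, which kicks the eviction worker, and the worker signals the fence once the attached pranges have been evicted. Names and the omitted dma_fence_ops boilerplate are illustrative.

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

struct demo_svm_bo {
	struct dma_fence eviction_fence;
	struct work_struct eviction_work;
};

/* .enable_signaling callback in the fence's dma_fence_ops. */
static bool demo_eviction_enable_signaling(struct dma_fence *f)
{
	struct demo_svm_bo *bo =
		container_of(f, struct demo_svm_bo, eviction_fence);

	schedule_work(&bo->eviction_work);
	return true;	/* the worker will signal the fence */
}

static void demo_eviction_worker(struct work_struct *work)
{
	struct demo_svm_bo *bo =
		container_of(work, struct demo_svm_bo, eviction_work);

	/* ... migrate every attached prange out of VRAM ... */

	dma_fence_signal(&bo->eviction_fence);
}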


# 2383f56b 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: page table restore through svm API

Page table restore implementation in the SVM API. This is called from
the fault handler in amdgpu_vm to update page tables through
the page fault retry IH.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 90d7d3ed 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: invalidate tables on page retry fault

GPU page tables are invalidated by unmapping the prange directly in
the MMU notifier, when page fault retry is enabled through the
amdgpu_noretry global parameter. Page table restore is
performed in the page fault handler.

If xnack is on, we update GPU mappings after migration to avoid
unnecessary GPUVM faults.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 48ff079b 16-Mar-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: HMM migrate vram to ram

If a CPU page fault happens, the HMM pgmap_ops callback migrate_to_ram starts
migrating memory from VRAM to RAM in the following steps:

1. migrate_vma_pages gets the VRAM pages and notifies HMM to invalidate the
pages; the HMM interval notifier callback evicts process queues
2. Allocate system memory pages
3. Use svm copy memory to migrate data from VRAM to RAM
4. migrate_vma_pages copies the page structures from VRAM pages to RAM pages
5. Return VM_FAULT_SIGBUS if migration failed, to notify the application
6. migrate_vma_finalize puts the VRAM pages; the page_free callback frees the
VRAM pages and VRAM nodes
7. Restore work waits until migration is finished, then updates the GPU page
table mapping to system memory and resumes process queues

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
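A skeleton of the CPU fault path using the core migrate_vma helpers named in the steps above. The device-side page allocation and copy (steps 2-3) are stubbed out, and the single-page window, error handling and pgmap owner are simplified assumptions rather than the kfd implementation.

#include <linux/migrate.h>
#include <linux/mm.h>

static vm_fault_t demo_migrate_to_ram(struct vm_fault *vmf)
{
	unsigned long src = 0, dst = 0;
	struct migrate_vma mig = {
		.vma	= vmf->vma,
		.start	= vmf->address & PAGE_MASK,
		.end	= (vmf->address & PAGE_MASK) + PAGE_SIZE,
		.src	= &src,
		.dst	= &dst,
		.flags	= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
	};

	if (migrate_vma_setup(&mig))		/* step 1: collect and invalidate */
		return VM_FAULT_SIGBUS;

	/* steps 2-3: allocate system pages, copy data out of VRAM, fill dst */

	migrate_vma_pages(&mig);		/* step 4: install new page metadata */
	migrate_vma_finalize(&mig);		/* step 6: release the VRAM pages */
	return 0;
}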


# 0b0e518d 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: HMM migrate ram to vram

Registering an svm range with the same address and size, but with
preferred_location changed from CPU to GPU or from GPU to CPU, triggers
migration of the svm range from RAM to VRAM or from VRAM to RAM.

If the svm range prefetch location is GPU with the
KFD_IOCTL_SVM_FLAG_HOST_ACCESS flag, validate the svm range on RAM first,
then migrate it from RAM to VRAM.

After migrating to VRAM is done, CPU access will cause a CPU page fault;
the page fault handler migrates it back to RAM and resumes CPU access.

Migration steps:

1. migrate_vma_pages gets the svm range RAM pages and notifies that the
interval is invalidated and unmapped from the CPU page table; the HMM
interval notifier callback evicts process queues
2. Allocate new pages in VRAM using TTM
3. Use svm copy memory to SDMA copy data from RAM to VRAM
4. migrate_vma_pages copies the RAM page structures to VRAM page structures
5. migrate_vma_finalize puts the RAM pages to free the RAM pages and memory
6. Restore work waits until migration is finished, then updates the GPU page
table mapping to the new VRAM pages and resumes process queues

If migrate_vma_setup failed to collect all RAM pages of the range, retry up
to 3 times until it succeeds to start the migration.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# e49fe404 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: validate vram svm range from TTM

If the svm range prefetch location is not zero, use TTM to allocate
amdgpu_bo VRAM nodes to validate the svm range, then map the VRAM nodes to
GPUs.

Use an offset to sub-allocate from the same amdgpu_bo to handle overlapping
VRAM ranges while adding a new range or unmapping a range.

svm_bo has a refcount to track the shared ranges. If all ranges sharing the
amdgpu_bo are migrated to RAM, the refcount becomes 0, the amdgpu_bo is
released, and the svm_bo of all ranges is set to NULL.

To migrate a range from RAM back to VRAM, allocate the same amdgpu_bo
with the previous offset if the range has an svm_bo.

If a prange is migrated to VRAM, no CPU mapping exists, so process exit will
not trigger an unmap callback for this prange to free the prange and svm_bo.
Free outstanding pranges from the svms list before the process is freed in
svm_range_list_fini.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8a7c184a 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: svm range eviction and restore

The HMM interval notifier callback notifies that the CPU page table will be
updated; stop process queues if the updated address belongs to an svm range
registered in the process svms object tree. Schedule restore work to
update the GPU page table using the new page addresses in the updated svm
range.

The restore worker flushes any deferred work to make sure it restores
an up-to-date svm_range_list.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f80fe9d3 24-Feb-2021 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: map svm range to GPUs

Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap
svm range system memory pages address to GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4683cfec 08-Apr-2020 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: deregister svm range

When the application explicitly calls unmap, or unmap happens from mmput when
the application exits, the driver will receive the MMU_NOTIFY_UNMAP event to
remove the svm range from the process svms object tree and list first, then
unmap from GPUs (in the following patch).

Split the svm ranges to handle partial unmapping of svm ranges. To
avoid deadlocks, updating MMU notifiers, range lists and interval trees
is done in a deferred worker. New child ranges are attached to their
parent range's child_list until the worker can update the
svm_range_list. svm_range_set_attr flushes deferred work and takes the
mmap_write_lock to guarantee that it has an up-to-date svm_range_list.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b1c46c7d 15-Feb-2020 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: validate svm range system memory

Use HMM to get system memory page addresses, which will be used to
map to GPUs or migrate to VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 42de677f 06-Feb-2020 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: register svm range

The svm range structure stores the range start address, size, attributes,
flags, prefetch location and a GPU bitmap which indicates which GPUs this
range maps to. The same virtual address is shared by the CPU and GPUs.

The process has an svm range list which uses both an interval tree and a list
to store all svm ranges registered by the process. The interval tree is used
by the GPU VM fault handler and the CPU page fault handler to get the svm
range structure for a specific address. The list is used to scan all ranges
in the eviction restore work.

No overlapping range intervals [start, last] exist in the svms object interval
tree. If the process registers a new range which overlaps an old range,
the old range is split into two ranges, depending on whether the overlap
happens at the head or tail part of the old range.

Apply the attributes preferred location, prefetch location, mapping flags and
migration granularity to the svm range, and store the mapping GPU index into
the bitmap.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
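A sketch of the interval-tree lookup this design enables: the fault handlers find the range covering a faulting page number with the generic interval tree API. demo_range stands in for struct svm_range, with the node keyed by page numbers.

#include <linux/container_of.h>
#include <linux/interval_tree.h>

struct demo_range {
	struct interval_tree_node it_node;	/* it_node.start/.last in pages */
};

static struct demo_range *demo_range_from_page(struct rb_root_cached *svms,
					       unsigned long page_nr)
{
	struct interval_tree_node *node;

	node = interval_tree_iter_first(svms, page_nr, page_nr);
	return node ? container_of(node, struct demo_range, it_node) : NULL;
}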