0cac183b | 29-Feb-2024 | Jonathan Kim <Jonathan.Kim@amd.com>

    drm/amdkfd: range check cp bad op exception interrupts

    Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well.

    Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
    Tested-by: Jesse Zhang <jesse.zhang@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
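A minimal C sketch of the kind of range check this commit describes. The bounds `DEMO_EC_PACKET_FIRST`/`DEMO_EC_PACKET_LAST` and the "bit (code - 1) per code" mask convention are assumptions for illustration, not the driver's or the user API's actual definitions. It also shows one reason the check matters: an unchecked garbage code used as a shift count could exceed 63, which is undefined behaviour in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds for the exception codes a CP "bad op" interrupt may
 * legitimately carry; the real values are defined by the KFD user API. */
#define DEMO_EC_PACKET_FIRST 1u
#define DEMO_EC_PACKET_LAST  8u

/* True only for codes inside the valid window. Anything else is treated as
 * garbage from the CP interrupt bug and is not forwarded. */
static bool demo_ec_is_valid(uint32_t code)
{
	return code >= DEMO_EC_PACKET_FIRST && code <= DEMO_EC_PACKET_LAST;
}

/* Map a code to a one-bit mask, assuming bit (code - 1) per code.
 * Invalid codes yield 0, so the shift count never reaches 64. */
static uint64_t demo_ec_mask(uint32_t code)
{
	return demo_ec_is_valid(code) ? (1ULL << (code - 1)) : 0;
}
```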
ed1e1e42 | 23-Jan-2024 | YiPeng Chai <YiPeng.Chai@amd.com>

    drm/amdgpu: Support passing poison consumption ras block to SRIOV

    Support passing poison consumption ras blocks to SRIOV.

    Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
37fb8791 | 09-Aug-2023 | Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

    drm/amdkfd: ratelimited SQ interrupt messages

    No functional change. Use ratelimited version of pr_ to avoid overflowing of dmesg buffer

    Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
    Reviewed-by: Philip Yang <philip.yang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
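Rate limiting of this kind follows an interval/burst policy: at most `burst` messages are emitted per `interval`, and the rest are counted as suppressed. The following is a simplified user-space model of that policy, not the kernel's implementation; time is passed in explicitly so the behaviour is deterministic, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified interval/burst limiter; field names are illustrative. */
struct demo_ratelimit {
	uint64_t interval;     /* window length, in arbitrary ticks */
	unsigned int burst;    /* messages allowed per window */
	uint64_t begin;        /* start of the current window */
	unsigned int printed;  /* messages emitted in the current window */
	unsigned int missed;   /* messages suppressed in the current window */
};

/* Return true if a message may be printed at time 'now'. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rs, uint64_t now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a new one and reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;  /* over budget: drop the message */
	return false;
}
```

With `interval = 100` and `burst = 2`, a flood of SQ interrupts at ticks 0, 10 and 20 prints two messages and suppresses the third; the window reopens once 100 ticks have passed.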
da3a815c | 28-Aug-2023 | Alex Sierra <alex.sierra@amd.com>

    drm/amdkfd: use mask to get v9 interrupt sq data bits correctly

    Interrupt sq data bits were not taken properly from contextid0 and contextid1. Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead.

    Signed-off-by: Alex Sierra <alex.sierra@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
a1fe9e9f | 28-Aug-2023 | Alex Sierra <alex.sierra@amd.com>

    drm/amdkfd: use mask to get v9 interrupt sq data bits correctly

    Interrupt sq data bits were not taken properly from contextid0 and contextid1. Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead.

    Signed-off-by: Alex Sierra <alex.sierra@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
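A sketch of why a mask-based macro helps when a data word is split across two context IDs: each field is shifted into place and then masked, so unrelated bits in either context word cannot leak into the result. The bit layout below is a hypothetical stand-in chosen for illustration; the real `KFD_CONTEXT_ID_GET_SQ_INT_DATA` definition in the driver is authoritative.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 *   ctx0[11:0]  -> data[11:0]
 *   ctx0[31:28] -> data[15:12]
 *   ctx1[7:0]   -> data[23:16]
 * Every term is masked after shifting, so stray bits are discarded. */
#define DEMO_GET_SQ_INT_DATA(ctx0, ctx1)          \
	(((ctx0) & 0xfffu) |                      \
	 (((ctx0) >> 16) & 0xf000u) |             \
	 (((ctx1) << 16) & 0xff0000u))

static uint32_t demo_sq_int_data(uint32_t ctx0, uint32_t ctx1)
{
	return DEMO_GET_SQ_INT_DATA(ctx0, ctx1);
}
```

For example, `demo_sq_int_data(0xF0000ABC, 0x12)` assembles `0x12FABC`, while context words whose set bits all fall outside the three fields yield 0.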
d4300362 | 22-Jun-2023 | Mukul Joshi <mukul.joshi@amd.com>

    drm/amdkfd: Update interrupt handling for GFX 9.4.3

    For GFX 9.4.3, interrupt handling needs to be updated for:
    - Interrupt cookie will have a NodeId field. Each KFD node needs to check the NodeId before processing the interrupt.
    - For CPX mode, there are additional checks of client ID needed to process the interrupt.
    - Add NodeId to the process drain interrupt.

    Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
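The NodeId rule can be modelled as a simple ownership test: every node sees the interrupt, but only the node whose ID matches the cookie processes it. The structures below are hypothetical simplifications (real cookies carry more fields, and the extra CPX-mode client ID checks are omitted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified views of an IH cookie and a KFD node. */
struct demo_ih_cookie {
	uint32_t node_id;
};

struct demo_kfd_node {
	uint32_t node_id;
};

/* A node processes an interrupt only if the cookie is tagged with its ID. */
static bool demo_node_owns_interrupt(const struct demo_kfd_node *node,
				     const struct demo_ih_cookie *cookie)
{
	return cookie->node_id == node->node_id;
}

/* Return the index of the node that should handle 'cookie', or -1. */
static int demo_find_owner(const struct demo_kfd_node *nodes, int n,
			   const struct demo_ih_cookie *cookie)
{
	for (int i = 0; i < n; i++)
		if (demo_node_owns_interrupt(&nodes[i], cookie))
			return i;
	return -1;
}
```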
12fb1ad7 | 21-Apr-2022 | Jonathan Kim <jonathan.kim@amd.com>

    drm/amdkfd: update process interrupt handling for debug events

    The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts.

    If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation.

    The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status.

    The debugger may also not be attached nor subscribed to certain exceptions so forward them directly to the runtime.

    GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IVs from SQ interrupts are packed into a new contiguous format unlike GFX9. To make this clear, a separate interrupt handling code file was created.

    Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
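A toy model of the drain handshake described in this commit, assuming a single-producer ring. It simplifies the timestamped checkpoint to plain ring ordering: opening the drain pushes a marker IV behind everything already queued, and when the consumer reaches the marker, every older IV has been handled and the drain can close. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* DEMO_IV_DRAIN_MARKER stands in for the custom IV injected by the driver. */
enum { DEMO_IV_NORMAL = 1, DEMO_IV_DRAIN_MARKER = 2 };
#define DEMO_RING_SIZE 16

struct demo_ih_ring {
	int entries[DEMO_RING_SIZE];
	size_t head, tail;        /* consumer / producer positions */
	bool drain_open;
	unsigned int stale_seen;  /* IVs consumed while the drain was open */
};

static bool demo_ring_push(struct demo_ih_ring *r, int iv)
{
	size_t next = (r->tail + 1) % DEMO_RING_SIZE;

	if (next == r->head)
		return false;  /* ring full */
	r->entries[r->tail] = iv;
	r->tail = next;
	return true;
}

/* Open the drain and queue the marker behind all pending IVs. */
static bool demo_open_drain(struct demo_ih_ring *r)
{
	r->drain_open = true;
	return demo_ring_push(r, DEMO_IV_DRAIN_MARKER);
}

/* Consume one IV; returns false when the ring is empty. */
static bool demo_process_one(struct demo_ih_ring *r)
{
	if (r->head == r->tail)
		return false;
	int iv = r->entries[r->head];
	r->head = (r->head + 1) % DEMO_RING_SIZE;
	if (iv == DEMO_IV_DRAIN_MARKER)
		r->drain_open = false;  /* everything older is now handled */
	else if (r->drain_open)
		r->stale_seen++;        /* queued before the checkpoint */
	return true;
}
```

Queue creation in this model would simply wait until `drain_open` is false again.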
c2d2588c | 07-Apr-2022 | Jonathan Kim <jonathan.kim@amd.com>

    drm/amdkfd: add send exception operation

    Add a debug operation that allows the debugger to send an exception directly to runtime through a payload address.

    For memory violations, normal vmfault signals will be applied to notify runtime instead after passing in the saved exception data when a memory violation was raised to the debugger.

    For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch.

    Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8dc1db31 | 14-Sep-2022 | Mukul Joshi <mukul.joshi@amd.com>

    drm/amdkfd: Introduce kfd_node struct (v5)

    Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-allocator etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc.

    This is the first step in adding compute partition support in KFD.

    v2: introduce kfd_node struct to gc v11 (Hawking)
    v3: make reference to kfd_dev struct through kfd_node (Morris)
    v4: use kfd_node instead for kfd isr/mqd functions (Morris)
    v5: rebase (Alex)

    Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
    Tested-by: Amber Lin <Amber.Lin@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
cc009e61 | 26-Apr-2022 | Mukul Joshi <mukul.joshi@amd.com>

    drm/amdkfd: Add KFD support for soc21 v3

    Add initial support for soc21 in KFD compute driver (Mukul)
    - Add new definition for soc21 device.
    - Add new file for amdgpu-kfd interface for GFX11 family.
    - Add new file for queue management, interrupt handling, mqd management for GFX11 family in KFD driver.
    - Related changes/updates for soc21 device in KFD driver.
    - Repurpose last 2 entries of SDMA MQD for driver use.

    v2: Add an optional argument into update queue operation (Mukul)
    v3: Switch to ip version check, replace kgd_dev with amdgpu_device (Hawking)

    Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
    Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
c3eb12df | 07-Apr-2022 | Felix Kuehling <Felix.Kuehling@amd.com>

    drm/amdkfd: Ignore bogus signals from MEC efficiently

    MEC firmware sometimes sends signal interrupts without a valid context ID on end of pipe events that don't intend to signal any HSA signals. This triggers the slow path in kfd_signal_event_interrupt that scans the entire event page for signaled events. Detect these signals in the top half interrupt handler to stop processing them as early as possible.

    Because we now always treat event ID 0 as invalid, reserve that ID during process initialization.

    v2: Update firmware version checks to support more GPUs

    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Philip Yang <Philip.Yang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
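A minimal sketch of the two halves of this change, assuming the context ID carries the event ID directly: the top half drops signals whose ID is 0, and the per-process event-ID allocator never hands out 0, so dropping those signals cannot lose a real event. The firmware-version gating mentioned in v2 is omitted, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RESERVED_EVENT_ID 0u

/* Top-half filter: an ID of 0 cannot name a real event, so skip the
 * bottom half (and its scan of the whole event page). */
static bool demo_signal_needs_bottom_half(uint32_t context_id)
{
	return context_id != DEMO_RESERVED_EVENT_ID;
}

/* Hypothetical per-process event-ID allocator that skips the reserved ID. */
struct demo_event_ids {
	uint32_t next;
};

static uint32_t demo_alloc_event_id(struct demo_event_ids *ids)
{
	if (ids->next == DEMO_RESERVED_EVENT_ID)
		ids->next++;  /* ID 0 is reserved at process initialization */
	return ids->next++;
}
```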
ed94aca6 | 21-Mar-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: print unmap queue status for RAS poison consumption (v3)

    Print the status out when it passes, and also tell user gpu reset is triggered when we fall back to legacy way.

    v2: make the message more explicit.
    v3: change succeeds to succeeded. replace pr_warn with dev_warn.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1990e29b | 16-Mar-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2)

    Do RAS page retirement and use gpu reset as fallback in UTCL2 fault handler.

    v2: replace vm fault event with poison consumed event in UTCL2 poison consumption.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
9d8a8d78 | 15-Mar-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: replace source_id with client_id for RAS poison consumption

    Client ID is more accurate here and we can deal with more different cases with client ID.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
eed41975 | 15-Mar-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: refine event_interrupt_poison_consumption

    Combine reading and setting poison flag as one atomic operation and add print message for the function.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
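Combining the read and the set of the poison flag into one atomic operation means that if several poison interrupts arrive for the same process, exactly one of them wins and runs the recovery path. The kernel provides primitives such as `atomic_cmpxchg()` for this kind of test-and-set; the sketch below expresses the same idea with C11 `atomic_exchange`, and the `demo_process` structure is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process state. */
struct demo_process {
	atomic_int poison;  /* 0 = clean, 1 = poison already being handled */
};

/* Returns true only for the first caller. A separate load followed by a
 * store would let two interrupts both see 0 and both start recovery. */
static bool demo_claim_poison(struct demo_process *p)
{
	return atomic_exchange(&p->poison, 1) == 0;
}
```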
29b440d2 | 16-Feb-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: add return value check for queue eviction

    Otherwise gpu reset will be triggered unconditionally in poison consumption.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
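The decision this commit restores can be reduced to a single branch on the eviction result: a GPU reset is the fallback only when queue eviction fails. A minimal sketch, assuming the kernel convention of 0 on success and a negative errno on failure; the enum and function names are hypothetical.

```c
/* Hypothetical recovery decision after poison consumption. */
enum demo_action {
	DEMO_ACTION_NONE,       /* queues evicted; no reset needed */
	DEMO_ACTION_GPU_RESET,  /* eviction failed; fall back to reset */
};

/* evict_ret: 0 on success, negative errno on failure. Without this
 * check, the reset path would run on every poison consumption. */
static enum demo_action demo_after_eviction(int evict_ret)
{
	return evict_ret == 0 ? DEMO_ACTION_NONE : DEMO_ACTION_GPU_RESET;
}
```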
2243f493 | 10-Feb-2022 | Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

    drm/amdkfd: Fix leftover errors and warnings

    A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warnings remain which may be false positives so ignore them for now.

    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
d87f36a0 | 10-Feb-2022 | Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

    drm/amdkfd: update SPDX license header

    Update the SPDX License header for all the KFD files.

    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
b1c87b08 | 06-Feb-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: use unmap all queues for poison consumption

    Replace reset queue for specific PASID with unmap all queues, reset queue could break CP scheduler.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
03e5b167 | 06-Feb-2022 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid

    As the function is used in more different cases, use a more general name.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5b0ce2d4 | 29-Dec-2021 | yipechai <YiPeng.Chai@amd.com>

    drm/amdkfd: enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9

    Enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9.

    Signed-off-by: yipechai <YiPeng.Chai@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
b6485bed | 06-Dec-2021 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: reset queue which consumes RAS poison (v2)

    CP supports unmap queue with reset mode which only destroys specific queue without affecting others. Replacing whole gpu reset with reset queue mode for RAS poison consumption saves much time, and we can also fallback to gpu reset solution if reset queue fails.

    v2: Return directly if process is NULL;
        Reset queue solution is not applicable to SDMA, fallback to legacy way;
        Call kfd_unref_process after lookup process.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
f0dc99a6 | 17-Nov-2021 | Graham Sider <Graham.Sider@amd.com>

    drm/amdkfd: add kfd_device_info_init function

    Initializes kfd->device_info given either asic_type (enum) if GFX version is less than GFX9, or GC IP version if greater. Also takes in vf and the target compiler gfx version. Uses SDMA version to determine num_sdma_queues_per_engine.

    Convert device_info to a non-pointer member of kfd, change references accordingly.

    Change unsupported asic condition to only probe f2g, move device_info initialization post-switch.

    Signed-off-by: Graham Sider <Graham.Sider@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6bfc7c7e | 19-Oct-2021 | Graham Sider <Graham.Sider@amd.com>

    drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs

    Modified definitions:
    - amdgpu_amdkfd_submit_ib
    - amdgpu_amdkfd_set_compute_idle
    - amdgpu_amdkfd_have_atomics_support
    - amdgpu_amdkfd_flush_gpu_tlb_pasid
    - amdgpu_amdkfd_flush_gpu_tlb_pasid
    - amdgpu_amdkfd_gpu_reset
    - amdgpu_amdkfd_alloc_gtt_mem
    - amdgpu_amdkfd_free_gtt_mem
    - amdgpu_amdkfd_alloc_gws
    - amdgpu_amdkfd_free_gws
    - amdgpu_amdkfd_ras_poison_consumption_handler

    Signed-off-by: Graham Sider <Graham.Sider@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
c7490949 | 23-Sep-2021 | Tao Zhou <tao.zhou1@amd.com>

    amd/amdkfd: add ras page retirement handling for sq/sdma (v3)

    In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data.

    v2: rename ras_process_cb to ras_poison_consumption_handler.
        move the handler's implementation from ASIC specific file to common file.
    v3: call gpu reset for xGMI connected mode.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4a1d4b6d | 03-Jun-2021 | Hawking Zhang <Hawking.Zhang@amd.com>

    drm/amdkfd: add sdma poison consumption handling

    Follow the same approach as GFX to handle SDMA poison consumption. Send SIGBUS to application when receives SDMA_ECC interrupt and issue gpu reset either mode 2 or mode 1 to get the engine back

    Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: Dennis Li <dennis.li@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
e2b1f9f5 | 11-May-2021 | Dennis Li <Dennis.Li@amd.com>

    drm/amdkfd: refine the poison data consumption handling

    The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison data consumed. Beside that, some applications maybe register SIGBUS signal handler. These applications will handle poison data by themselves, exit or re-create context to re-dispatch works.

    Signed-off-by: Dennis Li <Dennis.Li@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
be9064b7 | 25-Apr-2021 | Hawking Zhang <Hawking.Zhang@amd.com>

    drm/amdgpu: remove unnecessary header include

    amdgpu.h is included in kfd_priv.h

    Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Reviewed-by: John Clements <John.Clements@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
20161e51 | 14-Apr-2021 | Dennis Li <Dennis.Li@amd.com>

    drm/amdkfd: add edc error interrupt handle for poison propogate mode

    In poison propagate mode, when driver receive the edc error interrupt from SQ, driver should kill the process by pasid which is using the poison data, and then trigger GPU reset.

    Signed-off-by: Dennis Li <Dennis.Li@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6d909c5d | 22-Jun-2020 | Oak Zeng <Oak.Zeng@amd.com>

    drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

    This is to keep wavefront context for debug purpose

    Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7af103ea | 05-Jan-2021 | Tao Zhou <tao.zhou1@amd.com>

    drm/amdkfd: check more client ids in interrupt handler

    Add check for SExSH clients in kfd interrupt handler.

    Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ae279f69 | 18-Dec-2020 | Alex Deucher <alexander.deucher@amd.com>

    drm/amdkfd: check both client id and src id in interrupt handlers

    We can have the same src ids for different client ids so make sure to check both the client id and the source id when handling interrupts.

    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
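Because a source ID is only meaningful relative to the client that raised it, a handler has to match the (client, source) pair. In the sketch below, two hypothetical clients reuse source ID 10 for unrelated events; checking the source ID alone would confuse them. The IDs are placeholders; the real driver compares against `SOC15_IH_CLIENTID_*` constants and per-IP source-ID definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client and source IDs, for illustration only. */
enum { DEMO_CLIENT_GRBM_CP = 1, DEMO_CLIENT_SDMA0 = 2 };
#define DEMO_SRC_CP_END_OF_PIPE 10u  /* meaning under the CP client */
#define DEMO_SRC_SDMA_TRAP      10u  /* same number, different client */

struct demo_iv {
	uint32_t client_id;
	uint32_t source_id;
};

/* Match both fields: the source ID alone is ambiguous. */
static bool demo_is_cp_end_of_pipe(const struct demo_iv *iv)
{
	return iv->client_id == DEMO_CLIENT_GRBM_CP &&
	       iv->source_id == DEMO_SRC_CP_END_OF_PIPE;
}

static bool demo_is_sdma_trap(const struct demo_iv *iv)
{
	return iv->client_id == DEMO_CLIENT_SDMA0 &&
	       iv->source_id == DEMO_SRC_SDMA_TRAP;
}
```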
938a0650 | 13-May-2020 | Amber Lin <Amber.Lin@amd.com>

    drm/amdkfd: Provide SMI events watch

    When the compute is malfunctioning or performance drops, the system admin will use SMI (System Management Interface) tool to monitor/diagnostic what went wrong. This patch provides an event watch interface for the user space to register devices and subscribe events they are interested. After registered, the user can use anonymous file descriptor's poll function with wait-time specified and wait for events to happen. Once an event happens, the user can use read() to retrieve information related to the event.

    VM fault event is done in this patch.

    v2: - remove UNREGISTER and add event ENABLE/DISABLE
        - correct kfifo usage
        - move event message API to kfd_ioctl.h
    v3: send the event msg in text than in binary
    v4: support multiple clients
    v5: move events enablement from ioctl to fd write
    v6: sparse fix

    Signed-off-by: Amber Lin <Amber.Lin@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
8c8e1f69 | 18-May-2020 | Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>

    drm/amdkfd: Fix boolreturn.cocci warnings

    Return statements in functions returning bool should use true/false instead of 1/0.

    drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10: WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool

    Generated by: scripts/coccinelle/misc/boolreturn.cocci

    Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3fe023d4 | 25-Sep-2019 | Yong Zhao <Yong.Zhao@amd.com>

    drm/amdkfd: Query vmid pasid mapping through stored info for non HWS

    Because we record the mapping under non HWS mode in the software, we can query pasid through vmid using the stored mapping instead of reading from ATC registers.

    This also prepares for the defeatured ATC block in future ASICs.

    Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
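A sketch of a software VMID-to-PASID table of the kind this commit relies on: in non-HWS mode the driver assigns VMIDs itself, so it can record each mapping and later resolve a PASID from a VMID without touching ATC registers. The same lookup can serve as a fallback when an interrupt payload arrives without a PASID (compare the non-HWS gfx9 workaround in a53a11a8). The table size and all names are assumptions.

```c
#include <stdint.h>

#define DEMO_NUM_VMIDS 16

/* Hypothetical software copy of the VMID -> PASID mapping. */
struct demo_vmid_map {
	uint32_t pasid[DEMO_NUM_VMIDS];  /* 0 means "no process mapped" */
};

/* Record the mapping when a VMID is assigned to a process. */
static void demo_map_vmid(struct demo_vmid_map *m, unsigned int vmid,
			  uint32_t pasid)
{
	if (vmid < DEMO_NUM_VMIDS)
		m->pasid[vmid] = pasid;
}

/* Prefer the PASID from the interrupt payload; if it is missing (0),
 * fall back to the stored mapping instead of reading registers. */
static uint32_t demo_resolve_pasid(const struct demo_vmid_map *m,
				   unsigned int vmid, uint32_t payload_pasid)
{
	if (payload_pasid != 0)
		return payload_pasid;
	return vmid < DEMO_NUM_VMIDS ? m->pasid[vmid] : 0;
}
```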
0ad8c5e2 | 08-Feb-2019 | Yong Zhao <Yong.Zhao@amd.com>

    drm/amdkfd: Support MMHUB1 in kfd interrupt path

    Handle interrupts for second mmhub.

    Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
a53a11a8 | 16-Oct-2018 | Yong Zhao <Yong.Zhao@amd.com>

    drm/amdkfd: Workaround PASID missing in gfx9 interrupt payload under non HWS

    This is a known gfx9 HW issue, and this change can perfectly workaround the issue.

    Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
00557f41 | 16-Oct-2018 | Yong Zhao <Yong.Zhao@amd.com>

    drm/amdkfd: Adjust the debug message in KFD ISR

    This makes debug message get printed even when there is early return.

    Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
58e69886 | 11-Jul-2018 | Lan Xiao <Lan.Xiao@amd.com>

    drm/amdkfd: fix zero reading of VMID and PASID for Hawaii

    Upon VM Fault, the VMID and PASID written by HW are zeros in Hawaii. Instead of reading from ih_ring_entry, read directly from the registers. This workaround fix the soft hang issues caused by mishandled VM Fault in Hawaii.

    Signed-off-by: Lan Xiao <Lan.Xiao@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2640c3fa | 11-Jul-2018 | shaoyunl <Shaoyun.Liu@amd.com>

    drm/amdkfd: Handle VM faults in KFD

    1. Pre-GFX9 the amdgpu ISR saves the vm-fault status and address per-vmid. amdkfd needs to get the information from amdgpu through the new get_vm_fault_info interface. On GFX9 and later, all the required information is in the IH ring
    2. amdkfd unmaps all queues from the faulting process and create new run-list without the guilty process
    3. amdkfd notifies the runtime of the vm fault trap via EVENT_TYPE_MEMORY

    Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
c129db12 | 01-May-2018 | Felix Kuehling <Felix.Kuehling@amd.com>

    drm/amdkfd: Add sanity checks in IRQ handlers

    Only accept interrupts from KFD VMIDs. Just checking for a PASID may not be enough because amdgpu started using PASIDs to map VM faults to processes.

    Warn if an IRQ doesn't have a valid PASID (indicating a firmware bug).

    Suggested-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
    Suggested-by: Oak Zeng <Oak.Zeng@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
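A minimal sketch of the two sanity checks: first reject interrupts whose VMID lies outside the range reserved for KFD, then warn about a KFD VMID that arrives without a PASID. The VMID range is passed in as a hypothetical configuration, and dropping the PASID-less interrupt after warning is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical VMID split; the real range comes from the device setup. */
struct demo_vmid_range {
	unsigned int first_kfd_vmid;
	unsigned int last_kfd_vmid;
};

static bool demo_accept_irq(const struct demo_vmid_range *r,
			    unsigned int vmid, uint32_t pasid)
{
	/* Not a KFD VMID: belongs to amdgpu, ignore it here. */
	if (vmid < r->first_kfd_vmid || vmid > r->last_kfd_vmid)
		return false;
	/* A KFD VMID without a PASID suggests a firmware bug. */
	if (pasid == 0) {
		fprintf(stderr, "demo: KFD VMID %u without PASID\n", vmid);
		return false;
	}
	return true;
}
```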
ca750681 | 10-Apr-2018 | Felix Kuehling <Felix.Kuehling@amd.com>

    drm/amdkfd: Add SOC15 interrupt processing support

    Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
    Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>