History log of /linux-master/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
Revision Date Author Comments
# 0cac183b 29-Feb-2024 Jonathan Kim <Jonathan.Kim@amd.com>

drm/amdkfd: range check cp bad op exception interrupts

Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ed1e1e42 23-Jan-2024 YiPeng Chai <YiPeng.Chai@amd.com>

drm/amdgpu: Support passing poison consumption ras block to SRIOV

Support passing poison consumption ras blocks
to SRIOV.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 37fb8791 09-Aug-2023 Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

drm/amdkfd: ratelimited SQ interrupt messages

No functional change. Use ratelimited version of pr_ to avoid
overflowing of dmesg buffer

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 12fb1ad7 21-Apr-2022 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: update process interrupt handling for debug events

The debugger must be notified by any debugger subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint. Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.

The drain must also be on debug disable as SW interrupts may still
be processed. Drain at this time and clear all the exception status.

The debugger may also not be attached nor subscibed to certain
exceptions so forward them directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c. This is because the IV from SQ interrupts are
packed into a new continguous format unlike GFX9. To make this clear,
a separate interrupting handling code file was created.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 0df1106b 30-Mar-2023 Tom Rix <trix@redhat.com>

drm/amdkfd: remove unused sq_int_priv variable

clang with W=1 reports
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v11.c:282:38: error: variable
'sq_int_priv' set but not used [-Werror,-Wunused-but-set-variable]
uint8_t sq_int_enc, sq_int_errtype, sq_int_priv;
^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8dc1db31 14-Sep-2022 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Introduce kfd_node struct (v5)

Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.

This is the first step in adding compute partition support in KFD.

v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 15afe323 23-Sep-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: fix dropped interrupt in kfd_int_process_v11

Shader wave interrupts were getting dropped in event_interrupt_wq_v11
if the PRIV bit was set to 1. This would often lead to a hang. Until
debugger logic is upstreamed, expand comment to stop early return.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 664883dd 23-Sep-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: fix dropped interrupt in kfd_int_process_v11

Shader wave interrupts were getting dropped in event_interrupt_wq_v11
if the PRIV bit was set to 1. This would often lead to a hang. Until
debugger logic is upstreamed, expand comment to stop early return.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 3055e5d1 05-May-2022 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: Update event_interrupt_isr_v11 return

Add amdgpu_no_queue_eviction_on_vm_fault condition to
event_interrupt_isr_v11 return. If no queue eviction on vm fault
specified, function should return false for client/source ids specifying
vm fault.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 594a1d0f 05-May-2022 Yang Li <yang.lee@linux.alibaba.com>

drm/amdkfd: Return true/false (not 1/0) from bool functions

Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions. This fixes the following warnings from coccicheck:

./drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c:244:9-10: WARNING:
return of 0/1 in function 'event_interrupt_isr_v11' with return type
bool

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cc009e61 26-Apr-2022 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Add KFD support for soc21 v3

Add initial support for soc21 in KFD compute
driver (Mukul)
- Add new definition for soc21 device.
- Add new file for amdgpu-kfd interface for GFX11 family.
- Add new file for queue management, interrupt handling,
mqd management for GFX11 family in KFD driver.
- Related changes/updates for soc21 device in
KFD driver.
- Repurpose last 2 entries of SDMA MQD for driver use.

v2: Add an optional argument into update queue operation (Mukul)

v3: Switch to ip version check, replace kgd_dev with
amdgpu_device (Hawking)

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>