Cross Reference: /linux-master/drivers/gpu/drm/amd/amdgpu/mxgpu

History log of /linux-master/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
Revision	Date	Author	Comments
# 2b6b29f3	30-Sep-2023	Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	drm/amdgpu: Fix complex macros error Fixes the below: ERROR: Macros with complex values should be enclosed in parentheses WARNING: macros should not use a trailing semicolon +#define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter)); Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 8ede944d	29-Jul-2022	Tao Zhou <tao.zhou1@amd.com>	drm/amdgpu: add RAS poison consumption handler for AI SRIOV Send message to host and host will handle it. v2: split the patch into two parts, one is for mxgpu ai and another one is for common poison consumption handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 216a9873	29-Dec-2021	James Yao <yiqing.yao@amd.com>	drm/amdgpu: add dummy event6 for vega10 [why] Malicious mailbox event1 fails driver loading on vega10. A dummy event6 prevent driver from taking response from malicious event1 as its own. [how] On vega10, send a mailbox event6 before sending event1. Signed-off-by: James Yao <yiqing.yao@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 85a774d9	06-Dec-2021	Zhigang Luo <zhigang.luo@amd.com>	drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10s For the ASIC has big FB, it need more time to clear FB during reset. This change extended SRIOV VF waiting reset completion timeout from 5s to 10s. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Acked-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 64261a0d	27-Aug-2021	YuBiao Wang <YuBiao.Wang@amd.com>	drm/amd/amdgpu: Add ready_to_reset resp for vega10 Send response to host after received the flr notification from host. Port NV change to vega10. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 3aa883ac	25-Nov-2020	Jiange Zhao <Jiange.Zhao@amd.com>	drm/amdgpu/SRIOV: Extend VF reset request wait period In Virtualization case, when one VF is sending too many FLR requests, hypervisor would stop responding to this VF's request for a long period of time. This is called event guard. During this period of cooling time, guest driver should wait instead of doing other things. After this period of time, guest driver would resume reset process and return to normal. Currently, guest driver would wait 12 seconds and return fail if it doesn't get response from host. Solution: extend this waiting time in guest driver and poll response periodically. Poll happens every 6 seconds and it will last for 60 seconds. v2: change the max repetition times from number to macro. Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 312a79b6	21-Apr-2020	Monk Liu <Monk.Liu@amd.com>	drm/amdgpu: extent threshold of waiting FLR_COMPLETE to 5s to satisfy WHOLE GPU reset which need 3+ seconds to finish Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Yintian Tao <yttao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 4d130238	03-Mar-2020	Monk Liu <Monk.Liu@amd.com>	drm/amdgpu: cleanup idh event/req for NV headers 1) drop the headers from AI in mxgpu_nv.c, should refer to mxgpu_nv.h 2) the IDH_EVENT_MAX is not used and not aligned with host side so drop it 3) the IDH_TEXT_MESSAG was provided in host but not defined in guest Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c9ffa427	30-Oct-2019	Yintian Tao <yttao@amd.com>	drm/amd/powerplay: enable pp one vf mode for vega10 Originally, due to the restriction from PSP and SMU, VF has to send message to hypervisor driver to handle powerplay change which is complicated and redundant. Currently, SMU and PSP can support VF to directly handle powerplay change by itself. Therefore, the old code about the handshake between VF and PF to handle powerplay will be removed and VF will use new the registers below to handshake with SMU. mmMP1_SMN_C2PMSG_101: register to handle SMU message mmMP1_SMN_C2PMSG_102: register to handle SMU parameter mmMP1_SMN_C2PMSG_103: register to handle SMU response v2: remove module parameter pp_one_vf v3: fix the parens v4: forbid vf to change smu feature v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute v6: change skip condition at vega10_copy_table_to_smc Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# b6818520	30-Apr-2019	Trigger Huang <Trigger.Huang@amd.com>	drm/amdgpu: Add IDH_QUERY_ALIVE event for SR-IOV SR-IOV host side will send IDH_QUERY_ALIVE to guest VM to check if this guest VM is still alive (not destroyed). The only thing guest KMD need to do is to send ACK back to host. Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# bb5a2bdf	09-Apr-2019	Yintian Tao <yttao@amd.com>	drm/amdgpu: support dpm level modification under virtualization v3 Under vega10 virtualuzation, smu ip block will not be added. Therefore, we need add pp clk query and force dpm level function at amdgpu_virt_ops to support the feature. v2: add get_pp_clk existence check and use kzalloc to allocate buf v3: return -ENOMEM for allocation failure and correct the coding style Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 48527e52	14-Jan-2018	Monk Liu <Monk.Liu@amd.com>	drm/amdgpu: refactoring mailbox to fix TDR handshake bugs(v2) this patch actually refactor mailbox implmentations, and all below changes are needed together to fix all those mailbox handshake issues exposured by heavey TDR test. 1)refactor all mailbox functions based on byte accessing for mb_control reason is to avoid touching non-related bits when writing trn/rcv part of mailbox_control, this way some incorrect INTR sent to hypervisor side could be avoided, and it fixes couple handshake bug. 2)trans_msg function re-impled: put a invalid logic before transmitting message to make sure the ACK bit is in a clear status, otherwise there is chance that ACK asserted already before transmitting message and lead to fake ACK polling. (hypervisor side have some tricks to workaround ACK bit being corrupted by VF FLR which hase an side effects that may make guest side ACK bit asserted wrongly), and clear TRANS_MSG words after message transferred. 3)for mailbox_flr_work, it is also re-worked: it takes the mutex lock first if invoked, to block gpu recover's participate too early while hypervisor side is doing VF FLR. (hypervisor sends FLR_NOTIFY to guest before doing VF FLR and sentds FLR_COMPLETE after VF FLR done, and the FLR_NOTIFY will trigger interrupt to guest which lead to mailbox_flr_work being invoked) This can avoid the issue that mailbox trans msg being cleared by its VF FLR. 4)for mailbox_rcv_irq IRQ routine, it should only peek msg and schedule mailbox_flr_work, instead of ACK to hypervisor itself, because FLR_NOTIFY msg sent from hypervisor side doesn't need VF's ACK (this is because VF's ACK would lead to hypervisor clear its trans_valid/msg, and this would cause handshake bug if trans_valid/msg is cleared not due to correct VF ACK but from a wrong VF ACK like this "FLR_NOTIFY" one) This fixed handshake bug that sometimes GUEST always couldn't receive "READY_TO_ACCESS_GPU" msg from hypervisor. 5)seperate polling time limite accordingly: POLL ACK cost no more than 500ms POLL MSG cost no more than 12000ms POLL FLR finish cost no more than 500ms 6) we still need to set adev into in_gpu_reset mode after we received FLR_NOTIFY from host side, this can prevent innocent app wrongly succesed to open amdgpu dri device. FLR_NOFITY is received due to an IDLE hang detected from hypervisor side which indicating GPU is already die in this VF. v2: use MACRO as the offset of mailbox_control register don't test if NOTIFY_CMPL event in rcv_msg since it won't recieve that message anymore Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Pixel Ding <Pixel.Ding@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 6e132ca0	28-Jun-2017	Horace Chen <horace.chen@amd.com>	drm/amdgpu/sriov:increate mailbox polling timeout increase timeout to 12 seconds,because there may have multiple FLR waiting for done, the waiting time of events may be long, increase to 12s to reduce timeout failure. Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 89041940	23-Jun-2017	Gavin Wan <Gavin.Wan@amd.com>	drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox. This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# 8758cb6a	04-Apr-2017	Monk Liu <Monk.Liu@amd.com>	drm/amdgpu/vega10:timeout set to equal with VI Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# f98b617e	04-Apr-2017	Monk Liu <Monk.Liu@amd.com>	drm/amdgpu:implement the reset MB func for vega10 they are lack in the bringup stage, we need them for GPU reset feature. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
# c9c9de93	09-Mar-2017	Xiangliang Yu <Xiangliang.Yu@amd.com>	drm/amdgpu/virt: impl mailbox for ai Implement mailbox protocol for AI so that guest vf can communicate with GPU hypervisor. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian KÃ¶nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>