History log of /linux-master/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
Revision Date Author Comments
# 599f7c8b 02-Aug-2023 Ran Sun <sunran001@208suo.com>

drm/amdgpu: Clean up errors in mxgpu_vi.c

Fix the following errors reported by checkpatch:

ERROR: spaces required around that '-=' (ctx:WxV)

Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# b98a1648 12-Oct-2022 Victor Zhao <Victor.Zhao@amd.com>

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverts the AMDGPU_SKIP_MODE2_RESET flag, as it conflicts with
the original design of the reset handler. It will be redesigned.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# a340847b 12-Oct-2022 Victor Zhao <Victor.Zhao@amd.com>

Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverts the AMDGPU_SKIP_MODE2_RESET flag, as it conflicts with
the original design of the reset handler. It will be redesigned.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# dac6b808 27-Jul-2022 Victor Zhao <Victor.Zhao@amd.com>

drm/amdgpu: let mode2 reset fallback to default when failure

- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fallback to default reset method if failed

v2: move this part out from the asic specific part

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f1549c09 07-Jul-2022 Likun Gao <Likun.Gao@amd.com>

drm/amdgpu: support reset flag set for gpu reset

Move reset_context out of the GPU recover function so that it can be
configured for different reset purposes.
For a reset triggered through the gpu_recovery sysfs interface, force the
full reset method. Otherwise, try a soft reset by default if the ASIC
supports it; if the soft reset fails, fall back to a full reset.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cf727044 17-May-2022 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover

We removed the wrapper that queued the recover function
into the reset domain queue and that was using this name.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# cfbb6b00 21-Jan-2022 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Rework reset domain to be refcounted.

The reset domain now contains the register access semaphore,
so it needs to be present as long as each device
in a hive needs it, and it cannot be bound to the XGMI
hive life cycle.
Address this by making the reset domain refcounted and pointed to
by each member of the hive and by the hive itself.

v4:

Fix crash on boot with an XGMI hive by adding a type to reset_domain.
XGMI will only create a new reset_domain if the previous one was of the
single device type, meaning it's the first boot. Otherwise it will take a
refcount on the existing reset_domain from the amdgpu device.

Add a wrapper around reset_domain->refcount get/put
and a wrapper around send to reset wq (Lijo)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
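
As a rough illustration of the refcounting described in commit cfbb6b00
above, the following user-space C sketch shows a domain object that stays
alive until its last holder drops it. It is not the driver's code: the names
reset_domain_create, reset_domain_get and reset_domain_put and the enum
values are hypothetical, and kernel code would typically use struct kref
rather than C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical domain types; the commit only says a type was added. */
enum reset_domain_type { SINGLE_DEVICE, XGMI_HIVE };

struct reset_domain {
	atomic_int refcount;            /* number of current holders */
	enum reset_domain_type type;
};

/* Allocate a domain owned by one holder (refcount starts at 1). */
struct reset_domain *reset_domain_create(enum reset_domain_type type)
{
	struct reset_domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcount, 1);
	d->type = type;
	return d;
}

/* Take an additional reference, e.g. for another device in the hive. */
void reset_domain_get(struct reset_domain *d)
{
	assert(d != NULL);
	atomic_fetch_add(&d->refcount, 1);
}

/* Drop a reference; frees the domain and returns 1 when it was the last. */
int reset_domain_put(struct reset_domain *d)
{
	assert(d != NULL);
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		free(d);
		return 1;
	}
	return 0;
}
```

In this sketch the hive and each member device would each hold one
reference, so the domain outlives whichever of them goes away first.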


# 02599bc7 20-Dec-2021 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

No need to trigger another work queue inside the work queue.

v3:

Problem:
Extra reset caused by a host side FLR notification
following a guest side triggered reset.
Fix: Prevent queuing flr_work from the mailbox IRQ if the guest
is already executing a reset.

Suggested-by: Liu Shaoyun <Shaoyun.Liu@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Liu Shaoyun <Shaoyun.Liu@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74114.html


# 1ec5a443 27-Jan-2022 tangmeng <tangmeng@uniontech.com>

drm/amd/amdgpu: fix spelling mistake "disbale" -> "disable"

There is a spelling mistake. Fix it.

Signed-off-by: tangmeng <tangmeng@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c9d66b36 18-Apr-2019 Colin Ian King <colin.king@canonical.com>

drm/amd/amdgpu: fix spelling mistake "recieve" -> "receive"

There is a spelling mistake in a pr_err message. Fix it.

Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1894687b 21-Nov-2018 Brajeswar Ghosh <brajeswar.linux@gmail.com>

drm/amd/amdgpu: Remove duplicate header

Remove gca/gfx_8_0_sh_mask.h which is included more than once

Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 1ffdeca6 17-Sep-2018 Christian König <christian.koenig@amd.com>

drm/amdgpu: move more defines into amdgpu_irq.h

Everything that isn't related to the IH ring.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 12938fad 21-Aug-2018 Christian König <christian.koenig@amd.com>

drm/amdgpu: cleanup GPU recovery check a bit (v2)

Check if we should call the function instead of providing the forced
flag.

v2: rebase on KFD changes (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5f152b5e 15-Dec-2017 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename amdgpu_gpu_recover

add device to the name for consistency.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 9c3f2b54 14-Dec-2017 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename amdgpu_program_register_sequence

add device for consistency with other functions in this file.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 8854695a 13-Dec-2017 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Simplify amdgpu_lockup_timeout usage.

With the introduction of amdgpu_gpu_recovery we no longer need
to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# dcebf026 12-Dec-2017 Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/amdgpu: Add gpu_recovery parameter

Add new parameter to control GPU recovery procedure.

v2:
Add auto logic where reset is disabled for bare metal and enabled
for SR-IOV.
Allow forced reset from debugfs.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# c47b41a7 03-Nov-2017 Christian König <christian.koenig@amd.com>

drm/amdgpu: remove nonsense const u32 cast on ARRAY_SIZE result

Not sure what that was originally good for, but it doesn't seem
to make any sense any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# f4711033 29-Oct-2017 pding <Pixel.Ding@amd.com>

drm/amdgpu: return error when sriov access requests get timeout

Reported-by: Sun Gary <Gary.Sun@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 5740682e 25-Oct-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:implement new GPU recover(v3)

1, the new implementation is named amdgpu_gpu_recover, which gives a better
hint of what it does than gpu_reset

2, gpu_recover unifies bare-metal and SR-IOV; only the ASIC reset
part is implemented differently

3, gpu_recover increases the karma of the hung job and marks its
entity/context as guilty if the karma exceeds the limit

V2:

4, in the scheduler main routine, a job from a guilty context is immediately
fake-signaled after it is popped from the queue, and its fence is set with
the "-ECANCELED" error

5, in the scheduler recovery routine, all jobs from the guilty entity are
dropped

6, in the run_job() routine, the real IB submission is skipped if the @skip
parameter is true or VRAM was lost.

V3:

7, replace the deprecated gpu reset with the new gpu recover

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
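
The karma and guilty-context rules listed in commit 5740682e above can be
sketched in plain C as follows. This is a minimal user-space illustration
under stated assumptions, not the scheduler's implementation: the struct
layouts and the function names job_increase_karma and job_submit are
hypothetical, and the hang limit is passed in as a parameter for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler's objects. */
struct sched_context {
	int guilty;                 /* set once a job exceeded the limit */
};

struct sched_job {
	int karma;                  /* number of hangs attributed to the job */
	struct sched_context *ctx;  /* context the job was submitted from */
};

/* Item 3: bump the job's karma; mark its context guilty past the limit. */
void job_increase_karma(struct sched_job *job, int hang_limit)
{
	assert(job != NULL && job->ctx != NULL);
	job->karma++;
	if (job->karma > hang_limit)
		job->ctx->guilty = 1;
}

/*
 * Item 4: a job from a guilty context is not really submitted; it
 * completes at once with -ECANCELED. Returns 0 for a normal submission.
 */
int job_submit(const struct sched_job *job)
{
	assert(job != NULL && job->ctx != NULL);
	if (job->ctx->guilty)
		return -ECANCELED;
	return 0;
}
```

With a limit of 1, for example, the first hang leaves the context usable and
the second one marks it guilty, after which job_submit reports -ECANCELED.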


# b5914238 23-Oct-2017 pding <Pixel.Ding@amd.com>

drm/amdgpu/virt: implement wait_reset callbacks for vi/ai

Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 89041940 23-Jun-2017 Gavin Wan <Gavin.Wan@amd.com>

drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.

This feature works in an SRIOV environment. In a non-SRIOV environment, the
trans_error function does nothing.

The error information includes error_code (16bit), error_flags (16bit)
and error_data (64bit). Since there are not many errors, we keep the
errors in an array and transfer all errors to the host before the amdgpu
initialization function (amdgpu_device_init) exits.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
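
To make the record layout in commit 89041940 above concrete, here is a small
user-space C sketch of an error array that is filled during initialization
and flushed once at the end. Only the three field widths come from the commit
message; the array capacity, the struct and function names, and the callback
used in place of the real mailbox transfer are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF_ERROR_MAX 16  /* hypothetical capacity, not from the commit */

/* One error record: field widths as stated in the commit message. */
struct vf_error_record {
	uint16_t error_code;
	uint16_t error_flags;
	uint64_t error_data;
};

struct vf_error_queue {
	struct vf_error_record rec[VF_ERROR_MAX];
	size_t count;
};

/* Callback standing in for the mailbox transfer to the host. */
typedef void (*vf_error_send_fn)(const struct vf_error_record *rec, void *ctx);

/* Remember an error; returns 0 on success or -1 if the array is full. */
int vf_error_put(struct vf_error_queue *q, uint16_t code, uint16_t flags,
		 uint64_t data)
{
	assert(q != NULL);
	if (q->count >= VF_ERROR_MAX)
		return -1;
	q->rec[q->count++] = (struct vf_error_record){
		.error_code = code, .error_flags = flags, .error_data = data,
	};
	return 0;
}

/* Send every stored error, then empty the array; returns the number sent. */
size_t vf_error_flush(struct vf_error_queue *q, vf_error_send_fn send,
		      void *ctx)
{
	size_t sent = 0;

	assert(q != NULL && send != NULL);
	for (size_t i = 0; i < q->count; i++, sent++)
		send(&q->rec[i], ctx);
	q->count = 0;
	return sent;
}

/* Example sender that only counts records; ctx points to a size_t. */
void vf_error_count_sender(const struct vf_error_record *rec, void *ctx)
{
	(void)rec;
	(*(size_t *)ctx)++;
}
```

A fixed array fits the commit's observation that there are not many errors:
nothing is allocated on the error path, and a single flush before
initialization returns is enough.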


# 0c63e113 26-Apr-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:only call flr_work under infinite timeout

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 7225f873 26-Apr-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:use job* to replace voluntary

That way we know which job caused the hang and
can do per-scheduler reset/recovery instead of resetting all
schedulers.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 17b2e332 21-Apr-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:need som change on vega10 mailbox

If an SR-IOV GPU reset is invoked by a job timeout, it runs
in a global work queue, which is very slow, so it is better not to call
msleep; otherwise it takes a long time to get the CPU back.

So make the changes below:

1: Change msleep 1 to mdelay 5
2: Ignore the ack failure from the PF after a timeout,
because VF FLR clears the ack; sometimes the VF FLR is done
before poll_ack begins, so we can ignore this ack

TODO:
Put job_timedout (and the following gpu reset) in a driver thread,
instead of the global work_struct.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ee73164a 22-Feb-2017 Pixel Ding <Pixel.Ding@amd.com>

drm/amdgpu/virt: don't check VALID bit for FLR completion message

The interrupt after FLR is sometimes missed for hardware reasons, so the
guest driver gets the notification of FLR completion by polling the
message. The host then doesn't write the VALID bit, to avoid sending an
interrupt; otherwise the completion would be handled twice.

So there is a valid message without the VALID bit for FLR completion, and
the driver should handle it without checking the bit.

Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d766e6a3 29-Mar-2016 Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch ih handling to two levels (v3)

Newer ASICs now have two levels of IRQ ids:
client id - the IP
src id - the interrupt src within the IP

v2: integrated Christian's comments.
v3: fix rebase fail in SI and CIK

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# d1aad4d8 16-Feb-2017 Xiangliang Yu <Xiangliang.Yu@amd.com>

drm/amdgpu/virt: fix typo

When sending messages to the hypervisor, the message format should be
idh_request, not idh_event.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 2641e38b 26-Jan-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:RUNTIME flag should clr later

this flag will get cleared by request gpu access

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 480da262 05-Feb-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:use work instead of delay-work

No need to use a delayed work since we don't know how
much time the hypervisor takes for FLR, so just poll
and wait in a work item.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 4a370955 25-Jan-2017 Monk Liu <Monk.Liu@amd.com>

drm/amdgpu:no kiq for mailbox registers access

Use the no-KIQ version of register access due to:
1) better performance
2) INTR context consideration (some mailbox routines run in
INTR context, e.g. xgpu_vi_mailbox_rcv_irq)

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# 562fe45c0 24-Jan-2017 Ken Xue <Ken.Xue@amd.com>

drm/amdgpu:Refine handshake of mailbox

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


# ab71ac56 12-Jan-2017 Xiangliang Yu <Xiangliang.Yu@amd.com>

drm/amdgpu/virt: implement VI virt operation interfaces

VI has ASIC-specific virt support, which includes mailbox and
golden register init.

Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>