History log of /linux-master/drivers/gpu/drm/msm/msm_ringbuffer.c
Revision Date Author Comments
# 917e9b7c 09-Jan-2024 Rob Clark <robdclark@chromium.org>

Revert "drm/msm/gpu: Push gpu lock down past runpm"

This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.

Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off, leaving the GPU in
an unrecovered state.

I'll need to come up with a different scheme for appeasing lockdep.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/573835/


# 2d7d2c4e 20-Nov-2023 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Split out submit_unpin_objects() helper

Untangle unpinning from the unlock/unref loop. The unpin only happens in
error paths, so it is easier to decouple from the normal unlock path.

Since we never have an intermediate state where a subset of buffers
are pinned (ie. we never bail out of the pin or unpin loops) we can
replace the bo state flag bit with a global flag in the submit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/568335/


# a6149f03 30-Oct-2023 Matthew Brost <matthew.brost@intel.com>

drm/sched: Convert drm scheduler to use a work queue rather than kthread

In Xe, the new Intel GPU driver, a choice was made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd, but the reasoning is explained below.

1. In Xe the submission order from multiple drm_sched_entity instances is
not guaranteed to match completion order, even when targeting the same
hardware engine. This is because Xe has a firmware scheduler, the GuC,
which is allowed to reorder, timeslice, and preempt submissions. If a
shared drm_gpu_scheduler is used across multiple drm_sched_entity
instances, the TDR falls apart, as the TDR expects submission order ==
completion order. Using a dedicated drm_gpu_scheduler per
drm_sched_entity solves this problem.

2. In Xe, submissions are done by programming a ring buffer (circular
buffer), and a drm_gpu_scheduler provides a limit on the number of
in-flight jobs. If that limit is set to RING_SIZE / MAX_SIZE_PER_JOB, we
get flow control on the ring for free.

A problem with this design is that currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_schedulers are used. To work around the scaling issue,
use a worker rather than a kthread for submission / job cleanup.
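
A rough sketch of the two ideas together (hypothetical names and
structure, not the actual Xe or drm/sched code):

    #include <linux/workqueue.h>

    /* Flow control falls out of the job limit: */
    #define RING_SIZE         0x4000
    #define MAX_SIZE_PER_JOB  0x100
    #define JOB_LIMIT         (RING_SIZE / MAX_SIZE_PER_JOB)

    /* Replace the per-scheduler kthread with a work item, so a large
     * number of schedulers share the kernel's workqueue threads: */
    struct sched_sketch {
        struct workqueue_struct *submit_wq; /* ordered: keeps submit order */
        struct work_struct run_job_work;
    };

    static void run_job_worker(struct work_struct *w)
    {
        struct sched_sketch *s =
            container_of(w, struct sched_sketch, run_job_work);
        /* pop at most one ready job and queue it to the hw ring */
    }

    static int sched_sketch_init(struct sched_sketch *s)
    {
        s->submit_wq = alloc_ordered_workqueue("sched-submit", 0);
        if (!s->submit_wq)
            return -ENOMEM;
        INIT_WORK(&s->run_job_work, run_job_worker);
        return 0;
    }

    /* wake-up path, called whenever a job becomes ready: */
    static void sched_sketch_kick(struct sched_sketch *s)
    {
        queue_work(s->submit_wq, &s->run_job_work);
    }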

v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>


# 56e44960 14-Oct-2023 Luben Tuikov <luben.tuikov@amd.com>

drm/sched: Convert the GPU scheduler to variable number of run-queues

The GPU scheduler now has a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) for each GPU scheduler it creates. Note that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. a single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
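
A sketch of what the conversion implies (simplified; actual field and
function names in drm/sched may differ slightly):

    /* The fixed-size run-queue array becomes dynamically sized, one
     * run-queue per priority level requested by the driver: */
    struct gpu_scheduler_sketch {
        struct drm_sched_rq **sched_rq;  /* [num_rqs] */
        u32 num_rqs;                     /* requested at init time */
    };

    /* inside drm_sched_init(), roughly: */
    sched->sched_rq = kmalloc_array(num_rqs, sizeof(*sched->sched_rq),
                                    GFP_KERNEL | __GFP_ZERO);
    if (!sched->sched_rq)
        return -ENOMEM;
    sched->num_rqs = num_rqs;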

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com


# abe2023b 10-Aug-2023 Rob Clark <robdclark@chromium.org>

drm/msm/gpu: Push gpu lock down past runpm

Avoid holding the gpu lock when calling runpm, to prevent this lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
6.4.3-debug+ #14 Not tainted
------------------------------------------------------
ring0/373 is trying to acquire lock:
ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98

but task is already holding lock:
ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&gpu->lock){+.+.}-{3:3}:
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
msm_job_run+0x7c/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

-> #3 (dma_fence_map){++++}-{0:0}:
__dma_fence_might_wait+0x74/0xc0
dma_resv_lockdep+0x1f0/0x2e8
do_one_initcall+0xb4/0x214
kernel_init_freeable+0x338/0x33c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x7c/0x9c
slab_pre_alloc_hook.constprop.0+0x40/0x250
__kmem_cache_alloc_node+0x60/0x18c
kmalloc_node_trace+0x40/0x84
alloc_worker+0x2c/0x64
init_rescuer+0x34/0xe0
workqueue_init+0x168/0x1fc
kernel_init_freeable+0x15c/0x33c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire+0x3c/0x48
fs_reclaim_acquire+0x50/0x9c
slab_pre_alloc_hook.constprop.0+0x40/0x250
__kmem_cache_alloc_node+0x60/0x18c
kmalloc_trace+0x44/0x88
clk_rcg2_dfs_determine_rate+0x60/0x214
clk_core_determine_round_nolock+0xb8/0xf0
clk_core_round_rate_nolock+0x84/0x118
clk_core_round_rate_nolock+0xd8/0x118
clk_round_rate+0x6c/0xd0
geni_se_clk_tbl_get+0x78/0xc0
geni_se_clk_freq_match+0x44/0xe4
get_spi_clk_cfg+0x50/0xf4
geni_spi_set_clock_and_bw+0x54/0x104
spi_geni_prepare_message+0x130/0x174
__spi_pump_transfer_message+0x200/0x4d8
__spi_sync+0x13c/0x23c
spi_sync_locked+0x18/0x24
do_cros_ec_pkt_xfer_spi+0x124/0x3f0
cros_ec_xfer_high_pri_work+0x28/0x3c
kthread_worker_fn+0x14c/0x27c
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

-> #0 (prepare_lock){+.+.}-{3:3}:
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
clk_prepare_lock+0x70/0x98
clk_prepare+0x24/0x50
clk_bulk_prepare+0x50/0x9c
a6xx_gmu_resume+0x94/0x800 [msm]
a6xx_gmu_pm_resume+0x38/0x158 [msm]
adreno_runtime_resume+0x2c/0x38 [msm]
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x4c/0x134
rpm_callback+0x78/0x7c
rpm_resume+0x3a4/0x46c
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
msm_gpu_submit+0x4c/0x12c [msm]
msm_job_run+0x88/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

other info that might help us debug this:
Chain exists of:
prepare_lock --> dma_fence_map --> &gpu->lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&gpu->lock);
                               lock(dma_fence_map);
                               lock(&gpu->lock);
  lock(prepare_lock);

*** DEADLOCK ***
2 locks held by ring0/373:
#0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched]
#1: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

stack backtrace:
CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14
Hardware name: Google Villager (rev1+) with LTE (DT)
Call trace:
dump_backtrace+0xb4/0xf0
show_stack+0x20/0x30
dump_stack_lvl+0x60/0x84
dump_stack+0x18/0x24
print_circular_bug+0x1cc/0x234
check_noncircular+0x78/0xac
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
__mutex_lock+0xc8/0x388
mutex_lock_nested+0x2c/0x38
clk_prepare_lock+0x70/0x98
clk_prepare+0x24/0x50
clk_bulk_prepare+0x50/0x9c
a6xx_gmu_resume+0x94/0x800 [msm]
a6xx_gmu_pm_resume+0x38/0x158 [msm]
adreno_runtime_resume+0x2c/0x38 [msm]
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x4c/0x134
rpm_callback+0x78/0x7c
rpm_resume+0x3a4/0x46c
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
msm_gpu_submit+0x4c/0x12c [msm]
msm_job_run+0x88/0x128 [msm]
drm_sched_main+0x264/0x354 [gpu_sched]
kthread+0xf0/0x100
ret_from_fork+0x10/0x20
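
The change itself is small; a hypothetical simplification of the reordering
in msm_job_run() (not the literal patch):

    /* before: gpu->lock was held across the runpm get, so clk
     * prepare_lock could end up taken under gpu->lock.
     * after: take the runpm reference first, then the gpu lock: */
    pm_runtime_get_sync(&gpu->pdev->dev); /* may take clk prepare_lock */
    mutex_lock(&gpu->lock);               /* no runpm calls under here */
    msm_gpu_submit(gpu, submit);
    mutex_unlock(&gpu->lock);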

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/552298/


# 7391c282 02-Aug-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Remove vma use tracking

This was not strictly necessary, as page unpinning (ie. shrinker) only
cares about the resv. It did give us some extra sanity checking for
userspace controlled iova, and was useful to catch issues on the kernel
and userspace side when enabling userspace iova. But if userspace screws
this up, it just corrupts its own gpu buffers and/or gets iova faults.
So we can just let userspace shoot itself in the foot and drop the extra
per-buffer SUBMIT overhead.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Patchwork: https://patchwork.freedesktop.org/patch/551023/


# 6ba5daa5 02-Aug-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Use drm_gem_object in submit bos table

Basically everywhere wants the base ptr type. So store that instead of
msm_gem_object.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/551021/


# 1a8b612e 02-Aug-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Take lru lock once per job_run

Rather than acquiring it and dropping it for each individual obj.
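
A sketch of the pattern (names assumed from the msm driver; not the
literal diff):

    /* Hoist the LRU lock out of the per-object loop: */
    mutex_lock(&priv->lru.lock);
    for (i = 0; i < submit->nr_bos; i++)
        msm_gem_unpin_active(submit->bos[i].obj); /* caller-locked variant */
    mutex_unlock(&priv->lru.lock);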

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/551019/


# 17b704f1 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Avoid obj lock in job_run()

Now that everything that controls which LRU an obj lives in *except* the
backing pages is protected by the LRU lock, add a special path to unpin
in the job_run() path, where we are assured that we already have backing
pages and will not be racing against eviction (because the GEM object's
dma_resv contains the fence that will be signaled when the submit/job
completes).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527845/
Link: https://lore.kernel.org/r/20230320144356.803762-10-robdclark@gmail.com


# b14b8c5f 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Decouple vma tracking from obj lock

We need to use the inuse count to track that a BO is pinned until
we have the hw_fence. But we want to remove the obj lock from the
job_run() path as this could deadlock against reclaim/shrinker
(because it is blocking the hw_fence from eventually being signaled).
So split that tracking out into a per-vma lock with narrower scope.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527839/
Link: https://lore.kernel.org/r/20230320144356.803762-5-robdclark@gmail.com


# fc2f0756 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Tidy up VMA API

Stop open coding VMA construction, which will be needed in the next
commit. And since the VMA already has a ptr to the address space, stop
passing that around everywhere. (Also, an aspace always has an mmu, so
we can drop a couple pointless NULL checks.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527833/
Link: https://lore.kernel.org/r/20230320144356.803762-4-robdclark@gmail.com


# 769fec1e 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Move submit bo flags update from obj lock

The flags are only accessed (1) when the submit is constructed, before
enqueuing to the gpu sched (ie. when still visible only to the task
calling the submit ioctl), (2) here, where we own a reference to the
submit and are serialized on the gpu sched thread, and (3) after the
submit is retired and the last reference is dropped, which is serialized
on the submit's reference count. Hence locking is unneeded here.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527830/
Link: https://lore.kernel.org/r/20230320144356.803762-3-robdclark@gmail.com


# f94e6a51 20-Mar-2023 Rob Clark <robdclark@chromium.org>

drm/msm: Pre-allocate hw_fence

Avoid allocating memory in job_run() by pre-allocating the hw_fence.
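
Roughly, the allocation moves from the scheduler path to submit creation
(function names as I understand the patch; treat as a sketch):

    /* at submit-ioctl time, where allocation may sleep or fail cleanly: */
    submit->hw_fence = msm_fence_alloc();

    /* in msm_job_run(), which must not allocate (it can block reclaim,
     * and reclaim may be waiting on this job's fence): */
    msm_fence_init(submit->hw_fence, fctx);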

Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527832/
Link: https://lore.kernel.org/r/20230320144356.803762-2-robdclark@gmail.com


# 084b9e17 23-Sep-2022 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Unpin objects slightly later

The introduction of "drm/msm/gem: Evict active GEM objects when necessary"
exposes a problem with "drm/msm/gem: Unpin buffers earlier", in that we
need to keep the object pinned for the time the submit is queued up in the
gpu scheduler. Otherwise the shrinker will see it as a thing that can be
evicted if we wait for it to be signaled. But if the shrinker path is
waiting on it with the obj lock held, the job cannot be scheduled, as that
also requires briefly grabbing the obj lock, leading to deadlock. (Not to
mention, we don't want the shrinker to evict an obj queued up in the gpu
scheduler.)

Fixes: f371bcc0c2ac ("drm/msm/gem: Unpin buffers earlier")
Fixes: 025d27239a2f ("drm/msm/gem: Evict active GEM objects when necessary")
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/19
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Chia-I Wu <olvaffe@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/504528/
Link: https://lore.kernel.org/r/20220923224043.2449152-1-robdclark@gmail.com


# 125e03b2 18-Aug-2022 Akhil P Oommen <quic_akhilpo@quicinc.com>

drm/msm: Remove unnecessary pm_runtime_get/put

We already enable gpu power from msm_gpu_submit(), so avoid a duplicate
pm_runtime_get/put from msm_job_run().

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/498390/
Link: https://lore.kernel.org/r/20220819015030.v5.1.Icf1e8f0c9b3e7e9933c3b48c70477d0582f3243f@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 311e03c2 27-May-2022 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Separate object and vma unpin

Previously the BO_PINNED state in the submit was tracking two related
but different things: (1) that the buffer object was pinned, and (2)
that the vma (mapping within a set of pagetables) was pinned. But with
fenced vma unpin (needed so that userspace couldn't race with retire
path for releasing a vma) these two were decoupled. The fact that the
BO_PINNED flag was already cleared meant that we leaked the bo pin count
which should have been dropped when the submit was retired.

So split this state into BO_OBJ_PINNED and BO_VMA_PINNED, so they can be
dropped independently.

Fixes: 95d1deb02a9c ("drm/msm/gem: Add fenced vma unpin")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/487559/
Link: https://lore.kernel.org/r/20220527172341.2151005-1-robdclark@gmail.com


# 500ca2a1 21-Apr-2022 Tom Rix <trix@redhat.com>

drm/msm: change msm_sched_ops from global to static

Smatch reports this issue
msm_ringbuffer.c:43:36: warning: symbol 'msm_sched_ops' was not declared. Should it be static?

msm_sched_ops is only used in msm_ringbuffer.c so change its
storage-class specifier to static.
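
The resulting definition in msm_ringbuffer.c looks roughly like this (ops
entries recalled from the driver, so treat them as illustrative):

    static const struct drm_sched_backend_ops msm_sched_ops = {
        .run_job = msm_job_run,
        .timedout_job = msm_job_timedout,
        .free_job = msm_job_free,
    };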

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/482883/
Link: https://lore.kernel.org/r/20220421131507.1557667-1-trix@redhat.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>


# 95d1deb0 11-Apr-2022 Rob Clark <robdclark@chromium.org>

drm/msm/gem: Add fenced vma unpin

With userspace allocated iova (next patch), we can have a race condition
where userspace observes the fence completion and deletes the vma before
retire_submit() gets around to unpinning the vma. To handle this, add a
fenced unpin which drops the refcount but tracks the fence, and update
msm_gem_vma_inuse() to check any previously unsignaled fences.

v2: Fix inuse underflow (duplicate unpin)
v3: Fix msm_job_run() vs submit_cleanup() race condition

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220411215849.297838-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 8ab62eda 22-Feb-2022 Jiawei Gu <Jiawei.Gu@amd.com>

drm/sched: Add device pointer to drm_gpu_scheduler

Add a device pointer so the scheduler's printing can use
DRM_DEV_ERROR() instead, which makes life easier in multi-GPU
scenarios.

v2: amend all calls of drm_sched_init()
v3: fill dev pointer for all drm_sched_init() calls

Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com


# c28e2f2b 09-Nov-2021 Rob Clark <robdclark@chromium.org>

drm/msm: Remove struct_mutex usage

The remaining struct_mutex usage is just to serialize various gpu
related things (submit/retire/recover/fault/etc), so replace
struct_mutex with gpu->lock.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211109181117.591148-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 80bcfbd3 04-Aug-2021 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/msm: Use scheduler dependency handling

drm_sched_job_init is already at the right place, so this boils down
to deleting code.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-13-daniel.vetter@ffwll.ch


# 1d8a5ca4 27-Jul-2021 Rob Clark <robdclark@chromium.org>

drm/msm: Conversion to drm scheduler

For existing adrenos, there are one or more ringbuffers, depending on
whether preemption is supported. When preemption is supported, each
ringbuffer has its own priority. A submitqueue (which maps to a
gl context or vk queue in userspace) is mapped to a specific ring-
buffer at creation time, based on the submitqueue's priority.

Each ringbuffer has its own drm_gpu_scheduler. Each submitqueue
maps to a drm_sched_entity. And each submit maps to a drm_sched_job.
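
Illustrated as embedded fields (a sketch of the relationships; exact
member names may differ):

    /* one scheduler per ring: */
    struct msm_ringbuffer      { struct drm_gpu_scheduler sched; };
    /* one entity per submitqueue: */
    struct msm_gpu_submitqueue { struct drm_sched_entity entity; };
    /* one job per submit: */
    struct msm_gem_submit      { struct drm_sched_job base; };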

Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/4
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-10-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 030af2b0 27-Jul-2021 Rob Clark <robdclark@chromium.org>

drm/msm: drop drm_gem_object_put_locked()

No idea why we were still using this. It certainly hasn't been needed
for some time. So drop the pointless twin codepaths.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 375f9a63 27-Jul-2021 Rob Clark <robdclark@chromium.org>

drm/msm: Docs and misc cleanup

Fix a couple of incorrect or misspelt comments, and add a submitqueue
doc comment.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20210728010632.2633470-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# da3d378d 26-Jul-2021 Rob Clark <robdclark@chromium.org>

drm/msm: Let fences read directly from memptrs

Let dma_fence::signaled, etc, read directly from the address where the hw
writes the updated completed fence seqno, so we can potentially
notice that the fence is signaled sooner.
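
A minimal sketch of the idea, assuming a fence context that keeps a
pointer into the memptrs buffer the hw writes:

    static bool fence_completed(struct msm_fence_context *fctx, uint32_t fence)
    {
        /* fenceptr points at the seqno the GPU writes back on completion;
         * the signed compare handles seqno wraparound: */
        return (int32_t)(*fctx->fenceptr - fence) >= 0;
    }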

Plus add some docs.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210726144359.2179302-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 77d20529 23-Oct-2020 Rob Clark <robdclark@chromium.org>

drm/msm: Protect ring->submits with its own lock

One less place to rely on dev->struct_mutex.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 77c40603 23-Oct-2020 Rob Clark <robdclark@chromium.org>

drm/msm: Document and rename preempt_lock

Before adding another lock, give ring->lock a more descriptive name.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 604234f3 03-Sep-2020 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Enable expanded apriv support for a650

a650 supports expanded apriv, which allows us to map critical buffers
(ringbuffer and memstore) as privileged to protect them from corruption.
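
In the driver this ends up as a small helper applied wherever kernel
buffers are allocated (helper and field names as added by this patch, to
the best of my understanding):

    static inline u32 check_apriv(struct msm_gpu *gpu, u32 flags)
    {
        return gpu->hw_apriv ? flags | MSM_BO_MAP_PRIV : flags;
    }

    /* e.g. the ringbuffer allocation: */
    ring->start = msm_gem_kernel_new(gpu->dev, MSM_GPU_RINGBUFFER_SZ,
            check_apriv(gpu, MSM_BO_WC | MSM_BO_GPU_READONLY),
            gpu->aspace, &ring->bo, &ring->iova);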

Cc: stable@vger.kernel.org
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# 352c83fb 17-Aug-2020 Rob Clark <robdclark@chromium.org>

drm/msm/gpu: make ringbuffer readonly

The GPU has no business writing into the ringbuffer, let's make it
readonly to the GPU.

Fixes: 7198e6b03155 ("drm/msm: add a3xx gpu support")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>


# caab277b 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 84c61275 07-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm/gpu: Map the ringbuffer in the iova at create time

For reasons that I'm sure made perfect sense at the time, we were
opting to defer the iova alloc / pin on the ringbuffer until HW
init time, so when we moved to iova reference counting we ended
up adding a reference count every time the hardware started.
Not that it mattered (because the ring is always around), but
it did make the debug output look odd. Allocate and pin the iova
at create time instead.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 0815d774 07-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Add a name field for gem objects

For debugging purposes it is useful to assign descriptions
to buffers so that we know what they are used for. Add
a field to the buffer object and use that to name the various
kernel side allocations, which ends up looking like this
in /d/dri/X/gem:

flags id ref offset kaddr size madv name
00040000: I 0 ( 1) 00000000 0000000070b79eca 00004096 memptrs
vmas: [gpu: 01000000,mapped,inuse=1]
00020000: I 0 ( 1) 00000000 0000000031ed4074 00032768 ring0

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 1e29dff0 07-Nov-2018 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Add a common function to free kernel buffer objects

Buffer objects allocated with msm_gem_kernel_new() are mostly
freed the same way, so we can save a few lines of code with a
common function.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# dc9a9b32 25-Jan-2018 Steve Kowalik <steven@wedontsleep.org>

drm/msm: Replace gem_object deprecated functions

drm_gem_object_{reference,unreference,unreference_unlocked} are
deprecated functions, and merely alias to the get/put functions.
Switch to the new names.

Signed-off-by: Steve Kowalik <steven@wedontsleep.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# b1fc2839 20-Oct-2017 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Implement preemption for A5XX targets

Implement preemption for A5XX targets - this allows multiple
ringbuffers for different priorities with automatic preemption
of a lower priority ringbuffer if a higher one is ready.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 4c7085a5 20-Oct-2017 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Shadow current pointer in the ring until command is complete

Add a shadow pointer to track the current command being written into
the ring. Don't commit it as 'cur' until the command is submitted.
Because 'cur' is used to construct the software copy of the wptr, this
ensures that somebody peeking in on the ring doesn't assume that a
command is in flight while it is being written. This isn't a huge deal
with a single ring (though technically the hangcheck could assume
the system is prematurely busy when it isn't), but it will be rather
important for preemption, where the decision to preempt is based
on a non-empty ringbuffer. Without a shadow, an aggressive preemption
scheme could assume that the ringbuffer is non-empty and switch to it
before the CPU is done writing the command, and boom.
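
A sketch of the resulting pattern (simplified; where 'cur' gets committed
at flush time is an assumption based on the description above):

    /* writes land via the shadow pointer: */
    static inline void OUT_RING(struct msm_ringbuffer *ring, uint32_t data)
    {
        *(ring->next++) = data;
    }

    /* only at flush time is the write "committed" and the hw kicked: */
    ring->cur = ring->next;
    gpu->funcs->flush(gpu, ring);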

Even though preemption won't be supported for all targets, because of
the way the code is organized it is simpler to make this generic for
all targets. The extra load for non-preemption targets should be
minimal.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# f97decac 20-Oct-2017 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Support multiple ringbuffers

Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.

The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.
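
For example, a purely hypothetical encoding of the idea (illustrative
only; not how the patch actually implements the mapping):

    /* Pack ring id and per-ring seqno into one globally unique integer,
     * so legacy userspace keeps seeing a single fence id: */
    static uint32_t fence_to_id(uint32_t ring_id, uint32_t seqno)
    {
        return (ring_id << 28) | (seqno & 0x0fffffff);
    }

    static void id_to_fence(uint32_t id, uint32_t *ring_id, uint32_t *seqno)
    {
        *ring_id = id >> 28;
        *seqno = id & 0x0fffffff;
    }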

The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 8223286d 27-Jul-2017 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Add a helper function for in-kernel buffer allocations

Nearly all of the in-kernel buffer allocations allocate a buffer object,
virtual address and GPU iova at the same time. Make a helper function to
handle the details.
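
The helper bundles the three steps into one call; its shape is roughly
as follows (signature from memory, so verify against msm_gem.h):

    /* allocate a GEM object, kernel mapping, and GPU iova in one go: */
    void *msm_gem_kernel_new(struct drm_device *dev, uint32_t size,
                             uint32_t flags,
                             struct msm_gem_address_space *aspace,
                             struct drm_gem_object **bo, uint64_t *iova);

    /* e.g.: */
    ring->start = msm_gem_kernel_new(dev, MSM_GPU_RINGBUFFER_SZ, MSM_BO_WC,
                                     aspace, &ring->bo, &ring->iova);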

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[dropped msm_fbdev conversion to new helper, since it interferes with
display-handover work, where we want to separate allocation and mapping]
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 0e08270a 13-Jun-2017 Sushmita Susheelendra <ssusheel@codeaurora.org>

drm/msm: Separate locking of buffer resources from struct_mutex

Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, as well as
between buffer object creation and GPU command submission.

Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 88b333b0 20-Dec-2016 Jordan Crouse <jcrouse@codeaurora.org>

drm/msm: Ensure that the hardware write pointer is valid

Currently the value written to CP_RB_WPTR is calculated on the fly as
(rb->next - rb->start). But as the code is designed, rb->next is wrapped
before writing the commands, so if a series of commands happened to
fit perfectly in the ringbuffer, rb->next would end up equal to
rb->size / 4 and thus result in an out of bounds address to CP_RB_WPTR.

The easiest way to fix this is to mask WPTR when writing it to the
hardware; it makes the hardware happy, the rest of the ringbuffer
math appears to work, and there isn't any point in upsetting anything.
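
A sketch of the fix (names approximate; the ring size is in bytes and
the ring holds 32-bit dwords, hence the divide by 4):

    static uint32_t get_wptr(struct msm_ringbuffer *ring)
    {
        /* wrap the dword offset so a perfectly full ring yields 0,
         * not size/4; size is a power of two, so this is a cheap mask */
        return (ring->cur - ring->start) % (ring->size / 4);
    }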

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[squash in is_power_of_2() check]
Signed-off-by: Rob Clark <robdclark@gmail.com>


# 18f23049 26-May-2016 Rob Clark <robdclark@gmail.com>

drm/msm: change gem->vmap() to get/put

Before we can add vmap shrinking, we really need to know which vmap'ings
are currently being used. So switch to a get/put interface. The put
functions are stubbed for now.

Signed-off-by: Rob Clark <robdclark@gmail.com>


# 69a834c2 24-May-2016 Rob Clark <robdclark@gmail.com>

drm/msm: deal with exhausted vmap space better

Some, but not all, callers of obj->vmap() would check the return with
IS_ERR(). So let's actually return an error if vmap() fails, and fix up
the call-sites that were not handling this properly.

Signed-off-by: Rob Clark <robdclark@gmail.com>


# 774449eb 15-May-2015 Rob Clark <robdclark@gmail.com>

drm/msm: fix locking inconsistencies in gpu->destroy()

In error paths, this was being called without struct_mutex held,
leading to panics like:

msm 1a00000.qcom,mdss_mdp: No memory protection without IOMMU
Kernel panic - not syncing: BUG!
CPU: 0 PID: 1409 Comm: cat Not tainted 4.0.0-dirty #4
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
[<ffffffc000089c78>] dump_backtrace+0x0/0x118
[<ffffffc000089da0>] show_stack+0x10/0x20
[<ffffffc0006686d4>] dump_stack+0x84/0xc4
[<ffffffc0006678b4>] panic+0xd0/0x210
[<ffffffc0003e1ce4>] drm_gem_object_free+0x5c/0x60
[<ffffffc000402870>] adreno_gpu_cleanup+0x60/0x80
[<ffffffc0004035a0>] a3xx_destroy+0x20/0x70
[<ffffffc0004036f4>] a3xx_gpu_init+0x84/0x108
[<ffffffc0004018b8>] adreno_load_gpu+0x58/0x190
[<ffffffc000419dac>] msm_open+0x74/0x88
[<ffffffc0003e0a48>] drm_open+0x168/0x400
[<ffffffc0003e7210>] drm_stub_open+0xa8/0x118
[<ffffffc0001a0e84>] chrdev_open+0x94/0x198
[<ffffffc000199f88>] do_dentry_open+0x208/0x310
[<ffffffc00019a4c4>] vfs_open+0x44/0x50
[<ffffffc0001aa26c>] do_last.isra.14+0x2c4/0xc10
[<ffffffc0001aac38>] path_openat+0x80/0x5e8
[<ffffffc0001ac354>] do_filp_open+0x2c/0x98
[<ffffffc00019b60c>] do_sys_open+0x13c/0x228
[<ffffffc00019b72c>] SyS_openat+0xc/0x18
CPU1: stopping

But there isn't any particularly good reason to hold struct_mutex for
teardown, so just standardize on calling it without the mutex held,
and use the _unlocked() versions for GEM obj unref'ing.

Signed-off-by: Rob Clark <robdclark@gmail.com>


# 7198e6b0 18-Jul-2013 Rob Clark <robdclark@gmail.com>

drm/msm: add a3xx gpu support

Add initial support for a3xx 3d core.

So far, with hardware that I've seen to date, we can have:
+ zero, one, or two z180 2d cores
+ a3xx or a2xx 3d core, which share a common CP (the firmware
for the CP seems to implement some different PM4 packet types
but the basics of cmdstream submission are the same)

Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
+ msm_gpu
  + adreno_gpu
    + a3xx_gpu
    + a2xx_gpu
  + z180_gpu

This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.

Note that there is no cmdstream validation required. All memory access
from the GPU is via IOMMU/MMU. So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.

Signed-off-by: Rob Clark <robdclark@gmail.com>