#
38f922a5 |
|
22-Nov-2023 |
Luben Tuikov <ltuikov89@gmail.com> |
drm/sched: Reverse run-queue priority enumeration Reverse run-queue priority enumeration such that the higest priority is now 0, and for each consecutive integer the prioirty diminishes. Run-queues correspond to priorities. To an external observer a scheduler created with a single run-queue, and another created with DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule sched->sched_rq[0] with the same "priority", as that index run-queue exists in both schedulers, i.e. a scheduler with one run-queue or many. This patch makes it so. In other words, the "priority" of sched->sched_rq[n], n >= 0, is the same for any scheduler created with any allowable number of run-queues (priorities), 0 to DRM_SCHED_PRIORITY_COUNT. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com
|
#
fe375c74 |
|
14-Nov-2023 |
Luben Tuikov <ltuikov89@gmail.com> |
drm/sched: Rename priority MIN to LOW Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW. This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities in ascending order, DRM_SCHED_PRIORITY_LOW, DRM_SCHED_PRIORITY_NORMAL, DRM_SCHED_PRIORITY_HIGH, DRM_SCHED_PRIORITY_KERNEL. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com
|
#
a78422e9 |
|
09-Nov-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/sched: implement dynamic job-flow control Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
|
#
f3123c25 |
|
09-Nov-2023 |
Luben Tuikov <ltuikov89@gmail.com> |
drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready() Don't "wake up" the GPU scheduler unless the entity is ready, as well as we can queue to the scheduler, i.e. there is no point in waking up the scheduler for the entity unless the entity is ready. Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Fixes: bc8d6a9df99038 ("drm/sched: Don't disturb the entity when in RR-mode scheduling") Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com
|
#
f12af4c4 |
|
02-Nov-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue Because a) helper is exported to other parts of the scheduler and b) there isn't a plain drm_sched_wakeup to begin with, I think we can drop the suffix and by doing so separate the intimiate knowledge between the scheduler components a bit better. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-6-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
3c6c7ca4 |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Add a helper to queue TDR immediately Add a helper whereby a driver can invoke TDR immediately. v2: - Drop timeout args, rename function, use mod delayed work (Luben) v3: - s/XE/Xe (Luben) - present tense in commit message (Luben) - Adjust comment for drm_sched_tdr_queue_imm (Luben) v4: - Adjust commit message (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-6-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
f7fe64ad |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Split free_job into own work item Rather than call free_job and run_job in same work item have a dedicated work item for each. This aligns with the design and intended use of work queues. v2: - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting timestamp in free_job() work item (Danilo) v3: - Drop forward dec of drm_sched_select_entity (Boris) - Return in drm_sched_run_job_work if entity NULL (Boris) v4: - Replace dequeue with peek and invert logic (Luben) - Wrap to 100 lines (Luben) - Update comments for *_queue / *_queue_if_ready functions (Luben) v5: - Drop peek argument, blindly reinit idle (Luben) - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben) - Update work_run_job & work_free_job kernel doc (Luben) v6: - Do not move drm_sched_select_entity in file (Luben) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
a6149f03 |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Convert drm scheduler to use a work queue rather than kthread In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
35963cf2 |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Add drm_sched_wqueue_* helpers Add scheduler wqueue ready, stop, and start helpers to hide the implementation details of the scheduler from the drivers. v2: - s/sched_wqueue/sched_wqueue (Luben) - Remove the extra white line after the return-statement (Luben) - update drm_sched_wqueue_ready comment (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
56e44960 |
|
14-Oct-2023 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/sched: Convert the GPU scheduler to variable number of run-queues The GPU scheduler has now a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) per each GPU scheduler it creates. Note, that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
|
#
fa8391ad |
|
16-Oct-2023 |
Luben Tuikov <luben.tuikov@amd.com> |
gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to index sched->sched_rq[]. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231017035656.8211-2-luben.tuikov@amd.com
|
#
db8b4968 |
|
23-Jun-2023 |
Boris Brezillon <boris.brezillon@collabora.com> |
drm/sched: Call drm_sched_fence_set_parent() from drm_sched_fence_scheduled() Drivers that can delegate waits to the firmware/GPU pass the scheduled fence to drm_sched_job_add_dependency(), and issue wait commands to the firmware/GPU at job submission time. For this to be possible, they need all their 'native' dependencies to have a valid parent since this is where the actual HW fence information are encoded. In drm_sched_main(), we currently call drm_sched_fence_set_parent() after drm_sched_fence_scheduled(), leaving a short period of time during which the job depending on this fence can be submitted. Since setting parent and signaling the fence are two things that are kinda related (you can't have a parent if the job hasn't been scheduled), it probably makes sense to pass the parent fence to drm_sched_fence_scheduled() and let it call drm_sched_fence_set_parent() before it signals the scheduled fence. Here is a detailed description of the race we are fixing here: Thread A Thread B - calls drm_sched_fence_scheduled() - signals s_fence->scheduled which wakes up thread B - entity dep signaled, checking the next dep - no more deps waiting - entity is picked for job submission by drm_gpu_scheduler - run_job() is called - run_job() tries to collect native fence info from s_fence->parent, but it's NULL => BOOM, we can't do our native wait - calls drm_sched_fence_set_parent() v2: * Fix commit message v3: * Add a detailed description of the race to the commit message * Add Luben's R-b Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Cc: Frank Binns <frank.binns@imgtec.com> Cc: Sarah Walker <sarah.walker@imgtec.com> Cc: Donald Robson <donald.robson@imgtec.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230623075204.382350-1-boris.brezillon@collabora.com
|
#
3655c590 |
|
17-May-2023 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/sched: Rename to drm_sched_wakeup_if_can_queue() Rename drm_sched_wakeup() to drm_sched_wakeup_if_canqueue() since the former is misleading, as it wakes up the GPU scheduler _only if_ more jobs can be queued to the underlying hardware. This distinction is important to make, since the wake conditional in the GPU scheduler thread wakes up when other conditions are also true, e.g. when there are jobs to be cleaned. For instance, a user might want to wake up the scheduler only because there are more jobs to clean, but whether we can queue more jobs is irrelevant. v2: Separate "canqueue" to "can_queue". (Alex D.) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20230517233550.377847-2-luben.tuikov@amd.com Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
|
#
70102d77 |
|
17-Apr-2023 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled Switch to using RCU handling for the last scheduled job and add a function to return the error code of it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-2-christian.koenig@amd.com
|
#
539f9ee4 |
|
17-Apr-2023 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: properly forward fence errors When a hw fence is signaled with an error properly forward that to the finished fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-1-christian.koenig@amd.com
|
#
f3823da7 |
|
21-Sep-2021 |
Rob Clark <robdclark@gmail.com> |
drm/scheduler: Add fence deadline support As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from from the finished fence to the actual hw fence. v2: Split into drm_sched_fence_set_parent() (ckoenig) v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees fence->parent set before drm_sched_fence_set_parent() does this test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT). Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Luben Tuikov <luben.tuikov@amd.com>
|
#
c087bbb6 |
|
09-Feb-2023 |
Maíra Canal <mcanal@igalia.com> |
drm/sched: Create wrapper to add a syncobj dependency to job In order to add a syncobj's fence as a dependency to a job, it is necessary to call drm_syncobj_find_fence() to find the fence and then add the dependency with drm_sched_job_add_dependency(). So, wrap these steps in one single function, drm_sched_job_add_syncobj_dependency(). Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230209124447.467867-2-mcanal@igalia.com
|
#
baad1097 |
|
30-Mar-2023 |
Lucas Stach <l.stach@pengutronix.de> |
Revert "drm/scheduler: track GPU active time per entity" This reverts commit df622729ddbf as it introduces a use-after-free, which isn't easy to fix without going back to the design drawing board. Reported-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
#
df622729 |
|
01-Feb-2023 |
Lucas Stach <l.stach@pengutronix.de> |
drm/scheduler: track GPU active time per entity Track the accumulated time that jobs from this entity were active on the GPU. This allows drivers using the scheduler to trivially implement the DRM fdinfo when the hardware doesn't provide more specific information than signalling job completion anyways. [Bagas: Append missing colon to @elapsed_ns] Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
|
#
cb3076e9 |
|
26-Oct-2022 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: cleanup define Remove some not implemented function define Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221109095010.141189-4-christian.koenig@amd.com
|
#
06a2d7cc |
|
25-Oct-2022 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: revert "implement tdr advanced mode" This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely not helpful to crash the hardware multiple times if a clean recovery is desired. For testing and debugging environments we should rather disable recovery alltogether to be able to inspect the state with a hw debugger. Additional to that the sw implementation is clearly buggy and causes reference count issues for the hardware fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a82f30b0 |
|
29-Sep-2022 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: rename dependency callback into prepare_job This now matches much better what this is doing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com
|
#
2cf9886e |
|
29-Sep-2022 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: remove drm_sched_dependency_optimized Not used any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-12-christian.koenig@amd.com
|
#
4d5230b5 |
|
28-Sep-2022 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: add drm_sched_job_add_resv_dependencies Add a new function to update job dependencies from a resv obj. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-3-christian.koenig@amd.com
|
#
08fb97de0 |
|
29-Sep-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Add FIFO sched policy to run queue When many entities are competing for the same run queue on the same scheduler, we observe an unusually long wait times and some jobs get starved. This has been observed on GPUVis. The issue is due to the Round Robin policy used by schedulers to pick up the next entity's job queue for execution. Under stress of many entities and long job queues within entity some jobs could be stuck for very long time in it's entity's queue before being popped from the queue and executed while for other entities with smaller job queues a job might execute earlier even though that job arrived later then the job in the long queue. Fix: Add FIFO selection policy to entities in run queue, chose next entity on run queue in such order that if job on one entity arrived earlier then job on another entity the first job will start executing earlier regardless of the length of the entity's job queue. v2: Switch to rb tree structure for entities based on TS of oldest job waiting in the job queue of an entity. Improves next entity extraction to O(1). Entity TS update O(log N) where N is the number of entities in the run-queue Drop default option in module control parameter. v3: Various cosmetical fixes and minor refactoring of fifo update function. (Luben) v4: Switch drm_sched_rq_select_entity_fifo to in order search (Luben) v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben) v6: Add missing drm_sched_rq_remove_fifo_locked v7: Fix ts sampling bug and more cosmetic stuff (Luben) v8: Fix module parameter string (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: Yunxiang Li (Teddy) <Yunxiang.Li@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220930041258.1050247-1-luben.tuikov@amd.com
|
#
7b476aff |
|
07-Oct-2022 |
Christian König <christian.koenig@amd.com> |
drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag Setting this flag on a scheduler fence prevents pipelining of jobs depending on this fence. In other words we always insert a full CPU round trip before dependent jobs are pushed to the pipeline. Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
|
#
f8ad757e |
|
04-Apr-2022 |
Randy Dunlap <rdunlap@infradead.org> |
drm/scheduler: quieten kernel-doc warnings Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c. Quashes these warnings: include/drm/gpu_scheduler.h:332: warning: missing initial short description on line: * struct drm_sched_backend_ops include/drm/gpu_scheduler.h:412: warning: missing initial short description on line: * struct drm_gpu_scheduler include/drm/gpu_scheduler.h:461: warning: Function parameter or member 'dev' not described in 'drm_gpu_scheduler' drivers/gpu/drm/scheduler/sched_main.c:201: warning: missing initial short description on line: * drm_sched_dependency_optimized drivers/gpu/drm/scheduler/sched_main.c:995: warning: Function parameter or member 'dev' not described in 'drm_sched_init' Fixes: 2d33948e4e00 ("drm/scheduler: add documentation") Fixes: 8ab62eda177b ("drm/sched: Add device pointer to drm_gpu_scheduler") Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Jiawei Gu <Jiawei.Gu@amd.com> Cc: dri-devel@lists.freedesktop.org Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220404213040.12912-1-rdunlap@infradead.org
|
#
7d64c40a |
|
11-Apr-2022 |
Dmitry Osipenko <dmitry.osipenko@collabora.com> |
drm/scheduler: Don't kill jobs in interrupt context Interrupt context can't sleep. Drivers like Panfrost and MSM are taking mutex when job is released, and thus, that code can sleep. This results into "BUG: scheduling while atomic" if locks are contented while job is freed. There is no good reason for releasing scheduler's jobs in IRQ context, hence use normal context to fix the trouble. Cc: stable@vger.kernel.org Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
|
#
9b04369b |
|
11-Apr-2022 |
Dmitry Osipenko <dmitry.osipenko@collabora.com> |
drm/scheduler: Don't kill jobs in interrupt context Interrupt context can't sleep. Drivers like Panfrost and MSM are taking mutex when job is released, and thus, that code can sleep. This results into "BUG: scheduling while atomic" if locks are contented while job is freed. There is no good reason for releasing scheduler's jobs in IRQ context, hence use normal context to fix the trouble. Cc: stable@vger.kernel.org Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
|
#
e795df5b |
|
28-Mar-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Fix htmldoc warning. Fixes the warning. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220328132532.406572-1-andrey.grodzovsky@amd.com
|
#
8ab62eda |
|
22-Feb-2022 |
Jiawei Gu <Jiawei.Gu@amd.com> |
drm/sched: Add device pointer to drm_gpu_scheduler Add device pointer so scheduler's printing can use DRM_DEV_ERROR() instead, which makes life easier under multiple GPU scenario. v2: amend all calls of drm_sched_init() v3: fill dev pointer for all drm_sched_init() calls Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com
|
#
542cff78 |
|
27-Oct-2021 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Avoid lockdep spalt on killing a processes Probelm: Singlaning one sched fence from within another's sched fence singal callback generates lockdep splat because the both have same lockdep class of their fence->lock Fix: Fix bellow stack by rescheduling to irq work of signaling and killing of jobs that left when entity is killed. [11176.741181] dump_stack+0x10/0x12 [11176.741186] __lock_acquire.cold+0x208/0x2df [11176.741197] lock_acquire+0xc6/0x2d0 [11176.741204] ? dma_fence_signal+0x28/0x80 [11176.741212] _raw_spin_lock_irqsave+0x4d/0x70 [11176.741219] ? dma_fence_signal+0x28/0x80 [11176.741225] dma_fence_signal+0x28/0x80 [11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched] [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched] [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0 [11176.741254] dma_fence_signal+0x3b/0x80 [11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched] [11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched] [11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched] [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0 [11176.741290] dma_fence_signal+0x3b/0x80 [11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu] [11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu] [11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu] [11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu] [11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu] [11176.742393] __handle_irq_event_percpu+0x4f/0x2c0 [11176.742402] handle_irq_event_percpu+0x33/0x80 [11176.742408] handle_irq_event+0x39/0x60 [11176.742414] handle_edge_irq+0x93/0x1d0 [11176.742419] __common_interrupt+0x50/0xe0 [11176.742426] common_interrupt+0x80/0x90 Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <christian.koenig@amd.com> Tested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/dri-devel/msg321250.html
|
#
d4c16733 |
|
03-Sep-2021 |
Boris Brezillon <boris.brezillon@collabora.com> |
drm/sched: Fix drm_sched_fence_free() so it can be passed an uninitialized fence drm_sched_job_cleanup() will pass an uninitialized fence to drm_sched_fence_free(), which will cause to_drm_sched_fence() to return a NULL fence object, causing a NULL pointer deref when this NULL object is passed to kmem_cache_free(). Let's create a new drm_sched_fence_free() function that takes a drm_sched_fence pointer and suffix the old function with _rcu. While at it, complain if drm_sched_fence_free() is passed an initialized fence or if drm_sched_fence_free_rcu() is passed an uninitialized fence. Fixes: dbe48d030b28 ("drm/sched: Split drm_sched_job_init") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210903120554.444101-1-boris.brezillon@collabora.com
|
#
981b04d9 |
|
04-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/sched: improve docs around drm_sched_entity I found a few too many things that are tricky and not documented, so I started typing. I found a few more things that looked broken while typing, see the varios FIXME in drm_sched_entity. Also some of the usual logics: - actually include sched_entity.c declarations, that was lost in the move here: 620e762f9a98 ("drm/scheduler: move entity handling into separate file") - Ditch the kerneldoc for internal functions, keep the comments where they're describing more than what the function name already implies. - Switch drm_sched_entity to inline docs. Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: "Christian König" <christian.koenig@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Emma Anholt <emma@anholt.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-7-daniel.vetter@ffwll.ch
|
#
0e10e9a1 |
|
04-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/sched: drop entity parameter from drm_sched_push_job Originally a job was only bound to the queue when we pushed this, but now that's done in drm_sched_job_init, making that parameter entirely redundant. Remove it. The same applies to the context parameter in lima_sched_context_queue_task, simplify that too. v2: Rebase on top of msm adopting drm/sched Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> (v1) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Emma Anholt <emma@anholt.net> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Chen Li <chenli@uniontech.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: Melissa Wen <mwen@igalia.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-6-daniel.vetter@ffwll.ch
|
#
ebd5f742 |
|
04-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/sched: Add dependency tracking Instead of just a callback we can just glue in the gem helpers that panfrost, v3d and lima currently use. There's really not that many ways to skin this cat. v2/3: Rebased. v4: Repaint this shed. The functions are now called _add_dependency() and _add_implicit_dependency() Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v3) Reviewed-by: Steven Price <steven.price@arm.com> (v1) Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-5-daniel.vetter@ffwll.ch
|
#
dbe48d03 |
|
17-Aug-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/sched: Split drm_sched_job_init This is a very confusingly named function, because not just does it init an object, it arms it and provides a point of no return for pushing a job into the scheduler. It would be nice if that's a bit clearer in the interface. But the real reason is that I want to push the dependency tracking helpers into the scheduler code, and that means drm_sched_job_init must be called a lot earlier, without arming the job. v2: - don't change .gitignore (Steven) - don't forget v3d (Emma) v3: Emma noticed that I leak the memory allocated in drm_sched_job_init if we bail out before the point of no return in subsequent driver patches. To be able to fix this change drm_sched_job_cleanup() so it can handle being called both before and after drm_sched_job_arm(). Also improve the kerneldoc for this. v4: - Fix the drm_sched_job_cleanup logic, I inverted the booleans, as usual (Melissa) - Christian pointed out that drm_sched_entity_select_rq() also needs to be moved into drm_sched_job_arm, which made me realize that the job->id definitely needs to be moved too. Shuffle things to fit between job_init and job_arm. v5: Reshuffle the split between init/arm once more, amdgpu abuses drm_sched.ready to signal gpu reset failures. Also document this somewhat. (Christian) v6: Rebase on top of the msm drm/sched support. Note that the drm_sched_job_init() call is completely misplaced, and hence also the split-out drm_sched_entity_push_job(). I've put in a FIXME which the next patch will address. v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't be a problem where it is now. Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Cc: Melissa Wen <melissa.srw@gmail.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Steven Price <steven.price@arm.com> (v2) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Nick Terrell <terrelln@fb.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Chen Li <chenli@uniontech.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Sonny Jiang <sonny.jiang@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Tian Tao <tiantao6@hisilicon.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Emma Anholt <emma@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch
|
#
78efe21b |
|
30-Jun-2021 |
Boris Brezillon <boris.brezillon@collabora.com> |
drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Emma Anholt <emma@anholt.net> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-3-boris.brezillon@collabora.com
|
#
1fad1b7e |
|
30-Jun-2021 |
Boris Brezillon <boris.brezillon@collabora.com> |
drm/sched: Document what the timedout_job method should do The documentation is a bit vague and doesn't really describe what the ->timedout_job() is expected to do. Let's add a few more details. v5: * New patch Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-2-boris.brezillon@collabora.com
|
#
95b2151f |
|
28-May-2021 |
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> |
drm/sched: Fix inverted comment for hang_limit The hang_limit is the threshold after which the kernel no longer attempts to schedule a job. Its documentation stated the opposite due to a typo. Correct the wording to indicate the actual purpose of the field. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210528235152.38447-1-alyssa.rosenzweig@collabora.com
|
#
e6c6338f |
|
07-Mar-2021 |
Jack Zhang <Jack.Zhang1@amd.com> |
drm/amd/amdgpu implement tdr advanced mode [Why] Previous tdr design treats the first job in job_timeout as the bad job. But sometimes a later bad compute job can block a good gfx job and cause an unexpected gfx job timeout because gfx and compute ring share internal GC HW mutually. [How] This patch implements an advanced tdr mode.It involves an additinal synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit step in order to find the real bad job. 1. At Step0 Resubmit stage, it synchronously submits and pends for the first job being signaled. If it gets timeout, we identify it as guilty and do hw reset. After that, we would do the normal resubmit step to resubmit left jobs. 2. For whole gpu reset(vram lost), do resubmit as the old way. v2: squash in build fix (Alex) Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
be318fd8 |
|
01-Apr-2021 |
Christian König <christian.koenig@amd.com> |
drm/sched: add missing member documentation Just fix a warning. Signed-off-by: Christian König <christian.koenig@amd.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: f2f12eb9c32b ("drm/scheduler: provide scheduler score externally") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210401125213.138855-1-christian.koenig@amd.com
|
#
f2f12eb9 |
|
01-Feb-2021 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: provide scheduler score externally Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
|
#
a6a1f036 |
|
20-Jan-2021 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: Job timeout handler returns status (v3) This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. v2: Use enum as the status of a driver's job timeout callback method. v3: Return scheduler/device information, rather than task information. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Eric Anholt <eric@anholt.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Steven Price <steven.price@arm.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/415095/
|
#
e2183fb1 |
|
10-Feb-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Revert "drm/scheduler: Job timeout handler returns status (v3)" This reverts commit c10983e14e8f5d7c8dab0415e0cb7fe8d10aa9e3. This commit is not meant for drm-misc-next-fixes, and was accidentally cherry picked over. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
#
c10983e1 |
|
20-Jan-2021 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: Job timeout handler returns status (v3) This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. v2: Use enum as the status of a driver's job timeout callback method. v3: Return scheduler/device information, rather than task information. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Eric Anholt <eric@anholt.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Steven Price <steven.price@arm.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/415095/ (cherry picked from commit a6a1f036c74e3d2a3a56b3140492c7c3ecb879f3) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
#
c365d304 |
|
09-Dec-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/sched: Add missing structure comment Add a missing structure comment for the recently added @list member. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Christian König <christian.koenig@amd.com> Fixes: 8935ff00e3b1 ("drm/scheduler: "node" --> "list"") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/406546/
|
#
6efa4b46 |
|
03-Dec-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
gpu/drm: ring_mirror_list --> pending_list Rename "ring_mirror_list" to "pending_list", to describe what something is, not what it does, how it's used, or how the hardware implements it. This also abstracts the actual hardware implementation, i.e. how the low-level driver communicates with the device it drives, ring, CAM, etc., shouldn't be exposed to DRM. The pending_list keeps jobs submitted, which are out of our control. Usually this means they are pending execution status in hardware, but the latter definition is a more general (inclusive) definition. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/405573/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
8935ff00 |
|
03-Dec-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: "node" --> "list" Rename "node" to "list" in struct drm_sched_job, in order to make it consistent with what we see being used throughout gpu_scheduler.h, for instance in struct drm_sched_entity, as well as the rest of DRM and the kernel. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/403515/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
9af5e21d |
|
11-Aug-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: Remove priority macro INVALID (v2) Remove DRM_SCHED_PRIORITY_INVALID. We no longer carry around an invalid priority and cut it off at the source. Backwards compatibility behaviour of AMDGPU CTX IOCTL passing in garbage for context priority from user space and then mapping that to DRM_SCHED_PRIORITY_NORMAL is preserved. v2: Revert "res" --> "r" and "prio" --> "priority". Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
e2d732fd |
|
11-Aug-2020 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/scheduler: Scheduler priority fixes (v2) Remove DRM_SCHED_PRIORITY_LOW, as it was used in only one place. Rename and separate by a line DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT as it represents a (total) count of said priorities and it is used as such in loops throughout the code. (0-based indexing is the the count number.) Remove redundant word HIGH in priority names, and rename *KERNEL* to *HIGH*, as it really means that, high. v2: Add back KERNEL and remove SW and HW, in lieu of a single HIGH between NORMAL and KERNEL. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d41a39dd |
|
25-Jun-2020 |
Nirmoy Das <nirmoy.aiemd@gmail.com> |
drm/scheduler: improve job distribution with multiple queues This patch uses score to select a new drm scheduler for better loadbalance between multiple drm schedulers instead of num_jobs. Below are test results after running amdgpu_test for ~10 times. Before this patch: sched_name num of many times it got schedule ========= ================================== sdma0 1463 sdma1 198 comp_1.0.1 280 After this patch: sched_name num of many times it got schedule ========= ================================== sdma0 925 sdma1 928 comp_1.0.1 177 comp_1.1.1 44 comp_1.2.1 43 comp_1.3.1 44 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/373000/ Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
fa3d55a1 |
|
28-Mar-2020 |
Sam Ravnborg <sam@ravnborg.org> |
drm/sched: fix kernel-doc in gpu_scheduler.h Fix following warning: gpu_scheduler.h:103: warning: Function parameter or member 'priority' not described in 'drm_sched_entity' Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200328132025.19910-4-sam@ravnborg.org
|
#
ec2edcc2 |
|
13-Mar-2020 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/sched: implement and export drm_sched_pick_best Remove drm_sched_entity_get_free_sched() and use the logic of picking the least loaded drm scheduler from a drm scheduler list to implement drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so that it can be utilized by other drm drivers. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
d164bebb |
|
11-Mar-2020 |
changzhu <Changfeng.Zhu@amd.com> |
Revert "drm/scheduler: improve job distribution with multiple queues" It needs to revert this patch to avoid amdgpu_test compute hang problem on picasso. This reverts commit 56822db194232c089601728d68ed078dccb97f8b. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
b37aced3 |
|
27-Feb-2020 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/scheduler: implement a function to modify sched list Implement drm_sched_entity_modify_sched() which modifies existing sched_list with a different one. This is going to be helpful when userspace changes priority of a ctx/entity then the driver can switch to the corresponding HW scheduler list for that priority. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
2639f453 |
|
22-Jan-2020 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/amdgpu: fix doc by clarifying sched_list definition expand sched_list definition for better understanding. Also fix a typo atleast -> at least Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
9e3e90c5 |
|
14-Jan-2020 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/scheduler: fix documentation by replacing rq_list with sched_list This also replaces old artifacts with a correct one in drm_sched_entity_init() declaration Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
56822db1 |
|
15-Jan-2020 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/scheduler: improve job distribution with multiple queues This patch uses score based logic to select a new rq for better loadbalance between multiple rq/scheds instead of num_jobs. Below are test results after running amdgpu_test from mesa drm Before this patch: sched_name num of many times it got scheduled ========= ================================== sdma0 314 sdma1 32 comp_1.0.0 56 comp_1.0.1 0 comp_1.1.0 0 comp_1.1.1 0 comp_1.2.0 0 comp_1.2.1 0 comp_1.3.0 0 comp_1.3.1 0 After this patch: sched_name num of many times it got scheduled ========= ================================== sdma0 216 sdma1 185 comp_1.0.0 39 comp_1.0.1 9 comp_1.1.0 12 comp_1.1.1 0 comp_1.2.0 12 comp_1.2.1 0 comp_1.3.0 12 comp_1.3.1 0 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
b3ac1766 |
|
05-Dec-2019 |
Nirmoy Das <nirmoy.das@amd.com> |
drm/scheduler: rework entity creation Entity currently keeps a copy of run_queue list and modify it in drm_sched_entity_set_priority(). Entities shouldn't modify run_queue list. Use drm_gpu_scheduler list instead of drm_sched_rq list in drm_sched_entity struct. In this way we can select a runqueue based on entity/ctx's priority for a drm scheduler. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
dc10218d |
|
07-Nov-2019 |
Stephen Rothwell <sfr@canb.auug.org.au> |
drm/sched: struct completion requires linux/completion.h inclusion Fixes: 83a7772ba223 ("drm/sched: Use completion to wait for sched->thread idle v2.") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
83a7772b |
|
04-Nov-2019 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Use completion to wait for sched->thread idle v2. Removes thread park/unpark hack from drm_sched_entity_fini and by this fixes reactivation of scheduler thread while the thread is supposed to be stopped. v2: Per sched entity completion. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
a5343b8a |
|
18-Apr-2019 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/scheduler: Add flag to hint the release of guilty job. Problem: Sched thread's cleanup function races against TO handler and removes the guilty job from mirror list and we have no way of differentiating if the job was removed from within the TO handler or from the sched thread's clean-up function. Fix: Add a flag to scheduler to hint the TO handler that the guilty job needs to be explicitly released. v2: whitespace fix Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-5-git-send-email-andrey.grodzovsky@amd.com
|
#
5918045c |
|
18-Apr-2019 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: rework job destruction We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. v4: Move guilty job free into sched code. v5: Move sched->hw_rq_count to drm_sched_start to account for counter decrement in drm_sched_stop even when we don't call resubmit jobs if guily job did signal. v6: remove unused variable Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
|
#
3741540e |
|
05-Dec-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Rework HW fence processing. Expedite job deletion from ring mirror list to the HW fence signal callback instead from finish_work, together with waiting for all such fences to signal in drm_sched_stop we garantee that already signaled job will not be processed twice. Remove the sched finish fence callback and just submit finish_work directly from the HW fence callback. v2: Fix comments. v3: Attach hw fence cb to sched_job v5: Rebase Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
222b5f04 |
|
04-Dec-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Refactor ring mirror list handling. Decauple sched threads stop and start and ring mirror list handling from the policy of what to do about the guilty jobs. When stoppping the sched thread and detaching sched fences from non signaled HW fenes wait for all signaled HW fences to complete before rerunning the jobs. v2: Fix resubmission of guilty job into HW after refactoring. v4: Full restart for all the jobs, not only from guilty ring. Extract karma increase into standalone function. v5: Rework waiting for signaled jobs without relying on the job struct itself as those might already be freed for non 'guilty' job's schedulers. Expose karma increase to drivers. v6: Use list_for_each_entry_safe_continue and drm_sched_process_job in case fence already signaled. Call drm_sched_increase_karma only once for amdgpu and add documentation. v7: Wait only for the latest job's fence. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1db8c142 |
|
29-Nov-2018 |
Sharat Masetty <smasetty@codeaurora.org> |
drm/scheduler: Add drm_sched_suspend/resume_timeout() This patch adds two new functions to help client drivers suspend and resume the scheduler job timeout. This can be useful in cases where the hardware has preemption support enabled. Using this, it is possible to have the timeout active only for the ring which is active on the ringbuffer. This patch also makes the job_list_lock IRQ safe. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
26efecf9 |
|
29-Oct-2018 |
Sharat Masetty <smasetty@codeaurora.org> |
drm/scheduler: Add drm_sched_job_cleanup This patch adds a new API to clean up the scheduler job resources. This is primarliy needed in cases the job was created but was not queued to the scheduler queue. Additionally with this change, the layer which creates the scheduler job also gets to free up the job's resources and this entails moving the dma_fence_put(finished_fence) to the drivers ops free handler routines. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
faf6e1a8 |
|
17-Oct-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/sched: Add boolean to mark if sched is ready to work v5 Problem: A particular scheduler may become unsuable (underlying HW) after some event (e.g. GPU reset). If it's later chosen by the get free sched. policy a command will fail to be submitted. Fix: Add a driver specific callback to report the sched status so rq with bad sched can be avoided in favor of working one or none in which case job init will fail. v2: Switch from driver callback to flag in scheduler. v3: rebase v4: Remove ready paramter from drm_sched_init, set uncoditionally to true once init done. v5: fix missed change in v3d in v4 (Alex) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8fe159b0 |
|
12-Oct-2018 |
Christian König <christian.koenig@amd.com> |
drm/sched: add drm_sched_fault Add a helper to immediately start timeout handling in case of a hardware fault. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
6a962430 |
|
25-Sep-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: remove timeout work_struct from drm_sched_job (v3) having a delayed work item per job is redundant as we only need one per scheduler to track the time out the currently executing job. v2: the first element of the ring mirror list is the currently executing job so we don't need a additional variable for it v3: squash in fixes for v3d and etnaviv Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
62347a33 |
|
17-Aug-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/scheduler: Add stopped flag to drm_sched_entity The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removewd from there by another thread during drm_sched_entity_flush. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
620e762f |
|
06-Aug-2018 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: move entity handling into separate file This is complex enough on it's own. Move it into a separate C file. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
7febe4bf |
|
01-Aug-2018 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: fix setting the priorty for entities (v2) Since we now deal with multiple rq we need to update all of them, not just the current one. v2: Trivial: Removed unused variable (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
249a07c0 |
|
01-Aug-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: add counter for total jobs in scheduler To keep track of the scheduler load. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
ac0a6cf1 |
|
01-Aug-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: add a list of run queues to the entity These are the potential run queues on which the jobs from this entity can be scheduled. We will use this to do load balancing. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
43bce41c |
|
26-Jul-2018 |
Christian König <christian.koenig@amd.com> |
drm/scheduler: only kill entity if last user is killed v2 Note which task is using the entity and only kill it if the last user of the entity is killed. This should prevent problems when entities are leaked to child processes. v2: add missing kernel doc Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
068c3304 |
|
20-Jul-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: remove sched field from the entity The scheduler of the entity is decided by the run queue on which it is queued. This patch avoids us the effort required to maintain a sync between rq and sched field when we start shifting entites among different rqs. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
cdc50176 |
|
20-Jul-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: modify API to avoid redundancy entity has a scheduler field and we don't need the sched argument in any of the functions where entity is provided. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
aa16b6c6 |
|
13-Jul-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: modify args of drm_sched_entity_init replace run queue by a list of run queues and remove the sched arg as that is part of run queue itself Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8dc9fbbf |
|
13-Jul-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: add a pointer to scheduler in the rq This patch is in preparation for a better load balancing in scheduler. It allows us to associate entities with the run queues instead of binding them to a scheduler. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
180fc134 |
|
04-Jun-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/scheduler: Rename cleanup functions v2. Everything in the flush code path (i.e. waiting for SW queue to become empty) names with *_flush() and everything in the release code path names *_fini() This patch also effect the amdgpu and etnaviv drivers which use those functions. v2: Also pplay the change to vd3. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
741f01e6 |
|
30-May-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/scheduler: Avoid using wait_event_killable for dying process (V4) Dying process might be blocked from receiving any more signals so avoid using it. Also retire enity->fini_status and just check the SW queue, if it's not empty do the fallback cleanup. Also handle entity->last_scheduled == NULL use case which happens when HW ring is already hangged whem a new entity tried to enqeue jobs. v2: Return the remaining timeout and use that as parameter for the next call. This way when we need to cleanup multiple queues we don't wait for the entire TO period for each queue but rather in total. Styling comments. Rebase. v3: Update types from unsigned to long. Work with jiffies instead of ms. Return 0 when TO expires. Rebase. v4: Remove unnecessary timeout calculation. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
2d33948e |
|
28-May-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: add documentation convert existing raw comments into kernel-doc format as well as add new documentation v2: reword the overview Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel@ffwll.ch>
|
#
563e1e66 |
|
15-May-2018 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/scheduler: Remove obsolete spinlock. This spinlock is superfluous, any call to drm_sched_entity_push_job should already be under a lock together with matching drm_sched_job_init to match the order of insertion into queue with job's fence seqence number. v2: Improve patch description. Add functions documentation describing the locking considerations Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8344c53f |
|
29-Mar-2018 |
Nayan Deshmukh <nayan26deshmukh@gmail.com> |
drm/scheduler: remove unused parameter this patch also effect the amdgpu and etnaviv drivers which use the function drm_sched_entity_init Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
8ee3a52e |
|
15-Apr-2018 |
Emily Deng <Emily.Deng@amd.com> |
drm/gpu-sched: fix force APP kill hang(v4) issue: there are VMC page fault occurred if force APP kill during 3dmark test, the cause is in entity_fini we manually signal all those jobs in entity's queue which confuse the sync/dep mechanism: 1)page fault occurred in sdma's clear job which operate on shadow buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release since the fence in its reservation was fake signaled by entity_fini() under the case of SIGKILL received. 2)page fault occurred in gfx' job because during the lifetime of gfx job we manually fake signal all jobs from its entity in entity_fini(), thus the unmapping/clear PTE job depend on those result fence is satisfied and sdma start clearing the PTE and lead to GFX page fault. fix: 1)should at least wait all jobs already scheduled complete in entity_fini() if SIGKILL is the case. 2)if a fence signaled and try to clear some entity's dependency, should set this entity guilty to prevent its job really run since the dependency is fake signaled. v2: splitting drm_sched_entity_fini() into two functions: 1)The first one is does the waiting, removes the entity from the runqueue and returns an error when the process was killed. 2)The second one then goes over the entity, install it as completion signal for the remaining jobs and signals all jobs with an error code. v3: 1)Replace the fini1 and fini2 with better name 2)Call the first part before the VM teardown in amdgpu_driver_postclose_kms() and the second part after the VM teardown 3)Keep the original function drm_sched_entity_fini to refine the code. v4: 1)Rename entity->finished to entity->last_scheduled; 2)Rename drm_sched_entity_fini_job_cb() to drm_sched_entity_kill_jobs_cb(); 3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT; 4)Replace the type of entity->fini_status with "int"; 5)Remove the check about entity->finished. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1a61ee07 |
|
04-Apr-2018 |
Eric Anholt <eric@anholt.net> |
drm/sched: Extend the documentation. These comments answer all the questions I had for myself when implementing a driver using the GPU scheduler. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
4983e48c |
|
06-Dec-2017 |
Lucas Stach <l.stach@pengutronix.de> |
drm/sched: move fence slab handling to module init/exit This is the only part of the scheduler which must not be called from different drivers. Move it to module init/exit so it is done a single time when loading the scheduler. Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
#
1b1f42d8 |
|
06-Dec-2017 |
Lucas Stach <l.stach@pengutronix.de> |
drm: move amd_gpu_scheduler into common location This moves and renames the AMDGPU scheduler to a common location in DRM in order to facilitate re-use by other drivers. This is mostly a straight forward rename with no code changes. One notable exception is the function to_drm_sched_fence(), which is no longer a inline header function to avoid the need to export the drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures. Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|