#
9a0c32d6 |
|
01-Feb-2024 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: don't fini scheduler if not initialized nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call their corresponding *_fini() counterpart. This can lead to nouveau_sched_fini() being called without struct nouveau_sched ever being initialized in the first place. Instead of embedding struct nouveau_sched into struct nouveau_cli and struct nouveau_chan_abi16, allocate struct nouveau_sched separately, such that we can check for the corresponding pointer to be NULL in the particular *_fini() functions. It makes sense to allocate struct nouveau_sched separately anyway, since in a subsequent commit we can also avoid to allocate a struct nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the VM_BIND uAPI has been disabled due to the legacy uAPI being used. Fixes: 5f03a507b29e ("drm/nouveau: implement 1:1 scheduler - entity relationship") Reported-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Closes: https://lore.kernel.org/nouveau/20240131213917.1545604-1-ttabi@nvidia.com/ Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202000606.3526-1-dakr@redhat.com
|
#
19b4c60c |
|
25-Nov-2023 |
Luben Tuikov <ltuikov89@gmail.com> |
drm/sched: Fix compilation issues with DRM priority rename Fix compilation issues with DRM scheduler priority rename MIN to LOW. Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311252109.WgbJsSkG-lkp@intel.com/ Cc: Danilo Krummrich <dakr@redhat.com> Cc: Frank Binns <frank.binns@imgtec.com> Cc: Donald Robson <donald.robson@imgtec.com> Cc: Matt Coster <matt.coster@imgtec.com> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org> Fixes: fe375c74806dbd ("drm/sched: Rename priority MIN to LOW") Fixes: 38f922a563aac3 ("drm/sched: Reverse run-queue priority enumeration") Fixes: 5f03a507b29e44 ("drm/nouveau: implement 1:1 scheduler - entity relationship") Link: https://patchwork.freedesktop.org/patch/msgid/20231125192246.87268-2-ltuikov89@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/7429262c-6dea-4dcc-bf7e-54d2277dabf1@amd.com
|
#
46990918 |
|
13-Nov-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: enable dynamic job-flow control Make use of the scheduler's credit limit and scheduler job's credit count to account for the actual size of a job, such that we fill up the ring efficiently. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231114002728.3491-2-dakr@redhat.com
|
#
5f03a507 |
|
13-Nov-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: implement 1:1 scheduler - entity relationship Recent patches to the DRM scheduler [1][2] allow for a variable number of run-queues and add support for (shared) workqueues rather than dedicated kthreads per scheduler. This allows us to create a 1:1 relationship between a GPU scheduler and a scheduler entity, in order to properly support firmware schedulers being able to handle an arbitrary amount of dynamically allocated command ring buffers. This perfectly matches Nouveau's needs, hence make use of it. Topology wise we create one scheduler instance per client (handling VM_BIND jobs) and one scheduler instance per channel (handling EXEC jobs). All channel scheduler instances share a workqueue, but every client scheduler instance has a dedicated workqueue. The latter is required to ensure that for VM_BIND job's free_job() work and run_job() work can always run concurrently and hence, free_job() work can never stall run_job() work. For EXEC jobs we don't have this requirement, since EXEC job's free_job() does not require to take any locks which indirectly or directly are held for allocations elsewhere. [1] https://lore.kernel.org/all/8f53f7ef-7621-4f0b-bdef-d8d20bc497ff@redhat.com/T/ [2] https://lore.kernel.org/all/20231031032439.1558703-1-matthew.brost@intel.com/T/ Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231114002728.3491-1-dakr@redhat.com
|
#
014f831a |
|
13-Nov-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: use GPUVM common infrastructure GPUVM provides common infrastructure to track external and evicted GEM objects as well as locking and validation helpers. Especially external and evicted object tracking is a huge improvement compared to the current brute force approach of iterating all mappings in order to lock and validate the GPUVM's GEM objects. Hence, make us of it. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231113221202.7203-1-dakr@redhat.com
|
#
a78422e9 |
|
09-Nov-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/sched: implement dynamic job-flow control Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
|
#
a6149f03 |
|
30-Oct-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/sched: Convert drm scheduler to use a work queue rather than kthread In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
|
#
56e44960 |
|
14-Oct-2023 |
Luben Tuikov <luben.tuikov@amd.com> |
drm/sched: Convert the GPU scheduler to variable number of run-queues The GPU scheduler has now a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) per each GPU scheduler it creates. Note, that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
|
#
7baf6055 |
|
10-Aug-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: sched: avoid job races between entities If a sched job depends on a dma-fence from a job from the same GPU scheduler instance, but a different scheduler entity, the GPU scheduler does only wait for the particular job to be scheduled, rather than for the job to fully complete. This is due to the GPU scheduler assuming that there is a scheduler instance per ring. However, the current implementation, in order to avoid arbitrary amounts of kthreads, has a single scheduler instance while scheduler entities represent rings. As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all out-fences in order to force the scheduler to wait for full job completion for dependent jobs from different entities and same scheduler instance. There is some work in progress [1] to address the issues of firmware schedulers; once it is in-tree the scheduler topology in Nouveau should be re-worked accordingly. [1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/ Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collaboralcom> Link: https://patchwork.freedesktop.org/patch/msgid/20230811010632.2473-1-dakr@redhat.com
|
#
31499b01 |
|
16-Sep-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: sched: fix leaking memory of timedout job Always stop and re-start the scheduler in order to let the scheduler free up the timedout job in case it got signaled. In case of exec jobs the job type specific callback will take care to signal all fences and tear down the channel. Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230916162835.5719-1-dakr@redhat.com
|
#
6cdcc65f |
|
10-Aug-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: sched: avoid job races between entities If a sched job depends on a dma-fence from a job from the same GPU scheduler instance, but a different scheduler entity, the GPU scheduler does only wait for the particular job to be scheduled, rather than for the job to fully complete. This is due to the GPU scheduler assuming that there is a scheduler instance per ring. However, the current implementation, in order to avoid arbitrary amounts of kthreads, has a single scheduler instance while scheduler entities represent rings. As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all out-fences in order to force the scheduler to wait for full job completion for dependent jobs from different entities and same scheduler instance. There is some work in progress [1] to address the issues of firmware schedulers; once it is in-tree the scheduler topology in Nouveau should be re-worked accordingly. [1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/ Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collaboralcom> Link: https://patchwork.freedesktop.org/patch/msgid/20230811010632.2473-1-dakr@redhat.com
|
#
e05f3938 |
|
07-Aug-2023 |
Faith Ekstrand <faith@gfxstrand.net> |
drm/nouveau/sched: Don't pass user flags to drm_syncobj_find_fence() The flags field in drm_syncobj_find_fence() takes SYNCOBJ_WAIT flags from the syncobj UAPI whereas sync->flags is from the nouveau UAPI. What we actually want is 0 flags which tells it to just try to find the fence and then return without waiting. Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Cc: Danilo Krummrich <dakr@redhat.com> Cc: Dave Airlie <airlied@redhat.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230807234210.175968-1-faith.ekstrand@collabora.com
|
#
b88baab8 |
|
04-Aug-2023 |
Danilo Krummrich <dakr@redhat.com> |
drm/nouveau: implement new VM_BIND uAPI This commit provides the implementation for the new uapi motivated by the Vulkan API. It allows user mode drivers (UMDs) to: 1) Initialize a GPU virtual address (VA) space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA space managed by the kernel and userspace, respectively. 2) Allocate and free a VA space region as well as bind and unbind memory to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. UMDs can request the named operations to be processed either synchronously or asynchronously. It supports DRM syncobjs (incl. timelines) as synchronization mechanism. The management of the GPU VA mappings is implemented with the DRM GPU VA manager. 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The execution happens asynchronously. It supports DRM syncobj (incl. timelines) as synchronization mechanism. DRM GEM object locking is handled with drm_exec. Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM GPU scheduler for the asynchronous paths. Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230804182406.5222-12-dakr@redhat.com
|