#
9cb46b31 |
|
01-Apr-2024 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe_migrate: Cast to output precision before multiplying operands Addressing potential overflow in result of multiplication of two lower precision (u32) operands before widening it to higher precision (u64). -v2 Fix commit message and description. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401175300.3823653-1-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 34820967ae7b45411f8f4f737c2d63b0c608e0d7) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
#
45cfade3 |
|
26-Feb-2024 |
Arnd Bergmann <arnd@arndb.de> |
drm/xe/xe2: fix 64-bit division in pte_update_size This function does not build on 32-bit targets when the compiler fails to reduce DIV_ROUND_UP() into a shift: ld.lld: error: undefined symbol: __aeabi_uldivmod >>> referenced by xe_migrate.c >>> drivers/gpu/drm/xe/xe_migrate.o:(pte_update_size) in archive vmlinux.a There are two instances in this function. Change the first to use an open-coded shift with the same behavior, and the second one to a 32-bit calculation, which is sufficient here as the size is never more than 2^32 pages (16TB). Fixes: 237412e45390 ("drm/xe: Enable 32bits build") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240226124736.1272949-3-arnd@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 1408784b599927d2f361bac6dc5170d2ee275f17) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
#
a24d9099 |
|
21-Feb-2024 |
Dafna Hirschfeld <dhirschfeld@habana.ai> |
drm/xe: Do not include current dir for generated/xe_wa_oob.h The generated file 'generated/xe_wa_oob.h' is included using: "generated/xe_wa_oob.h" which first look inside the source code. But the file resides in the build directory and should therefore be included using: <generated/xe_wa_oob.h> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai
|
#
72f86ed3 |
|
01-Feb-2024 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool For integrated devices we need to map both mem.kernel_bb_pool and usm.bb_pool to be able to run batches from both pools. Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms") Tested-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com
|
#
a856b67a |
|
31-Jan-2024 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Take a reference in xe_exec_queue_last_fence_get() Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference counting underflow bug VM bind and unbind. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com
|
#
348769d1 |
|
24-Jan-2024 |
Fei Yang <fei.yang@intel.com> |
drm/xe: correct the assertion for number of PTEs While one MI_STORE_DATA_IMM can take no more than 0x1fe qwords, the size of the pgtable can be 512 entries. Fixes: 43d48379c939 ("drm/xe: correct the calculation of remaining size") Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240125065245.1204731-2-fei.yang@intel.com
|
#
6a028675 |
|
18-Jan-2024 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Use XE_CACHE_WB pat index The pat table entry associated with XE_CACHE_WB is coherent whereas XE_CACHE_NONE is non coherent. Migration expects the coherency with cpu therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to flat ccs region the issue is not related to coherency with cpu. The hardware expects the pat index associated with GPUVA for indirect access to be compression enabled hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 65ef8dbad1db ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com
|
#
43d48379 |
|
16-Jan-2024 |
Fei Yang <fei.yang@intel.com> |
drm/xe: correct the calculation of remaining size In function write_pgtable, the calculation of chunk in the do-while loop is wrong, we should always compare against remaining size instead of the total size update->qwords. Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240116223709.652585-2-fei.yang@intel.com
|
#
ca630876 |
|
11-Jan-2024 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe/migrate: Cap PTEs written by MI_STORE_DATA_IMM to 510 Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is considered the largest legal value accepted. Since that instruction field is always encoded in (val-2) format, this translates to 0x400 dwords for the true maximum length of the instruction. Subtracting the instruction header (1 dword) and address (2 dwords), that leaves 0x3FD dwords (i.e., 0x1FE qwords) for PTE values. Bspec: 60246, 45753 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20240111220238.1467572-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
#
ef51d754 |
|
10-Jan-2024 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/migrate: Fix CCS copy for small VRAM copy chunks Since the migrate code is using the identity map for addressing VRAM, copy chunks may become as small as 64K if the VRAM resource is fragmented. However, a chunk size smaller that 1MiB may lead to the *next* chunk's offset into the CCS metadata backup memory may not be page-aligned, and the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could, the current code doesn't handle the offset calculaton correctly. To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If the remaining data to copy is smaller than that, that's not a problem, so use the remaining size. If the VRAM copy cunk becomes fragmented due to the size alignment restriction, don't use the identity map, but instead emit PTEs into the page-table like we do for system memory. v2: - Rebase v3: - Future proof somewhat by taking into account the real data size to flat CCS metadata size ratio. (Matt Roper) - Invert a couple of if-statements for better readability. - Fix support for 4K-granularity VRAM sizes. (Tested on DG1). v4: - Fix up code comments - Fix debug printout format typo. v5: - Add a Fixes: tag. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: e89b384cde62 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com
|
#
25ce7c50 |
|
10-Jan-2024 |
Brian Welty <brian.welty@intel.com> |
drm/xe: Finish refactoring of exec_queue_create Setting of exec_queue user extensions is moved from the end of the ioctl function earlier, into __xe_exec_queue_alloc(). This fixes bug in that the USM attributes for access counters were being applied too late, and effectively were ignored. However, in order to apply user extensions this early, we can no longer call q->ops functions. Instead, make it more efficient. The user extension functions can simply update the q->sched_props values and they will be applied by the backend during q->ops->init(). v2: minor changes for readability (Matt) Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
|
#
a8004af3 |
|
05-Jan-2024 |
Brian Welty <brian.welty@intel.com> |
drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to initial value by the backend, but is now set within __xe_exec_queue_create(). Fixes: b4eecedc75c1 ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
|
#
3aa3c5c2 |
|
01-Feb-2024 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool For integrated devices we need to map both mem.kernel_bb_pool and usm.bb_pool to be able to run batches from both pools. Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms") Tested-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com (cherry picked from commit 72f86ed3c88933d6fa09b036de93621ea71097a7) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
fc29b6d5 |
|
31-Jan-2024 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Take a reference in xe_exec_queue_last_fence_get() Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference counting underflow bug VM bind and unbind. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com (cherry picked from commit a856b67a84169e065ebbeee50258936b1eacc9eb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
c0e2508c |
|
18-Jan-2024 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Use XE_CACHE_WB pat index The pat table entry associated with XE_CACHE_WB is coherent whereas XE_CACHE_NONE is non coherent. Migration expects the coherency with cpu therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to flat ccs region the issue is not related to coherency with cpu. The hardware expects the pat index associated with GPUVA for indirect access to be compression enabled hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 65ef8dbad1db ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com (cherry picked from commit 6a02867560f77328ae5637b70b06704b140aafa6) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
7425c43c |
|
10-Jan-2024 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/migrate: Fix CCS copy for small VRAM copy chunks Since the migrate code is using the identity map for addressing VRAM, copy chunks may become as small as 64K if the VRAM resource is fragmented. However, a chunk size smaller that 1MiB may lead to the *next* chunk's offset into the CCS metadata backup memory may not be page-aligned, and the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could, the current code doesn't handle the offset calculaton correctly. To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If the remaining data to copy is smaller than that, that's not a problem, so use the remaining size. If the VRAM copy cunk becomes fragmented due to the size alignment restriction, don't use the identity map, but instead emit PTEs into the page-table like we do for system memory. v2: - Rebase v3: - Future proof somewhat by taking into account the real data size to flat CCS metadata size ratio. (Matt Roper) - Invert a couple of if-statements for better readability. - Fix support for 4K-granularity VRAM sizes. (Tested on DG1). v4: - Fix up code comments - Fix debug printout format typo. v5: - Add a Fixes: tag. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: e89b384cde62 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com (cherry picked from commit ef51d7542d143f3fd9a48d4e2c307563661668aa) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
19c02225 |
|
05-Jan-2024 |
Brian Welty <brian.welty@intel.com> |
drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to initial value by the backend, but is now set within __xe_exec_queue_create(). Fixes: b4eecedc75c1 ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> (cherry picked from commit a8004af338f6b3319476ecbed63ea49bf393fc1f) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
266c8588 |
|
12-Dec-2023 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Handle flat ccs move for igfx. - Clear flat ccs during user bo creation. - copy ccs meta data between flat ccs and bo during eviction and restore. - Add a bool field ccs_cleared in bo, true means ccs region of bo is already cleared. v2: - Rebase. v3: - Maintain order of xe_bo_move_notify for ttm_bo_type_sg. v4: - xe_migrate_copy can be used to copy src to dst bo on igfx too. Add a bool which handles only ccs metadata copy. v5: - on dgfx ccs should be cleared even if the bo is not compression enabled. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
65ef8dba |
|
12-Dec-2023 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Update emit_pte to use compression enabled PAT index For indirect accessed buffer use compression enabled PAT index. v2: - Fix parameter name. v3: - use a relevant define instead of fix number. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
09427526 |
|
05-Dec-2023 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Update chunk size for each iteration of ccs copy In xe2 platform XY_CTRL_SURF_COPY_BLT can handle ccs copy for max of 1024 main surface pages. v2: - Use better logic to determine chunk size (Matt/Thomas) v3: - use function instead of macro(Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9116eabb |
|
12-Dec-2023 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe_migrate: Use NULL 1G PTE mapped at 255GiB VA for ccs clear Get rid of the cleared bo, instead use null 1G PTE mapped at 255GiB offset, this can be used for both dgfx and igfx. v2: - Remove xe_migrate::cleared_bo. - Add a comment for NULL mapping.(Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9cca4902 |
|
12-Dec-2023 |
Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> |
drm/xe/xe2: Updates on XY_CTRL_SURF_COPY_BLT - The XY_CTRL_SURF_COPY_BLT instruction operating on ccs data expects size in pages of main memory for which CCS data should be copied. - The bitfield representing copy size in XY_CTRL_SURF_COPY_BLT has shifted one bit higher in the instruction. v2: - Fix the num_pages for ccs size calculation. - Address nits (Thomas) v3: - Use FIELD_PREP and FIELD_FIT instead of shifts and numbers.(Matt) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
eb9702ad |
|
05-Dec-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Allow num_batch_buffer / num_binds == 0 in IOCTLs The idea being out-syncs can signal indicating all previous operations on the bind queue are complete. An example use case of this would be support for implementing vkQueueWaitIdle easily. All in-syncs are waited on before signaling out-syncs. This is implemented by forming a composite software fence of in-syncs and installing this fence in the out-syncs and exec queue last fence slot. The last fence must be added as a dependency for jobs on user exec queues as it is possible for the last fence to be a composite software fence (unordered, ioctl with zero bb or binds) rather than hardware fence (ordered, previous job on queue). Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
5a92da34 |
|
05-Dec-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Rename info.supports_* to info.has_* Rename supports_mmio_ext and supports_usm to use a has_ prefix so the flags are grouped together. This settles on just one variant for positive info matching ("has_") and one for negative ("skip_"). Also make sure the has_* flags are grouped together in xe_pci.c. Reviewed-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231205145235.2114761-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a682b6a4 |
|
20-Nov-2023 |
Brian Welty <brian.welty@intel.com> |
drm/xe: Support device page faults on integrated platforms Update xe_migrate_prepare_vm() to use the usm batch buffer even for servicing device page faults on integrated platforms. And as we have no VRAM on integrated platforms, device pagefault handler should not attempt to migrate into VRAM. LNL is first integrated platform to support device pagefaults. Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a21fe5ee |
|
21-Nov-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/bo: Rename xe_bo_get_sg() to xe_bo_sg() Using "get" typically refers to obtaining a refcount, which we don't do here so rename to xe_bo_sg(). Suggested-by: Ohad Sharabi <osharabi@habana.ai> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/946 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Ohad Sharabi<osharabi@habana.ai> Link: https://patchwork.freedesktop.org/patch/msgid/20231122110359.4087-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a667cf56 |
|
25-Oct-2023 |
Matthew Auld <matthew.auld@intel.com> |
drm/xe/bo: consider dma-resv fences for clear job There could be active fences already in the dma-resv for the object prior to clearing. Make sure to input them as dependencies for the clear job. v2 (Matt B): - We can use USAGE_KERNEL here, since it's only the move fences we care about here. Also add a comment. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
4202dd9f |
|
25-Oct-2023 |
Matthew Auld <matthew.auld@intel.com> |
drm/xe/migrate: fix MI_ARB_ON_OFF usage Spec says: "This is a privileged command; it will not be effective (will be converted to a no-op) if executed from within a non-privileged batch buffer." However here it looks like we are just emitting it inside some bb which was jumped to via the ppGTT, which should be considered a non-privileged address space. It looks like we just need some way of preventing things like the emit_pte() and later copy/clear being preempted in-between so rather just emit directly in the ring for migration jobs. Bspec: 45716 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0134f130 |
|
16-Oct-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Extract MI_* instructions to their own header Extracting the common MI_* instructions that can be used with any engine to their own header will make it easier as we add additional engine instructions in upcoming patches. Also, since the majority of GPU instructions (both MI and non-MI) have a "length" field in bits 7:0 of the instruction header, a common define is added for that. Instruction-specific length fields are still defined for special case instructions that have larger/smaller length fields. v2: - Use "instr" instead of "inst" as the short form of "instruction" everywhere. (Lucas) - Include xe_reg_defs.h instead of the i915 compat header. (Lucas) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
14a1e6a4 |
|
16-Oct-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMM MI_STORE_DATA_IMM can store either dword values or qword values, and can store more than one value if the instruction's length field is large enough. Create explicit defines to specify the number of dwords/qwords to be stored, which will set the instruction length correctly and, if necessary, turn on the 'store qword' bit. While we're here, also replace an open-coded version of MI_STORE_DATA_IMM with the common macros. Bspec: 60246 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
e814389f |
|
06-Oct-2023 |
Matthew Auld <matthew.auld@intel.com> |
drm/xe: directly use pat_index for pte_encode In a future patch userspace will be able to directly set the pat_index as part of vm_bind. To support this we need to get away from using xe_cache_level in the low level routines and rather just use the pat_index directly. v2: Rebase v3: Some missed conversions, also prefer tile_to_xe() (Niranjana) v4: remove leftover const (Lucas) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
d9e85dd5 |
|
05-Oct-2023 |
David Kershner <david.kershner@intel.com> |
drm/xe/xe_migrate.c: Use DPA offset for page table entries. Device Physical Address (DPA) is the starting offset device memory. Update xe_migrate identity map base PTE entries to start at dpa_base instead of 0. The VM offset value should be 0 relative instead of DPA relative. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: "Michael J. Ruhl" <michael.j.ruhl@intel.com> Signed-off-by: David Kershner <david.kershner@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
30603b5b |
|
29-Sep-2023 |
Haridhar Kalvala <haridhar.kalvala@intel.com> |
drm/xe/xe2: Update MOCS fields in blitter instructions Xe2 changes or adds bits for mocs in a few BLT instructions: XY_CTRL_SURF_COPY_BLT, XY_FAST_COLOR_BLT, XY_FAST_COPY_BLT, and MEM_SET. Modify the code to deal with the new location. Unlike Xe1, the MOCS field in those instructions is only the MOCS index and not the Structure_MEMORY_OBJECT_CONTROL_STATE anymore. The pxp bit is now explicitly documented separately. Bspec: 57567,57566,57565,57562 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
4bdd8c2e |
|
29-Sep-2023 |
Haridhar Kalvala <haridhar.kalvala@intel.com> |
drm/xe/xe2: Set tile y type in XY_FAST_COPY_BLT to Tile4 Set bits 30 and 31 of XY_FAST_COPY_BLT's dword1 for XeHP and above. Destination or source being Y-Major is selected on dword0 and there's nothing to set on dword1. According to the bspec for Xe2, "Behavior is undefined when programmed the value 0". Also for XeHP, the only value allowed in those bits is 0b11, not being possible to select "Legacy Tile-Y" anymore, only the newer Tile4. So, unconditionally set those bits for graphics IP 12.50 and above. v2: Reword commit message and extend it to graphics version >= 12.50 (Matt Roper) Bspec: 57567 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c690f0e6 |
|
29-Sep-2023 |
Haridhar Kalvala <haridhar.kalvala@intel.com> |
drm/xe: Rename MEM_SET instruction PVC_MS_* doesn't reflect the real name of the instruction. Rename it to follow the name used in the bspec. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
2c0ac321 |
|
29-Sep-2023 |
Haridhar Kalvala <haridhar.kalvala@intel.com> |
drm/xe: Adjust mocs field mask definitions Instead of using xe_mocs_index_to_value(), simply define the bitmask with the shift left applied. This will make it easier to adapt to new platforms that simply use the index. This also fixes PVC bug in emit_clear_link_copy() where the MOCS was getting shifted both by PVC_MS_MOCS_INDEX_MASK definition and by the xe_moc_index_to_value function. Bspec: 44509 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
fcd75139 |
|
27-Sep-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Use pat_index to encode pde/pte Change the xelp_pte_encode() and xelp_pde_encode() functions to use the platform-dependent pat_index. The same function can be used for all platforms as they only need to encode the pat_index bits in the same pte/pde layout. For platforms that don't have the most significant bit, as long as they don't return a bogus index they should be fine. v2: Use the same logic to encode pde as it's compatible with previous logic, it's more future proof and also fixes the cache setting for PVC (Matt Roper) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
23c8495e |
|
27-Sep-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe/migrate: Do not hand-encode pte Instead of encoding the pte, call a new vfunc from xe_vm to handle that. The encoding may not be the same on every platform, so keeping it in one place helps to better support them. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0e5e77bd |
|
27-Sep-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Use vfunc for pte/pde ppgtt encoding Move the function to encode pte/pde to be vfuncs inside struct xe_vm. This will allow to easily extend to platforms that don't have a compatible encoding. v2: Fix kunit build Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
1951dad5 |
|
27-Sep-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Infer service copy functionality from engine list On platforms with multiple BCS engines (i.e., PVC and Xe2), not all BCS engines are created equal. The BCS0 engine is what the specs refer to as a "resource copy engine," which supports the platform's full set of copy/fill instructions. In contast, the non-BCS0 "service copy" engines are more streamlined and only support a subset of the GPU instructions supported by the resource copy engine. Platforms with both types of copy engines always support the MEM_COPY and MEM_SET instructions which can be used for simple copy and fill operations on either type of BCS engine. Since the simple MEM_SET instruction meets the needs of Xe's migrate code (and since the more elaborate XY_FAST_COLOR_BLT instruction isn't available to use on service copy engines), we always prefer to use MEM_SET for clearing buffers on our newer platforms. We've been using a 'has_link_copy_engine' feature flag to keep track of which platforms should use MEM_SET for fills. However a feature flag like this is unnecessary since we can already derive the presence of service copy engines (and in turn the MEM_SET instruction) just by looking at the platform's pre-fusing engine list. Utilizing the engine list for this also avoids mistakes like we've made on Xe2 where we forget to set the feature flag in the IP definition. For clarity, "service copy" is a general term that covers any blitter engines that support a limited subset of the overall blitter instruction set (in practice this is any non-BCS0 blitter engine). The "link copy engines" introduced on PVC and the "paging copy engine" present in Xe2 are both instances of service copy engines. v2: - Rewrite / expand the commit message. (Bala) - Fix checkpatch whitespace error. Bspec: 65019 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Link: https://lore.kernel.org/r/20230927205143.2695089-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c73acc1e |
|
12-Sep-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Use Xe assert macros instead of XE_WARN_ON macro The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
d00e9cc2 |
|
08-Sep-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/vm: Simplify and document xe_vm_lock() The xe_vm_lock() function was unnecessarily using ttm_eu_reserve_buffers(). Simplify and document the interface. v4: - Improve on xe_vm_lock() documentation (Matthew Brost) v5: - Rebase conflict. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a043fbab |
|
16-Aug-2023 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
drm/xe/pvc: Use fast copy engines as migrate engine on PVC Some copy hardware engine instances are faster than others on PVC. Use a virtual engine of these plus the reserved instance for the migrate engine on PVC. The idea being if a fast instance is available it will be used and the throughput of kernel copies, clears, and pagefault servicing will be higher. v2: Use OOB WA, use all copy engines if no WA is required Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
923e4238 |
|
22-Aug-2023 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/xe: split kernel vs permanent engine flags If an engine is only destroyed on driver unload, we can skip its clean-up steps with the GuC because the GuC is going to be tuned off as well, so it doesn't matter if we're in sync with it or not. Currently, we apply this optimization to all engines marked as kernel, but this stops us to supporting kernel engines that don't stick around until unload. To remove this limitation, add a separate flag to indicate if the engine is expected to only be destryed on driver unload and use that to trigger the optimzation. While at it, add a small comment to explain what each engine flag represents. v2: s/XE_BUG_ON/XE_WARN_ON, s/ENGINE/EXEC_QUEUE v3: rebased Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-3-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0887a2e7 |
|
11-Jul-2023 |
Oak Zeng <oak.zeng@intel.com> |
drm/xe: Make xe_mem_region struct Make a xe_mem_region structure which will be used in the coming patches. The new structure is used in both xe device level (xe->mem.vram) and xe_tile level (tile->vram). Make the definition of xe_mem_region.dpa_base to be the DPA base of this memory region and change codes according to this new definition. v1: - rename xe_mem_region.base to dpa_base per conversation with Mike Ruhl Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9b9529ce |
|
31-Jul-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Rename engine to exec_queue Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c22a4ed0 |
|
31-Jul-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Rename xe_engine.[ch] to xe_exec_queue.[ch] This is a preparation commit for a larger renaming of engine to exec queue. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
99fea682 |
|
27-Jul-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
b23ebae7 |
|
26-Jul-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Set PTE_DM bit for stolen on MTL Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(), and use it when encoding the PTE. In some places in the spec the PTE bit is called "Local Memory", abbreviated as LM, and in others it's called "Device Memory" (DM). Since we moved away from "Local Memory" and preferred the "vram" terminology, also rename the macros as DM to follow the name of the new function. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
937b4be7 |
|
26-Jul-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Decouple vram check from xe_bo_addr() The output arg is_vram in xe_bo_addr() is unused by several callers. It's also not what the function is mainly doing. Remove the argument and let the interested callers to call xe_bo_is_vram(). Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
621c1fbd |
|
26-Jul-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Remove vma arg from xe_pte_encode() All the callers pass a NULL vma, so the buffer is always the BO. Remove the argument and the side effects of dealing with it. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
fd84041d |
|
19-Jul-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Make bind engines safe We currently have a race between bind engines which can result in corrupted page tables leading to faults. A simple example: bind A 0x0000-0x1000, engine A, has unsatisfied in-fence bind B 0x1000-0x2000, engine B, no in-fences exec A uses 0x1000-0x2000 Bind B will pass bind A and exec A will fault. This occurs as bind A programs the root of the page table in a bind job which is held up by an in-fence. Bind B in this case just programs a leaf entry of the structure. To fix use range-fence utility to track cross bind engine conflicts. In the above example bind A would insert an dependency into the range-fence tree with a key of 0x0-0x7fffffffff, bind B would find that dependency and its bind job would scheduled behind the unsatisfied in-fence and bind A's job. Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
72e8d73b |
|
19-Jul-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Cleanup style warnings and errors Fix 6 errors and 20 warnings reported by checkpatch.pl. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0d39b6da |
|
18-Jul-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Normalize XE_VM_FLAG* names Rename XE_VM_FLAGS_64K to XE_VM_FLAG_64K to follow the other names and s/GT/TILE/ that got missed in commit 08dea7674533 ("drm/xe: Move migration from GT to tile"). Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230718193924.3084759-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
3e8e7ee6 |
|
17-Jul-2023 |
Francois Dugast <francois.dugast@intel.com> |
drm/xe: Cleanup style warnings Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by addressing those warnings types: LEADING_SPACE LINE_SPACING BRACES TRAILING_SEMICOLON CONSTANT_COMPARISON BLOCK_COMMENT_STYLE RETURN_VOID ONE_SEMICOLON SUSPECT_CODE_INDENT LINE_CONTINUATIONS UNNECESSARY_ELSE UNSPECIFIED_INT UNNECESSARY_INT MISORDERED_TYPE Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
b06d47be |
|
07-Jul-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Port Xe to GPUVA Rather than open coding VM binds and VMA tracking, use the GPUVA library. GPUVA provides a common infrastructure for VM binds to use mmap / munmap semantics and support for VK sparse bindings. The concepts are: 1) xe_vm inherits from drm_gpuva_manager 2) xe_vma inherits from drm_gpuva 3) xe_vma_op inherits from drm_gpuva_op 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the GPUVA code to generate an VMA operations list which is parsed, committed, and executed. v2 (CI): Add break after default in case statement. v3: Rebase v4: Fix some error handling v5: Use unlocked version VMA in error paths v6: Rebase, address some review feedback mainly Thomas H v7: Fix compile error in xe_vma_op_unwind, address checkpatch Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
b7474119 |
|
29-Jun-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe: Make page-table updates using the default engine happen in order If the default engine m->eng was used, there is no check for idle and a cpu page-table update may thus happen in parallel with a gpu one. Don't allow CPU page-table updates with the default engine until the engine is idle. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230629205134.111849-2-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a0ea91db |
|
11-Jun-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Rename pte/pde encoding functions Remove the leftover TODO by renameing the functions to use xe prefix. Since the static __gen8_pte_encode() already has a double score, just remove the prefix. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230611222447.2837573-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
f6929e80 |
|
01-Jun-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Allocate GT dynamically In preparation for re-adding media GT support, switch the primary GT within the tile to a dynamic allocation. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-19-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
08dea767 |
|
01-Jun-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Move migration from GT to tile Migration primarily focuses on the memory associated with a tile, so it makes more sense to track this at the tile level (especially since the driver was already skipping migration operations on media GTs). Note that the blitter engine used to perform the migration always lives in the tile's primary GT today. In theory that could change if media GTs ever start including blitter engines in the future, but we can extend the design if/when that happens in the future. v2: - Fix kunit test build - Kerneldoc parameter name update v3: - Removed leftover prototype for removed function. (Gustavo) - Remove unrelated / unwanted error handling change. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
876611c2 |
|
01-Jun-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Memory allocations are tile-based, not GT-based Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
fb31517c |
|
25-May-2023 |
Michael J. Ruhl <michael.j.ruhl@intel.com> |
drm/xe: Rename GPU offset helper to reflect true usage The _io_offset helper function is returning an offset into the GPU address space. Using the CPU address offset (io_) is not correct. Rename to reflect usage. Update to use GPU offset information. Update PT dma_offset to use the helper Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a2f9f4ff |
|
24-May-2023 |
Matthew Auld <matthew.auld@intel.com> |
drm/xe/migrate: retain CCS aux state for vram -> vram There is no mention that migrate_copy() will skip copying the CCS aux state for all types of vram -> vram transfers. Currently we don't need such a facility but might be surprising if we ever do. v2: (Lucas): - s/lmem/vram/ in the commit message - Tidy up the code a bit; use one emit_copy_ccs() v3: - Reword the commit message Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
3690a01b |
|
24-May-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe: Support copying of data between system memory bos Modify the xe_migrate_copy() function somewhat to explicitly allow copying of data between two buffer objects including system memory buffer objects. Update the migrate test accordingly. v2: - Check that buffer object sizes match when copying (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
56492dac |
|
12-Apr-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Rename instruction field to avoid confusion There was both BLT_DEPTH_32 and XY_FAST_COLOR_BLT_DEPTH_32 - also add the prefix to the first to make it clear this is about the FAST_**COPY** operation. While at it, remove the GEN9_ prefix. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
58e19acf |
|
12-Apr-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Cleanup page-related defines Rename the following defines to lose the GEN* prefixes since they don't make sense for xe: GEN8_PTE_SHIFT -> XE_PTE_SHIFT GEN8_PAGE_SIZE -> XE_PAGE_SIZE GEN8_PTE_MASK -> XE_PTE_MASK GEN8_PDE_SHIFT -> XE_PDE_SHIFT GEN8_PDES -> XE_PDES GEN8_PDE_MASK -> XE_PDE_MASK GEN8_64K_PTE_SHIFT -> XE_64K_PTE_SHIFT GEN8_64K_PAGE_SIZE -> XE_64K_PAGE_SIZE GEN8_64K_PTE_MASK -> XE_64K_PTE_MASK GEN8_64K_PDE_MASK -> XE_64K_PDE_MASK GEN8_PDE_PS_2M -> XE_PDE_PS_2M GEN8_PDPE_PS_1G -> XE_PDPE_PS_1G GEN8_PDE_IPS_64K -> XE_PDE_IPS_64K GEN12_GGTT_PTE_LM -> XE_GGTT_PTE_LM GEN12_USM_PPGTT_PTE_AE -> XE_USM_PPGTT_PTE_AE GEN12_PPGTT_PTE_LM -> XE_PPGTT_PTE_LM GEN12_PDE_64K -> XE_PDE_64K GEN12_PTE_PS64 -> XE_PTE_PS64 GEN8_PAGE_PRESENT -> XE_PAGE_PRESENT GEN8_PAGE_RW -> XE_PAGE_RW PTE_READ_ONLY -> XE_PTE_READ_ONLY Keep an XE_ prefix to make sure we don't mix the defines for the CPU (e.g. PAGE_SIZE) with the ones fro the GPU). Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0a12a612 |
|
10-Apr-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/xe: Let primary and media GT share a kernel_bb_pool The media GT requires a valid gt->kernel_bb_pool during driver probe to allocate the WA and NOOP batchbuffers used to record default context images. Dynamically allocate the bb_pools so that the primary and media GT can use the same pool during driver init. The media GT still shouldn't be need the USM pool, so only hook up the kernel_bb_pool for now. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230410200229.2726648-1-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c33a7219 |
|
31-Mar-2023 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
drm/xe: Use proper vram offset In xe_migrate functions, use proper vram io offset of the tiles while calculating addresses. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
b4eecedc |
|
20-Mar-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Fix potential deadlock handling page faults Within a class the GuC will hault scheduling if the head of the queue can't be scheduled the queue will block. This can lead to deadlock if BCS0-7 all have faults and another engine on BCS0-7 is at head of the GuC scheduling queue as the migration engine used to fix tthe fault will be blocked. To work around this set the migration engine to the highest priority when servicing page faults. v2 (Maarten): Set priority to kernel once at creation Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
11a2407e |
|
17-Mar-2023 |
Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> |
drm/xe: Stop accepting value in xe_migrate_clear Although xe_migrate_clear() has a value argument, currently the driver is only passing 0 at all the places this function is invoked with the exception the kunit tests are using the parameter to validate this function with different values. xe_migrate_clear() is failing on platforms with link copy engines because xe_migrate_clear() via emit_clear() is using the blitter instruction XY_FAST_COLOR_BLT to clear the memory. But this instruction is not supported by link copy engine. So the solution is to use the alternate instruction MEM_SET when platform contains link copy engine. But MEM_SET instruction accepts only 8-bit value for setting whereas the value agrument of xe_migrate_clear() is 32-bit. So instead of spreading this limitation around all invocations of xe_migrate_clear() and causing more confusion, it was decided to not accept any value itself as driver does not really need this currently. All the kunit tests are adapted as per the new function prototype. This will be followed by a patch to add support for link copy engines. Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
7cba3396 |
|
13-Mar-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/tests: Test both CPU- and GPU page-table updates with the migrate test Add a test parameter to force GPU page-table updates with the migrate test and test both CPU- and GPU updates. Also provide some timing results. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
155c9165 |
|
09-Mar-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe: Introduce xe_engine_is_idle() Introduce xe_engine_is_idle, and replace the static function in xe_migrate.c. The latter had two flaws. First the seqno == 1 test might return a false true value each time the seqno counter wrapped, Second, the cur_seqno == next_seqno test would never return true. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
fc1cc680 |
|
10-Mar-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/migrate: Update cpu page-table updates Don't wait for GPU to be able to update page-tables using CPU. Putting ourselves to sleep may be more of a problem than using GPU for page-table updates. Also allow the vm to be NULL since the migrate kunit test uses NULL for vm. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
2a8477f7 |
|
07-Mar-2023 |
Matthew Auld <matthew.auld@intel.com> |
drm/xe: s/lmem/vram/ This seems to be the preferred nomenclature in xe. Currently we are intermixing vram and lmem, which is confusing. v2 (Gwan-gyeong Mun & Lucas): - Rather apply to the entire driver Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
8cb49012 |
|
25-Feb-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Do not spread i915_reg_defs.h include Reduce the use of i915_reg_defs.h so it can be encapsulated in a single place. 1) If it was being included by mistake, remove 2) If it was included for FIELD_GET()/FIELD_PREP()/GENMASK() and the like, just include <linux/bitfield.h> 3) If it was included to be able to define additional registers, move the registers to the relavant headers (regs/xe_regs.h or regs/xe_gt_regs.h) v2: - Squash commit fixing i915_reg_defs.h include and with the one introducing regs/xe_reg_defs.h - Remove more cases of i915_reg_defs.h being used when all it was needed was linux/bitfield.h (Matt Roper) - Move some registers to the corresponding regs/*.h file (Matt Roper) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo squashed here the removal of the i915 include]
|
#
63955b3b |
|
24-Feb-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Remove dependency on intel_gpu_commands.h Copy the macros used by xe in intel_gpu_commands.h to regs/xe_gpu_commands.h. PIPE_CONTROL_3D_ENGINE_FLAGS and PIPE_CONTROL_3D_ARCH_FLAGS were already defined in drivers/gpu/drm/xe/xe_ring_ops.c and only used there. So let that define to be used instead of also adding to the new header. v2: Let PIPE_CONTROL_3D_ENGINE_FLAGS/PIPE_CONTROL_3D_ARCH_FLAGS in the only .c that uses it instead of redefining (Matt Roper) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
ea9f879d |
|
24-Feb-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/xe: Sort includes Sort includes and split them in blocks: 1) .h corresponding to the .c. Example: xe_bb.c should have a "#include "xe_bb.h" first. 2) #include <linux/...> 3) #include <drm/...> 4) local includes 5) i915 includes This is accomplished by running `clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]` and ignoring all the changes after the includes. There are also some manual tweaks to split the blocks. v2: Also sort includes in headers Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
e89b384c |
|
06-Jan-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe/migrate: Update emit_pte to cope with a size level than 4k emit_pte assumes the size argument is 4k aligned, this may not be true for the PTEs emitted for CSS as seen by below call stack: [ 56.734228] xe_migrate_copy:585: size=327680, ccs_start=327680, css_size=1280,4096 [ 56.734250] xe_migrate_copy:643: size=262144 [ 56.734252] emit_pte:404: ptes=64 [ 56.734255] emit_pte:418: chunk=64 [ 56.734257] xe_migrate_copy:650: size=1024 @ CCS emit PTE [ 56.734259] emit_pte:404: ptes=1 [ 56.734261] emit_pte:418: chunk=1 [ 56.734339] xe_migrate_copy:643: size=65536 [ 56.734342] emit_pte:404: ptes=16 [ 56.734344] emit_pte:418: chunk=16 [ 56.734346] xe_migrate_copy:650: size=256 # CCS emit PTE [ 56.734348] emit_pte:404: ptes=1 [ 56.734350] emit_pte:418: chunk=1 [ 56.734352] xe_res_next:174: size=4096, remaining=0 Update emit_pte to handle sizes less than 4k. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
e9d285ff |
|
12-Jan-2023 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/xe/migrate: Add kerneldoc for the migrate subsystem Add kerneldoc for structs and external functions. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
#
dd08ebf6 |
|
30-Mar-2023 |
Matthew Brost <matthew.brost@intel.com> |
drm/xe: Introduce a new DRM driver for Intel GPUs Xe, is a new driver for Intel GPUs that supports both integrated and discrete platforms starting with Tiger Lake (first Intel Xe Architecture). The code is at a stage where it is already functional and has experimental support for multiple platforms starting from Tiger Lake, with initial support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO (for OpenCL and Level0). The new Xe driver leverages a lot from i915. As for display, the intent is to share the display code with the i915 driver so that there is maximum reuse there. But it is not added in this patch. This initial work is a collaboration of many people and unfortunately the big squashed patch won't fully honor the proper credits. But let's get some git quick stats so we can at least try to preserve some of the credits: Co-developed-by: Matthew Brost <matthew.brost@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Co-developed-by: Francois Dugast <francois.dugast@intel.com> Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com> Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Jani Nikula <jani.nikula@intel.com> Co-developed-by: José Roberto de Souza <jose.souza@intel.com> Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Co-developed-by: Dave Airlie <airlied@redhat.com> Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
|