// SPDX-License-Identifier: MIT
/*
 * Copyright © 2014 Intel Corporation
 */

/**
 * DOC: Logical Rings, Logical Ring Contexts and Execlists
 *
 * Motivation:
 * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
 * These expanded contexts enable a number of new abilities, especially
 * "Execlists" (also implemented in this file).
 *
 * One of the main differences with the legacy HW contexts is that logical
 * ring contexts incorporate many more things into the context's state, like
 * PDPs or ringbuffer control registers:
 *
 * The reason why PDPs are included in the context is straightforward: as
 * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
 * contained there means you don't need to do a ppgtt->switch_mm yourself;
 * instead, the GPU will do it for you on the context switch.
 *
 * But what about the ringbuffer control registers (head, tail, etc.)?
 * Shouldn't one set of those per engine command streamer be enough? This is
 * where the name "Logical Rings" starts to make sense: by virtualizing the
 * rings, the engine cs shifts to a new "ring buffer" with every context
 * switch. When you want to submit a workload to the GPU you: A) choose your
 * context, B) find its appropriate virtualized ring, C) write commands to it
 * and then, finally, D) tell the GPU to switch to that context.
 *
 * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
 * to a context is via a context execution list, ergo "Execlists".
 *
 * LRC implementation:
 * Regarding the creation of contexts, we have:
 *
 * - One global default context.
 * - One local default context for each opened fd.
 * - One local extra context for each context create ioctl call.
 *
 * Now that ringbuffers belong per-context (and not per-engine, like before)
 * and that contexts are uniquely tied to a given engine (and not reusable,
 * like before) we need:
 *
 * - One ringbuffer per-engine inside each context.
 * - One backing object per-engine inside each context.
 *
 * The global default context starts its life with these new objects fully
 * allocated and populated. The local default context for each opened fd is
 * more complex, because we don't know at creation time which engine is going
 * to use it. To handle this, we have implemented a deferred creation of LR
 * contexts:
 *
 * The local context starts its life as a hollow or blank holder that only
 * gets populated for a given engine once we receive an execbuffer. If later
 * on we receive another execbuffer ioctl for the same context but a different
 * engine, we allocate/populate a new ringbuffer and context backing object and
 * so on.
 *
 * Finally, regarding local contexts created using the ioctl call: as they are
 * only allowed with the render ring, we can allocate & populate them right
 * away (no need to defer anything, at least for now).
 *
 * Execlists implementation:
 * Execlists are the new method by which, on gen8+ hardware, workloads are
 * submitted for execution (as opposed to the legacy, ringbuffer-based method).
 * This method works as follows:
 *
 * When a request is committed, its commands (the BB start and any leading or
 * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
 * for the appropriate context. The tail pointer in the hardware context is not
 * updated at this time; instead, it is kept by the driver in the ringbuffer
 * structure. A structure representing this request is added to a request queue
 * for the appropriate engine: this structure contains a copy of the context's
 * tail after the request was written to the ring buffer and a pointer to the
 * context itself.
 *
 * If the engine's request queue was empty before the request was added, the
 * queue is processed immediately. Otherwise the queue will be processed during
 * a context switch interrupt. In any case, elements on the queue will get sent
 * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
 * globally unique 20-bit submission ID.
 *
 * When execution of a request completes, the GPU updates the context status
 * buffer with a context complete event and generates a context switch interrupt.
 * During the interrupt handling, the driver examines the events in the buffer:
 * for each context complete event, if the announced ID matches that on the head
 * of the request queue, then that request is retired and removed from the queue.
 *
 * After processing, if any requests were retired and the queue is not empty
 * then a new execution list can be submitted. The two requests at the front of
 * the queue are next to be submitted but since a context may not occur twice in
 * an execution list, if subsequent requests have the same ID as the first then
 * the two requests must be combined. This is done simply by discarding requests
 * at the head of the queue until either only one request is left (in which case
 * we use a NULL second context) or the first two requests have unique IDs.
 *
 * By always executing the first two requests in the queue the driver ensures
 * that the GPU is kept as busy as possible. In the case where a single context
 * completes but a second context is still executing, the request for this second
 * context will be at the head of the queue when we remove the first one. This
 * request will then be resubmitted along with a new request for a different context,
 * which will cause the hardware to continue executing the second request and queue
 * the new request (the GPU detects the condition of a context getting preempted
 * with the same context and optimizes the context switch flow by not doing
 * preemption, but just sampling the new tail pointer).
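 *
 * As a purely illustrative sketch of the pairing rule above (the request and
 * context names A1, A2 and B1 are invented for this example and do not appear
 * anywhere in the code): suppose the engine's queue holds, in submission
 * order, requests A1 and A2 from context A followed by B1 from context B.
 * Since a context may not appear twice in one execution list, A1 and A2 are
 * coalesced and the list written to the ELSP becomes
 *
 *   ELSP[0] = A2   (its tail covers both A1 and A2)
 *   ELSP[1] = B1
 *
 * Had the queue contained only A1 and A2, the second element would simply be
 * NULL (a single-element list).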
107 * 108 */ 109#include <linux/interrupt.h> 110#include <linux/string_helpers.h> 111 112#include "i915_drv.h" 113#include "i915_reg.h" 114#include "i915_trace.h" 115#include "i915_vgpu.h" 116#include "gen8_engine_cs.h" 117#include "intel_breadcrumbs.h" 118#include "intel_context.h" 119#include "intel_engine_heartbeat.h" 120#include "intel_engine_pm.h" 121#include "intel_engine_regs.h" 122#include "intel_engine_stats.h" 123#include "intel_execlists_submission.h" 124#include "intel_gt.h" 125#include "intel_gt_irq.h" 126#include "intel_gt_pm.h" 127#include "intel_gt_regs.h" 128#include "intel_gt_requests.h" 129#include "intel_lrc.h" 130#include "intel_lrc_reg.h" 131#include "intel_mocs.h" 132#include "intel_reset.h" 133#include "intel_ring.h" 134#include "intel_workarounds.h" 135#include "shmem_utils.h" 136 137#define RING_EXECLIST_QFULL (1 << 0x2) 138#define RING_EXECLIST1_VALID (1 << 0x3) 139#define RING_EXECLIST0_VALID (1 << 0x4) 140#define RING_EXECLIST_ACTIVE_STATUS (3 << 0xE) 141#define RING_EXECLIST1_ACTIVE (1 << 0x11) 142#define RING_EXECLIST0_ACTIVE (1 << 0x12) 143 144#define GEN8_CTX_STATUS_IDLE_ACTIVE (1 << 0) 145#define GEN8_CTX_STATUS_PREEMPTED (1 << 1) 146#define GEN8_CTX_STATUS_ELEMENT_SWITCH (1 << 2) 147#define GEN8_CTX_STATUS_ACTIVE_IDLE (1 << 3) 148#define GEN8_CTX_STATUS_COMPLETE (1 << 4) 149#define GEN8_CTX_STATUS_LITE_RESTORE (1 << 15) 150 151#define GEN8_CTX_STATUS_COMPLETED_MASK \ 152 (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED) 153 154#define GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE (0x1) /* lower csb dword */ 155#define GEN12_CTX_SWITCH_DETAIL(csb_dw) ((csb_dw) & 0xF) /* upper csb dword */ 156#define GEN12_CSB_SW_CTX_ID_MASK GENMASK(25, 15) 157#define GEN12_IDLE_CTX_ID 0x7FF 158#define GEN12_CSB_CTX_VALID(csb_dw) \ 159 (FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID) 160 161#define XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE BIT(1) /* upper csb dword */ 162#define XEHP_CSB_SW_CTX_ID_MASK GENMASK(31, 10) 163#define XEHP_IDLE_CTX_ID 0xFFFF 164#define XEHP_CSB_CTX_VALID(csb_dw) \ 165 (FIELD_GET(XEHP_CSB_SW_CTX_ID_MASK, csb_dw) != XEHP_IDLE_CTX_ID) 166 167/* Typical size of the average request (2 pipecontrols and a MI_BB) */ 168#define EXECLISTS_REQUEST_SIZE 64 /* bytes */ 169 170struct virtual_engine { 171 struct intel_engine_cs base; 172 struct intel_context context; 173 struct rcu_work rcu; 174 175 /* 176 * We allow only a single request through the virtual engine at a time 177 * (each request in the timeline waits for the completion fence of 178 * the previous before being submitted). By restricting ourselves to 179 * only submitting a single request, each request is placed on to a 180 * physical to maximise load spreading (by virtue of the late greedy 181 * scheduling -- each real engine takes the next available request 182 * upon idling). 183 */ 184 struct i915_request *request; 185 186 /* 187 * We keep a rbtree of available virtual engines inside each physical 188 * engine, sorted by priority. Here we preallocate the nodes we need 189 * for the virtual engine, indexed by physical_engine->id. 190 */ 191 struct ve_node { 192 struct rb_node rb; 193 int prio; 194 } nodes[I915_NUM_ENGINES]; 195 196 /* And finally, which physical engines this virtual engine maps onto. 
*/ 197 unsigned int num_siblings; 198 struct intel_engine_cs *siblings[]; 199}; 200 201static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) 202{ 203 GEM_BUG_ON(!intel_engine_is_virtual(engine)); 204 return container_of(engine, struct virtual_engine, base); 205} 206 207static struct intel_context * 208execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count, 209 unsigned long flags); 210 211static struct i915_request * 212__active_request(const struct intel_timeline * const tl, 213 struct i915_request *rq, 214 int error) 215{ 216 struct i915_request *active = rq; 217 218 list_for_each_entry_from_reverse(rq, &tl->requests, link) { 219 if (__i915_request_is_complete(rq)) 220 break; 221 222 if (error) { 223 i915_request_set_error_once(rq, error); 224 __i915_request_skip(rq); 225 } 226 active = rq; 227 } 228 229 return active; 230} 231 232static struct i915_request * 233active_request(const struct intel_timeline * const tl, struct i915_request *rq) 234{ 235 return __active_request(tl, rq, 0); 236} 237 238static void ring_set_paused(const struct intel_engine_cs *engine, int state) 239{ 240 /* 241 * We inspect HWS_PREEMPT with a semaphore inside 242 * engine->emit_fini_breadcrumb. If the dword is true, 243 * the ring is paused as the semaphore will busywait 244 * until the dword is false. 245 */ 246 engine->status_page.addr[I915_GEM_HWS_PREEMPT] = state; 247 if (state) 248 wmb(); 249} 250 251static struct i915_priolist *to_priolist(struct rb_node *rb) 252{ 253 return rb_entry(rb, struct i915_priolist, node); 254} 255 256static int rq_prio(const struct i915_request *rq) 257{ 258 return READ_ONCE(rq->sched.attr.priority); 259} 260 261static int effective_prio(const struct i915_request *rq) 262{ 263 int prio = rq_prio(rq); 264 265 /* 266 * If this request is special and must not be interrupted at any 267 * cost, so be it. Note we are only checking the most recent request 268 * in the context and so may be masking an earlier vip request. It 269 * is hoped that under the conditions where nopreempt is used, this 270 * will not matter (i.e. all requests to that context will be 271 * nopreempt for as long as desired). 272 */ 273 if (i915_request_has_nopreempt(rq)) 274 prio = I915_PRIORITY_UNPREEMPTABLE; 275 276 return prio; 277} 278 279static int queue_prio(const struct i915_sched_engine *sched_engine) 280{ 281 struct rb_node *rb; 282 283 rb = rb_first_cached(&sched_engine->queue); 284 if (!rb) 285 return INT_MIN; 286 287 return to_priolist(rb)->priority; 288} 289 290static int virtual_prio(const struct intel_engine_execlists *el) 291{ 292 struct rb_node *rb = rb_first_cached(&el->virtual); 293 294 return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; 295} 296 297static bool need_preempt(const struct intel_engine_cs *engine, 298 const struct i915_request *rq) 299{ 300 int last_prio; 301 302 if (!intel_engine_has_semaphores(engine)) 303 return false; 304 305 /* 306 * Check if the current priority hint merits a preemption attempt. 307 * 308 * We record the highest value priority we saw during rescheduling 309 * prior to this dequeue, therefore we know that if it is strictly 310 * less than the current tail of ESLP[0], we do not need to force 311 * a preempt-to-idle cycle. 312 * 313 * However, the priority hint is a mere hint that we may need to 314 * preempt. If that hint is stale or we may be trying to preempt 315 * ourselves, ignore the request. 
316 * 317 * More naturally we would write 318 * prio >= max(0, last); 319 * except that we wish to prevent triggering preemption at the same 320 * priority level: the task that is running should remain running 321 * to preserve FIFO ordering of dependencies. 322 */ 323 last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1); 324 if (engine->sched_engine->queue_priority_hint <= last_prio) 325 return false; 326 327 /* 328 * Check against the first request in ELSP[1], it will, thanks to the 329 * power of PI, be the highest priority of that context. 330 */ 331 if (!list_is_last(&rq->sched.link, &engine->sched_engine->requests) && 332 rq_prio(list_next_entry(rq, sched.link)) > last_prio) 333 return true; 334 335 /* 336 * If the inflight context did not trigger the preemption, then maybe 337 * it was the set of queued requests? Pick the highest priority in 338 * the queue (the first active priolist) and see if it deserves to be 339 * running instead of ELSP[0]. 340 * 341 * The highest priority request in the queue can not be either 342 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same 343 * context, it's priority would not exceed ELSP[0] aka last_prio. 344 */ 345 return max(virtual_prio(&engine->execlists), 346 queue_prio(engine->sched_engine)) > last_prio; 347} 348 349__maybe_unused static bool 350assert_priority_queue(const struct i915_request *prev, 351 const struct i915_request *next) 352{ 353 /* 354 * Without preemption, the prev may refer to the still active element 355 * which we refuse to let go. 356 * 357 * Even with preemption, there are times when we think it is better not 358 * to preempt and leave an ostensibly lower priority request in flight. 359 */ 360 if (i915_request_is_active(prev)) 361 return true; 362 363 return rq_prio(prev) >= rq_prio(next); 364} 365 366static struct i915_request * 367__unwind_incomplete_requests(struct intel_engine_cs *engine) 368{ 369 struct i915_request *rq, *rn, *active = NULL; 370 struct list_head *pl; 371 int prio = I915_PRIORITY_INVALID; 372 373 lockdep_assert_held(&engine->sched_engine->lock); 374 375 list_for_each_entry_safe_reverse(rq, rn, 376 &engine->sched_engine->requests, 377 sched.link) { 378 if (__i915_request_is_complete(rq)) { 379 list_del_init(&rq->sched.link); 380 continue; 381 } 382 383 __i915_request_unsubmit(rq); 384 385 GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); 386 if (rq_prio(rq) != prio) { 387 prio = rq_prio(rq); 388 pl = i915_sched_lookup_priolist(engine->sched_engine, 389 prio); 390 } 391 GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine)); 392 393 list_move(&rq->sched.link, pl); 394 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 395 396 /* Check in case we rollback so far we wrap [size/2] */ 397 if (intel_ring_direction(rq->ring, 398 rq->tail, 399 rq->ring->tail + 8) > 0) 400 rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; 401 402 active = rq; 403 } 404 405 return active; 406} 407 408struct i915_request * 409execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists) 410{ 411 struct intel_engine_cs *engine = 412 container_of(execlists, typeof(*engine), execlists); 413 414 return __unwind_incomplete_requests(engine); 415} 416 417static void 418execlists_context_status_change(struct i915_request *rq, unsigned long status) 419{ 420 /* 421 * Only used when GVT-g is enabled now. When GVT-g is disabled, 422 * The compiler should eliminate this function as dead-code. 
423 */ 424 if (!IS_ENABLED(CONFIG_DRM_I915_GVT)) 425 return; 426 427 STUB(); 428#ifdef notyet 429 atomic_notifier_call_chain(&rq->engine->context_status_notifier, 430 status, rq); 431#endif 432} 433 434static void reset_active(struct i915_request *rq, 435 struct intel_engine_cs *engine) 436{ 437 struct intel_context * const ce = rq->context; 438 u32 head; 439 440 /* 441 * The executing context has been cancelled. We want to prevent 442 * further execution along this context and propagate the error on 443 * to anything depending on its results. 444 * 445 * In __i915_request_submit(), we apply the -EIO and remove the 446 * requests' payloads for any banned requests. But first, we must 447 * rewind the context back to the start of the incomplete request so 448 * that we do not jump back into the middle of the batch. 449 * 450 * We preserve the breadcrumbs and semaphores of the incomplete 451 * requests so that inter-timeline dependencies (i.e other timelines) 452 * remain correctly ordered. And we defer to __i915_request_submit() 453 * so that all asynchronous waits are correctly handled. 454 */ 455 ENGINE_TRACE(engine, "{ reset rq=%llx:%lld }\n", 456 rq->fence.context, rq->fence.seqno); 457 458 /* On resubmission of the active request, payload will be scrubbed */ 459 if (__i915_request_is_complete(rq)) 460 head = rq->tail; 461 else 462 head = __active_request(ce->timeline, rq, -EIO)->head; 463 head = intel_ring_wrap(ce->ring, head); 464 465 /* Scrub the context image to prevent replaying the previous batch */ 466 lrc_init_regs(ce, engine, true); 467 468 /* We've switched away, so this should be a no-op, but intent matters */ 469 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 470} 471 472static bool bad_request(const struct i915_request *rq) 473{ 474 return rq->fence.error && i915_request_started(rq); 475} 476 477static struct intel_engine_cs * 478__execlists_schedule_in(struct i915_request *rq) 479{ 480 struct intel_engine_cs * const engine = rq->engine; 481 struct intel_context * const ce = rq->context; 482 483 intel_context_get(ce); 484 485 if (unlikely(intel_context_is_closed(ce) && 486 !intel_engine_has_heartbeat(engine))) 487 intel_context_set_exiting(ce); 488 489 if (unlikely(!intel_context_is_schedulable(ce) || bad_request(rq))) 490 reset_active(rq, engine); 491 492 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 493 lrc_check_regs(ce, engine, "before"); 494 495 if (ce->tag) { 496 /* Use a fixed tag for OA and friends */ 497 GEM_BUG_ON(ce->tag <= BITS_PER_LONG); 498 ce->lrc.ccid = ce->tag; 499 } else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) { 500 /* We don't need a strict matching tag, just different values */ 501 unsigned int tag = ffs(READ_ONCE(engine->context_tag)); 502 503 GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG); 504 clear_bit(tag - 1, &engine->context_tag); 505 ce->lrc.ccid = tag << (XEHP_SW_CTX_ID_SHIFT - 32); 506 507 BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); 508 509 } else { 510 /* We don't need a strict matching tag, just different values */ 511 unsigned int tag = __ffs(engine->context_tag); 512 513 GEM_BUG_ON(tag >= BITS_PER_LONG); 514 __clear_bit(tag, &engine->context_tag); 515 ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32); 516 517 BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); 518 } 519 520 ce->lrc.ccid |= engine->execlists.ccid; 521 522 __intel_gt_pm_get(engine->gt); 523 if (engine->fw_domain && !engine->fw_active++) 524 intel_uncore_forcewake_get(engine->uncore, engine->fw_domain); 525 execlists_context_status_change(rq, 
INTEL_CONTEXT_SCHEDULE_IN); 526 intel_engine_context_in(engine); 527 528 CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid); 529 530 return engine; 531} 532 533static void execlists_schedule_in(struct i915_request *rq, int idx) 534{ 535 struct intel_context * const ce = rq->context; 536 struct intel_engine_cs *old; 537 538 GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); 539 trace_i915_request_in(rq, idx); 540 541 old = ce->inflight; 542 if (!old) 543 old = __execlists_schedule_in(rq); 544 WRITE_ONCE(ce->inflight, ptr_inc(old)); 545 546 GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); 547} 548 549static void 550resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) 551{ 552 struct intel_engine_cs *engine = rq->engine; 553 554 spin_lock_irq(&engine->sched_engine->lock); 555 556 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 557 WRITE_ONCE(rq->engine, &ve->base); 558 ve->base.submit_request(rq); 559 560 spin_unlock_irq(&engine->sched_engine->lock); 561} 562 563static void kick_siblings(struct i915_request *rq, struct intel_context *ce) 564{ 565 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 566 struct intel_engine_cs *engine = rq->engine; 567 568 /* 569 * After this point, the rq may be transferred to a new sibling, so 570 * before we clear ce->inflight make sure that the context has been 571 * removed from the b->signalers and furthermore we need to make sure 572 * that the concurrent iterator in signal_irq_work is no longer 573 * following ce->signal_link. 574 */ 575 if (!list_empty(&ce->signals)) 576 intel_context_remove_breadcrumbs(ce, engine->breadcrumbs); 577 578 /* 579 * This engine is now too busy to run this virtual request, so 580 * see if we can find an alternative engine for it to execute on. 581 * Once a request has become bonded to this engine, we treat it the 582 * same as other native request. 583 */ 584 if (i915_request_in_priority_queue(rq) && 585 rq->execution_mask != engine->mask) 586 resubmit_virtual_request(rq, ve); 587 588 if (READ_ONCE(ve->request)) 589 tasklet_hi_schedule(&ve->base.sched_engine->tasklet); 590} 591 592static void __execlists_schedule_out(struct i915_request * const rq, 593 struct intel_context * const ce) 594{ 595 struct intel_engine_cs * const engine = rq->engine; 596 unsigned int ccid; 597 598 /* 599 * NB process_csb() is not under the engine->sched_engine->lock and hence 600 * schedule_out can race with schedule_in meaning that we should 601 * refrain from doing non-trivial work here. 602 */ 603 604 CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid); 605 GEM_BUG_ON(ce->inflight != engine); 606 607 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 608 lrc_check_regs(ce, engine, "after"); 609 610 /* 611 * If we have just completed this context, the engine may now be 612 * idle and we want to re-enter powersaving. 
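 * (The retirement itself is deferred: intel_engine_add_retire() below only
 * queues ce->timeline so the engine's retire worker can process it later,
 * since, as noted above, we must refrain from non-trivial work in this path.)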
613 */ 614 if (intel_timeline_is_last(ce->timeline, rq) && 615 __i915_request_is_complete(rq)) 616 intel_engine_add_retire(engine, ce->timeline); 617 618 ccid = ce->lrc.ccid; 619 if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) { 620 ccid >>= XEHP_SW_CTX_ID_SHIFT - 32; 621 ccid &= XEHP_MAX_CONTEXT_HW_ID; 622 } else { 623 ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; 624 ccid &= GEN12_MAX_CONTEXT_HW_ID; 625 } 626 627 if (ccid < BITS_PER_LONG) { 628 GEM_BUG_ON(ccid == 0); 629 GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag)); 630 __set_bit(ccid - 1, &engine->context_tag); 631 } 632 intel_engine_context_out(engine); 633 execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT); 634 if (engine->fw_domain && !--engine->fw_active) 635 intel_uncore_forcewake_put(engine->uncore, engine->fw_domain); 636 intel_gt_pm_put_async(engine->gt); 637 638 /* 639 * If this is part of a virtual engine, its next request may 640 * have been blocked waiting for access to the active context. 641 * We have to kick all the siblings again in case we need to 642 * switch (e.g. the next request is not runnable on this 643 * engine). Hopefully, we will already have submitted the next 644 * request before the tasklet runs and do not need to rebuild 645 * each virtual tree and kick everyone again. 646 */ 647 if (ce->engine != engine) 648 kick_siblings(rq, ce); 649 650 WRITE_ONCE(ce->inflight, NULL); 651 intel_context_put(ce); 652} 653 654static inline void execlists_schedule_out(struct i915_request *rq) 655{ 656 struct intel_context * const ce = rq->context; 657 658 trace_i915_request_out(rq); 659 660 GEM_BUG_ON(!ce->inflight); 661 ce->inflight = ptr_dec(ce->inflight); 662 if (!__intel_context_inflight_count(ce->inflight)) 663 __execlists_schedule_out(rq, ce); 664 665 i915_request_put(rq); 666} 667 668static u32 map_i915_prio_to_lrc_desc_prio(int prio) 669{ 670 if (prio > I915_PRIORITY_NORMAL) 671 return GEN12_CTX_PRIORITY_HIGH; 672 else if (prio < I915_PRIORITY_NORMAL) 673 return GEN12_CTX_PRIORITY_LOW; 674 else 675 return GEN12_CTX_PRIORITY_NORMAL; 676} 677 678static u64 execlists_update_context(struct i915_request *rq) 679{ 680 struct intel_context *ce = rq->context; 681 u64 desc; 682 u32 tail, prev; 683 684 desc = ce->lrc.desc; 685 if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY) 686 desc |= map_i915_prio_to_lrc_desc_prio(rq_prio(rq)); 687 688 /* 689 * WaIdleLiteRestore:bdw,skl 690 * 691 * We should never submit the context with the same RING_TAIL twice 692 * just in case we submit an empty ring, which confuses the HW. 693 * 694 * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of 695 * the normal request to be able to always advance the RING_TAIL on 696 * subsequent resubmissions (for lite restore). Should that fail us, 697 * and we try and submit the same tail again, force the context 698 * reload. 699 * 700 * If we need to return to a preempted context, we need to skip the 701 * lite-restore and force it to reload the RING_TAIL. Otherwise, the 702 * HW has a tendency to ignore us rewinding the TAIL to the end of 703 * an earlier request. 704 */ 705 GEM_BUG_ON(ce->lrc_reg_state[CTX_RING_TAIL] != rq->ring->tail); 706 prev = rq->ring->tail; 707 tail = intel_ring_set_tail(rq->ring, rq->tail); 708 if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0)) 709 desc |= CTX_DESC_FORCE_RESTORE; 710 ce->lrc_reg_state[CTX_RING_TAIL] = tail; 711 rq->tail = rq->wa_tail; 712 713 /* 714 * Make sure the context image is complete before we submit it to HW. 
715 * 716 * Ostensibly, writes (including the WCB) should be flushed prior to 717 * an uncached write such as our mmio register access, the empirical 718 * evidence (esp. on Braswell) suggests that the WC write into memory 719 * may not be visible to the HW prior to the completion of the UC 720 * register write and that we may begin execution from the context 721 * before its image is complete leading to invalid PD chasing. 722 */ 723 wmb(); 724 725 ce->lrc.desc &= ~CTX_DESC_FORCE_RESTORE; 726 return desc; 727} 728 729static void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port) 730{ 731 if (execlists->ctrl_reg) { 732 writel(lower_32_bits(desc), execlists->submit_reg + port * 2); 733 writel(upper_32_bits(desc), execlists->submit_reg + port * 2 + 1); 734 } else { 735 writel(upper_32_bits(desc), execlists->submit_reg); 736 writel(lower_32_bits(desc), execlists->submit_reg); 737 } 738} 739 740static __maybe_unused char * 741dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq) 742{ 743 if (!rq) 744 return ""; 745 746 snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d", 747 prefix, 748 rq->context->lrc.ccid, 749 rq->fence.context, rq->fence.seqno, 750 __i915_request_is_complete(rq) ? "!" : 751 __i915_request_has_started(rq) ? "*" : 752 "", 753 rq_prio(rq)); 754 755 return buf; 756} 757 758static __maybe_unused noinline void 759trace_ports(const struct intel_engine_execlists *execlists, 760 const char *msg, 761 struct i915_request * const *ports) 762{ 763 const struct intel_engine_cs *engine = 764 container_of(execlists, typeof(*engine), execlists); 765 char __maybe_unused p0[40], p1[40]; 766 767 if (!ports[0]) 768 return; 769 770 ENGINE_TRACE(engine, "%s { %s%s }\n", msg, 771 dump_port(p0, sizeof(p0), "", ports[0]), 772 dump_port(p1, sizeof(p1), ", ", ports[1])); 773} 774 775static bool 776reset_in_progress(const struct intel_engine_cs *engine) 777{ 778 return unlikely(!__tasklet_is_enabled(&engine->sched_engine->tasklet)); 779} 780 781static __maybe_unused noinline bool 782assert_pending_valid(const struct intel_engine_execlists *execlists, 783 const char *msg) 784{ 785 struct intel_engine_cs *engine = 786 container_of(execlists, typeof(*engine), execlists); 787 struct i915_request * const *port, *rq, *prev = NULL; 788 struct intel_context *ce = NULL; 789 u32 ccid = -1; 790 791 trace_ports(execlists, msg, execlists->pending); 792 793 /* We may be messing around with the lists during reset, lalala */ 794 if (reset_in_progress(engine)) 795 return true; 796 797 if (!execlists->pending[0]) { 798 GEM_TRACE_ERR("%s: Nothing pending for promotion!\n", 799 engine->name); 800 return false; 801 } 802 803 if (execlists->pending[execlists_num_ports(execlists)]) { 804 GEM_TRACE_ERR("%s: Excess pending[%d] for promotion!\n", 805 engine->name, execlists_num_ports(execlists)); 806 return false; 807 } 808 809 for (port = execlists->pending; (rq = *port); port++) { 810 unsigned long flags; 811 bool ok = true; 812 813 GEM_BUG_ON(!kref_read(&rq->fence.refcount)); 814 GEM_BUG_ON(!i915_request_is_active(rq)); 815 816 if (ce == rq->context) { 817 GEM_TRACE_ERR("%s: Dup context:%llx in pending[%zd]\n", 818 engine->name, 819 ce->timeline->fence_context, 820 port - execlists->pending); 821 return false; 822 } 823 ce = rq->context; 824 825 if (ccid == ce->lrc.ccid) { 826 GEM_TRACE_ERR("%s: Dup ccid:%x context:%llx in pending[%zd]\n", 827 engine->name, 828 ccid, ce->timeline->fence_context, 829 port - execlists->pending); 830 return false; 831 } 832 ccid = ce->lrc.ccid; 
833 834 /* 835 * Sentinels are supposed to be the last request so they flush 836 * the current execution off the HW. Check that they are the only 837 * request in the pending submission. 838 * 839 * NB: Due to the async nature of preempt-to-busy and request 840 * cancellation we need to handle the case where request 841 * becomes a sentinel in parallel to CSB processing. 842 */ 843 if (prev && i915_request_has_sentinel(prev) && 844 !READ_ONCE(prev->fence.error)) { 845 GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n", 846 engine->name, 847 ce->timeline->fence_context, 848 port - execlists->pending); 849 return false; 850 } 851 prev = rq; 852 853 /* 854 * We want virtual requests to only be in the first slot so 855 * that they are never stuck behind a hog and can be immediately 856 * transferred onto the next idle engine. 857 */ 858 if (rq->execution_mask != engine->mask && 859 port != execlists->pending) { 860 GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n", 861 engine->name, 862 ce->timeline->fence_context, 863 port - execlists->pending); 864 return false; 865 } 866 867 /* Hold tightly onto the lock to prevent concurrent retires! */ 868 if (!spin_trylock_irqsave(&rq->lock, flags)) 869 continue; 870 871 if (__i915_request_is_complete(rq)) 872 goto unlock; 873 874 if (i915_active_is_idle(&ce->active) && 875 !intel_context_is_barrier(ce)) { 876 GEM_TRACE_ERR("%s: Inactive context:%llx in pending[%zd]\n", 877 engine->name, 878 ce->timeline->fence_context, 879 port - execlists->pending); 880 ok = false; 881 goto unlock; 882 } 883 884 if (!i915_vma_is_pinned(ce->state)) { 885 GEM_TRACE_ERR("%s: Unpinned context:%llx in pending[%zd]\n", 886 engine->name, 887 ce->timeline->fence_context, 888 port - execlists->pending); 889 ok = false; 890 goto unlock; 891 } 892 893 if (!i915_vma_is_pinned(ce->ring->vma)) { 894 GEM_TRACE_ERR("%s: Unpinned ring:%llx in pending[%zd]\n", 895 engine->name, 896 ce->timeline->fence_context, 897 port - execlists->pending); 898 ok = false; 899 goto unlock; 900 } 901 902unlock: 903 spin_unlock_irqrestore(&rq->lock, flags); 904 if (!ok) 905 return false; 906 } 907 908 return ce; 909} 910 911static void execlists_submit_ports(struct intel_engine_cs *engine) 912{ 913 struct intel_engine_execlists *execlists = &engine->execlists; 914 unsigned int n; 915 916 GEM_BUG_ON(!assert_pending_valid(execlists, "submit")); 917 918 /* 919 * We can skip acquiring intel_runtime_pm_get() here as it was taken 920 * on our behalf by the request (see i915_gem_mark_busy()) and it will 921 * not be relinquished until the device is idle (see 922 * i915_gem_idle_work_handler()). As a precaution, we make sure 923 * that all ELSP are drained i.e. we have processed the CSB, 924 * before allowing ourselves to idle and calling intel_runtime_pm_put(). 925 */ 926 GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 927 928 /* 929 * ELSQ note: the submit queue is not cleared after being submitted 930 * to the HW so we need to make sure we always clean it up. This is 931 * currently ensured by the fact that we always write the same number 932 * of elsq entries, keep this in mind before changing the loop below. 933 */ 934 for (n = execlists_num_ports(execlists); n--; ) { 935 struct i915_request *rq = execlists->pending[n]; 936 937 write_desc(execlists, 938 rq ? 
execlists_update_context(rq) : 0, 939 n); 940 } 941 942 /* we need to manually load the submit queue */ 943 if (execlists->ctrl_reg) 944 writel(EL_CTRL_LOAD, execlists->ctrl_reg); 945} 946 947static bool ctx_single_port_submission(const struct intel_context *ce) 948{ 949 return (IS_ENABLED(CONFIG_DRM_I915_GVT) && 950 intel_context_force_single_submission(ce)); 951} 952 953static bool can_merge_ctx(const struct intel_context *prev, 954 const struct intel_context *next) 955{ 956 if (prev != next) 957 return false; 958 959 if (ctx_single_port_submission(prev)) 960 return false; 961 962 return true; 963} 964 965static unsigned long i915_request_flags(const struct i915_request *rq) 966{ 967 return READ_ONCE(rq->fence.flags); 968} 969 970static bool can_merge_rq(const struct i915_request *prev, 971 const struct i915_request *next) 972{ 973 GEM_BUG_ON(prev == next); 974 GEM_BUG_ON(!assert_priority_queue(prev, next)); 975 976 /* 977 * We do not submit known completed requests. Therefore if the next 978 * request is already completed, we can pretend to merge it in 979 * with the previous context (and we will skip updating the ELSP 980 * and tracking). Thus hopefully keeping the ELSP full with active 981 * contexts, despite the best efforts of preempt-to-busy to confuse 982 * us. 983 */ 984 if (__i915_request_is_complete(next)) 985 return true; 986 987 if (unlikely((i915_request_flags(prev) | i915_request_flags(next)) & 988 (BIT(I915_FENCE_FLAG_NOPREEMPT) | 989 BIT(I915_FENCE_FLAG_SENTINEL)))) 990 return false; 991 992 if (!can_merge_ctx(prev->context, next->context)) 993 return false; 994 995 GEM_BUG_ON(i915_seqno_passed(prev->fence.seqno, next->fence.seqno)); 996 return true; 997} 998 999static bool virtual_matches(const struct virtual_engine *ve, 1000 const struct i915_request *rq, 1001 const struct intel_engine_cs *engine) 1002{ 1003 const struct intel_engine_cs *inflight; 1004 1005 if (!rq) 1006 return false; 1007 1008 if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */ 1009 return false; 1010 1011 /* 1012 * We track when the HW has completed saving the context image 1013 * (i.e. when we have seen the final CS event switching out of 1014 * the context) and must not overwrite the context image before 1015 * then. This restricts us to only using the active engine 1016 * while the previous virtualized request is inflight (so 1017 * we reuse the register offsets). This is a very small 1018 * hystersis on the greedy seelction algorithm. 
1019 */ 1020 inflight = intel_context_inflight(&ve->context); 1021 if (inflight && inflight != engine) 1022 return false; 1023 1024 return true; 1025} 1026 1027static struct virtual_engine * 1028first_virtual_engine(struct intel_engine_cs *engine) 1029{ 1030 struct intel_engine_execlists *el = &engine->execlists; 1031 struct rb_node *rb = rb_first_cached(&el->virtual); 1032 1033 while (rb) { 1034 struct virtual_engine *ve = 1035 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 1036 struct i915_request *rq = READ_ONCE(ve->request); 1037 1038 /* lazily cleanup after another engine handled rq */ 1039 if (!rq || !virtual_matches(ve, rq, engine)) { 1040 rb_erase_cached(rb, &el->virtual); 1041 RB_CLEAR_NODE(rb); 1042 rb = rb_first_cached(&el->virtual); 1043 continue; 1044 } 1045 1046 return ve; 1047 } 1048 1049 return NULL; 1050} 1051 1052static void virtual_xfer_context(struct virtual_engine *ve, 1053 struct intel_engine_cs *engine) 1054{ 1055 unsigned int n; 1056 1057 if (likely(engine == ve->siblings[0])) 1058 return; 1059 1060 GEM_BUG_ON(READ_ONCE(ve->context.inflight)); 1061 if (!intel_engine_has_relative_mmio(engine)) 1062 lrc_update_offsets(&ve->context, engine); 1063 1064 /* 1065 * Move the bound engine to the top of the list for 1066 * future execution. We then kick this tasklet first 1067 * before checking others, so that we preferentially 1068 * reuse this set of bound registers. 1069 */ 1070 for (n = 1; n < ve->num_siblings; n++) { 1071 if (ve->siblings[n] == engine) { 1072 swap(ve->siblings[n], ve->siblings[0]); 1073 break; 1074 } 1075 } 1076} 1077 1078static void defer_request(struct i915_request *rq, struct list_head * const pl) 1079{ 1080 DRM_LIST_HEAD(list); 1081 1082 /* 1083 * We want to move the interrupted request to the back of 1084 * the round-robin list (i.e. its priority level), but 1085 * in doing so, we must then move all requests that were in 1086 * flight and were waiting for the interrupted request to 1087 * be run after it again. 1088 */ 1089 do { 1090 struct i915_dependency *p; 1091 1092 GEM_BUG_ON(i915_request_is_active(rq)); 1093 list_move_tail(&rq->sched.link, pl); 1094 1095 for_each_waiter(p, rq) { 1096 struct i915_request *w = 1097 container_of(p->waiter, typeof(*w), sched); 1098 1099 if (p->flags & I915_DEPENDENCY_WEAK) 1100 continue; 1101 1102 /* Leave semaphores spinning on the other engines */ 1103 if (w->engine != rq->engine) 1104 continue; 1105 1106 /* No waiter should start before its signaler */ 1107 GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) && 1108 __i915_request_has_started(w) && 1109 !__i915_request_is_complete(rq)); 1110 1111 if (!i915_request_is_ready(w)) 1112 continue; 1113 1114 if (rq_prio(w) < rq_prio(rq)) 1115 continue; 1116 1117 GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); 1118 GEM_BUG_ON(i915_request_is_active(w)); 1119 list_move_tail(&w->sched.link, &list); 1120 } 1121 1122 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 1123 } while (rq); 1124} 1125 1126static void defer_active(struct intel_engine_cs *engine) 1127{ 1128 struct i915_request *rq; 1129 1130 rq = __unwind_incomplete_requests(engine); 1131 if (!rq) 1132 return; 1133 1134 defer_request(rq, i915_sched_lookup_priolist(engine->sched_engine, 1135 rq_prio(rq))); 1136} 1137 1138static bool 1139timeslice_yield(const struct intel_engine_execlists *el, 1140 const struct i915_request *rq) 1141{ 1142 /* 1143 * Once bitten, forever smitten! 
1144 * 1145 * If the active context ever busy-waited on a semaphore, 1146 * it will be treated as a hog until the end of its timeslice (i.e. 1147 * until it is scheduled out and replaced by a new submission, 1148 * possibly even its own lite-restore). The HW only sends an interrupt 1149 * on the first miss, and we do know if that semaphore has been 1150 * signaled, or even if it is now stuck on another semaphore. Play 1151 * safe, yield if it might be stuck -- it will be given a fresh 1152 * timeslice in the near future. 1153 */ 1154 return rq->context->lrc.ccid == READ_ONCE(el->yield); 1155} 1156 1157static bool needs_timeslice(const struct intel_engine_cs *engine, 1158 const struct i915_request *rq) 1159{ 1160 if (!intel_engine_has_timeslices(engine)) 1161 return false; 1162 1163 /* If not currently active, or about to switch, wait for next event */ 1164 if (!rq || __i915_request_is_complete(rq)) 1165 return false; 1166 1167 /* We do not need to start the timeslice until after the ACK */ 1168 if (READ_ONCE(engine->execlists.pending[0])) 1169 return false; 1170 1171 /* If ELSP[1] is occupied, always check to see if worth slicing */ 1172 if (!list_is_last_rcu(&rq->sched.link, 1173 &engine->sched_engine->requests)) { 1174 ENGINE_TRACE(engine, "timeslice required for second inflight context\n"); 1175 return true; 1176 } 1177 1178 /* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */ 1179 if (!i915_sched_engine_is_empty(engine->sched_engine)) { 1180 ENGINE_TRACE(engine, "timeslice required for queue\n"); 1181 return true; 1182 } 1183 1184 if (!RB_EMPTY_ROOT(&engine->execlists.virtual.rb_root)) { 1185 ENGINE_TRACE(engine, "timeslice required for virtual\n"); 1186 return true; 1187 } 1188 1189 return false; 1190} 1191 1192static bool 1193timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq) 1194{ 1195 const struct intel_engine_execlists *el = &engine->execlists; 1196 1197 if (i915_request_has_nopreempt(rq) && __i915_request_has_started(rq)) 1198 return false; 1199 1200 if (!needs_timeslice(engine, rq)) 1201 return false; 1202 1203 return timer_expired(&el->timer) || timeslice_yield(el, rq); 1204} 1205 1206static unsigned long timeslice(const struct intel_engine_cs *engine) 1207{ 1208 return READ_ONCE(engine->props.timeslice_duration_ms); 1209} 1210 1211static void start_timeslice(struct intel_engine_cs *engine) 1212{ 1213 struct intel_engine_execlists *el = &engine->execlists; 1214 unsigned long duration; 1215 1216 /* Disable the timer if there is nothing to switch to */ 1217 duration = 0; 1218 if (needs_timeslice(engine, *el->active)) { 1219 /* Avoid continually prolonging an active timeslice */ 1220 if (timer_active(&el->timer)) { 1221 /* 1222 * If we just submitted a new ELSP after an old 1223 * context, that context may have already consumed 1224 * its timeslice, so recheck. 
1225 */ 1226 if (!timer_pending(&el->timer)) 1227 tasklet_hi_schedule(&engine->sched_engine->tasklet); 1228 return; 1229 } 1230 1231 duration = timeslice(engine); 1232 } 1233 1234 set_timer_ms(&el->timer, duration); 1235} 1236 1237static void record_preemption(struct intel_engine_execlists *execlists) 1238{ 1239 (void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++); 1240} 1241 1242static unsigned long active_preempt_timeout(struct intel_engine_cs *engine, 1243 const struct i915_request *rq) 1244{ 1245 if (!rq) 1246 return 0; 1247 1248 /* Only allow ourselves to force reset the currently active context */ 1249 engine->execlists.preempt_target = rq; 1250 1251 /* Force a fast reset for terminated contexts (ignoring sysfs!) */ 1252 if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq))) 1253 return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS; 1254 1255 return READ_ONCE(engine->props.preempt_timeout_ms); 1256} 1257 1258static void set_preempt_timeout(struct intel_engine_cs *engine, 1259 const struct i915_request *rq) 1260{ 1261 if (!intel_engine_has_preempt_reset(engine)) 1262 return; 1263 1264 set_timer_ms(&engine->execlists.preempt, 1265 active_preempt_timeout(engine, rq)); 1266} 1267 1268static bool completed(const struct i915_request *rq) 1269{ 1270 if (i915_request_has_sentinel(rq)) 1271 return false; 1272 1273 return __i915_request_is_complete(rq); 1274} 1275 1276static void execlists_dequeue(struct intel_engine_cs *engine) 1277{ 1278 struct intel_engine_execlists * const execlists = &engine->execlists; 1279 struct i915_sched_engine * const sched_engine = engine->sched_engine; 1280 struct i915_request **port = execlists->pending; 1281 struct i915_request ** const last_port = port + execlists->port_mask; 1282 struct i915_request *last, * const *active; 1283 struct virtual_engine *ve; 1284 struct rb_node *rb; 1285 bool submit = false; 1286 1287 /* 1288 * Hardware submission is through 2 ports. Conceptually each port 1289 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is 1290 * static for a context, and unique to each, so we only execute 1291 * requests belonging to a single context from each ring. RING_HEAD 1292 * is maintained by the CS in the context image, it marks the place 1293 * where it got up to last time, and through RING_TAIL we tell the CS 1294 * where we want to execute up to this time. 1295 * 1296 * In this list the requests are in order of execution. Consecutive 1297 * requests from the same context are adjacent in the ringbuffer. We 1298 * can combine these requests into a single RING_TAIL update: 1299 * 1300 * RING_HEAD...req1...req2 1301 * ^- RING_TAIL 1302 * since to execute req2 the CS must first execute req1. 1303 * 1304 * Our goal then is to point each port to the end of a consecutive 1305 * sequence of requests as being the most optimal (fewest wake ups 1306 * and context switches) submission. 1307 */ 1308 1309 spin_lock(&sched_engine->lock); 1310 1311 /* 1312 * If the queue is higher priority than the last 1313 * request in the currently active context, submit afresh. 1314 * We will resubmit again afterwards in case we need to split 1315 * the active context to interject the preemption request, 1316 * i.e. we will retrigger preemption following the ack in case 1317 * of trouble. 
1318 * 1319 */ 1320 active = execlists->active; 1321 while ((last = *active) && completed(last)) 1322 active++; 1323 1324 if (last) { 1325 if (need_preempt(engine, last)) { 1326 ENGINE_TRACE(engine, 1327 "preempting last=%llx:%lld, prio=%d, hint=%d\n", 1328 last->fence.context, 1329 last->fence.seqno, 1330 last->sched.attr.priority, 1331 sched_engine->queue_priority_hint); 1332 record_preemption(execlists); 1333 1334 /* 1335 * Don't let the RING_HEAD advance past the breadcrumb 1336 * as we unwind (and until we resubmit) so that we do 1337 * not accidentally tell it to go backwards. 1338 */ 1339 ring_set_paused(engine, 1); 1340 1341 /* 1342 * Note that we have not stopped the GPU at this point, 1343 * so we are unwinding the incomplete requests as they 1344 * remain inflight and so by the time we do complete 1345 * the preemption, some of the unwound requests may 1346 * complete! 1347 */ 1348 __unwind_incomplete_requests(engine); 1349 1350 last = NULL; 1351 } else if (timeslice_expired(engine, last)) { 1352 ENGINE_TRACE(engine, 1353 "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", 1354 str_yes_no(timer_expired(&execlists->timer)), 1355 last->fence.context, last->fence.seqno, 1356 rq_prio(last), 1357 sched_engine->queue_priority_hint, 1358 str_yes_no(timeslice_yield(execlists, last))); 1359 1360 /* 1361 * Consume this timeslice; ensure we start a new one. 1362 * 1363 * The timeslice expired, and we will unwind the 1364 * running contexts and recompute the next ELSP. 1365 * If that submit will be the same pair of contexts 1366 * (due to dependency ordering), we will skip the 1367 * submission. If we don't cancel the timer now, 1368 * we will see that the timer has expired and 1369 * reschedule the tasklet; continually until the 1370 * next context switch or other preemption event. 1371 * 1372 * Since we have decided to reschedule based on 1373 * consumption of this timeslice, if we submit the 1374 * same context again, grant it a full timeslice. 1375 */ 1376 cancel_timer(&execlists->timer); 1377 ring_set_paused(engine, 1); 1378 defer_active(engine); 1379 1380 /* 1381 * Unlike for preemption, if we rewind and continue 1382 * executing the same context as previously active, 1383 * the order of execution will remain the same and 1384 * the tail will only advance. We do not need to 1385 * force a full context restore, as a lite-restore 1386 * is sufficient to resample the monotonic TAIL. 1387 * 1388 * If we switch to any other context, similarly we 1389 * will not rewind TAIL of current context, and 1390 * normal save/restore will preserve state and allow 1391 * us to later continue executing the same request. 1392 */ 1393 last = NULL; 1394 } else { 1395 /* 1396 * Otherwise if we already have a request pending 1397 * for execution after the current one, we can 1398 * just wait until the next CS event before 1399 * queuing more. In either case we will force a 1400 * lite-restore preemption event, but if we wait 1401 * we hopefully coalesce several updates into a single 1402 * submission. 1403 */ 1404 if (active[1]) { 1405 /* 1406 * Even if ELSP[1] is occupied and not worthy 1407 * of timeslices, our queue might be. 
1408 */ 1409 spin_unlock(&sched_engine->lock); 1410 return; 1411 } 1412 } 1413 } 1414 1415 /* XXX virtual is always taking precedence */ 1416 while ((ve = first_virtual_engine(engine))) { 1417 struct i915_request *rq; 1418 1419 spin_lock(&ve->base.sched_engine->lock); 1420 1421 rq = ve->request; 1422 if (unlikely(!virtual_matches(ve, rq, engine))) 1423 goto unlock; /* lost the race to a sibling */ 1424 1425 GEM_BUG_ON(rq->engine != &ve->base); 1426 GEM_BUG_ON(rq->context != &ve->context); 1427 1428 if (unlikely(rq_prio(rq) < queue_prio(sched_engine))) { 1429 spin_unlock(&ve->base.sched_engine->lock); 1430 break; 1431 } 1432 1433 if (last && !can_merge_rq(last, rq)) { 1434 spin_unlock(&ve->base.sched_engine->lock); 1435 spin_unlock(&engine->sched_engine->lock); 1436 return; /* leave this for another sibling */ 1437 } 1438 1439 ENGINE_TRACE(engine, 1440 "virtual rq=%llx:%lld%s, new engine? %s\n", 1441 rq->fence.context, 1442 rq->fence.seqno, 1443 __i915_request_is_complete(rq) ? "!" : 1444 __i915_request_has_started(rq) ? "*" : 1445 "", 1446 str_yes_no(engine != ve->siblings[0])); 1447 1448 WRITE_ONCE(ve->request, NULL); 1449 WRITE_ONCE(ve->base.sched_engine->queue_priority_hint, INT_MIN); 1450 1451 rb = &ve->nodes[engine->id].rb; 1452 rb_erase_cached(rb, &execlists->virtual); 1453 RB_CLEAR_NODE(rb); 1454 1455 GEM_BUG_ON(!(rq->execution_mask & engine->mask)); 1456 WRITE_ONCE(rq->engine, engine); 1457 1458 if (__i915_request_submit(rq)) { 1459 /* 1460 * Only after we confirm that we will submit 1461 * this request (i.e. it has not already 1462 * completed), do we want to update the context. 1463 * 1464 * This serves two purposes. It avoids 1465 * unnecessary work if we are resubmitting an 1466 * already completed request after timeslicing. 1467 * But more importantly, it prevents us altering 1468 * ve->siblings[] on an idle context, where 1469 * we may be using ve->siblings[] in 1470 * virtual_context_enter / virtual_context_exit. 1471 */ 1472 virtual_xfer_context(ve, engine); 1473 GEM_BUG_ON(ve->siblings[0] != engine); 1474 1475 submit = true; 1476 last = rq; 1477 } 1478 1479 i915_request_put(rq); 1480unlock: 1481 spin_unlock(&ve->base.sched_engine->lock); 1482 1483 /* 1484 * Hmm, we have a bunch of virtual engine requests, 1485 * but the first one was already completed (thanks 1486 * preempt-to-busy!). Keep looking at the veng queue 1487 * until we have no more relevant requests (i.e. 1488 * the normal submit queue has higher priority). 1489 */ 1490 if (submit) 1491 break; 1492 } 1493 1494 while ((rb = rb_first_cached(&sched_engine->queue))) { 1495 struct i915_priolist *p = to_priolist(rb); 1496 struct i915_request *rq, *rn; 1497 1498 priolist_for_each_request_consume(rq, rn, p) { 1499 bool merge = true; 1500 1501 /* 1502 * Can we combine this request with the current port? 1503 * It has to be the same context/ringbuffer and not 1504 * have any exceptions (e.g. GVT saying never to 1505 * combine contexts). 1506 * 1507 * If we can combine the requests, we can execute both 1508 * by updating the RING_TAIL to point to the end of the 1509 * second request, and so we never need to tell the 1510 * hardware about the first. 1511 */ 1512 if (last && !can_merge_rq(last, rq)) { 1513 /* 1514 * If we are on the second port and cannot 1515 * combine this request with the last, then we 1516 * are done. 1517 */ 1518 if (port == last_port) 1519 goto done; 1520 1521 /* 1522 * We must not populate both ELSP[] with the 1523 * same LRCA, i.e. we must submit 2 different 1524 * contexts if we submit 2 ELSP. 
1525 */ 1526 if (last->context == rq->context) 1527 goto done; 1528 1529 if (i915_request_has_sentinel(last)) 1530 goto done; 1531 1532 /* 1533 * We avoid submitting virtual requests into 1534 * the secondary ports so that we can migrate 1535 * the request immediately to another engine 1536 * rather than wait for the primary request. 1537 */ 1538 if (rq->execution_mask != engine->mask) 1539 goto done; 1540 1541 /* 1542 * If GVT overrides us we only ever submit 1543 * port[0], leaving port[1] empty. Note that we 1544 * also have to be careful that we don't queue 1545 * the same context (even though a different 1546 * request) to the second port. 1547 */ 1548 if (ctx_single_port_submission(last->context) || 1549 ctx_single_port_submission(rq->context)) 1550 goto done; 1551 1552 merge = false; 1553 } 1554 1555 if (__i915_request_submit(rq)) { 1556 if (!merge) { 1557 *port++ = i915_request_get(last); 1558 last = NULL; 1559 } 1560 1561 GEM_BUG_ON(last && 1562 !can_merge_ctx(last->context, 1563 rq->context)); 1564 GEM_BUG_ON(last && 1565 i915_seqno_passed(last->fence.seqno, 1566 rq->fence.seqno)); 1567 1568 submit = true; 1569 last = rq; 1570 } 1571 } 1572 1573 rb_erase_cached(&p->node, &sched_engine->queue); 1574 i915_priolist_free(p); 1575 } 1576done: 1577 *port++ = i915_request_get(last); 1578 1579 /* 1580 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. 1581 * 1582 * We choose the priority hint such that if we add a request of greater 1583 * priority than this, we kick the submission tasklet to decide on 1584 * the right order of submitting the requests to hardware. We must 1585 * also be prepared to reorder requests as they are in-flight on the 1586 * HW. We derive the priority hint then as the first "hole" in 1587 * the HW submission ports and if there are no available slots, 1588 * the priority of the lowest executing request, i.e. last. 1589 * 1590 * When we do receive a higher priority request ready to run from the 1591 * user, see queue_request(), the priority hint is bumped to that 1592 * request triggering preemption on the next dequeue (or subsequent 1593 * interrupt for secondary ports). 1594 */ 1595 sched_engine->queue_priority_hint = queue_prio(sched_engine); 1596 i915_sched_engine_reset_on_empty(sched_engine); 1597 spin_unlock(&sched_engine->lock); 1598 1599 /* 1600 * We can skip poking the HW if we ended up with exactly the same set 1601 * of requests as currently running, e.g. trying to timeslice a pair 1602 * of ordered contexts. 1603 */ 1604 if (submit && 1605 memcmp(active, 1606 execlists->pending, 1607 (port - execlists->pending) * sizeof(*port))) { 1608 *port = NULL; 1609 while (port-- != execlists->pending) 1610 execlists_schedule_in(*port, port - execlists->pending); 1611 1612 WRITE_ONCE(execlists->yield, -1); 1613 set_preempt_timeout(engine, *active); 1614 execlists_submit_ports(engine); 1615 } else { 1616 ring_set_paused(engine, 0); 1617 while (port-- != execlists->pending) 1618 i915_request_put(*port); 1619 *execlists->pending = NULL; 1620 } 1621} 1622 1623static void execlists_dequeue_irq(struct intel_engine_cs *engine) 1624{ 1625 local_irq_disable(); /* Suspend interrupts across request submission */ 1626 execlists_dequeue(engine); 1627 local_irq_enable(); /* flush irq_work (e.g. 
breadcrumb enabling) */ 1628} 1629 1630static void clear_ports(struct i915_request **ports, int count) 1631{ 1632 memset_p((void **)ports, NULL, count); 1633} 1634 1635static void 1636copy_ports(struct i915_request **dst, struct i915_request **src, int count) 1637{ 1638 /* A memcpy_p() would be very useful here! */ 1639 while (count--) 1640 WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ 1641} 1642 1643static struct i915_request ** 1644cancel_port_requests(struct intel_engine_execlists * const execlists, 1645 struct i915_request **inactive) 1646{ 1647 struct i915_request * const *port; 1648 1649 for (port = execlists->pending; *port; port++) 1650 *inactive++ = *port; 1651 clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending)); 1652 1653 /* Mark the end of active before we overwrite *active */ 1654 for (port = xchg(&execlists->active, execlists->pending); *port; port++) 1655 *inactive++ = *port; 1656 clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight)); 1657 1658 smp_wmb(); /* complete the seqlock for execlists_active() */ 1659 WRITE_ONCE(execlists->active, execlists->inflight); 1660 1661 /* Having cancelled all outstanding process_csb(), stop their timers */ 1662 GEM_BUG_ON(execlists->pending[0]); 1663 cancel_timer(&execlists->timer); 1664 cancel_timer(&execlists->preempt); 1665 1666 return inactive; 1667} 1668 1669/* 1670 * Starting with Gen12, the status has a new format: 1671 * 1672 * bit 0: switched to new queue 1673 * bit 1: reserved 1674 * bit 2: semaphore wait mode (poll or signal), only valid when 1675 * switch detail is set to "wait on semaphore" 1676 * bits 3-5: engine class 1677 * bits 6-11: engine instance 1678 * bits 12-14: reserved 1679 * bits 15-25: sw context id of the lrc the GT switched to 1680 * bits 26-31: sw counter of the lrc the GT switched to 1681 * bits 32-35: context switch detail 1682 * - 0: ctx complete 1683 * - 1: wait on sync flip 1684 * - 2: wait on vblank 1685 * - 3: wait on scanline 1686 * - 4: wait on semaphore 1687 * - 5: context preempted (not on SEMAPHORE_WAIT or 1688 * WAIT_FOR_EVENT) 1689 * bit 36: reserved 1690 * bits 37-43: wait detail (for switch detail 1 to 4) 1691 * bits 44-46: reserved 1692 * bits 47-57: sw context id of the lrc the GT switched away from 1693 * bits 58-63: sw counter of the lrc the GT switched away from 1694 * 1695 * Xe_HP csb shuffles things around compared to TGL: 1696 * 1697 * bits 0-3: context switch detail (same possible values as TGL) 1698 * bits 4-9: engine instance 1699 * bits 10-25: sw context id of the lrc the GT switched to 1700 * bits 26-31: sw counter of the lrc the GT switched to 1701 * bit 32: semaphore wait mode (poll or signal), Only valid when 1702 * switch detail is set to "wait on semaphore" 1703 * bit 33: switched to new queue 1704 * bits 34-41: wait detail (for switch detail 1 to 4) 1705 * bits 42-57: sw context id of the lrc the GT switched away from 1706 * bits 58-63: sw counter of the lrc the GT switched away from 1707 */ 1708static inline bool 1709__gen12_csb_parse(bool ctx_to_valid, bool ctx_away_valid, bool new_queue, 1710 u8 switch_detail) 1711{ 1712 /* 1713 * The context switch detail is not guaranteed to be 5 when a preemption 1714 * occurs, so we can't just check for that. The check below works for 1715 * all the cases we care about, including preemptions of WAIT 1716 * instructions and lite-restore. Preempt-to-idle via the CTRL register 1717 * would require some extra handling, but we don't support that. 
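 * Put differently (an illustrative restatement of the check below, not an
 * additional rule): if the away-context id is invalid (idle) or the
 * "switched to new queue" bit is set, the event is treated as a promotion
 * of the incoming context and we return true (the incoming id must then be
 * valid); if a valid context switched away and no new queue was started,
 * the outgoing context simply completed and we return false.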
1718 */ 1719 if (!ctx_away_valid || new_queue) { 1720 GEM_BUG_ON(!ctx_to_valid); 1721 return true; 1722 } 1723 1724 /* 1725 * switch detail = 5 is covered by the case above and we do not expect a 1726 * context switch on an unsuccessful wait instruction since we always 1727 * use polling mode. 1728 */ 1729 GEM_BUG_ON(switch_detail); 1730 return false; 1731} 1732 1733static bool xehp_csb_parse(const u64 csb) 1734{ 1735 return __gen12_csb_parse(XEHP_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */ 1736 XEHP_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */ 1737 upper_32_bits(csb) & XEHP_CTX_STATUS_SWITCHED_TO_NEW_QUEUE, 1738 GEN12_CTX_SWITCH_DETAIL(lower_32_bits(csb))); 1739} 1740 1741static bool gen12_csb_parse(const u64 csb) 1742{ 1743 return __gen12_csb_parse(GEN12_CSB_CTX_VALID(lower_32_bits(csb)), /* cxt to */ 1744 GEN12_CSB_CTX_VALID(upper_32_bits(csb)), /* cxt away */ 1745 lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE, 1746 GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb))); 1747} 1748 1749static bool gen8_csb_parse(const u64 csb) 1750{ 1751 return csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED); 1752} 1753 1754static noinline u64 1755wa_csb_read(const struct intel_engine_cs *engine, u64 * const csb) 1756{ 1757 u64 entry; 1758 1759 /* 1760 * Reading from the HWSP has one particular advantage: we can detect 1761 * a stale entry. Since the write into HWSP is broken, we have no reason 1762 * to trust the HW at all, the mmio entry may equally be unordered, so 1763 * we prefer the path that is self-checking and as a last resort, 1764 * return the mmio value. 1765 * 1766 * tgl,dg1:HSDES#22011327657 1767 */ 1768 preempt_disable(); 1769 if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 10)) { 1770 int idx = csb - engine->execlists.csb_status; 1771 int status; 1772 1773 status = GEN8_EXECLISTS_STATUS_BUF; 1774 if (idx >= 6) { 1775 status = GEN11_EXECLISTS_STATUS_BUF2; 1776 idx -= 6; 1777 } 1778 status += sizeof(u64) * idx; 1779 1780 entry = intel_uncore_read64(engine->uncore, 1781 _MMIO(engine->mmio_base + status)); 1782 } 1783 preempt_enable(); 1784 1785 return entry; 1786} 1787 1788static u64 csb_read(const struct intel_engine_cs *engine, u64 * const csb) 1789{ 1790 u64 entry = READ_ONCE(*csb); 1791 1792 /* 1793 * Unfortunately, the GPU does not always serialise its write 1794 * of the CSB entries before its write of the CSB pointer, at least 1795 * from the perspective of the CPU, using what is known as a Global 1796 * Observation Point. We may read a new CSB tail pointer, but then 1797 * read the stale CSB entries, causing us to misinterpret the 1798 * context-switch events, and eventually declare the GPU hung. 1799 * 1800 * icl:HSDES#1806554093 1801 * tgl:HSDES#22011248461 1802 */ 1803 if (unlikely(entry == -1)) 1804 entry = wa_csb_read(engine, csb); 1805 1806 /* Consume this entry so that we can spot its future reuse. 
*/ 1807 WRITE_ONCE(*csb, -1); 1808 1809 /* ELSP is an implicit wmb() before the GPU wraps and overwrites csb */ 1810 return entry; 1811} 1812 1813static void new_timeslice(struct intel_engine_execlists *el) 1814{ 1815 /* By cancelling, we will start afresh in start_timeslice() */ 1816 cancel_timer(&el->timer); 1817} 1818 1819static struct i915_request ** 1820process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) 1821{ 1822 struct intel_engine_execlists * const execlists = &engine->execlists; 1823 u64 * const buf = execlists->csb_status; 1824 const u8 num_entries = execlists->csb_size; 1825 struct i915_request **prev; 1826 u8 head, tail; 1827 1828 /* 1829 * As we modify our execlists state tracking we require exclusive 1830 * access. Either we are inside the tasklet, or the tasklet is disabled 1831 * and we assume that is only inside the reset paths and so serialised. 1832 */ 1833 GEM_BUG_ON(!tasklet_is_locked(&engine->sched_engine->tasklet) && 1834 !reset_in_progress(engine)); 1835 1836 /* 1837 * Note that csb_write, csb_status may be either in HWSP or mmio. 1838 * When reading from the csb_write mmio register, we have to be 1839 * careful to only use the GEN8_CSB_WRITE_PTR portion, which is 1840 * the low 4bits. As it happens we know the next 4bits are always 1841 * zero and so we can simply mask off the low u8 of the register 1842 * and treat it identically to reading from the HWSP (without having 1843 * to use explicit shifting and masking, and probably bifurcating 1844 * the code to handle the legacy mmio read). 1845 */ 1846 head = execlists->csb_head; 1847 tail = READ_ONCE(*execlists->csb_write); 1848 if (unlikely(head == tail)) 1849 return inactive; 1850 1851 /* 1852 * We will consume all events from HW, or at least pretend to. 1853 * 1854 * The sequence of events from the HW is deterministic, and derived 1855 * from our writes to the ELSP, with a smidgen of variability for 1856 * the arrival of the asynchronous requests wrt to the inflight 1857 * execution. If the HW sends an event that does not correspond with 1858 * the one we are expecting, we have to abandon all hope as we lose 1859 * all tracking of what the engine is actually executing. We will 1860 * only detect we are out of sequence with the HW when we get an 1861 * 'impossible' event because we have already drained our own 1862 * preemption/promotion queue. If this occurs, we know that we likely 1863 * lost track of execution earlier and must unwind and restart, the 1864 * simplest way is to stop processing the event queue and force the 1865 * engine to reset. 1866 */ 1867 execlists->csb_head = tail; 1868 ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail); 1869 1870 /* 1871 * Hopefully paired with a wmb() in HW! 1872 * 1873 * We must complete the read of the write pointer before any reads 1874 * from the CSB, so that we do not see stale values. Without an rmb 1875 * (lfence) the HW may speculatively perform the CSB[] reads *before* 1876 * we perform the READ_ONCE(*csb_write). 1877 */ 1878 rmb(); 1879 1880 /* Remember who was last running under the timer */ 1881 prev = inactive; 1882 *prev = NULL; 1883 1884 do { 1885 bool promote; 1886 u64 csb; 1887 1888 if (++head == num_entries) 1889 head = 0; 1890 1891 /* 1892 * We are flying near dragons again. 1893 * 1894 * We hold a reference to the request in execlist_port[] 1895 * but no more than that. We are operating in softirq 1896 * context and so cannot hold any mutex or sleep.
That 1897 * prevents us stopping the requests we are processing 1898 * in port[] from being retired simultaneously (the 1899 * breadcrumb will be complete before we see the 1900 * context-switch). As we only hold the reference to the 1901 * request, any pointer chasing underneath the request 1902 * is subject to a potential use-after-free. Thus we 1903 * store all of the bookkeeping within port[] as 1904 * required, and avoid using unguarded pointers beneath 1905 * request itself. The same applies to the atomic 1906 * status notifier. 1907 */ 1908 1909 csb = csb_read(engine, buf + head); 1910 ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n", 1911 head, upper_32_bits(csb), lower_32_bits(csb)); 1912 1913 if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) 1914 promote = xehp_csb_parse(csb); 1915 else if (GRAPHICS_VER(engine->i915) >= 12) 1916 promote = gen12_csb_parse(csb); 1917 else 1918 promote = gen8_csb_parse(csb); 1919 if (promote) { 1920 struct i915_request * const *old = execlists->active; 1921 1922 if (GEM_WARN_ON(!*execlists->pending)) { 1923 execlists->error_interrupt |= ERROR_CSB; 1924 break; 1925 } 1926 1927 ring_set_paused(engine, 0); 1928 1929 /* Point active to the new ELSP; prevent overwriting */ 1930 WRITE_ONCE(execlists->active, execlists->pending); 1931 smp_wmb(); /* notify execlists_active() */ 1932 1933 /* cancel old inflight, prepare for switch */ 1934 trace_ports(execlists, "preempted", old); 1935 while (*old) 1936 *inactive++ = *old++; 1937 1938 /* switch pending to inflight */ 1939 GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); 1940 copy_ports(execlists->inflight, 1941 execlists->pending, 1942 execlists_num_ports(execlists)); 1943 smp_wmb(); /* complete the seqlock */ 1944 WRITE_ONCE(execlists->active, execlists->inflight); 1945 1946 /* XXX Magic delay for tgl */ 1947 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 1948 1949 WRITE_ONCE(execlists->pending[0], NULL); 1950 } else { 1951 if (GEM_WARN_ON(!*execlists->active)) { 1952 execlists->error_interrupt |= ERROR_CSB; 1953 break; 1954 } 1955 1956 /* port0 completed, advanced to port1 */ 1957 trace_ports(execlists, "completed", execlists->active); 1958 1959 /* 1960 * We rely on the hardware being strongly 1961 * ordered, that the breadcrumb write is 1962 * coherent (visible from the CPU) before the 1963 * user interrupt is processed. One might assume 1964 * that the breadcrumb write being before the 1965 * user interrupt and the CS event for the context 1966 * switch would therefore be before the CS event 1967 * itself... 
1968 */ 1969 if (GEM_SHOW_DEBUG() && 1970 !__i915_request_is_complete(*execlists->active)) { 1971 struct i915_request *rq = *execlists->active; 1972 const u32 *regs __maybe_unused = 1973 rq->context->lrc_reg_state; 1974 1975 ENGINE_TRACE(engine, 1976 "context completed before request!\n"); 1977 ENGINE_TRACE(engine, 1978 "ring:{start:0x%08x, head:%04x, tail:%04x, ctl:%08x, mode:%08x}\n", 1979 ENGINE_READ(engine, RING_START), 1980 ENGINE_READ(engine, RING_HEAD) & HEAD_ADDR, 1981 ENGINE_READ(engine, RING_TAIL) & TAIL_ADDR, 1982 ENGINE_READ(engine, RING_CTL), 1983 ENGINE_READ(engine, RING_MI_MODE)); 1984 ENGINE_TRACE(engine, 1985 "rq:{start:%08x, head:%04x, tail:%04x, seqno:%llx:%d, hwsp:%d}, ", 1986 i915_ggtt_offset(rq->ring->vma), 1987 rq->head, rq->tail, 1988 rq->fence.context, 1989 lower_32_bits(rq->fence.seqno), 1990 hwsp_seqno(rq)); 1991 ENGINE_TRACE(engine, 1992 "ctx:{start:%08x, head:%04x, tail:%04x}, ", 1993 regs[CTX_RING_START], 1994 regs[CTX_RING_HEAD], 1995 regs[CTX_RING_TAIL]); 1996 } 1997 1998 *inactive++ = *execlists->active++; 1999 2000 GEM_BUG_ON(execlists->active - execlists->inflight > 2001 execlists_num_ports(execlists)); 2002 } 2003 } while (head != tail); 2004 2005 /* 2006 * Gen11 has proven to fail wrt global observation point between 2007 * entry and tail update, failing on the ordering and thus 2008 * we see an old entry in the context status buffer. 2009 * 2010 * Forcibly evict out entries for the next gpu csb update, 2011 * to increase the odds that we get fresh entries with non 2012 * working hardware. The cost for doing so comes out mostly with 2013 * the wash as hardware, working or not, will need to do the 2014 * invalidation before. 2015 */ 2016 drm_clflush_virt_range(&buf[0], num_entries * sizeof(buf[0])); 2017 2018 /* 2019 * We assume that any event reflects a change in context flow 2020 * and merits a fresh timeslice. We reinstall the timer after 2021 * inspecting the queue to see if we need to resubmit. 2022 */ 2023 if (*prev != *execlists->active) { /* elide lite-restores */ 2024 struct intel_context *prev_ce = NULL, *active_ce = NULL; 2025 2026 /* 2027 * Note the inherent discrepancy between the HW runtime, 2028 * recorded as part of the context switch, and the CPU 2029 * adjustment for active contexts. We have to hope that 2030 * the delay in processing the CS event is very small 2031 * and consistent. It works to our advantage to have 2032 * the CPU adjustment _undershoot_ (i.e. start later than) 2033 * the CS timestamp so we never overreport the runtime 2034 * and correct ourselves later when updating from HW.
2035 */ 2036 if (*prev) 2037 prev_ce = (*prev)->context; 2038 if (*execlists->active) 2039 active_ce = (*execlists->active)->context; 2040 if (prev_ce != active_ce) { 2041 if (prev_ce) 2042 lrc_runtime_stop(prev_ce); 2043 if (active_ce) 2044 lrc_runtime_start(active_ce); 2045 } 2046 new_timeslice(execlists); 2047 } 2048 2049 return inactive; 2050} 2051 2052static void post_process_csb(struct i915_request **port, 2053 struct i915_request **last) 2054{ 2055 while (port != last) 2056 execlists_schedule_out(*port++); 2057} 2058 2059static void __execlists_hold(struct i915_request *rq) 2060{ 2061 DRM_LIST_HEAD(list); 2062 2063 do { 2064 struct i915_dependency *p; 2065 2066 if (i915_request_is_active(rq)) 2067 __i915_request_unsubmit(rq); 2068 2069 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2070 list_move_tail(&rq->sched.link, 2071 &rq->engine->sched_engine->hold); 2072 i915_request_set_hold(rq); 2073 RQ_TRACE(rq, "on hold\n"); 2074 2075 for_each_waiter(p, rq) { 2076 struct i915_request *w = 2077 container_of(p->waiter, typeof(*w), sched); 2078 2079 if (p->flags & I915_DEPENDENCY_WEAK) 2080 continue; 2081 2082 /* Leave semaphores spinning on the other engines */ 2083 if (w->engine != rq->engine) 2084 continue; 2085 2086 if (!i915_request_is_ready(w)) 2087 continue; 2088 2089 if (__i915_request_is_complete(w)) 2090 continue; 2091 2092 if (i915_request_on_hold(w)) 2093 continue; 2094 2095 list_move_tail(&w->sched.link, &list); 2096 } 2097 2098 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2099 } while (rq); 2100} 2101 2102static bool execlists_hold(struct intel_engine_cs *engine, 2103 struct i915_request *rq) 2104{ 2105 if (i915_request_on_hold(rq)) 2106 return false; 2107 2108 spin_lock_irq(&engine->sched_engine->lock); 2109 2110 if (__i915_request_is_complete(rq)) { /* too late! */ 2111 rq = NULL; 2112 goto unlock; 2113 } 2114 2115 /* 2116 * Transfer this request onto the hold queue to prevent it 2117 * being resubmitted to HW (and potentially completed) before we have 2118 * released it. Since we may have already submitted following 2119 * requests, we need to remove those as well. 2120 */ 2121 GEM_BUG_ON(i915_request_on_hold(rq)); 2122 GEM_BUG_ON(rq->engine != engine); 2123 __execlists_hold(rq); 2124 GEM_BUG_ON(list_empty(&engine->sched_engine->hold)); 2125 2126unlock: 2127 spin_unlock_irq(&engine->sched_engine->lock); 2128 return rq; 2129} 2130 2131static bool hold_request(const struct i915_request *rq) 2132{ 2133 struct i915_dependency *p; 2134 bool result = false; 2135 2136 /* 2137 * If one of our ancestors is on hold, we must also be on hold, 2138 * otherwise we will bypass it and execute before it.
2139 */ 2140 rcu_read_lock(); 2141 for_each_signaler(p, rq) { 2142 const struct i915_request *s = 2143 container_of(p->signaler, typeof(*s), sched); 2144 2145 if (s->engine != rq->engine) 2146 continue; 2147 2148 result = i915_request_on_hold(s); 2149 if (result) 2150 break; 2151 } 2152 rcu_read_unlock(); 2153 2154 return result; 2155} 2156 2157static void __execlists_unhold(struct i915_request *rq) 2158{ 2159 DRM_LIST_HEAD(list); 2160 2161 do { 2162 struct i915_dependency *p; 2163 2164 RQ_TRACE(rq, "hold release\n"); 2165 2166 GEM_BUG_ON(!i915_request_on_hold(rq)); 2167 GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); 2168 2169 i915_request_clear_hold(rq); 2170 list_move_tail(&rq->sched.link, 2171 i915_sched_lookup_priolist(rq->engine->sched_engine, 2172 rq_prio(rq))); 2173 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2174 2175 /* Also release any children on this engine that are ready */ 2176 for_each_waiter(p, rq) { 2177 struct i915_request *w = 2178 container_of(p->waiter, typeof(*w), sched); 2179 2180 if (p->flags & I915_DEPENDENCY_WEAK) 2181 continue; 2182 2183 if (w->engine != rq->engine) 2184 continue; 2185 2186 if (!i915_request_on_hold(w)) 2187 continue; 2188 2189 /* Check that no other parents are also on hold */ 2190 if (hold_request(w)) 2191 continue; 2192 2193 list_move_tail(&w->sched.link, &list); 2194 } 2195 2196 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2197 } while (rq); 2198} 2199 2200static void execlists_unhold(struct intel_engine_cs *engine, 2201 struct i915_request *rq) 2202{ 2203 spin_lock_irq(&engine->sched_engine->lock); 2204 2205 /* 2206 * Move this request back to the priority queue, and all of its 2207 * children and grandchildren that were suspended along with it. 2208 */ 2209 __execlists_unhold(rq); 2210 2211 if (rq_prio(rq) > engine->sched_engine->queue_priority_hint) { 2212 engine->sched_engine->queue_priority_hint = rq_prio(rq); 2213 tasklet_hi_schedule(&engine->sched_engine->tasklet); 2214 } 2215 2216 spin_unlock_irq(&engine->sched_engine->lock); 2217} 2218 2219struct execlists_capture { 2220 struct work_struct work; 2221 struct i915_request *rq; 2222 struct i915_gpu_coredump *error; 2223}; 2224 2225static void execlists_capture_work(struct work_struct *work) 2226{ 2227 struct execlists_capture *cap = container_of(work, typeof(*cap), work); 2228 const gfp_t gfp = __GFP_KSWAPD_RECLAIM | __GFP_RETRY_MAYFAIL | 2229 __GFP_NOWARN; 2230 struct intel_engine_cs *engine = cap->rq->engine; 2231 struct intel_gt_coredump *gt = cap->error->gt; 2232 struct intel_engine_capture_vma *vma; 2233 2234 /* Compress all the objects attached to the request, slow! 
*/ 2235 vma = intel_engine_coredump_add_request(gt->engine, cap->rq, gfp); 2236 if (vma) { 2237 struct i915_vma_compress *compress = 2238 i915_vma_capture_prepare(gt); 2239 2240 intel_engine_coredump_add_vma(gt->engine, vma, compress); 2241 i915_vma_capture_finish(gt, compress); 2242 } 2243 2244 gt->simulated = gt->engine->simulated; 2245 cap->error->simulated = gt->simulated; 2246 2247 /* Publish the error state, and announce it to the world */ 2248 i915_error_state_store(cap->error); 2249 i915_gpu_coredump_put(cap->error); 2250 2251 /* Return this request and all that depend upon it for signaling */ 2252 execlists_unhold(engine, cap->rq); 2253 i915_request_put(cap->rq); 2254 2255 kfree(cap); 2256} 2257 2258static struct execlists_capture *capture_regs(struct intel_engine_cs *engine) 2259{ 2260 const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN; 2261 struct execlists_capture *cap; 2262 2263 cap = kmalloc(sizeof(*cap), gfp); 2264 if (!cap) 2265 return NULL; 2266 2267 cap->error = i915_gpu_coredump_alloc(engine->i915, gfp); 2268 if (!cap->error) 2269 goto err_cap; 2270 2271 cap->error->gt = intel_gt_coredump_alloc(engine->gt, gfp, CORE_DUMP_FLAG_NONE); 2272 if (!cap->error->gt) 2273 goto err_gpu; 2274 2275 cap->error->gt->engine = intel_engine_coredump_alloc(engine, gfp, CORE_DUMP_FLAG_NONE); 2276 if (!cap->error->gt->engine) 2277 goto err_gt; 2278 2279 cap->error->gt->engine->hung = true; 2280 2281 return cap; 2282 2283err_gt: 2284 kfree(cap->error->gt); 2285err_gpu: 2286 kfree(cap->error); 2287err_cap: 2288 kfree(cap); 2289 return NULL; 2290} 2291 2292static struct i915_request * 2293active_context(struct intel_engine_cs *engine, u32 ccid) 2294{ 2295 const struct intel_engine_execlists * const el = &engine->execlists; 2296 struct i915_request * const *port, *rq; 2297 2298 /* 2299 * Use the most recent result from process_csb(), but just in case 2300 * we trigger an error (via interrupt) before the first CS event has 2301 * been written, peek at the next submission. 2302 */ 2303 2304 for (port = el->active; (rq = *port); port++) { 2305 if (rq->context->lrc.ccid == ccid) { 2306 ENGINE_TRACE(engine, 2307 "ccid:%x found at active:%zd\n", 2308 ccid, port - el->active); 2309 return rq; 2310 } 2311 } 2312 2313 for (port = el->pending; (rq = *port); port++) { 2314 if (rq->context->lrc.ccid == ccid) { 2315 ENGINE_TRACE(engine, 2316 "ccid:%x found at pending:%zd\n", 2317 ccid, port - el->pending); 2318 return rq; 2319 } 2320 } 2321 2322 ENGINE_TRACE(engine, "ccid:%x not found\n", ccid); 2323 return NULL; 2324} 2325 2326static u32 active_ccid(struct intel_engine_cs *engine) 2327{ 2328 return ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI); 2329} 2330 2331static void execlists_capture(struct intel_engine_cs *engine) 2332{ 2333 struct drm_i915_private *i915 = engine->i915; 2334 struct execlists_capture *cap; 2335 2336 if (!IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)) 2337 return; 2338 2339 /* 2340 * We need to _quickly_ capture the engine state before we reset. 2341 * We are inside an atomic section (softirq) here and we are delaying 2342 * the forced preemption event. 
2343 */ 2344 cap = capture_regs(engine); 2345 if (!cap) 2346 return; 2347 2348 spin_lock_irq(&engine->sched_engine->lock); 2349 cap->rq = active_context(engine, active_ccid(engine)); 2350 if (cap->rq) { 2351 cap->rq = active_request(cap->rq->context->timeline, cap->rq); 2352 cap->rq = i915_request_get_rcu(cap->rq); 2353 } 2354 spin_unlock_irq(&engine->sched_engine->lock); 2355 if (!cap->rq) 2356 goto err_free; 2357 2358 /* 2359 * Remove the request from the execlists queue, and take ownership 2360 * of the request. We pass it to our worker who will _slowly_ compress 2361 * all the pages the _user_ requested for debugging their batch, after 2362 * which we return it to the queue for signaling. 2363 * 2364 * By removing them from the execlists queue, we also remove the 2365 * requests from being processed by __unwind_incomplete_requests() 2366 * during the intel_engine_reset(), and so they will *not* be replayed 2367 * afterwards. 2368 * 2369 * Note that because we have not yet reset the engine at this point, 2370 * it is possible that the request we have identified as being 2371 * guilty did in fact complete, and we will then hit an arbitration 2372 * point allowing the outstanding preemption to succeed. The likelihood 2373 * of that is very low (as capturing of the engine registers should be 2374 * fast enough to run inside an irq-off atomic section!), so we will 2375 * simply hold that request accountable for being non-preemptible 2376 * long enough to force the reset. 2377 */ 2378 if (!execlists_hold(engine, cap->rq)) 2379 goto err_rq; 2380 2381 INIT_WORK(&cap->work, execlists_capture_work); 2382 queue_work(i915->unordered_wq, &cap->work); 2383 return; 2384 2385err_rq: 2386 i915_request_put(cap->rq); 2387err_free: 2388 i915_gpu_coredump_put(cap->error); 2389 kfree(cap); 2390} 2391 2392static void execlists_reset(struct intel_engine_cs *engine, const char *msg) 2393{ 2394 const unsigned int bit = I915_RESET_ENGINE + engine->id; 2395 unsigned long *lock = &engine->gt->reset.flags; 2396 2397 if (!intel_has_reset_engine(engine->gt)) 2398 return; 2399 2400 if (test_and_set_bit(bit, lock)) 2401 return; 2402 2403 ENGINE_TRACE(engine, "reset for %s\n", msg); 2404 2405 /* Mark this tasklet as disabled to avoid waiting for it to complete */ 2406 tasklet_disable_nosync(&engine->sched_engine->tasklet); 2407 2408 ring_set_paused(engine, 1); /* Freeze the current request in place */ 2409 execlists_capture(engine); 2410 intel_engine_reset(engine, msg); 2411 2412 tasklet_enable(&engine->sched_engine->tasklet); 2413 clear_and_wake_up_bit(bit, lock); 2414} 2415 2416static bool preempt_timeout(const struct intel_engine_cs *const engine) 2417{ 2418 const struct timeout *t = &engine->execlists.preempt; 2419 2420 if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT) 2421 return false; 2422 2423 if (!timer_expired(t)) 2424 return false; 2425 2426 return engine->execlists.pending[0]; 2427} 2428 2429/* 2430 * Check the unread Context Status Buffers and manage the submission of new 2431 * contexts to the ELSP accordingly.
2432 */ 2433static void execlists_submission_tasklet(struct tasklet_struct *t) 2434{ 2435 struct i915_sched_engine *sched_engine = 2436 from_tasklet(sched_engine, t, tasklet); 2437 struct intel_engine_cs * const engine = sched_engine->private_data; 2438 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 2439 struct i915_request **inactive; 2440 2441 rcu_read_lock(); 2442 inactive = process_csb(engine, post); 2443 GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); 2444 2445 if (unlikely(preempt_timeout(engine))) { 2446 const struct i915_request *rq = *engine->execlists.active; 2447 2448 /* 2449 * If after the preempt-timeout expired, we are still on the 2450 * same active request/context as before we initiated the 2451 * preemption, reset the engine. 2452 * 2453 * However, if we have processed a CS event to switch contexts, 2454 * but not yet processed the CS event for the pending 2455 * preemption, reset the timer allowing the new context to 2456 * gracefully exit. 2457 */ 2458 cancel_timer(&engine->execlists.preempt); 2459 if (rq == engine->execlists.preempt_target) 2460 engine->execlists.error_interrupt |= ERROR_PREEMPT; 2461 else 2462 set_timer_ms(&engine->execlists.preempt, 2463 active_preempt_timeout(engine, rq)); 2464 } 2465 2466 if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) { 2467 const char *msg; 2468 2469 /* Generate the error message in priority wrt to the user! */ 2470 if (engine->execlists.error_interrupt & GENMASK(15, 0)) 2471 msg = "CS error"; /* thrown by a user payload */ 2472 else if (engine->execlists.error_interrupt & ERROR_CSB) 2473 msg = "invalid CSB event"; 2474 else if (engine->execlists.error_interrupt & ERROR_PREEMPT) 2475 msg = "preemption time out"; 2476 else 2477 msg = "internal error"; 2478 2479 engine->execlists.error_interrupt = 0; 2480 execlists_reset(engine, msg); 2481 } 2482 2483 if (!engine->execlists.pending[0]) { 2484 execlists_dequeue_irq(engine); 2485 start_timeslice(engine); 2486 } 2487 2488 post_process_csb(post, inactive); 2489 rcu_read_unlock(); 2490} 2491 2492static void execlists_irq_handler(struct intel_engine_cs *engine, u16 iir) 2493{ 2494 bool tasklet = false; 2495 2496 if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) { 2497 u32 eir; 2498 2499 /* Upper 16b are the enabling mask, rsvd for internal errors */ 2500 eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0); 2501 ENGINE_TRACE(engine, "CS error: %x\n", eir); 2502 2503 /* Disable the error interrupt until after the reset */ 2504 if (likely(eir)) { 2505 ENGINE_WRITE(engine, RING_EMR, ~0u); 2506 ENGINE_WRITE(engine, RING_EIR, eir); 2507 WRITE_ONCE(engine->execlists.error_interrupt, eir); 2508 tasklet = true; 2509 } 2510 } 2511 2512 if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) { 2513 WRITE_ONCE(engine->execlists.yield, 2514 ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI)); 2515 ENGINE_TRACE(engine, "semaphore yield: %08x\n", 2516 engine->execlists.yield); 2517 if (del_timer(&engine->execlists.timer)) 2518 tasklet = true; 2519 } 2520 2521 if (iir & GT_CONTEXT_SWITCH_INTERRUPT) 2522 tasklet = true; 2523 2524 if (iir & GT_RENDER_USER_INTERRUPT) 2525 intel_engine_signal_breadcrumbs(engine); 2526 2527 if (tasklet) 2528 tasklet_hi_schedule(&engine->sched_engine->tasklet); 2529} 2530 2531static void __execlists_kick(struct intel_engine_execlists *execlists) 2532{ 2533 struct intel_engine_cs *engine = 2534 container_of(execlists, typeof(*engine), execlists); 2535 2536 /* Kick the tasklet for some interrupt coalescing and reset handling */ 2537 
tasklet_hi_schedule(&engine->sched_engine->tasklet); 2538} 2539 2540#define execlists_kick(t, member) \ 2541 __execlists_kick(container_of(t, struct intel_engine_execlists, member)) 2542 2543static void execlists_timeslice(void *arg) 2544{ 2545 struct timeout *timer = (struct timeout *)arg; 2546 execlists_kick(timer, timer); 2547} 2548 2549static void execlists_preempt(void *arg) 2550{ 2551 struct timeout *timer = (struct timeout *)arg; 2552 execlists_kick(timer, preempt); 2553} 2554 2555static void queue_request(struct intel_engine_cs *engine, 2556 struct i915_request *rq) 2557{ 2558 GEM_BUG_ON(!list_empty(&rq->sched.link)); 2559 list_add_tail(&rq->sched.link, 2560 i915_sched_lookup_priolist(engine->sched_engine, 2561 rq_prio(rq))); 2562 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2563} 2564 2565static bool submit_queue(struct intel_engine_cs *engine, 2566 const struct i915_request *rq) 2567{ 2568 struct i915_sched_engine *sched_engine = engine->sched_engine; 2569 2570 if (rq_prio(rq) <= sched_engine->queue_priority_hint) 2571 return false; 2572 2573 sched_engine->queue_priority_hint = rq_prio(rq); 2574 return true; 2575} 2576 2577static bool ancestor_on_hold(const struct intel_engine_cs *engine, 2578 const struct i915_request *rq) 2579{ 2580 GEM_BUG_ON(i915_request_on_hold(rq)); 2581 return !list_empty(&engine->sched_engine->hold) && hold_request(rq); 2582} 2583 2584static void execlists_submit_request(struct i915_request *request) 2585{ 2586 struct intel_engine_cs *engine = request->engine; 2587 unsigned long flags; 2588 2589 /* Will be called from irq-context when using foreign fences. */ 2590 spin_lock_irqsave(&engine->sched_engine->lock, flags); 2591 2592 if (unlikely(ancestor_on_hold(engine, request))) { 2593 RQ_TRACE(request, "ancestor on hold\n"); 2594 list_add_tail(&request->sched.link, 2595 &engine->sched_engine->hold); 2596 i915_request_set_hold(request); 2597 } else { 2598 queue_request(engine, request); 2599 2600 GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine)); 2601 GEM_BUG_ON(list_empty(&request->sched.link)); 2602 2603 if (submit_queue(engine, request)) 2604 __execlists_kick(&engine->execlists); 2605 } 2606 2607 spin_unlock_irqrestore(&engine->sched_engine->lock, flags); 2608} 2609 2610static int 2611__execlists_context_pre_pin(struct intel_context *ce, 2612 struct intel_engine_cs *engine, 2613 struct i915_gem_ww_ctx *ww, void **vaddr) 2614{ 2615 int err; 2616 2617 err = lrc_pre_pin(ce, engine, ww, vaddr); 2618 if (err) 2619 return err; 2620 2621 if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) { 2622 lrc_init_state(ce, engine, *vaddr); 2623 2624 __i915_gem_object_flush_map(ce->state->obj, 0, engine->context_size); 2625 } 2626 2627 return 0; 2628} 2629 2630static int execlists_context_pre_pin(struct intel_context *ce, 2631 struct i915_gem_ww_ctx *ww, 2632 void **vaddr) 2633{ 2634 return __execlists_context_pre_pin(ce, ce->engine, ww, vaddr); 2635} 2636 2637static int execlists_context_pin(struct intel_context *ce, void *vaddr) 2638{ 2639 return lrc_pin(ce, ce->engine, vaddr); 2640} 2641 2642static int execlists_context_alloc(struct intel_context *ce) 2643{ 2644 return lrc_alloc(ce, ce->engine); 2645} 2646 2647static void execlists_context_cancel_request(struct intel_context *ce, 2648 struct i915_request *rq) 2649{ 2650 struct intel_engine_cs *engine = NULL; 2651 2652 i915_request_active_engine(rq, &engine); 2653 2654 if (engine && intel_engine_pulse(engine)) 2655 intel_gt_handle_error(engine->gt, engine->mask, 0, 2656 "request cancellation by %s", 
2657 curproc->p_p->ps_comm); 2658} 2659 2660static struct intel_context * 2661execlists_create_parallel(struct intel_engine_cs **engines, 2662 unsigned int num_siblings, 2663 unsigned int width) 2664{ 2665 struct intel_context *parent = NULL, *ce, *err; 2666 int i; 2667 2668 GEM_BUG_ON(num_siblings != 1); 2669 2670 for (i = 0; i < width; ++i) { 2671 ce = intel_context_create(engines[i]); 2672 if (IS_ERR(ce)) { 2673 err = ce; 2674 goto unwind; 2675 } 2676 2677 if (i == 0) 2678 parent = ce; 2679 else 2680 intel_context_bind_parent_child(parent, ce); 2681 } 2682 2683 parent->parallel.fence_context = dma_fence_context_alloc(1); 2684 2685 intel_context_set_nopreempt(parent); 2686 for_each_child(parent, ce) 2687 intel_context_set_nopreempt(ce); 2688 2689 return parent; 2690 2691unwind: 2692 if (parent) 2693 intel_context_put(parent); 2694 return err; 2695} 2696 2697static const struct intel_context_ops execlists_context_ops = { 2698 .flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES, 2699 2700 .alloc = execlists_context_alloc, 2701 2702 .cancel_request = execlists_context_cancel_request, 2703 2704 .pre_pin = execlists_context_pre_pin, 2705 .pin = execlists_context_pin, 2706 .unpin = lrc_unpin, 2707 .post_unpin = lrc_post_unpin, 2708 2709 .enter = intel_context_enter_engine, 2710 .exit = intel_context_exit_engine, 2711 2712 .reset = lrc_reset, 2713 .destroy = lrc_destroy, 2714 2715 .create_parallel = execlists_create_parallel, 2716 .create_virtual = execlists_create_virtual, 2717}; 2718 2719static int emit_pdps(struct i915_request *rq) 2720{ 2721 const struct intel_engine_cs * const engine = rq->engine; 2722 struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(rq->context->vm); 2723 int err, i; 2724 u32 *cs; 2725 2726 GEM_BUG_ON(intel_vgpu_active(rq->i915)); 2727 2728 /* 2729 * Beware ye of the dragons, this sequence is magic! 2730 * 2731 * Small changes to this sequence can cause anything from 2732 * GPU hangs to forcewake errors and machine lockups! 2733 */ 2734 2735 cs = intel_ring_begin(rq, 2); 2736 if (IS_ERR(cs)) 2737 return PTR_ERR(cs); 2738 2739 *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 2740 *cs++ = MI_NOOP; 2741 intel_ring_advance(rq, cs); 2742 2743 /* Flush any residual operations from the context load */ 2744 err = engine->emit_flush(rq, EMIT_FLUSH); 2745 if (err) 2746 return err; 2747 2748 /* Magic required to prevent forcewake errors! 
*/ 2749 err = engine->emit_flush(rq, EMIT_INVALIDATE); 2750 if (err) 2751 return err; 2752 2753 cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2); 2754 if (IS_ERR(cs)) 2755 return PTR_ERR(cs); 2756 2757 /* Ensure the LRI have landed before we invalidate & continue */ 2758 *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED; 2759 for (i = GEN8_3LVL_PDPES; i--; ) { 2760 const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i); 2761 u32 base = engine->mmio_base; 2762 2763 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i)); 2764 *cs++ = upper_32_bits(pd_daddr); 2765 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i)); 2766 *cs++ = lower_32_bits(pd_daddr); 2767 } 2768 *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 2769 intel_ring_advance(rq, cs); 2770 2771 2772 2773 return 0; 2774} 2775 2776static int execlists_request_alloc(struct i915_request *request) 2777{ 2778 int ret; 2779 2780 GEM_BUG_ON(!intel_context_is_pinned(request->context)); 2781 2782 /* 2783 * Flush enough space to reduce the likelihood of waiting after 2784 * we start building the request - in which case we will just 2785 * have to repeat work. 2786 */ 2787 request->reserved_space += EXECLISTS_REQUEST_SIZE; 2788 2789 /* 2790 * Note that after this point, we have committed to using 2791 * this request as it is being used to both track the 2792 * state of engine initialisation and liveness of the 2793 * golden renderstate above. Think twice before you try 2794 * to cancel/unwind this request now. 2795 */ 2796 2797 if (!i915_vm_is_4lvl(request->context->vm)) { 2798 ret = emit_pdps(request); 2799 if (ret) 2800 return ret; 2801 } 2802 2803 /* Unconditionally invalidate GPU caches and TLBs. */ 2804 ret = request->engine->emit_flush(request, EMIT_INVALIDATE); 2805 if (ret) 2806 return ret; 2807 2808 request->reserved_space -= EXECLISTS_REQUEST_SIZE; 2809 return 0; 2810} 2811 2812static void reset_csb_pointers(struct intel_engine_cs *engine) 2813{ 2814 struct intel_engine_execlists * const execlists = &engine->execlists; 2815 const unsigned int reset_value = execlists->csb_size - 1; 2816 2817 ring_set_paused(engine, 0); 2818 2819 /* 2820 * Sometimes Icelake forgets to reset its pointers on a GPU reset. 2821 * Bludgeon them with a mmio update to be sure. 2822 */ 2823 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2824 0xffff << 16 | reset_value << 8 | reset_value); 2825 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2826 2827 /* 2828 * After a reset, the HW starts writing into CSB entry [0]. We 2829 * therefore have to set our HEAD pointer back one entry so that 2830 * the *first* entry we check is entry 0. To complicate this further, 2831 * as we don't wait for the first interrupt after reset, we have to 2832 * fake the HW write to point back to the last entry so that our 2833 * inline comparison of our cached head position against the last HW 2834 * write works even before the first interrupt. 2835 */ 2836 execlists->csb_head = reset_value; 2837 WRITE_ONCE(*execlists->csb_write, reset_value); 2838 wmb(); /* Make sure this is visible to HW (paranoia?) */ 2839 2840 /* Check that the GPU does indeed update the CSB entries!
*/ 2841 memset(execlists->csb_status, -1, (reset_value + 1) * sizeof(u64)); 2842 drm_clflush_virt_range(execlists->csb_status, 2843 execlists->csb_size * 2844 sizeof(execlists->csb_status)); 2845 2846 /* Once more for luck and our trusty paranoia */ 2847 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2848 0xffff << 16 | reset_value << 8 | reset_value); 2849 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2850 2851 GEM_BUG_ON(READ_ONCE(*execlists->csb_write) != reset_value); 2852} 2853 2854static void sanitize_hwsp(struct intel_engine_cs *engine) 2855{ 2856 struct intel_timeline *tl; 2857 2858 list_for_each_entry(tl, &engine->status_page.timelines, engine_link) 2859 intel_timeline_reset_seqno(tl); 2860} 2861 2862static void execlists_sanitize(struct intel_engine_cs *engine) 2863{ 2864 GEM_BUG_ON(execlists_active(&engine->execlists)); 2865 2866 /* 2867 * Poison residual state on resume, in case the suspend didn't! 2868 * 2869 * We have to assume that across suspend/resume (or other loss 2870 * of control) the contents of our pinned buffers have been 2871 * lost, replaced by garbage. Since this doesn't always happen, 2872 * let's poison such state so that we more quickly spot when 2873 * we falsely assume it has been preserved. 2874 */ 2875 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 2876 memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE); 2877 2878 reset_csb_pointers(engine); 2879 2880 /* 2881 * The kernel_context HWSP is stored in the status_page. As above, 2882 * that may be lost on resume/initialisation, and so we need to 2883 * reset the value in the HWSP. 2884 */ 2885 sanitize_hwsp(engine); 2886 2887 /* And scrub the dirty cachelines for the HWSP */ 2888 drm_clflush_virt_range(engine->status_page.addr, PAGE_SIZE); 2889 2890 intel_engine_reset_pinned_contexts(engine); 2891} 2892 2893static void enable_error_interrupt(struct intel_engine_cs *engine) 2894{ 2895 u32 status; 2896 2897 engine->execlists.error_interrupt = 0; 2898 ENGINE_WRITE(engine, RING_EMR, ~0u); 2899 ENGINE_WRITE(engine, RING_EIR, ~0u); /* clear all existing errors */ 2900 2901 status = ENGINE_READ(engine, RING_ESR); 2902 if (unlikely(status)) { 2903 drm_err(&engine->i915->drm, 2904 "engine '%s' resumed still in error: %08x\n", 2905 engine->name, status); 2906 __intel_gt_reset(engine->gt, engine->mask); 2907 } 2908 2909 /* 2910 * On current gen8+, we have 2 signals to play with 2911 * 2912 * - I915_ERROR_INSTRUCTION (bit 0) 2913 * 2914 * Generate an error if the command parser encounters an invalid 2915 * instruction 2916 * 2917 * This is a fatal error. 2918 * 2919 * - CP_PRIV (bit 2) 2920 * 2921 * Generate an error on privilege violation (where the CP replaces 2922 * the instruction with a no-op). This also fires for writes into 2923 * read-only scratch pages. 2924 * 2925 * This is a non-fatal error, parsing continues. 2926 * 2927 * * there are a few others defined for odd HW that we do not use 2928 * 2929 * Since CP_PRIV fires for cases where we have chosen to ignore the 2930 * error (as the HW is validating and suppressing the mistakes), we 2931 * only unmask the instruction error bit.
2932 */ 2933 ENGINE_WRITE(engine, RING_EMR, ~I915_ERROR_INSTRUCTION); 2934} 2935 2936static void enable_execlists(struct intel_engine_cs *engine) 2937{ 2938 u32 mode; 2939 2940 assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL); 2941 2942 intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */ 2943 2944 if (GRAPHICS_VER(engine->i915) >= 11) 2945 mode = _MASKED_BIT_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE); 2946 else 2947 mode = _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE); 2948 ENGINE_WRITE_FW(engine, RING_MODE_GEN7, mode); 2949 2950 ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING)); 2951 2952 ENGINE_WRITE_FW(engine, 2953 RING_HWS_PGA, 2954 i915_ggtt_offset(engine->status_page.vma)); 2955 ENGINE_POSTING_READ(engine, RING_HWS_PGA); 2956 2957 enable_error_interrupt(engine); 2958} 2959 2960static int execlists_resume(struct intel_engine_cs *engine) 2961{ 2962 intel_mocs_init_engine(engine); 2963 intel_breadcrumbs_reset(engine->breadcrumbs); 2964 2965 enable_execlists(engine); 2966 2967 if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) 2968 xehp_enable_ccs_engines(engine); 2969 2970 return 0; 2971} 2972 2973static void execlists_reset_prepare(struct intel_engine_cs *engine) 2974{ 2975 ENGINE_TRACE(engine, "depth<-%d\n", 2976 atomic_read(&engine->sched_engine->tasklet.count)); 2977 2978 /* 2979 * Prevent request submission to the hardware until we have 2980 * completed the reset in i915_gem_reset_finish(). If a request 2981 * is completed by one engine, it may then queue a request 2982 * to a second via its execlists->tasklet *just* as we are 2983 * calling engine->resume() and also writing the ELSP. 2984 * Turning off the execlists->tasklet until the reset is over 2985 * prevents the race. 2986 */ 2987 __tasklet_disable_sync_once(&engine->sched_engine->tasklet); 2988 GEM_BUG_ON(!reset_in_progress(engine)); 2989 2990 /* 2991 * We stop engines, otherwise we might get a failed reset and a 2992 * dead gpu (on elk). Also, a gpu as modern as kbl can suffer 2993 * from a system hang if a batchbuffer is progressing when 2994 * the reset is issued, regardless of READY_TO_RESET ack. 2995 * Thus assume it is best to stop engines on all gens 2996 * where we have a gpu reset.
2997 * 2998 * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) 2999 * 3000 * FIXME: Wa for more modern gens needs to be validated 3001 */ 3002 ring_set_paused(engine, 1); 3003 intel_engine_stop_cs(engine); 3004 3005 /* 3006 * Wa_22011802037: In addition to stopping the cs, we need 3007 * to wait for any pending mi force wakeups 3008 */ 3009 if (intel_engine_reset_needs_wa_22011802037(engine->gt)) 3010 intel_engine_wait_for_pending_mi_fw(engine); 3011 3012 engine->execlists.reset_ccid = active_ccid(engine); 3013} 3014 3015static struct i915_request ** 3016reset_csb(struct intel_engine_cs *engine, struct i915_request **inactive) 3017{ 3018 struct intel_engine_execlists * const execlists = &engine->execlists; 3019 3020 drm_clflush_virt_range(execlists->csb_write, 3021 sizeof(execlists->csb_write[0])); 3022 3023 inactive = process_csb(engine, inactive); /* drain preemption events */ 3024 3025 /* Following the reset, we need to reload the CSB read/write pointers */ 3026 reset_csb_pointers(engine); 3027 3028 return inactive; 3029} 3030 3031static void 3032execlists_reset_active(struct intel_engine_cs *engine, bool stalled) 3033{ 3034 struct intel_context *ce; 3035 struct i915_request *rq; 3036 u32 head; 3037 3038 /* 3039 * Save the currently executing context, even if we completed 3040 * its request, it was still running at the time of the 3041 * reset and will have been clobbered. 3042 */ 3043 rq = active_context(engine, engine->execlists.reset_ccid); 3044 if (!rq) 3045 return; 3046 3047 ce = rq->context; 3048 GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); 3049 3050 if (__i915_request_is_complete(rq)) { 3051 /* Idle context; tidy up the ring so we can restart afresh */ 3052 head = intel_ring_wrap(ce->ring, rq->tail); 3053 goto out_replay; 3054 } 3055 3056 /* We still have requests in-flight; the engine should be active */ 3057 GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 3058 3059 /* Context has requests still in-flight; it should not be idle! */ 3060 GEM_BUG_ON(i915_active_is_idle(&ce->active)); 3061 3062 rq = active_request(ce->timeline, rq); 3063 head = intel_ring_wrap(ce->ring, rq->head); 3064 GEM_BUG_ON(head == ce->ring->tail); 3065 3066 /* 3067 * If this request hasn't started yet, e.g. it is waiting on a 3068 * semaphore, we need to avoid skipping the request or else we 3069 * break the signaling chain. However, if the context is corrupt 3070 * the request will not restart and we will be stuck with a wedged 3071 * device. It is quite often the case that if we issue a reset 3072 * while the GPU is loading the context image, that the context 3073 * image becomes corrupt. 3074 * 3075 * Otherwise, if we have not started yet, the request should replay 3076 * perfectly and we do not need to flag the result as being erroneous. 3077 */ 3078 if (!__i915_request_has_started(rq)) 3079 goto out_replay; 3080 3081 /* 3082 * If the request was innocent, we leave the request in the ELSP 3083 * and will try to replay it on restarting. The context image may 3084 * have been corrupted by the reset, in which case we may have 3085 * to service a new GPU hang, but more likely we can continue on 3086 * without impact. 3087 * 3088 * If the request was guilty, we presume the context is corrupt 3089 * and have to at least restore the RING register in the context 3090 * image back to the expected values to skip over the guilty request. 3091 */ 3092 __i915_request_reset(rq, stalled); 3093 3094 /* 3095 * We want a simple context + ring to execute the breadcrumb update. 
3096 * We cannot rely on the context being intact across the GPU hang, 3097 * so clear it and rebuild just what we need for the breadcrumb. 3098 * All pending requests for this context will be zapped, and any 3099 * future request will be after userspace has had the opportunity 3100 * to recreate its own state. 3101 */ 3102out_replay: 3103 ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n", 3104 head, ce->ring->tail); 3105 lrc_reset_regs(ce, engine); 3106 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 3107} 3108 3109static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) 3110{ 3111 struct intel_engine_execlists * const execlists = &engine->execlists; 3112 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 3113 struct i915_request **inactive; 3114 3115 rcu_read_lock(); 3116 inactive = reset_csb(engine, post); 3117 3118 execlists_reset_active(engine, true); 3119 3120 inactive = cancel_port_requests(execlists, inactive); 3121 post_process_csb(post, inactive); 3122 rcu_read_unlock(); 3123} 3124 3125static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) 3126{ 3127 unsigned long flags; 3128 3129 ENGINE_TRACE(engine, "\n"); 3130 3131 /* Process the csb, find the guilty context and throw away */ 3132 execlists_reset_csb(engine, stalled); 3133 3134 /* Push back any incomplete requests for replay after the reset. */ 3135 rcu_read_lock(); 3136 spin_lock_irqsave(&engine->sched_engine->lock, flags); 3137 __unwind_incomplete_requests(engine); 3138 spin_unlock_irqrestore(&engine->sched_engine->lock, flags); 3139 rcu_read_unlock(); 3140} 3141 3142static void nop_submission_tasklet(struct tasklet_struct *t) 3143{ 3144 struct i915_sched_engine *sched_engine = 3145 from_tasklet(sched_engine, t, tasklet); 3146 struct intel_engine_cs * const engine = sched_engine->private_data; 3147 3148 /* The driver is wedged; don't process any more events. */ 3149 WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN); 3150} 3151 3152static void execlists_reset_cancel(struct intel_engine_cs *engine) 3153{ 3154 struct intel_engine_execlists * const execlists = &engine->execlists; 3155 struct i915_sched_engine * const sched_engine = engine->sched_engine; 3156 struct i915_request *rq, *rn; 3157 struct rb_node *rb; 3158 unsigned long flags; 3159 3160 ENGINE_TRACE(engine, "\n"); 3161 3162 /* 3163 * Before we call engine->cancel_requests(), we should have exclusive 3164 * access to the submission state. This is arranged for us by the 3165 * caller disabling the interrupt generation, the tasklet and other 3166 * threads that may then access the same state, giving us a free hand 3167 * to reset state. However, we still need to let lockdep be aware that 3168 * we know this state may be accessed in hardirq context, so we 3169 * disable the irq around this manipulation and we want to keep 3170 * the spinlock focused on its duties and not accidentally conflate 3171 * coverage to the submission's irq state. (Similarly, although we 3172 * shouldn't need to disable irq around the manipulation of the 3173 * submission's irq state, we also wish to remind ourselves that 3174 * it is irq state.) 3175 */ 3176 execlists_reset_csb(engine, true); 3177 3178 rcu_read_lock(); 3179 spin_lock_irqsave(&engine->sched_engine->lock, flags); 3180 3181 /* Mark all executing requests as skipped. 
*/ 3182 list_for_each_entry(rq, &engine->sched_engine->requests, sched.link) 3183 i915_request_put(i915_request_mark_eio(rq)); 3184 intel_engine_signal_breadcrumbs(engine); 3185 3186 /* Flush the queued requests to the timeline list (for retiring). */ 3187 while ((rb = rb_first_cached(&sched_engine->queue))) { 3188 struct i915_priolist *p = to_priolist(rb); 3189 3190 priolist_for_each_request_consume(rq, rn, p) { 3191 if (i915_request_mark_eio(rq)) { 3192 __i915_request_submit(rq); 3193 i915_request_put(rq); 3194 } 3195 } 3196 3197 rb_erase_cached(&p->node, &sched_engine->queue); 3198 i915_priolist_free(p); 3199 } 3200 3201 /* On-hold requests will be flushed to timeline upon their release */ 3202 list_for_each_entry(rq, &sched_engine->hold, sched.link) 3203 i915_request_put(i915_request_mark_eio(rq)); 3204 3205 /* Cancel all attached virtual engines */ 3206 while ((rb = rb_first_cached(&execlists->virtual))) { 3207 struct virtual_engine *ve = 3208 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 3209 3210 rb_erase_cached(rb, &execlists->virtual); 3211 RB_CLEAR_NODE(rb); 3212 3213 spin_lock(&ve->base.sched_engine->lock); 3214 rq = fetch_and_zero(&ve->request); 3215 if (rq) { 3216 if (i915_request_mark_eio(rq)) { 3217 rq->engine = engine; 3218 __i915_request_submit(rq); 3219 i915_request_put(rq); 3220 } 3221 i915_request_put(rq); 3222 3223 ve->base.sched_engine->queue_priority_hint = INT_MIN; 3224 } 3225 spin_unlock(&ve->base.sched_engine->lock); 3226 } 3227 3228 /* Remaining _unready_ requests will be nop'ed when submitted */ 3229 3230 sched_engine->queue_priority_hint = INT_MIN; 3231 sched_engine->queue = RB_ROOT_CACHED; 3232 3233 GEM_BUG_ON(__tasklet_is_enabled(&engine->sched_engine->tasklet)); 3234 engine->sched_engine->tasklet.callback = nop_submission_tasklet; 3235 3236 spin_unlock_irqrestore(&engine->sched_engine->lock, flags); 3237 rcu_read_unlock(); 3238} 3239 3240static void execlists_reset_finish(struct intel_engine_cs *engine) 3241{ 3242 struct intel_engine_execlists * const execlists = &engine->execlists; 3243 3244 /* 3245 * After a GPU reset, we may have requests to replay. Do so now while 3246 * we still have the forcewake to be sure that the GPU is not allowed 3247 * to sleep before we restart and reload a context. 3248 * 3249 * If the GPU reset fails, the engine may still be alive with requests 3250 * inflight. We expect those to complete, or for the device to be 3251 * reset as the next level of recovery, and as a final resort we 3252 * will declare the device wedged. 3253 */ 3254 GEM_BUG_ON(!reset_in_progress(engine)); 3255 3256 /* And kick in case we missed a new request submission. */ 3257 if (__tasklet_enable(&engine->sched_engine->tasklet)) 3258 __execlists_kick(execlists); 3259 3260 ENGINE_TRACE(engine, "depth->%d\n", 3261 atomic_read(&engine->sched_engine->tasklet.count)); 3262} 3263 3264static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) 3265{ 3266 ENGINE_WRITE(engine, RING_IMR, 3267 ~(engine->irq_enable_mask | engine->irq_keep_mask)); 3268 ENGINE_POSTING_READ(engine, RING_IMR); 3269} 3270 3271static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine) 3272{ 3273 ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask); 3274} 3275 3276static void execlists_park(struct intel_engine_cs *engine) 3277{ 3278 cancel_timer(&engine->execlists.timer); 3279 cancel_timer(&engine->execlists.preempt); 3280 3281 /* Reset upon idling, or we may delay the busy wakeup. 
*/ 3282 WRITE_ONCE(engine->sched_engine->queue_priority_hint, INT_MIN); 3283} 3284 3285static void add_to_engine(struct i915_request *rq) 3286{ 3287 lockdep_assert_held(&rq->engine->sched_engine->lock); 3288 list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests); 3289} 3290 3291static void remove_from_engine(struct i915_request *rq) 3292{ 3293 struct intel_engine_cs *engine, *locked; 3294 3295 /* 3296 * Virtual engines complicate acquiring the engine timeline lock, 3297 * as their rq->engine pointer is not stable until under that 3298 * engine lock. The simple ploy we use is to take the lock then 3299 * check that the rq still belongs to the newly locked engine. 3300 */ 3301 locked = READ_ONCE(rq->engine); 3302 spin_lock_irq(&locked->sched_engine->lock); 3303 while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { 3304 spin_unlock(&locked->sched_engine->lock); 3305 spin_lock(&engine->sched_engine->lock); 3306 locked = engine; 3307 } 3308 list_del_init(&rq->sched.link); 3309 3310 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 3311 clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags); 3312 3313 /* Prevent further __await_execution() registering a cb, then flush */ 3314 set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); 3315 3316 spin_unlock_irq(&locked->sched_engine->lock); 3317 3318 i915_request_notify_execute_cb_imm(rq); 3319} 3320 3321static bool can_preempt(struct intel_engine_cs *engine) 3322{ 3323 if (GRAPHICS_VER(engine->i915) > 8) 3324 return true; 3325 3326 /* GPGPU on bdw requires extra w/a; not implemented */ 3327 return engine->class != RENDER_CLASS; 3328} 3329 3330static void kick_execlists(const struct i915_request *rq, int prio) 3331{ 3332 struct intel_engine_cs *engine = rq->engine; 3333 struct i915_sched_engine *sched_engine = engine->sched_engine; 3334 const struct i915_request *inflight; 3335 3336 /* 3337 * We only need to kick the tasklet once for the high priority 3338 * new context we add into the queue. 3339 */ 3340 if (prio <= sched_engine->queue_priority_hint) 3341 return; 3342 3343 rcu_read_lock(); 3344 3345 /* Nothing currently active? We're overdue for a submission! */ 3346 inflight = execlists_active(&engine->execlists); 3347 if (!inflight) 3348 goto unlock; 3349 3350 /* 3351 * If we are already the currently executing context, don't 3352 * bother evaluating if we should preempt ourselves. 3353 */ 3354 if (inflight->context == rq->context) 3355 goto unlock; 3356 3357 ENGINE_TRACE(engine, 3358 "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n", 3359 prio, 3360 rq->fence.context, rq->fence.seqno, 3361 inflight->fence.context, inflight->fence.seqno, 3362 inflight->sched.attr.priority); 3363 3364 sched_engine->queue_priority_hint = prio; 3365 3366 /* 3367 * Allow preemption of low -> normal -> high, but we do 3368 * not allow low priority tasks to preempt other low priority 3369 * tasks under the impression that latency for low priority 3370 * tasks does not matter (as much as background throughput), 3371 * so kiss. 
3372 */ 3373 if (prio >= max(I915_PRIORITY_NORMAL, rq_prio(inflight))) 3374 tasklet_hi_schedule(&sched_engine->tasklet); 3375 3376unlock: 3377 rcu_read_unlock(); 3378} 3379 3380static void execlists_set_default_submission(struct intel_engine_cs *engine) 3381{ 3382 engine->submit_request = execlists_submit_request; 3383 engine->sched_engine->schedule = i915_schedule; 3384 engine->sched_engine->kick_backend = kick_execlists; 3385 engine->sched_engine->tasklet.callback = execlists_submission_tasklet; 3386} 3387 3388static void execlists_shutdown(struct intel_engine_cs *engine) 3389{ 3390 /* Synchronise with residual timers and any softirq they raise */ 3391 del_timer_sync(&engine->execlists.timer); 3392 del_timer_sync(&engine->execlists.preempt); 3393 tasklet_kill(&engine->sched_engine->tasklet); 3394} 3395 3396static void execlists_release(struct intel_engine_cs *engine) 3397{ 3398 engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ 3399 3400 execlists_shutdown(engine); 3401 3402 intel_engine_cleanup_common(engine); 3403 lrc_fini_wa_ctx(engine); 3404} 3405 3406static ktime_t __execlists_engine_busyness(struct intel_engine_cs *engine, 3407 ktime_t *now) 3408{ 3409 struct intel_engine_execlists_stats *stats = &engine->stats.execlists; 3410 ktime_t total = stats->total; 3411 3412 /* 3413 * If the engine is executing something at the moment 3414 * add it to the total. 3415 */ 3416 *now = ktime_get(); 3417 if (READ_ONCE(stats->active)) 3418 total = ktime_add(total, ktime_sub(*now, stats->start)); 3419 3420 return total; 3421} 3422 3423static ktime_t execlists_engine_busyness(struct intel_engine_cs *engine, 3424 ktime_t *now) 3425{ 3426 struct intel_engine_execlists_stats *stats = &engine->stats.execlists; 3427 unsigned int seq; 3428 ktime_t total; 3429 3430 do { 3431 seq = read_seqcount_begin(&stats->lock); 3432 total = __execlists_engine_busyness(engine, now); 3433 } while (read_seqcount_retry(&stats->lock, seq)); 3434 3435 return total; 3436} 3437 3438static void 3439logical_ring_default_vfuncs(struct intel_engine_cs *engine) 3440{ 3441 /* Default vfuncs which can be overridden by each engine. */ 3442 3443 engine->resume = execlists_resume; 3444 3445 engine->cops = &execlists_context_ops; 3446 engine->request_alloc = execlists_request_alloc; 3447 engine->add_active_request = add_to_engine; 3448 engine->remove_active_request = remove_from_engine; 3449 3450 engine->reset.prepare = execlists_reset_prepare; 3451 engine->reset.rewind = execlists_reset_rewind; 3452 engine->reset.cancel = execlists_reset_cancel; 3453 engine->reset.finish = execlists_reset_finish; 3454 3455 engine->park = execlists_park; 3456 engine->unpark = NULL; 3457 3458 engine->emit_flush = gen8_emit_flush_xcs; 3459 engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; 3460 engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs; 3461 if (GRAPHICS_VER(engine->i915) >= 12) { 3462 engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_xcs; 3463 engine->emit_flush = gen12_emit_flush_xcs; 3464 } 3465 engine->set_default_submission = execlists_set_default_submission; 3466 3467 if (GRAPHICS_VER(engine->i915) < 11) { 3468 engine->irq_enable = gen8_logical_ring_enable_irq; 3469 engine->irq_disable = gen8_logical_ring_disable_irq; 3470 } else { 3471 /* 3472 * TODO: On Gen11 interrupt masks need to be clear 3473 * to allow C6 entry. Keep interrupts enabled at 3474 * all times and take the hit of generating extra interrupts 3475 * until a more refined solution exists.
		 */
	}
	intel_engine_set_irq_handler(engine, execlists_irq_handler);

	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
	if (!intel_vgpu_active(engine->i915)) {
		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
		if (can_preempt(engine)) {
			engine->flags |= I915_ENGINE_HAS_PREEMPTION;
			if (CONFIG_DRM_I915_TIMESLICE_DURATION)
				engine->flags |= I915_ENGINE_HAS_TIMESLICES;
		}
	}

	if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) {
		if (intel_engine_has_preemption(engine))
			engine->emit_bb_start = xehp_emit_bb_start;
		else
			engine->emit_bb_start = xehp_emit_bb_start_noarb;
	} else {
		if (intel_engine_has_preemption(engine))
			engine->emit_bb_start = gen8_emit_bb_start;
		else
			engine->emit_bb_start = gen8_emit_bb_start_noarb;
	}

	engine->busyness = execlists_engine_busyness;
}

static void logical_ring_default_irqs(struct intel_engine_cs *engine)
{
	unsigned int shift = 0;

	if (GRAPHICS_VER(engine->i915) < 11) {
		const u8 irq_shifts[] = {
			[RCS0] = GEN8_RCS_IRQ_SHIFT,
			[BCS0] = GEN8_BCS_IRQ_SHIFT,
			[VCS0] = GEN8_VCS0_IRQ_SHIFT,
			[VCS1] = GEN8_VCS1_IRQ_SHIFT,
			[VECS0] = GEN8_VECS_IRQ_SHIFT,
		};

		shift = irq_shifts[engine->id];
	}

	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
	engine->irq_keep_mask |= GT_CS_MASTER_ERROR_INTERRUPT << shift;
	engine->irq_keep_mask |= GT_WAIT_SEMAPHORE_INTERRUPT << shift;
}

static void rcs_submission_override(struct intel_engine_cs *engine)
{
	switch (GRAPHICS_VER(engine->i915)) {
	case 12:
		engine->emit_flush = gen12_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs;
		break;
	case 11:
		engine->emit_flush = gen11_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs;
		break;
	default:
		engine->emit_flush = gen8_emit_flush_rcs;
		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
		break;
	}
}

int intel_execlists_submission_setup(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct drm_i915_private *i915 = engine->i915;
	struct intel_uncore *uncore = engine->uncore;
	u32 base = engine->mmio_base;

	tasklet_setup(&engine->sched_engine->tasklet, execlists_submission_tasklet);
#ifdef __linux__
	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
#else
	timeout_set(&engine->execlists.timer, execlists_timeslice,
	    &engine->execlists.timer);
	timeout_set(&engine->execlists.preempt, execlists_preempt,
	    &engine->execlists.preempt);
#endif

	logical_ring_default_vfuncs(engine);
	logical_ring_default_irqs(engine);

	seqcount_init(&engine->stats.execlists.lock);

	if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
		rcs_submission_override(engine);

	lrc_init_wa_ctx(engine);

	if (HAS_LOGICAL_RING_ELSQ(i915)) {
		execlists->submit_reg = intel_uncore_regs(uncore) +
			i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base));
		execlists->ctrl_reg = intel_uncore_regs(uncore) +
			i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base));

		engine->fw_domain = intel_uncore_forcewake_for_reg(engine->uncore,
								   RING_EXECLIST_CONTROL(engine->mmio_base),
								   FW_REG_WRITE);
	} else {
		execlists->submit_reg = intel_uncore_regs(uncore) +
			i915_mmio_reg_offset(RING_ELSP(base));
	}

	execlists->csb_status =
		(u64 *)&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];

	execlists->csb_write =
		&engine->status_page.addr[INTEL_HWS_CSB_WRITE_INDEX(i915)];

	if (GRAPHICS_VER(i915) < 11)
		execlists->csb_size = GEN8_CSB_ENTRIES;
	else
		execlists->csb_size = GEN11_CSB_ENTRIES;

	engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0);
	if (GRAPHICS_VER(engine->i915) >= 11 &&
	    GRAPHICS_VER_FULL(engine->i915) < IP_VER(12, 50)) {
		execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32);
		execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32);
	}

	/* Finally, take ownership and responsibility for cleanup! */
	engine->sanitize = execlists_sanitize;
	engine->release = execlists_release;

	return 0;
}

static struct list_head *virtual_queue(struct virtual_engine *ve)
{
	return &ve->base.sched_engine->default_priolist.requests;
}

static void rcu_virtual_context_destroy(struct work_struct *wrk)
{
	struct virtual_engine *ve =
		container_of(wrk, typeof(*ve), rcu.work);
	unsigned int n;

	GEM_BUG_ON(ve->context.inflight);

	/* Preempt-to-busy may leave a stale request behind. */
	if (unlikely(ve->request)) {
		struct i915_request *old;

		spin_lock_irq(&ve->base.sched_engine->lock);

		old = fetch_and_zero(&ve->request);
		if (old) {
			GEM_BUG_ON(!__i915_request_is_complete(old));
			__i915_request_submit(old);
			i915_request_put(old);
		}

		spin_unlock_irq(&ve->base.sched_engine->lock);
	}

	/*
	 * Flush the tasklet in case it is still running on another core.
	 *
	 * This needs to be done before we remove ourselves from the siblings'
	 * rbtrees because, if it is still running in parallel, it may reinsert
	 * the rb_node into a sibling.
	 */
	tasklet_kill(&ve->base.sched_engine->tasklet);

	/*
	 * Decouple ourselves from the siblings, no more access allowed.
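	 *
	 * The unlocked RB_EMPTY_NODE() check below is only an optimisation
	 * to skip siblings this virtual engine was never queued onto; the
	 * node is re-checked under the sibling's lock before it is erased.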
	 */
	for (n = 0; n < ve->num_siblings; n++) {
		struct intel_engine_cs *sibling = ve->siblings[n];
		struct rb_node *node = &ve->nodes[sibling->id].rb;

		if (RB_EMPTY_NODE(node))
			continue;

		spin_lock_irq(&sibling->sched_engine->lock);

		/* Detachment is lazily performed in the sched_engine->tasklet */
		if (!RB_EMPTY_NODE(node))
			rb_erase_cached(node, &sibling->execlists.virtual);

		spin_unlock_irq(&sibling->sched_engine->lock);
	}
	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.sched_engine->tasklet));
	GEM_BUG_ON(!list_empty(virtual_queue(ve)));

	lrc_fini(&ve->context);
	intel_context_fini(&ve->context);

	if (ve->base.breadcrumbs)
		intel_breadcrumbs_put(ve->base.breadcrumbs);
	if (ve->base.sched_engine)
		i915_sched_engine_put(ve->base.sched_engine);
	intel_engine_free_request_pool(&ve->base);

	kfree(ve);
}

static void virtual_context_destroy(struct kref *kref)
{
	struct virtual_engine *ve =
		container_of(kref, typeof(*ve), context.ref);

	GEM_BUG_ON(!list_empty(&ve->context.signals));

	/*
	 * When destroying the virtual engine, we have to be aware that
	 * it may still be in use from a hardirq/softirq context causing
	 * the resubmission of a completed request (background completion
	 * due to preempt-to-busy). Before we can free the engine, we need
	 * to flush the submission code and tasklets that are still potentially
	 * accessing the engine. Flushing the tasklets requires process context,
	 * and since we can guard the resubmit onto the engine with an RCU read
	 * lock, we can delegate the free of the engine to an RCU worker.
	 */
	INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
	queue_rcu_work(ve->context.engine->i915->unordered_wq, &ve->rcu);
}

static void virtual_engine_initial_hint(struct virtual_engine *ve)
{
	int swp;

	/*
	 * Pick a random sibling on starting to help spread the load around.
	 *
	 * New contexts are typically created with exactly the same order
	 * of siblings, and often started in batches. Due to the way we iterate
	 * the array of siblings when submitting requests, sibling[0] is
	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
	 * randomised across the system, we also help spread the load by the
	 * first engine we inspect being different each time.
	 *
	 * NB This does not force us to execute on this engine, it will just
	 * typically be the first we inspect for submission.
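	 *
	 * The swap() below merely exchanges siblings[0] with one randomly
	 * chosen entry; the set of siblings itself is unchanged.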
	 */
	swp = get_random_u32_below(ve->num_siblings);
	if (swp)
		swap(ve->siblings[swp], ve->siblings[0]);
}

static int virtual_context_alloc(struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);

	return lrc_alloc(ce, ve->siblings[0]);
}

static int virtual_context_pre_pin(struct intel_context *ce,
				   struct i915_gem_ww_ctx *ww,
				   void **vaddr)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);

	/* Note: we must use a real engine class for setting up reg state */
	return __execlists_context_pre_pin(ce, ve->siblings[0], ww, vaddr);
}

static int virtual_context_pin(struct intel_context *ce, void *vaddr)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);

	return lrc_pin(ce, ve->siblings[0], vaddr);
}

static void virtual_context_enter(struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
	unsigned int n;

	for (n = 0; n < ve->num_siblings; n++)
		intel_engine_pm_get(ve->siblings[n]);

	intel_timeline_enter(ce->timeline);
}

static void virtual_context_exit(struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
	unsigned int n;

	intel_timeline_exit(ce->timeline);

	for (n = 0; n < ve->num_siblings; n++)
		intel_engine_pm_put(ve->siblings[n]);
}

static struct intel_engine_cs *
virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
{
	struct virtual_engine *ve = to_virtual_engine(engine);

	if (sibling >= ve->num_siblings)
		return NULL;

	return ve->siblings[sibling];
}

static const struct intel_context_ops virtual_context_ops = {
	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,

	.alloc = virtual_context_alloc,

	.cancel_request = execlists_context_cancel_request,

	.pre_pin = virtual_context_pre_pin,
	.pin = virtual_context_pin,
	.unpin = lrc_unpin,
	.post_unpin = lrc_post_unpin,

	.enter = virtual_context_enter,
	.exit = virtual_context_exit,

	.destroy = virtual_context_destroy,

	.get_sibling = virtual_get_sibling,
};
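
/*
 * Work out which physical engines the pending request may be submitted to:
 * start from the request's execution_mask and, if that is somehow empty,
 * flag the request with -ENODEV and fall back to the first sibling.
 */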
static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
{
	struct i915_request *rq;
	intel_engine_mask_t mask;

	rq = READ_ONCE(ve->request);
	if (!rq)
		return 0;

	/* The rq is ready for submission; rq->execution_mask is now stable. */
	mask = rq->execution_mask;
	if (unlikely(!mask)) {
		/* Invalid selection, submit to an arbitrary engine in error */
		i915_request_set_error_once(rq, -ENODEV);
		mask = ve->siblings[0]->mask;
	}

	ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n",
		     rq->fence.context, rq->fence.seqno,
		     mask, ve->base.sched_engine->queue_priority_hint);

	return mask;
}

static void virtual_submission_tasklet(struct tasklet_struct *t)
{
	struct i915_sched_engine *sched_engine =
		from_tasklet(sched_engine, t, tasklet);
	struct virtual_engine * const ve =
		(struct virtual_engine *)sched_engine->private_data;
	const int prio = READ_ONCE(sched_engine->queue_priority_hint);
	intel_engine_mask_t mask;
	unsigned int n;

	rcu_read_lock();
	mask = virtual_submission_mask(ve);
	rcu_read_unlock();
	if (unlikely(!mask))
		return;

	for (n = 0; n < ve->num_siblings; n++) {
		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
		struct ve_node * const node = &ve->nodes[sibling->id];
		struct rb_node **parent, *rb;
		bool first;

		if (!READ_ONCE(ve->request))
			break; /* already handled by a sibling's tasklet */

		spin_lock_irq(&sibling->sched_engine->lock);

		if (unlikely(!(mask & sibling->mask))) {
			if (!RB_EMPTY_NODE(&node->rb)) {
				rb_erase_cached(&node->rb,
						&sibling->execlists.virtual);
				RB_CLEAR_NODE(&node->rb);
			}

			goto unlock_engine;
		}

		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
			/*
			 * Cheat and avoid rebalancing the tree if we can
			 * reuse this node in situ.
			 */
			first = rb_first_cached(&sibling->execlists.virtual) ==
				&node->rb;
			if (prio == node->prio || (prio > node->prio && first))
				goto submit_engine;

			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
		}

		rb = NULL;
		first = true;
		parent = &sibling->execlists.virtual.rb_root.rb_node;
		while (*parent) {
			struct ve_node *other;

			rb = *parent;
			other = rb_entry(rb, typeof(*other), rb);
			if (prio > other->prio) {
				parent = &rb->rb_left;
			} else {
				parent = &rb->rb_right;
				first = false;
			}
		}

		rb_link_node(&node->rb, rb, parent);
		rb_insert_color_cached(&node->rb,
				       &sibling->execlists.virtual,
				       first);

submit_engine:
		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
		node->prio = prio;
		if (first && prio > sibling->sched_engine->queue_priority_hint)
			tasklet_hi_schedule(&sibling->sched_engine->tasklet);

unlock_engine:
		spin_unlock_irq(&sibling->sched_engine->lock);

		if (intel_context_inflight(&ve->context))
			break;
	}
}

static void virtual_submit_request(struct i915_request *rq)
{
	struct virtual_engine *ve = to_virtual_engine(rq->engine);
	unsigned long flags;

	ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n",
		     rq->fence.context,
		     rq->fence.seqno);

	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);

	spin_lock_irqsave(&ve->base.sched_engine->lock, flags);

	/* By the time we resubmit a request, it may be completed */
	if (__i915_request_is_complete(rq)) {
		__i915_request_submit(rq);
		goto unlock;
	}

	if (ve->request) { /* background completion from preempt-to-busy */
		GEM_BUG_ON(!__i915_request_is_complete(ve->request));
		__i915_request_submit(ve->request);
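		/* Drop the reference taken by i915_request_get() when it was queued. */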
		i915_request_put(ve->request);
	}

	ve->base.sched_engine->queue_priority_hint = rq_prio(rq);
	ve->request = i915_request_get(rq);

	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
	list_move_tail(&rq->sched.link, virtual_queue(ve));

	tasklet_hi_schedule(&ve->base.sched_engine->tasklet);

unlock:
	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
}

static struct intel_context *
execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count,
			 unsigned long flags)
{
	struct drm_i915_private *i915 = siblings[0]->i915;
	struct virtual_engine *ve;
	unsigned int n;
	int err;

	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
	if (!ve)
		return ERR_PTR(-ENOMEM);

	ve->base.i915 = i915;
	ve->base.gt = siblings[0]->gt;
	ve->base.uncore = siblings[0]->uncore;
	ve->base.id = -1;

	ve->base.class = OTHER_CLASS;
	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;

	/*
	 * The decision on whether to submit a request using semaphores
	 * depends on the saturated state of the engine. We only compute
	 * this during HW submission of the request, and we need this
	 * state to be globally applied to all requests being submitted
	 * to this engine. Virtual engines encompass more than one physical
	 * engine and so we cannot accurately tell in advance if one of those
	 * engines is already saturated and so cannot afford to use a semaphore
	 * and be pessimized in priority for doing so -- if we are the only
	 * context using semaphores after all other clients have stopped, we
	 * will be starved on the saturated system. Such a global switch for
	 * semaphores is less than ideal, but alas is the current compromise.
	 */
	ve->base.saturated = ALL_ENGINES;

	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");

	intel_engine_init_execlists(&ve->base);

	ve->base.sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
	if (!ve->base.sched_engine) {
		err = -ENOMEM;
		goto err_put;
	}
	ve->base.sched_engine->private_data = &ve->base;

	ve->base.cops = &virtual_context_ops;
	ve->base.request_alloc = execlists_request_alloc;

	ve->base.sched_engine->schedule = i915_schedule;
	ve->base.sched_engine->kick_backend = kick_execlists;
	ve->base.submit_request = virtual_submit_request;

	INIT_LIST_HEAD(virtual_queue(ve));
	tasklet_setup(&ve->base.sched_engine->tasklet, virtual_submission_tasklet);

	intel_context_init(&ve->context, &ve->base);

	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);
	if (!ve->base.breadcrumbs) {
		err = -ENOMEM;
		goto err_put;
	}

	for (n = 0; n < count; n++) {
		struct intel_engine_cs *sibling = siblings[n];

		GEM_BUG_ON(!is_power_of_2(sibling->mask));
		if (sibling->mask & ve->base.mask) {
			drm_dbg(&i915->drm,
				"duplicate %s entry in load balancer\n",
				sibling->name);
			err = -EINVAL;
			goto err_put;
		}

		/*
		 * The virtual engine implementation is tightly coupled to
		 * the execlists backend -- we push requests directly
		 * into a tree inside each physical engine. We could support
		 * layering if we handle cloning of the requests and
		 * submitting a copy into each backend.
		 */
		if (sibling->sched_engine->tasklet.callback !=
		    execlists_submission_tasklet) {
			err = -ENODEV;
			goto err_put;
		}

		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);

		ve->siblings[ve->num_siblings++] = sibling;
		ve->base.mask |= sibling->mask;
		ve->base.logical_mask |= sibling->logical_mask;

		/*
		 * All physical engines must be compatible for their emission
		 * functions (as we build the instructions during request
		 * construction and do not alter them before submission
		 * on the physical engine). We use the engine class as a guide
		 * here, although that could be refined.
		 */
		if (ve->base.class != OTHER_CLASS) {
			if (ve->base.class != sibling->class) {
				drm_dbg(&i915->drm,
					"invalid mixing of engine class, sibling %d, already %d\n",
					sibling->class, ve->base.class);
				err = -EINVAL;
				goto err_put;
			}
			continue;
		}

		ve->base.class = sibling->class;
		ve->base.uabi_class = sibling->uabi_class;
		snprintf(ve->base.name, sizeof(ve->base.name),
			 "v%dx%d", ve->base.class, count);
		ve->base.context_size = sibling->context_size;

		ve->base.add_active_request = sibling->add_active_request;
		ve->base.remove_active_request = sibling->remove_active_request;
		ve->base.emit_bb_start = sibling->emit_bb_start;
		ve->base.emit_flush = sibling->emit_flush;
		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
		ve->base.emit_fini_breadcrumb_dw =
			sibling->emit_fini_breadcrumb_dw;

		ve->base.flags = sibling->flags;
	}

	ve->base.flags |= I915_ENGINE_IS_VIRTUAL;

	virtual_engine_initial_hint(ve);
	return &ve->context;

err_put:
	intel_context_put(&ve->context);
	return ERR_PTR(err);
}

void intel_execlists_show_requests(struct intel_engine_cs *engine,
				   struct drm_printer *m,
				   void (*show_request)(struct drm_printer *m,
							const struct i915_request *rq,
							const char *prefix,
							int indent),
				   unsigned int max)
{
	const struct intel_engine_execlists *execlists = &engine->execlists;
	struct i915_sched_engine *sched_engine = engine->sched_engine;
	struct i915_request *rq, *last;
	unsigned long flags;
	unsigned int count;
	struct rb_node *rb;

	spin_lock_irqsave(&sched_engine->lock, flags);

	last = NULL;
	count = 0;
	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
		if (count++ < max - 1)
			show_request(m, rq, "\t\t", 0);
		else
			last = rq;
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d executing requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	if (sched_engine->queue_priority_hint != INT_MIN)
		drm_printf(m, "\t\tQueue priority hint: %d\n",
			   READ_ONCE(sched_engine->queue_priority_hint));

	last = NULL;
	count = 0;
	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);

		priolist_for_each_request(rq, p) {
			if (count++ < max - 1)
				show_request(m, rq, "\t\t", 0);
			else
				last = rq;
		}
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d queued requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	last = NULL;
	count = 0;
	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
		struct virtual_engine *ve =
			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
		struct i915_request *rq = READ_ONCE(ve->request);

		if (rq) {
			if (count++ < max - 1)
				show_request(m, rq, "\t\t", 0);
			else
				last = rq;
		}
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d virtual requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	spin_unlock_irqrestore(&sched_engine->lock, flags);
}

void intel_execlists_dump_active_requests(struct intel_engine_cs *engine,
					  struct i915_request *hung_rq,
					  struct drm_printer *m)
{
	unsigned long flags;

	spin_lock_irqsave(&engine->sched_engine->lock, flags);

	intel_engine_dump_active_requests(&engine->sched_engine->requests, hung_rq, m);

	drm_printf(m, "\tOn hold?: %zu\n",
		   list_count_nodes(&engine->sched_engine->hold));

	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
}

#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftest_execlists.c"
#endif