tutorial04.rst revision 1.3
1.. Copyright (C) 2014-2017 Free Software Foundation, Inc. 2 Originally contributed by David Malcolm <dmalcolm@redhat.com> 3 4 This is free software: you can redistribute it and/or modify it 5 under the terms of the GNU General Public License as published by 6 the Free Software Foundation, either version 3 of the License, or 7 (at your option) any later version. 8 9 This program is distributed in the hope that it will be useful, but 10 WITHOUT ANY WARRANTY; without even the implied warranty of 11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 12 General Public License for more details. 13 14 You should have received a copy of the GNU General Public License 15 along with this program. If not, see 16 <http://www.gnu.org/licenses/>. 17 18.. default-domain:: cpp 19 20Tutorial part 4: Adding JIT-compilation to a toy interpreter 21------------------------------------------------------------ 22In this example we construct a "toy" interpreter, and add JIT-compilation 23to it. 24 25Our toy interpreter 26******************* 27 28It's a stack-based interpreter, and is intended as a (very simple) example 29of the kind of bytecode interpreter seen in dynamic languages such as 30Python, Ruby etc. 31 32For the sake of simplicity, our toy virtual machine is very limited: 33 34 * The only data type is `int` 35 36 * It can only work on one function at a time (so that the only 37 function call that can be made is to recurse). 38 39 * Functions can only take one parameter. 40 41 * Functions have a stack of `int` values. 42 43 * We'll implement function call within the interpreter by calling a 44 function in our implementation, rather than implementing our own 45 frame stack. 46 47 * The parser is only good enough to get the examples to work. 48 49Naturally, a real interpreter would be much more complicated that this. 50 51The following operations are supported: 52 53====================== ======================== =============== ============== 54Operation Meaning Old Stack New Stack 55====================== ======================== =============== ============== 56DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]`` 57ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]`` 58 of stack. 59BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]`` 60 on the stack. 61BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]`` 62BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]`` 63BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]`` 64 elements on the stack 65 and push a nonzero/zero 66 if (x<y). 67RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]`` 68 of the stack, and 69 popping the result. 70RETURN Return the top of the ``[x]`` ``[]`` 71 stack. 72PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]`` 73JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]`` 74 nonzero, jump to 75 ``arg``. 76====================== ======================== =============== ============== 77 78Programs can be interpreted, disassembled, and compiled to machine code. 79 80The interpreter reads ``.toy`` scripts. Here's what a simple recursive 81factorial program looks like, the script ``factorial.toy``. 82The parser ignores lines beginning with a `#`. 83 84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy 85 :lines: 1- 86 :language: c 87 88The interpreter is a simple infinite loop with a big ``switch`` statement 89based on what the next opcode is: 90 91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 92 :start-after: /* Execute the given function. */ 93 :end-before: /* JIT compilation. */ 94 :language: c++ 95 96Compiling to machine code 97************************* 98We want to generate machine code that can be cast to this type and 99then directly executed in-process: 100 101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 102 :start-after: /* Functions are compiled to this function ptr type. */ 103 :end-before: enum opcode 104 :language: c++ 105 106Our compiler isn't very sophisticated; it takes the implementation of 107each opcode above, and maps it directly to the operations supported by 108the libgccjit API. 109 110How should we handle the stack? In theory we could calculate what the 111stack depth will be at each opcode, and optimize away the stack 112manipulation "by hand". We'll see below that libgccjit is able to do 113this for us, so we'll implement stack manipulation 114in a direct way, by creating a ``stack`` array and ``stack_depth`` 115variables, local within the generated function, equivalent to this C code: 116 117.. code-block:: c 118 119 int stack_depth; 120 int stack[MAX_STACK_DEPTH]; 121 122We'll also have local variables ``x`` and ``y`` for use when implementing 123the opcodes, equivalent to this: 124 125.. code-block:: c 126 127 int x; 128 int y; 129 130This means our compiler has the following state: 131 132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 133 :start-after: /* State. */ 134 :end-before: }; 135 :language: c++ 136 137Setting things up 138***************** 139 140First we create our types: 141 142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 143 :start-after: /* Create types. */ 144 :end-before: /* The constant value 1. */ 145 :language: c++ 146 147along with extracting a useful `int` constant: 148 149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 150 :start-after: /* The constant value 1. */ 151 :end-before: /* Create locations. */ 152 :language: c++ 153 154We'll implement push and pop in terms of the ``stack`` array and 155``stack_depth``. Here are helper functions for adding statements to 156a block, implementing pushing and popping values: 157 158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 159 :start-after: /* Stack manipulation. */ 160 :end-before: /* Create the context. */ 161 :language: c++ 162 163We will support single-stepping through the generated code in the 164debugger, so we need to create :type:`gccjit::location` instances, one 165per operation in the source code. These will reference the lines of 166e.g. ``factorial.toy``. 167 168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 169 :start-after: /* Create locations. */ 170 :end-before: /* Creating the function. */ 171 :language: c++ 172 173Let's create the function itself. As usual, we create its parameter 174first, then use the parameter to create the function: 175 176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 177 :start-after: /* Creating the function. */ 178 :end-before: /* Create stack lvalues. */ 179 :language: c++ 180 181We create the locals within the function. 182 183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 184 :start-after: /* Create stack lvalues. */ 185 :end-before: /* 1st pass: create blocks, one per opcode. 186 :language: c++ 187 188Populating the function 189*********************** 190 191There's some one-time initialization, and the API treats the first block 192you create as the entrypoint of the function, so we need to create that 193block first: 194 195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 196 :start-after: first. */ 197 :end-before: /* Create a block per operation. */ 198 :language: c++ 199 200We can now create blocks for each of the operations. Most of these will 201be consolidated into larger blocks when the optimizer runs. 202 203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 204 :start-after: /* Create a block per operation. */ 205 :end-before: /* Populate the initial block. */ 206 :language: c++ 207 208Now that we have a block it can jump to when it's done, we can populate 209the initial block: 210 211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 212 :start-after: /* Populate the initial block. */ 213 :end-before: /* 2nd pass: fill in instructions. */ 214 :language: c++ 215 216We can now populate the blocks for the individual operations. We loop 217through them, adding instructions to their blocks: 218 219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 220 :start-after: /* 2nd pass: fill in instructions. */ 221 :end-before: /* Helper macros. */ 222 :language: c++ 223 224We're going to have another big ``switch`` statement for implementing 225the opcodes, this time for compiling them, rather than interpreting 226them. It's helpful to have macros for implementing push and pop, so that 227we can make the ``switch`` statement that's coming up look as much as 228possible like the one above within the interpreter: 229 230.. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 231 :start-after: /* Helper macros. */ 232 :end-before: block.add_comment 233 :language: c++ 234 235.. note:: 236 237 A particularly clever implementation would have an *identical* 238 ``switch`` statement shared by the interpreter and the compiler, with 239 some preprocessor "magic". We're not doing that here, for the sake 240 of simplicity. 241 242When I first implemented this compiler, I accidentally missed an edit 243when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the 244stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y`` 245uninitialized. 246 247To track this kind of thing down, we can use 248:func:`gccjit::block::add_comment` to add descriptive comments 249to the internal representation. This is invaluable when looking through 250the generated IR for, say ``factorial``: 251 252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 253 :start-after: PUSH_RVALUE (y) 254 :end-before: /* Handle the individual opcodes. */ 255 :language: c++ 256 257We can now write the big ``switch`` statement that implements the 258individual opcodes, populating the relevant block with statements: 259 260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 261 :start-after: /* Handle the individual opcodes. */ 262 :end-before: /* Go to the next block. */ 263 :language: c++ 264 265Every block must be terminated, via a call to one of the 266``gccjit::block::end_with_`` entrypoints. This has been done for two 267of the opcodes, but we need to do it for the other ones, by jumping 268to the next block. 269 270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 271 :start-after: /* Go to the next block. */ 272 :end-before: /* end of loop on PC locations. */ 273 :language: c++ 274 275This is analogous to simply incrementing the program counter. 276 277Verifying the control flow graph 278******************************** 279Having finished looping over the blocks, the context is complete. 280 281As before, we can verify that the control flow and statements are sane by 282using :func:`gccjit::function::dump_to_dot`: 283 284.. code-block:: c++ 285 286 fn.dump_to_dot ("/tmp/factorial.dot"); 287 288and viewing the result. Note how the label names, comments, and 289variable names show up in the dump, to make it easier to spot 290errors in our compiler. 291 292 .. figure:: ../../intro/factorial.png 293 :alt: image of a control flow graph 294 295Compiling the context 296********************* 297Having finished looping over the blocks and populating them with 298statements, the context is complete. 299 300We can now compile it, extract machine code from the result, and 301run it: 302 303 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 304 :start-after: /* Wrapper around a gcc_jit_result *. */ 305 :end-before: /* Functions are compiled to this function ptr type. */ 306 :language: c++ 307 308 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 309 :start-after: /* JIT-compilation. */ 310 :end-before: return 0; 311 :language: c++ 312 313Single-stepping through the generated code 314****************************************** 315 316It's possible to debug the generated code. To do this we need to both: 317 318 * Set up source code locations for our statements, so that we can 319 meaningfully step through the code. We did this above by 320 calling :func:`gccjit::context::new_location` and using the 321 results. 322 323 * Enable the generation of debugging information, by setting 324 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the 325 :type:`gccjit::context` via 326 :func:`gccjit::context::set_bool_option`: 327 328 .. code-block:: c++ 329 330 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1); 331 332Having done this, we can put a breakpoint on the generated function: 333 334.. code-block:: console 335 336 $ gdb --args ./toyvm factorial.toy 10 337 (gdb) break factorial 338 Function "factorial" not defined. 339 Make breakpoint pending on future shared library load? (y or [n]) y 340 Breakpoint 1 (factorial) pending. 341 (gdb) run 342 Breakpoint 1, factorial (arg=10) at factorial.toy:14 343 14 DUP 344 345We've set up location information, which references ``factorial.toy``. 346This allows us to use e.g. ``list`` to see where we are in the script: 347 348.. code-block:: console 349 350 (gdb) list 351 9 352 10 # Initial state: 353 11 # stack: [arg] 354 12 355 13 # 0: 356 14 DUP 357 15 # stack: [arg, arg] 358 16 359 17 # 1: 360 18 PUSH_CONST 2 361 362and to step through the function, examining the data: 363 364.. code-block:: console 365 366 (gdb) n 367 18 PUSH_CONST 2 368 (gdb) n 369 22 BINARY_COMPARE_LT 370 (gdb) print stack 371 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0} 372 (gdb) print stack_depth 373 $6 = 3 374 375You'll see that the parts of the ``stack`` array that haven't been 376touched yet are uninitialized. 377 378.. note:: 379 380 Turning on optimizations may lead to unpredictable results when 381 stepping through the generated code: the execution may appear to 382 "jump around" the source code. This is analogous to turning up the 383 optimization level in a regular compiler. 384 385Examining the generated code 386**************************** 387 388How good is the optimized code? 389 390We can turn up optimizations, by calling 391:cpp:func:`gccjit::context::set_int_option` with 392:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`: 393 394.. code-block:: c++ 395 396 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3); 397 398One of GCC's internal representations is called "gimple". A dump of the 399initial gimple representation of the code can be seen by setting: 400 401.. code-block:: c++ 402 403 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1); 404 405With optimization on and source locations displayed, this gives: 406 407.. We'll use "c" for gimple dumps 408 409.. code-block:: c 410 411 factorial (signed int arg) 412 { 413 <unnamed type> D.80; 414 signed int D.81; 415 signed int D.82; 416 signed int D.83; 417 signed int D.84; 418 signed int D.85; 419 signed int y; 420 signed int x; 421 signed int stack_depth; 422 signed int stack[8]; 423 424 try 425 { 426 initial: 427 stack_depth = 0; 428 stack[stack_depth] = arg; 429 stack_depth = stack_depth + 1; 430 goto instr0; 431 instr0: 432 /* DUP */: 433 stack_depth = stack_depth + -1; 434 x = stack[stack_depth]; 435 stack[stack_depth] = x; 436 stack_depth = stack_depth + 1; 437 stack[stack_depth] = x; 438 stack_depth = stack_depth + 1; 439 goto instr1; 440 instr1: 441 /* PUSH_CONST */: 442 stack[stack_depth] = 2; 443 stack_depth = stack_depth + 1; 444 goto instr2; 445 446 /* etc */ 447 448You can see the generated machine code in assembly form via: 449 450.. code-block:: c++ 451 452 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1); 453 result = ctxt.compile (); 454 455which shows that (on this x86_64 box) the compiler has unrolled the loop 456and is using MMX instructions to perform several multiplications 457simultaneously: 458 459.. code-block:: gas 460 461 .file "fake.c" 462 .text 463 .Ltext0: 464 .p2align 4,,15 465 .globl factorial 466 .type factorial, @function 467 factorial: 468 .LFB0: 469 .file 1 "factorial.toy" 470 .loc 1 14 0 471 .cfi_startproc 472 .LVL0: 473 .L2: 474 .loc 1 26 0 475 cmpl $1, %edi 476 jle .L13 477 leal -1(%rdi), %edx 478 movl %edx, %ecx 479 shrl $2, %ecx 480 leal 0(,%rcx,4), %esi 481 testl %esi, %esi 482 je .L14 483 cmpl $9, %edx 484 jbe .L14 485 leal -2(%rdi), %eax 486 movl %eax, -16(%rsp) 487 leal -3(%rdi), %eax 488 movd -16(%rsp), %xmm0 489 movl %edi, -16(%rsp) 490 movl %eax, -12(%rsp) 491 movd -16(%rsp), %xmm1 492 xorl %eax, %eax 493 movl %edx, -16(%rsp) 494 movd -12(%rsp), %xmm4 495 movd -16(%rsp), %xmm6 496 punpckldq %xmm4, %xmm0 497 movdqa .LC1(%rip), %xmm4 498 punpckldq %xmm6, %xmm1 499 punpcklqdq %xmm0, %xmm1 500 movdqa .LC0(%rip), %xmm0 501 jmp .L5 502 # etc - edited for brevity 503 504This is clearly overkill for a function that will likely overflow the 505``int`` type before the vectorization is worthwhile - but then again, this 506is a toy example. 507 508Turning down the optimization level to 2: 509 510.. code-block:: c++ 511 512 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2); 513 514yields this code, which is simple enough to quote in its entirety: 515 516.. code-block:: gas 517 518 .file "fake.c" 519 .text 520 .p2align 4,,15 521 .globl factorial 522 .type factorial, @function 523 factorial: 524 .LFB0: 525 .cfi_startproc 526 .L2: 527 cmpl $1, %edi 528 jle .L8 529 movl $1, %edx 530 jmp .L4 531 .p2align 4,,10 532 .p2align 3 533 .L6: 534 movl %eax, %edi 535 .L4: 536 .L5: 537 leal -1(%rdi), %eax 538 imull %edi, %edx 539 cmpl $1, %eax 540 jne .L6 541 .L3: 542 .L7: 543 imull %edx, %eax 544 ret 545 .L8: 546 movl %edi, %eax 547 movl $1, %edx 548 jmp .L7 549 .cfi_endproc 550 .LFE0: 551 .size factorial, .-factorial 552 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})" 553 .section .note.GNU-stack,"",@progbits 554 555Note that the stack pushing and popping have been eliminated, as has the 556recursive call (in favor of an iteration). 557 558Putting it all together 559*********************** 560 561The complete example can be seen in the source tree at 562``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc`` 563 564along with a Makefile and a couple of sample .toy scripts: 565 566.. code-block:: console 567 568 $ ls -al 569 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 . 570 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 .. 571 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy 572 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy 573 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile 574 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc 575 576 $ make toyvm 577 g++ -Wall -g -o toyvm toyvm.cc -lgccjit 578 579 $ ./toyvm factorial.toy 10 580 interpreter result: 3628800 581 compiler result: 3628800 582 583 $ ./toyvm fibonacci.toy 10 584 interpreter result: 55 585 compiler result: 55 586 587Behind the curtain: How does our code get optimized? 588**************************************************** 589 590Our example is done, but you may be wondering about exactly how the 591compiler turned what we gave it into the machine code seen above. 592 593We can examine what the compiler is doing in detail by setting: 594 595.. code-block:: c++ 596 597 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1); 598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1); 599 600This will dump detailed information about the compiler's state to a 601directory under ``/tmp``, and keep it from being cleaned up. 602 603The precise names and their formats of these files is subject to change. 604Higher optimization levels lead to more files. 605Here's what I saw (edited for brevity; there were almost 200 files): 606 607.. code-block:: console 608 609 intermediate files written to /tmp/libgccjit-KPQbGw 610 $ ls /tmp/libgccjit-KPQbGw/ 611 fake.c.000i.cgraph 612 fake.c.000i.type-inheritance 613 fake.c.004t.gimple 614 fake.c.007t.omplower 615 fake.c.008t.lower 616 fake.c.011t.eh 617 fake.c.012t.cfg 618 fake.c.014i.visibility 619 fake.c.015i.early_local_cleanups 620 fake.c.016t.ssa 621 # etc 622 623The gimple code is converted into Static Single Assignment form, 624with annotations for use when generating the debuginfo: 625 626.. code-block:: console 627 628 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa 629 630.. code-block:: c 631 632 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 633 634 factorial (signed int arg) 635 { 636 signed int stack[8]; 637 signed int stack_depth; 638 signed int x; 639 signed int y; 640 <unnamed type> _20; 641 signed int _21; 642 signed int _38; 643 signed int _44; 644 signed int _51; 645 signed int _56; 646 647 initial: 648 stack_depth_3 = 0; 649 # DEBUG stack_depth => stack_depth_3 650 stack[stack_depth_3] = arg_5(D); 651 stack_depth_7 = stack_depth_3 + 1; 652 # DEBUG stack_depth => stack_depth_7 653 # DEBUG instr0 => NULL 654 # DEBUG /* DUP */ => NULL 655 stack_depth_8 = stack_depth_7 + -1; 656 # DEBUG stack_depth => stack_depth_8 657 x_9 = stack[stack_depth_8]; 658 # DEBUG x => x_9 659 stack[stack_depth_8] = x_9; 660 stack_depth_11 = stack_depth_8 + 1; 661 # DEBUG stack_depth => stack_depth_11 662 stack[stack_depth_11] = x_9; 663 stack_depth_13 = stack_depth_11 + 1; 664 # DEBUG stack_depth => stack_depth_13 665 # DEBUG instr1 => NULL 666 # DEBUG /* PUSH_CONST */ => NULL 667 stack[stack_depth_13] = 2; 668 669 /* etc; edited for brevity */ 670 671We can perhaps better see the code by turning off 672:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG`` 673statements, giving: 674 675.. code-block:: console 676 677 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa 678 679.. code-block:: c 680 681 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 682 683 factorial (signed int arg) 684 { 685 signed int stack[8]; 686 signed int stack_depth; 687 signed int x; 688 signed int y; 689 <unnamed type> _20; 690 signed int _21; 691 signed int _38; 692 signed int _44; 693 signed int _51; 694 signed int _56; 695 696 initial: 697 stack_depth_3 = 0; 698 stack[stack_depth_3] = arg_5(D); 699 stack_depth_7 = stack_depth_3 + 1; 700 stack_depth_8 = stack_depth_7 + -1; 701 x_9 = stack[stack_depth_8]; 702 stack[stack_depth_8] = x_9; 703 stack_depth_11 = stack_depth_8 + 1; 704 stack[stack_depth_11] = x_9; 705 stack_depth_13 = stack_depth_11 + 1; 706 stack[stack_depth_13] = 2; 707 stack_depth_15 = stack_depth_13 + 1; 708 stack_depth_16 = stack_depth_15 + -1; 709 y_17 = stack[stack_depth_16]; 710 stack_depth_18 = stack_depth_16 + -1; 711 x_19 = stack[stack_depth_18]; 712 _20 = x_19 < y_17; 713 _21 = (signed int) _20; 714 stack[stack_depth_18] = _21; 715 stack_depth_23 = stack_depth_18 + 1; 716 stack_depth_24 = stack_depth_23 + -1; 717 x_25 = stack[stack_depth_24]; 718 if (x_25 != 0) 719 goto <bb 4> (instr9); 720 else 721 goto <bb 3> (instr4); 722 723 instr4: 724 /* DUP */: 725 stack_depth_26 = stack_depth_24 + -1; 726 x_27 = stack[stack_depth_26]; 727 stack[stack_depth_26] = x_27; 728 stack_depth_29 = stack_depth_26 + 1; 729 stack[stack_depth_29] = x_27; 730 stack_depth_31 = stack_depth_29 + 1; 731 stack[stack_depth_31] = 1; 732 stack_depth_33 = stack_depth_31 + 1; 733 stack_depth_34 = stack_depth_33 + -1; 734 y_35 = stack[stack_depth_34]; 735 stack_depth_36 = stack_depth_34 + -1; 736 x_37 = stack[stack_depth_36]; 737 _38 = x_37 - y_35; 738 stack[stack_depth_36] = _38; 739 stack_depth_40 = stack_depth_36 + 1; 740 stack_depth_41 = stack_depth_40 + -1; 741 x_42 = stack[stack_depth_41]; 742 _44 = factorial (x_42); 743 stack[stack_depth_41] = _44; 744 stack_depth_46 = stack_depth_41 + 1; 745 stack_depth_47 = stack_depth_46 + -1; 746 y_48 = stack[stack_depth_47]; 747 stack_depth_49 = stack_depth_47 + -1; 748 x_50 = stack[stack_depth_49]; 749 _51 = x_50 * y_48; 750 stack[stack_depth_49] = _51; 751 stack_depth_53 = stack_depth_49 + 1; 752 753 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)> 754 instr9: 755 /* RETURN */: 756 stack_depth_54 = stack_depth_1 + -1; 757 x_55 = stack[stack_depth_54]; 758 _56 = x_55; 759 stack ={v} {CLOBBER}; 760 return _56; 761 762 } 763 764Note in the above how all the :type:`gccjit::block` instances we 765created have been consolidated into just 3 blocks in GCC's internal 766representation: ``initial``, ``instr4`` and ``instr9``. 767 768Optimizing away stack manipulation 769^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 770Recall our simple implementation of stack operations. Let's examine 771how the stack operations are optimized away. 772 773After a pass of constant-propagation, the depth of the stack at each 774opcode can be determined at compile-time: 775 776.. code-block:: console 777 778 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1 779 780.. code-block:: c 781 782 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 783 784 factorial (signed int arg) 785 { 786 signed int stack[8]; 787 signed int stack_depth; 788 signed int x; 789 signed int y; 790 <unnamed type> _20; 791 signed int _21; 792 signed int _38; 793 signed int _44; 794 signed int _51; 795 796 initial: 797 stack[0] = arg_5(D); 798 x_9 = stack[0]; 799 stack[0] = x_9; 800 stack[1] = x_9; 801 stack[2] = 2; 802 y_17 = stack[2]; 803 x_19 = stack[1]; 804 _20 = x_19 < y_17; 805 _21 = (signed int) _20; 806 stack[1] = _21; 807 x_25 = stack[1]; 808 if (x_25 != 0) 809 goto <bb 4> (instr9); 810 else 811 goto <bb 3> (instr4); 812 813 instr4: 814 /* DUP */: 815 x_27 = stack[0]; 816 stack[0] = x_27; 817 stack[1] = x_27; 818 stack[2] = 1; 819 y_35 = stack[2]; 820 x_37 = stack[1]; 821 _38 = x_37 - y_35; 822 stack[1] = _38; 823 x_42 = stack[1]; 824 _44 = factorial (x_42); 825 stack[1] = _44; 826 y_48 = stack[1]; 827 x_50 = stack[0]; 828 _51 = x_50 * y_48; 829 stack[0] = _51; 830 831 instr9: 832 /* RETURN */: 833 x_55 = stack[0]; 834 x_56 = x_55; 835 stack ={v} {CLOBBER}; 836 return x_56; 837 838 } 839 840Note how, in the above, all those ``stack_depth`` values are now just 841constants: we're accessing specific stack locations at each opcode. 842 843The "esra" pass ("Early Scalar Replacement of Aggregates") breaks 844out our "stack" array into individual elements: 845 846.. code-block:: console 847 848 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra 849 850.. code-block:: c 851 852 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 853 854 Created a replacement for stack offset: 0, size: 32: stack$0 855 Created a replacement for stack offset: 32, size: 32: stack$1 856 Created a replacement for stack offset: 64, size: 32: stack$2 857 858 Symbols to be put in SSA form 859 { D.89 D.90 D.91 } 860 Incremental SSA update started at block: 0 861 Number of blocks in CFG: 5 862 Number of blocks to update: 4 ( 80%) 863 864 865 factorial (signed int arg) 866 { 867 signed int stack$2; 868 signed int stack$1; 869 signed int stack$0; 870 signed int stack[8]; 871 signed int stack_depth; 872 signed int x; 873 signed int y; 874 <unnamed type> _20; 875 signed int _21; 876 signed int _38; 877 signed int _44; 878 signed int _51; 879 880 initial: 881 stack$0_45 = arg_5(D); 882 x_9 = stack$0_45; 883 stack$0_39 = x_9; 884 stack$1_32 = x_9; 885 stack$2_30 = 2; 886 y_17 = stack$2_30; 887 x_19 = stack$1_32; 888 _20 = x_19 < y_17; 889 _21 = (signed int) _20; 890 stack$1_28 = _21; 891 x_25 = stack$1_28; 892 if (x_25 != 0) 893 goto <bb 4> (instr9); 894 else 895 goto <bb 3> (instr4); 896 897 instr4: 898 /* DUP */: 899 x_27 = stack$0_39; 900 stack$0_22 = x_27; 901 stack$1_14 = x_27; 902 stack$2_12 = 1; 903 y_35 = stack$2_12; 904 x_37 = stack$1_14; 905 _38 = x_37 - y_35; 906 stack$1_10 = _38; 907 x_42 = stack$1_10; 908 _44 = factorial (x_42); 909 stack$1_6 = _44; 910 y_48 = stack$1_6; 911 x_50 = stack$0_22; 912 _51 = x_50 * y_48; 913 stack$0_1 = _51; 914 915 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)> 916 instr9: 917 /* RETURN */: 918 x_55 = stack$0_52; 919 x_56 = x_55; 920 stack ={v} {CLOBBER}; 921 return x_56; 922 923 } 924 925Hence at this point, all those pushes and pops of the stack are now 926simply assignments to specific temporary variables. 927 928After some copy propagation, the stack manipulation has been completely 929optimized away: 930 931.. code-block:: console 932 933 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1 934 935.. code-block:: c 936 937 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 938 939 factorial (signed int arg) 940 { 941 signed int stack$2; 942 signed int stack$1; 943 signed int stack$0; 944 signed int stack[8]; 945 signed int stack_depth; 946 signed int x; 947 signed int y; 948 <unnamed type> _20; 949 signed int _21; 950 signed int _38; 951 signed int _44; 952 signed int _51; 953 954 initial: 955 stack$0_39 = arg_5(D); 956 _20 = arg_5(D) <= 1; 957 _21 = (signed int) _20; 958 if (_21 != 0) 959 goto <bb 4> (instr9); 960 else 961 goto <bb 3> (instr4); 962 963 instr4: 964 /* DUP */: 965 _38 = arg_5(D) + -1; 966 _44 = factorial (_38); 967 _51 = arg_5(D) * _44; 968 stack$0_1 = _51; 969 970 # stack$0_52 = PHI <arg_5(D)(2), _51(3)> 971 instr9: 972 /* RETURN */: 973 stack ={v} {CLOBBER}; 974 return stack$0_52; 975 976 } 977 978Later on, another pass finally eliminated ``stack_depth`` local and the 979unused parts of the `stack`` array altogether: 980 981.. code-block:: console 982 983 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa 984 985.. code-block:: c 986 987 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 988 989 Released 44 names, 314.29%, removed 44 holes 990 factorial (signed int arg) 991 { 992 signed int stack$0; 993 signed int mult_acc_1; 994 <unnamed type> _5; 995 signed int _6; 996 signed int _7; 997 signed int mul_tmp_10; 998 signed int mult_acc_11; 999 signed int mult_acc_13; 1000 1001 # arg_9 = PHI <arg_8(D)(0)> 1002 # mult_acc_13 = PHI <1(0)> 1003 initial: 1004 1005 <bb 5>: 1006 # arg_4 = PHI <arg_9(2), _7(3)> 1007 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)> 1008 _5 = arg_4 <= 1; 1009 _6 = (signed int) _5; 1010 if (_6 != 0) 1011 goto <bb 4> (instr9); 1012 else 1013 goto <bb 3> (instr4); 1014 1015 instr4: 1016 /* DUP */: 1017 _7 = arg_4 + -1; 1018 mult_acc_11 = mult_acc_1 * arg_4; 1019 goto <bb 5>; 1020 1021 # stack$0_12 = PHI <arg_4(5)> 1022 instr9: 1023 /* RETURN */: 1024 mul_tmp_10 = mult_acc_1 * stack$0_12; 1025 return mul_tmp_10; 1026 1027 } 1028 1029 1030Elimination of tail recursion 1031^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1032Another significant optimization is the detection that the call to 1033``factorial`` is tail recursion, which can be eliminated in favor of 1034an iteration: 1035 1036.. code-block:: console 1037 1038 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1 1039 1040.. code-block:: c 1041 1042 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 1043 1044 1045 Symbols to be put in SSA form 1046 { D.88 } 1047 Incremental SSA update started at block: 0 1048 Number of blocks in CFG: 5 1049 Number of blocks to update: 4 ( 80%) 1050 1051 1052 factorial (signed int arg) 1053 { 1054 signed int stack$2; 1055 signed int stack$1; 1056 signed int stack$0; 1057 signed int stack[8]; 1058 signed int stack_depth; 1059 signed int x; 1060 signed int y; 1061 signed int mult_acc_1; 1062 <unnamed type> _20; 1063 signed int _21; 1064 signed int _38; 1065 signed int mul_tmp_44; 1066 signed int mult_acc_51; 1067 1068 # arg_5 = PHI <arg_39(D)(0), _38(3)> 1069 # mult_acc_1 = PHI <1(0), mult_acc_51(3)> 1070 initial: 1071 _20 = arg_5 <= 1; 1072 _21 = (signed int) _20; 1073 if (_21 != 0) 1074 goto <bb 4> (instr9); 1075 else 1076 goto <bb 3> (instr4); 1077 1078 instr4: 1079 /* DUP */: 1080 _38 = arg_5 + -1; 1081 mult_acc_51 = mult_acc_1 * arg_5; 1082 goto <bb 2> (initial); 1083 1084 # stack$0_52 = PHI <arg_5(2)> 1085 instr9: 1086 /* RETURN */: 1087 stack ={v} {CLOBBER}; 1088 mul_tmp_44 = mult_acc_1 * stack$0_52; 1089 return mul_tmp_44; 1090 1091 } 1092