tutorial04.rst revision 1.1
1.. Copyright (C) 2014-2015 Free Software Foundation, Inc. 2 Originally contributed by David Malcolm <dmalcolm@redhat.com> 3 4 This is free software: you can redistribute it and/or modify it 5 under the terms of the GNU General Public License as published by 6 the Free Software Foundation, either version 3 of the License, or 7 (at your option) any later version. 8 9 This program is distributed in the hope that it will be useful, but 10 WITHOUT ANY WARRANTY; without even the implied warranty of 11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 12 General Public License for more details. 13 14 You should have received a copy of the GNU General Public License 15 along with this program. If not, see 16 <http://www.gnu.org/licenses/>. 17 18.. default-domain:: cpp 19 20Tutorial part 4: Adding JIT-compilation to a toy interpreter 21------------------------------------------------------------ 22In this example we construct a "toy" interpreter, and add JIT-compilation 23to it. 24 25Our toy interpreter 26******************* 27 28It's a stack-based interpreter, and is intended as a (very simple) example 29of the kind of bytecode interpreter seen in dynamic languages such as 30Python, Ruby etc. 31 32For the sake of simplicity, our toy virtual machine is very limited: 33 34 * The only data type is `int` 35 36 * It can only work on one function at a time (so that the only 37 function call that can be made is to recurse). 38 39 * Functions can only take one parameter. 40 41 * Functions have a stack of `int` values. 42 43 * We'll implement function call within the interpreter by calling a 44 function in our implementation, rather than implementing our own 45 frame stack. 46 47 * The parser is only good enough to get the examples to work. 48 49Naturally, a real interpreter would be much more complicated that this. 50 51The following operations are supported: 52 53====================== ======================== =============== ============== 54Operation Meaning Old Stack New Stack 55====================== ======================== =============== ============== 56DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]`` 57ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]`` 58 of stack. 59BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]`` 60 on the stack. 61BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]`` 62BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]`` 63BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]`` 64 elements on the stack 65 and push a nonzero/zero 66 if (x<y). 67RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]`` 68 of the stack, and 69 popping the result. 70RETURN Return the top of the ``[x]`` ``[]`` 71 stack. 72PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]`` 73JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]`` 74 nonzero, jump to 75 ``arg``. 76====================== ======================== =============== ============== 77 78Programs can be interpreted, disassembled, and compiled to machine code. 79 80The interpreter reads ``.toy`` scripts. Here's what a simple recursive 81factorial program looks like, the script ``factorial.toy``. 82The parser ignores lines beginning with a `#`. 83 84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy 85 :lines: 1- 86 :language: c 87 88The interpreter is a simple infinite loop with a big ``switch`` statement 89based on what the next opcode is: 90 91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 92 :start-after: /* Execute the given function. */ 93 :end-before: /* JIT compilation. */ 94 :language: c++ 95 96Compiling to machine code 97************************* 98We want to generate machine code that can be cast to this type and 99then directly executed in-process: 100 101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 102 :start-after: /* Functions are compiled to this function ptr type. */ 103 :end-before: enum opcode 104 :language: c++ 105 106Our compiler isn't very sophisticated; it takes the implementation of 107each opcode above, and maps it directly to the operations supported by 108the libgccjit API. 109 110How should we handle the stack? In theory we could calculate what the 111stack depth will be at each opcode, and optimize away the stack 112manipulation "by hand". We'll see below that libgccjit is able to do 113this for us, so we'll implement stack manipulation 114in a direct way, by creating a ``stack`` array and ``stack_depth`` 115variables, local within the generated function, equivalent to this C code: 116 117.. code-block:: c 118 119 int stack_depth; 120 int stack[MAX_STACK_DEPTH]; 121 122We'll also have local variables ``x`` and ``y`` for use when implementing 123the opcodes, equivalent to this: 124 125.. code-block:: c 126 127 int x; 128 int y; 129 130This means our compiler has the following state: 131 132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 133 :start-after: /* State. */ 134 :end-before: }; 135 :language: c++ 136 137Setting things up 138***************** 139 140First we create our types: 141 142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 143 :start-after: /* Create types. */ 144 :end-before: /* The constant value 1. */ 145 :language: c++ 146 147along with extracting a useful `int` constant: 148 149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 150 :start-after: /* The constant value 1. */ 151 :end-before: /* Create locations. */ 152 :language: c++ 153 154We'll implement push and pop in terms of the ``stack`` array and 155``stack_depth``. Here are helper functions for adding statements to 156a block, implementing pushing and popping values: 157 158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 159 :start-after: /* Stack manipulation. */ 160 :end-before: /* Create the context. */ 161 :language: c++ 162 163We will support single-stepping through the generated code in the 164debugger, so we need to create :type:`gccjit::location` instances, one 165per operation in the source code. These will reference the lines of 166e.g. ``factorial.toy``. 167 168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 169 :start-after: /* Create locations. */ 170 :end-before: /* Creating the function. */ 171 :language: c++ 172 173Let's create the function itself. As usual, we create its parameter 174first, then use the parameter to create the function: 175 176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 177 :start-after: /* Creating the function. */ 178 :end-before: /* Create stack lvalues. */ 179 :language: c++ 180 181We create the locals within the function. 182 183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 184 :start-after: /* Create stack lvalues. */ 185 :end-before: /* 1st pass: create blocks, one per opcode. 186 :language: c++ 187 188Populating the function 189*********************** 190 191There's some one-time initialization, and the API treats the first block 192you create as the entrypoint of the function, so we need to create that 193block first: 194 195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 196 :start-after: first. */ 197 :end-before: /* Create a block per operation. */ 198 :language: c++ 199 200We can now create blocks for each of the operations. Most of these will 201be consolidated into larger blocks when the optimizer runs. 202 203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 204 :start-after: /* Create a block per operation. */ 205 :end-before: /* Populate the initial block. */ 206 :language: c++ 207 208Now that we have a block it can jump to when it's done, we can populate 209the initial block: 210 211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 212 :start-after: /* Populate the initial block. */ 213 :end-before: /* 2nd pass: fill in instructions. */ 214 :language: c++ 215 216We can now populate the blocks for the individual operations. We loop 217through them, adding instructions to their blocks: 218 219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 220 :start-after: /* 2nd pass: fill in instructions. */ 221 :end-before: /* Helper macros. */ 222 :language: c++ 223 224We're going to have another big ``switch`` statement for implementing 225the opcodes, this time for compiling them, rather than interpreting 226them. It's helpful to have macros for implementing push and pop, so that 227we can make the ``switch`` statement that's coming up look as much as 228possible like the one above within the interpreter: 229 230.. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 231 :start-after: /* Helper macros. */ 232 :end-before: block.add_comment 233 :language: c++ 234 235.. note:: 236 237 A particularly clever implementation would have an *identical* 238 ``switch`` statement shared by the interpreter and the compiler, with 239 some preprocessor "magic". We're not doing that here, for the sake 240 of simplicity. 241 242When I first implemented this compiler, I accidentally missed an edit 243when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the 244stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y`` 245uninitialized. 246 247To track this kind of thing down, we can use 248:func:`gccjit::block::add_comment` to add descriptive comments 249to the internal representation. This is invaluable when looking through 250the generated IR for, say ``factorial``: 251 252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 253 :start-after: PUSH_RVALUE (y) 254 :end-before: /* Handle the individual opcodes. */ 255 :language: c++ 256 257We can now write the big ``switch`` statement that implements the 258individual opcodes, populating the relevant block with statements: 259 260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 261 :start-after: /* Handle the individual opcodes. */ 262 :end-before: /* Go to the next block. */ 263 :language: c++ 264 265Every block must be terminated, via a call to one of the 266``gccjit::block::end_with_`` entrypoints. This has been done for two 267of the opcodes, but we need to do it for the other ones, by jumping 268to the next block. 269 270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 271 :start-after: /* Go to the next block. */ 272 :end-before: /* end of loop on PC locations. */ 273 :language: c++ 274 275This is analogous to simply incrementing the program counter. 276 277Verifying the control flow graph 278******************************** 279Having finished looping over the blocks, the context is complete. 280 281As before, we can verify that the control flow and statements are sane by 282using :func:`gccjit::function::dump_to_dot`: 283 284.. code-block:: c++ 285 286 fn.dump_to_dot ("/tmp/factorial.dot"); 287 288and viewing the result. Note how the label names, comments, and 289variable names show up in the dump, to make it easier to spot 290errors in our compiler. 291 292 .. figure:: ../../intro/factorial.png 293 :alt: image of a control flow graph 294 295Compiling the context 296********************* 297Having finished looping over the blocks and populating them with 298statements, the context is complete. 299 300We can now compile it, and extract machine code from the result: 301 302 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 303 :start-after: /* We've now finished populating the context. Compile it. */ 304 :end-before: /* (this leaks "result" and "funcname") */ 305 :language: c++ 306 307We can now run the result: 308 309 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc 310 :start-after: /* JIT-compilation. */ 311 :end-before: return 0; 312 :language: c++ 313 314Single-stepping through the generated code 315****************************************** 316 317It's possible to debug the generated code. To do this we need to both: 318 319 * Set up source code locations for our statements, so that we can 320 meaningfully step through the code. We did this above by 321 calling :func:`gccjit::context::new_location` and using the 322 results. 323 324 * Enable the generation of debugging information, by setting 325 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the 326 :type:`gccjit::context` via 327 :func:`gccjit::context::set_bool_option`: 328 329 .. code-block:: c++ 330 331 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1); 332 333Having done this, we can put a breakpoint on the generated function: 334 335.. code-block:: console 336 337 $ gdb --args ./toyvm factorial.toy 10 338 (gdb) break factorial 339 Function "factorial" not defined. 340 Make breakpoint pending on future shared library load? (y or [n]) y 341 Breakpoint 1 (factorial) pending. 342 (gdb) run 343 Breakpoint 1, factorial (arg=10) at factorial.toy:14 344 14 DUP 345 346We've set up location information, which references ``factorial.toy``. 347This allows us to use e.g. ``list`` to see where we are in the script: 348 349.. code-block:: console 350 351 (gdb) list 352 9 353 10 # Initial state: 354 11 # stack: [arg] 355 12 356 13 # 0: 357 14 DUP 358 15 # stack: [arg, arg] 359 16 360 17 # 1: 361 18 PUSH_CONST 2 362 363and to step through the function, examining the data: 364 365.. code-block:: console 366 367 (gdb) n 368 18 PUSH_CONST 2 369 (gdb) n 370 22 BINARY_COMPARE_LT 371 (gdb) print stack 372 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0} 373 (gdb) print stack_depth 374 $6 = 3 375 376You'll see that the parts of the ``stack`` array that haven't been 377touched yet are uninitialized. 378 379.. note:: 380 381 Turning on optimizations may lead to unpredictable results when 382 stepping through the generated code: the execution may appear to 383 "jump around" the source code. This is analogous to turning up the 384 optimization level in a regular compiler. 385 386Examining the generated code 387**************************** 388 389How good is the optimized code? 390 391We can turn up optimizations, by calling 392:cpp:func:`gccjit::context::set_int_option` with 393:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`: 394 395.. code-block:: c++ 396 397 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3); 398 399One of GCC's internal representations is called "gimple". A dump of the 400initial gimple representation of the code can be seen by setting: 401 402.. code-block:: c++ 403 404 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1); 405 406With optimization on and source locations displayed, this gives: 407 408.. We'll use "c" for gimple dumps 409 410.. code-block:: c 411 412 factorial (signed int arg) 413 { 414 <unnamed type> D.80; 415 signed int D.81; 416 signed int D.82; 417 signed int D.83; 418 signed int D.84; 419 signed int D.85; 420 signed int y; 421 signed int x; 422 signed int stack_depth; 423 signed int stack[8]; 424 425 try 426 { 427 initial: 428 stack_depth = 0; 429 stack[stack_depth] = arg; 430 stack_depth = stack_depth + 1; 431 goto instr0; 432 instr0: 433 /* DUP */: 434 stack_depth = stack_depth + -1; 435 x = stack[stack_depth]; 436 stack[stack_depth] = x; 437 stack_depth = stack_depth + 1; 438 stack[stack_depth] = x; 439 stack_depth = stack_depth + 1; 440 goto instr1; 441 instr1: 442 /* PUSH_CONST */: 443 stack[stack_depth] = 2; 444 stack_depth = stack_depth + 1; 445 goto instr2; 446 447 /* etc */ 448 449You can see the generated machine code in assembly form via: 450 451.. code-block:: c++ 452 453 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1); 454 result = ctxt.compile (); 455 456which shows that (on this x86_64 box) the compiler has unrolled the loop 457and is using MMX instructions to perform several multiplications 458simultaneously: 459 460.. code-block:: gas 461 462 .file "fake.c" 463 .text 464 .Ltext0: 465 .p2align 4,,15 466 .globl factorial 467 .type factorial, @function 468 factorial: 469 .LFB0: 470 .file 1 "factorial.toy" 471 .loc 1 14 0 472 .cfi_startproc 473 .LVL0: 474 .L2: 475 .loc 1 26 0 476 cmpl $1, %edi 477 jle .L13 478 leal -1(%rdi), %edx 479 movl %edx, %ecx 480 shrl $2, %ecx 481 leal 0(,%rcx,4), %esi 482 testl %esi, %esi 483 je .L14 484 cmpl $9, %edx 485 jbe .L14 486 leal -2(%rdi), %eax 487 movl %eax, -16(%rsp) 488 leal -3(%rdi), %eax 489 movd -16(%rsp), %xmm0 490 movl %edi, -16(%rsp) 491 movl %eax, -12(%rsp) 492 movd -16(%rsp), %xmm1 493 xorl %eax, %eax 494 movl %edx, -16(%rsp) 495 movd -12(%rsp), %xmm4 496 movd -16(%rsp), %xmm6 497 punpckldq %xmm4, %xmm0 498 movdqa .LC1(%rip), %xmm4 499 punpckldq %xmm6, %xmm1 500 punpcklqdq %xmm0, %xmm1 501 movdqa .LC0(%rip), %xmm0 502 jmp .L5 503 # etc - edited for brevity 504 505This is clearly overkill for a function that will likely overflow the 506``int`` type before the vectorization is worthwhile - but then again, this 507is a toy example. 508 509Turning down the optimization level to 2: 510 511.. code-block:: c++ 512 513 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2); 514 515yields this code, which is simple enough to quote in its entirety: 516 517.. code-block:: gas 518 519 .file "fake.c" 520 .text 521 .p2align 4,,15 522 .globl factorial 523 .type factorial, @function 524 factorial: 525 .LFB0: 526 .cfi_startproc 527 .L2: 528 cmpl $1, %edi 529 jle .L8 530 movl $1, %edx 531 jmp .L4 532 .p2align 4,,10 533 .p2align 3 534 .L6: 535 movl %eax, %edi 536 .L4: 537 .L5: 538 leal -1(%rdi), %eax 539 imull %edi, %edx 540 cmpl $1, %eax 541 jne .L6 542 .L3: 543 .L7: 544 imull %edx, %eax 545 ret 546 .L8: 547 movl %edi, %eax 548 movl $1, %edx 549 jmp .L7 550 .cfi_endproc 551 .LFE0: 552 .size factorial, .-factorial 553 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})" 554 .section .note.GNU-stack,"",@progbits 555 556Note that the stack pushing and popping have been eliminated, as has the 557recursive call (in favor of an iteration). 558 559Putting it all together 560*********************** 561 562The complete example can be seen in the source tree at 563``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc`` 564 565along with a Makefile and a couple of sample .toy scripts: 566 567.. code-block:: console 568 569 $ ls -al 570 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 . 571 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 .. 572 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy 573 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy 574 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile 575 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc 576 577 $ make toyvm 578 g++ -Wall -g -o toyvm toyvm.cc -lgccjit 579 580 $ ./toyvm factorial.toy 10 581 interpreter result: 3628800 582 compiler result: 3628800 583 584 $ ./toyvm fibonacci.toy 10 585 interpreter result: 55 586 compiler result: 55 587 588Behind the curtain: How does our code get optimized? 589**************************************************** 590 591Our example is done, but you may be wondering about exactly how the 592compiler turned what we gave it into the machine code seen above. 593 594We can examine what the compiler is doing in detail by setting: 595 596.. code-block:: c++ 597 598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1); 599 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1); 600 601This will dump detailed information about the compiler's state to a 602directory under ``/tmp``, and keep it from being cleaned up. 603 604The precise names and their formats of these files is subject to change. 605Higher optimization levels lead to more files. 606Here's what I saw (edited for brevity; there were almost 200 files): 607 608.. code-block:: console 609 610 intermediate files written to /tmp/libgccjit-KPQbGw 611 $ ls /tmp/libgccjit-KPQbGw/ 612 fake.c.000i.cgraph 613 fake.c.000i.type-inheritance 614 fake.c.004t.gimple 615 fake.c.007t.omplower 616 fake.c.008t.lower 617 fake.c.011t.eh 618 fake.c.012t.cfg 619 fake.c.014i.visibility 620 fake.c.015i.early_local_cleanups 621 fake.c.016t.ssa 622 # etc 623 624The gimple code is converted into Static Single Assignment form, 625with annotations for use when generating the debuginfo: 626 627.. code-block:: console 628 629 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa 630 631.. code-block:: c 632 633 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 634 635 factorial (signed int arg) 636 { 637 signed int stack[8]; 638 signed int stack_depth; 639 signed int x; 640 signed int y; 641 <unnamed type> _20; 642 signed int _21; 643 signed int _38; 644 signed int _44; 645 signed int _51; 646 signed int _56; 647 648 initial: 649 stack_depth_3 = 0; 650 # DEBUG stack_depth => stack_depth_3 651 stack[stack_depth_3] = arg_5(D); 652 stack_depth_7 = stack_depth_3 + 1; 653 # DEBUG stack_depth => stack_depth_7 654 # DEBUG instr0 => NULL 655 # DEBUG /* DUP */ => NULL 656 stack_depth_8 = stack_depth_7 + -1; 657 # DEBUG stack_depth => stack_depth_8 658 x_9 = stack[stack_depth_8]; 659 # DEBUG x => x_9 660 stack[stack_depth_8] = x_9; 661 stack_depth_11 = stack_depth_8 + 1; 662 # DEBUG stack_depth => stack_depth_11 663 stack[stack_depth_11] = x_9; 664 stack_depth_13 = stack_depth_11 + 1; 665 # DEBUG stack_depth => stack_depth_13 666 # DEBUG instr1 => NULL 667 # DEBUG /* PUSH_CONST */ => NULL 668 stack[stack_depth_13] = 2; 669 670 /* etc; edited for brevity */ 671 672We can perhaps better see the code by turning off 673:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG`` 674statements, giving: 675 676.. code-block:: console 677 678 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa 679 680.. code-block:: c 681 682 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 683 684 factorial (signed int arg) 685 { 686 signed int stack[8]; 687 signed int stack_depth; 688 signed int x; 689 signed int y; 690 <unnamed type> _20; 691 signed int _21; 692 signed int _38; 693 signed int _44; 694 signed int _51; 695 signed int _56; 696 697 initial: 698 stack_depth_3 = 0; 699 stack[stack_depth_3] = arg_5(D); 700 stack_depth_7 = stack_depth_3 + 1; 701 stack_depth_8 = stack_depth_7 + -1; 702 x_9 = stack[stack_depth_8]; 703 stack[stack_depth_8] = x_9; 704 stack_depth_11 = stack_depth_8 + 1; 705 stack[stack_depth_11] = x_9; 706 stack_depth_13 = stack_depth_11 + 1; 707 stack[stack_depth_13] = 2; 708 stack_depth_15 = stack_depth_13 + 1; 709 stack_depth_16 = stack_depth_15 + -1; 710 y_17 = stack[stack_depth_16]; 711 stack_depth_18 = stack_depth_16 + -1; 712 x_19 = stack[stack_depth_18]; 713 _20 = x_19 < y_17; 714 _21 = (signed int) _20; 715 stack[stack_depth_18] = _21; 716 stack_depth_23 = stack_depth_18 + 1; 717 stack_depth_24 = stack_depth_23 + -1; 718 x_25 = stack[stack_depth_24]; 719 if (x_25 != 0) 720 goto <bb 4> (instr9); 721 else 722 goto <bb 3> (instr4); 723 724 instr4: 725 /* DUP */: 726 stack_depth_26 = stack_depth_24 + -1; 727 x_27 = stack[stack_depth_26]; 728 stack[stack_depth_26] = x_27; 729 stack_depth_29 = stack_depth_26 + 1; 730 stack[stack_depth_29] = x_27; 731 stack_depth_31 = stack_depth_29 + 1; 732 stack[stack_depth_31] = 1; 733 stack_depth_33 = stack_depth_31 + 1; 734 stack_depth_34 = stack_depth_33 + -1; 735 y_35 = stack[stack_depth_34]; 736 stack_depth_36 = stack_depth_34 + -1; 737 x_37 = stack[stack_depth_36]; 738 _38 = x_37 - y_35; 739 stack[stack_depth_36] = _38; 740 stack_depth_40 = stack_depth_36 + 1; 741 stack_depth_41 = stack_depth_40 + -1; 742 x_42 = stack[stack_depth_41]; 743 _44 = factorial (x_42); 744 stack[stack_depth_41] = _44; 745 stack_depth_46 = stack_depth_41 + 1; 746 stack_depth_47 = stack_depth_46 + -1; 747 y_48 = stack[stack_depth_47]; 748 stack_depth_49 = stack_depth_47 + -1; 749 x_50 = stack[stack_depth_49]; 750 _51 = x_50 * y_48; 751 stack[stack_depth_49] = _51; 752 stack_depth_53 = stack_depth_49 + 1; 753 754 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)> 755 instr9: 756 /* RETURN */: 757 stack_depth_54 = stack_depth_1 + -1; 758 x_55 = stack[stack_depth_54]; 759 _56 = x_55; 760 stack ={v} {CLOBBER}; 761 return _56; 762 763 } 764 765Note in the above how all the :type:`gccjit::block` instances we 766created have been consolidated into just 3 blocks in GCC's internal 767representation: ``initial``, ``instr4`` and ``instr9``. 768 769Optimizing away stack manipulation 770^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 771Recall our simple implementation of stack operations. Let's examine 772how the stack operations are optimized away. 773 774After a pass of constant-propagation, the depth of the stack at each 775opcode can be determined at compile-time: 776 777.. code-block:: console 778 779 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1 780 781.. code-block:: c 782 783 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 784 785 factorial (signed int arg) 786 { 787 signed int stack[8]; 788 signed int stack_depth; 789 signed int x; 790 signed int y; 791 <unnamed type> _20; 792 signed int _21; 793 signed int _38; 794 signed int _44; 795 signed int _51; 796 797 initial: 798 stack[0] = arg_5(D); 799 x_9 = stack[0]; 800 stack[0] = x_9; 801 stack[1] = x_9; 802 stack[2] = 2; 803 y_17 = stack[2]; 804 x_19 = stack[1]; 805 _20 = x_19 < y_17; 806 _21 = (signed int) _20; 807 stack[1] = _21; 808 x_25 = stack[1]; 809 if (x_25 != 0) 810 goto <bb 4> (instr9); 811 else 812 goto <bb 3> (instr4); 813 814 instr4: 815 /* DUP */: 816 x_27 = stack[0]; 817 stack[0] = x_27; 818 stack[1] = x_27; 819 stack[2] = 1; 820 y_35 = stack[2]; 821 x_37 = stack[1]; 822 _38 = x_37 - y_35; 823 stack[1] = _38; 824 x_42 = stack[1]; 825 _44 = factorial (x_42); 826 stack[1] = _44; 827 y_48 = stack[1]; 828 x_50 = stack[0]; 829 _51 = x_50 * y_48; 830 stack[0] = _51; 831 832 instr9: 833 /* RETURN */: 834 x_55 = stack[0]; 835 x_56 = x_55; 836 stack ={v} {CLOBBER}; 837 return x_56; 838 839 } 840 841Note how, in the above, all those ``stack_depth`` values are now just 842constants: we're accessing specific stack locations at each opcode. 843 844The "esra" pass ("Early Scalar Replacement of Aggregates") breaks 845out our "stack" array into individual elements: 846 847.. code-block:: console 848 849 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra 850 851.. code-block:: c 852 853 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 854 855 Created a replacement for stack offset: 0, size: 32: stack$0 856 Created a replacement for stack offset: 32, size: 32: stack$1 857 Created a replacement for stack offset: 64, size: 32: stack$2 858 859 Symbols to be put in SSA form 860 { D.89 D.90 D.91 } 861 Incremental SSA update started at block: 0 862 Number of blocks in CFG: 5 863 Number of blocks to update: 4 ( 80%) 864 865 866 factorial (signed int arg) 867 { 868 signed int stack$2; 869 signed int stack$1; 870 signed int stack$0; 871 signed int stack[8]; 872 signed int stack_depth; 873 signed int x; 874 signed int y; 875 <unnamed type> _20; 876 signed int _21; 877 signed int _38; 878 signed int _44; 879 signed int _51; 880 881 initial: 882 stack$0_45 = arg_5(D); 883 x_9 = stack$0_45; 884 stack$0_39 = x_9; 885 stack$1_32 = x_9; 886 stack$2_30 = 2; 887 y_17 = stack$2_30; 888 x_19 = stack$1_32; 889 _20 = x_19 < y_17; 890 _21 = (signed int) _20; 891 stack$1_28 = _21; 892 x_25 = stack$1_28; 893 if (x_25 != 0) 894 goto <bb 4> (instr9); 895 else 896 goto <bb 3> (instr4); 897 898 instr4: 899 /* DUP */: 900 x_27 = stack$0_39; 901 stack$0_22 = x_27; 902 stack$1_14 = x_27; 903 stack$2_12 = 1; 904 y_35 = stack$2_12; 905 x_37 = stack$1_14; 906 _38 = x_37 - y_35; 907 stack$1_10 = _38; 908 x_42 = stack$1_10; 909 _44 = factorial (x_42); 910 stack$1_6 = _44; 911 y_48 = stack$1_6; 912 x_50 = stack$0_22; 913 _51 = x_50 * y_48; 914 stack$0_1 = _51; 915 916 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)> 917 instr9: 918 /* RETURN */: 919 x_55 = stack$0_52; 920 x_56 = x_55; 921 stack ={v} {CLOBBER}; 922 return x_56; 923 924 } 925 926Hence at this point, all those pushes and pops of the stack are now 927simply assignments to specific temporary variables. 928 929After some copy propagation, the stack manipulation has been completely 930optimized away: 931 932.. code-block:: console 933 934 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1 935 936.. code-block:: c 937 938 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 939 940 factorial (signed int arg) 941 { 942 signed int stack$2; 943 signed int stack$1; 944 signed int stack$0; 945 signed int stack[8]; 946 signed int stack_depth; 947 signed int x; 948 signed int y; 949 <unnamed type> _20; 950 signed int _21; 951 signed int _38; 952 signed int _44; 953 signed int _51; 954 955 initial: 956 stack$0_39 = arg_5(D); 957 _20 = arg_5(D) <= 1; 958 _21 = (signed int) _20; 959 if (_21 != 0) 960 goto <bb 4> (instr9); 961 else 962 goto <bb 3> (instr4); 963 964 instr4: 965 /* DUP */: 966 _38 = arg_5(D) + -1; 967 _44 = factorial (_38); 968 _51 = arg_5(D) * _44; 969 stack$0_1 = _51; 970 971 # stack$0_52 = PHI <arg_5(D)(2), _51(3)> 972 instr9: 973 /* RETURN */: 974 stack ={v} {CLOBBER}; 975 return stack$0_52; 976 977 } 978 979Later on, another pass finally eliminated ``stack_depth`` local and the 980unused parts of the `stack`` array altogether: 981 982.. code-block:: console 983 984 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa 985 986.. code-block:: c 987 988 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 989 990 Released 44 names, 314.29%, removed 44 holes 991 factorial (signed int arg) 992 { 993 signed int stack$0; 994 signed int mult_acc_1; 995 <unnamed type> _5; 996 signed int _6; 997 signed int _7; 998 signed int mul_tmp_10; 999 signed int mult_acc_11; 1000 signed int mult_acc_13; 1001 1002 # arg_9 = PHI <arg_8(D)(0)> 1003 # mult_acc_13 = PHI <1(0)> 1004 initial: 1005 1006 <bb 5>: 1007 # arg_4 = PHI <arg_9(2), _7(3)> 1008 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)> 1009 _5 = arg_4 <= 1; 1010 _6 = (signed int) _5; 1011 if (_6 != 0) 1012 goto <bb 4> (instr9); 1013 else 1014 goto <bb 3> (instr4); 1015 1016 instr4: 1017 /* DUP */: 1018 _7 = arg_4 + -1; 1019 mult_acc_11 = mult_acc_1 * arg_4; 1020 goto <bb 5>; 1021 1022 # stack$0_12 = PHI <arg_4(5)> 1023 instr9: 1024 /* RETURN */: 1025 mul_tmp_10 = mult_acc_1 * stack$0_12; 1026 return mul_tmp_10; 1027 1028 } 1029 1030 1031Elimination of tail recursion 1032^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1033Another significant optimization is the detection that the call to 1034``factorial`` is tail recursion, which can be eliminated in favor of 1035an iteration: 1036 1037.. code-block:: console 1038 1039 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1 1040 1041.. code-block:: c 1042 1043 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) 1044 1045 1046 Symbols to be put in SSA form 1047 { D.88 } 1048 Incremental SSA update started at block: 0 1049 Number of blocks in CFG: 5 1050 Number of blocks to update: 4 ( 80%) 1051 1052 1053 factorial (signed int arg) 1054 { 1055 signed int stack$2; 1056 signed int stack$1; 1057 signed int stack$0; 1058 signed int stack[8]; 1059 signed int stack_depth; 1060 signed int x; 1061 signed int y; 1062 signed int mult_acc_1; 1063 <unnamed type> _20; 1064 signed int _21; 1065 signed int _38; 1066 signed int mul_tmp_44; 1067 signed int mult_acc_51; 1068 1069 # arg_5 = PHI <arg_39(D)(0), _38(3)> 1070 # mult_acc_1 = PHI <1(0), mult_acc_51(3)> 1071 initial: 1072 _20 = arg_5 <= 1; 1073 _21 = (signed int) _20; 1074 if (_21 != 0) 1075 goto <bb 4> (instr9); 1076 else 1077 goto <bb 3> (instr4); 1078 1079 instr4: 1080 /* DUP */: 1081 _38 = arg_5 + -1; 1082 mult_acc_51 = mult_acc_1 * arg_5; 1083 goto <bb 2> (initial); 1084 1085 # stack$0_52 = PHI <arg_5(2)> 1086 instr9: 1087 /* RETURN */: 1088 stack ={v} {CLOBBER}; 1089 mul_tmp_44 = mult_acc_1 * stack$0_52; 1090 return mul_tmp_44; 1091 1092 } 1093