tutorial04.rst revision 1.1.1.1
1.. Copyright (C) 2014-2015 Free Software Foundation, Inc.
2   Originally contributed by David Malcolm <dmalcolm@redhat.com>
3
4   This is free software: you can redistribute it and/or modify it
5   under the terms of the GNU General Public License as published by
6   the Free Software Foundation, either version 3 of the License, or
7   (at your option) any later version.
8
9   This program is distributed in the hope that it will be useful, but
10   WITHOUT ANY WARRANTY; without even the implied warranty of
11   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
12   General Public License for more details.
13
14   You should have received a copy of the GNU General Public License
15   along with this program.  If not, see
16   <http://www.gnu.org/licenses/>.
17
18.. default-domain:: cpp
19
20Tutorial part 4: Adding JIT-compilation to a toy interpreter
21------------------------------------------------------------
22In this example we construct a "toy" interpreter, and add JIT-compilation
23to it.
24
25Our toy interpreter
26*******************
27
28It's a stack-based interpreter, and is intended as a (very simple) example
29of the kind of bytecode interpreter seen in dynamic languages such as
30Python, Ruby etc.
31
32For the sake of simplicity, our toy virtual machine is very limited:
33
34  * The only data type is `int`
35
36  * It can only work on one function at a time (so that the only
37    function call that can be made is to recurse).
38
39  * Functions can only take one parameter.
40
41  * Functions have a stack of `int` values.
42
43  * We'll implement function call within the interpreter by calling a
44    function in our implementation, rather than implementing our own
45    frame stack.
46
47  * The parser is only good enough to get the examples to work.
48
49Naturally, a real interpreter would be much more complicated that this.
50
51The following operations are supported:
52
53====================== ======================== =============== ==============
54Operation              Meaning                  Old Stack       New Stack
55====================== ======================== =============== ==============
56DUP                    Duplicate top of stack.  ``[..., x]``    ``[..., x, x]``
57ROT                    Swap top two elements    ``[..., x, y]`` ``[..., y, x]``
58                       of stack.
59BINARY_ADD             Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
60                       on the stack.
61BINARY_SUBTRACT        Likewise, but subtract.  ``[..., x, y]`` ``[..., (x-y)]``
62BINARY_MULT            Likewise, but multiply.  ``[..., x, y]`` ``[..., (x*y)]``
63BINARY_COMPARE_LT      Compare the top two      ``[..., x, y]`` ``[..., (x<y)]``
64                       elements on the stack
65                       and push a nonzero/zero
66                       if (x<y).
67RECURSE                Recurse, passing the top ``[..., x]``    ``[..., fn(x)]``
68                       of the stack, and
69                       popping the result.
70RETURN                 Return the top of the    ``[x]``         ``[]``
71                       stack.
72PUSH_CONST `arg`       Push an int const.       ``[...]``       ``[..., arg]``
73JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]``    ``[...]``
74                       nonzero, jump to
75                       ``arg``.
76====================== ======================== =============== ==============
77
78Programs can be interpreted, disassembled, and compiled to machine code.
79
80The interpreter reads ``.toy`` scripts.  Here's what a simple recursive
81factorial program looks like, the script ``factorial.toy``.
82The parser ignores lines beginning with a `#`.
83
84   .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy
85    :lines: 1-
86    :language: c
87
88The interpreter is a simple infinite loop with a big ``switch`` statement
89based on what the next opcode is:
90
91   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
92    :start-after: /* Execute the given function.  */
93    :end-before: /* JIT compilation.  */
94    :language: c++
95
96Compiling to machine code
97*************************
98We want to generate machine code that can be cast to this type and
99then directly executed in-process:
100
101   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
102    :start-after: /* Functions are compiled to this function ptr type.  */
103    :end-before: enum opcode
104    :language: c++
105
106Our compiler isn't very sophisticated; it takes the implementation of
107each opcode above, and maps it directly to the operations supported by
108the libgccjit API.
109
110How should we handle the stack?  In theory we could calculate what the
111stack depth will be at each opcode, and optimize away the stack
112manipulation "by hand".  We'll see below that libgccjit is able to do
113this for us, so we'll implement stack manipulation
114in a direct way, by creating a ``stack`` array and ``stack_depth``
115variables, local within the generated function, equivalent to this C code:
116
117.. code-block:: c
118
119  int stack_depth;
120  int stack[MAX_STACK_DEPTH];
121
122We'll also have local variables ``x`` and ``y`` for use when implementing
123the opcodes, equivalent to this:
124
125.. code-block:: c
126
127  int x;
128  int y;
129
130This means our compiler has the following state:
131
132   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
133    :start-after:   /* State.  */
134    :end-before: };
135    :language: c++
136
137Setting things up
138*****************
139
140First we create our types:
141
142   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
143    :start-after: /* Create types.  */
144    :end-before: /* The constant value 1.  */
145    :language: c++
146
147along with extracting a useful `int` constant:
148
149   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
150    :start-after: /* The constant value 1.  */
151    :end-before: /* Create locations.  */
152    :language: c++
153
154We'll implement push and pop in terms of the ``stack`` array and
155``stack_depth``.  Here are helper functions for adding statements to
156a block, implementing pushing and popping values:
157
158   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
159    :start-after: /* Stack manipulation.  */
160    :end-before: /* Create the context. */
161    :language: c++
162
163We will support single-stepping through the generated code in the
164debugger, so we need to create :type:`gccjit::location` instances, one
165per operation in the source code.  These will reference the lines of
166e.g. ``factorial.toy``.
167
168   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
169    :start-after: /* Create locations.  */
170    :end-before: /* Creating the function.  */
171    :language: c++
172
173Let's create the function itself.  As usual, we create its parameter
174first, then use the parameter to create the function:
175
176   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
177    :start-after: /* Creating the function.  */
178    :end-before: /* Create stack lvalues.  */
179    :language: c++
180
181We create the locals within the function.
182
183   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
184    :start-after: /* Create stack lvalues.  */
185    :end-before: /* 1st pass: create blocks, one per opcode.
186    :language: c++
187
188Populating the function
189***********************
190
191There's some one-time initialization, and the API treats the first block
192you create as the entrypoint of the function, so we need to create that
193block first:
194
195   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
196    :start-after: first.  */
197    :end-before: /* Create a block per operation.  */
198    :language: c++
199
200We can now create blocks for each of the operations.  Most of these will
201be consolidated into larger blocks when the optimizer runs.
202
203   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
204    :start-after: /* Create a block per operation.  */
205    :end-before: /* Populate the initial block.  */
206    :language: c++
207
208Now that we have a block it can jump to when it's done, we can populate
209the initial block:
210
211   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
212    :start-after: /* Populate the initial block.  */
213    :end-before: /* 2nd pass: fill in instructions.  */
214    :language: c++
215
216We can now populate the blocks for the individual operations.  We loop
217through them, adding instructions to their blocks:
218
219   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
220    :start-after: /* 2nd pass: fill in instructions.  */
221    :end-before: /* Helper macros.  */
222    :language: c++
223
224We're going to have another big ``switch`` statement for implementing
225the opcodes, this time for compiling them, rather than interpreting
226them.  It's helpful to have macros for implementing push and pop, so that
227we can make the ``switch`` statement that's coming up look as much as
228possible like the one above within the interpreter:
229
230.. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
231    :start-after: /* Helper macros.  */
232    :end-before: block.add_comment
233    :language: c++
234
235.. note::
236
237   A particularly clever implementation would have an *identical*
238   ``switch`` statement shared by the interpreter and the compiler, with
239   some preprocessor "magic".  We're not doing that here, for the sake
240   of simplicity.
241
242When I first implemented this compiler, I accidentally missed an edit
243when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
244stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
245uninitialized.
246
247To track this kind of thing down, we can use
248:func:`gccjit::block::add_comment` to add descriptive comments
249to the internal representation.  This is invaluable when looking through
250the generated IR for, say ``factorial``:
251
252   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
253    :start-after: PUSH_RVALUE (y)
254    :end-before: /* Handle the individual opcodes.  */
255    :language: c++
256
257We can now write the big ``switch`` statement that implements the
258individual opcodes, populating the relevant block with statements:
259
260   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
261    :start-after: /* Handle the individual opcodes.  */
262    :end-before: /* Go to the next block.  */
263    :language: c++
264
265Every block must be terminated, via a call to one of the
266``gccjit::block::end_with_`` entrypoints.  This has been done for two
267of the opcodes, but we need to do it for the other ones, by jumping
268to the next block.
269
270   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
271    :start-after: /* Go to the next block.  */
272    :end-before: /* end of loop on PC locations.  */
273    :language: c++
274
275This is analogous to simply incrementing the program counter.
276
277Verifying the control flow graph
278********************************
279Having finished looping over the blocks, the context is complete.
280
281As before, we can verify that the control flow and statements are sane by
282using :func:`gccjit::function::dump_to_dot`:
283
284.. code-block:: c++
285
286  fn.dump_to_dot ("/tmp/factorial.dot");
287
288and viewing the result.  Note how the label names, comments, and
289variable names show up in the dump, to make it easier to spot
290errors in our compiler.
291
292  .. figure:: ../../intro/factorial.png
293    :alt: image of a control flow graph
294
295Compiling the context
296*********************
297Having finished looping over the blocks and populating them with
298statements, the context is complete.
299
300We can now compile it, and extract machine code from the result:
301
302   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
303    :start-after: /* We've now finished populating the context.  Compile it.  */
304    :end-before: /* (this leaks "result" and "funcname") */
305    :language: c++
306
307We can now run the result:
308
309   .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
310    :start-after: /* JIT-compilation.  */
311    :end-before: return 0;
312    :language: c++
313
314Single-stepping through the generated code
315******************************************
316
317It's possible to debug the generated code.  To do this we need to both:
318
319  * Set up source code locations for our statements, so that we can
320    meaningfully step through the code.  We did this above by
321    calling :func:`gccjit::context::new_location` and using the
322    results.
323
324  * Enable the generation of debugging information, by setting
325    :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
326    :type:`gccjit::context` via
327    :func:`gccjit::context::set_bool_option`:
328
329    .. code-block:: c++
330
331      ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1);
332
333Having done this, we can put a breakpoint on the generated function:
334
335.. code-block:: console
336
337  $ gdb --args ./toyvm factorial.toy 10
338  (gdb) break factorial
339  Function "factorial" not defined.
340  Make breakpoint pending on future shared library load? (y or [n]) y
341  Breakpoint 1 (factorial) pending.
342  (gdb) run
343  Breakpoint 1, factorial (arg=10) at factorial.toy:14
344  14	DUP
345
346We've set up location information, which references ``factorial.toy``.
347This allows us to use e.g. ``list`` to see where we are in the script:
348
349.. code-block:: console
350
351  (gdb) list
352  9
353  10    # Initial state:
354  11    # stack: [arg]
355  12
356  13    # 0:
357  14    DUP
358  15    # stack: [arg, arg]
359  16
360  17    # 1:
361  18    PUSH_CONST 2
362
363and to step through the function, examining the data:
364
365.. code-block:: console
366
367  (gdb) n
368  18    PUSH_CONST 2
369  (gdb) n
370  22    BINARY_COMPARE_LT
371  (gdb) print stack
372  $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
373  (gdb) print stack_depth
374  $6 = 3
375
376You'll see that the parts of the ``stack`` array that haven't been
377touched yet are uninitialized.
378
379.. note::
380
381   Turning on optimizations may lead to unpredictable results when
382   stepping through the generated code: the execution may appear to
383   "jump around" the source code.  This is analogous to turning up the
384   optimization level in a regular compiler.
385
386Examining the generated code
387****************************
388
389How good is the optimized code?
390
391We can turn up optimizations, by calling
392:cpp:func:`gccjit::context::set_int_option` with
393:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
394
395.. code-block:: c++
396
397  ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3);
398
399One of GCC's internal representations is called "gimple".  A dump of the
400initial gimple representation of the code can be seen by setting:
401
402.. code-block:: c++
403
404  ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1);
405
406With optimization on and source locations displayed, this gives:
407
408.. We'll use "c" for gimple dumps
409
410.. code-block:: c
411
412  factorial (signed int arg)
413  {
414    <unnamed type> D.80;
415    signed int D.81;
416    signed int D.82;
417    signed int D.83;
418    signed int D.84;
419    signed int D.85;
420    signed int y;
421    signed int x;
422    signed int stack_depth;
423    signed int stack[8];
424
425    try
426      {
427        initial:
428        stack_depth = 0;
429        stack[stack_depth] = arg;
430        stack_depth = stack_depth + 1;
431        goto instr0;
432        instr0:
433        /* DUP */:
434        stack_depth = stack_depth + -1;
435        x = stack[stack_depth];
436        stack[stack_depth] = x;
437        stack_depth = stack_depth + 1;
438        stack[stack_depth] = x;
439        stack_depth = stack_depth + 1;
440        goto instr1;
441        instr1:
442        /* PUSH_CONST */:
443        stack[stack_depth] = 2;
444        stack_depth = stack_depth + 1;
445        goto instr2;
446
447        /* etc */
448
449You can see the generated machine code in assembly form via:
450
451.. code-block:: c++
452
453  ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1);
454  result = ctxt.compile ();
455
456which shows that (on this x86_64 box) the compiler has unrolled the loop
457and is using MMX instructions to perform several multiplications
458simultaneously:
459
460.. code-block:: gas
461
462          .file   "fake.c"
463          .text
464  .Ltext0:
465          .p2align 4,,15
466          .globl  factorial
467          .type   factorial, @function
468  factorial:
469  .LFB0:
470          .file 1 "factorial.toy"
471          .loc 1 14 0
472          .cfi_startproc
473  .LVL0:
474  .L2:
475          .loc 1 26 0
476          cmpl    $1, %edi
477          jle     .L13
478          leal    -1(%rdi), %edx
479          movl    %edx, %ecx
480          shrl    $2, %ecx
481          leal    0(,%rcx,4), %esi
482          testl   %esi, %esi
483          je      .L14
484          cmpl    $9, %edx
485          jbe     .L14
486          leal    -2(%rdi), %eax
487          movl    %eax, -16(%rsp)
488          leal    -3(%rdi), %eax
489          movd    -16(%rsp), %xmm0
490          movl    %edi, -16(%rsp)
491          movl    %eax, -12(%rsp)
492          movd    -16(%rsp), %xmm1
493          xorl    %eax, %eax
494          movl    %edx, -16(%rsp)
495          movd    -12(%rsp), %xmm4
496          movd    -16(%rsp), %xmm6
497          punpckldq       %xmm4, %xmm0
498          movdqa  .LC1(%rip), %xmm4
499          punpckldq       %xmm6, %xmm1
500          punpcklqdq      %xmm0, %xmm1
501          movdqa  .LC0(%rip), %xmm0
502          jmp     .L5
503          # etc - edited for brevity
504
505This is clearly overkill for a function that will likely overflow the
506``int`` type before the vectorization is worthwhile - but then again, this
507is a toy example.
508
509Turning down the optimization level to 2:
510
511.. code-block:: c++
512
513  ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2);
514
515yields this code, which is simple enough to quote in its entirety:
516
517.. code-block:: gas
518
519          .file   "fake.c"
520          .text
521          .p2align 4,,15
522          .globl  factorial
523          .type   factorial, @function
524  factorial:
525  .LFB0:
526          .cfi_startproc
527  .L2:
528          cmpl    $1, %edi
529          jle     .L8
530          movl    $1, %edx
531          jmp     .L4
532          .p2align 4,,10
533          .p2align 3
534  .L6:
535          movl    %eax, %edi
536  .L4:
537  .L5:
538          leal    -1(%rdi), %eax
539          imull   %edi, %edx
540          cmpl    $1, %eax
541          jne     .L6
542  .L3:
543  .L7:
544          imull   %edx, %eax
545          ret
546  .L8:
547          movl    %edi, %eax
548          movl    $1, %edx
549          jmp     .L7
550          .cfi_endproc
551  .LFE0:
552          .size   factorial, .-factorial
553          .ident  "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
554          .section        .note.GNU-stack,"",@progbits
555
556Note that the stack pushing and popping have been eliminated, as has the
557recursive call (in favor of an iteration).
558
559Putting it all together
560***********************
561
562The complete example can be seen in the source tree at
563``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc``
564
565along with a Makefile and a couple of sample .toy scripts:
566
567.. code-block:: console
568
569  $ ls -al
570  drwxrwxr-x. 2 david david   4096 Sep 19 17:46 .
571  drwxrwxr-x. 3 david david   4096 Sep 19 15:26 ..
572  -rw-rw-r--. 1 david david    615 Sep 19 12:43 factorial.toy
573  -rw-rw-r--. 1 david david    834 Sep 19 13:08 fibonacci.toy
574  -rw-rw-r--. 1 david david    238 Sep 19 14:22 Makefile
575  -rw-rw-r--. 1 david david  16457 Sep 19 17:07 toyvm.cc
576
577  $ make toyvm
578  g++ -Wall -g -o toyvm toyvm.cc -lgccjit
579
580  $ ./toyvm factorial.toy 10
581  interpreter result: 3628800
582  compiler result: 3628800
583
584  $ ./toyvm fibonacci.toy 10
585  interpreter result: 55
586  compiler result: 55
587
588Behind the curtain: How does our code get optimized?
589****************************************************
590
591Our example is done, but you may be wondering about exactly how the
592compiler turned what we gave it into the machine code seen above.
593
594We can examine what the compiler is doing in detail by setting:
595
596.. code-block:: c++
597
598  state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1);
599  state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1);
600
601This will dump detailed information about the compiler's state to a
602directory under ``/tmp``, and keep it from being cleaned up.
603
604The precise names and their formats of these files is subject to change.
605Higher optimization levels lead to more files.
606Here's what I saw (edited for brevity; there were almost 200 files):
607
608.. code-block:: console
609
610  intermediate files written to /tmp/libgccjit-KPQbGw
611  $ ls /tmp/libgccjit-KPQbGw/
612  fake.c.000i.cgraph
613  fake.c.000i.type-inheritance
614  fake.c.004t.gimple
615  fake.c.007t.omplower
616  fake.c.008t.lower
617  fake.c.011t.eh
618  fake.c.012t.cfg
619  fake.c.014i.visibility
620  fake.c.015i.early_local_cleanups
621  fake.c.016t.ssa
622  # etc
623
624The gimple code is converted into Static Single Assignment form,
625with annotations for use when generating the debuginfo:
626
627.. code-block:: console
628
629  $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
630
631.. code-block:: c
632
633  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
634
635  factorial (signed int arg)
636  {
637    signed int stack[8];
638    signed int stack_depth;
639    signed int x;
640    signed int y;
641    <unnamed type> _20;
642    signed int _21;
643    signed int _38;
644    signed int _44;
645    signed int _51;
646    signed int _56;
647
648  initial:
649    stack_depth_3 = 0;
650    # DEBUG stack_depth => stack_depth_3
651    stack[stack_depth_3] = arg_5(D);
652    stack_depth_7 = stack_depth_3 + 1;
653    # DEBUG stack_depth => stack_depth_7
654    # DEBUG instr0 => NULL
655    # DEBUG /* DUP */ => NULL
656    stack_depth_8 = stack_depth_7 + -1;
657    # DEBUG stack_depth => stack_depth_8
658    x_9 = stack[stack_depth_8];
659    # DEBUG x => x_9
660    stack[stack_depth_8] = x_9;
661    stack_depth_11 = stack_depth_8 + 1;
662    # DEBUG stack_depth => stack_depth_11
663    stack[stack_depth_11] = x_9;
664    stack_depth_13 = stack_depth_11 + 1;
665    # DEBUG stack_depth => stack_depth_13
666    # DEBUG instr1 => NULL
667    # DEBUG /* PUSH_CONST */ => NULL
668    stack[stack_depth_13] = 2;
669
670    /* etc; edited for brevity */
671
672We can perhaps better see the code by turning off
673:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
674statements, giving:
675
676.. code-block:: console
677
678  $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
679
680.. code-block:: c
681
682  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
683
684  factorial (signed int arg)
685  {
686    signed int stack[8];
687    signed int stack_depth;
688    signed int x;
689    signed int y;
690    <unnamed type> _20;
691    signed int _21;
692    signed int _38;
693    signed int _44;
694    signed int _51;
695    signed int _56;
696
697  initial:
698    stack_depth_3 = 0;
699    stack[stack_depth_3] = arg_5(D);
700    stack_depth_7 = stack_depth_3 + 1;
701    stack_depth_8 = stack_depth_7 + -1;
702    x_9 = stack[stack_depth_8];
703    stack[stack_depth_8] = x_9;
704    stack_depth_11 = stack_depth_8 + 1;
705    stack[stack_depth_11] = x_9;
706    stack_depth_13 = stack_depth_11 + 1;
707    stack[stack_depth_13] = 2;
708    stack_depth_15 = stack_depth_13 + 1;
709    stack_depth_16 = stack_depth_15 + -1;
710    y_17 = stack[stack_depth_16];
711    stack_depth_18 = stack_depth_16 + -1;
712    x_19 = stack[stack_depth_18];
713    _20 = x_19 < y_17;
714    _21 = (signed int) _20;
715    stack[stack_depth_18] = _21;
716    stack_depth_23 = stack_depth_18 + 1;
717    stack_depth_24 = stack_depth_23 + -1;
718    x_25 = stack[stack_depth_24];
719    if (x_25 != 0)
720      goto <bb 4> (instr9);
721    else
722      goto <bb 3> (instr4);
723
724  instr4:
725  /* DUP */:
726    stack_depth_26 = stack_depth_24 + -1;
727    x_27 = stack[stack_depth_26];
728    stack[stack_depth_26] = x_27;
729    stack_depth_29 = stack_depth_26 + 1;
730    stack[stack_depth_29] = x_27;
731    stack_depth_31 = stack_depth_29 + 1;
732    stack[stack_depth_31] = 1;
733    stack_depth_33 = stack_depth_31 + 1;
734    stack_depth_34 = stack_depth_33 + -1;
735    y_35 = stack[stack_depth_34];
736    stack_depth_36 = stack_depth_34 + -1;
737    x_37 = stack[stack_depth_36];
738    _38 = x_37 - y_35;
739    stack[stack_depth_36] = _38;
740    stack_depth_40 = stack_depth_36 + 1;
741    stack_depth_41 = stack_depth_40 + -1;
742    x_42 = stack[stack_depth_41];
743    _44 = factorial (x_42);
744    stack[stack_depth_41] = _44;
745    stack_depth_46 = stack_depth_41 + 1;
746    stack_depth_47 = stack_depth_46 + -1;
747    y_48 = stack[stack_depth_47];
748    stack_depth_49 = stack_depth_47 + -1;
749    x_50 = stack[stack_depth_49];
750    _51 = x_50 * y_48;
751    stack[stack_depth_49] = _51;
752    stack_depth_53 = stack_depth_49 + 1;
753
754    # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
755  instr9:
756  /* RETURN */:
757    stack_depth_54 = stack_depth_1 + -1;
758    x_55 = stack[stack_depth_54];
759    _56 = x_55;
760    stack ={v} {CLOBBER};
761    return _56;
762
763  }
764
765Note in the above how all the :type:`gccjit::block` instances we
766created have been consolidated into just 3 blocks in GCC's internal
767representation: ``initial``, ``instr4`` and ``instr9``.
768
769Optimizing away stack manipulation
770^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
771Recall our simple implementation of stack operations.  Let's examine
772how the stack operations are optimized away.
773
774After a pass of constant-propagation, the depth of the stack at each
775opcode can be determined at compile-time:
776
777.. code-block:: console
778
779  $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
780
781.. code-block:: c
782
783  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
784
785  factorial (signed int arg)
786  {
787    signed int stack[8];
788    signed int stack_depth;
789    signed int x;
790    signed int y;
791    <unnamed type> _20;
792    signed int _21;
793    signed int _38;
794    signed int _44;
795    signed int _51;
796
797  initial:
798    stack[0] = arg_5(D);
799    x_9 = stack[0];
800    stack[0] = x_9;
801    stack[1] = x_9;
802    stack[2] = 2;
803    y_17 = stack[2];
804    x_19 = stack[1];
805    _20 = x_19 < y_17;
806    _21 = (signed int) _20;
807    stack[1] = _21;
808    x_25 = stack[1];
809    if (x_25 != 0)
810      goto <bb 4> (instr9);
811    else
812      goto <bb 3> (instr4);
813
814  instr4:
815  /* DUP */:
816    x_27 = stack[0];
817    stack[0] = x_27;
818    stack[1] = x_27;
819    stack[2] = 1;
820    y_35 = stack[2];
821    x_37 = stack[1];
822    _38 = x_37 - y_35;
823    stack[1] = _38;
824    x_42 = stack[1];
825    _44 = factorial (x_42);
826    stack[1] = _44;
827    y_48 = stack[1];
828    x_50 = stack[0];
829    _51 = x_50 * y_48;
830    stack[0] = _51;
831
832  instr9:
833  /* RETURN */:
834    x_55 = stack[0];
835    x_56 = x_55;
836    stack ={v} {CLOBBER};
837    return x_56;
838
839  }
840
841Note how, in the above, all those ``stack_depth`` values are now just
842constants: we're accessing specific stack locations at each opcode.
843
844The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
845out our "stack" array into individual elements:
846
847.. code-block:: console
848
849  $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
850
851.. code-block:: c
852
853  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
854
855  Created a replacement for stack offset: 0, size: 32: stack$0
856  Created a replacement for stack offset: 32, size: 32: stack$1
857  Created a replacement for stack offset: 64, size: 32: stack$2
858
859  Symbols to be put in SSA form
860  { D.89 D.90 D.91 }
861  Incremental SSA update started at block: 0
862  Number of blocks in CFG: 5
863  Number of blocks to update: 4 ( 80%)
864
865
866  factorial (signed int arg)
867  {
868    signed int stack$2;
869    signed int stack$1;
870    signed int stack$0;
871    signed int stack[8];
872    signed int stack_depth;
873    signed int x;
874    signed int y;
875    <unnamed type> _20;
876    signed int _21;
877    signed int _38;
878    signed int _44;
879    signed int _51;
880
881  initial:
882    stack$0_45 = arg_5(D);
883    x_9 = stack$0_45;
884    stack$0_39 = x_9;
885    stack$1_32 = x_9;
886    stack$2_30 = 2;
887    y_17 = stack$2_30;
888    x_19 = stack$1_32;
889    _20 = x_19 < y_17;
890    _21 = (signed int) _20;
891    stack$1_28 = _21;
892    x_25 = stack$1_28;
893    if (x_25 != 0)
894      goto <bb 4> (instr9);
895    else
896      goto <bb 3> (instr4);
897
898  instr4:
899  /* DUP */:
900    x_27 = stack$0_39;
901    stack$0_22 = x_27;
902    stack$1_14 = x_27;
903    stack$2_12 = 1;
904    y_35 = stack$2_12;
905    x_37 = stack$1_14;
906    _38 = x_37 - y_35;
907    stack$1_10 = _38;
908    x_42 = stack$1_10;
909    _44 = factorial (x_42);
910    stack$1_6 = _44;
911    y_48 = stack$1_6;
912    x_50 = stack$0_22;
913    _51 = x_50 * y_48;
914    stack$0_1 = _51;
915
916    # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
917  instr9:
918  /* RETURN */:
919    x_55 = stack$0_52;
920    x_56 = x_55;
921    stack ={v} {CLOBBER};
922    return x_56;
923
924  }
925
926Hence at this point, all those pushes and pops of the stack are now
927simply assignments to specific temporary variables.
928
929After some copy propagation, the stack manipulation has been completely
930optimized away:
931
932.. code-block:: console
933
934  $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
935
936.. code-block:: c
937
938  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
939
940  factorial (signed int arg)
941  {
942    signed int stack$2;
943    signed int stack$1;
944    signed int stack$0;
945    signed int stack[8];
946    signed int stack_depth;
947    signed int x;
948    signed int y;
949    <unnamed type> _20;
950    signed int _21;
951    signed int _38;
952    signed int _44;
953    signed int _51;
954
955  initial:
956    stack$0_39 = arg_5(D);
957    _20 = arg_5(D) <= 1;
958    _21 = (signed int) _20;
959    if (_21 != 0)
960      goto <bb 4> (instr9);
961    else
962      goto <bb 3> (instr4);
963
964  instr4:
965  /* DUP */:
966    _38 = arg_5(D) + -1;
967    _44 = factorial (_38);
968    _51 = arg_5(D) * _44;
969    stack$0_1 = _51;
970
971    # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
972  instr9:
973  /* RETURN */:
974    stack ={v} {CLOBBER};
975    return stack$0_52;
976
977  }
978
979Later on, another pass finally eliminated ``stack_depth`` local and the
980unused parts of the `stack`` array altogether:
981
982.. code-block:: console
983
984  $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
985
986.. code-block:: c
987
988  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
989
990  Released 44 names, 314.29%, removed 44 holes
991  factorial (signed int arg)
992  {
993    signed int stack$0;
994    signed int mult_acc_1;
995    <unnamed type> _5;
996    signed int _6;
997    signed int _7;
998    signed int mul_tmp_10;
999    signed int mult_acc_11;
1000    signed int mult_acc_13;
1001
1002    # arg_9 = PHI <arg_8(D)(0)>
1003    # mult_acc_13 = PHI <1(0)>
1004  initial:
1005
1006    <bb 5>:
1007    # arg_4 = PHI <arg_9(2), _7(3)>
1008    # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1009    _5 = arg_4 <= 1;
1010    _6 = (signed int) _5;
1011    if (_6 != 0)
1012      goto <bb 4> (instr9);
1013    else
1014      goto <bb 3> (instr4);
1015
1016  instr4:
1017  /* DUP */:
1018    _7 = arg_4 + -1;
1019    mult_acc_11 = mult_acc_1 * arg_4;
1020    goto <bb 5>;
1021
1022    # stack$0_12 = PHI <arg_4(5)>
1023  instr9:
1024  /* RETURN */:
1025    mul_tmp_10 = mult_acc_1 * stack$0_12;
1026    return mul_tmp_10;
1027
1028  }
1029
1030
1031Elimination of tail recursion
1032^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1033Another significant optimization is the detection that the call to
1034``factorial`` is tail recursion, which can be eliminated in favor of
1035an iteration:
1036
1037.. code-block:: console
1038
1039  $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1040
1041.. code-block:: c
1042
1043  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1044
1045
1046  Symbols to be put in SSA form
1047  { D.88 }
1048  Incremental SSA update started at block: 0
1049  Number of blocks in CFG: 5
1050  Number of blocks to update: 4 ( 80%)
1051
1052
1053  factorial (signed int arg)
1054  {
1055    signed int stack$2;
1056    signed int stack$1;
1057    signed int stack$0;
1058    signed int stack[8];
1059    signed int stack_depth;
1060    signed int x;
1061    signed int y;
1062    signed int mult_acc_1;
1063    <unnamed type> _20;
1064    signed int _21;
1065    signed int _38;
1066    signed int mul_tmp_44;
1067    signed int mult_acc_51;
1068
1069    # arg_5 = PHI <arg_39(D)(0), _38(3)>
1070    # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1071  initial:
1072    _20 = arg_5 <= 1;
1073    _21 = (signed int) _20;
1074    if (_21 != 0)
1075      goto <bb 4> (instr9);
1076    else
1077      goto <bb 3> (instr4);
1078
1079  instr4:
1080  /* DUP */:
1081    _38 = arg_5 + -1;
1082    mult_acc_51 = mult_acc_1 * arg_5;
1083    goto <bb 2> (initial);
1084
1085    # stack$0_52 = PHI <arg_5(2)>
1086  instr9:
1087  /* RETURN */:
1088    stack ={v} {CLOBBER};
1089    mul_tmp_44 = mult_acc_1 * stack$0_52;
1090    return mul_tmp_44;
1091
1092  }
1093