1.. Copyright (C) 2014-2020 Free Software Foundation, Inc.
2   Originally contributed by David Malcolm <dmalcolm@redhat.com>
3
4   This is free software: you can redistribute it and/or modify it
5   under the terms of the GNU General Public License as published by
6   the Free Software Foundation, either version 3 of the License, or
7   (at your option) any later version.
8
9   This program is distributed in the hope that it will be useful, but
10   WITHOUT ANY WARRANTY; without even the implied warranty of
11   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
12   General Public License for more details.
13
14   You should have received a copy of the GNU General Public License
15   along with this program.  If not, see
16   <http://www.gnu.org/licenses/>.
17
18Tutorial part 4: Adding JIT-compilation to a toy interpreter
19------------------------------------------------------------
20In this example we construct a "toy" interpreter, and add JIT-compilation
21to it.
22
23Our toy interpreter
24*******************
25
26It's a stack-based interpreter, and is intended as a (very simple) example
27of the kind of bytecode interpreter seen in dynamic languages such as
28Python, Ruby etc.
29
30For the sake of simplicity, our toy virtual machine is very limited:
31
32  * The only data type is `int`
33
34  * It can only work on one function at a time (so that the only
35    function call that can be made is to recurse).
36
37  * Functions can only take one parameter.
38
39  * Functions have a stack of `int` values.
40
41  * We'll implement function call within the interpreter by calling a
42    function in our implementation, rather than implementing our own
43    frame stack.
44
45  * The parser is only good enough to get the examples to work.
46
47Naturally, a real interpreter would be much more complicated that this.
48
49The following operations are supported:
50
51====================== ======================== =============== ==============
52Operation              Meaning                  Old Stack       New Stack
53====================== ======================== =============== ==============
54DUP                    Duplicate top of stack.  ``[..., x]``    ``[..., x, x]``
55ROT                    Swap top two elements    ``[..., x, y]`` ``[..., y, x]``
56                       of stack.
57BINARY_ADD             Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
58                       on the stack.
59BINARY_SUBTRACT        Likewise, but subtract.  ``[..., x, y]`` ``[..., (x-y)]``
60BINARY_MULT            Likewise, but multiply.  ``[..., x, y]`` ``[..., (x*y)]``
61BINARY_COMPARE_LT      Compare the top two      ``[..., x, y]`` ``[..., (x<y)]``
62                       elements on the stack
63                       and push a nonzero/zero
64                       if (x<y).
65RECURSE                Recurse, passing the top ``[..., x]``    ``[..., fn(x)]``
66                       of the stack, and
67                       popping the result.
68RETURN                 Return the top of the    ``[x]``         ``[]``
69                       stack.
70PUSH_CONST `arg`       Push an int const.       ``[...]``       ``[..., arg]``
71JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]``    ``[...]``
72                       nonzero, jump to
73                       ``arg``.
74====================== ======================== =============== ==============
75
76Programs can be interpreted, disassembled, and compiled to machine code.
77
78The interpreter reads ``.toy`` scripts.  Here's what a simple recursive
79factorial program looks like, the script ``factorial.toy``.
80The parser ignores lines beginning with a `#`.
81
82   .. literalinclude:: ../examples/tut04-toyvm/factorial.toy
83    :lines: 1-
84    :language: c
85
86The interpreter is a simple infinite loop with a big ``switch`` statement
87based on what the next opcode is:
88
89   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
90    :start-after: /* Execute the given function.  */
91    :end-before: /* JIT compilation.  */
92    :language: c
93
94Compiling to machine code
95*************************
96We want to generate machine code that can be cast to this type and
97then directly executed in-process:
98
99   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
100    :start-after: /* Functions are compiled to this function ptr type.  */
101    :end-before: enum opcode
102    :language: c
103
104The lifetime of the code is tied to that of a :c:type:`gcc_jit_result *`.
105We'll handle this by bundling them up in a structure, so that we can
106clean them up together by calling :c:func:`gcc_jit_result_release`:
107
108   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
109    :start-after: /* A struct to hold the compilation results.  */
110    :end-before: /* The main compilation hook.  */
111    :language: c
112
113Our compiler isn't very sophisticated; it takes the implementation of
114each opcode above, and maps it directly to the operations supported by
115the libgccjit API.
116
117How should we handle the stack?  In theory we could calculate what the
118stack depth will be at each opcode, and optimize away the stack
119manipulation "by hand".  We'll see below that libgccjit is able to do
120this for us, so we'll implement stack manipulation
121in a direct way, by creating a ``stack`` array and ``stack_depth``
122variables, local within the generated function, equivalent to this C code:
123
124.. code-block:: c
125
126  int stack_depth;
127  int stack[MAX_STACK_DEPTH];
128
129We'll also have local variables ``x`` and ``y`` for use when implementing
130the opcodes, equivalent to this:
131
132.. code-block:: c
133
134  int x;
135  int y;
136
137This means our compiler has the following state:
138
139   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
140    :start-after: /* JIT compilation.  */
141    :end-before: /* Stack manipulation.  */
142    :language: c
143
144Setting things up
145*****************
146
147First we create our types:
148
149   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
150    :start-after: /* Create types.  */
151    :end-before: /* The constant value 1.  */
152    :language: c
153
154along with extracting a useful `int` constant:
155
156   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
157    :start-after: /* The constant value 1.  */
158    :end-before: /* Create locations.  */
159    :language: c
160
161We'll implement push and pop in terms of the ``stack`` array and
162``stack_depth``.  Here are helper functions for adding statements to
163a block, implementing pushing and popping values:
164
165   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
166    :start-after: /* Stack manipulation.  */
167    :end-before: /* A struct to hold the compilation results.  */
168    :language: c
169
170We will support single-stepping through the generated code in the
171debugger, so we need to create :c:type:`gcc_jit_location` instances, one
172per operation in the source code.  These will reference the lines of
173e.g. ``factorial.toy``.
174
175   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
176    :start-after: /* Create locations.  */
177    :end-before: /* Creating the function.  */
178    :language: c
179
180Let's create the function itself.  As usual, we create its parameter
181first, then use the parameter to create the function:
182
183   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
184    :start-after: /* Creating the function.  */
185    :end-before: /* Create stack lvalues.  */
186    :language: c
187
188We create the locals within the function.
189
190   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
191    :start-after: /* Create stack lvalues.  */
192    :end-before: /* 1st pass: create blocks, one per opcode.
193    :language: c
194
195Populating the function
196***********************
197
198There's some one-time initialization, and the API treats the first block
199you create as the entrypoint of the function, so we need to create that
200block first:
201
202   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
203    :start-after: first.  */
204    :end-before: /* Create a block per operation.  */
205    :language: c
206
207We can now create blocks for each of the operations.  Most of these will
208be consolidated into larger blocks when the optimizer runs.
209
210   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
211    :start-after: /* Create a block per operation.  */
212    :end-before: /* Populate the initial block.  */
213    :language: c
214
215Now that we have a block it can jump to when it's done, we can populate
216the initial block:
217
218   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
219    :start-after: /* Populate the initial block.  */
220    :end-before: /* 2nd pass: fill in instructions.  */
221    :language: c
222
223We can now populate the blocks for the individual operations.  We loop
224through them, adding instructions to their blocks:
225
226   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
227    :start-after: /* 2nd pass: fill in instructions.  */
228    :end-before: /* Helper macros.  */
229    :language: c
230
231We're going to have another big ``switch`` statement for implementing
232the opcodes, this time for compiling them, rather than interpreting
233them.  It's helpful to have macros for implementing push and pop, so that
234we can make the ``switch`` statement that's coming up look as much as
235possible like the one above within the interpreter:
236
237.. literalinclude:: ../examples/tut04-toyvm/toyvm.c
238    :start-after: /* Helper macros.  */
239    :end-before: gcc_jit_block_add_comment
240    :language: c
241
242.. note::
243
244   A particularly clever implementation would have an *identical*
245   ``switch`` statement shared by the interpreter and the compiler, with
246   some preprocessor "magic".  We're not doing that here, for the sake
247   of simplicity.
248
249When I first implemented this compiler, I accidentally missed an edit
250when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
251stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
252uninitialized.
253
254To track this kind of thing down, we can use
255:c:func:`gcc_jit_block_add_comment` to add descriptive comments
256to the internal representation.  This is invaluable when looking through
257the generated IR for, say ``factorial``:
258
259   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
260    :start-after: PUSH_RVALUE (gcc_jit_lvalue_as_rvalue (state.y))
261    :end-before: /* Handle the individual opcodes.  */
262    :language: c
263
264We can now write the big ``switch`` statement that implements the
265individual opcodes, populating the relevant block with statements:
266
267   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
268    :start-after: /* Handle the individual opcodes.  */
269    :end-before: /* Go to the next block.  */
270    :language: c
271
272Every block must be terminated, via a call to one of the
273``gcc_jit_block_end_with_`` entrypoints.  This has been done for two
274of the opcodes, but we need to do it for the other ones, by jumping
275to the next block.
276
277   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
278    :start-after: /* Go to the next block.  */
279    :end-before: /* end of loop on PC locations.  */
280    :language: c
281
282This is analogous to simply incrementing the program counter.
283
284Verifying the control flow graph
285********************************
286Having finished looping over the blocks, the context is complete.
287
288As before, we can verify that the control flow and statements are sane by
289using :c:func:`gcc_jit_function_dump_to_dot`:
290
291.. code-block:: c
292
293  gcc_jit_function_dump_to_dot (state.fn, "/tmp/factorial.dot");
294
295and viewing the result.  Note how the label names, comments, and
296variable names show up in the dump, to make it easier to spot
297errors in our compiler.
298
299  .. figure:: factorial.png
300    :alt: image of a control flow graph
301
302Compiling the context
303*********************
304Having finished looping over the blocks and populating them with
305statements, the context is complete.
306
307We can now compile it, and extract machine code from the result:
308
309   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
310    :start-after: /* We've now finished populating the context.  Compile it.  */
311    :end-before: /* (this leaks "result" and "funcname") */
312    :language: c
313
314We can now run the result:
315
316   .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
317    :start-after: /* JIT-compilation.  */
318    :end-before: return 0;
319    :language: c
320
321Single-stepping through the generated code
322******************************************
323
324It's possible to debug the generated code.  To do this we need to both:
325
326  * Set up source code locations for our statements, so that we can
327    meaningfully step through the code.  We did this above by
328    calling :c:func:`gcc_jit_context_new_location` and using the
329    results.
330
331  * Enable the generation of debugging information, by setting
332    :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
333    :c:type:`gcc_jit_context` via
334    :c:func:`gcc_jit_context_set_bool_option`:
335
336    .. code-block:: c
337
338     gcc_jit_context_set_bool_option (
339       ctxt,
340       GCC_JIT_BOOL_OPTION_DEBUGINFO,
341       1);
342
343Having done this, we can put a breakpoint on the generated function:
344
345.. code-block:: console
346
347  $ gdb --args ./toyvm factorial.toy 10
348  (gdb) break factorial
349  Function "factorial" not defined.
350  Make breakpoint pending on future shared library load? (y or [n]) y
351  Breakpoint 1 (factorial) pending.
352  (gdb) run
353  Breakpoint 1, factorial (arg=10) at factorial.toy:14
354  14	DUP
355
356We've set up location information, which references ``factorial.toy``.
357This allows us to use e.g. ``list`` to see where we are in the script:
358
359.. code-block:: console
360
361  (gdb) list
362  9
363  10    # Initial state:
364  11    # stack: [arg]
365  12
366  13    # 0:
367  14    DUP
368  15    # stack: [arg, arg]
369  16
370  17    # 1:
371  18    PUSH_CONST 2
372
373and to step through the function, examining the data:
374
375.. code-block:: console
376
377  (gdb) n
378  18    PUSH_CONST 2
379  (gdb) n
380  22    BINARY_COMPARE_LT
381  (gdb) print stack
382  $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
383  (gdb) print stack_depth
384  $6 = 3
385
386You'll see that the parts of the ``stack`` array that haven't been
387touched yet are uninitialized.
388
389.. note::
390
391   Turning on optimizations may lead to unpredictable results when
392   stepping through the generated code: the execution may appear to
393   "jump around" the source code.  This is analogous to turning up the
394   optimization level in a regular compiler.
395
396Examining the generated code
397****************************
398
399How good is the optimized code?
400
401We can turn up optimizations, by calling
402:c:func:`gcc_jit_context_set_int_option` with
403:c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
404
405.. code-block:: c
406
407  gcc_jit_context_set_int_option (
408    ctxt,
409    GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
410    3);
411
412One of GCC's internal representations is called "gimple".  A dump of the
413initial gimple representation of the code can be seen by setting:
414
415.. code-block:: c
416
417  gcc_jit_context_set_bool_option (ctxt,
418                                   GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE,
419                                   1);
420
421With optimization on and source locations displayed, this gives:
422
423.. We'll use "c" for gimple dumps
424
425.. code-block:: c
426
427  factorial (signed int arg)
428  {
429    <unnamed type> D.80;
430    signed int D.81;
431    signed int D.82;
432    signed int D.83;
433    signed int D.84;
434    signed int D.85;
435    signed int y;
436    signed int x;
437    signed int stack_depth;
438    signed int stack[8];
439
440    try
441      {
442        initial:
443        stack_depth = 0;
444        stack[stack_depth] = arg;
445        stack_depth = stack_depth + 1;
446        goto instr0;
447        instr0:
448        /* DUP */:
449        stack_depth = stack_depth + -1;
450        x = stack[stack_depth];
451        stack[stack_depth] = x;
452        stack_depth = stack_depth + 1;
453        stack[stack_depth] = x;
454        stack_depth = stack_depth + 1;
455        goto instr1;
456        instr1:
457        /* PUSH_CONST */:
458        stack[stack_depth] = 2;
459        stack_depth = stack_depth + 1;
460        goto instr2;
461
462        /* etc */
463
464You can see the generated machine code in assembly form via:
465
466.. code-block:: c
467
468  gcc_jit_context_set_bool_option (
469    ctxt,
470    GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE,
471    1);
472  result = gcc_jit_context_compile (ctxt);
473
474which shows that (on this x86_64 box) the compiler has unrolled the loop
475and is using MMX instructions to perform several multiplications
476simultaneously:
477
478.. code-block:: gas
479
480          .file   "fake.c"
481          .text
482  .Ltext0:
483          .p2align 4,,15
484          .globl  factorial
485          .type   factorial, @function
486  factorial:
487  .LFB0:
488          .file 1 "factorial.toy"
489          .loc 1 14 0
490          .cfi_startproc
491  .LVL0:
492  .L2:
493          .loc 1 26 0
494          cmpl    $1, %edi
495          jle     .L13
496          leal    -1(%rdi), %edx
497          movl    %edx, %ecx
498          shrl    $2, %ecx
499          leal    0(,%rcx,4), %esi
500          testl   %esi, %esi
501          je      .L14
502          cmpl    $9, %edx
503          jbe     .L14
504          leal    -2(%rdi), %eax
505          movl    %eax, -16(%rsp)
506          leal    -3(%rdi), %eax
507          movd    -16(%rsp), %xmm0
508          movl    %edi, -16(%rsp)
509          movl    %eax, -12(%rsp)
510          movd    -16(%rsp), %xmm1
511          xorl    %eax, %eax
512          movl    %edx, -16(%rsp)
513          movd    -12(%rsp), %xmm4
514          movd    -16(%rsp), %xmm6
515          punpckldq       %xmm4, %xmm0
516          movdqa  .LC1(%rip), %xmm4
517          punpckldq       %xmm6, %xmm1
518          punpcklqdq      %xmm0, %xmm1
519          movdqa  .LC0(%rip), %xmm0
520          jmp     .L5
521          # etc - edited for brevity
522
523This is clearly overkill for a function that will likely overflow the
524``int`` type before the vectorization is worthwhile - but then again, this
525is a toy example.
526
527Turning down the optimization level to 2:
528
529.. code-block:: c
530
531  gcc_jit_context_set_int_option (
532    ctxt,
533    GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
534    3);
535
536yields this code, which is simple enough to quote in its entirety:
537
538.. code-block:: gas
539
540          .file   "fake.c"
541          .text
542          .p2align 4,,15
543          .globl  factorial
544          .type   factorial, @function
545  factorial:
546  .LFB0:
547          .cfi_startproc
548  .L2:
549          cmpl    $1, %edi
550          jle     .L8
551          movl    $1, %edx
552          jmp     .L4
553          .p2align 4,,10
554          .p2align 3
555  .L6:
556          movl    %eax, %edi
557  .L4:
558  .L5:
559          leal    -1(%rdi), %eax
560          imull   %edi, %edx
561          cmpl    $1, %eax
562          jne     .L6
563  .L3:
564  .L7:
565          imull   %edx, %eax
566          ret
567  .L8:
568          movl    %edi, %eax
569          movl    $1, %edx
570          jmp     .L7
571          .cfi_endproc
572  .LFE0:
573          .size   factorial, .-factorial
574          .ident  "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
575          .section        .note.GNU-stack,"",@progbits
576
577Note that the stack pushing and popping have been eliminated, as has the
578recursive call (in favor of an iteration).
579
580Putting it all together
581***********************
582
583The complete example can be seen in the source tree at
584``gcc/jit/docs/examples/tut04-toyvm/toyvm.c``
585
586along with a Makefile and a couple of sample .toy scripts:
587
588.. code-block:: console
589
590  $ ls -al
591  drwxrwxr-x. 2 david david   4096 Sep 19 17:46 .
592  drwxrwxr-x. 3 david david   4096 Sep 19 15:26 ..
593  -rw-rw-r--. 1 david david    615 Sep 19 12:43 factorial.toy
594  -rw-rw-r--. 1 david david    834 Sep 19 13:08 fibonacci.toy
595  -rw-rw-r--. 1 david david    238 Sep 19 14:22 Makefile
596  -rw-rw-r--. 1 david david  16457 Sep 19 17:07 toyvm.c
597
598  $ make toyvm
599  g++ -Wall -g -o toyvm toyvm.c -lgccjit
600
601  $ ./toyvm factorial.toy 10
602  interpreter result: 3628800
603  compiler result: 3628800
604
605  $ ./toyvm fibonacci.toy 10
606  interpreter result: 55
607  compiler result: 55
608
609Behind the curtain: How does our code get optimized?
610****************************************************
611
612Our example is done, but you may be wondering about exactly how the
613compiler turned what we gave it into the machine code seen above.
614
615We can examine what the compiler is doing in detail by setting:
616
617.. code-block:: c
618
619  gcc_jit_context_set_bool_option (state.ctxt,
620                                   GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING,
621                                   1);
622  gcc_jit_context_set_bool_option (state.ctxt,
623                                   GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES,
624                                   1);
625
626This will dump detailed information about the compiler's state to a
627directory under ``/tmp``, and keep it from being cleaned up.
628
629The precise names and their formats of these files is subject to change.
630Higher optimization levels lead to more files.
631Here's what I saw (edited for brevity; there were almost 200 files):
632
633.. code-block:: console
634
635  intermediate files written to /tmp/libgccjit-KPQbGw
636  $ ls /tmp/libgccjit-KPQbGw/
637  fake.c.000i.cgraph
638  fake.c.000i.type-inheritance
639  fake.c.004t.gimple
640  fake.c.007t.omplower
641  fake.c.008t.lower
642  fake.c.011t.eh
643  fake.c.012t.cfg
644  fake.c.014i.visibility
645  fake.c.015i.early_local_cleanups
646  fake.c.016t.ssa
647  # etc
648
649The gimple code is converted into Static Single Assignment form,
650with annotations for use when generating the debuginfo:
651
652.. code-block:: console
653
654  $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
655
656.. code-block:: c
657
658  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
659
660  factorial (signed int arg)
661  {
662    signed int stack[8];
663    signed int stack_depth;
664    signed int x;
665    signed int y;
666    <unnamed type> _20;
667    signed int _21;
668    signed int _38;
669    signed int _44;
670    signed int _51;
671    signed int _56;
672
673  initial:
674    stack_depth_3 = 0;
675    # DEBUG stack_depth => stack_depth_3
676    stack[stack_depth_3] = arg_5(D);
677    stack_depth_7 = stack_depth_3 + 1;
678    # DEBUG stack_depth => stack_depth_7
679    # DEBUG instr0 => NULL
680    # DEBUG /* DUP */ => NULL
681    stack_depth_8 = stack_depth_7 + -1;
682    # DEBUG stack_depth => stack_depth_8
683    x_9 = stack[stack_depth_8];
684    # DEBUG x => x_9
685    stack[stack_depth_8] = x_9;
686    stack_depth_11 = stack_depth_8 + 1;
687    # DEBUG stack_depth => stack_depth_11
688    stack[stack_depth_11] = x_9;
689    stack_depth_13 = stack_depth_11 + 1;
690    # DEBUG stack_depth => stack_depth_13
691    # DEBUG instr1 => NULL
692    # DEBUG /* PUSH_CONST */ => NULL
693    stack[stack_depth_13] = 2;
694
695    /* etc; edited for brevity */
696
697We can perhaps better see the code by turning off
698:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
699statements, giving:
700
701.. code-block:: console
702
703  $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
704
705.. code-block:: c
706
707  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
708
709  factorial (signed int arg)
710  {
711    signed int stack[8];
712    signed int stack_depth;
713    signed int x;
714    signed int y;
715    <unnamed type> _20;
716    signed int _21;
717    signed int _38;
718    signed int _44;
719    signed int _51;
720    signed int _56;
721
722  initial:
723    stack_depth_3 = 0;
724    stack[stack_depth_3] = arg_5(D);
725    stack_depth_7 = stack_depth_3 + 1;
726    stack_depth_8 = stack_depth_7 + -1;
727    x_9 = stack[stack_depth_8];
728    stack[stack_depth_8] = x_9;
729    stack_depth_11 = stack_depth_8 + 1;
730    stack[stack_depth_11] = x_9;
731    stack_depth_13 = stack_depth_11 + 1;
732    stack[stack_depth_13] = 2;
733    stack_depth_15 = stack_depth_13 + 1;
734    stack_depth_16 = stack_depth_15 + -1;
735    y_17 = stack[stack_depth_16];
736    stack_depth_18 = stack_depth_16 + -1;
737    x_19 = stack[stack_depth_18];
738    _20 = x_19 < y_17;
739    _21 = (signed int) _20;
740    stack[stack_depth_18] = _21;
741    stack_depth_23 = stack_depth_18 + 1;
742    stack_depth_24 = stack_depth_23 + -1;
743    x_25 = stack[stack_depth_24];
744    if (x_25 != 0)
745      goto <bb 4> (instr9);
746    else
747      goto <bb 3> (instr4);
748
749  instr4:
750  /* DUP */:
751    stack_depth_26 = stack_depth_24 + -1;
752    x_27 = stack[stack_depth_26];
753    stack[stack_depth_26] = x_27;
754    stack_depth_29 = stack_depth_26 + 1;
755    stack[stack_depth_29] = x_27;
756    stack_depth_31 = stack_depth_29 + 1;
757    stack[stack_depth_31] = 1;
758    stack_depth_33 = stack_depth_31 + 1;
759    stack_depth_34 = stack_depth_33 + -1;
760    y_35 = stack[stack_depth_34];
761    stack_depth_36 = stack_depth_34 + -1;
762    x_37 = stack[stack_depth_36];
763    _38 = x_37 - y_35;
764    stack[stack_depth_36] = _38;
765    stack_depth_40 = stack_depth_36 + 1;
766    stack_depth_41 = stack_depth_40 + -1;
767    x_42 = stack[stack_depth_41];
768    _44 = factorial (x_42);
769    stack[stack_depth_41] = _44;
770    stack_depth_46 = stack_depth_41 + 1;
771    stack_depth_47 = stack_depth_46 + -1;
772    y_48 = stack[stack_depth_47];
773    stack_depth_49 = stack_depth_47 + -1;
774    x_50 = stack[stack_depth_49];
775    _51 = x_50 * y_48;
776    stack[stack_depth_49] = _51;
777    stack_depth_53 = stack_depth_49 + 1;
778
779    # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
780  instr9:
781  /* RETURN */:
782    stack_depth_54 = stack_depth_1 + -1;
783    x_55 = stack[stack_depth_54];
784    _56 = x_55;
785    stack ={v} {CLOBBER};
786    return _56;
787
788  }
789
790Note in the above how all the :c:type:`gcc_jit_block` instances we
791created have been consolidated into just 3 blocks in GCC's internal
792representation: ``initial``, ``instr4`` and ``instr9``.
793
794Optimizing away stack manipulation
795^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
796Recall our simple implementation of stack operations.  Let's examine
797how the stack operations are optimized away.
798
799After a pass of constant-propagation, the depth of the stack at each
800opcode can be determined at compile-time:
801
802.. code-block:: console
803
804  $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
805
806.. code-block:: c
807
808  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
809
810  factorial (signed int arg)
811  {
812    signed int stack[8];
813    signed int stack_depth;
814    signed int x;
815    signed int y;
816    <unnamed type> _20;
817    signed int _21;
818    signed int _38;
819    signed int _44;
820    signed int _51;
821
822  initial:
823    stack[0] = arg_5(D);
824    x_9 = stack[0];
825    stack[0] = x_9;
826    stack[1] = x_9;
827    stack[2] = 2;
828    y_17 = stack[2];
829    x_19 = stack[1];
830    _20 = x_19 < y_17;
831    _21 = (signed int) _20;
832    stack[1] = _21;
833    x_25 = stack[1];
834    if (x_25 != 0)
835      goto <bb 4> (instr9);
836    else
837      goto <bb 3> (instr4);
838
839  instr4:
840  /* DUP */:
841    x_27 = stack[0];
842    stack[0] = x_27;
843    stack[1] = x_27;
844    stack[2] = 1;
845    y_35 = stack[2];
846    x_37 = stack[1];
847    _38 = x_37 - y_35;
848    stack[1] = _38;
849    x_42 = stack[1];
850    _44 = factorial (x_42);
851    stack[1] = _44;
852    y_48 = stack[1];
853    x_50 = stack[0];
854    _51 = x_50 * y_48;
855    stack[0] = _51;
856
857  instr9:
858  /* RETURN */:
859    x_55 = stack[0];
860    x_56 = x_55;
861    stack ={v} {CLOBBER};
862    return x_56;
863
864  }
865
866Note how, in the above, all those ``stack_depth`` values are now just
867constants: we're accessing specific stack locations at each opcode.
868
869The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
870out our "stack" array into individual elements:
871
872.. code-block:: console
873
874  $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
875
876.. code-block:: c
877
878  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
879
880  Created a replacement for stack offset: 0, size: 32: stack$0
881  Created a replacement for stack offset: 32, size: 32: stack$1
882  Created a replacement for stack offset: 64, size: 32: stack$2
883
884  Symbols to be put in SSA form
885  { D.89 D.90 D.91 }
886  Incremental SSA update started at block: 0
887  Number of blocks in CFG: 5
888  Number of blocks to update: 4 ( 80%)
889
890
891  factorial (signed int arg)
892  {
893    signed int stack$2;
894    signed int stack$1;
895    signed int stack$0;
896    signed int stack[8];
897    signed int stack_depth;
898    signed int x;
899    signed int y;
900    <unnamed type> _20;
901    signed int _21;
902    signed int _38;
903    signed int _44;
904    signed int _51;
905
906  initial:
907    stack$0_45 = arg_5(D);
908    x_9 = stack$0_45;
909    stack$0_39 = x_9;
910    stack$1_32 = x_9;
911    stack$2_30 = 2;
912    y_17 = stack$2_30;
913    x_19 = stack$1_32;
914    _20 = x_19 < y_17;
915    _21 = (signed int) _20;
916    stack$1_28 = _21;
917    x_25 = stack$1_28;
918    if (x_25 != 0)
919      goto <bb 4> (instr9);
920    else
921      goto <bb 3> (instr4);
922
923  instr4:
924  /* DUP */:
925    x_27 = stack$0_39;
926    stack$0_22 = x_27;
927    stack$1_14 = x_27;
928    stack$2_12 = 1;
929    y_35 = stack$2_12;
930    x_37 = stack$1_14;
931    _38 = x_37 - y_35;
932    stack$1_10 = _38;
933    x_42 = stack$1_10;
934    _44 = factorial (x_42);
935    stack$1_6 = _44;
936    y_48 = stack$1_6;
937    x_50 = stack$0_22;
938    _51 = x_50 * y_48;
939    stack$0_1 = _51;
940
941    # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
942  instr9:
943  /* RETURN */:
944    x_55 = stack$0_52;
945    x_56 = x_55;
946    stack ={v} {CLOBBER};
947    return x_56;
948
949  }
950
951Hence at this point, all those pushes and pops of the stack are now
952simply assignments to specific temporary variables.
953
954After some copy propagation, the stack manipulation has been completely
955optimized away:
956
957.. code-block:: console
958
959  $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
960
961.. code-block:: c
962
963  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
964
965  factorial (signed int arg)
966  {
967    signed int stack$2;
968    signed int stack$1;
969    signed int stack$0;
970    signed int stack[8];
971    signed int stack_depth;
972    signed int x;
973    signed int y;
974    <unnamed type> _20;
975    signed int _21;
976    signed int _38;
977    signed int _44;
978    signed int _51;
979
980  initial:
981    stack$0_39 = arg_5(D);
982    _20 = arg_5(D) <= 1;
983    _21 = (signed int) _20;
984    if (_21 != 0)
985      goto <bb 4> (instr9);
986    else
987      goto <bb 3> (instr4);
988
989  instr4:
990  /* DUP */:
991    _38 = arg_5(D) + -1;
992    _44 = factorial (_38);
993    _51 = arg_5(D) * _44;
994    stack$0_1 = _51;
995
996    # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
997  instr9:
998  /* RETURN */:
999    stack ={v} {CLOBBER};
1000    return stack$0_52;
1001
1002  }
1003
1004Later on, another pass finally eliminated ``stack_depth`` local and the
1005unused parts of the `stack`` array altogether:
1006
1007.. code-block:: console
1008
1009  $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
1010
1011.. code-block:: c
1012
1013  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1014
1015  Released 44 names, 314.29%, removed 44 holes
1016  factorial (signed int arg)
1017  {
1018    signed int stack$0;
1019    signed int mult_acc_1;
1020    <unnamed type> _5;
1021    signed int _6;
1022    signed int _7;
1023    signed int mul_tmp_10;
1024    signed int mult_acc_11;
1025    signed int mult_acc_13;
1026
1027    # arg_9 = PHI <arg_8(D)(0)>
1028    # mult_acc_13 = PHI <1(0)>
1029  initial:
1030
1031    <bb 5>:
1032    # arg_4 = PHI <arg_9(2), _7(3)>
1033    # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1034    _5 = arg_4 <= 1;
1035    _6 = (signed int) _5;
1036    if (_6 != 0)
1037      goto <bb 4> (instr9);
1038    else
1039      goto <bb 3> (instr4);
1040
1041  instr4:
1042  /* DUP */:
1043    _7 = arg_4 + -1;
1044    mult_acc_11 = mult_acc_1 * arg_4;
1045    goto <bb 5>;
1046
1047    # stack$0_12 = PHI <arg_4(5)>
1048  instr9:
1049  /* RETURN */:
1050    mul_tmp_10 = mult_acc_1 * stack$0_12;
1051    return mul_tmp_10;
1052
1053  }
1054
1055
1056Elimination of tail recursion
1057^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1058Another significant optimization is the detection that the call to
1059``factorial`` is tail recursion, which can be eliminated in favor of
1060an iteration:
1061
1062.. code-block:: console
1063
1064  $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1065
1066.. code-block:: c
1067
1068  ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1069
1070
1071  Symbols to be put in SSA form
1072  { D.88 }
1073  Incremental SSA update started at block: 0
1074  Number of blocks in CFG: 5
1075  Number of blocks to update: 4 ( 80%)
1076
1077
1078  factorial (signed int arg)
1079  {
1080    signed int stack$2;
1081    signed int stack$1;
1082    signed int stack$0;
1083    signed int stack[8];
1084    signed int stack_depth;
1085    signed int x;
1086    signed int y;
1087    signed int mult_acc_1;
1088    <unnamed type> _20;
1089    signed int _21;
1090    signed int _38;
1091    signed int mul_tmp_44;
1092    signed int mult_acc_51;
1093
1094    # arg_5 = PHI <arg_39(D)(0), _38(3)>
1095    # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1096  initial:
1097    _20 = arg_5 <= 1;
1098    _21 = (signed int) _20;
1099    if (_21 != 0)
1100      goto <bb 4> (instr9);
1101    else
1102      goto <bb 3> (instr4);
1103
1104  instr4:
1105  /* DUP */:
1106    _38 = arg_5 + -1;
1107    mult_acc_51 = mult_acc_1 * arg_5;
1108    goto <bb 2> (initial);
1109
1110    # stack$0_52 = PHI <arg_5(2)>
1111  instr9:
1112  /* RETURN */:
1113    stack ={v} {CLOBBER};
1114    mul_tmp_44 = mult_acc_1 * stack$0_52;
1115    return mul_tmp_44;
1116
1117  }
1118