1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2                      "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
6  <title>Source Level Debugging with LLVM</title>
7  <link rel="stylesheet" href="_static/llvm.css" type="text/css">
8</head>
9<body>
10
11<h1>Source Level Debugging with LLVM</h1>
12
13<table class="layout" style="width:100%">
14  <tr class="layout">
15    <td class="left">
16<ul>
17  <li><a href="#introduction">Introduction</a>
18  <ol>
19    <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
20    <li><a href="#consumers">Debug information consumers</a></li>
21    <li><a href="#debugopt">Debugging optimized code</a></li>
22  </ol></li>
23  <li><a href="#format">Debugging information format</a>
24  <ol>
25    <li><a href="#debug_info_descriptors">Debug information descriptors</a>
26    <ul>
27      <li><a href="#format_compile_units">Compile unit descriptors</a></li>
28      <li><a href="#format_files">File descriptors</a></li>
29      <li><a href="#format_global_variables">Global variable descriptors</a></li>
30      <li><a href="#format_subprograms">Subprogram descriptors</a></li>
31      <li><a href="#format_blocks">Block descriptors</a></li>
32      <li><a href="#format_basic_type">Basic type descriptors</a></li>
33      <li><a href="#format_derived_type">Derived type descriptors</a></li>
34      <li><a href="#format_composite_type">Composite type descriptors</a></li>
35      <li><a href="#format_subrange">Subrange descriptors</a></li>
36      <li><a href="#format_enumeration">Enumerator descriptors</a></li>
37      <li><a href="#format_variables">Local variables</a></li>
38    </ul></li>
39    <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a>
40      <ul>
41      <li><a href="#format_common_declare">llvm.dbg.declare</a></li>
42      <li><a href="#format_common_value">llvm.dbg.value</a></li>
43    </ul></li>
44  </ol></li>
45  <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
46  <li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a>
47  <ol>
48    <li><a href="#ccxx_compile_units">C/C++ source file information</a></li>
49    <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li>
50    <li><a href="#ccxx_subprogram">C/C++ function information</a></li>
51    <li><a href="#ccxx_basic_types">C/C++ basic types</a></li>
52    <li><a href="#ccxx_derived_types">C/C++ derived types</a></li>
53    <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li>
54    <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li>
55  </ol></li>
56  <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a>
57    <ol>
58      <li><a href="#objcproperty">Debugging Information Extension
59	  for Objective C Properties</a>
60        <ul>
61	  <li><a href="#objcpropertyintroduction">Introduction</a></li>
62	  <li><a href="#objcpropertyproposal">Proposal</a></li>
63	  <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
64	  <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
65        </ul>
66      </li>
67      <li><a href="#acceltable">Name Accelerator Tables</a>
68        <ul>
69          <li><a href="#acceltableintroduction">Introduction</a></li>
70          <li><a href="#acceltablehashes">Hash Tables</a></li>
71          <li><a href="#acceltabledetails">Details</a></li>
72          <li><a href="#acceltablecontents">Contents</a></li>
73          <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
74        </ul>
75      </li>
76    </ol>
77  </li>
78</ul>
79</td>
80</tr></table>
81
82<div class="doc_author">
83  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
84            and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
85</div>
86
87
88<!-- *********************************************************************** -->
89<h2><a name="introduction">Introduction</a></h2>
90<!-- *********************************************************************** -->
91
92<div>
93
94<p>This document is the central repository for all information pertaining to
95   debug information in LLVM.  It describes the <a href="#format">actual format
96   that the LLVM debug information</a> takes, which is useful for those
97   interested in creating front-ends or dealing directly with the information.
98   Further, this document provides specific examples of what debug information
99   for C/C++ looks like.</p>
100
101<!-- ======================================================================= -->
102<h3>
103  <a name="phil">Philosophy behind LLVM debugging information</a>
104</h3>
105
106<div>
107
108<p>The idea of the LLVM debugging information is to capture how the important
109   pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
110   Several design aspects have shaped the solution that appears here.  The
111   important ones are:</p>
112
113<ul>
114  <li>Debugging information should have very little impact on the rest of the
115      compiler.  No transformations, analyses, or code generators should need to
116      be modified because of debugging information.</li>
117
118  <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
119      easily described ways</a> with the debugging information.</li>
120
121  <li>Because LLVM is designed to support arbitrary programming languages,
122      LLVM-to-LLVM tools should not need to know anything about the semantics of
123      the source-level-language.</li>
124
  <li>Source-level languages are often <b>widely</b> different from one another.
      LLVM should not put any restrictions on the flavor of the source-language,
      and the debugging information should work with any language.</li>
128
129  <li>With code generator support, it should be possible to use an LLVM compiler
130      to compile a program to native machine code and standard debugging
131      formats.  This allows compatibility with traditional machine-code level
132      debuggers, like GDB or DBX.</li>
133</ul>
134
135<p>The approach used by the LLVM implementation is to use a small set
136   of <a href="#format_common_intrinsics">intrinsic functions</a> to define a
137   mapping between LLVM program objects and the source-level objects.  The
138   description of the source-level program is maintained in LLVM metadata
139   in an <a href="#ccxx_frontend">implementation-defined format</a>
140   (the C/C++ front-end currently uses working draft 7 of
141   the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3
142   standard</a>).</p>
143
144<p>When a program is being debugged, a debugger interacts with the user and
145   turns the stored debug information into source-language specific information.
146   As such, a debugger must be aware of the source-language, and is thus tied to
147   a specific language or family of languages.</p>
148
149</div>
150
151<!-- ======================================================================= -->
152<h3>
153  <a name="consumers">Debug information consumers</a>
154</h3>
155
156<div>
157
158<p>The role of debug information is to provide meta information normally
159   stripped away during the compilation process.  This meta information provides
160   an LLVM user a relationship between generated code and the original program
161   source code.</p>
162
<p>Currently, debug information is consumed by DwarfDebug to produce DWARF
   information used by the gdb debugger.  Other targets could use the same
   information to produce stabs or other debug forms.</p>

<p>It would also be reasonable to use debug information to feed profiling tools
   for analysis of generated code, or tools for reconstructing the original
   source from generated code.</p>
170
171<p>TODO - expound a bit more.</p>
172
173</div>
174
175<!-- ======================================================================= -->
176<h3>
177  <a name="debugopt">Debugging optimized code</a>
178</h3>
179
180<div>
181
182<p>An extremely high priority of LLVM debugging information is to make it
183   interact well with optimizations and analysis.  In particular, the LLVM debug
184   information provides the following guarantees:</p>
185
186<ul>
187  <li>LLVM debug information <b>always provides information to accurately read
188      the source-level state of the program</b>, regardless of which LLVM
189      optimizations have been run, and without any modification to the
190      optimizations themselves.  However, some optimizations may impact the
191      ability to modify the current state of the program with a debugger, such
192      as setting program variables, or calling functions that have been
193      deleted.</li>
194
195  <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
196      debugging information, allowing them to update the debugging information
197      as they perform aggressive optimizations.  This means that, with effort,
198      the LLVM optimizers could optimize debug code just as well as non-debug
199      code.</li>
200
201  <li>LLVM debug information does not prevent optimizations from
202      happening (for example inlining, basic block reordering/merging/cleanup,
203      tail duplication, etc).</li>
204
205  <li>LLVM debug information is automatically optimized along with the rest of
206      the program, using existing facilities.  For example, duplicate
207      information is automatically merged by the linker, and unused information
208      is automatically removed.</li>
209</ul>
210
211<p>Basically, the debug information allows you to compile a program with
212   "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
213   modify the program as it executes from a debugger.  Compiling a program with
214   "<tt>-O3 -g</tt>" gives you full debug information that is always available
215   and accurate for reading (e.g., you get accurate stack traces despite tail
216   call elimination and inlining), but you might lose the ability to modify the
   program and call functions that were optimized out of the program, or
218   inlined away completely.</p>
219
<p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a
   framework to test the optimizer's handling of debugging information. It can
   be run like this:</p>
223
224<div class="doc_code">
225<pre>
226% cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
227% make TEST=dbgopt
228</pre>
229</div>
230
<p>This will test the impact of debugging information on optimization passes.
   If debugging information influences optimization passes then it will be
   reported as a failure. See the <a href="TestingGuide.html">TestingGuide</a>
   for more information on LLVM test infrastructure and how to run various
   tests.</p>
235
236</div>
237
238</div>
239
240<!-- *********************************************************************** -->
241<h2>
242  <a name="format">Debugging information format</a>
243</h2>
244<!-- *********************************************************************** -->
245
246<div>
247
248<p>LLVM debugging information has been carefully designed to make it possible
249   for the optimizer to optimize the program and debugging information without
250   necessarily having to know anything about debugging information.  In
251   particular, the use of metadata avoids duplicated debugging information from
252   the beginning, and the global dead code elimination pass automatically
253   deletes debugging information for a function if it decides to delete the
254   function. </p>
255
256<p>To do this, most of the debugging information (descriptors for types,
257   variables, functions, source files, etc) is inserted by the language
258   front-end in the form of LLVM metadata. </p>
259
260<p>Debug information is designed to be agnostic about the target debugger and
261   debugging information representation (e.g. DWARF/Stabs/etc).  It uses a
262   generic pass to decode the information that represents variables, types,
263   functions, namespaces, etc: this allows for arbitrary source-language
264   semantics and type-systems to be used, as long as there is a module
265   written for the target debugger to interpret the information. </p>
266
267<p>To provide basic functionality, the LLVM debugger does have to make some
268   assumptions about the source-level language being debugged, though it keeps
269   these to a minimum.  The only common features that the LLVM debugger assumes
270   exist are <a href="#format_files">source files</a>,
271   and <a href="#format_global_variables">program objects</a>.  These abstract
272   objects are used by a debugger to form stack traces, show information about
273   local variables, etc.</p>
274
275<p>This section of the documentation first describes the representation aspects
276   common to any source-language.  The <a href="#ccxx_frontend">next section</a>
277   describes the data layout conventions used by the C and C++ front-ends.</p>
278
279<!-- ======================================================================= -->
280<h3>
281  <a name="debug_info_descriptors">Debug information descriptors</a>
282</h3>
283
284<div>
285
286<p>In consideration of the complexity and volume of debug information, LLVM
287   provides a specification for well formed debug descriptors. </p>
288
289<p>Consumers of LLVM debug information expect the descriptors for program
290   objects to start in a canonical format, but the descriptors can include
291   additional information appended at the end that is source-language
292   specific. All LLVM debugging information is versioned, allowing backwards
293   compatibility in the case that the core structures need to change in some
294   way.  Also, all debugging information objects start with a tag to indicate
295   what type of object it is.  The source-language is allowed to define its own
   objects, by using unreserved tag numbers.  We recommend using tags in
   the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
   0x1000.)</p>
299
300<p>The fields of debug descriptors used internally by LLVM
301   are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>,
302   <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p>
303
304<div class="doc_code">
305<pre>
306!1 = metadata !{
307  i32,   ;; A tag
308  ...
309}
310</pre>
311</div>
312
313<p><a name="LLVMDebugVersion">The first field of a descriptor is always an
314   <tt>i32</tt> containing a tag value identifying the content of the
315   descriptor.  The remaining fields are specific to the descriptor.  The values
316   of tags are loosely bound to the tag values of DWARF information entries.
317   However, that does not restrict the use of the information supplied to DWARF
318   targets.  To facilitate versioning of debug information, the tag is augmented
319   with the current debug version (LLVMDebugVersion = 8 &lt;&lt; 16 or
320   0x80000 or 524288.)</a></p>
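
<p>As a rough illustration of the versioned tag described above, the following
   Python sketch combines a DWARF tag with <tt>LLVMDebugVersion</tt> and splits
   the result apart again.  The helper names and the assumption that the tag
   occupies the low 16 bits of the field are made up for this example; only the
   values 17 and 8 &lt;&lt; 16 come from this document.</p>

```python
LLVM_DEBUG_VERSION = 8 << 16   # 0x80000, as stated in the text
TAG_MASK = 0xFFFF              # assumption: the DWARF tag sits in the low 16 bits
DW_TAG_compile_unit = 17


def encode_tag(dw_tag):
    """Return the first field of a descriptor: tag plus debug version."""
    return dw_tag + LLVM_DEBUG_VERSION


def decode_tag(field):
    """Split a descriptor's first field into (dwarf_tag, version_number)."""
    return field & TAG_MASK, field >> 16


if __name__ == "__main__":
    field = encode_tag(DW_TAG_compile_unit)
    print(hex(field), decode_tag(field))
```

<p>Under these assumptions a compile unit descriptor would start with
   <tt>i32 524305</tt> (0x80011).</p>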
321
322<p>The details of the various descriptors follow.</p>
323
324<!-- ======================================================================= -->
325<h4>
326  <a name="format_compile_units">Compile unit descriptors</a>
327</h4>
328
329<div>
330
331<div class="doc_code">
332<pre>
!0 = metadata !{
  i32,       ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
             ;; (DW_TAG_compile_unit)
  i32,       ;; Unused field.
  i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
  metadata,  ;; Source file name
  metadata,  ;; Source file directory (includes trailing slash)
  metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
  i1,        ;; True if this is a main compile unit.
  i1,        ;; True if this is optimized.
  metadata,  ;; Flags
  i32,       ;; Runtime version
  metadata,  ;; List of enum types
  metadata,  ;; List of retained types
  metadata,  ;; List of subprograms
  metadata   ;; List of global variables
}
350</pre>
351</div>
352
353<p>These descriptors contain a source language ID for the file (we use the DWARF
354   3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>,
355   <tt>DW_LANG_Cobol74</tt>, etc), three strings describing the filename,
356   working directory of the compiler, and an identifier string for the compiler
357   that produced it.</p>
358
<p>Compile unit descriptors provide the root context for objects declared in a
   specific compilation unit. File descriptors are defined using this context.
   These descriptors are collected by a named metadata
   <tt>!llvm.dbg.cu</tt>. Each compile unit descriptor keeps track of
   subprograms, global variables and type information.</p>
364
365</div>
366
367<!-- ======================================================================= -->
368<h4>
369  <a name="format_files">File descriptors</a>
370</h4>
371
372<div>
373
374<div class="doc_code">
375<pre>
376!0 = metadata !{
377  i32,       ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
378             ;; (DW_TAG_file_type)
379  metadata,  ;; Source file name
380  metadata,  ;; Source file directory (includes trailing slash)
381  metadata   ;; Unused
382}
383</pre>
384</div>
385
386<p>These descriptors contain information for a file. Global variables and top
   level functions would be defined using this context. File descriptors also
388   provide context for source line correspondence. </p>
389
390<p>Each input file is encoded as a separate file descriptor in LLVM debugging
391   information output. </p>
392
393</div>
394
395<!-- ======================================================================= -->
396<h4>
397  <a name="format_global_variables">Global variable descriptors</a>
398</h4>
399
400<div>
401
402<div class="doc_code">
403<pre>
404!1 = metadata !{
405  i32,      ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
406            ;; (DW_TAG_variable)
407  i32,      ;; Unused field.
408  metadata, ;; Reference to context descriptor
409  metadata, ;; Name
410  metadata, ;; Display name (fully qualified C++ name)
411  metadata, ;; MIPS linkage name (for C++)
412  metadata, ;; Reference to file where defined
413  i32,      ;; Line number where defined
414  metadata, ;; Reference to type descriptor
415  i1,       ;; True if the global is local to compile unit (static)
416  i1,       ;; True if the global is defined in the compile unit (not extern)
417  {}*       ;; Reference to the global variable
418}
419</pre>
420</div>
421
<p>These descriptors provide debug information about global variables.  They
provide details such as name, type and where the variable is defined. All
global variables are collected inside the named metadata
<tt>!llvm.dbg.cu</tt>.</p>
426
427</div>
428
429<!-- ======================================================================= -->
430<h4>
431  <a name="format_subprograms">Subprogram descriptors</a>
432</h4>
433
434<div>
435
436<div class="doc_code">
437<pre>
438!2 = metadata !{
439  i32,      ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
440            ;; (DW_TAG_subprogram)
441  i32,      ;; Unused field.
442  metadata, ;; Reference to context descriptor
443  metadata, ;; Name
444  metadata, ;; Display name (fully qualified C++ name)
445  metadata, ;; MIPS linkage name (for C++)
446  metadata, ;; Reference to file where defined
447  i32,      ;; Line number where defined
448  metadata, ;; Reference to type descriptor
449  i1,       ;; True if the global is local to compile unit (static)
450  i1,       ;; True if the global is defined in the compile unit (not extern)
451  i32,      ;; Line number where the scope of the subprogram begins
452  i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
453  i32,      ;; Index into a virtual function
454  metadata, ;; indicates which base type contains the vtable pointer for the
455            ;; derived class
  i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
  i1,       ;; isOptimized
  Function *, ;; Pointer to LLVM function
  metadata, ;; Lists function template parameters
  metadata, ;; Function declaration descriptor
  metadata  ;; List of function variables
462}
463</pre>
464</div>
465
466<p>These descriptors provide debug information about functions, methods and
467   subprograms.  They provide details such as name, return types and the source
468   location where the subprogram is defined.
469</p>
470
471</div>
472
473<!-- ======================================================================= -->
474<h4>
475  <a name="format_blocks">Block descriptors</a>
476</h4>
477
478<div>
479
480<div class="doc_code">
481<pre>
482!3 = metadata !{
483  i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
484  metadata,;; Reference to context descriptor
485  i32,     ;; Line number
486  i32,     ;; Column number
487  metadata,;; Reference to source file
488  i32      ;; Unique ID to identify blocks from a template function
489}
490</pre>
491</div>
492
<p>This descriptor provides debug information about nested blocks within a
   subprogram. The line and column numbers are used to distinguish
   two lexical blocks at the same depth. </p>
496
497<div class="doc_code">
498<pre>
499!3 = metadata !{
500  i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
  metadata, ;; Reference to the scope we're annotating with a file change
  metadata  ;; Reference to the file the scope is enclosed in.
503}
504</pre>
505</div>
506
507<p>This descriptor provides a wrapper around a lexical scope to handle file
508   changes in the middle of a lexical block.</p>
509
510</div>
511
512<!-- ======================================================================= -->
513<h4>
514  <a name="format_basic_type">Basic type descriptors</a>
515</h4>
516
517<div>
518
519<div class="doc_code">
520<pre>
521!4 = metadata !{
522  i32,      ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
523            ;; (DW_TAG_base_type)
524  metadata, ;; Reference to context
525  metadata, ;; Name (may be "" for anonymous types)
526  metadata, ;; Reference to file where defined (may be NULL)
527  i32,      ;; Line number where defined (may be 0)
528  i64,      ;; Size in bits
529  i64,      ;; Alignment in bits
530  i64,      ;; Offset in bits
531  i32,      ;; Flags
532  i32       ;; DWARF type encoding
533}
534</pre>
535</div>
536
<p>These descriptors define primitive types used in the code, for example int,
   bool and float.  The context provides the scope of the type, which is usually
   the top level.  Since basic types are not usually user defined, the context
   and line number can be left as NULL and 0.  The size, alignment and offset
   are expressed in bits and can be 64 bit values.  The alignment is used to
   round the offset when embedded in a
   <a href="#format_composite_type">composite type</a> (for example, to keep
   doubles on 64 bit boundaries).  The offset is the bit offset if embedded in
   a <a href="#format_composite_type">composite type</a>.</p>
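
<p>The rounding rule above can be sketched in Python as follows.  This is only
   an illustration: the function names are hypothetical, and the layout rule
   (place each member at the next multiple of its alignment, then pad the total
   size to the largest alignment) is an assumed C-like convention rather than
   something this document specifies.</p>

```python
def round_up(offset_bits, align_bits):
    """Round offset_bits up to the next multiple of align_bits."""
    return (offset_bits + align_bits - 1) // align_bits * align_bits


def layout(members):
    """Compute bit offsets for (name, size_bits, align_bits) members.

    Returns a dict of name -> offset in bits and the padded total size in bits.
    """
    offsets = {}
    cursor = 0
    max_align = 1
    for name, size_bits, align_bits in members:
        cursor = round_up(cursor, align_bits)  # honour the member's alignment
        offsets[name] = cursor
        cursor += size_bits
        max_align = max(max_align, align_bits)
    return offsets, round_up(cursor, max_align)


if __name__ == "__main__":
    # Hypothetical struct { char c; double d; } with an 8 bit char
    # and a 64 bit double aligned to 64 bits.
    print(layout([("c", 8, 8), ("d", 64, 64)]))
```

<p>With these inputs the double lands at bit offset 64 rather than 8, which is
   the "64 bit boundary" case mentioned above.</p>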
546
547<p>The type encoding provides the details of the type.  The values are typically
548   one of the following:</p>
549
550<div class="doc_code">
551<pre>
552DW_ATE_address       = 1
553DW_ATE_boolean       = 2
554DW_ATE_float         = 4
555DW_ATE_signed        = 5
556DW_ATE_signed_char   = 6
557DW_ATE_unsigned      = 7
558DW_ATE_unsigned_char = 8
559</pre>
560</div>
561
562</div>
563
564<!-- ======================================================================= -->
565<h4>
566  <a name="format_derived_type">Derived type descriptors</a>
567</h4>
568
569<div>
570
571<div class="doc_code">
572<pre>
573!5 = metadata !{
574  i32,      ;; Tag (see below)
575  metadata, ;; Reference to context
576  metadata, ;; Name (may be "" for anonymous types)
577  metadata, ;; Reference to file where defined (may be NULL)
578  i32,      ;; Line number where defined (may be 0)
579  i64,      ;; Size in bits
580  i64,      ;; Alignment in bits
581  i64,      ;; Offset in bits
582  i32,      ;; Flags to encode attributes, e.g. private
583  metadata, ;; Reference to type derived from
  metadata, ;; (optional) Name of the Objective C property associated with
            ;; an Objective-C ivar
586  metadata, ;; (optional) Name of the Objective C property getter selector.
587  metadata, ;; (optional) Name of the Objective C property setter selector.
588  i32       ;; (optional) Objective C property attributes.
589}
590</pre>
591</div>
592
593<p>These descriptors are used to define types derived from other types.  The
594value of the tag varies depending on the meaning.  The following are possible
595tag values:</p>
596
597<div class="doc_code">
598<pre>
599DW_TAG_formal_parameter = 5
600DW_TAG_member           = 13
601DW_TAG_pointer_type     = 15
602DW_TAG_reference_type   = 16
603DW_TAG_typedef          = 22
604DW_TAG_const_type       = 38
605DW_TAG_volatile_type    = 53
606DW_TAG_restrict_type    = 55
607</pre>
608</div>
609
610<p><tt>DW_TAG_member</tt> is used to define a member of
611   a <a href="#format_composite_type">composite type</a>
612   or <a href="#format_subprograms">subprogram</a>.  The type of the member is
613   the <a href="#format_derived_type">derived
614   type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which
615   is a formal argument of a subprogram.</p>
616
617<p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p>
618
619<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>,
620   <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and
621   <tt>DW_TAG_restrict_type</tt> are used to qualify
622   the <a href="#format_derived_type">derived type</a>. </p>
623
<p><a href="#format_derived_type">Derived type</a> location can be determined
   from the context and line number.  The size, alignment and offset are
   expressed in bits and can be 64 bit values.  The alignment is used to round
   the offset when embedded in a <a href="#format_composite_type">composite
   type</a> (for example, to keep doubles on 64 bit boundaries).  The offset is
   the bit offset if embedded in a <a href="#format_composite_type">composite
   type</a>.</p>
631
632<p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL.
633</p>
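
<p>To make the chaining of qualifier descriptors more concrete, here is a small
   Python sketch that spells out a C type from a chain of derived types.  The
   tuple representation and the <tt>spell</tt> helper are hypothetical and exist
   only for this example; the tag numbers are the ones listed above, and
   <tt>None</tt> stands in for the NULL reference used for <tt>void</tt>.</p>

```python
DW_TAG_pointer_type = 15
DW_TAG_const_type = 38
DW_TAG_volatile_type = 53

# Text appended for each qualifying tag (east-const spelling keeps this simple).
SUFFIX = {
    DW_TAG_pointer_type: " *",
    DW_TAG_const_type: " const",
    DW_TAG_volatile_type: " volatile",
}


def spell(node):
    """Spell a type given as None, a base type name, or (tag, derived_from)."""
    if node is None:           # derived from NULL: void
        return "void"
    if isinstance(node, str):  # a basic type, identified by name
        return node
    tag, derived_from = node
    return spell(derived_from) + SUFFIX[tag]


if __name__ == "__main__":
    print(spell((DW_TAG_pointer_type, None)))
    print(spell((DW_TAG_pointer_type, (DW_TAG_const_type, "int"))))
```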
634
635</div>
636
637<!-- ======================================================================= -->
638<h4>
639  <a name="format_composite_type">Composite type descriptors</a>
640</h4>
641
642<div>
643
644<div class="doc_code">
645<pre>
646!6 = metadata !{
647  i32,      ;; Tag (see below)
648  metadata, ;; Reference to context
649  metadata, ;; Name (may be "" for anonymous types)
650  metadata, ;; Reference to file where defined (may be NULL)
651  i32,      ;; Line number where defined (may be 0)
652  i64,      ;; Size in bits
653  i64,      ;; Alignment in bits
654  i64,      ;; Offset in bits
655  i32,      ;; Flags
656  metadata, ;; Reference to type derived from
657  metadata, ;; Reference to array of member descriptors
658  i32       ;; Runtime languages
659}
660</pre>
661</div>
662
663<p>These descriptors are used to define types that are composed of 0 or more
664elements.  The value of the tag varies depending on the meaning.  The following
665are possible tag values:</p>
666
667<div class="doc_code">
668<pre>
669DW_TAG_array_type       = 1
670DW_TAG_enumeration_type = 4
671DW_TAG_structure_type   = 19
672DW_TAG_union_type       = 23
673DW_TAG_vector_type      = 259
674DW_TAG_subroutine_type  = 21
675DW_TAG_inheritance      = 28
676</pre>
677</div>
678
679<p>The vector flag indicates that an array type is a native packed vector.</p>
680
681<p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types
682   (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange
683   descriptors</a>, each representing the range of subscripts at that level of
684   indexing.</p>
685
686<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are
687   <a href="#format_enumeration">enumerator descriptors</a>, each representing
688   the definition of enumeration value for the set. All enumeration type
689   descriptors are collected inside the named metadata
690   <tt>!llvm.dbg.cu</tt>.</p>
691
692<p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag
693   = <tt>DW_TAG_union_type</tt>) types are any one of
694   the <a href="#format_basic_type">basic</a>,
695   <a href="#format_derived_type">derived</a>
696   or <a href="#format_composite_type">composite</a> type descriptors, each
697   representing a field member of the structure or union.</p>
698
699<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors
700   provide information about base classes, static members and member
701   functions. If a member is a <a href="#format_derived_type">derived type
702   descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type
   represents a base class. If the member is
   a <a href="#format_global_variables">global variable descriptor</a> then it
   represents a static member.  And, if the member is
   a <a href="#format_subprograms">subprogram descriptor</a> then it represents
   a member function.  For static members and member
   functions, <tt>getName()</tt> returns the member's linkage name or the C++
   mangled name, and <tt>getDisplayName()</tt> returns the simplified version
   of the name.</p>
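
<p>The classification of C++ class members described above might be sketched
   like this in Python.  The function name and the returned labels are
   hypothetical, and masking off the high 16 bits assumes the debug version is
   stored there; the tag numbers are those given in this document.</p>

```python
LLVM_DEBUG_VERSION = 8 << 16
DW_TAG_inheritance = 28
DW_TAG_subprogram = 46
DW_TAG_variable = 52


def classify_member(tag_field):
    """Classify a class member from the first field of its descriptor."""
    tag = tag_field & 0xFFFF  # assumption: strip the version bits
    if tag == DW_TAG_inheritance:
        return "base class"
    if tag == DW_TAG_variable:
        return "static member"
    if tag == DW_TAG_subprogram:
        return "member function"
    return "field"            # any other type descriptor is a data member


if __name__ == "__main__":
    for dw_tag in (DW_TAG_inheritance, DW_TAG_variable, DW_TAG_subprogram, 13):
        print(dw_tag, classify_member(dw_tag + LLVM_DEBUG_VERSION))
```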
710
711<p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type
712   elements is the return type for the subroutine.  The remaining elements are
713   the formal arguments to the subroutine.</p>
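
<p>As a brief illustration of this ordering, the Python sketch below separates
   the element list of a subroutine type into its return type and its formal
   arguments.  The helper name and the use of plain strings for type
   descriptors are assumptions made for the example.</p>

```python
def split_subroutine_elements(elements):
    """Return (return_type, formal_argument_types) for a subroutine type."""
    if not elements:
        raise ValueError("a subroutine type needs at least a return type entry")
    return elements[0], list(elements[1:])


if __name__ == "__main__":
    # Hypothetical elements for: int f(char, double)
    print(split_subroutine_elements(["int", "char", "double"]))
```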
714
715<p><a href="#format_composite_type">Composite type</a> location can be
716   determined from the context and line number.  The size, alignment and
717   offset are expressed in bits and can be 64 bit values.  The alignment is used
718   to round the offset when embedded in
   a <a href="#format_composite_type">composite type</a> (for example, to keep
   doubles on 64 bit boundaries).  The offset is the bit offset if embedded
   in a <a href="#format_composite_type">composite type</a>.</p>
722
723</div>
724
725<!-- ======================================================================= -->
726<h4>
727  <a name="format_subrange">Subrange descriptors</a>
728</h4>
729
730<div>
731
732<div class="doc_code">
733<pre>
734!42 = metadata !{
735  i32,    ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type)
736  i64,    ;; Low value
737  i64     ;; High value
738}
739</pre>
740</div>
741
<p>These descriptors are used to define ranges of array subscripts for an array
   <a href="#format_composite_type">composite type</a>.  The low value defines
   the lower bound, typically zero for C/C++.  The high value is the upper
   bound.  Values are 64 bit.  High - low + 1 is the size of the array.  If low
   &gt; high, the array bounds are not included in generated debugging
   information.</p>
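
<p>The size rule can be written out as a short Python sketch.  The function
   names are hypothetical, and returning <tt>None</tt> for the "low greater
   than high" case is just one way to model bounds that are left out of the
   debugging information.</p>

```python
def subrange_count(low, high):
    """Number of subscripts in one dimension, or None if bounds are omitted."""
    if low > high:
        return None
    return high - low + 1


def total_elements(subranges):
    """Multiply the sizes of all dimensions; None if any dimension is unbounded."""
    total = 1
    for low, high in subranges:
        count = subrange_count(low, high)
        if count is None:
            return None
        total *= count
    return total


if __name__ == "__main__":
    print(subrange_count(0, 9))              # one dimension of int a[10]
    print(total_elements([(0, 2), (0, 3)]))  # int m[3][4]
```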
748
749</div>
750
751<!-- ======================================================================= -->
752<h4>
753  <a name="format_enumeration">Enumerator descriptors</a>
754</h4>
755
756<div>
757
758<div class="doc_code">
759<pre>
760!6 = metadata !{
761  i32,      ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
762            ;; (DW_TAG_enumerator)
763  metadata, ;; Name
764  i64       ;; Value
765}
766</pre>
767</div>
768
<p>These descriptors are used to define members of an
   enumeration <a href="#format_composite_type">composite type</a>.  Each one
   associates a name with a value.</p>
772
773</div>
774
775<!-- ======================================================================= -->
776<h4>
777  <a name="format_variables">Local variables</a>
778</h4>
779
780<div>
781
782<div class="doc_code">
783<pre>
784!7 = metadata !{
785  i32,      ;; Tag (see below)
786  metadata, ;; Context
787  metadata, ;; Name
788  metadata, ;; Reference to file where defined
789  i32,      ;; 24 bit - Line number where defined
790            ;; 8 bit - Argument number. 1 indicates 1st argument.
791  metadata, ;; Type descriptor
792  i32,      ;; flags
793  metadata  ;; (optional) Reference to inline location
794}
795</pre>
796</div>
797
<p>These descriptors are used to define variables local to a subprogram.  The
   value of the tag depends on the usage of the variable:</p>
800
801<div class="doc_code">
802<pre>
803DW_TAG_auto_variable   = 256
804DW_TAG_arg_variable    = 257
805DW_TAG_return_variable = 258
806</pre>
807</div>
808
809<p>An auto variable is any variable declared in the body of the function.  An
810   argument variable is any variable that appears as a formal argument to the
811   function.  A return variable is used to track the result of a function and
812   has no source correspondent.</p>
813
<p>The context is either the subprogram or block where the variable is defined.
   Name is the source variable name.  Context and line indicate where the
   variable was defined.  Type descriptor defines the declared type of the
   variable.</p>
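
<p>The descriptor stores the line number and the argument number in a single
   <tt>i32</tt> field.  The sketch below shows one way such a field could be
   packed and unpacked.  It assumes that the line number occupies the low 24
   bits and the argument number the high 8 bits; that bit order is an
   assumption of this example, and the helper names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: bits 0-23 hold the line number, bits 24-31 the argument
// number (0 for variables that are not arguments).
uint32_t packLineAndArg(uint32_t line, uint32_t argNo) {
  assert(line < (1u << 24) && "line number must fit in 24 bits");
  assert(argNo < (1u << 8) && "argument number must fit in 8 bits");
  return line | (argNo << 24);
}

// Extract the 24 bit line number.
uint32_t lineOf(uint32_t packed) { return packed & 0x00FFFFFFu; }

// Extract the 8 bit argument number; 1 indicates the first argument.
uint32_t argNumberOf(uint32_t packed) { return packed >> 24; }
```

<p>Under this assumption an auto variable declared on line 2 is stored simply
   as <tt>i32 2</tt>, because its argument number is zero.</p>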
818
819</div>
820
821</div>
822
823<!-- ======================================================================= -->
824<h3>
825  <a name="format_common_intrinsics">Debugger intrinsic functions</a>
826</h3>
827
828<div>
829
830<p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
831   provide debug information at various points in generated code.</p>
832
833<!-- ======================================================================= -->
834<h4>
835  <a name="format_common_declare">llvm.dbg.declare</a>
836</h4>
837
838<div>
839<pre>
840  void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
841</pre>
842
843<p>This intrinsic provides information about a local element (e.g., variable). The
844   first argument is metadata holding the alloca for the variable. The
845   second argument is metadata containing a description of the variable.</p>
846</div>
847
848<!-- ======================================================================= -->
849<h4>
850  <a name="format_common_value">llvm.dbg.value</a>
851</h4>
852
853<div>
854<pre>
855  void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
856</pre>
857
858<p>This intrinsic provides information when a user source variable is set to a
859   new value.  The first argument is the new value (wrapped as metadata).  The
860   second argument is the offset in the user source variable where the new value
861   is written.  The third argument is metadata containing a description of the
862   user source variable.</p>
863</div>
864
865</div>
866
867<!-- ======================================================================= -->
868<h3>
869  <a name="format_common_lifetime">Object lifetimes and scoping</a>
870</h3>
871
872<div>
873<p>In many languages, the local variables in functions can have their lifetimes
874   or scopes limited to a subset of a function.  In the C family of languages,
875   for example, variables are only live (readable and writable) within the
876   source block that they are defined in.  In functional languages, values are
877   only readable after they have been defined.  Though this is a very obvious
878   concept, it is non-trivial to model in LLVM, because it has no notion of
879   scoping in this sense, and does not want to be tied to a language's scoping
880   rules.</p>
881
882<p>In order to handle this, the LLVM debug format uses the metadata attached to
883   llvm instructions to encode line number and scoping information. Consider
884   the following C fragment, for example:</p>
885
886<div class="doc_code">
887<pre>
8881.  void foo() {
8892.    int X = 21;
8903.    int Y = 22;
8914.    {
8925.      int Z = 23;
8936.      Z = X;
8947.    }
8958.    X = Y;
8969.  }
897</pre>
898</div>
899
900<p>Compiled to LLVM, this function would be represented like this:</p>
901
902<div class="doc_code">
903<pre>
904define void @foo() nounwind ssp {
905entry:
906  %X = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=4]
907  %Y = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=4]
908  %Z = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=3]
909  %0 = bitcast i32* %X to {}*                     ; &lt;{}*&gt; [#uses=1]
910  call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
911  store i32 21, i32* %X, !dbg !8
912  %1 = bitcast i32* %Y to {}*                     ; &lt;{}*&gt; [#uses=1]
913  call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
914  store i32 22, i32* %Y, !dbg !11
915  %2 = bitcast i32* %Z to {}*                     ; &lt;{}*&gt; [#uses=1]
916  call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
917  store i32 23, i32* %Z, !dbg !15
  %tmp = load i32* %X, !dbg !16                   ; &lt;i32&gt; [#uses=1]
  store i32 %tmp, i32* %Z, !dbg !16
  %tmp1 = load i32* %Y, !dbg !17                  ; &lt;i32&gt; [#uses=1]
  store i32 %tmp1, i32* %X, !dbg !17
924  ret void, !dbg !18
925}
926
927declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
928
929!0 = metadata !{i32 459008, metadata !1, metadata !"X",
930                metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
931!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
932!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
933               metadata !"foo", metadata !3, i32 1, metadata !4,
934               i1 false, i1 true}; [DW_TAG_subprogram ]
935!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
936                metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
937                i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
938!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
939                i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
940!5 = metadata !{null}
941!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
942                i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
943!7 = metadata !{i32 2, i32 7, metadata !1, null}
944!8 = metadata !{i32 2, i32 3, metadata !1, null}
945!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
946                metadata !6}; [ DW_TAG_auto_variable ]
947!10 = metadata !{i32 3, i32 7, metadata !1, null}
948!11 = metadata !{i32 3, i32 3, metadata !1, null}
949!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
950                 metadata !6}; [ DW_TAG_auto_variable ]
951!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
952!14 = metadata !{i32 5, i32 9, metadata !13, null}
953!15 = metadata !{i32 5, i32 5, metadata !13, null}
954!16 = metadata !{i32 6, i32 5, metadata !13, null}
955!17 = metadata !{i32 8, i32 3, metadata !1, null}
956!18 = metadata !{i32 9, i32 1, metadata !2, null}
957</pre>
958</div>
959
960<p>This example illustrates a few important details about LLVM debugging
961   information. In particular, it shows how the <tt>llvm.dbg.declare</tt>
962   intrinsic and location information, which are attached to an instruction,
963   are applied together to allow a debugger to analyze the relationship between
964   statements, variable definitions, and the code used to implement the
965   function.</p>
966
967<div class="doc_code">
968<pre>
call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
970</pre>
971</div>
972
973<p>The first intrinsic
974   <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
975   encodes debugging information for the variable <tt>X</tt>. The metadata
976   <tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
977   variable <tt>X</tt>.</p>
978
979<div class="doc_code">
980<pre>
981!7 = metadata !{i32 2, i32 7, metadata !1, null}
982!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
983!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
984                metadata !"foo", metadata !"foo", metadata !3, i32 1,
985                metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
986</pre>
987</div>
988
989<p>Here <tt>!7</tt> is metadata providing location information. It has four
990   fields: line number, column number, scope, and original scope. The original
991   scope represents inline location if this instruction is inlined inside a
992   caller, and is null otherwise. In this example, scope is encoded by
993   <tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
994   <tt>!2</tt>, where <tt>!2</tt> is a
995   <a href="#format_subprograms">subprogram descriptor</a>. This way the
996   location information attached to the intrinsics indicates that the
997   variable <tt>X</tt> is declared at line number 2 at a function level scope in
998   function <tt>foo</tt>.</p>
999
<p>Now let's take another example.</p>
1001
1002<div class="doc_code">
1003<pre>
call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
1005</pre>
1006</div>
1007
<p>The third intrinsic
1009   <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
1010   encodes debugging information for variable <tt>Z</tt>. The metadata
1011   <tt>!dbg !14</tt> attached to the intrinsic provides scope information for
1012   the variable <tt>Z</tt>.</p>
1013
1014<div class="doc_code">
1015<pre>
1016!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
1017!14 = metadata !{i32 5, i32 9, metadata !13, null}
1018</pre>
1019</div>
1020
1021<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
1022   column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
1023   itself resides inside of lexical scope <tt>!1</tt> described above.</p>
1024
<p>The scope information attached to each instruction provides a
   straightforward way to find instructions covered by a scope.</p>
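
<p>A minimal sketch of such a lookup is shown below.  It does not use the LLVM
   API.  Scopes are modelled by their metadata numbers with a map from each
   scope to its parent, and <tt>InstLoc</tt>, <tt>isWithin</tt> and
   <tt>linesCoveredBy</tt> are hypothetical names.  The value 0 is used here
   to mean "no parent", which works for this example only because
   <tt>!0</tt> is not a scope.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the location attached to one instruction.
struct InstLoc {
  unsigned line;   // source line number
  unsigned scope;  // metadata number of the enclosing scope
};

// Returns true if `scope` is `ancestor` or is nested inside it.
// Assumes the parent map contains no cycles.
bool isWithin(unsigned scope, unsigned ancestor,
              const std::map<unsigned, unsigned> &parentOf) {
  while (scope != 0) {
    if (scope == ancestor)
      return true;
    auto it = parentOf.find(scope);
    scope = (it == parentOf.end()) ? 0 : it->second;
  }
  return false;
}

// Collect the lines of all instructions covered by `scope`,
// including those in scopes nested inside it.
std::vector<unsigned> linesCoveredBy(
    unsigned scope, const std::vector<InstLoc> &insts,
    const std::map<unsigned, unsigned> &parentOf) {
  std::vector<unsigned> lines;
  for (const InstLoc &I : insts)
    if (isWithin(I.scope, scope, parentOf))
      lines.push_back(I.line);
  return lines;
}
```

<p>With the parent links from the example (<tt>!13</tt> inside <tt>!1</tt>,
   <tt>!1</tt> inside <tt>!2</tt>), asking for scope 13 selects only the
   instructions for lines 5 and 6, while asking for scope 1 also selects those
   for lines 2, 3 and 8.</p>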
1027
1028</div>
1029
1030</div>
1031
1032<!-- *********************************************************************** -->
1033<h2>
1034  <a name="ccxx_frontend">C/C++ front-end specific debug information</a>
1035</h2>
1036<!-- *********************************************************************** -->
1037
1038<div>
1039
<p>The C and C++ front-ends represent information about the program in a format
   that is effectively identical
   to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
   terms of information content.  This allows code generators to trivially
   support native debuggers by generating standard DWARF information, and
   contains enough information for non-DWARF targets to translate it as
   needed.</p>
1047
1048<p>This section describes the forms used to represent C and C++ programs. Other
1049   languages could pattern themselves after this (which itself is tuned to
1050   representing programs in the same way that DWARF 3 does), or they could
1051   choose to provide completely different forms if they don't fit into the DWARF
1052   model.  As support for debugging information gets added to the various LLVM
1053   source-language front-ends, the information used should be documented
1054   here.</p>
1055
1056<p>The following sections provide examples of various C/C++ constructs and the
1057   debug information that would best describe those constructs.</p>
1058
1059<!-- ======================================================================= -->
1060<h3>
1061  <a name="ccxx_compile_units">C/C++ source file information</a>
1062</h3>
1063
1064<div>
1065
<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
   in the directory <tt>/Users/mine/sources</tt>, and the following code:</p>
1068
1069<div class="doc_code">
1070<pre>
1071#include "MyHeader.h"
1072
1073int main(int argc, char *argv[]) {
1074  return 0;
1075}
1076</pre>
1077</div>
1078
1079<p>a C/C++ front-end would generate the following descriptors:</p>
1080
1081<div class="doc_code">
1082<pre>
1083...
1084;;
1085;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
1086;;
1087!2 = metadata !{
1088  i32 524305,    ;; Tag
1089  i32 0,         ;; Unused
1090  i32 4,         ;; Language Id
1091  metadata !"MySource.cpp",
1092  metadata !"/Users/mine/sources",
1093  metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
1094  i1 true,       ;; Main Compile Unit
1095  i1 false,      ;; Optimized compile unit
1096  metadata !"",  ;; Compiler flags
1097  i32 0}         ;; Runtime version
1098
1099;;
1100;; Define the file for the file "/Users/mine/sources/MySource.cpp".
1101;;
1102!1 = metadata !{
1103  i32 524329,    ;; Tag
1104  metadata !"MySource.cpp",
1105  metadata !"/Users/mine/sources",
1106  metadata !2    ;; Compile unit
1107}
1108
1109;;
;; Define the file for the file "/Users/mine/sources/MyHeader.h".
;;
!3 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MyHeader.h",
1115  metadata !"/Users/mine/sources",
1116  metadata !2    ;; Compile unit
1117}
1118
1119...
1120</pre>
1121</div>
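
<p>The tag fields in the descriptors above combine a DWARF tag with the LLVM
   debug version.  The following sketch separates the two parts.  It assumes
   that the version occupies the upper 16 bits and the DWARF tag the lower 16
   bits, which is what the values in this document suggest; the constant and
   function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

// Assumption: upper 16 bits = debug version, lower 16 bits = DWARF tag.
const uint32_t kVersionMask = 0xFFFF0000u;

// Strip the version and return the plain DWARF tag.
uint32_t dwarfTagOf(uint32_t field) { return field & ~kVersionMask; }

// Return the debug version stored in the upper half of the field.
uint32_t debugVersionOf(uint32_t field) {
  return (field & kVersionMask) >> 16;
}
```

<p>For example, 524305 splits into version 8 and tag 17
   (<tt>DW_TAG_compile_unit</tt>), and 524329 into version 8 and tag 41
   (<tt>DW_TAG_file_type</tt>).</p>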
1122
<p><tt>llvm::Instruction</tt> provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR
using <tt>Instruction::getMetadata()</tt> and
<tt>DILocation::getLineNumber()</tt>.</p>
1127<pre>
1128 if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
1129   DILocation Loc(N);                      // DILocation is in DebugInfo.h
1130   unsigned Line = Loc.getLineNumber();
1131   StringRef File = Loc.getFilename();
1132   StringRef Dir = Loc.getDirectory();
1133 }
1134</pre>
1135</div>
1136
1137<!-- ======================================================================= -->
1138<h3>
1139  <a name="ccxx_global_variable">C/C++ global variable information</a>
1140</h3>
1141
1142<div>
1143
1144<p>Given an integer global variable declared as follows:</p>
1145
1146<div class="doc_code">
1147<pre>
1148int MyGlobal = 100;
1149</pre>
1150</div>
1151
1152<p>a C/C++ front-end would generate the following descriptors:</p>
1153
1154<div class="doc_code">
1155<pre>
1156;;
1157;; Define the global itself.
1158;;
@MyGlobal = global i32 100
1160...
1161;;
;; List of compile units that carry the debug info
1163;;
1164!llvm.dbg.cu = !{!0}
1165
1166;; Define the compile unit.
1167!0 = metadata !{
1168  i32 786449,                       ;; Tag
1169  i32 0,                            ;; Context
1170  i32 4,                            ;; Language
1171  metadata !"foo.cpp",              ;; File
1172  metadata !"/Volumes/Data/tmp",    ;; Directory
1173  metadata !"clang version 3.1 ",   ;; Producer
1174  i1 true,                          ;; Deprecated field
1175  i1 false,                         ;; "isOptimized"?
1176  metadata !"",                     ;; Flags
1177  i32 0,                            ;; Runtime Version
1178  metadata !1,                      ;; Enum Types
1179  metadata !1,                      ;; Retained Types
1180  metadata !1,                      ;; Subprograms
1181  metadata !3                       ;; Global Variables
1182} ; [ DW_TAG_compile_unit ]
1183
1184;; The Array of Global Variables
1185!3 = metadata !{
1186  metadata !4
1187}
1188
1189!4 = metadata !{
1190  metadata !5
1191}
1192
1193;;
1194;; Define the global variable itself.
1195;;
1196!5 = metadata !{
1197  i32 786484,                        ;; Tag
1198  i32 0,                             ;; Unused
1199  null,                              ;; Unused
1200  metadata !"MyGlobal",              ;; Name
1201  metadata !"MyGlobal",              ;; Display Name
1202  metadata !"",                      ;; Linkage Name
1203  metadata !6,                       ;; File
1204  i32 1,                             ;; Line
1205  metadata !7,                       ;; Type
1206  i32 0,                             ;; IsLocalToUnit
1207  i32 1,                             ;; IsDefinition
1208  i32* @MyGlobal                     ;; LLVM-IR Value
1209} ; [ DW_TAG_variable ]
1210
1211;;
1212;; Define the file
1213;;
1214!6 = metadata !{
1215  i32 786473,                        ;; Tag
1216  metadata !"foo.cpp",               ;; File
1217  metadata !"/Volumes/Data/tmp",     ;; Directory
1218  null                               ;; Unused
1219} ; [ DW_TAG_file_type ]
1220
1221;;
1222;; Define the type
1223;;
1224!7 = metadata !{
1225  i32 786468,                         ;; Tag
1226  null,                               ;; Unused
1227  metadata !"int",                    ;; Name
1228  null,                               ;; Unused
1229  i32 0,                              ;; Line
1230  i64 32,                             ;; Size in Bits
1231  i64 32,                             ;; Align in Bits
1232  i64 0,                              ;; Offset
1233  i32 0,                              ;; Flags
1234  i32 5                               ;; Encoding
1235} ; [ DW_TAG_base_type ]
1236
1237</pre>
1238</div>
1239
1240</div>
1241
1242<!-- ======================================================================= -->
1243<h3>
1244  <a name="ccxx_subprogram">C/C++ function information</a>
1245</h3>
1246
1247<div>
1248
1249<p>Given a function declared as follows:</p>
1250
1251<div class="doc_code">
1252<pre>
1253int main(int argc, char *argv[]) {
1254  return 0;
1255}
1256</pre>
1257</div>
1258
1259<p>a C/C++ front-end would generate the following descriptors:</p>
1260
1261<div class="doc_code">
1262<pre>
1263;;
;; Define the descriptor for the subprogram.  Note that the tag is
;; 46 + LLVMDebugVersion (46 = DW_TAG_subprogram).
1267;;
1268!6 = metadata !{
1269  i32 524334,        ;; Tag
1270  i32 0,             ;; Unused
1271  metadata !1,       ;; Context
1272  metadata !"main",  ;; Name
1273  metadata !"main",  ;; Display name
1274  metadata !"main",  ;; Linkage name
1275  metadata !1,       ;; File
1276  i32 1,             ;; Line number
1277  metadata !4,       ;; Type
1278  i1 false,          ;; Is local
1279  i1 true,           ;; Is definition
1280  i32 0,             ;; Virtuality attribute, e.g. pure virtual function
1281  i32 0,             ;; Index into virtual table for C++ methods
1282  i32 0,             ;; Type that holds virtual table.
1283  i32 0,             ;; Flags
1284  i1 false,          ;; True if this function is optimized
  i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
1286  null               ;; Function template parameters
1287}
1288;;
1289;; Define the subprogram itself.
1290;;
1291define i32 @main(i32 %argc, i8** %argv) {
1292...
1293}
1294</pre>
1295</div>
1296
1297</div>
1298
1299<!-- ======================================================================= -->
1300<h3>
1301  <a name="ccxx_basic_types">C/C++ basic types</a>
1302</h3>
1303
1304<div>
1305
1306<p>The following are the basic type descriptors for C/C++ core types:</p>
1307
1308<!-- ======================================================================= -->
1309<h4>
1310  <a name="ccxx_basic_type_bool">bool</a>
1311</h4>
1312
1313<div>
1314
1315<div class="doc_code">
1316<pre>
1317!2 = metadata !{
1318  i32 524324,        ;; Tag
1319  metadata !1,       ;; Context
1320  metadata !"bool",  ;; Name
1321  metadata !1,       ;; File
1322  i32 0,             ;; Line number
1323  i64 8,             ;; Size in Bits
1324  i64 8,             ;; Align in Bits
1325  i64 0,             ;; Offset in Bits
1326  i32 0,             ;; Flags
1327  i32 2              ;; Encoding
1328}
1329</pre>
1330</div>
1331
1332</div>
1333
1334<!-- ======================================================================= -->
1335<h4>
1336  <a name="ccxx_basic_char">char</a>
1337</h4>
1338
1339<div>
1340
1341<div class="doc_code">
1342<pre>
1343!2 = metadata !{
1344  i32 524324,        ;; Tag
1345  metadata !1,       ;; Context
1346  metadata !"char",  ;; Name
1347  metadata !1,       ;; File
1348  i32 0,             ;; Line number
1349  i64 8,             ;; Size in Bits
1350  i64 8,             ;; Align in Bits
1351  i64 0,             ;; Offset in Bits
1352  i32 0,             ;; Flags
1353  i32 6              ;; Encoding
1354}
1355</pre>
1356</div>
1357
1358</div>
1359
1360<!-- ======================================================================= -->
1361<h4>
1362  <a name="ccxx_basic_unsigned_char">unsigned char</a>
1363</h4>
1364
1365<div>
1366
1367<div class="doc_code">
1368<pre>
1369!2 = metadata !{
1370  i32 524324,        ;; Tag
1371  metadata !1,       ;; Context
1372  metadata !"unsigned char",
1373  metadata !1,       ;; File
1374  i32 0,             ;; Line number
1375  i64 8,             ;; Size in Bits
1376  i64 8,             ;; Align in Bits
1377  i64 0,             ;; Offset in Bits
1378  i32 0,             ;; Flags
1379  i32 8              ;; Encoding
1380}
1381</pre>
1382</div>
1383
1384</div>
1385
1386<!-- ======================================================================= -->
1387<h4>
1388  <a name="ccxx_basic_short">short</a>
1389</h4>
1390
1391<div>
1392
1393<div class="doc_code">
1394<pre>
1395!2 = metadata !{
1396  i32 524324,        ;; Tag
1397  metadata !1,       ;; Context
1398  metadata !"short int",
1399  metadata !1,       ;; File
1400  i32 0,             ;; Line number
1401  i64 16,            ;; Size in Bits
1402  i64 16,            ;; Align in Bits
1403  i64 0,             ;; Offset in Bits
1404  i32 0,             ;; Flags
1405  i32 5              ;; Encoding
1406}
1407</pre>
1408</div>
1409
1410</div>
1411
1412<!-- ======================================================================= -->
1413<h4>
1414  <a name="ccxx_basic_unsigned_short">unsigned short</a>
1415</h4>
1416
1417<div>
1418
1419<div class="doc_code">
1420<pre>
1421!2 = metadata !{
1422  i32 524324,        ;; Tag
1423  metadata !1,       ;; Context
1424  metadata !"short unsigned int",
1425  metadata !1,       ;; File
1426  i32 0,             ;; Line number
1427  i64 16,            ;; Size in Bits
1428  i64 16,            ;; Align in Bits
1429  i64 0,             ;; Offset in Bits
1430  i32 0,             ;; Flags
1431  i32 7              ;; Encoding
1432}
1433</pre>
1434</div>
1435
1436</div>
1437
1438<!-- ======================================================================= -->
1439<h4>
1440  <a name="ccxx_basic_int">int</a>
1441</h4>
1442
1443<div>
1444
1445<div class="doc_code">
1446<pre>
1447!2 = metadata !{
1448  i32 524324,        ;; Tag
1449  metadata !1,       ;; Context
1450  metadata !"int",   ;; Name
1451  metadata !1,       ;; File
1452  i32 0,             ;; Line number
1453  i64 32,            ;; Size in Bits
1454  i64 32,            ;; Align in Bits
1455  i64 0,             ;; Offset in Bits
1456  i32 0,             ;; Flags
1457  i32 5              ;; Encoding
1458}
1459</pre></div>
1460
1461</div>
1462
1463<!-- ======================================================================= -->
1464<h4>
1465  <a name="ccxx_basic_unsigned_int">unsigned int</a>
1466</h4>
1467
1468<div>
1469
1470<div class="doc_code">
1471<pre>
1472!2 = metadata !{
1473  i32 524324,        ;; Tag
1474  metadata !1,       ;; Context
1475  metadata !"unsigned int",
1476  metadata !1,       ;; File
1477  i32 0,             ;; Line number
1478  i64 32,            ;; Size in Bits
1479  i64 32,            ;; Align in Bits
1480  i64 0,             ;; Offset in Bits
1481  i32 0,             ;; Flags
1482  i32 7              ;; Encoding
1483}
1484</pre>
1485</div>
1486
1487</div>
1488
1489<!-- ======================================================================= -->
1490<h4>
1491  <a name="ccxx_basic_long_long">long long</a>
1492</h4>
1493
1494<div>
1495
1496<div class="doc_code">
1497<pre>
1498!2 = metadata !{
1499  i32 524324,        ;; Tag
1500  metadata !1,       ;; Context
1501  metadata !"long long int",
1502  metadata !1,       ;; File
1503  i32 0,             ;; Line number
1504  i64 64,            ;; Size in Bits
1505  i64 64,            ;; Align in Bits
1506  i64 0,             ;; Offset in Bits
1507  i32 0,             ;; Flags
1508  i32 5              ;; Encoding
1509}
1510</pre>
1511</div>
1512
1513</div>
1514
1515<!-- ======================================================================= -->
1516<h4>
1517  <a name="ccxx_basic_unsigned_long_long">unsigned long long</a>
1518</h4>
1519
1520<div>
1521
1522<div class="doc_code">
1523<pre>
1524!2 = metadata !{
1525  i32 524324,        ;; Tag
1526  metadata !1,       ;; Context
1527  metadata !"long long unsigned int",
1528  metadata !1,       ;; File
1529  i32 0,             ;; Line number
1530  i64 64,            ;; Size in Bits
1531  i64 64,            ;; Align in Bits
1532  i64 0,             ;; Offset in Bits
1533  i32 0,             ;; Flags
1534  i32 7              ;; Encoding
1535}
1536</pre>
1537</div>
1538
1539</div>
1540
1541<!-- ======================================================================= -->
1542<h4>
1543  <a name="ccxx_basic_float">float</a>
1544</h4>
1545
1546<div>
1547
1548<div class="doc_code">
1549<pre>
1550!2 = metadata !{
1551  i32 524324,        ;; Tag
1552  metadata !1,       ;; Context
1553  metadata !"float",
1554  metadata !1,       ;; File
1555  i32 0,             ;; Line number
1556  i64 32,            ;; Size in Bits
1557  i64 32,            ;; Align in Bits
1558  i64 0,             ;; Offset in Bits
1559  i32 0,             ;; Flags
1560  i32 4              ;; Encoding
1561}
1562</pre>
1563</div>
1564
1565</div>
1566
1567<!-- ======================================================================= -->
1568<h4>
1569  <a name="ccxx_basic_double">double</a>
1570</h4>
1571
1572<div>
1573
1574<div class="doc_code">
1575<pre>
1576!2 = metadata !{
1577  i32 524324,        ;; Tag
1578  metadata !1,       ;; Context
1579  metadata !"double",;; Name
1580  metadata !1,       ;; File
1581  i32 0,             ;; Line number
1582  i64 64,            ;; Size in Bits
1583  i64 64,            ;; Align in Bits
1584  i64 0,             ;; Offset in Bits
1585  i32 0,             ;; Flags
1586  i32 4              ;; Encoding
1587}
1588</pre>
1589</div>
1590
1591</div>
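
<p>The encoding field in the descriptors above uses the DWARF base type
   encodings.  As a reading aid, the sketch below maps the values that appear
   in this section to their DWARF names.  The function <tt>encodingName</tt>
   is a hypothetical helper and covers only the encodings used here.</p>

```cpp
#include <cassert>
#include <string>

// Map the encoding field of a basic type descriptor to its DWARF name.
// Only the encodings that appear in this section are listed.
std::string encodingName(unsigned encoding) {
  switch (encoding) {
  case 2: return "DW_ATE_boolean";
  case 4: return "DW_ATE_float";
  case 5: return "DW_ATE_signed";
  case 6: return "DW_ATE_signed_char";
  case 7: return "DW_ATE_unsigned";
  case 8: return "DW_ATE_unsigned_char";
  default: return "unknown";
  }
}
```

<p>Both <tt>float</tt> and <tt>double</tt> use encoding 4; they are told
   apart by their size in bits.</p>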
1592
1593</div>
1594
1595<!-- ======================================================================= -->
1596<h3>
1597  <a name="ccxx_derived_types">C/C++ derived types</a>
1598</h3>
1599
1600<div>
1601
1602<p>Given the following as an example of C/C++ derived type:</p>
1603
1604<div class="doc_code">
1605<pre>
1606typedef const int *IntPtr;
1607</pre>
1608</div>
1609
1610<p>a C/C++ front-end would generate the following descriptors:</p>
1611
1612<div class="doc_code">
1613<pre>
1614;;
1615;; Define the typedef "IntPtr".
1616;;
1617!2 = metadata !{
1618  i32 524310,          ;; Tag
1619  metadata !1,         ;; Context
1620  metadata !"IntPtr",  ;; Name
1621  metadata !3,         ;; File
1622  i32 0,               ;; Line number
1623  i64 0,               ;; Size in bits
1624  i64 0,               ;; Align in bits
1625  i64 0,               ;; Offset in bits
1626  i32 0,               ;; Flags
1627  metadata !4          ;; Derived From type
1628}
1629
1630;;
1631;; Define the pointer type.
1632;;
1633!4 = metadata !{
1634  i32 524303,          ;; Tag
1635  metadata !1,         ;; Context
1636  metadata !"",        ;; Name
1637  metadata !1,         ;; File
1638  i32 0,               ;; Line number
1639  i64 64,              ;; Size in bits
1640  i64 64,              ;; Align in bits
1641  i64 0,               ;; Offset in bits
1642  i32 0,               ;; Flags
1643  metadata !5          ;; Derived From type
1644}
1645;;
1646;; Define the const type.
1647;;
1648!5 = metadata !{
1649  i32 524326,          ;; Tag
1650  metadata !1,         ;; Context
1651  metadata !"",        ;; Name
1652  metadata !1,         ;; File
1653  i32 0,               ;; Line number
1654  i64 32,              ;; Size in bits
1655  i64 32,              ;; Align in bits
1656  i64 0,               ;; Offset in bits
1657  i32 0,               ;; Flags
1658  metadata !6          ;; Derived From type
1659}
1660;;
1661;; Define the int type.
1662;;
1663!6 = metadata !{
1664  i32 524324,          ;; Tag
1665  metadata !1,         ;; Context
1666  metadata !"int",     ;; Name
1667  metadata !1,         ;; File
1668  i32 0,               ;; Line number
1669  i64 32,              ;; Size in bits
1670  i64 32,              ;; Align in bits
1671  i64 0,               ;; Offset in bits
1672  i32 0,               ;; Flags
  i32 5                ;; Encoding
1674}
1675</pre>
1676</div>
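
<p>The descriptors above form a chain through their "Derived From" fields:
   typedef, pointer, const, and finally the base type.  The following sketch
   walks such a chain in a simplified model.  <tt>TypeDesc</tt> and
   <tt>describe</tt> are hypothetical names, the tags are the plain DWARF
   values with the version removed, and 0 stands for "no derived-from
   type".</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified type descriptor.
struct TypeDesc {
  unsigned tag;          // DWARF tag with the debug version stripped
  std::string name;      // may be empty
  unsigned derivedFrom;  // metadata number of the underlying type, 0 = none
};

// Follow the "derived from" links and spell out the chain of types.
// Assumes the chain contains no cycles.
std::string describe(unsigned id, const std::map<unsigned, TypeDesc> &types) {
  std::string out;
  while (id != 0) {
    auto it = types.find(id);
    if (it == types.end())
      break;
    const TypeDesc &t = it->second;
    if (!out.empty())
      out += " -> ";
    switch (t.tag) {
    case 22: out += "typedef " + t.name; break;  // DW_TAG_typedef
    case 15: out += "pointer"; break;            // DW_TAG_pointer_type
    case 38: out += "const"; break;              // DW_TAG_const_type
    case 36: out += "base " + t.name; break;     // DW_TAG_base_type
    default: out += "unknown"; break;
    }
    id = t.derivedFrom;
  }
  return out;
}
```

<p>Starting from descriptor <tt>!2</tt> in the example, the walk visits
   <tt>!4</tt>, <tt>!5</tt> and <tt>!6</tt> in turn.</p>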
1677
1678</div>
1679
1680<!-- ======================================================================= -->
1681<h3>
1682  <a name="ccxx_composite_types">C/C++ struct/union types</a>
1683</h3>
1684
1685<div>
1686
1687<p>Given the following as an example of C/C++ struct type:</p>
1688
1689<div class="doc_code">
1690<pre>
1691struct Color {
1692  unsigned Red;
1693  unsigned Green;
1694  unsigned Blue;
1695};
1696</pre>
1697</div>
1698
1699<p>a C/C++ front-end would generate the following descriptors:</p>
1700
1701<div class="doc_code">
1702<pre>
1703;;
1704;; Define basic type for unsigned int.
1705;;
1706!5 = metadata !{
1707  i32 524324,        ;; Tag
1708  metadata !1,       ;; Context
1709  metadata !"unsigned int",
1710  metadata !1,       ;; File
1711  i32 0,             ;; Line number
1712  i64 32,            ;; Size in Bits
1713  i64 32,            ;; Align in Bits
1714  i64 0,             ;; Offset in Bits
1715  i32 0,             ;; Flags
1716  i32 7              ;; Encoding
1717}
1718;;
1719;; Define composite type for struct Color.
1720;;
1721!2 = metadata !{
1722  i32 524307,        ;; Tag
1723  metadata !1,       ;; Context
1724  metadata !"Color", ;; Name
1725  metadata !1,       ;; Compile unit
1726  i32 1,             ;; Line number
1727  i64 96,            ;; Size in bits
1728  i64 32,            ;; Align in bits
1729  i64 0,             ;; Offset in bits
1730  i32 0,             ;; Flags
1731  null,              ;; Derived From
1732  metadata !3,       ;; Elements
1733  i32 0              ;; Runtime Language
1734}
1735
1736;;
1737;; Define the Red field.
1738;;
1739!4 = metadata !{
1740  i32 524301,        ;; Tag
1741  metadata !1,       ;; Context
1742  metadata !"Red",   ;; Name
1743  metadata !1,       ;; File
1744  i32 2,             ;; Line number
1745  i64 32,            ;; Size in bits
1746  i64 32,            ;; Align in bits
1747  i64 0,             ;; Offset in bits
1748  i32 0,             ;; Flags
1749  metadata !5        ;; Derived From type
1750}
1751
1752;;
1753;; Define the Green field.
1754;;
1755!6 = metadata !{
1756  i32 524301,        ;; Tag
1757  metadata !1,       ;; Context
1758  metadata !"Green", ;; Name
1759  metadata !1,       ;; File
1760  i32 3,             ;; Line number
1761  i64 32,            ;; Size in bits
1762  i64 32,            ;; Align in bits
  i64 32,            ;; Offset in bits
1764  i32 0,             ;; Flags
1765  metadata !5        ;; Derived From type
1766}
1767
1768;;
1769;; Define the Blue field.
1770;;
1771!7 = metadata !{
1772  i32 524301,        ;; Tag
1773  metadata !1,       ;; Context
1774  metadata !"Blue",  ;; Name
1775  metadata !1,       ;; File
1776  i32 4,             ;; Line number
1777  i64 32,            ;; Size in bits
1778  i64 32,            ;; Align in bits
  i64 64,            ;; Offset in bits
1780  i32 0,             ;; Flags
1781  metadata !5        ;; Derived From type
1782}
1783
1784;;
1785;; Define the array of fields used by the composite type Color.
1786;;
1787!3 = metadata !{metadata !4, metadata !6, metadata !7}
1788</pre>
1789</div>
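<p>The size, alignment and offset fields in these descriptors are expressed in
bits, so they can be cross-checked against the layout a compiler actually
produces. The following is a minimal C++ sketch, assuming that struct Color is
declared with three unsigned members and that unsigned is 32 bits wide on the
target, as the descriptors above imply. The constant names are illustrative
only.</p>

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Assumed source declaration matching the descriptors above.
struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};

// Sizes and offsets converted from bytes to bits, the unit used by the metadata.
constexpr std::uint64_t kColorSizeInBits   = sizeof(Color) * CHAR_BIT;
constexpr std::uint64_t kColorAlignInBits  = alignof(Color) * CHAR_BIT;
constexpr std::uint64_t kRedOffsetInBits   = offsetof(Color, Red) * CHAR_BIT;
constexpr std::uint64_t kGreenOffsetInBits = offsetof(Color, Green) * CHAR_BIT;
constexpr std::uint64_t kBlueOffsetInBits  = offsetof(Color, Blue) * CHAR_BIT;
```

<p>On a target with a 32 bit unsigned type these constants should match the
"Size in bits", "Align in bits" and "Offset in bits" fields shown above.</p>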
1790
1791</div>
1792
1793<!-- ======================================================================= -->
1794<h3>
1795  <a name="ccxx_enumeration_types">C/C++ enumeration types</a>
1796</h3>
1797
1798<div>
1799
<p>Given the following as an example of a C/C++ enumeration type:</p>
1801
1802<div class="doc_code">
1803<pre>
1804enum Trees {
1805  Spruce = 100,
1806  Oak = 200,
1807  Maple = 300
1808};
1809</pre>
1810</div>
1811
1812<p>a C/C++ front-end would generate the following descriptors:</p>
1813
1814<div class="doc_code">
1815<pre>
1816;;
1817;; Define composite type for enum Trees
1818;;
1819!2 = metadata !{
1820  i32 524292,        ;; Tag
1821  metadata !1,       ;; Context
1822  metadata !"Trees", ;; Name
1823  metadata !1,       ;; File
1824  i32 1,             ;; Line number
1825  i64 32,            ;; Size in bits
1826  i64 32,            ;; Align in bits
1827  i64 0,             ;; Offset in bits
1828  i32 0,             ;; Flags
1829  null,              ;; Derived From type
1830  metadata !3,       ;; Elements
1831  i32 0              ;; Runtime language
1832}
1833
1834;;
1835;; Define the array of enumerators used by composite type Trees.
1836;;
1837!3 = metadata !{metadata !4, metadata !5, metadata !6}
1838
1839;;
1840;; Define Spruce enumerator.
1841;;
1842!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
1843
1844;;
1845;; Define Oak enumerator.
1846;;
1847!5 = metadata !{i32 524328, metadata !"Oak", i64 200}
1848
1849;;
1850;; Define Maple enumerator.
1851;;
1852!6 = metadata !{i32 524328, metadata !"Maple", i64 300}
1853
1854</pre>
1855</div>
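<p>The tag values in these descriptors combine a debug version with a DWARF tag.
The sketch below splits a tag into its two parts, assuming the version occupies
the upper 16 bits (LLVMDebugVersion = 8 &lt;&lt; 16 = 524288) and the DWARF tag
the lower 16 bits. The helper names are illustrative and not part of any LLVM
API.</p>

```cpp
#include <cstdint>

// Assumption: the upper 16 bits hold the debug version and the lower
// 16 bits hold the DWARF tag (DW_TAG_xxx).
constexpr std::uint32_t kVersionMask = 0xffff0000u;

// Returns the version part of a descriptor tag, still shifted into place.
constexpr std::uint32_t debugVersion(std::uint32_t tag) {
  return tag & kVersionMask;
}

// Returns the plain DWARF tag value carried by a descriptor tag.
constexpr std::uint32_t dwarfTag(std::uint32_t tag) {
  return tag & ~kVersionMask;
}
```

<p>Under that assumption, 524292 decodes to 0x04 (DW_TAG_enumeration_type) and
524328 decodes to 0x28 (DW_TAG_enumerator).</p>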
1856
1857</div>
1858
1859</div>
1860
1861
1862<!-- *********************************************************************** -->
1863<h2>
1864  <a name="llvmdwarfextension">Debugging information format</a>
1865</h2>
1866<!-- *********************************************************************** -->
1867<div>
1868<!-- ======================================================================= -->
1869<h3>
1870  <a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
1871</h3>
1872<div>
1873<!-- *********************************************************************** -->
1874<h4>
1875  <a name="objcpropertyintroduction">Introduction</a>
1876</h4>
1877<!-- *********************************************************************** -->
1878
1879<div>
1880<p>Objective C provides a simpler way to declare and define accessor methods
1881using declared properties. The language provides features to declare a
property and to let the compiler synthesize accessor methods.
1883</p>
1884
<p>The debugger lets developers inspect Objective C interfaces and their
instance variables and class variables. However, the debugger does not know
anything about the properties defined in Objective C interfaces. The debugger
consumes information generated by the compiler in DWARF format. The format does
1889not support encoding of Objective C properties. This proposal describes DWARF
1890extensions to encode Objective C properties, which the debugger can use to let
1891developers inspect Objective C properties.
1892</p>
1893
1894</div>
1895
1896
1897<!-- *********************************************************************** -->
1898<h4>
1899  <a name="objcpropertyproposal">Proposal</a>
1900</h4>
1901<!-- *********************************************************************** -->
1902
1903<div>
<p>Objective C properties exist separately from class members. A property
can be defined only by &quot;setter&quot; and &quot;getter&quot; selectors and
be calculated anew on each access. Alternatively, a property can simply be a
direct access to some declared ivar. Finally, it can have an ivar
&quot;automatically synthesized&quot; for it by the compiler, in which case the
property can be referred to in user code directly using the standard C
dereference syntax as well as through the property &quot;dot&quot; syntax, but
there is no entry in the @interface declaration corresponding to this ivar.
1912</p>
1913<p>
To facilitate debugging these properties, we will add a new DWARF TAG into the
1915DW_TAG_structure_type definition for the class to hold the description of a
1916given property, and a set of DWARF attributes that provide said description.
1917The property tag will also contain the name and declared type of the property.
1918</p>
1919<p>
1920If there is a related ivar, there will also be a DWARF property attribute placed
1921in the DW_TAG_member DIE for that ivar referring back to the property TAG for
1922that property. And in the case where the compiler synthesizes the ivar directly,
1923the compiler is expected to generate a DW_TAG_member for that ivar (with the
1924DW_AT_artificial set to 1), whose name will be the name used to access this
1925ivar directly in code, and with the property attribute pointing back to the
1926property it is backing.
1927</p>
1928<p>
1929The following examples will serve as illustration for our discussion:
1930</p>
1931
1932<div class="doc_code">
1933<pre>
1934@interface I1 {
1935  int n2;
1936}
1937
1938@property int p1;
1939@property int p2;
1940@end
1941
1942@implementation I1
1943@synthesize p1;
1944@synthesize p2 = n2;
1945@end
1946</pre>
1947</div>
1948
1949<p>
1950This produces the following DWARF (this is a &quot;pseudo dwarfdump&quot; output):
1951</p>
1952<div class="doc_code">
1953<pre>
19540x00000100:  TAG_structure_type [7] *
1955               AT_APPLE_runtime_class( 0x10 )
1956               AT_name( "I1" )
1957               AT_decl_file( "Objc_Property.m" )
1958               AT_decl_line( 3 )
1959
0x00000110:   TAG_APPLE_property
1961                AT_name ( "p1" )
1962                AT_type ( {0x00000150} ( int ) )
1963
19640x00000120:   TAG_APPLE_property
1965                AT_name ( "p2" )
1966                AT_type ( {0x00000150} ( int ) )
1967
19680x00000130:   TAG_member [8]
1969                AT_name( "_p1" )
1970                AT_APPLE_property ( {0x00000110} "p1" )
1971                AT_type( {0x00000150} ( int ) )
1972                AT_artificial ( 0x1 )
1973
19740x00000140:    TAG_member [8]
1975                 AT_name( "n2" )
1976                 AT_APPLE_property ( {0x00000120} "p2" )
1977                 AT_type( {0x00000150} ( int ) )
1978
19790x00000150:  AT_type( ( int ) )
1980</pre>
1981</div>
1982
1983<p> Note, the current convention is that the name of the ivar for an
1984auto-synthesized property is the name of the property from which it derives with
1985an underscore prepended, as is shown in the example.
1986But we actually don't need to know this convention, since we are given the name
1987of the ivar directly.
1988</p>
1989
1990<p>
1991Also, it is common practice in ObjC to have different property declarations in
1992the @interface and @implementation - e.g. to provide a read-only property in
the interface, and a read-write property in the implementation.  In that case,
1994the compiler should emit whichever property declaration will be in force in the
1995current translation unit.
1996</p>
1997
1998<p> Developers can decorate a property with attributes which are encoded using
1999DW_AT_APPLE_property_attribute.
2000</p>
2001
2002<div class="doc_code">
2003<pre>
2004@property (readonly, nonatomic) int pr;
2005</pre>
2006</div>
<p>
This produces the following property tag:
</p>
2010<div class="doc_code">
2011<pre>
2012TAG_APPLE_property [8]
2013  AT_name( "pr" )
2014  AT_type ( {0x00000147} (int) )
2015  AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
2016</pre>
2017</div>
2018
2019<p> The setter and getter method names are attached to the property using
2020DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
2021</p>
2022<div class="doc_code">
2023<pre>
2024@interface I1
2025@property (setter=myOwnP3Setter:) int p3;
2026-(void)myOwnP3Setter:(int)a;
2027@end
2028
2029@implementation I1
2030@synthesize p3;
2031-(void)myOwnP3Setter:(int)a{ }
2032@end
2033</pre>
2034</div>
2035
2036<p>
2037The DWARF for this would be:
2038</p>
2039<div class="doc_code">
2040<pre>
20410x000003bd: TAG_structure_type [7] *
2042              AT_APPLE_runtime_class( 0x10 )
2043              AT_name( "I1" )
2044              AT_decl_file( "Objc_Property.m" )
2045              AT_decl_line( 3 )
2046
0x000003cd:     TAG_APPLE_property
2048                  AT_name ( "p3" )
2049                  AT_APPLE_property_setter ( "myOwnP3Setter:" )
2050                  AT_type( {0x00000147} ( int ) )
2051
20520x000003f3:     TAG_member [8]
2053                  AT_name( "_p3" )
2054                  AT_type ( {0x00000147} ( int ) )
2055                  AT_APPLE_property ( {0x000003cd} )
2056                  AT_artificial ( 0x1 )
2057</pre>
2058</div>
2059
2060</div>
2061
2062<!-- *********************************************************************** -->
2063<h4>
2064  <a name="objcpropertynewtags">New DWARF Tags</a>
2065</h4>
2066<!-- *********************************************************************** -->
2067
2068<div>
2069<table border="1" cellspacing="0">
2070  <col width="200">
2071  <col width="200">
2072  <tr>
2073    <th>TAG</th>
2074    <th>Value</th>
2075  </tr>
2076  <tr>
2077    <td>DW_TAG_APPLE_property</td>
2078    <td>0x4200</td>
2079  </tr>
2080</table>
2081
2082</div>
2083
2084<!-- *********************************************************************** -->
2085<h4>
2086  <a name="objcpropertynewattributes">New DWARF Attributes</a>
2087</h4>
2088<!-- *********************************************************************** -->
2089
2090<div>
2091<table border="1" cellspacing="0">
2092  <col width="200">
2093  <col width="200">
2094  <col width="200">
2095  <tr>
2096    <th>Attribute</th>
2097    <th>Value</th>
2098    <th>Classes</th>
2099  </tr>
2100  <tr>
2101    <td>DW_AT_APPLE_property</td>
2102    <td>0x3fed</td>
2103    <td>Reference</td>
2104  </tr>
2105  <tr>
2106    <td>DW_AT_APPLE_property_getter</td>
2107    <td>0x3fe9</td>
2108    <td>String</td>
2109  </tr>
2110  <tr>
2111    <td>DW_AT_APPLE_property_setter</td>
2112    <td>0x3fea</td>
2113    <td>String</td>
2114  </tr>
2115  <tr>
2116    <td>DW_AT_APPLE_property_attribute</td>
2117    <td>0x3feb</td>
2118    <td>Constant</td>
2119  </tr>
2120</table>
2121
2122</div>
2123
2124<!-- *********************************************************************** -->
2125<h4>
2126  <a name="objcpropertynewconstants">New DWARF Constants</a>
2127</h4>
2128<!-- *********************************************************************** -->
2129
2130<div>
2131<table border="1" cellspacing="0">
2132  <col width="200">
2133  <col width="200">
2134  <tr>
2135    <th>Name</th>
2136    <th>Value</th>
2137  </tr>
2138  <tr>
    <td>DW_APPLE_PROPERTY_readonly</td>
    <td>0x1</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_readwrite</td>
    <td>0x2</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_assign</td>
    <td>0x4</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_retain</td>
    <td>0x8</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_copy</td>
    <td>0x10</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_nonatomic</td>
2160    <td>0x20</td>
2161  </tr>
2162</table>
2163
2164</div>
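<p>The constants in the table above are distinct powers of two, which suggests
that several property attributes are combined into one value. The sketch below
assumes they are OR'ed together into a single constant; the enumerator and
function names are made up for illustration and are not the names used by any
DWARF consumer.</p>

```cpp
#include <cstdint>

// Values copied from the constants table above; the names are illustrative.
enum ApplePropertyAttribute : std::uint32_t {
  kPropertyReadonly  = 0x01,
  kPropertyReadwrite = 0x02,
  kPropertyAssign    = 0x04,
  kPropertyRetain    = 0x08,
  kPropertyCopy      = 0x10,
  kPropertyNonatomic = 0x20
};

// Tests whether one attribute bit is set in an encoded attribute value.
constexpr bool hasAttribute(std::uint32_t encoded, ApplePropertyAttribute a) {
  return (encoded & a) != 0;
}

// Assumed encoding for "@property (readonly, nonatomic) int pr;".
constexpr std::uint32_t kPrAttributes = kPropertyReadonly | kPropertyNonatomic;
```

<p>With this encoding, the "pr" property from the earlier example would carry
the value 0x21.</p>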
2165</div>
2166
2167<!-- ======================================================================= -->
2168<h3>
2169  <a name="acceltable">Name Accelerator Tables</a>
2170</h3>
2171<!-- ======================================================================= -->
2172<div>
2173<!-- ======================================================================= -->
2174<h4>
2175  <a name="acceltableintroduction">Introduction</a>
2176</h4>
2177<!-- ======================================================================= -->
2178<div>
2179<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
2180  needs. The "pub" in the section name indicates that the entries in the
2181  table are publicly visible names only. This means no static or hidden
2182  functions show up in the .debug_pubnames. No static variables or private class
2183  variables are in the .debug_pubtypes. Many compilers add different things to
2184  these tables, so we can't rely upon the contents between gcc, icc, or clang.</p>
2185
2186<p>The typical query given by users tends not to match up with the contents of
2187  these tables. For example, the DWARF spec states that "In the case of the
2188  name of a function member or static data member of a C++ structure, class or
2189  union, the name presented in the .debug_pubnames section is not the simple
2190  name given by the DW_AT_name attribute of the referenced debugging information
2191  entry, but rather the fully qualified name of the data or function member."
  So the only name in these tables for a complex C++ entry is its fully
  qualified name.  Debugger users tend not to enter their search strings as
2194  "a::b::c(int,const Foo&) const", but rather as "c", "b::c" , or "a::b::c".  So
2195  the name entered in the name table must be demangled in order to chop it up
2196  appropriately and additional names must be manually entered into the table
2197  to make it effective as a name lookup table for debuggers to use.</p>
2198
2199<p>All debuggers currently ignore the .debug_pubnames table as a result of
  its inconsistent and useless public-only name content, making it a waste of
2201  space in the object file. These tables, when they are written to disk, are
2202  not sorted in any way, leaving every debugger to do its own parsing
2203  and sorting. These tables also include an inlined copy of the string values
2204  in the table itself making the tables much larger than they need to be on
2205  disk, especially for large C++ programs.</p>
2206
2207<p>Can't we just fix the sections by adding all of the names we need to this
2208  table? No, because that is not what the tables are defined to contain and we
2209  won't know the difference between the old bad tables and the new good tables.
2210  At best we could make our own renamed sections that contain all of the data
2211  we need.</p>
2212
2213<p>These tables are also insufficient for what a debugger like LLDB needs.
2214  LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
2215  then often asked to look for type "foo" or namespace "bar", or list items in
2216  namespace "baz". Namespaces are not included in the pubnames or pubtypes
2217  tables. Since clang asks a lot of questions when it is parsing an expression,
2218  we need to be very fast when looking up names, as it happens a lot. Having new
2219  accelerator tables that are optimized for very quick lookups will benefit
2220  this type of debugging experience greatly.</p>
2221
2222<p>We would like to generate name lookup tables that can be mapped into
  memory from disk, and used as is, with little or no up-front parsing. We would
  also like to control the exact content of these different tables so they
2225  contain exactly what we need. The Name Accelerator Tables were designed
2226  to fix these issues. In order to solve these issues we need to:</p>
2227
2228<ul>
  <li>Have a format that can be mapped into memory from disk and used as is</li>
  <li>Make lookups very fast</li>
  <li>Use an extensible table format so these tables can be made by many producers</li>
  <li>Contain all of the names needed for typical lookups out of the box</li>
  <li>Enforce strict rules for the contents of tables</li>
2234</ul>
2235
2236<p>Table size is important and the accelerator table format should allow the
2237  reuse of strings from common string tables so the strings for the names are
2238  not duplicated. We also want to make sure the table is ready to be used as-is
2239  by simply mapping the table into memory with minimal header parsing.</p>
2240
2241<p>The name lookups need to be fast and optimized for the kinds of lookups
2242  that debuggers tend to do. Optimally we would like to touch as few parts of
2243  the mapped table as possible when doing a name lookup and be able to quickly
2244  find the name entry we are looking for, or discover there are no matches. In
2245  the case of debuggers we optimized for lookups that fail most of the time.</p>
2246
<p>Each table that is defined should have strict, documented rules on exactly
  what it contains so clients can rely on the content.</p>
2249
2250</div>
2251
2252<!-- ======================================================================= -->
2253<h4>
2254  <a name="acceltablehashes">Hash Tables</a>
2255</h4>
2256<!-- ======================================================================= -->
2257
2258<div>
2259<h5>Standard Hash Tables</h5>
2260
<p>Typical hash tables have a header and buckets, and each bucket points to
the bucket contents:
2263</p>
2264
2265<div class="doc_code">
2266<pre>
2267.------------.
2268|  HEADER    |
2269|------------|
2270|  BUCKETS   |
2271|------------|
2272|  DATA      |
2273`------------'
2274</pre>
2275</div>
2276
2277<p>The BUCKETS are an array of offsets to DATA for each hash:</p>
2278
2279<div class="doc_code">
2280<pre>
2281.------------.
2282| 0x00001000 | BUCKETS[0]
2283| 0x00002000 | BUCKETS[1]
2284| 0x00002200 | BUCKETS[2]
2285| 0x000034f0 | BUCKETS[3]
2286|            | ...
2287| 0xXXXXXXXX | BUCKETS[n_buckets]
2288'------------'
2289</pre>
2290</div>
2291
<p>So for bucket[3] in the example above, we have an offset into the table,
  0x000034f0, which points to a chain of entries for the bucket. Each entry
  in the chain must contain a next pointer, the full 32 bit hash value, the
  string itself, and the data for the current string value.</p>
2296
2297<div class="doc_code">
2298<pre>
2299            .------------.
23000x000034f0: | 0x00003500 | next pointer
2301            | 0x12345678 | 32 bit hash
2302            | "erase"    | string value
2303            | data[n]    | HashData for this bucket
2304            |------------|
23050x00003500: | 0x00003550 | next pointer
2306            | 0x29273623 | 32 bit hash
2307            | "dump"     | string value
2308            | data[n]    | HashData for this bucket
2309            |------------|
23100x00003550: | 0x00000000 | next pointer
2311            | 0x82638293 | 32 bit hash
2312            | "main"     | string value
2313            | data[n]    | HashData for this bucket
2314            `------------'
2315</pre>
2316</div>
2317
<p>The problem with this layout for debuggers is that we need to optimize for
  the negative lookup case where the symbol we're searching for is not present.
  So if we were to look up "printf" in the table above, we would make a 32 bit
  hash for "printf", and it might match bucket[3]. We would need to go to the
  offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
  so, we need to read the next pointer, then read the hash, compare it, and skip
  to the next entry in the chain. Each time we are skipping many bytes in memory
  and touching new cache pages just to do the compare on the full 32 bit hash.
  All of these accesses then tell us that we didn't have a match.</p>
2327
2328<h5>Name Hash Tables</h5>
2329
2330<p>To solve the issues mentioned above we have structured the hash tables
2331  a bit differently: a header, buckets, an array of all unique 32 bit hash
2332  values, followed by an array of hash value data offsets, one for each hash
2333  value, then the data for all hash values:</p>
2334
2335<div class="doc_code">
2336<pre>
2337.-------------.
2338|  HEADER     |
2339|-------------|
2340|  BUCKETS    |
2341|-------------|
2342|  HASHES     |
2343|-------------|
2344|  OFFSETS    |
2345|-------------|
2346|  DATA       |
2347`-------------'
2348</pre>
2349</div>
2350
<p>Each entry in the BUCKETS array of a name table is an index into the HASHES array. By
2352  making all of the full 32 bit hash values contiguous in memory, we allow
2353  ourselves to efficiently check for a match while touching as little
2354  memory as possible. Most often checking the 32 bit hash values is as far as
2355  the lookup goes. If it does match, it usually is a match with no collisions.
2356  So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
2357  values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>
2358
2359<div class="doc_code">
2360<pre>
2361.-------------------------.
2362|  HEADER.magic           | uint32_t
2363|  HEADER.version         | uint16_t
2364|  HEADER.hash_function   | uint16_t
2365|  HEADER.bucket_count    | uint32_t
2366|  HEADER.hashes_count    | uint32_t
2367|  HEADER.header_data_len | uint32_t
2368|  HEADER_DATA            | HeaderData
2369|-------------------------|
2370|  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
2371|-------------------------|
|  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
|-------------------------|
|  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
2375|-------------------------|
2376|  ALL HASH DATA          |
2377`-------------------------'
2378</pre>
2379</div>
2380
2381<p>So taking the exact same data from the standard hash example above we end up
2382  with:</p>
2383
2384<div class="doc_code">
2385<pre>
2386            .------------.
2387            | HEADER     |
2388            |------------|
2389            |          0 | BUCKETS[0]
2390            |          2 | BUCKETS[1]
2391            |          5 | BUCKETS[2]
2392            |          6 | BUCKETS[3]
2393            |            | ...
2394            |        ... | BUCKETS[n_buckets]
2395            |------------|
2396            | 0x........ | HASHES[0]
2397            | 0x........ | HASHES[1]
2398            | 0x........ | HASHES[2]
2399            | 0x........ | HASHES[3]
2400            | 0x........ | HASHES[4]
2401            | 0x........ | HASHES[5]
2402            | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
2403            | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
2404            | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
2405            | 0x........ | HASHES[9]
2406            | 0x........ | HASHES[10]
2407            | 0x........ | HASHES[11]
2408            | 0x........ | HASHES[12]
2409            | 0x........ | HASHES[13]
2410            | 0x........ | HASHES[n_hashes]
2411            |------------|
2412            | 0x........ | OFFSETS[0]
2413            | 0x........ | OFFSETS[1]
2414            | 0x........ | OFFSETS[2]
2415            | 0x........ | OFFSETS[3]
2416            | 0x........ | OFFSETS[4]
2417            | 0x........ | OFFSETS[5]
2418            | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
2419            | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
2420            | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
2421            | 0x........ | OFFSETS[9]
2422            | 0x........ | OFFSETS[10]
2423            | 0x........ | OFFSETS[11]
2424            | 0x........ | OFFSETS[12]
2425            | 0x........ | OFFSETS[13]
2426            | 0x........ | OFFSETS[n_hashes]
2427            |------------|
2428            |            |
2429            |            |
2430            |            |
2431            |            |
2432            |            |
2433            |------------|
0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
2435            | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
2436            | 0x........ | HashData[0]
2437            | 0x........ | HashData[1]
2438            | 0x........ | HashData[2]
2439            | 0x........ | HashData[3]
2440            | 0x00000000 | String offset into .debug_str (terminate data for hash)
2441            |------------|
24420x00003500: | 0x00001203 | String offset into .debug_str ("collision")
2443            | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
2444            | 0x........ | HashData[0]
2445            | 0x........ | HashData[1]
2446            | 0x00001203 | String offset into .debug_str ("dump")
2447            | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
2448            | 0x........ | HashData[0]
2449            | 0x........ | HashData[1]
2450            | 0x........ | HashData[2]
2451            | 0x00000000 | String offset into .debug_str (terminate data for hash)
2452            |------------|
24530x00003550: | 0x00001203 | String offset into .debug_str ("main")
2454            | 0x00000009 | A 32 bit array count - number of HashData with name "main"
2455            | 0x........ | HashData[0]
2456            | 0x........ | HashData[1]
2457            | 0x........ | HashData[2]
2458            | 0x........ | HashData[3]
2459            | 0x........ | HashData[4]
2460            | 0x........ | HashData[5]
2461            | 0x........ | HashData[6]
2462            | 0x........ | HashData[7]
2463            | 0x........ | HashData[8]
2464            | 0x00000000 | String offset into .debug_str (terminate data for hash)
2465            `------------'
2466</pre>
2467</div>
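<p>The hash data in the diagram above can be walked with a small loop. The
sketch below is illustrative only: it assumes every HashData is a single 32 bit
word (the real size depends on the atoms declared in the header), that the data
has already been loaded as host-order 32 bit words, and that a zero string
offset terminates the data for one hash. The type and function names are
hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One name found in the data for a hash value.
struct NameEntry {
  std::uint32_t str_offset;              // offset of the name in .debug_str
  std::vector<std::uint32_t> hash_data;  // HashData words for that name
};

// Walks the data for one hash value, starting at word index `pos`.
// Each record is: string offset, count, then `count` HashData words.
// A string offset of zero ends the data for this hash.
std::vector<NameEntry> readHashData(const std::vector<std::uint32_t>& data,
                                    std::size_t pos) {
  std::vector<NameEntry> entries;
  while (pos < data.size() && data[pos] != 0) {
    NameEntry entry;
    entry.str_offset = data[pos++];
    if (pos >= data.size()) break;  // truncated input: no count word
    const std::uint32_t count = data[pos++];
    for (std::uint32_t i = 0; i < count && pos < data.size(); ++i)
      entry.hash_data.push_back(data[pos++]);
    entries.push_back(entry);
  }
  return entries;
}
```

<p>For the data at 0x00003500 in the diagram, such a walk would yield two
entries, one for "collision" and one for "dump", because both names share the
same 32 bit hash value.</p>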
2468
2469<p>So we still have all of the same data, we just organize it more efficiently
2470  for debugger lookup. If we repeat the same "printf" lookup from above, we
2471  would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
2472  value and modulo it by n_buckets. BUCKETS[3] contains "6" which is the index
  into the HASHES table. We would then compare any consecutive 32 bit hash
  values in the HASHES array as long as the hashes would be in BUCKETS[3]. We
2475  do this by verifying that each subsequent hash value modulo n_buckets is still
2476  3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
2477  then compare a few consecutive 32 bit hashes before we know that we have no match.
2478  We don't end up marching through multiple words of memory and we really keep the
2479  number of processor data cache lines being accessed as small as possible.</p>
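<p>The lookup just described can be sketched in C++ as follows. This is a
minimal illustration, assuming the BUCKETS and HASHES arrays have already been
loaded into vectors in host byte order; the function name and the not-found
sentinel are hypothetical, and empty buckets are taken to hold UINT32_MAX.</p>

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEmptyBucket = UINT32_MAX;  // marks a bucket with no hashes
constexpr std::uint32_t kNotFound = UINT32_MAX;     // hypothetical "no match" result

// Returns the index into hashes[] (and therefore offsets[]) whose value
// equals `hash`, or kNotFound if the table has no such hash.
std::uint32_t findHashIndex(const std::vector<std::uint32_t>& buckets,
                            const std::vector<std::uint32_t>& hashes,
                            std::uint32_t hash) {
  if (buckets.empty()) return kNotFound;
  const std::uint32_t n_buckets = static_cast<std::uint32_t>(buckets.size());
  const std::uint32_t bucket = hash % n_buckets;
  std::uint32_t i = buckets[bucket];
  if (i == kEmptyBucket) return kNotFound;
  // Scan consecutive hashes for as long as they still belong to this bucket.
  for (; i < hashes.size(); ++i) {
    if (hashes[i] == hash) return i;
    if (hashes[i] % n_buckets != bucket) break;  // reached the next bucket's hashes
  }
  return kNotFound;
}
```

<p>A failed lookup reads one bucket entry and a short run of adjacent 32 bit
values, which is the access pattern the layout is designed for.</p>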
2480
2481<p>The string hash that is used for these lookup tables is the Daniel J.
2482  Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
2483  good hash for all kinds of names in programs with very few hash collisions.</p>
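<p>For reference, a common formulation of the Bernstein string hash is sketched
below. It assumes the widely used variant that starts at 5381 and multiplies by
33 before adding each byte; this text does not spell out the exact variant, so
treat the seed and the function name as assumptions.</p>

```cpp
#include <cstdint>
#include <string>

// DJB string hash: start at 5381, then hash = hash * 33 + byte for each byte.
// Arithmetic wraps modulo 2^32 because the accumulator is a uint32_t.
std::uint32_t djbHash(const std::string& name) {
  std::uint32_t h = 5381;
  for (unsigned char c : name)
    h = (h << 5) + h + c;  // (h << 5) + h is h * 33
  return h;
}
```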
2484
2485<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
2486</div>
2487
2488<!-- ======================================================================= -->
2489<h4>
2490  <a name="acceltabledetails">Details</a>
2491</h4>
2492<!-- ======================================================================= -->
2493<div>
2494<p>These name hash tables are designed to be generic where specializations of
2495  the table get to define additional data that goes into the header
2496  ("HeaderData"), how the string value is stored ("KeyType") and the content
2497  of the data for each hash value.</p>
2498
2499<h5>Header Layout</h5>
2500<p>The header has a fixed part, and the specialized part. The exact format of
2501  the header is:</p>
2502<div class="doc_code">
2503<pre>
2504struct Header
2505{
2506  uint32_t   magic;           // 'HASH' magic value to allow endian detection
2507  uint16_t   version;         // Version number
2508  uint16_t   hash_function;   // The hash function enumeration that was used
2509  uint32_t   bucket_count;    // The number of buckets in this hash table
2510  uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
2511  uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
2512                              // Specifically the length of the following HeaderData field - this does not
2513                              // include the size of the preceding fields
2514  HeaderData header_data;     // Implementation specific header data
2515};
2516</pre>
2517</div>
2518<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
2519  an ASCII integer. This allows the detection of the start of the hash table and
2520  also allows the table's byte order to be determined so the table can be
2521  correctly extracted. The "magic" value is followed by a 16 bit version number
2522  which allows the table to be revised and modified in the future. The current
2523  version number is 1. "hash_function" is a uint16_t enumeration that specifies
2524  which hash function was used to produce this table. The current values for the
2525  hash function enumerations include:</p>
2526<div class="doc_code">
2527<pre>
2528enum HashFunctionType
2529{
2530  eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
2531};
2532</pre>
2533</div>
2534<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
  are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
  values that are in the HASHES array, and is also the number of offsets
  contained in the OFFSETS array. "header_data_len" specifies the size in
2538  bytes of the HeaderData that is filled in by specialized versions of this
2539  table.</p>
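<p>The way the "magic" value reveals the table's byte order can be sketched as
follows. This is an illustration rather than the reader any debugger uses: it
assumes the fixed header fields are packed into 20 bytes with no padding and
that 'HASH' corresponds to the integer 0x48415348. The struct and function
names are hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kHashMagic = 0x48415348u;  // 'H' 'A' 'S' 'H' (assumed value)
constexpr std::size_t kFixedHeaderSize = 20;       // 4 + 2 + 2 + 4 + 4 + 4 bytes

struct FixedHeader {
  std::uint32_t magic;
  std::uint16_t version;
  std::uint16_t hash_function;
  std::uint32_t bucket_count;
  std::uint32_t hashes_count;
  std::uint32_t header_data_len;
  bool byte_swapped;  // true when the table's byte order differs from the host's
};

constexpr std::uint32_t swap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) | (v << 24);
}

constexpr std::uint16_t swap16(std::uint16_t v) {
  return static_cast<std::uint16_t>(((v >> 8) & 0x00ff) | ((v << 8) & 0xff00));
}

// Reads the fixed header fields from `data`. Returns false if the buffer is
// too small or the magic value is not recognized in either byte order.
bool parseFixedHeader(const unsigned char* data, std::size_t size, FixedHeader& out) {
  if (size < kFixedHeaderSize) return false;
  std::memcpy(&out.magic, data, 4);
  std::memcpy(&out.version, data + 4, 2);
  std::memcpy(&out.hash_function, data + 6, 2);
  std::memcpy(&out.bucket_count, data + 8, 4);
  std::memcpy(&out.hashes_count, data + 12, 4);
  std::memcpy(&out.header_data_len, data + 16, 4);
  out.byte_swapped = (out.magic == swap32(kHashMagic));
  if (!out.byte_swapped && out.magic != kHashMagic) return false;
  if (out.byte_swapped) {
    // The table was written with the opposite byte order: swap every field.
    out.magic = swap32(out.magic);
    out.version = swap16(out.version);
    out.hash_function = swap16(out.hash_function);
    out.bucket_count = swap32(out.bucket_count);
    out.hashes_count = swap32(out.hashes_count);
    out.header_data_len = swap32(out.header_data_len);
  }
  return true;
}
```

<p>A reader would then skip "header_data_len" bytes of HeaderData to reach the
BUCKETS array.</p>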
2540
2541<h5>Fixed Lookup</h5>
<p>The header is followed by the buckets, hashes, offsets, and hash value
  data.</p>
2544<div class="doc_code">
2545<pre>
2546struct FixedTable
2547{
2548  uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
2549  uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
2550  uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
2551};
2552</pre>
2553</div>
2554<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
2555  "hashes" array contains all of the 32 bit hash values for all names in the
2556  hash table. Each hash in the "hashes" table has an offset in the "offsets"
2557  array that points to the data for the hash value.</p>
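<p>Because all three arrays hold 32 bit values, their positions follow directly
from the header counts. The sketch below computes the byte offset of each array
from the start of the table, assuming the fixed header fields occupy 20 bytes
and the arrays follow HeaderData with no padding. The struct and function names
are illustrative.</p>

```cpp
#include <cstdint>

// Byte offsets, from the start of the table, of each part after the header.
struct TableLayout {
  std::uint64_t buckets;  // start of buckets[]
  std::uint64_t hashes;   // start of hashes[]
  std::uint64_t offsets;  // start of offsets[]
  std::uint64_t data;     // first byte after offsets[]
};

// Assumption: magic, version, hash_function, bucket_count, hashes_count and
// header_data_len are packed into 20 bytes, followed by header_data_len
// bytes of HeaderData and then the three arrays back to back.
TableLayout computeLayout(std::uint32_t bucket_count,
                          std::uint32_t hashes_count,
                          std::uint32_t header_data_len) {
  const std::uint64_t fixed_header_size = 20;
  TableLayout layout;
  layout.buckets = fixed_header_size + header_data_len;
  layout.hashes  = layout.buckets + 4ull * bucket_count;
  layout.offsets = layout.hashes + 4ull * hashes_count;
  layout.data    = layout.offsets + 4ull * hashes_count;
  return layout;
}
```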
2558
2559<p>This table setup makes it very easy to repurpose these tables to contain
2560  different data, while keeping the lookup mechanism the same for all tables.
2561  This layout also makes it possible to save the table to disk and map it in
2562  later and do very efficient name lookups with little or no parsing.</p>
2563
2564<p>DWARF lookup tables can be implemented in a variety of ways and can store
2565  a lot of information for each name. We want to make the DWARF tables
2566  extensible and able to store the data efficiently so we have used some of the
2567  DWARF features that enable efficient data storage to define exactly what kind
2568  of data we store for each name.</p>
2569
2570<p>The "HeaderData" contains a definition of the contents of each HashData
2571  chunk. We might want to store an offset to all of the debug information
2572  entries (DIEs) for each name. To keep things extensible, we create a list of
2573  items, or Atoms, that are contained in the data for each name. First comes the
2574  type of the data in each atom:</p>
2575<div class="doc_code">
2576<pre>
2577enum AtomType
2578{
2579  eAtomTypeNULL       = 0u,
2580  eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
  eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
2582  eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
2583  eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
2584  eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
2585};
2586</pre>
2587</div>
2588<p>The enumeration values and their meanings are:</p>
2589<div class="doc_code">
2590<pre>
2591  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
2592  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
2593  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
2595  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
2596  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
2597</pre>
2598</div>
<p>Each atom then specifies its type and how the data for that atom is
  encoded:</p>
2601<div class="doc_code">
2602<pre>
2603struct Atom
2604{
2605  uint16_t type;  // AtomType enum value
2606  uint16_t form;  // DWARF DW_FORM_XXX defines
2607};
2608</pre>
2609</div>
2610<p>The "form" type above is from the DWARF specification and defines the
2611  exact encoding of the data for the Atom type. See the DWARF specification for
2612  the DW_FORM_ definitions.</p>
2613<div class="doc_code">
2614<pre>
2615struct HeaderData
2616{
2617  uint32_t die_offset_base;
2618  uint32_t atom_count;
  Atom     atoms[atom_count];
2620};
2621</pre>
2622</div>
2623<p>"HeaderData" defines the base DIE offset that should be added to any atoms
2624  that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
2625  DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
2626  each "HashData" object -- Atom.form tells us how large each field will be in
2627  the HashData and the Atom.type tells us how this data should be interpreted.</p>
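<p>As a minimal sketch of how a consumer might use the atom list, the code
  below computes the size of one HashData entry from the forms of its atoms and
  identifies the reference forms that need "die_offset_base" added. The helper
  names (fixedFormSize, hashDataSize, needsDIEOffsetBase) are hypothetical and
  not part of any LLVM API; the DW_FORM_ values are the encodings from the
  DWARF specification, and only fixed-size forms are sized here.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// DW_FORM_* encodings as listed in the DWARF specification.
enum : uint16_t {
  DW_FORM_data2     = 0x05,
  DW_FORM_data4     = 0x06,
  DW_FORM_data8     = 0x07,
  DW_FORM_data1     = 0x0b,
  DW_FORM_ref1      = 0x11,
  DW_FORM_ref2      = 0x12,
  DW_FORM_ref4      = 0x13,
  DW_FORM_ref8      = 0x14,
  DW_FORM_ref_udata = 0x15,
};

struct Atom {
  uint16_t type;  // AtomType enum value
  uint16_t form;  // DW_FORM_xxx value
};

// Hypothetical helper: byte size of a fixed-size form, or 0 if this sketch
// does not handle the form (for example the variable-length ref_udata).
std::size_t fixedFormSize(uint16_t form) {
  switch (form) {
  case DW_FORM_data1: case DW_FORM_ref1: return 1;
  case DW_FORM_data2: case DW_FORM_ref2: return 2;
  case DW_FORM_data4: case DW_FORM_ref4: return 4;
  case DW_FORM_data8: case DW_FORM_ref8: return 8;
  default: return 0;
  }
}

// Hypothetical helper: total size of one HashData entry, or 0 if any atom
// uses a form that fixedFormSize() cannot size.
std::size_t hashDataSize(const std::vector<Atom> &atoms) {
  std::size_t total = 0;
  for (const Atom &atom : atoms) {
    std::size_t size = fixedFormSize(atom.form);
    if (size == 0)
      return 0;
    total += size;
  }
  return total;
}

// True for the reference forms to which die_offset_base must be added.
bool needsDIEOffsetBase(uint16_t form) {
  return form >= DW_FORM_ref1 && form <= DW_FORM_ref_udata;
}
```

<p>With a table whose atoms are a DW_FORM_data4 DIE offset followed by a
  DW_FORM_data2 tag, each HashData entry in this sketch occupies six bytes.</p>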
2628
2629<p>For the current implementations of the ".apple_names" (all functions + globals),
2630  the ".apple_types" (names of all types that are defined), and the
2631  ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p>
2632<div class="doc_code">
2633<pre>
2634HeaderData.atom_count = 1;
2635HeaderData.atoms[0].type = eAtomTypeDIEOffset;
2636HeaderData.atoms[0].form = DW_FORM_data4;
2637</pre>
2638</div>
2639<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
2640  encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
2641  multiple matching DIEs in a single file, which could come up with an inlined
2642  function for instance. Future tables could include more information about the
2643  DIE such as flags indicating if the DIE is a function, method, block,
2644  or inlined.</p>
2645
2646<p>The KeyType for the DWARF table is a 32 bit string table offset into the
2647  ".debug_str" table. The ".debug_str" is the string table for the DWARF which
2648  may already contain copies of all of the strings. This helps make sure, with
2649  help from the compiler, that we reuse the strings between all of the DWARF
2650  sections and keeps the hash table size down. Another benefit to having the
2651  compiler generate all strings as DW_FORM_strp in the debug info, is that
2652  DWARF parsing can be made much faster.</p>
2653
2654<p>After a lookup is made, we get an offset into the hash data. The hash data
2655  needs to be able to deal with 32 bit hash collisions, so the chunk of data
2656  at the offset in the hash data consists of a triple:</p>
2657<div class="doc_code">
2658<pre>
2659uint32_t str_offset
2660uint32_t hash_data_count
2661HashData[hash_data_count]
2662</pre>
2663</div>
<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
  hash data chunks contain a single name (no 32 bit hash collision). In the
  example below, that name has four matching DIEs:</p>
2666<div class="doc_code">
2667<pre>
2668.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
2670| 0x00000004 | uint32_t HashData count
2671| 0x........ | uint32_t HashData[0] DIE offset
2672| 0x........ | uint32_t HashData[1] DIE offset
2673| 0x........ | uint32_t HashData[2] DIE offset
2674| 0x........ | uint32_t HashData[3] DIE offset
2675| 0x00000000 | uint32_t KeyType (end of hash chain)
2676`------------'
2677</pre>
2678</div>
2679<p>If there are collisions, you will have multiple valid string offsets:</p>
2680<div class="doc_code">
2681<pre>
2682.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
2684| 0x00000004 | uint32_t HashData count
2685| 0x........ | uint32_t HashData[0] DIE offset
2686| 0x........ | uint32_t HashData[1] DIE offset
2687| 0x........ | uint32_t HashData[2] DIE offset
2688| 0x........ | uint32_t HashData[3] DIE offset
| 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
2690| 0x00000002 | uint32_t HashData count
2691| 0x........ | uint32_t HashData[0] DIE offset
2692| 0x........ | uint32_t HashData[1] DIE offset
2693| 0x00000000 | uint32_t KeyType (end of hash chain)
2694`------------'
2695</pre>
2696</div>
<p>Current testing with real world C++ binaries has shown that there is
  roughly one 32 bit hash collision per 100,000 name entries.</p>
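<p>The chain layout above can be walked with a short loop. The following is a
  sketch only, under several assumptions: the chunk has already been read into
  host-endian 32 bit words, each HashData entry is a single DW_FORM_data4 DIE
  offset (the current atom setup), and names are matched by their ".debug_str"
  offset. A real consumer would compare the string contents instead. The
  function name findDIEOffsets is hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks one hash data chunk laid out as repeated
//   (str_offset, hash_data_count, HashData[hash_data_count])
// groups, terminated by a str_offset of zero, and returns the DIE offsets
// stored for wanted_str_offset. Returns an empty vector if the name is not
// in the chain or the chunk is truncated.
std::vector<uint32_t> findDIEOffsets(const std::vector<uint32_t> &chain,
                                     uint32_t wanted_str_offset) {
  std::vector<uint32_t> result;
  std::size_t i = 0;
  while (i < chain.size() && chain[i] != 0) {
    uint32_t str_offset = chain[i++];
    if (i >= chain.size())
      break;                       // truncated: no count word
    uint32_t count = chain[i++];
    if (count > chain.size() - i)
      break;                       // truncated: fewer entries than claimed
    if (str_offset == wanted_str_offset) {
      result.assign(chain.begin() + static_cast<std::ptrdiff_t>(i),
                    chain.begin() + static_cast<std::ptrdiff_t>(i + count));
      return result;
    }
    i += count;                    // hash collision: skip to the next name
  }
  return result;
}
```

<p>For a chain holding "main" at string offset 0x1023 followed by "print" at
  0x2023, a lookup for 0x2023 skips the four entries for "main" and returns the
  two entries stored for "print".</p>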
2699</div>
2700<!-- ======================================================================= -->
2701<h4>
2702  <a name="acceltablecontents">Contents</a>
2703</h4>
2704<!-- ======================================================================= -->
2705<div>
<p>As stated above, we want to define exactly what is included in the
  different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
  and ".apple_namespaces".</p>
2709
2710<p>".apple_names" sections should contain an entry for each DWARF DIE whose
2711  DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
2712  has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
2713  DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
2714  in the location (global and static variables). All global and static variables
2715  should be included, including those scoped within functions and classes. For
  example, using the following code:</p>
2717<div class="doc_code">
2718<pre>
2719static int var = 0;
2720
2721void f ()
2722{
2723  static int var = 0;
2724}
2725</pre>
2726</div>
2727<p>Both of the static "var" variables would be included in the table. All
2728  functions should emit both their full names and their basenames. For C or C++,
2729  the full name is the mangled name (if available) which is usually in the
2730  DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
2731  basename. If global or static variables have a mangled name in a
2732  DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
2733  simple name found in the DW_AT_name attribute.</p>
2734
2735<p>".apple_types" sections should contain an entry for each DWARF DIE whose
2736  tag is one of:</p>
2737<ul>
2738  <li>DW_TAG_array_type</li>
2739  <li>DW_TAG_class_type</li>
2740  <li>DW_TAG_enumeration_type</li>
2741  <li>DW_TAG_pointer_type</li>
2742  <li>DW_TAG_reference_type</li>
2743  <li>DW_TAG_string_type</li>
2744  <li>DW_TAG_structure_type</li>
2745  <li>DW_TAG_subroutine_type</li>
2746  <li>DW_TAG_typedef</li>
2747  <li>DW_TAG_union_type</li>
2748  <li>DW_TAG_ptr_to_member_type</li>
2749  <li>DW_TAG_set_type</li>
2750  <li>DW_TAG_subrange_type</li>
2751  <li>DW_TAG_base_type</li>
2752  <li>DW_TAG_const_type</li>
2753  <li>DW_TAG_constant</li>
2754  <li>DW_TAG_file_type</li>
2755  <li>DW_TAG_namelist</li>
2756  <li>DW_TAG_packed_type</li>
2757  <li>DW_TAG_volatile_type</li>
2758  <li>DW_TAG_restrict_type</li>
2759  <li>DW_TAG_interface_type</li>
2760  <li>DW_TAG_unspecified_type</li>
2761  <li>DW_TAG_shared_type</li>
2762</ul>
2763<p>Only entries with a DW_AT_name attribute are included, and the entry must
2764  not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
2765  For example, using the following code:</p>
2766<div class="doc_code">
2767<pre>
2768int main ()
2769{
2770  int *b = 0;
2771  return *b;
2772}
2773</pre>
2774</div>
2775<p>We get a few type DIEs:</p>
2776<div class="doc_code">
2777<pre>
27780x00000067:     TAG_base_type [5]
2779                AT_encoding( DW_ATE_signed )
2780                AT_name( "int" )
2781                AT_byte_size( 0x04 )
2782
27830x0000006e:     TAG_pointer_type [6]
2784                AT_type( {0x00000067} ( int ) )
2785                AT_byte_size( 0x08 )
2786</pre>
2787</div>
2788<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>
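<p>The inclusion rule for ".apple_types" can be sketched as a simple
  predicate. The function names below (isAppleTypesTag, belongsInAppleTypes)
  are hypothetical, the DW_TAG_ values are the encodings from the DWARF
  specification, and the caller is assumed to have already determined whether
  the DIE has a DW_AT_name and whether it is a forward declaration.</p>

```cpp
#include <cstdint>

// DW_TAG_* encodings as listed in the DWARF specification.
enum : uint16_t {
  DW_TAG_array_type         = 0x01,
  DW_TAG_class_type         = 0x02,
  DW_TAG_enumeration_type   = 0x04,
  DW_TAG_pointer_type       = 0x0f,
  DW_TAG_reference_type     = 0x10,
  DW_TAG_string_type        = 0x12,
  DW_TAG_structure_type     = 0x13,
  DW_TAG_subroutine_type    = 0x15,
  DW_TAG_typedef            = 0x16,
  DW_TAG_union_type         = 0x17,
  DW_TAG_ptr_to_member_type = 0x1f,
  DW_TAG_set_type           = 0x20,
  DW_TAG_subrange_type      = 0x21,
  DW_TAG_base_type          = 0x24,
  DW_TAG_const_type         = 0x26,
  DW_TAG_constant           = 0x27,
  DW_TAG_file_type          = 0x29,
  DW_TAG_namelist           = 0x2b,
  DW_TAG_packed_type        = 0x2d,
  DW_TAG_volatile_type      = 0x35,
  DW_TAG_restrict_type      = 0x37,
  DW_TAG_interface_type     = 0x38,
  DW_TAG_unspecified_type   = 0x3b,
  DW_TAG_shared_type        = 0x40,
};

// True if the tag is one of those listed for ".apple_types".
bool isAppleTypesTag(uint16_t tag) {
  switch (tag) {
  case DW_TAG_array_type:       case DW_TAG_class_type:
  case DW_TAG_enumeration_type: case DW_TAG_pointer_type:
  case DW_TAG_reference_type:   case DW_TAG_string_type:
  case DW_TAG_structure_type:   case DW_TAG_subroutine_type:
  case DW_TAG_typedef:          case DW_TAG_union_type:
  case DW_TAG_ptr_to_member_type: case DW_TAG_set_type:
  case DW_TAG_subrange_type:    case DW_TAG_base_type:
  case DW_TAG_const_type:       case DW_TAG_constant:
  case DW_TAG_file_type:        case DW_TAG_namelist:
  case DW_TAG_packed_type:      case DW_TAG_volatile_type:
  case DW_TAG_restrict_type:    case DW_TAG_interface_type:
  case DW_TAG_unspecified_type: case DW_TAG_shared_type:
    return true;
  default:
    return false;
  }
}

// A DIE gets an entry only if its tag is listed, it has a DW_AT_name,
// and it is not a forward declaration.
bool belongsInAppleTypes(uint16_t tag, bool has_name, bool is_declaration) {
  return isAppleTypesTag(tag) && has_name && !is_declaration;
}
```

<p>Applied to the example above, the "int" base type qualifies, while the
  unnamed pointer type does not.</p>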
2789
<p>The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs.
  A namespace that has no name is an anonymous namespace, and its name should
  be output as "(anonymous namespace)" (without the quotes). Why? This matches
  the output of abi::__cxa_demangle(), the function in the standard C++ library
  that demangles mangled names.</p>
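<p>A minimal sketch of that naming rule follows. The function name
  namespaceTableName is hypothetical, and an empty string is assumed to stand
  for a DW_TAG_namespace DIE with no DW_AT_name.</p>

```cpp
#include <string>

// Returns the key to store in ".apple_namespaces" for a namespace DIE.
// An empty dw_at_name means the DIE has no DW_AT_name attribute.
std::string namespaceTableName(const std::string &dw_at_name) {
  if (dw_at_name.empty())
    return "(anonymous namespace)";
  return dw_at_name;
}
```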
2795</div>
2796
2797<!-- ======================================================================= -->
2798<h4>
2799  <a name="acceltableextensions">Language Extensions and File Format Changes</a>
2800</h4>
2801<!-- ======================================================================= -->
2802<div>
2803<h5>Objective-C Extensions</h5>
<p>The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
  Objective-C class. The name used in the hash table is the name of the
  Objective-C class itself. If the Objective-C class has a category, then an
  entry is made both for the class name without the category and for the class
  name with the category. So if we have a DIE at offset 0x1234 for the method
  "-[NSString(my_additions) stringWithSpecialString:]", we would add
  an entry for "NSString" that points to DIE 0x1234, and an entry for
  "NSString(my_additions)" that points to 0x1234. This allows us to quickly
  track down all Objective-C methods for an Objective-C class when evaluating
  expressions. It is needed because of the dynamic nature of Objective-C, where
  anyone can add methods to a class. The DWARF for Objective-C methods is also
  emitted differently from that for C++ classes: the methods are not usually
  contained in the class definition, but are scattered across one or more
  compile units. Categories can also be defined in different shared libraries.
  So we need to be able to quickly find all of the methods and class functions
  given the Objective-C class name, or given a class + category name. This
  table does not contain any selector names; it just maps Objective-C class
  names (or class names + category) to all of the methods and class functions.
  The selectors are added as function basenames in the ".apple_names"
  section.</p>
2824
2825<p>In the ".apple_names" section for Objective-C functions, the full name is the
2826  entire function name with the brackets ("-[NSString stringWithCString:]") and the
2827  basename is the selector only ("stringWithCString:").</p>
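<p>The names derived from an Objective-C method can be sketched as a small
  string split. This is an illustration only: the struct and function names
  (ObjCMethodNames, splitObjCMethodName) are hypothetical, and the input is
  assumed to be a well-formed full name of the form "-[Class selector]" or
  "+[Class(category) selector]".</p>

```cpp
#include <cstddef>
#include <string>

struct ObjCMethodNames {
  std::string class_name;           // key for ".apple_objc"
  std::string class_with_category;  // second ".apple_objc" key, empty if none
  std::string selector;             // basename for ".apple_names"
};

// Splits a full Objective-C method name into the names described above.
// Returns false if the input does not look like "-[Class selector]".
bool splitObjCMethodName(const std::string &full, ObjCMethodNames &out) {
  if (full.size() < 5 || (full[0] != '-' && full[0] != '+') ||
      full[1] != '[' || full[full.size() - 1] != ']')
    return false;
  std::size_t space = full.find(' ');
  if (space == std::string::npos || space <= 2)
    return false;
  std::string cls = full.substr(2, space - 2);
  // Everything between the space and the closing bracket is the selector.
  std::string sel = full.substr(space + 1, full.size() - space - 2);
  if (sel.empty())
    return false;

  out = ObjCMethodNames();
  out.selector = sel;
  std::size_t paren = cls.find('(');
  if (paren != std::string::npos) {
    out.class_name = cls.substr(0, paren);
    out.class_with_category = cls;
  } else {
    out.class_name = cls;
  }
  return true;
}
```

<p>For "-[NSString(my_additions) stringWithSpecialString:]" this yields the
  ".apple_objc" keys "NSString" and "NSString(my_additions)", and the selector
  "stringWithSpecialString:".</p>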
2828
2829<h5>Mach-O Changes</h5>
<p>The section names given above for the Apple hash tables are for non-Mach-O
  files. For Mach-O files, the sections should be contained in the "__DWARF"
  segment with names as follows:</p>
2833<ul>
2834  <li>".apple_names" -> "__apple_names"</li>
2835  <li>".apple_types" -> "__apple_types"</li>
2836  <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li>
  <li>".apple_objc" -> "__apple_objc"</li>
2838</ul>
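<p>The mapping in the list above follows a simple pattern: the leading "." is
  replaced with "__" and the result is cut to 16 characters. A sketch of that
  pattern is shown below; the function name machOSectionName is hypothetical
  and the rule is inferred from the four names listed, not taken from any
  LLVM API.</p>

```cpp
#include <string>

// Derives the Mach-O section name from the non-Mach-O section name by
// replacing a leading '.' with "__" and truncating to 16 characters.
std::string machOSectionName(const std::string &name) {
  std::string result = name;
  if (!result.empty() && result[0] == '.')
    result.replace(0, 1, "__");
  if (result.size() > 16)
    result.resize(16);
  return result;
}
```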
2839</div>
2840</div>
2841</div>
2842
2843<!-- *********************************************************************** -->
2844
2845<hr>
2846<address>
2847  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2848  src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
2849  <a href="http://validator.w3.org/check/referer"><img
2850  src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
2851
2852  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2853  <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
2854  Last modified: $Date$
2855</address>
2856
2857</body>
2858</html>
2859