Notes on the GNU Implementation of DWARF Debugging Information
--------------------------------------------------------------
Last Updated: Sun Jul 17 08:17:42 PDT 1994 by rfg@segfault.us.com
------------------------------------------------------------

This file describes special and unique aspects of the GNU implementation
of the DWARF debugging information language, as provided in the GNU version
2.x compiler(s).

For general information about the DWARF debugging information language,
you should obtain the DWARF version 1 specification document (and perhaps
also the DWARF version 2 draft specification document) developed by the
UNIX International Programming Languages Special Interest Group. A copy
of the DWARF version 1 specification (in PostScript form) may be
obtained either from me <rfg@netcom.com> or from the main Data General
FTP server. (See below.) The file you are looking at now only describes
known deviations from the DWARF version 1 specification, together with
those things which are allowed by the DWARF version 1 specification but
which are known to cause interoperability problems (e.g. with SVR4 SDB).

To obtain a copy of the DWARF Version 1 and/or DWARF Version 2 specification
from Data General's FTP server, use the following procedure:

---------------------------------------------------------------------------
    ftp to machine: "dg-rtp.dg.com" (128.222.1.2).

    Log in as "ftp".
    cd to "plsig"
    get any of the following files you are interested in:

    dwarf.1.0.3.ps
    dwarf.2.0.0.index.ps
    dwarf.2.0.0.ps
---------------------------------------------------------------------------

The generation of DWARF debugging information by the GNU version 2.x C
compiler has now been tested rather extensively for m88k, i386, i860, and
Sparc targets.
The DWARF output of the GNU C compiler appears to interoperate
well with the standard SVR4 SDB debugger on these kinds of target
systems (but of course, there are no guarantees).

DWARF generation for the GNU g++ compiler is still not operable. This is
due primarily to the many remaining cases where the g++ front end does not
conform to the conventions used in the GNU C front end for representing
various kinds of declarations in the TREE data structure. It is not clear
at this time how these problems will be addressed.

Future plans for the dwarfout.c module of the GNU compiler(s) include the
addition of full support for GNU FORTRAN. (This should, in theory, be a
lot simpler to add than adding support for g++... but we'll see.)

Many features of the DWARF version 2 specification have been adapted to
(and used in) the GNU implementation of DWARF (version 1). In most of
these cases, a DWARF version 2 approach is used in place of (or in addition
to) DWARF version 1 stuff simply because it is apparent that DWARF version
1 is not sufficiently expressive to provide the kinds of information which
may be necessary to support really robust debugging. In all of these cases
however, the use of DWARF version 2 features should not interfere in any
way with the interoperability (of GNU compilers) with generally available
"classic" (pre version 1) DWARF consumer tools (e.g. SVR4 SDB).

The DWARF generation enhancement for the GNU compiler(s) was initially
donated to the Free Software Foundation by Network Computing Devices.
(Thanks NCD!) Additional development and maintenance of dwarfout.c has
been largely supported (i.e. funded) by Intel Corporation. (Thanks Intel!)

If you have questions or comments about the DWARF generation feature, please
send mail to me <rfg@netcom.com>. I will be happy to investigate any bugs
reported and I may even provide fixes (but of course, I can make no promises).

The DWARF debugging information produced by GCC may deviate in a few minor
(but perhaps significant) respects from the DWARF debugging information
currently produced by other C compilers. A serious attempt has been made
however to conform to the published specifications, to existing practice,
and to generally accepted norms in the GNU implementation of DWARF.

    ** IMPORTANT NOTE **    ** IMPORTANT NOTE **    ** IMPORTANT NOTE **

Under normal circumstances, the DWARF information generated by the GNU
compilers (in an assembly language file) is essentially impossible for
a human being to read. This fact can make it very difficult to debug
certain DWARF-related problems. In order to overcome this difficulty,
a feature has been added to dwarfout.c (enabled by the -fverbose-asm
option) which causes additional comments to be placed into the assembly
language output file, out to the right-hand side of most bits of DWARF
material. The comments indicate (far more clearly than the obscure
DWARF hex codes do) what is actually being encoded in DWARF. Thus, the
-fverbose-asm option can be highly useful for those who must study the
DWARF output from the GNU compilers in detail.

---------

(Footnote: Within this file, the term `Debugging Information Entry' will
be abbreviated as `DIE'.)


Release Notes (aka known bugs)
-------------------------------

In one very obscure case involving dynamically sized arrays, the DWARF
"location information" for such an array may make it appear that the
array has been totally optimized out of existence, when in fact it
*must* actually exist. (This only happens when you are using *both* -g
*and* -O.) This is due to aggressive dead store elimination in the
compiler, and to the fact that the DECL_RTL expressions associated with
variables are not always updated to correctly reflect the effects of
GCC's aggressive dead store elimination.

-------------------------------

When attempting to set a breakpoint at the "start" of a function compiled
with -g1, the debugger currently has no way of knowing exactly where the
end of the prologue code for the function is. Thus, for most targets,
all the debugger can do is to set the breakpoint at the AT_low_pc address
for the function. But if you stop there and then try to look at one or
more of the formal parameter values, they may not have been "homed" yet,
so you may get inaccurate answers (or perhaps even addressing errors).

Some people may consider this simply a non-feature, but I consider it a
bug, and I hope to provide some GNU-specific attributes (on function
DIEs) which will specify the address of the end of the prologue and the
address of the beginning of the epilogue in a future release.

-------------------------------

It is believed at this time that old bugs relating to the AT_bit_offset
values for bit-fields have been fixed.

There may still be some very obscure bugs relating to the DWARF description
of type `long long' bit-fields for target machines (e.g. 80x86 machines)
where the alignment of type `long long' data objects is different from
(and less than) the size of a type `long long' data object.

Please report any problems with the DWARF description of bit-fields as you
would any other GCC bug. (Procedures for bug reporting are given in the
GNU C compiler manual.)

--------------------------------

At this time, the GNU DWARF implementation does not know how to handle the
GNU C "nested functions" extension. (See the GCC manual for more info on
this extension to ANSI C.)

--------------------------------

The GNU compilers now represent inline functions (and inlined instances
thereof) in exactly the manner described by the current DWARF version 2
(draft) specification.
The version 1 specification for handling inline
functions (and inlined instances) was known to be brain-damaged (by the
PLSIG) when the version 1 spec was finalized, but it was simply too late
in the cycle to get it removed before the version 1 spec was formally
released to the public (by UI).

--------------------------------

At this time, GCC does not generate the kind of really precise information
about the exact declared types of entities with signed integral types which
is required by the current DWARF draft specification.

Specifically, the current DWARF draft specification seems to require that
the type of a non-unsigned integral bit-field member of a struct or union
type be represented as either a "signed" type or as a "plain" type,
depending upon the exact set of keywords that were used in the
type specification for the given bit-field member. It was felt (by the
UI/PLSIG) that this distinction between "plain" and "signed" integral types
could have some significance (in the case of bit-fields) because ANSI C
does not constrain the signedness of a plain bit-field, whereas it does
constrain the signedness of an explicitly "signed" bit-field. For this
reason, the current DWARF specification calls for compilers to produce
type information (for *all* integral typed entities... not just bit-fields)
which explicitly indicates the signedness of the relevant type to be
"signed" or "plain" or "unsigned".

Unfortunately, the GNU DWARF implementation is currently incapable of making
such distinctions.
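
To make the "plain" versus "signed" distinction concrete, here is a small
sketch in C. The struct and function names are invented for illustration
only; nothing here comes from dwarfout.c. The three members differ only in
the keywords used to declare them, which is exactly the information the
draft specification wants recorded.

```c
/* Three bit-fields that differ only in their declaring keywords. */
struct flags {
    int          plain : 3;  /* "plain": signedness is implementation-defined */
    signed int   sgn   : 3;  /* explicitly signed: holds -4 .. 3 */
    unsigned int uns   : 3;  /* explicitly unsigned: holds 0 .. 7 */
};

/* Store the all-ones 3-bit pattern in each member and hand it back. */
struct flags all_ones(void)
{
    struct flags f;

    f.plain = -1;  /* reads back as -1 or as 7, depending on the compiler */
    f.sgn   = -1;  /* always reads back as -1 */
    f.uns   = 7;   /* always reads back as 7 */
    return f;
}
```

A debugger that is told only "3-bit integer" for the `plain' member cannot
know which of the two readings the compiler chose, which is why the
distinction matters for bit-fields in particular.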

--------------------------------


Known Interoperability Problems
-------------------------------

Although the GNU implementation of DWARF conforms (for the most part) with
the current UI/PLSIG DWARF version 1 specification (with many compatible
version 2 features added in as "vendor specific extensions" just for good
measure) there are a few known cases where GCC's DWARF output can cause
some confusion for "classic" (pre version 1) DWARF consumers such as the
System V Release 4 SDB debugger. These cases are described in this section.

--------------------------------

The DWARF version 1 specification includes the fundamental type codes
FT_ext_prec_float, FT_complex, FT_dbl_prec_complex, and FT_ext_prec_complex.
Since GNU C is only a C compiler (and since C doesn't provide any "complex"
data types) the only one of these fundamental type codes which GCC ever
generates is FT_ext_prec_float. This fundamental type code is generated
by GCC for the `long double' data type. Unfortunately, due to an apparent
bug in the SVR4 SDB debugger, SDB can become very confused wherever any
attempt is made to print a variable, parameter, or field whose type was
given in terms of FT_ext_prec_float.

(Actually, SVR4 SDB fails to understand *any* of the four fundamental type
codes mentioned here. This fact will cause additional problems when
there is a GNU FORTRAN front-end.)

--------------------------------

In general, it appears that SVR4 SDB is not able to effectively ignore
fundamental type codes in the "implementation defined" range. This can
cause problems when a program being debugged uses the `long long' data
type (or the signed or unsigned varieties thereof) because these types
are not defined by ANSI C, and thus, GCC must use its own private fundamental
type codes (from the implementation-defined range) to represent these types.

--------------------------------


General GNU DWARF extensions
----------------------------

In the current DWARF version 1 specification, no mechanism is specified by
which accurate information about executable code from include files can be
properly (and fully) described. (The DWARF version 2 specification *does*
specify such a mechanism, but it is about 10 times more complicated than
it needs to be so I'm not terribly anxious to try to implement it right
away.)

In the GNU implementation of DWARF version 1, a fully downward-compatible
extension has been implemented which permits the GNU compilers to specify
which executable lines come from which files. This extension places
additional information (about source file names) in GNU-specific sections
(which should be totally ignored by all non-GNU DWARF consumers) so that
this extended information can be provided (to GNU DWARF consumers) in a way
which is totally transparent (and invisible) to non-GNU DWARF consumers
(e.g. the SVR4 SDB debugger). The additional information is placed *only*
in specialized GNU-specific sections, where it should never even be seen
by non-GNU DWARF consumers.

To understand this GNU DWARF extension, imagine that the sequence of entries
in the .line section is broken up into several subsections. Each contiguous
sequence of .line entries which relates to a sequence of lines (or statements)
from one particular file (either a `base' file or an `include' file) could
be called a `line entries chunk' (LEC).

For each LEC there is one entry in the .debug_srcinfo section.

Each normal entry in the .debug_srcinfo section consists of two 4-byte
words of data as follows:

    (1) The starting address (relative to the entire .line section)
        of the first .line entry in the relevant LEC.

    (2) The starting address (relative to the entire .debug_sfnames
        section) of a NUL terminated string representing the
        relevant filename. (This filename may be either a
        relative or an absolute filename, depending upon how the
        given source file was located during compilation.)

Obviously, each .debug_srcinfo entry allows you to find the relevant filename,
and it also points you to the first .line entry that was generated as a result
of having compiled a given source line from the given source file.

Each subsequent .line entry should also be assumed to have been produced
as a result of compiling yet more lines from the same file. The end of
any given LEC is easily found by looking at the first 4-byte pointer in
the *next* .debug_srcinfo entry. That next .debug_srcinfo entry points
to a new and different LEC, so the preceding LEC (implicitly) must have
ended with the last .line section entry which occurs at the 2 1/2 words
just before the address given in the first pointer of the new .debug_srcinfo
entry.

The following picture may help to clarify this feature. Let's assume that
`LE' stands for `.line entry'. Also, assume that `*' stands for a pointer.


    .line section      .debug_srcinfo section     .debug_sfnames section
    ----------------------------------------------------------------

    LE  <---------------------- *
    LE                          * -----------------> "foobar.c" <---
    LE                                                             |
    LE                                                             |
    LE  <---------------------- *                                  |
    LE                          * -----------------> "foobar.h" <| |
    LE                                                           | |
    LE                                                           | |
    LE  <---------------------- *                                | |
    LE                          * ----------------->  "inner.h"  | |
    LE                                                           | |
    LE  <---------------------- *                                | |
    LE                          * -------------------------------- |
    LE                                                             |
    LE                                                             |
    LE                                                             |
    LE                                                             |
    LE  <---------------------- *                                  |
    LE                          * ----------------------------------
    LE
    LE
    LE

In effect, each entry in the .debug_srcinfo section points to *both* a
filename (in the .debug_sfnames section) and to the start of a block of
consecutive LEs (in the .line section).

Note that just like in the .line section, there are specialized first and
last entries in the .debug_srcinfo section for each object file. These
special first and last entries for the .debug_srcinfo section are very
different from the normal .debug_srcinfo section entries. They provide
additional information which may be helpful to a debugger when it is
interpreting the data in the .debug_srcinfo, .debug_sfnames, and .line
sections.

The first entry in the .debug_srcinfo section for each compilation unit
consists of five 4-byte words of data. The contents of these five words
should be interpreted (by debuggers) as follows:

    (1) The starting address (relative to the entire .line section)
        of the .line section for this compilation unit.

    (2) The starting address (relative to the entire .debug_sfnames
        section) of the .debug_sfnames section for this compilation
        unit.

    (3) The starting address (in the execution virtual address space)
        of the .text section for this compilation unit.

    (4) The ending address plus one (in the execution virtual address
        space) of the .text section for this compilation unit.

    (5) The date/time (in seconds since midnight 1/1/70) at which the
        compilation of this compilation unit occurred. This value
        should be interpreted as an unsigned quantity because gcc
        might be configured to generate a default value of 0xffffffff
        in this field (in cases where it is desired to have object
        files created at different times from identical source files
        be byte-for-byte identical). By default, these timestamps
        are *not* generated by dwarfout.c (so that object files
        compiled at different times will be byte-for-byte identical).
        If you wish to enable this "timestamp" feature however, you
        can simply place a #define for the symbol `DWARF_TIMESTAMPS'
        in your target configuration file and then rebuild the GNU
        compiler(s).

Note that the first string placed into the .debug_sfnames section for each
compilation unit is the name of the directory in which compilation occurred.
This string ends with a `/' (to help indicate that it is the pathname of a
directory). Thus, the second word of each specialized initial .debug_srcinfo
entry for each compilation unit may be used as a pointer to the (string)
name of the compilation directory, and that string may in turn be used to
"absolutize" any relative pathnames which may appear later on in the
.debug_sfnames section entries for the same compilation unit.

The fifth and last word of each specialized starting entry for a compilation
unit in the .debug_srcinfo section may (depending upon your configuration)
indicate the date/time of compilation, and this may be used (by a debugger)
to determine if any of the source files which contributed code to this
compilation unit are newer than the object code for the compilation unit
itself.
If so, the debugger may wish to print an "out-of-date" warning
about the compilation unit.

The .debug_srcinfo section associated with each compilation will also have
a specialized terminating entry. This terminating .debug_srcinfo section
entry will consist of the following two 4-byte words of data:

    (1) The offset, measured from the start of the .line section to
        the beginning of the terminating entry for the .line section.

    (2) A word containing the value 0xffffffff.

--------------------------------

In the current DWARF version 1 specification, no mechanism is specified by
which information about macro definitions and un-definitions may be provided
to the DWARF consumer.

The DWARF version 2 (draft) specification does specify such a mechanism.
That specification was based on the GNU ("vendor specific extension")
which provided some support for macro definitions and un-definitions,
but the "official" DWARF version 2 (draft) specification mechanism for
handling macros and the GNU implementation have diverged somewhat. I
plan to update the GNU implementation to conform to the "official"
DWARF version 2 (draft) specification as soon as I get time to do that.

Note that in the GNU implementation, additional information about macro
definitions and un-definitions is *only* provided when the -g3 level of
debug-info production is selected. (The default level is -g2 and the
plain old -g option is considered to be identical to -g2.)

GCC records information about macro definitions and undefinitions primarily
in a section called the .debug_macinfo section. Normal entries in the
.debug_macinfo section consist of the following three parts:

    (1) A special "type" byte.

    (2) A 3-byte line-number/filename-offset field.

    (3) A NUL terminated string.
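
The three-part layout just listed can be sketched in C as follows. This is
only an illustration of the layout as described in this file, not code from
dwarfout.c or from any debugger: the names `macinfo_entry' and
`macinfo_split' are hypothetical, and the 3-byte field is deliberately kept
as raw bytes because this file does not state its byte order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one normal .debug_macinfo entry. */
struct macinfo_entry {
    unsigned char type;      /* part (1): the "type" byte */
    unsigned char field[3];  /* part (2): raw 3-byte field, byte order not assumed */
    const char   *string;    /* part (3): NUL terminated string (points into buf) */
};

/* Split the entry starting at buf into its three parts.  Returns the
   number of bytes the entry occupies, or 0 if the buffer is too short
   or the string is not NUL terminated within len bytes. */
size_t macinfo_split(const unsigned char *buf, size_t len,
                     struct macinfo_entry *out)
{
    const unsigned char *nul;

    if (len < 5)             /* 1 type byte + 3 field bytes + at least a NUL */
        return 0;
    nul = memchr(buf + 4, '\0', len - 4);
    if (nul == NULL)
        return 0;

    out->type = buf[0];
    memcpy(out->field, buf + 1, 3);
    out->string = (const char *)(buf + 4);
    return (size_t)(nul - buf) + 1;
}
```

A consumer would call macinfo_split repeatedly, advancing by the returned
length each time, and only then decide how to read the 3-byte field and
the string based on the type byte.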

The interpretation of the second and third parts is dependent upon the
value of the leading (type) byte.

The type byte may have one of four values depending upon the type of the
.debug_macinfo entry which follows. The 1-byte MACINFO type codes presently
used, and their meanings are as follows:

    MACINFO_start       A base file or an include file starts here.
    MACINFO_resume      The current base or include file ends here.
    MACINFO_define      A #define directive occurs here.
    MACINFO_undef       A #undef directive occurs here.

(Note that the MACINFO_... codes mentioned here are simply symbolic names
for constants which are defined in the GNU dwarf.h file.)

For MACINFO_define and MACINFO_undef entries, the second (3-byte) field
contains the number of the source line (relative to the start of the current
base source file or the current include file) at which the #define or #undef
directive appears. For a MACINFO_define entry, the following string field
contains the name of the macro which is defined, followed by its definition.
Note that the definition is always separated from the name of the macro
by at least one whitespace character. For a MACINFO_undef entry, the
string which follows the 3-byte line number field contains just the name
of the macro which is being undef'ed.

For a MACINFO_start entry, the 3-byte field following the type byte contains
the offset, relative to the start of the .debug_sfnames section for the
current compilation unit, of a string which names the new source file which
is beginning its inclusion at this point. Following that 3-byte field,
each MACINFO_start entry always contains a zero length NUL terminated
string.

For a MACINFO_resume entry, the 3-byte field following the type byte contains
the line number WITHIN THE INCLUDING FILE at which the inclusion of the
current file (whose inclusion ends here) was initiated.
Following that
3-byte field, each MACINFO_resume entry always contains a zero length NUL
terminated string.

Each set of .debug_macinfo entries for each compilation unit is terminated
by a special .debug_macinfo entry consisting of a 4-byte zero value followed
by a single NUL byte.

--------------------------------

In the current DWARF draft specification, no provision is made for providing
a separate level of (limited) debugging information necessary to support
tracebacks (only) through fully-debugged code (e.g. code in system libraries).

A proposal to define such a level was submitted (by me) to the UI/PLSIG.
This proposal was rejected by the UI/PLSIG for inclusion into the DWARF
version 1 specification for two reasons. First, it was felt (by the PLSIG)
that the issues involved in supporting a "traceback only" subset of DWARF
were not well understood. Second, and perhaps more importantly, the PLSIG
is already having enough trouble agreeing on what it means to be "conforming"
to the DWARF specification, and it was felt that trying to specify multiple
different *levels* of conformance would only complicate our discussions of
this already divisive issue. Nonetheless, the GNU implementation of DWARF
provides an abbreviated "traceback only" level of debug-info production for
use with fully-debugged "system library" code. This level should only be
used for fully debugged system library code, and even then, it should only
be used where there is a very strong need to conserve disk space. This
abbreviated level of debug-info production can be used by specifying the
-g1 option on the compilation command line.

--------------------------------

As mentioned above, the GNU implementation of DWARF currently uses the DWARF
version 2 (draft) approach for inline functions (and inlined instances
thereof).
This is used in preference to the version 1 approach because
(quite simply) the version 1 approach is highly brain-damaged and probably
unworkable.

--------------------------------


GNU DWARF Representation of GNU C Extensions to ANSI C
------------------------------------------------------

The file dwarfout.c has been designed and implemented so as to provide
some reasonable DWARF representation for each and every declarative
construct which is accepted by the GNU C compiler. Since the GNU C
compiler accepts a superset of ANSI C, this means that there are some
cases in which the DWARF information produced by GCC must take some
liberties in improvising DWARF representations for declarations which
are only valid in (extended) GNU C.

In particular, GNU C provides at least three significant extensions to
ANSI C when it comes to declarations. These are (1) inline functions,
(2) dynamic arrays, and (3) incomplete enum types. (See the GCC
manual for more information on these GNU extensions to ANSI C.) When
used, these GNU C extensions are represented (in the generated DWARF
output of GCC) in the most natural and intuitively obvious ways.

In the case of inline functions, the DWARF representation is exactly as
called for in the DWARF version 2 (draft) specification for an identical
function written in C++; i.e. we "reuse" the representation of inline
functions which has been defined for C++ to support this GNU C extension.

In the case of dynamic arrays, we use the most obvious representational
mechanism available; i.e. an array type in which the upper bound of
some dimension (usually the first and only dimension) is a variable
rather than a constant. (See the DWARF version 1 specification for more
details.)
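
For readers unfamiliar with the dynamic array extension, here is a minimal
sketch of the kind of declaration being discussed. The function name is made
up for illustration. The point is only that the bound of `squares' depends
on the parameter n, so it is known at run time and not at compile time.

```c
/* Sum the squares 0*0 + 1*1 + ... + (n-1)*(n-1) using a dynamic array.
   The caller is expected to pass n >= 1. */
long sum_of_squares(int n)
{
    long squares[n];   /* GNU C dynamic array: the bound is a variable */
    long total = 0;
    int i;

    for (i = 0; i < n; i++)
        squares[i] = (long)i * i;
    for (i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```

Under the representation described above, the array type of `squares' would
carry an upper bound given in terms of a variable rather than a constant.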

In the case of incomplete enum types, such types are represented simply
as TAG_enumeration_type DIEs which DO NOT contain either AT_byte_size
attributes or AT_element_list attributes.

--------------------------------


Future Directions
-----------------

The codes, formats, and other paraphernalia necessary to provide proper
support for symbolic debugging for the C++ language are still being worked
on by the UI/PLSIG. The vast majority of the additions to DWARF which will
be needed to completely support C++ have already been hashed out and agreed
upon, but a few small issues (e.g. anonymous unions, access declarations)
are still being discussed. Also, we in the PLSIG are still discussing
whether or not we need to do anything special for C++ templates. (At this
time it is not yet clear whether we even need to do anything special for
these.)

Unfortunately, as mentioned above, there are quite a few problems in the
g++ front end itself, and these are currently responsible for severely
restricting the progress which can be made on adding DWARF support
specifically for the g++ front-end. Furthermore, Richard Stallman has
expressed the view that C++ friendships might not be important enough to
describe (in DWARF). This view directly conflicts with both the DWARF
version 1 and version 2 (draft) specifications, so until this small
misunderstanding is cleared up, DWARF support for g++ is unlikely.

With regard to FORTRAN, the UI/PLSIG has defined what is believed to be a
complete and sufficient set of codes and rules for adequately representing
all of FORTRAN 77, and most of Fortran 90 in DWARF. While some support for
this has been implemented in dwarfout.c, further implementation and testing
will have to await the arrival of the GNU Fortran front-end (which is
currently in early alpha test as of this writing).

GNU DWARF support for other languages (i.e.
Pascal and Modula) is a moot
issue until there are GNU front-ends for these other languages.

GNU DWARF support for DWARF version 2 will probably not be attempted until
such time as the version 2 specification is finalized. (More work needs
to be done on the version 2 specification to make the new "abbreviations"
feature of version 2 more easily implementable. Until then, it will be
a royal pain in the ass to implement version 2 "abbreviations".) For the
time being, version 2 features will be added (in a version 1 compatible
manner) when and where these features seem necessary or extremely desirable.

As currently defined, DWARF only describes a (binary) language which can
be used to communicate symbolic debugging information from a compiler
through an assembler and a linker, to a debugger. There is no clear
specification of what processing should be (or must be) done by the
assembler and/or the linker. Fortunately, the role of the assembler
is easily inferred (by anyone knowledgeable about assemblers) just by
looking at examples of assembly-level DWARF code. Sadly though, the
allowable (or required) processing steps performed by a linker are
harder to infer and (perhaps) even harder to agree upon. There are
several forms of very useful `post-processing' steps which intelligent
linkers *could* (in theory) perform on object files containing DWARF,
but any and all such link-time transformations are currently both disallowed
and unspecified.

In particular, possible link-time transformations of DWARF code which could
provide significant benefits include (but are not limited to):

    Commonization of duplicate DIEs obtained from multiple input
    (object) files.

    Cross-compilation type checking based upon DWARF type information
    for objects and functions.

    Other possible `compacting' transformations designed to save disk
    space and to reduce linker & debugger I/O activity.