@section coff backends
BFD supports a number of different flavours of coff format.
The major differences between formats are the sizes and
alignments of fields in structures on disk, and the occasional
extra field.

Coff in all its varieties is implemented with a few common
files and a number of implementation specific files.  For
example, the i386 coff format is implemented in the file
@file{coff-i386.c}.  This file @code{#include}s
@file{coff/i386.h}, which defines the external structure of the
coff format for the i386, and @file{coff/internal.h}, which
defines the internal structure.  @file{coff-i386.c} also
defines the relocations used by the i386 coff format
(@pxref{Relocations}).

@subsection Porting to a new version of coff
The recommended method is to select from the existing
implementations the version of coff which is most like the one
you want to use.  For example, we'll say that i386 coff is
the one you select, and that your coff flavour is called foo.
Copy @file{coff-i386.c} to @file{coff-foo.c}, copy
@file{../include/coff/i386.h} to @file{../include/coff/foo.h},
and add the lines to @file{targets.c} and @file{Makefile.in}
so that your new back end is used.  Alter the shapes of the
structures in @file{../include/coff/foo.h} so that they match
what you need.  You will probably also have to add
@code{#ifdef}s to the code in @file{coff/internal.h} and
@file{coffcode.h} if your version of coff is too wild.

You can verify that your new BFD backend works quite simply by
building @file{objdump} from the @file{binutils} directory,
and making sure that its version of what's going on and your
host system's idea (assuming it has the pretty standard coff
dump utility, usually called @code{att-dump} or just
@code{dump}) are the same.  Then clean up your code, and send
what you've done to Cygnus.  Then your stuff will be in the
next release, and you won't have to keep integrating it.

@subsection How the coff backend works


@subsubsection File layout
The Coff backend is split into generic routines that are
applicable to any Coff target and routines that are specific
to a particular target.  The target-specific routines are
further split: some are basically the same for all Coff
targets, differing only in the external symbol format they use
or in the values of certain constants, while others are
peculiar to a single target.

The generic routines are in @file{coffgen.c}.  These routines
work for any Coff target.  They use some hooks into the target
specific code; the hooks are in a @code{bfd_coff_backend_data}
structure, one of which exists for each target.

The essentially similar target-specific routines are in
@file{coffcode.h}.  This header file includes executable C code.
The various Coff targets first include the appropriate Coff
header file, make any special defines that are needed, and
then include @file{coffcode.h}.

Some of the Coff targets then also have additional routines in
the target source file itself.

@subsubsection Coff long section names
In the standard Coff object format, section names are limited to
the eight bytes available in the @code{s_name} field of the
@code{SCNHDR} section header structure.  The format requires the
field to be NUL-padded, but not necessarily NUL-terminated, so
the longest section names permitted are a full eight characters.

The Microsoft PE variants of the Coff object file format add
an extension to support the use of long section names.  This
extension is defined in section 4 of the Microsoft PE/COFF
specification (rev 8.1).  If a section name is too long to fit
into the section header's @code{s_name} field, it is instead
placed into the string table, and the @code{s_name} field is
filled with a slash ("/") followed by the ASCII decimal
representation of the offset of the full name relative to the
string table base.
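
The encoding just described can be sketched in a few lines of C.  This is
only an illustration, not code taken from BFD: the helper names
@code{encode_section_name} and @code{decode_section_name} are hypothetical,
and the sketch assumes the caller has already placed the full name in the
string table and knows its offset.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCNNMLEN 8  /* Size of the s_name field in bytes.  */

/* Hypothetical helper: fill S_NAME for a section called NAME.  If NAME
   does not fit, encode STRTAB_OFFSET (the offset of the full name from
   the string table base) as "/" plus ASCII decimal digits.  Returns 0
   on success, -1 if the encoded offset does not fit in the field.  */
static int
encode_section_name (char s_name[SCNNMLEN], const char *name,
                     unsigned long strtab_offset)
{
  size_t len = strlen (name);
  char buf[32];
  int n;

  memset (s_name, 0, SCNNMLEN);       /* NUL-pad the field.  */
  if (len <= SCNNMLEN)
    {
      memcpy (s_name, name, len);     /* May use all eight bytes.  */
      return 0;
    }

  n = snprintf (buf, sizeof buf, "/%lu", strtab_offset);
  if (n < 0 || n > SCNNMLEN)
    return -1;
  memcpy (s_name, buf, (size_t) n);
  return 0;
}

/* Hypothetical helper: return the string table offset encoded in
   S_NAME, or -1 if S_NAME does not hold a "/offset" reference.  */
static long
decode_section_name (const char s_name[SCNNMLEN])
{
  char buf[SCNNMLEN + 1];
  char *end;
  long off;

  if (s_name[0] != '/')
    return -1;
  memcpy (buf, s_name, SCNNMLEN);
  buf[SCNNMLEN] = '\0';               /* The field need not be terminated.  */
  off = strtol (buf + 1, &end, 10);
  if (end == buf + 1 || *end != '\0')
    return -1;
  return off;
}
```

Since a slash plus decimal digits must fit in eight bytes, this scheme can
address only offsets of up to seven decimal digits.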

Note that this implies that the extension can only be used in object
files, as executables do not contain a string table.  The standard
specifies that long section names from objects emitted into executable
images are to be truncated.

However, as a GNU extension, BFD can generate executable images
that contain a string table and long section names.  This
would appear to be technically valid, as the standard only says
that Coff debugging information is deprecated, not forbidden,
and in practice it works, although some tools that parse PE files
expecting the MS standard format may become confused; @file{PEview} is
one known example.

The functionality is supported in BFD by code implemented under
the control of the macro @code{COFF_LONG_SECTION_NAMES}.  If not
defined, the format does not support long section names in any way.
If defined, it is used to initialise a flag,
@code{_bfd_coff_long_section_names}, and a hook function pointer,
@code{_bfd_coff_set_long_section_names}, in the Coff backend data
structure.  The flag controls the generation of long section names
in output BFDs at runtime; if it is false, as it will be by default
when generating an executable image, long section names are truncated;
if true, the long section names extension is employed.  The hook
points to a function that allows the value of a copy of the flag
in coff object tdata to be altered at runtime, on formats that
support long section names at all; on other formats it points
to a stub that returns an error indication.

With input BFDs, the flag is set according to whether any long section
names are detected while reading the section headers.  For a completely
new BFD, the flag is set to the default for the target format.
This information can be used by a client of the BFD library when deciding
what output format to generate, and means that a BFD that is opened
for read and subsequently converted to a writeable BFD and modified
in-place will retain whatever format it had on input.

If @code{COFF_LONG_SECTION_NAMES} is simply defined (blank), or is
defined to the value "1", then long section names are enabled by
default; if it is defined to the value zero, they are disabled by
default (but still accepted in input BFDs).  The header @file{coffcode.h}
defines a macro, @code{COFF_DEFAULT_LONG_SECTION_NAMES}, which is
used in the backends to initialise the backend data structure fields
appropriately; see the comments for further detail.

@subsubsection Bit twiddling
Each flavour of coff supported in BFD has its own header file
describing the external layout of the structures.  There is also
an internal description of the coff layout, in
@file{coff/internal.h}.  A major function of the
coff backend is swapping the bytes and twiddling the bits to
translate the external form of the structures into the normal
internal form.  This is all performed in the
@code{bfd_swap}_@i{thing}_@i{direction} routines.  Some
elements are different sizes between different versions of
coff; it is the duty of the coff version specific include file
to override the definitions of various packing routines in
@file{coffcode.h}.  E.g., the size of a line number entry in coff is
sometimes 16 bits, and sometimes 32 bits.  @code{#define}ing
@code{PUT_LNSZ_LNNO} and @code{GET_LNSZ_LNNO} will select the
correct one.  No doubt, some day someone will find a version of
coff which has a varying field size not catered to at the
moment.  To port BFD, that person will have to add more @code{#define}s.
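
As a rough illustration of the override mechanism described above, the
following C sketch selects a 16-bit or 32-bit accessor for the line number
field at compile time.  It is not the BFD implementation: the macro names
match those mentioned in the text, but their argument lists are simplified
here, and @code{LNNO_IS_32BIT}, the byte helpers, and the
@code{*_sketch} names are assumptions made for the example.  Only
little-endian external data is handled.

```c
#include <stdint.h>

/* Illustrative little-endian byte accessors (stand-ins, not BFD's).  */
static inline uint32_t
get_le16 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8);
}

static inline uint32_t
get_le32 (const unsigned char *p)
{
  return get_le16 (p) | (get_le16 (p + 2) << 16);
}

static inline void
put_le16 (uint32_t v, unsigned char *p)
{
  p[0] = (unsigned char) (v & 0xff);
  p[1] = (unsigned char) ((v >> 8) & 0xff);
}

static inline void
put_le32 (uint32_t v, unsigned char *p)
{
  put_le16 (v & 0xffff, p);
  put_le16 (v >> 16, p + 2);
}

/* A target-specific header would define LNNO_IS_32BIT (a hypothetical
   switch) before this point if its line number field is 32 bits wide.
   Everything below then adapts without further changes.  */
#ifdef LNNO_IS_32BIT
#define GET_LNSZ_LNNO(ext)      get_le32 (ext)
#define PUT_LNSZ_LNNO(val, ext) put_le32 (val, ext)
#else
#define GET_LNSZ_LNNO(ext)      get_le16 (ext)
#define PUT_LNSZ_LNNO(val, ext) put_le16 (val, ext)
#endif

/* Simplified internal form: always wide enough for either variant.  */
struct internal_lineno_sketch
{
  uint32_t l_lnno;
};

/* Translate the external line number field into the internal form.  */
static void
swap_lineno_in_sketch (const unsigned char *ext,
                       struct internal_lineno_sketch *in)
{
  in->l_lnno = GET_LNSZ_LNNO (ext);
}

/* Translate the internal form back into external bytes.  */
static void
swap_lineno_out_sketch (const struct internal_lineno_sketch *in,
                        unsigned char *ext)
{
  PUT_LNSZ_LNNO (in->l_lnno, ext);
}
```

The point of the design is that the shared swapping code is written once
against the macros, and each flavour of coff only states which field width
it uses.
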
Three of the bit twiddling routines are exported to
@code{gdb}: @code{coff_swap_aux_in}, @code{coff_swap_sym_in}
and @code{coff_swap_lineno_in}.  @code{GDB} reads the symbol
table on its own, but uses BFD to fix things up.  More of the
bit twiddlers are exported for @code{gas}:
@code{coff_swap_aux_out}, @code{coff_swap_sym_out},
@code{coff_swap_lineno_out}, @code{coff_swap_reloc_out},
@code{coff_swap_filehdr_out}, @code{coff_swap_aouthdr_out} and
@code{coff_swap_scnhdr_out}.  @code{Gas} currently keeps track
of all the symbol table and reloc drudgery itself, thereby
saving the internal BFD overhead, but uses BFD to swap things
on the way out, making cross ports much safer.  Doing so also
allows BFD (and thus the linker) to use the same header files
as @code{gas}, which makes one avenue to disaster disappear.

@subsubsection Symbol reading
The simple canonical form for symbols used by BFD is not rich
enough to keep all the information available in a coff symbol
table.  The back end gets around this problem by keeping the original
symbol table around, "behind the scenes".

When a symbol table is requested (through a call to
@code{bfd_canonicalize_symtab}), a request gets through to
@code{coff_get_normalized_symtab}.  This reads the symbol table from
the coff file and swaps all the structures inside into the
internal form.  It also fixes up all the pointers in the table
(represented in the file by offsets from the first symbol in
the table) into physical pointers to elements in the new
internal table.  This involves some work since the meanings of
fields change depending upon context: a field that is a
pointer to another structure in the symbol table at one moment
may be the size in bytes of a structure at the next.  Another
pass is made over the table.
All symbols which mark file names
(@code{C_FILE} symbols) are modified so that the internal
string points to the value in the auxent (the real filename)
rather than the normal text associated with the symbol
(@code{".file"}).

At this time the symbol names are moved around.  Coff stores
all symbol names shorter than nine characters physically
within the symbol table; longer strings are kept at the end of
the file in the string table.  This pass moves all strings
into memory and replaces them with pointers to the strings.

The symbol table is massaged once again, this time to create
the canonical table used by the BFD application.  Each symbol
is inspected in turn, and a decision made (using the
@code{sclass} field) about the various flags to set in the
@code{asymbol}.  @xref{Symbols}.  The generated canonical table
shares strings with the hidden internal symbol table.

Any linenumbers are read from the coff file too, and attached
to the symbols which own the functions the linenumbers belong to.

@subsubsection Symbol writing
Writing a symbol to a coff file which didn't come from a coff
file will lose any debugging information.  The @code{asymbol}
structure remembers the BFD from which the symbol was taken, and on
output the back end checks that the destination target is the
same as the source target.

When the symbols have come from a coff file then all the
debugging information is preserved.

Symbol tables are provided for writing to the back end in a
vector of pointers to pointers.  This allows applications like
the linker to accumulate and output large symbol tables
without having to do too much byte copying.

@itemize @bullet

@item
@code{coff_renumber_symbols}
@end itemize
This function runs through the provided symbol table and
patches each symbol marked as a file place holder
(@code{C_FILE}) to point to the next file place holder in the
list.
It also marks each @code{offset} field in the list with
the current symbol's offset from the first symbol.

Another function of this procedure is to turn the canonical
value form of BFD into the form used by coff.  Internally, BFD
expects symbol values to be offsets from a section base; so a
symbol physically at 0x120, but in a section starting at
0x100, would have the value 0x20.  Coff expects symbols to
contain their final value, so symbols have their values
changed at this point to reflect their sum with their owning
section.  This transformation uses the
@code{output_section} field of the @code{asymbol}'s
@code{asection} (@pxref{Sections}).

@itemize @bullet

@item
@code{coff_mangle_symbols}
@end itemize
This routine runs through the provided symbol table and uses
the offsets generated by the previous pass and the pointers
generated when the symbol table was read in to create the
structured hierarchy required by coff.  It changes each pointer
to a symbol into the index into the symbol table of the asymbol.

@itemize @bullet

@item
@code{coff_write_symbols}
@end itemize
This routine runs through the symbol table and patches up the
symbols from their internal form into the coff way, calls the
bit twiddlers, and writes out the table to the file.

@findex coff_symbol_type
@subsubsection @code{coff_symbol_type}
The hidden information for an @code{asymbol} is described in a
@code{combined_entry_type}:


@example
typedef struct coff_ptr_struct
@{
  /* Remembers the offset from the first symbol in the file for
     this symbol.  Generated by coff_renumber_symbols.  */
  unsigned int offset;

  /* Selects between the elements of the union below.  */
  unsigned int is_sym : 1;

  /* Selects between the elements of the x_sym.x_tagndx union.  If set,
     p is valid and the field will be renumbered.
     */
  unsigned int fix_tag : 1;

  /* Selects between the elements of the x_sym.x_fcnary.x_fcn.x_endndx
     union.  If set, p is valid and the field will be renumbered.  */
  unsigned int fix_end : 1;

  /* Selects between the elements of the x_csect.x_scnlen union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_scnlen : 1;

  /* If set, u.syment.n_value contains a pointer to a symbol.  The final
     value will be the offset field.  Used for XCOFF C_BSTAT symbols.  */
  unsigned int fix_value : 1;

  /* If set, u.syment.n_value is an index into the line number entries.
     Used for XCOFF C_BINCL/C_EINCL symbols.  */
  unsigned int fix_line : 1;

  /* The container for the symbol structure as read and translated
     from the file.  */
  union
  @{
    union internal_auxent auxent;
    struct internal_syment syment;
  @} u;

  /* An extra pointer which can be used by formats based on COFF (like
     XCOFF) to provide extra information to their backend.  */
  void *extrap;
@} combined_entry_type;

/* Each canonical asymbol really looks like this: */

typedef struct coff_symbol_struct
@{
  /* The actual symbol which the rest of BFD works with */
  asymbol symbol;

  /* A pointer to the hidden information for this symbol */
  combined_entry_type *native;

  /* A pointer to the linenumber information for this symbol */
  struct lineno_cache_entry *lineno;

  /* Have the line numbers been relocated yet?
     */
  bool done_lineno;
@} coff_symbol_type;

@end example
@findex bfd_coff_backend_data
@subsubsection @code{bfd_coff_backend_data}

@example
typedef struct
@{
  void (*_bfd_coff_swap_aux_in)
    (bfd *, void *, int, int, int, int, void *);

  void (*_bfd_coff_swap_sym_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_lineno_in)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aux_out)
    (bfd *, void *, int, int, int, int, void *);

  unsigned int (*_bfd_coff_swap_sym_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_lineno_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_reloc_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_filehdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aouthdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_scnhdr_out)
    (bfd *, void *, void *);

  unsigned int _bfd_filhsz;
  unsigned int _bfd_aoutsz;
  unsigned int _bfd_scnhsz;
  unsigned int _bfd_symesz;
  unsigned int _bfd_auxesz;
  unsigned int _bfd_relsz;
  unsigned int _bfd_linesz;
  unsigned int _bfd_filnmlen;
  bool _bfd_coff_long_filenames;

  bool _bfd_coff_long_section_names;
  bool (*_bfd_coff_set_long_section_names)
    (bfd *, int);

  unsigned int _bfd_coff_default_section_alignment_power;
  bool _bfd_coff_force_symnames_in_strings;
  unsigned int _bfd_coff_debug_string_prefix_length;
  unsigned int _bfd_coff_max_nscns;

  void (*_bfd_coff_swap_filehdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_aouthdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_scnhdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_reloc_in)
    (bfd *abfd, void *, void *);

  bool (*_bfd_coff_bad_format_hook)
    (bfd *, void *);

  bool (*_bfd_coff_set_arch_mach_hook)
    (bfd *, void *);

  void *
  (*_bfd_coff_mkobject_hook)
    (bfd *, void *, void *);

  bool (*_bfd_styp_to_sec_flags_hook)
    (bfd *, void *, const char *, asection *, flagword *);

  void (*_bfd_set_alignment_hook)
    (bfd *, asection *, void *);

  bool (*_bfd_coff_slurp_symbol_table)
    (bfd *);

  bool (*_bfd_coff_symname_in_debug)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_pointerize_aux_hook)
    (bfd *, combined_entry_type *, combined_entry_type *,
     unsigned int, combined_entry_type *);

  bool (*_bfd_coff_print_aux)
    (bfd *, FILE *, combined_entry_type *, combined_entry_type *,
     combined_entry_type *, unsigned int);

  bool (*_bfd_coff_reloc16_extra_cases)
    (bfd *, struct bfd_link_info *, struct bfd_link_order *, arelent *,
     bfd_byte *, size_t *, size_t *);

  int (*_bfd_coff_reloc16_estimate)
    (bfd *, asection *, arelent *, unsigned int,
     struct bfd_link_info *);

  enum coff_symbol_classification (*_bfd_coff_classify_symbol)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_compute_section_file_positions)
    (bfd *);

  bool (*_bfd_coff_start_final_link)
    (bfd *, struct bfd_link_info *);

  bool (*_bfd_coff_relocate_section)
    (bfd *, struct bfd_link_info *, bfd *, asection *, bfd_byte *,
     struct internal_reloc *, struct internal_syment *, asection **);

  reloc_howto_type *(*_bfd_coff_rtype_to_howto)
    (bfd *, asection *, struct internal_reloc *,
     struct coff_link_hash_entry *, struct internal_syment *, bfd_vma *);

  bool (*_bfd_coff_adjust_symndx)
    (bfd *, struct bfd_link_info *, bfd *, asection *,
     struct internal_reloc *, bool *);

  bool (*_bfd_coff_link_add_one_symbol)
    (struct bfd_link_info *, bfd *, const char *, flagword,
     asection *, bfd_vma, const char *, bool, bool,
     struct bfd_link_hash_entry **);

  bool (*_bfd_coff_link_output_has_begun)
    (bfd *, struct coff_final_link_info *);

  bool
  (*_bfd_coff_final_link_postscript)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_print_pdata)
    (bfd *, void *);

@} bfd_coff_backend_data;

@end example
@subsubsection Writing relocations
To write relocations, the back end steps through the
canonical relocation table and creates an
@code{internal_reloc} for each entry.  The symbol index to use is
taken from the @code{offset} field in the symbol table supplied.  The
address comes directly from the sum of the section base
address and the relocation offset; the type is dug directly
from the howto field.  Then the @code{internal_reloc} is
swapped into the shape of an @code{external_reloc} and written
out to disk.

@subsubsection Reading linenumbers
Creating the linenumber table is done by reading in the entire
coff linenumber table, and creating another table for internal use.

A coff linenumber table is structured so that each function
is marked as having a line number of 0.  Each line within the
function is an offset from the first line in the function.  The
base of the line number information for the table is stored in
the symbol associated with the function.

Note: The PE format uses line number 0 for a flag indicating a
new source file.

The information is copied from the external to the internal
table, and each symbol which marks a function is marked by
pointing its...

How does this work?

@subsubsection Reading relocations
Coff relocations are easily transformed into the internal BFD form
(@code{arelent}).

Reading a coff relocation table is done in the following stages:

@itemize @bullet

@item
Read the entire coff relocation table into memory.

@item
Process each relocation in turn; first swap it from the
external to the internal form.

@item
Turn the symbol referenced in the relocation's symbol index
into a pointer into the canonical symbol table.
This table is the same as the one returned by a call to
@code{bfd_canonicalize_symtab}.  The back end will call that
routine and save the result if a canonicalization hasn't been done.

@item
The reloc index is turned into a pointer to a howto
structure, in a back end specific way.  For instance, the 386
uses the @code{r_type} to directly produce an index
into a howto table vector.

@item
Note that @code{arelent.addend} for COFF is often not what
most people understand as a relocation addend, but rather an
adjustment to the relocation addend stored in section contents
of relocatable object files.  The value found in section
contents may also be confusing, since it depends on both the
symbol value and the addend, somewhat like the field value in a
final-linked object.  See @code{CALC_ADDEND}.
@end itemize

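The stages listed above can be sketched in C as follows.  This is a
simplified illustration under stated assumptions, not the BFD code: all of
the @code{sketch_} types and functions are hypothetical stand-ins for
@code{arelent}, @code{internal_reloc} and friends, the external record is
assumed to be a 10-byte little-endian layout (4-byte address, 4-byte symbol
index, 2-byte type), the howto table contents are invented, and the symbol
index is assumed to index the canonical table directly.  The addend
handling done by @code{CALC_ADDEND} is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced stand-ins for BFD's structures.  */
struct sketch_howto { unsigned int type; const char *name; };
struct sketch_symbol { const char *name; };

struct sketch_internal_reloc
{
  uint32_t r_vaddr;
  uint32_t r_symndx;
  uint16_t r_type;
};

struct sketch_arelent
{
  uint32_t address;
  struct sketch_symbol **sym_ptr_ptr;
  const struct sketch_howto *howto;
};

/* Invented howto table, indexed directly by r_type.  */
static const struct sketch_howto sketch_howto_table[] =
{
  { 0, "TYPE0" }, { 1, "TYPE1" }, { 2, "TYPE2" }
};
#define SKETCH_N_HOWTO \
  (sizeof sketch_howto_table / sizeof sketch_howto_table[0])

static uint32_t
sketch_get_le (const unsigned char *p, int nbytes)
{
  uint32_t v = 0;
  int i;

  for (i = nbytes - 1; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Stage 2: swap one external record into the internal form.  */
static void
sketch_swap_reloc_in (const unsigned char *ext,
                      struct sketch_internal_reloc *in)
{
  in->r_vaddr = sketch_get_le (ext, 4);
  in->r_symndx = sketch_get_le (ext + 4, 4);
  in->r_type = (uint16_t) sketch_get_le (ext + 8, 2);
}

/* Stages 3 and 4: resolve the symbol index to a pointer into the
   canonical symbol table and the type to a howto entry.  Returns 0 on
   success, -1 if either index is out of range.  */
static int
sketch_reloc_to_arelent (const unsigned char *ext,
                         struct sketch_symbol **symtab, size_t symcount,
                         struct sketch_arelent *out)
{
  struct sketch_internal_reloc in;

  sketch_swap_reloc_in (ext, &in);
  if (in.r_symndx >= symcount || in.r_type >= SKETCH_N_HOWTO)
    return -1;
  out->address = in.r_vaddr;
  out->sym_ptr_ptr = &symtab[in.r_symndx];
  out->howto = &sketch_howto_table[in.r_type];
  return 0;
}
```

Storing a pointer into the symbol table, rather than a copy of the symbol,
mirrors the point made in the list: the relocation shares the canonical
table that @code{bfd_canonicalize_symtab} returns.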