@section coff backends
BFD supports a number of different flavours of coff format.
The major differences between formats are the sizes and
alignments of fields in structures on disk, and the occasional
extra field.

Coff in all its varieties is implemented with a few common
files and a number of implementation specific files. For
example, the i386 coff format is implemented in the file
@file{coff-i386.c}.  This file @code{#include}s
@file{coff/i386.h} which defines the external structure of the
coff format for the i386, and @file{coff/internal.h} which
defines the internal structure. @file{coff-i386.c} also
defines the relocations used by the i386 coff format
(@pxref{Relocations}).

@subsection Porting to a new version of coff
The recommended method is to select from the existing
implementations the version of coff which is most like the one
you want to use.  For example, we'll say that i386 coff is
the one you select, and that your coff flavour is called foo.
Copy @file{coff-i386.c} to @file{coff-foo.c}, copy
@file{../include/coff/i386.h} to @file{../include/coff/foo.h},
and add the necessary lines to @file{targets.c} and @file{Makefile.in}
so that your new back end is used. Alter the shapes of the
structures in @file{../include/coff/foo.h} so that they match
what you need. You will probably also have to add
@code{#ifdef}s to the code in @file{coff/internal.h} and
@file{coffcode.h} if your version of coff is too wild.

You can verify that your new BFD backend works quite simply by
building @file{objdump} from the @file{binutils} directory,
and making sure that its version of what's going on and your
host system's idea (assuming it has the pretty standard coff
dump utility, usually called @code{att-dump} or just
@code{dump}) are the same.  Then clean up your code, and send
what you've done to Cygnus. Then your stuff will be in the
next release, and you won't have to keep integrating it.

@subsection How the coff backend works

@subsubsection File layout
The Coff backend is split into generic routines that are
applicable to any Coff target and routines that are specific
to a particular target.  The target-specific routines are
further split into ones which are basically the same for all
Coff targets, except that they use the external symbol format
or different values for certain constants, and ones which are
peculiar to a single target.

The generic routines are in @file{coffgen.c}.  These routines
work for any Coff target.  They use some hooks into the target
specific code; the hooks are in a @code{bfd_coff_backend_data}
structure, one of which exists for each target.

The essentially similar target-specific routines are in
@file{coffcode.h}.  This header file includes executable C code.
The various Coff targets first include the appropriate Coff
header file, make any special defines that are needed, and
then include @file{coffcode.h}.

Some of the Coff targets then also have additional routines in
the target source file itself.

@subsubsection Coff long section names
In the standard Coff object format, section names are limited to
the eight bytes available in the @code{s_name} field of the
@code{SCNHDR} section header structure.  The format requires the
field to be NUL-padded, but not necessarily NUL-terminated, so
the longest section names permitted are a full eight characters.

The Microsoft PE variants of the Coff object file format add
an extension to support the use of long section names.  This
extension is defined in section 4 of the Microsoft PE/COFF
specification (rev 8.1).  If a section name is too long to fit
into the section header's @code{s_name} field, it is instead
placed into the string table, and the @code{s_name} field is
filled with a slash ("/") followed by the ASCII decimal
representation of the offset of the full name relative to the
string table base.
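
The encoding described above can be sketched in a few lines of C.  This is
only an illustration, not the code BFD uses: the function names are
hypothetical, and the sketch simply reports failure when the decimal offset
does not fit in the seven bytes that follow the slash.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCNNMLEN 8  /* size of the s_name field */

/* Fill S_NAME for a section called NAME.  Short names are copied and
   NUL-padded; long names become "/" plus the decimal offset of the
   full name in the string table.  Returns 0 on success, -1 if the
   encoded offset does not fit in the field.  */
static int
encode_section_name (char s_name[SCNNMLEN], const char *name,
                     unsigned long strtab_offset)
{
  char buf[32];
  size_t len = strlen (name);
  int n;

  memset (s_name, 0, SCNNMLEN);
  if (len <= SCNNMLEN)
    {
      /* A full eight-character name leaves no room for a NUL.  */
      memcpy (s_name, name, len);
      return 0;
    }
  n = snprintf (buf, sizeof buf, "/%lu", strtab_offset);
  if (n < 0 || n > SCNNMLEN)
    return -1;
  memcpy (s_name, buf, (size_t) n);
  return 0;
}

/* Return the string table offset encoded in S_NAME, or -1 if S_NAME
   holds an ordinary short name.  */
static long
decode_section_name (const char s_name[SCNNMLEN])
{
  char buf[SCNNMLEN + 1];
  char *end;
  long off;

  if (s_name[0] != '/' || s_name[1] < '0' || s_name[1] > '9')
    return -1;
  memcpy (buf, s_name, SCNNMLEN);
  buf[SCNNMLEN] = '\0';
  off = strtol (buf + 1, &end, 10);
  return *end == '\0' ? off : -1;
}
```

A reader would call the decoding function on every section header and, when
it returns a non-negative offset, fetch the real name from the string table.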

Note that this implies that the extension can only be used in object
files, as executables do not contain a string table.  The standard
specifies that long section names from objects emitted into executable
images are to be truncated.

However, as a GNU extension, BFD can generate executable images
that contain a string table and long section names.  This
would appear to be technically valid, as the standard only says
that Coff debugging information is deprecated, not forbidden,
and in practice it works, although some tools that parse PE files
expecting the MS standard format may become confused; @file{PEview} is
one known example.

The functionality is supported in BFD by code implemented under
the control of the macro @code{COFF_LONG_SECTION_NAMES}.  If not
defined, the format does not support long section names in any way.
If defined, it is used to initialise a flag,
@code{_bfd_coff_long_section_names}, and a hook function pointer,
@code{_bfd_coff_set_long_section_names}, in the Coff backend data
structure.  The flag controls the generation of long section names
in output BFDs at runtime; if it is false, as it will be by default
when generating an executable image, long section names are truncated;
if true, the long section names extension is employed.  The hook
points to a function that allows the value of a copy of the flag
in coff object tdata to be altered at runtime, on formats that
support long section names at all; on other formats it points
to a stub that returns an error indication.

With input BFDs, the flag is set according to whether any long section
names are detected while reading the section headers.  For a completely
new BFD, the flag is set to the default for the target format.  This
information can be used by a client of the BFD library when deciding
what output format to generate, and means that a BFD that is opened
for read and subsequently converted to a writeable BFD and modified
in-place will retain whatever format it had on input.

If @code{COFF_LONG_SECTION_NAMES} is simply defined (blank), or is
defined to the value "1", then long section names are enabled by
default; if it is defined to the value zero, they are disabled by
default (but still accepted in input BFDs).  The header @file{coffcode.h}
defines a macro, @code{COFF_DEFAULT_LONG_SECTION_NAMES}, which is
used in the backends to initialise the backend data structure fields
appropriately; see the comments for further detail.
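
One way a single macro can tell a blank definition and the value 1 apart
from the value zero is shown below.  This is a hypothetical sketch of the
preprocessor technique only; the macro and flavour names are invented for
the example, and it is not claimed to be the definition found in
@file{coffcode.h}.

```c
/* Hypothetical helper.  X may expand to nothing, to 1, or to 0.
   With nothing, the expression reads (- - 1) != -1, which is true;
   with 1 it reads (- 1 - 1) != -1, also true;
   with 0 it reads (- 0 - 1) != -1, which is false.  */
#define LSN_DEFAULT_SKETCH(x) ((- x - 1) != -1)

/* Three invented flavours standing in for target definitions.  */
#define FLAVOUR_BLANK
#define FLAVOUR_ONE 1
#define FLAVOUR_ZERO 0

/* The results are integer constant expressions, so they can be used
   to initialise static backend data.  */
static const int default_blank = LSN_DEFAULT_SKETCH (FLAVOUR_BLANK);
static const int default_one = LSN_DEFAULT_SKETCH (FLAVOUR_ONE);
static const int default_zero = LSN_DEFAULT_SKETCH (FLAVOUR_ZERO);
```

Whether the format supports long section names at all would still be decided
separately, with an ordinary test of whether the macro is defined.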

@subsubsection Bit twiddling
Each flavour of coff supported in BFD has its own header file
describing the external layout of the structures. There is also
an internal description of the coff layout, in
@file{coff/internal.h}. A major function of the
coff backend is swapping the bytes and twiddling the bits to
translate the external form of the structures into the normal
internal form. This is all performed in the
@code{bfd_swap}_@i{thing}_@i{direction} routines. Some
elements are different sizes between different versions of
coff; it is the duty of the coff version specific include file
to override the definitions of various packing routines in
@file{coffcode.h}. E.g., the line number field of a coff line
number entry is sometimes 16 bits, and sometimes 32 bits.
@code{#define}ing @code{PUT_LNSZ_LNNO} and @code{GET_LNSZ_LNNO}
will select the correct one. No doubt, some day someone will find
a version of coff which has a varying field size not catered to at
the moment. To port BFD, that person will have to add more
@code{#define}s.

Three of the bit twiddling routines are exported to
@code{gdb}; @code{coff_swap_aux_in}, @code{coff_swap_sym_in}
and @code{coff_swap_lineno_in}. @code{gdb} reads the symbol
table on its own, but uses BFD to fix things up.  More of the
bit twiddlers are exported for @code{gas};
@code{coff_swap_aux_out}, @code{coff_swap_sym_out},
@code{coff_swap_lineno_out}, @code{coff_swap_reloc_out},
@code{coff_swap_filehdr_out}, @code{coff_swap_aouthdr_out},
@code{coff_swap_scnhdr_out}. @code{gas} currently keeps track
of all the symbol table and reloc drudgery itself, thereby
saving the internal BFD overhead, but uses BFD to swap things
on the way out, making cross ports much safer.  Doing so also
allows BFD (and thus the linker) to use the same header files
as @code{gas}, which makes one avenue to disaster disappear.
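
To make the idea of swapping concrete, here is a minimal sketch of an
external line number entry being translated to and from an internal form.
The structure names, the @code{EXT_LNNO_BYTES} macro and the helper names
are all assumptions made for this example, and the sketch assumes a
little-endian target; the real routines are selected through the macros
named above.

```c
#include <stdint.h>

/* Width of the on-disk line number field.  A hypothetical flavour
   with 32-bit line numbers would define this as 4 before including
   the sketch.  */
#ifndef EXT_LNNO_BYTES
#define EXT_LNNO_BYTES 2
#endif

/* External (on-disk) form: raw bytes only, so the layout does not
   depend on host alignment or byte order.  */
struct ext_lineno_sketch
{
  unsigned char l_addr[4];
  unsigned char l_lnno[EXT_LNNO_BYTES];
};

/* Internal form: ordinary host integers, wide enough for any flavour.  */
struct int_lineno_sketch
{
  uint32_t l_addr;
  uint32_t l_lnno;
};

/* Read a little-endian value of NBYTES bytes.  */
static uint32_t
get_le (const unsigned char *p, int nbytes)
{
  uint32_t v = 0;
  int i;

  for (i = nbytes - 1; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Write V as a little-endian value of NBYTES bytes.  */
static void
put_le (unsigned char *p, int nbytes, uint32_t v)
{
  int i;

  for (i = 0; i < nbytes; i++)
    {
      p[i] = (unsigned char) (v & 0xff);
      v >>= 8;
    }
}

static void
swap_lineno_in_sketch (const struct ext_lineno_sketch *ext,
                       struct int_lineno_sketch *in)
{
  in->l_addr = get_le (ext->l_addr, 4);
  in->l_lnno = get_le (ext->l_lnno, EXT_LNNO_BYTES);
}

static void
swap_lineno_out_sketch (const struct int_lineno_sketch *in,
                        struct ext_lineno_sketch *ext)
{
  put_le (ext->l_addr, 4, in->l_addr);
  put_le (ext->l_lnno, EXT_LNNO_BYTES, in->l_lnno);
}
```

Because only the external structure and the field width change between
flavours, the rest of the back end can work on the internal form alone.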

@subsubsection Symbol reading
The simple canonical form for symbols used by BFD is not rich
enough to keep all the information available in a coff symbol
table. The back end gets around this problem by keeping the original
symbol table around, "behind the scenes".

When a symbol table is requested (through a call to
@code{bfd_canonicalize_symtab}), a request gets through to
@code{coff_get_normalized_symtab}. This reads the symbol table from
the coff file and swaps all the structures inside into the
internal form. It also fixes up all the pointers in the table
(represented in the file by offsets from the first symbol in
the table) into physical pointers to elements in the new
internal table. This involves some work since the meanings of
fields change depending upon context: a field that is a
pointer to another structure in the symbol table at one moment
may be the size in bytes of a structure at the next.  Another
pass is made over the table. All symbols which mark file names
(@code{C_FILE} symbols) are modified so that the internal
string points to the value in the auxent (the real filename)
rather than the normal text associated with the symbol
(@code{".file"}).

At this time the symbol names are moved around. Coff stores
all symbol names shorter than nine characters physically
within the symbol table; longer strings are kept at the end of
the file in the string table. This pass moves all strings
into memory and replaces them with pointers to the strings.
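
The name handling can be sketched as follows.  The sketch relies on the
usual coff convention, not spelled out in the text above, that a name field
whose first four bytes are zero holds a string table offset in its last four
bytes.  It also assumes that the offset is stored little-endian and that the
string table has already been read into memory; the function name is
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SYMNMLEN 8  /* size of the name field in a symbol table entry */

/* Return a freshly allocated, NUL-terminated copy of the name held in
   RAW, looking in STRTAB when the name is stored indirectly.  Returns
   NULL if memory cannot be allocated.  */
static char *
symbol_name_sketch (const unsigned char raw[SYMNMLEN], const char *strtab)
{
  const char *src;
  size_t len;
  char *copy;

  if (raw[0] == 0 && raw[1] == 0 && raw[2] == 0 && raw[3] == 0)
    {
      /* Long name: the last four bytes are an offset into the
         string table (little-endian assumed here).  */
      unsigned long off = (unsigned long) raw[4]
                          | ((unsigned long) raw[5] << 8)
                          | ((unsigned long) raw[6] << 16)
                          | ((unsigned long) raw[7] << 24);
      src = strtab + off;
      len = strlen (src);
    }
  else
    {
      /* Short name: up to eight characters, NUL-padded but not
         necessarily NUL-terminated.  */
      src = (const char *) raw;
      for (len = 0; len < SYMNMLEN && raw[len] != 0; len++)
        ;
    }

  copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  memcpy (copy, src, len);
  copy[len] = '\0';
  return copy;
}
```

Copying short names out of the table gives every symbol a terminated string,
which is what allows the canonical symbols to point at their names directly.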

The symbol table is massaged once again, this time to create
the canonical table used by the BFD application. Each symbol
is inspected in turn, and a decision made (using the
@code{sclass} field) about the various flags to set in the
@code{asymbol}.  @xref{Symbols}. The generated canonical table
shares strings with the hidden internal symbol table.

Any linenumbers are read from the coff file too, and attached
to the symbols which own the functions the linenumbers belong to.

@subsubsection Symbol writing
Writing a symbol to a coff file which didn't come from a coff
file will lose any debugging information. The @code{asymbol}
structure remembers the BFD from which the symbol was taken, and on
output the back end checks that the destination target is the
same as the source target.

When the symbols have come from a coff file then all the
debugging information is preserved.

Symbol tables are provided for writing to the back end in a
vector of pointers to pointers. This allows applications like
the linker to accumulate and output large symbol tables
without having to do too much byte copying.

The first pass, @code{coff_renumber_symbols}, runs through the
provided symbol table and patches each symbol marked as a file
place holder (@code{C_FILE}) to point to the next file place
holder in the list. It also marks each @code{offset} field in the
list with the offset from the first symbol of the current symbol.

Another function of this procedure is to turn the canonical
value form of BFD into the form used by coff. Internally, BFD
expects symbol values to be offsets from a section base; so a
symbol physically at 0x120, but in a section starting at
0x100, would have the value 0x20. Coff expects symbols to
contain their final value, so symbols have their values
changed at this point to reflect their sum with their owning
section.  This transformation uses the
@code{output_section} field of the @code{asymbol}'s
@code{asection} (@pxref{Sections}).
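
The value conversion amounts to one addition, as the following sketch
shows.  The structure is a cut-down stand-in for @code{asection} invented
for the example; the @code{output_offset} member, which places an input
section inside its output section, is an assumption that the text above
does not mention.

```c
#include <stdint.h>

typedef uint64_t vma_sketch;

/* Hypothetical, much reduced section description.  */
struct sec_sketch
{
  vma_sketch vma;                     /* address of this section */
  struct sec_sketch *output_section;  /* section it is written into */
  vma_sketch output_offset;           /* position inside output_section */
};

/* Turn a section-relative symbol value into the final value that
   coff expects to find in the symbol table.  */
static vma_sketch
final_symbol_value (vma_sketch value, const struct sec_sketch *sec)
{
  return value + sec->output_section->vma + sec->output_offset;
}
```

With an output section starting at 0x100 and an input section at offset zero
within it, a symbol whose BFD value is 0x20 is written out as 0x120.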

@itemize @bullet

@item
@code{coff_mangle_symbols}
@end itemize
This routine runs through the provided symbol table and uses
the offsets generated by the previous pass and the pointers
generated when the symbol table was read in to create the
structured hierarchy required by coff. It changes each pointer
to a symbol into the index of that symbol in the symbol table.
@itemize @bullet

@item
@code{coff_write_symbols}
@end itemize
This routine runs through the symbol table and patches up the
symbols from their internal form into the coff way, calls the
bit twiddlers, and writes out the table to the file.

@findex coff_symbol_type
@subsubsection @code{coff_symbol_type}
The hidden information for an @code{asymbol} is described in a
@code{combined_entry_type}:

@example
typedef struct coff_ptr_struct
@{
  /* Remembers the offset from the first symbol in the file for
     this symbol.  Generated by coff_renumber_symbols.  */
  unsigned int offset;

  /* Selects between the elements of the union below.  */
  unsigned int is_sym : 1;

  /* Selects between the elements of the x_sym.x_tagndx union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_tag : 1;

  /* Selects between the elements of the x_sym.x_fcnary.x_fcn.x_endndx
     union.  If set, p is valid and the field will be renumbered.  */
  unsigned int fix_end : 1;

  /* Selects between the elements of the x_csect.x_scnlen union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_scnlen : 1;

  /* If set, u.syment.n_value contains a pointer to a symbol.  The final
     value will be the offset field.  Used for XCOFF C_BSTAT symbols.  */
  unsigned int fix_value : 1;

  /* If set, u.syment.n_value is an index into the line number entries.
     Used for XCOFF C_BINCL/C_EINCL symbols.  */
  unsigned int fix_line : 1;

  /* The container for the symbol structure as read and translated
     from the file.  */
  union
  @{
    union internal_auxent auxent;
    struct internal_syment syment;
  @} u;

  /* An extra pointer which can be used by formats based on COFF
     (like XCOFF) to provide extra information to their backend.  */
  void *extrap;
@} combined_entry_type;

/* Each canonical asymbol really looks like this: */

typedef struct coff_symbol_struct
@{
  /* The actual symbol which the rest of BFD works with */
  asymbol symbol;

  /* A pointer to the hidden information for this symbol */
  combined_entry_type *native;

  /* A pointer to the linenumber information for this symbol */
  struct lineno_cache_entry *lineno;

  /* Have the line numbers been relocated yet ? */
  bool done_lineno;
@} coff_symbol_type;

@end example
@findex bfd_coff_backend_data
@subsubsection @code{bfd_coff_backend_data}

@example
typedef struct
@{
  void (*_bfd_coff_swap_aux_in)
    (bfd *, void *, int, int, int, int, void *);

  void (*_bfd_coff_swap_sym_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_lineno_in)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aux_out)
    (bfd *, void *, int, int, int, int, void *);

  unsigned int (*_bfd_coff_swap_sym_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_lineno_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_reloc_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_filehdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aouthdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_scnhdr_out)
    (bfd *, void *, void *);

  unsigned int _bfd_filhsz;
  unsigned int _bfd_aoutsz;
  unsigned int _bfd_scnhsz;
  unsigned int _bfd_symesz;
  unsigned int _bfd_auxesz;
  unsigned int _bfd_relsz;
  unsigned int _bfd_linesz;
  unsigned int _bfd_filnmlen;
  bool _bfd_coff_long_filenames;

  bool _bfd_coff_long_section_names;
  bool (*_bfd_coff_set_long_section_names)
    (bfd *, int);

  unsigned int _bfd_coff_default_section_alignment_power;
  bool _bfd_coff_force_symnames_in_strings;
  unsigned int _bfd_coff_debug_string_prefix_length;
  unsigned int _bfd_coff_max_nscns;

  void (*_bfd_coff_swap_filehdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_aouthdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_scnhdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_reloc_in)
    (bfd *abfd, void *, void *);

  bool (*_bfd_coff_bad_format_hook)
    (bfd *, void *);

  bool (*_bfd_coff_set_arch_mach_hook)
    (bfd *, void *);

  void * (*_bfd_coff_mkobject_hook)
    (bfd *, void *, void *);

  bool (*_bfd_styp_to_sec_flags_hook)
    (bfd *, void *, const char *, asection *, flagword *);

  void (*_bfd_set_alignment_hook)
    (bfd *, asection *, void *);

  bool (*_bfd_coff_slurp_symbol_table)
    (bfd *);

  bool (*_bfd_coff_symname_in_debug)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_pointerize_aux_hook)
    (bfd *, combined_entry_type *, combined_entry_type *,
     unsigned int, combined_entry_type *);

  bool (*_bfd_coff_print_aux)
    (bfd *, FILE *, combined_entry_type *, combined_entry_type *,
     combined_entry_type *, unsigned int);

  bool (*_bfd_coff_reloc16_extra_cases)
    (bfd *, struct bfd_link_info *, struct bfd_link_order *, arelent *,
     bfd_byte *, size_t *, size_t *);

  int (*_bfd_coff_reloc16_estimate)
    (bfd *, asection *, arelent *, unsigned int,
     struct bfd_link_info *);

  enum coff_symbol_classification (*_bfd_coff_classify_symbol)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_compute_section_file_positions)
    (bfd *);

  bool (*_bfd_coff_start_final_link)
    (bfd *, struct bfd_link_info *);

  bool (*_bfd_coff_relocate_section)
    (bfd *, struct bfd_link_info *, bfd *, asection *, bfd_byte *,
     struct internal_reloc *, struct internal_syment *, asection **);

  reloc_howto_type *(*_bfd_coff_rtype_to_howto)
    (bfd *, asection *, struct internal_reloc *,
     struct coff_link_hash_entry *, struct internal_syment *, bfd_vma *);

  bool (*_bfd_coff_adjust_symndx)
    (bfd *, struct bfd_link_info *, bfd *, asection *,
     struct internal_reloc *, bool *);

  bool (*_bfd_coff_link_add_one_symbol)
    (struct bfd_link_info *, bfd *, const char *, flagword,
     asection *, bfd_vma, const char *, bool, bool,
     struct bfd_link_hash_entry **);

  bool (*_bfd_coff_link_output_has_begun)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_final_link_postscript)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_print_pdata)
    (bfd *, void *);

@} bfd_coff_backend_data;

@end example
@subsubsection Writing relocations
To write relocations, the back end steps through the
canonical relocation table and creates an
@code{internal_reloc} for each entry. The symbol index to use is
taken from the @code{offset} field in the symbol table supplied.  The
address comes directly from the sum of the section base
address and the relocation offset; the type is dug directly
from the howto field.  Then the @code{internal_reloc} is
swapped into the shape of an @code{external_reloc} and written
out to disk.
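
The construction of one relocation can be sketched as below.  All of the
types are reduced stand-ins invented for the example, named after the roles
described above rather than the real BFD declarations, and the final swap to
the external form is left out.

```c
/* Hypothetical stand-ins for the structures involved.  */
struct sym_sketch
{
  unsigned int offset;        /* index assigned when symbols were renumbered */
};

struct howto_sketch
{
  unsigned int type;          /* relocation type number for this format */
};

struct canon_reloc_sketch
{
  struct sym_sketch *sym;     /* symbol the relocation refers to */
  unsigned long address;      /* offset of the relocation in its section */
  const struct howto_sketch *howto;
};

struct internal_reloc_sketch
{
  unsigned long r_vaddr;
  long r_symndx;
  unsigned int r_type;
};

/* Build the internal form of one relocation.  SECTION_BASE is the
   base address of the section that contains it.  */
static void
make_internal_reloc (const struct canon_reloc_sketch *r,
                     unsigned long section_base,
                     struct internal_reloc_sketch *out)
{
  out->r_vaddr = section_base + r->address;
  out->r_symndx = r->sym != 0 ? (long) r->sym->offset : -1;
  out->r_type = r->howto->type;
}
```

Using -1 for a relocation without a symbol is a choice made for this sketch
only.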

@subsubsection Reading linenumbers
Creating the linenumber table is done by reading in the entire
coff linenumber table, and creating another table for internal use.

A coff linenumber table is structured so that each function
is marked as having a line number of 0. Each line within the
function is an offset from the first line in the function. The
base of the line number information for the table is stored in
the symbol associated with the function.

Note: The PE format uses line number 0 for a flag indicating a
new source file.
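
A sketch of how such a table might be flattened into absolute line numbers
follows.  It makes several assumptions beyond the text above: that the
address member of a function-marking entry holds the index of the function's
symbol, that the base line of each function is available in a table indexed
by symbol index, and that the absolute line is simply the base plus the
stored offset.  The structure and function names are hypothetical, and the
PE use of line number 0 is not handled.

```c
#include <stddef.h>

/* One entry as read from the file, already swapped to host form.  */
struct raw_line_sketch
{
  unsigned long lnno;            /* 0 marks the start of a function */
  unsigned long addr_or_symndx;  /* symbol index if lnno is 0, else address */
};

/* One entry of the table built for internal use.  */
struct abs_line_sketch
{
  unsigned long line;
  unsigned long addr;
};

/* Convert N raw entries into absolute line numbers, writing them to
   OUT and returning how many were written.  BASE_LINE_OF_SYMBOL is a
   hypothetical table giving the base line for each function symbol.  */
static size_t
resolve_lines_sketch (const struct raw_line_sketch *raw, size_t n,
                      const unsigned long *base_line_of_symbol,
                      struct abs_line_sketch *out)
{
  unsigned long base = 0;
  size_t i, k = 0;

  for (i = 0; i < n; i++)
    {
      if (raw[i].lnno == 0)
        {
          /* A new function starts: pick up its base line.  */
          base = base_line_of_symbol[raw[i].addr_or_symndx];
          continue;
        }
      out[k].line = base + raw[i].lnno;
      out[k].addr = raw[i].addr_or_symndx;
      k++;
    }
  return k;
}
```

Whether the first line of a function is stored as offset 0 or 1 is a detail
this sketch does not settle; it only shows how the function markers
partition the table.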

The information is copied from the external to the internal
table, and each symbol which marks a function is marked by
pointing its...

How does this work?

@subsubsection Reading relocations
Coff relocations are easily transformed into the internal BFD form
(@code{arelent}).

Reading a coff relocation table is done in the following stages:

@itemize @bullet

@item
Read the entire coff relocation table into memory.

@item
Process each relocation in turn; first swap it from the
external to the internal form.

@item
Turn the symbol referenced in the relocation's symbol index
into a pointer into the canonical symbol table.
This table is the same as the one returned by a call to
@code{bfd_canonicalize_symtab}. The back end will call that
routine and save the result if a canonicalization hasn't been done.

@item
The reloc index is turned into a pointer to a howto
structure, in a back end specific way. For instance, the 386
uses the @code{r_type} to directly produce an index
into a howto table vector.

@item
Note that @code{arelent.addend} for COFF is often not what
most people understand as a relocation addend, but rather an
adjustment to the relocation addend stored in section contents
of relocatable object files.  The value found in section
contents may also be confusing, depending on both symbol value
and addend, somewhat similar to the field value for a
final-linked object.  See @code{CALC_ADDEND}.
@end itemize
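
The symbol and howto steps above can be sketched as follows.  The types,
the table contents and the function name are all invented for the example;
the sketch assumes that symbol indices map one-to-one onto canonical table
slots and that the stored address is the section base plus the offset, and
it leaves the addend untouched.

```c
#include <stddef.h>

struct howto_sketch
{
  const char *name;
  int size_bytes;
  int pc_relative;
};

/* Hypothetical howto table: the position of each entry is its r_type.  */
static const struct howto_sketch howto_table[] =
{
  { "EX_NONE",  0, 0 },
  { "EX_ABS32", 4, 0 },
  { "EX_PCR32", 4, 1 },
};
#define HOWTO_COUNT (sizeof howto_table / sizeof howto_table[0])

struct internal_reloc_sketch
{
  unsigned long r_vaddr;
  unsigned long r_symndx;
  unsigned int r_type;
};

struct canon_sym_sketch
{
  const char *name;
};

struct canon_reloc_sketch
{
  struct canon_sym_sketch **sym_ptr_ptr;  /* slot in the canonical table */
  unsigned long address;                  /* offset within the section */
  const struct howto_sketch *howto;
};

/* Convert one swapped-in relocation to canonical form.  Returns 0 on
   success, -1 if the symbol index or type is out of range.  */
static int
canonicalize_reloc_sketch (const struct internal_reloc_sketch *in,
                           struct canon_sym_sketch **symtab, size_t symcount,
                           unsigned long section_vma,
                           struct canon_reloc_sketch *out)
{
  if (in->r_symndx >= symcount || in->r_type >= HOWTO_COUNT)
    return -1;
  out->sym_ptr_ptr = &symtab[in->r_symndx];
  out->address = in->r_vaddr - section_vma;
  out->howto = &howto_table[in->r_type];
  return 0;
}
```

A back end whose type numbers are sparse would need a lookup function in
place of the direct table index used here.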