@c This summary of BFD is shared by the BFD and LD docs.
When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer in memory to the routine from the
relevant BFD back end, which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.

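The dispatch described above can be pictured with a small, self-contained
C sketch.  It does not use the real BFD headers or data structures: every
@code{toy_} name, and both toy file encodings, are assumptions invented
for illustration.  It only shows how a descriptor holding pointers to
back-end routines keeps the caller independent of the file format.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Hypothetical back end: one entry per object-file format.  */
struct toy_backend
{
  const char *name;
  /* Return nonzero if BUF looks like this format.  */
  int (*recognize) (const unsigned char *buf, size_t len);
  /* Return the number of symbols recorded in BUF.  */
  size_t (*count_symbols) (const unsigned char *buf, size_t len);
};

/* Hypothetical descriptor built when a file is opened.  */
struct toy_descriptor
{
  const struct toy_backend *backend;
  const unsigned char *buf;
  size_t len;
};

/* Toy format A: magic byte 'A', then a one-byte symbol count.  */
static int
a_recognize (const unsigned char *buf, size_t len)
{
  return len >= 2 && buf[0] == 'A';
}

static size_t
a_count (const unsigned char *buf, size_t len)
{
  (void) len;
  return buf[1];
}

/* Toy format B: magic byte 'B', then a two-byte big-endian count.  */
static int
b_recognize (const unsigned char *buf, size_t len)
{
  return len >= 3 && buf[0] == 'B';
}

static size_t
b_count (const unsigned char *buf, size_t len)
{
  (void) len;
  return ((size_t) buf[1] << 8) | buf[2];
}

static const struct toy_backend toy_backends[] =
{
  { "toy-a", a_recognize, a_count },
  { "toy-b", b_recognize, b_count }
};

/* Determine the format of BUF and fill in DESC.
   Return 1 on success, 0 if no back end recognizes the data.  */
static int
toy_open (struct toy_descriptor *desc, const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof toy_backends / sizeof toy_backends[0]; i++)
    if (toy_backends[i].recognize (buf, len))
      {
        desc->backend = &toy_backends[i];
        desc->buf = buf;
        desc->len = len;
        return 1;
      }
  return 0;
}

/* The caller never inspects the format; it calls through the pointer.  */
static size_t
toy_symbol_count (const struct toy_descriptor *desc)
{
  assert (desc != NULL && desc->backend != NULL);
  return desc->backend->count_symbols (desc->buf, desc->len);
}
@end verbatim

A caller would pass a buffer to @code{toy_open} and then call
@code{toy_symbol_count} without ever testing which back end matched.
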
@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it saved earlier.  Since
there is a great deal of commonality between back ends,
no information is lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.

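The following C sketch illustrates, under stated assumptions, why a
same-format round trip loses nothing while a cross-format conversion can.
It is not BFD code: the formats @code{TOY_FORMAT_X} and
@code{TOY_FORMAT_Y}, the structures, and @code{toy_write} are all
hypothetical.  The canonical record carries an opaque pointer that only
the back end which read the file knows how to interpret.

@verbatim
#include <assert.h>
#include <stddef.h>

/* Hypothetical format identifiers.  */
enum toy_format { TOY_FORMAT_X, TOY_FORMAT_Y };

/* Data that only format X can express; the core never looks at it.  */
struct toy_x_private
{
  unsigned extra;
};

/* Canonical record shared by the core, the application, and back ends.  */
struct toy_canon
{
  unsigned long value;        /* has a canonical representation */
  enum toy_format origin;     /* format the record was read from */
  const void *private_data;   /* opaque; owned by the ORIGIN back end */
};

/* What a writer manages to place in the output file.  */
struct toy_output
{
  unsigned long value;
  int has_extra;
  unsigned extra;
};

/* Write REC in format DEST.  The saved private data can be used only
   when the destination format matches the format that was read.  */
static void
toy_write (enum toy_format dest, const struct toy_canon *rec,
           struct toy_output *out)
{
  assert (rec != NULL && out != NULL);
  out->value = rec->value;
  out->has_extra = 0;
  out->extra = 0;
  if (dest == TOY_FORMAT_X
      && rec->origin == TOY_FORMAT_X
      && rec->private_data != NULL)
    {
      const struct toy_x_private *priv = rec->private_data;

      out->extra = priv->extra;
      out->has_extra = 1;
    }
}
@end verbatim

Writing a record read from format X back to format X keeps the
@code{extra} field; writing it to format Y keeps only the canonical
@code{value}.
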
@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
The canonical record for each section in the input file contains the name
of the section, the section's original address in the object file, size
and alignment information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which gives the address of the
function whose line numbers are being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information (COFF, IEEE and Oasys) can pass
it successfully to another such format.
@end table
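
The symbol and relocation entries in the table above can be sketched
together in C.  This is an illustration only, not the BFD data
structures: all @code{toy_} names are hypothetical, and the two
relocation types (a byte relocation and a 32-bit big-endian word
relocation) are simplified assumptions.  The point is that the generic
loop calls through the relocation type descriptor and so needs no
knowledge of which input format supplied the relocation type.

@verbatim
#include <assert.h>
#include <stddef.h>

struct toy_section
{
  const char *name;
  unsigned long output_address;   /* where the section lands on output */
};

/* Canonical symbol: VALUE is relative to the base of SECTION.  */
struct toy_symbol
{
  const char *name;
  const struct toy_section *section;
  unsigned long value;
};

struct toy_reloc;

/* Hypothetical relocation type descriptor.  */
struct toy_howto
{
  const char *name;
  void (*apply) (const struct toy_reloc *rel, unsigned char *data);
};

struct toy_reloc
{
  const struct toy_symbol *symbol;  /* symbol to relocate to */
  size_t offset;                    /* offset of the data to relocate */
  const struct toy_howto *howto;    /* how to perform the relocation */
};

static unsigned long
toy_symbol_address (const struct toy_symbol *sym)
{
  return sym->section->output_address + sym->value;
}

/* Byte relocation: add the low 8 bits of the address to one byte.  */
static void
toy_apply_byte (const struct toy_reloc *rel, unsigned char *data)
{
  data[rel->offset] = (unsigned char) (data[rel->offset]
                                       + toy_symbol_address (rel->symbol));
}

/* Word relocation: add the address to a 4-byte big-endian field.  */
static void
toy_apply_word32 (const struct toy_reloc *rel, unsigned char *data)
{
  unsigned char *p = data + rel->offset;
  unsigned long v = ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16)
                    | ((unsigned long) p[2] << 8) | (unsigned long) p[3];

  v = (v + toy_symbol_address (rel->symbol)) & 0xffffffffUL;
  p[0] = (unsigned char) (v >> 24);
  p[1] = (unsigned char) (v >> 16);
  p[2] = (unsigned char) (v >> 8);
  p[3] = (unsigned char) v;
}

static const struct toy_howto toy_howto_byte = { "byte", toy_apply_byte };
static const struct toy_howto toy_howto_word32 = { "word32", toy_apply_word32 };

/* Generic code: dispatches through the descriptor for every record.  */
static void
toy_perform_relocs (const struct toy_reloc *relocs, size_t count,
                    unsigned char *data)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      assert (relocs[i].howto != NULL && relocs[i].symbol != NULL);
      relocs[i].howto->apply (&relocs[i], data);
    }
}
@end verbatim

Because each record names its own descriptor, a byte relocation that
exists in only one input format can still be applied to data destined
for an output format that has no such relocation type.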