@c bfd/doc/bfdsumm.texi

@c This summary of BFD is shared by the BFD and LD docs.
When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer to the routine from the
relevant BFD back end, which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
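The dispatch described above can be sketched in C.  This is only an
illustration under stated assumptions: @code{struct backend_ops},
@code{struct object_file}, @code{open_object_file}, @code{get_symbols}
and the two toy formats are hypothetical names invented for this
example, not the actual BFD interfaces, which are considerably richer.

```c
#include <assert.h>
#include <stddef.h>

/* Canonical, format-independent symbol (hypothetical, not a BFD type). */
struct canon_symbol {
    const char *name;
    unsigned long value;
};

struct object_file;

/* Routines one back end supplies for its format (hypothetical layout). */
struct backend_ops {
    const char *format_name;
    /* Nonzero if the leading bytes look like this back end's format. */
    int (*recognize)(const unsigned char *data, size_t len);
    /* Convert the file's symbols to canonical form; return how many. */
    size_t (*read_symbols)(const struct object_file *file,
                           struct canon_symbol *out, size_t max);
};

/* Descriptor built in memory when the file is opened. */
struct object_file {
    const unsigned char *data;
    size_t len;
    const struct backend_ops *ops;
};

/* Toy format A: magic "TA", then a one-byte symbol value. */
static int toy_a_recognize(const unsigned char *d, size_t n)
{
    return n >= 3 && d[0] == 'T' && d[1] == 'A';
}

static size_t toy_a_read_symbols(const struct object_file *f,
                                 struct canon_symbol *out, size_t max)
{
    if (max == 0)
        return 0;
    out[0].name = "start";
    out[0].value = f->data[2];
    return 1;
}

/* Toy format B: magic "TB", then a two-byte big-endian symbol value. */
static int toy_b_recognize(const unsigned char *d, size_t n)
{
    return n >= 4 && d[0] == 'T' && d[1] == 'B';
}

static size_t toy_b_read_symbols(const struct object_file *f,
                                 struct canon_symbol *out, size_t max)
{
    if (max == 0)
        return 0;
    out[0].name = "start";
    out[0].value = ((unsigned long)f->data[2] << 8) | f->data[3];
    return 1;
}

static const struct backend_ops backends[] = {
    { "toy-a", toy_a_recognize, toy_a_read_symbols },
    { "toy-b", toy_b_recognize, toy_b_read_symbols },
};

/* Determine the format and fill in the descriptor.
   Returns 0 on success, -1 if no back end recognizes the data. */
int open_object_file(struct object_file *f,
                     const unsigned char *data, size_t len)
{
    size_t i;

    assert(f != NULL && data != NULL);
    f->data = data;
    f->len = len;
    f->ops = NULL;
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (backends[i].recognize(data, len)) {
            f->ops = &backends[i];
            return 0;
        }
    }
    return -1;
}

/* What the application calls: dispatch through the descriptor, so the
   caller never needs to know which format it is reading. */
size_t get_symbols(const struct object_file *f,
                   struct canon_symbol *out, size_t max)
{
    assert(f != NULL && f->ops != NULL);
    return f->ops->read_symbols(f, out, max);
}
```

In this sketch the caller only ever sees @code{struct canon_symbol};
which conversion routine runs is decided once, when the file is opened.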

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
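The way a back end can carry format-specific data past the core may be
easier to see in a small C sketch.  Everything here is an assumption
made for illustration: @code{struct canon_symbol}, its
@code{backend_private} field, the @code{toy-a} format and its
@code{storage_class} attribute are hypothetical and do not describe
BFD's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Canonical symbol as the core and the application see it
   (hypothetical layout). */
struct canon_symbol {
    const char *name;
    unsigned long value;
    const char *origin_format;    /* back end that read the symbol */
    const void *backend_private;  /* opaque to the core; may be NULL */
};

/* Detail that only the toy-a format can express, so it has no
   canonical field; the toy-a reader saves it here. */
struct toy_a_extra {
    unsigned char storage_class;
};

/* External record as a toy-a writer would emit it. */
struct toy_a_record {
    unsigned long value;
    unsigned char storage_class;
};

/* Write one symbol in toy-a format.  If the symbol was also read by
   the toy-a back end, the saved private data is restored; otherwise
   the extra field falls back to a default and that detail is lost. */
struct toy_a_record toy_a_write_symbol(const struct canon_symbol *sym)
{
    struct toy_a_record rec;

    assert(sym != NULL && sym->origin_format != NULL);
    rec.value = sym->value;
    rec.storage_class = 0;
    if (sym->backend_private != NULL
        && strcmp(sym->origin_format, "toy-a") == 0) {
        const struct toy_a_extra *extra = sym->backend_private;
        rec.storage_class = extra->storage_class;
    }
    return rec;
}
```

The core never inspects @code{backend_private}; only a writer for the
same format as the reader knows how to interpret it, which is why the
round trip within one format loses nothing in this sketch.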

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats (COFF, IEEE and Oasys).
@end table
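The relocation records described in the table above can be modelled
with a short C sketch.  It assumes hypothetical types
(@code{struct section}, @code{struct canon_symbol}, @code{struct reloc},
@code{struct reloc_type}) and two made-up relocation kinds; none of
these names are taken from BFD, and the arithmetic is deliberately
simplified.  The point is only that the routine applying a relocation
is reached through the record's type descriptor, so the output format
does not need to define that relocation type itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct section {
    const char *name;
    uint32_t output_address;   /* where the section lands in the output */
};

struct canon_symbol {
    const char *name;
    const struct section *section;
    uint32_t value;            /* relative to the base of its section */
};

struct reloc;

/* Relocation type descriptor: carries the routine that knows how to
   patch the data (hypothetical layout). */
struct reloc_type {
    const char *name;
    void (*apply)(const struct reloc *r, uint8_t *section_data);
};

/* Canonical relocation record with the four fields named in the text. */
struct reloc {
    const struct canon_symbol *symbol;  /* symbol to relocate to */
    size_t offset;                      /* offset of the data to patch */
    const struct section *section;      /* section holding that data */
    const struct reloc_type *type;      /* how to patch it */
};

/* Section-relative symbol values become addresses only at this point. */
static uint32_t symbol_address(const struct canon_symbol *s)
{
    return s->section->output_address + s->value;
}

/* Byte relocation: add the address to one byte, keeping the low 8 bits. */
static void apply_byte(const struct reloc *r, uint8_t *data)
{
    data[r->offset] = (uint8_t)(data[r->offset] + symbol_address(r->symbol));
}

/* 32-bit big-endian relocation: add the address to a 4-byte field. */
static void apply_word32_be(const struct reloc *r, uint8_t *data)
{
    uint8_t *p = data + r->offset;
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    v += symbol_address(r->symbol);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

const struct reloc_type reloc_byte = { "byte", apply_byte };
const struct reloc_type reloc_word32_be = { "word32-be", apply_word32_be };

/* The writer does not switch on the relocation kind; it calls through
   the descriptor attached to the record. */
void perform_relocation(const struct reloc *r, uint8_t *section_data)
{
    assert(r != NULL && r->type != NULL && section_data != NULL);
    r->type->apply(r, section_data);
}
```

Because @code{perform_relocation} never inspects the relocation kind, a
record created by one back end can be applied while another back end
writes the output, which is the situation the Oasys byte relocation
example describes.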