@c This summary of BFD is shared by the BFD and LD docs.
When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer to the routine from the
relevant BFD back end, which reads the table and converts it into a
canonical form.  The linker then operates upon the canonical form.  When
the link is finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.

@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information that can be described in one format may have nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.
There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which might otherwise
be lost.
If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
no information is lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here; only the meaning of the magic numbers is kept, so a
@code{ZMAGIC} file would have both the demand pageable bit and the write
protected text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine that performs it, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which makes it possible to find the address of the
function whose line numbers are being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
from which this information can be simply derived can pass it successfully
to another such format (COFF, IEEE and Oasys).
@end table
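
The relocation scheme described under @emph{relocation level} can be
illustrated with a small, self-contained C sketch.  It is a toy model and
not the BFD API: every name in it (@code{toy_section}, @code{toy_symbol},
@code{toy_reloc_type}, @code{toy_reloc}, and the functions) is hypothetical
and was invented for this illustration, as is the assumption that a byte
relocation adds the low eight bits of the symbol's address to the byte
already stored.  It only shows the shape of the idea: a relocation record
points at a symbol and at a type descriptor, and the caller performs the
relocation by calling through the descriptor without knowing which input
format supplied it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types are hypothetical and are NOT the libbfd API. */

struct toy_section {
    const char *name;
    uint32_t    output_base;   /* address the section is placed at on output */
};

struct toy_symbol {
    const char               *name;
    const struct toy_section *section;  /* containing section */
    uint32_t                  value;    /* offset from the section base */
};

struct toy_reloc;

/* Relocation type descriptor: carries the routine that does the fix-up. */
struct toy_reloc_type {
    const char *name;
    void (*apply)(const struct toy_reloc *r, uint8_t *data, size_t len);
};

struct toy_reloc {
    const struct toy_symbol     *sym;           /* symbol to relocate to */
    size_t                       offset;        /* offset of the data to relocate */
    const struct toy_section    *data_section;  /* section the data is in */
    const struct toy_reloc_type *type;          /* how to perform the relocation */
};

/* Final address of a symbol: section base plus section-relative value. */
uint32_t toy_symbol_address(const struct toy_symbol *s)
{
    return s->section->output_base + s->value;
}

/* Assumed byte relocation: add the low 8 bits of the address in place. */
void toy_apply_byte(const struct toy_reloc *r, uint8_t *data, size_t len)
{
    assert(r->offset < len);
    data[r->offset] = (uint8_t)(data[r->offset] + toy_symbol_address(r->sym));
}

const struct toy_reloc_type toy_byte_reloc = { "BYTE", toy_apply_byte };

/* The caller never inspects the type; it only calls through the descriptor. */
void toy_perform_reloc(const struct toy_reloc *r, uint8_t *data, size_t len)
{
    r->type->apply(r, data, len);
}

/* Worked case: symbol at section offset 0x10, section placed at 0x20,
   and an addend of 0x01 already stored in the byte being relocated. */
uint8_t toy_demo(void)
{
    struct toy_section text   = { ".text", 0x20 };
    struct toy_symbol  sym    = { "entry", &text, 0x10 };
    uint8_t            out[4] = { 0x00, 0x01, 0x00, 0x00 };
    struct toy_reloc   rel    = { &sym, 1, &text, &toy_byte_reloc };

    toy_perform_reloc(&rel, out, sizeof out);
    return out[1];
}
```

In this sketch @code{toy_demo} returns @code{0x31}: the stored addend
@code{0x01} plus the symbol's address, which is the section base
@code{0x20} plus the section-relative value @code{0x10}.  Because the
routine travels with the relocation record, the output side needs no
relocation type of its own for the fix-up to be applied.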