1\input texinfo
2@setfilename ldint.info
3@c Copyright 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
4@c 2003, 2005, 2006, 2007
5@c Free Software Foundation, Inc.
6
7@ifnottex
8@dircategory Software development
9@direntry
10* Ld-Internals: (ldint).	The GNU linker internals.
11@end direntry
12@end ifnottex
13
14@copying
15This file documents the internals of the GNU linker ld.
16
17Copyright @copyright{} 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2007
18Free Software Foundation, Inc.
19Contributed by Cygnus Support.
20
21Permission is granted to copy, distribute and/or modify this document
22under the terms of the GNU Free Documentation License, Version 1.3 or
23any later version published by the Free Software Foundation; with the
24Invariant Sections being ``GNU General Public License'' and ``Funding
25Free Software'', the Front-Cover texts being (a) (see below), and with
26the Back-Cover Texts being (b) (see below).  A copy of the license is
27included in the section entitled ``GNU Free Documentation License''.
28
29(a) The FSF's Front-Cover Text is:
30
31     A GNU Manual
32
33(b) The FSF's Back-Cover Text is:
34
35     You have freedom to copy and modify this GNU Manual, like GNU
36     software.  Copies published by the Free Software Foundation raise
37     funds for GNU development.
38@end copying
39
40@iftex
41@finalout
42@setchapternewpage off
43@settitle GNU Linker Internals
44@titlepage
45@title{A guide to the internals of the GNU linker}
46@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
47@author Cygnus Support
48@page
49
50@tex
51\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
52\xdef\manvers{2.10.91}  % For use in headers, footers too
53{\parskip=0pt
54\hfill Cygnus Support\par
55\hfill \manvers\par
56\hfill \TeX{}info \texinfoversion\par
57}
58@end tex
59
60@vskip 0pt plus 1filll
61Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1997, 1998, 2000
62Free Software Foundation, Inc.
63
64      Permission is granted to copy, distribute and/or modify this document
65      under the terms of the GNU Free Documentation License, Version 1.3
66      or any later version published by the Free Software Foundation;
67      with no Invariant Sections, with no Front-Cover Texts, and with no
68      Back-Cover Texts.  A copy of the license is included in the
69      section entitled "GNU Free Documentation License".
70
71@end titlepage
72@end iftex
73
74@node Top
75@top
76
77This file documents the internals of the GNU linker @code{ld}.  It is a
78collection of miscellaneous information with little form at this point.
79Mostly, it is a repository into which you can put information about
80GNU @code{ld} as you discover it (or as you design changes to @code{ld}).
81
82This document is distributed under the terms of the GNU Free
83Documentation License.  A copy of the license is included in the
section entitled ``GNU Free Documentation License''.
85
86@menu
87* README::			The README File
88* Emulations::			How linker emulations are generated
89* Emulation Walkthrough::	A Walkthrough of a Typical Emulation
90* Architecture Specific::	Some Architecture Specific Notes
91* GNU Free Documentation License::  GNU Free Documentation License
92@end menu
93
94@node README
95@chapter The @file{README} File
96
97Check the @file{README} file; it often has useful information that does not
98appear anywhere else in the directory.
99
100@node Emulations
101@chapter How linker emulations are generated
102
103Each linker target has an @dfn{emulation}.  The emulation includes the
104default linker script, and certain emulations also modify certain types
105of linker behaviour.
106
107Emulations are created during the build process by the shell script
108@file{genscripts.sh}.
109
110The @file{genscripts.sh} script starts by reading a file in the
111@file{emulparams} directory.  This is a shell script which sets various
112shell variables used by @file{genscripts.sh} and the other shell scripts
113it invokes.
114
115The @file{genscripts.sh} script will invoke a shell script in the
116@file{scripttempl} directory in order to create default linker scripts
117written in the linker command language.  The @file{scripttempl} script
will be invoked 5 to 9 times, with different
119assignments to shell variables, to create different default scripts.
120The choice of script is made based on the command line options.
121
122After creating the scripts, @file{genscripts.sh} will invoke yet another
123shell script, this time in the @file{emultempl} directory.  That shell
124script will create the emulation source file, which contains C code.
125This C code permits the linker emulation to override various linker
126behaviours.  Most targets use the generic emulation code, which is in
127@file{emultempl/generic.em}.
128
129To summarize, @file{genscripts.sh} reads three shell scripts: an
130emulation parameters script in the @file{emulparams} directory, a linker
131script generation script in the @file{scripttempl} directory, and an
132emulation source file generation script in the @file{emultempl}
133directory.
134
135For example, the Sun 4 linker sets up variables in
136@file{emulparams/sun4.sh}, creates linker scripts using
137@file{scripttempl/aout.sc}, and creates the emulation code using
138@file{emultempl/sunos.em}.
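
The naming rules summarized above can be sketched as a small shell
function.  This is only an illustration of the conventions, not code
taken from @file{genscripts.sh}; the function name @code{emul_inputs}
and its argument order are assumptions made for this example.

```shell
# Sketch: print the three input scripts used for one emulation.
# Usage: emul_inputs EMUL SCRIPT_NAME [TEMPLATE_NAME]
# (emul_inputs is a hypothetical helper, not part of the ld sources.)
emul_inputs() {
    emul=$1
    script_name=$2
    # TEMPLATE_NAME defaults to "generic" when it is not given.
    template_name=${3:-generic}
    printf '%s\n' \
        "emulparams/${emul}.sh" \
        "scripttempl/${script_name}.sc" \
        "emultempl/${template_name}.em"
}

# The Sun 4 example from the text.
emul_inputs sun4 aout sunos
```

Leaving out the third argument selects @file{emultempl/generic.em},
which mirrors the default most targets rely on.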
139
140Note that the linker can support several emulations simultaneously,
141depending upon how it is configured.  An emulation can be selected with
142the @code{-m} option.  The @code{-V} option will list all supported
143emulations.
144
145@menu
146* emulation parameters::        @file{emulparams} scripts
147* linker scripts::              @file{scripttempl} scripts
148* linker emulations::           @file{emultempl} scripts
149@end menu
150
151@node emulation parameters
152@section @file{emulparams} scripts
153
154Each target selects a particular file in the @file{emulparams} directory
155by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
156This shell variable is used by the @file{configure} script to control
157building an emulation source file.
158
159Certain conventions are enforced.  Suppose the @code{targ_emul} variable
160is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
161shell script will be @file{emulparams/@var{emul}.sh}.  The
162@file{Makefile} must have a target named @file{e@var{emul}.c}; this
163target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
164appropriate scripts in the @file{scripttempl} and @file{emultempl}
165directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
166with two arguments: @var{emul}, and the value of the make variable
167@code{tdir_@var{emul}}.  The value of the latter variable will be set by
168the @file{configure} script, and is used to set the default target
169directory to search.
170
171By convention, the @file{emulparams/@var{emul}.sh} shell script should
172only set shell variables.  It may set shell variables which are to be
173interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
174Certain shell variables are interpreted directly by the
175@file{genscripts.sh} script.
176
177Here is a list of shell variables interpreted by @file{genscripts.sh},
178as well as some conventional shell variables interpreted by the
179@file{scripttempl} and @file{emultempl} scripts.
180
181@table @code
182@item SCRIPT_NAME
183This is the name of the @file{scripttempl} script to use.  If
184@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
185the script @file{scripttempl/@var{script}.sc}.
186
187@item TEMPLATE_NAME
188This is the name of the @file{emultempl} script to use.  If
189@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
190use the script @file{emultempl/@var{template}.em}.  If this variable is
191not set, the default value is @samp{generic}.
192
193@item GENERATE_SHLIB_SCRIPT
194If this is set to a nonempty string, @file{genscripts.sh} will invoke
195the @file{scripttempl} script an extra time to create a shared library
script.  @xref{linker scripts}.
197
198@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
201use it in an @code{OUTPUT_FORMAT} expression in the linker script.
202
203@item ARCH
204This is normally set to indicate the architecture to use (e.g.,
205@samp{sparc}).  The @file{scripttempl} script will normally use it in an
206@code{OUTPUT_ARCH} expression in the linker script.
207
208@item ENTRY
209Some @file{scripttempl} scripts use this to set the entry address, in an
210@code{ENTRY} expression in the linker script.
211
212@item TEXT_START_ADDR
213Some @file{scripttempl} scripts use this to set the start address of the
214@samp{.text} section.
215
216@item SEGMENT_SIZE
217The @file{genscripts.sh} script uses this to set the default value of
218@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.
219
220@item TARGET_PAGE_SIZE
221If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
222uses this to define it.
223
224@item ALIGNMENT
225Some @file{scripttempl} scripts set this to a number to pass to
226@code{ALIGN} to set the required alignment for the @code{end} symbol.
227@end table
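
As a rough illustration, an @file{emulparams} script for a hypothetical
emulation named @samp{myemul} might look like the first half of the
sketch below.  The values are illustrative only and are not copied from
any real target.  The second half shows one plausible way the defaulting
of @code{SEGMENT_SIZE} and @code{DATA_ALIGNMENT} described above could
be written; the exact form of the @code{ALIGN} expression is an
assumption, so consult @file{genscripts.sh} for the real logic.

```shell
# Start from a clean slate for the demonstration.
unset SEGMENT_SIZE

# --- hypothetical emulparams/myemul.sh: variable assignments only ---
SCRIPT_NAME=aout                  # selects scripttempl/aout.sc
TEMPLATE_NAME=sunos               # selects emultempl/sunos.em
OUTPUT_FORMAT="a.out-sunos-big"   # BFD output format
ARCH=sparc                        # used in OUTPUT_ARCH
TEXT_START_ADDR=0x2020            # illustrative value
TARGET_PAGE_SIZE=0x2000           # illustrative value
ALIGNMENT=8                       # alignment for the end symbol

# --- sketch of the defaulting described in the table (an assumption) ---
# SEGMENT_SIZE falls back to TARGET_PAGE_SIZE when it is not defined.
SEGMENT_SIZE=${SEGMENT_SIZE-${TARGET_PAGE_SIZE}}
# The default DATA_ALIGNMENT is derived from SEGMENT_SIZE.
DATA_ALIGNMENT="ALIGN(${SEGMENT_SIZE})"

echo "SEGMENT_SIZE=${SEGMENT_SIZE} DATA_ALIGNMENT=${DATA_ALIGNMENT}"
```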
228
229@node linker scripts
230@section @file{scripttempl} scripts
231
232Each linker target uses a @file{scripttempl} script to generate the
233default linker scripts.  The name of the @file{scripttempl} script is
234set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
236invoke @file{scripttempl/@var{script}.sc}.
237
238The @file{genscripts.sh} script will invoke the @file{scripttempl}
239script 5 to 9 times.  Each time it will set the shell variable
240@code{LD_FLAG} to a different value.  When the linker is run, the
options used will direct it to select a particular script.  (Script
selection is controlled by the @code{get_script} emulation entry point;
this describes the conventional behaviour.)
244
245The @file{scripttempl} script should just write a linker script, written
246in the linker command language, to standard output.  If the emulation
name--the name of the @file{emulparams} file without the @file{.sh}
248extension--is @var{emul}, then the output will be directed to
249@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
250where @var{extension} changes each time the @file{scripttempl} script is
251invoked.
252
253Here is the list of values assigned to @code{LD_FLAG}.
254
255@table @code
256@item (empty)
257The script generated is used by default (when none of the following
258cases apply).  The output has an extension of @file{.x}.
259@item n
260The script generated is used when the linker is invoked with the
261@code{-n} option.  The output has an extension of @file{.xn}.
262@item N
263The script generated is used when the linker is invoked with the
264@code{-N} option.  The output has an extension of @file{.xbn}.
265@item r
266The script generated is used when the linker is invoked with the
267@code{-r} option.  The output has an extension of @file{.xr}.
268@item u
269The script generated is used when the linker is invoked with the
270@code{-Ur} option.  The output has an extension of @file{.xu}.
271@item shared
272The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
273this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
274@file{emulparams} file.  The @file{emultempl} script must arrange to use
275this script at the appropriate time, normally when the linker is invoked
276with the @code{-shared} option.  The output has an extension of
277@file{.xs}.
278@item c
279The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
280this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
281@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
282@file{emultempl} script must arrange to use this script at the appropriate
283time, normally when the linker is invoked with the @code{-z combreloc}
284option.  The output has an extension of
285@file{.xc}.
286@item cshared
287The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
288this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
289@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
290@code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
291The @file{emultempl} script must arrange to use this script at the
292appropriate time, normally when the linker is invoked with the @code{-shared
293-z combreloc} option.  The output has an extension of @file{.xsc}.
294@item auto_import
295The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
296this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
297@file{emulparams} file.  The @file{emultempl} script must arrange to
298use this script at the appropriate time, normally when the linker is
299invoked with the @code{--enable-auto-import} option.  The output has
300an extension of @file{.xa}.
301@end table
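
The table above amounts to a simple mapping from @code{LD_FLAG} to an
output file name.  The following sketch restates it as a shell function;
@code{ldscript_path} is a name invented for this example and does not
appear in @file{genscripts.sh}.

```shell
# Sketch: map an emulation name and an LD_FLAG value to the generated
# script path, following the table above.
# (ldscript_path is a hypothetical helper, not part of the ld sources.)
ldscript_path() {
    emul=$1
    ld_flag=$2
    case $ld_flag in
        "")          ext=x   ;;   # default script
        n)           ext=xn  ;;   # -n
        N)           ext=xbn ;;   # -N
        r)           ext=xr  ;;   # -r
        u)           ext=xu  ;;   # -Ur
        shared)      ext=xs  ;;   # -shared
        c)           ext=xc  ;;   # -z combreloc
        cshared)     ext=xsc ;;   # -shared -z combreloc
        auto_import) ext=xa  ;;   # --enable-auto-import
        *) echo "unknown LD_FLAG: $ld_flag" >&2; return 1 ;;
    esac
    printf 'ldscripts/%s.%s\n' "$emul" "$ext"
}

ldscript_path sun4 ""
ldscript_path sun4 N
```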
302
303Besides the shell variables set by the @file{emulparams} script, and the
304@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
305certain variables for each run of the @file{scripttempl} script.
306
307@table @code
308@item RELOCATING
309This will be set to a non-empty string when the linker is doing a final
relocation (i.e., all scripts other than @code{-r} and @code{-Ur}).
311
312@item CONSTRUCTING
313This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., all scripts other than
@code{-r}).
316
317@item DATA_ALIGNMENT
318This will be set to an @code{ALIGN} expression when the output should be
319page aligned, or to @samp{.} when generating the @code{-N} script.
320
321@item CREATE_SHLIB
322This will be set to a non-empty string when generating a @code{-shared}
323script.
324
325@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to a
non-empty string: a temporary file name which can be used during script
generation.
328@end table
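
The rules in this table can be sketched as follows.  This is a
simplified model written for this document, not the logic of
@file{genscripts.sh}: the function name @code{set_script_vars} is
hypothetical, the value @samp{yes} merely stands for ``some non-empty
string'', and leaving @code{DATA_ALIGNMENT} empty for the @code{-r} and
@code{-Ur} scripts is an assumption, since the text does not say what
happens in those cases.  @code{COMBRELOC} is omitted.

```shell
# Sketch: set the per-run variables for one LD_FLAG value.
# Usage: set_script_vars LD_FLAG SEGMENT_SIZE
# (set_script_vars is a hypothetical helper, not part of the ld sources.)
set_script_vars() {
    ld_flag=$1
    segment_size=$2
    unset RELOCATING CONSTRUCTING CREATE_SHLIB
    case $ld_flag in
        r)  # relocatable link: neither variable is set
            DATA_ALIGNMENT= ;;
        u)  # -Ur: constructors are built, but no final relocation
            CONSTRUCTING=yes
            DATA_ALIGNMENT= ;;
        N)  # -N: final link, output is not page aligned
            RELOCATING=yes; CONSTRUCTING=yes
            DATA_ALIGNMENT=. ;;
        *)  # all other scripts: final link, page aligned
            RELOCATING=yes; CONSTRUCTING=yes
            DATA_ALIGNMENT="ALIGN(${segment_size})" ;;
    esac
    case $ld_flag in
        # cshared is also a -shared script (used with -shared -z combreloc)
        shared|cshared) CREATE_SHLIB=yes ;;
    esac
}

set_script_vars N 0x2000
echo "RELOCATING=${RELOCATING-unset} DATA_ALIGNMENT=${DATA_ALIGNMENT}"
```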
329
330The conventional way to write a @file{scripttempl} script is to first
331set a few shell variables, and then write out a linker script using
332@code{cat} with a here document.  The linker script will use variable
333substitutions, based on the above variables and those set in the
334@file{emulparams} script, to control its behaviour.
335
336When there are parts of the @file{scripttempl} script which should only
337be run when doing a final relocation, they should be enclosed within a
338variable substitution based on @code{RELOCATING}.  For example, on many
339targets special symbols such as @code{_end} should be defined when doing
340a final link.  Naturally, those symbols should not be defined when doing
341a relocatable link using @code{-r}.  The @file{scripttempl} script
342could use a construct like this to define those symbols:
343@smallexample
344  $@{RELOCATING+ _end = .;@}
345@end smallexample
346This will do the symbol assignment only if the @code{RELOCATING}
347variable is defined.
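
To see the substitution at work outside the build, the following
self-contained sketch mimics a tiny @file{scripttempl} script.  The
function @code{gen_script} and the two-line @code{SECTIONS} body are
made up for the demonstration; only the @code{RELOCATING} idiom comes
from the text above.

```shell
# Sketch: a miniature scripttempl-style generator using a here document.
# (gen_script is a hypothetical name used only for this demonstration.)
gen_script() {
    cat <<EOF
SECTIONS
{
  .text : { *(.text) }
  ${RELOCATING+ _end = .;}
}
EOF
}

RELOCATING=yes
gen_script        # final link: the "_end = .;" assignment is emitted

unset RELOCATING
gen_script        # relocatable link: that line expands to nothing
```

Note that the @samp{+} form of parameter expansion tests whether the
variable is set at all, so the script must leave @code{RELOCATING}
unset, not merely empty, to suppress the assignment.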
348
349The basic job of the linker script is to put the sections in the correct
350order, and at the correct memory addresses.  For some targets, the
351linker script may have to do some other operations.
352
353For example, on most MIPS platforms, the linker is responsible for
354defining the special symbol @code{_gp}, used to initialize the
355@code{$gp} register.  It must be set to the start of the small data
356section plus @code{0x8000}.  Naturally, it should only be defined when
357doing a final relocation.  This will typically be done like this:
358@smallexample
359  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
360@end smallexample
361This line would appear just before the sections which compose the small
362data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
363contiguous in memory.
364
365Many COFF systems build constructor tables in the linker script.  The
366compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
369@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
370@code{gcc} configuration files).  The @code{gcc} runtime support
371routines expect the constructor table to be named @code{__CTOR_LIST__}.
372They expect it to be a list of words, with the first word being the
373count of the number of entries.  There should be a trailing zero word.
374(Actually, the count may be -1 if the trailing word is present, and the
375trailing word may be omitted if the count is correct, but, as the
376@code{gcc} behaviour has changed slightly over the years, it is safest
377to provide both).  Here is a typical way that might be handled in a
378@file{scripttempl} file.
379@smallexample
380    $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
381    $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
382    $@{CONSTRUCTING+ *(.ctors)@}
383    $@{CONSTRUCTING+ LONG(0)@}
384    $@{CONSTRUCTING+ __CTOR_END__ = .;@}
385    $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
386    $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
387    $@{CONSTRUCTING+ *(.dtors)@}
388    $@{CONSTRUCTING+ LONG(0)@}
389    $@{CONSTRUCTING+ __DTOR_END__ = .;@}
390@end smallexample
391The use of @code{CONSTRUCTING} ensures that these linker script commands
392will only appear when the linker is supposed to be building the
393constructor and destructor tables.  This example is written for a target
394which uses 4 byte pointers.
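
The count expression in the example can be checked with a little
arithmetic.  With 4 byte words, a table holding @var{n} constructors
occupies @var{n} + 2 words (the count word, the entries, and the
trailing zero), so dividing the table size by 4 and subtracting 2
recovers @var{n}.  The sketch below reproduces that calculation; the
function name @code{ctor_count} and the addresses are invented for the
example.

```shell
# Sketch: evaluate (__CTOR_END__ - __CTOR_LIST__) / 4 - 2 for given
# addresses, assuming 4 byte pointers as in the example above.
# (ctor_count is a hypothetical helper, not part of the ld sources.)
ctor_count() {
    list_addr=$1   # value of __CTOR_LIST__
    end_addr=$2    # value of __CTOR_END__
    echo $(( (end_addr - list_addr) / 4 - 2 ))
}

# Three constructors: 1 count word + 3 entries + 1 zero word
# = 5 words = 20 bytes.
ctor_count 4096 4116
```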
395
396Embedded systems often need to set a stack address.  This is normally
397best done by using the @code{PROVIDE} construct with a default stack
398address.  This permits the user to easily override the stack address
399using the @code{--defsym} option.  Here is an example:
400@smallexample
401  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
402@end smallexample
403The value of the symbol @code{__stack} would then be used in the startup
404code to initialize the stack pointer.
405
406@node linker emulations
407@section @file{emultempl} scripts
408
409Each linker target uses an @file{emultempl} script to generate the
410emulation code.  The name of the @file{emultempl} script is set by the
411@code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
412@code{TEMPLATE_NAME} variable is not set, the default is
413@samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
414@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.
415
416Most targets use the generic @file{emultempl} script,
417@file{emultempl/generic.em}.  A different @file{emultempl} script is
418only needed if the linker must support unusual actions, such as linking
419against shared libraries.
420
421The @file{emultempl} script is normally written as a simple invocation
422of @code{cat} with a here document.  The document will use a few
variable substitutions.  Typically each function name uses a
424substitution involving @code{EMULATION_NAME}, for ease of debugging when
425the linker supports multiple emulations.
426
427Every function and variable in the emitted file should be static.  The
428only globally visible object must be named
429@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
430the name of the emulation set in @file{configure.tgt} (this is also the
431name of the @file{emulparams} file without the @file{.sh} extension).
432The @file{genscripts.sh} script will set the shell variable
433@code{EMULATION_NAME} before invoking the @file{emultempl} script.
434
435The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
436@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
437It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name (normally
set from the shell variable @code{OUTPUT_FORMAT}, which is normally set
by the @file{emulparams} file).
442
443The @file{genscripts.sh} script will set the shell variable
444@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
445default emulation.  In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point.  When the emulation is not the default,
the @code{get_script} entry point should just return a file name.  See
449@file{emultempl/generic.em} for an example of how this is done.
450
451At some point, the linker emulation entry points should be documented.
452
453@node Emulation Walkthrough
454@chapter A Walkthrough of a Typical Emulation
455
456This chapter is to help people who are new to the way emulations
457interact with the linker, or who are suddenly thrust into the position
458of having to work with existing emulations.  It will discuss the files
you need to be aware of.  It will tell you when the given ``hooks'' in
460the emulation will be called.  It will, hopefully, give you enough
461information about when and how things happen that you'll be able to
462get by.  As always, the source is the definitive reference to this.
463
The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.  The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}.  Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.
472
Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.  For
the rest of this section, the functions will be mentioned, but you
should assume the indirect reference to your emulation also.
482
483We will also skip or gloss over parts of the link process that don't
484relate to emulations, like setting up internationalization.
485
486After initialization, @code{main} selects an emulation by pre-scanning
487the command line arguments.  It calls @code{ldemul_choose_target} to
488choose a target.  If you set @code{choose_target} to
489@code{ldemul_default_target}, it picks your @code{target_name} by
490default.
491
492@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
493@code{parse_args} calls @code{ldemul_parse_args} for each arg, which
494must update the @code{getopt} globals if it recognizes the argument.
If the emulation doesn't recognize it, then @code{parse_args} checks
whether it recognizes the argument itself.
497
498Now that the emulation has had access to all its command-line options,
499@code{main} calls @code{ldemul_set_symbols}.  This can be used for any
500initialization that may be affected by options.  It is also supposed
501to set up any variables needed by the emulation script.
502
503@code{main} now calls @code{ldemul_get_script} to get the emulation
504script to use (based on arguments, no doubt, @pxref{Emulations}) and
505runs it.  While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
506@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
507commands.  It may call @code{ldemul_unrecognized_file} if you asked
508the linker to link a file it doesn't recognize.  It will call
509@code{ldemul_recognized_file} for each file it does recognize, in case
510the emulation wants to handle some files specially.  All the while,
511it's loading the files (possibly calling
512@code{ldemul_open_dynamic_archive}) and symbols and stuff.  After it's
513done reading the script, @code{main} calls @code{ldemul_after_parse}.
514Use the after-parse hook to set up anything that depends on stuff the
515script might have set up, like the entry point.
516
517@code{main} next calls @code{lang_process} in @code{ldlang.c}.  This
518appears to be the main core of the linking itself, as far as emulation
519hooks are concerned(*).  It first opens the output file's BFD, calling
520@code{ldemul_set_output_arch}, and calls
521@code{ldemul_create_output_section_statements} in case you need to use
522other means to find or create object files (i.e. shared libraries
523found on a path, or fake stub objects).  Despite the name, nobody
524creates output sections here.
525
526(*) In most cases, the BFD library does the bulk of the actual
527linking, handling symbol tables, symbol resolution, relocations, and
528building the final output file.  See the BFD reference for all the
529details.  Your emulation is usually concerned more with managing
things at the file and section level, like ``put this here, add this
section'', etc.
532
533Next, the objects to be linked are opened and BFDs created for them,
534and @code{ldemul_after_open} is called.  At this point, you have all
535the objects and symbols loaded, but none of the data has been placed
536yet.
537
538Next comes the Big Linking Thingy (except for the parts BFD does).
539All input sections are mapped to output sections according to the
540script.  If a section doesn't get mapped by default,
541@code{ldemul_place_orphan} will get called to figure out where it goes.
542Next it figures out the offsets for each section, calling
543@code{ldemul_before_allocation} before and
544@code{ldemul_after_allocation} after deciding where each input section
545ends up in the output sections.
546
547The last part of @code{lang_process} is to figure out all the symbols'
548values.  After assigning final values to the symbols,
549@code{ldemul_finish} is called, and after that, any undefined symbols
550are turned into fatal errors.
551
OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}.  @code{ldwrite} calls BFD's @code{final_link}, which does
all the relocation fixups and writes the output BFD to disk, and we're
done.
556
557In summary,
558
559@itemize @bullet
560
561@item @code{main()} in @file{ldmain.c}
562@item @file{emultempl/@var{EMULATION}.em} has your code
563@item @code{ldemul_choose_target} (defaults to your @code{target_name})
564@item @code{ldemul_before_parse}
565@item Parse argv, calls @code{ldemul_parse_args} for each
566@item @code{ldemul_set_symbols}
567@item @code{ldemul_get_script}
568@item parse script
569
570@itemize @bullet
571@item may call @code{ldemul_hll} or @code{ldemul_syslib}
572@item may call @code{ldemul_open_dynamic_archive}
573@end itemize
574
575@item @code{ldemul_after_parse}
576@item @code{lang_process()} in @file{ldlang.c}
577
578@itemize @bullet
579@item create @code{output_bfd}
580@item @code{ldemul_set_output_arch}
581@item @code{ldemul_create_output_section_statements}
582@item read objects, create input bfds - all symbols exist, but have no values
583@item may call @code{ldemul_unrecognized_file}
584@item will call @code{ldemul_recognized_file}
585@item @code{ldemul_after_open}
586@item map input sections to output sections
587@item may call @code{ldemul_place_orphan} for remaining sections
588@item @code{ldemul_before_allocation}
589@item gives input sections offsets into output sections, places output sections
590@item @code{ldemul_after_allocation} - section addresses valid
591@item assigns values to symbols
592@item @code{ldemul_finish} - symbol values valid
593@end itemize
594
595@item output bfd is written to disk
596
597@end itemize
598
599@node Architecture Specific
600@chapter Some Architecture Specific Notes
601
602This is the place for notes on the behavior of @code{ld} on
603specific platforms.  Currently, only Intel x86 is documented (and 
604of that, only the auto-import behavior for DLLs).
605
606@menu
607* ix86::                        Intel x86
608@end menu
609
610@node ix86
611@section Intel x86
612
@code{ld} can create DLLs that operate with various runtimes available
on a common x86 operating system.  These runtimes include native (using
the mingw ``platform''), cygwin, and pw.

@table @emph
617
618@item auto-import from DLLs 
619@enumerate
620@item
With this feature on, DLL clients can import variables from a DLL
without any effort on their side (for example, without any source
code modifications).  Auto-import can be enabled using the
@code{--enable-auto-import} flag, or disabled via the
@code{--disable-auto-import} flag.  Auto-import is disabled by default.
626
627@item
This is done completely within the bounds of the PE specification (to be
fair, there's a minor violation of the spec at one point, but in practice
auto-import works on all known variants of that common x86 operating
system).  So, the resulting DLL can be used with any other PE
compiler/linker.
633
634@item
Auto-import is fully compatible with the standard import method, in which
variables are decorated using attribute modifiers.  Libraries of either
type may be mixed together.
638
639@item
Overhead (space): 8 bytes per imported symbol, plus 20 bytes for each
reference to it; Overhead (load time): negligible; Overhead
(virtual/physical memory): should be less than the effect of DLL
relocation.
644@end enumerate
645
@item Motivation

The obvious and only way to get rid of dllimport insanity is
to make the client access the variable directly in the DLL, bypassing
the extra dereference imposed by ordinary DLL runtime linking.
I.e., whenever the client contains something like

@code{mov dll_var,%eax},

the address of @code{dll_var} in the instruction should be relocated to
point into the loaded DLL.  The aim is to make the OS loader do so, and
then make @code{ld} help with that.  The import section of a PE file is
laid out in the following way: there is a vector of structures, each
describing the imports from a particular DLL.  Each such structure
points to two other parallel vectors: one holding the imported names,
and one which will hold the address of the corresponding imported name.
So, the solution is to de-vectorize these structures, making the import
locations sparse and pointing directly into code.
664
@item Implementation

For each reference to a data symbol to be imported from a DLL (a symbol
named <sym> belongs to this set if __imp_<sym> is found in the implib),
an import fixup entry is generated.  That entry is of type
IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3 subsection.  Each
fixup entry contains a pointer to the symbol's address within the .text
section (marked with a __fuN_<sym> symbol, where N is an integer), a
pointer to the DLL name (so the DLL name is referenced by multiple
entries), and a pointer to the symbol name thunk.  The symbol name thunk
is a singleton vector (__nm_th_<symbol>) pointing to an
IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing the
imported name.

Here comes the ``on the edge'' problem mentioned above: the PE
specification says that the name vector (OriginalFirstThunk) should
run in parallel with the address vector (FirstThunk), i.e. that they
should have the same number of elements and be terminated with zero.
We violate this, since FirstThunk points directly into machine code.
But in practice, the OS loader is implemented the sane way: it goes
through OriginalFirstThunk and puts addresses into FirstThunk, and does
nothing else.

It should be noted once again that the DLL and symbol name structures
are reused across fixup entries and would have to be there anyway to
support the standard import mechanism, so the sustained overhead is
20 bytes per reference.

Another question is whether having several IMAGE_IMPORT_DESCRIPTORs
for the same DLL is possible.  The answer is yes; it is done even by
the native compiler/linker (libth32's functions are in fact resident in
windows9x kernel32.dll, so if you use it, you have two
IMAGE_IMPORT_DESCRIPTORs for kernel32.dll).  Yet another question is
whether referencing the same PE structures several times is valid.
The answer is: why not?  Prohibiting that (detecting the violation)
would require more work on behalf of the loader than not doing it.
695
696@end table
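
The space overhead figures quoted in the auto-import notes (8 bytes per
imported symbol, plus 20 bytes per reference) can be turned into a quick
estimate.  The sketch below is plain arithmetic on those two numbers;
the function name @code{auto_import_overhead} is invented here, and the
result is only as accurate as the per-item figures themselves.

```shell
# Sketch: estimate auto-import space overhead in bytes.
# Usage: auto_import_overhead IMPORTED_SYMBOLS TOTAL_REFERENCES
# (auto_import_overhead is a hypothetical helper for this example.)
auto_import_overhead() {
    symbols=$1      # number of auto-imported data symbols
    references=$2   # total number of references to those symbols
    echo $(( 8 * symbols + 20 * references ))
}

# 10 imported variables referenced 25 times in total:
# 8*10 + 20*25 bytes.
auto_import_overhead 10 25
```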
697
698@node GNU Free Documentation License
699@chapter GNU Free Documentation License
700
701@include fdl.texi
702
703@contents
704@bye
705