\input texinfo
@setfilename ldint.info
@c Copyright 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
@c 2003, 2005, 2006, 2007
@c Free Software Foundation, Inc.

@ifnottex
@dircategory Software development
@direntry
* Ld-Internals: (ldint).        The GNU linker internals.
@end direntry
@end ifnottex

@copying
This file documents the internals of the GNU linker ld.

Copyright @copyright{} 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2007
Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``Funding
Free Software'', the Front-Cover texts being (a) (see below), and with
the Back-Cover Texts being (b) (see below). A copy of the license is
included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91}  % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1997, 1998, 2000
Free Software Foundation, Inc.

      Permission is granted to copy, distribute and/or modify this document
      under the terms of the GNU Free Documentation License, Version 1.3
      or any later version published by the Free Software Foundation;
      with no Invariant Sections, with no Front-Cover Texts, and with no
      Back-Cover Texts. A copy of the license is included in the
      section entitled "GNU Free Documentation License".

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}. It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

This document is distributed under the terms of the GNU Free
Documentation License. A copy of the license is included in the
section entitled "GNU Free Documentation License".

@menu
* README::			The README File
* Emulations::			How linker emulations are generated
* Emulation Walkthrough::	A Walkthrough of a Typical Emulation
* Architecture Specific::	Some Architecture Specific Notes
* GNU Free Documentation License::  GNU Free Documentation License
@end menu

@node README
@chapter The @file{README} File

Check the @file{README} file; it often has useful information that does not
appear anywhere else in the directory.

@node Emulations
@chapter How linker emulations are generated

Each linker target has an @dfn{emulation}. The emulation includes the
default linker script, and certain emulations also modify certain types
of linker behaviour.

Emulations are created during the build process by the shell script
@file{genscripts.sh}.

The @file{genscripts.sh} script starts by reading a file in the
@file{emulparams} directory. This is a shell script which sets various
shell variables used by @file{genscripts.sh} and the other shell scripts
it invokes.
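
As a rough sketch of this first step (it is not the actual
@file{genscripts.sh} logic), the following shell fragment writes a
made-up parameters file, reads it with the @code{.} builtin so that its
assignments land in the current shell, and then derives the template
file names from the resulting variables. The values and the temporary
file are illustrative assumptions; the variable names and the
@samp{generic} default are the conventional ones described later in
this chapter.

@verbatim
```shell
#!/bin/sh
# Start from a clean slate so inherited environment values cannot leak in.
unset SCRIPT_NAME TEMPLATE_NAME OUTPUT_FORMAT ARCH

# Hypothetical stand-in for an emulparams file: it only sets shell variables.
# The values below are made up for illustration.
params=$(mktemp)
cat > "$params" <<'EOF'
SCRIPT_NAME=aout
OUTPUT_FORMAT="a.out-sunos-big"
ARCH=sparc
EOF

# Read the parameters the way a driver script could: source the file so
# that its assignments take effect in the current shell.
. "$params"
rm -f "$params"

# Fall back to a default for a variable the parameters file left unset.
TEMPLATE_NAME=${TEMPLATE_NAME-generic}

# Derive the names of the template scripts that would be run next.
echo "scripttempl/${SCRIPT_NAME}.sc"
echo "emultempl/${TEMPLATE_NAME}.em"
```
@end verbatim

Run as written, the fragment prints @samp{scripttempl/aout.sc} followed
by @samp{emultempl/generic.em}.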

The @file{genscripts.sh} script will invoke a shell script in the
@file{scripttempl} directory in order to create default linker scripts
written in the linker command language. The @file{scripttempl} script
will be invoked at least 5 times (and up to 9, depending on which
optional scripts are enabled), with different assignments to shell
variables, to create different default scripts.
The choice of script is made based on the command line options.

After creating the scripts, @file{genscripts.sh} will invoke yet another
shell script, this time in the @file{emultempl} directory. That shell
script will create the emulation source file, which contains C code.
This C code permits the linker emulation to override various linker
behaviours. Most targets use the generic emulation code, which is in
@file{emultempl/generic.em}.

To summarize, @file{genscripts.sh} reads three shell scripts: an
emulation parameters script in the @file{emulparams} directory, a linker
script generation script in the @file{scripttempl} directory, and an
emulation source file generation script in the @file{emultempl}
directory.

For example, the Sun 4 linker sets up variables in
@file{emulparams/sun4.sh}, creates linker scripts using
@file{scripttempl/aout.sc}, and creates the emulation code using
@file{emultempl/sunos.em}.

Note that the linker can support several emulations simultaneously,
depending upon how it is configured. An emulation can be selected with
the @code{-m} option. The @code{-V} option will list all supported
emulations.

@menu
* emulation parameters::	@file{emulparams} scripts
* linker scripts::		@file{scripttempl} scripts
* linker emulations::		@file{emultempl} scripts
@end menu

@node emulation parameters
@section @file{emulparams} scripts

Each target selects a particular file in the @file{emulparams} directory
by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
This shell variable is used by the @file{configure} script to control
building an emulation source file.

Certain conventions are enforced. Suppose the @code{targ_emul} variable
is set to @var{emul} in @file{configure.tgt}. The name of the emulation
shell script will be @file{emulparams/@var{emul}.sh}. The
@file{Makefile} must have a target named @file{e@var{emul}.c}; this
target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
appropriate scripts in the @file{scripttempl} and @file{emultempl}
directories. The @file{Makefile} target must invoke @code{GENSCRIPTS}
with two arguments: @var{emul}, and the value of the make variable
@code{tdir_@var{emul}}. The value of the latter variable will be set by
the @file{configure} script, and is used to set the default target
directory to search.

By convention, the @file{emulparams/@var{emul}.sh} shell script should
only set shell variables. It may set shell variables which are to be
interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
Certain shell variables are interpreted directly by the
@file{genscripts.sh} script.

Here is a list of shell variables interpreted by @file{genscripts.sh},
as well as some conventional shell variables interpreted by the
@file{scripttempl} and @file{emultempl} scripts.

@table @code
@item SCRIPT_NAME
This is the name of the @file{scripttempl} script to use. If
@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
the script @file{scripttempl/@var{script}.sc}.

@item TEMPLATE_NAME
This is the name of the @file{emultempl} script to use. If
@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
use the script @file{emultempl/@var{template}.em}. If this variable is
not set, the default value is @samp{generic}.

@item GENERATE_SHLIB_SCRIPT
If this is set to a nonempty string, @file{genscripts.sh} will invoke
the @file{scripttempl} script an extra time to create a shared library
script. @xref{linker scripts}.

@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}). The @file{scripttempl} script will normally
use it in an @code{OUTPUT_FORMAT} expression in the linker script.

@item ARCH
This is normally set to indicate the architecture to use (e.g.,
@samp{sparc}). The @file{scripttempl} script will normally use it in an
@code{OUTPUT_ARCH} expression in the linker script.

@item ENTRY
Some @file{scripttempl} scripts use this to set the entry address, in an
@code{ENTRY} expression in the linker script.

@item TEXT_START_ADDR
Some @file{scripttempl} scripts use this to set the start address of the
@samp{.text} section.

@item SEGMENT_SIZE
The @file{genscripts.sh} script uses this to set the default value of
@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.

@item TARGET_PAGE_SIZE
If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
uses this to define it.

@item ALIGNMENT
Some @file{scripttempl} scripts set this to a number to pass to
@code{ALIGN} to set the required alignment for the @code{end} symbol.
@end table

@node linker scripts
@section @file{scripttempl} scripts

Each linker target uses a @file{scripttempl} script to generate the
default linker scripts. The name of the @file{scripttempl} script is
set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
invoke @file{scripttempl/@var{script}.sc}.

The @file{genscripts.sh} script will invoke the @file{scripttempl}
script 5 to 9 times.
Each time it will set the shell variable
@code{LD_FLAG} to a different value. When the linker is run, the
options used will direct it to select a particular script. (Script
selection is controlled by the @code{get_script} emulation entry point;
this describes the conventional behaviour).

The @file{scripttempl} script should just write a linker script, written
in the linker command language, to standard output. If the emulation
name--the name of the @file{emulparams} file without the @file{.sh}
extension--is @var{emul}, then the output will be directed to
@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
where @var{extension} changes each time the @file{scripttempl} script is
invoked.

Here is the list of values assigned to @code{LD_FLAG}.

@table @code
@item (empty)
The script generated is used by default (when none of the following
cases apply). The output has an extension of @file{.x}.
@item n
The script generated is used when the linker is invoked with the
@code{-n} option. The output has an extension of @file{.xn}.
@item N
The script generated is used when the linker is invoked with the
@code{-N} option. The output has an extension of @file{.xbn}.
@item r
The script generated is used when the linker is invoked with the
@code{-r} option. The output has an extension of @file{.xr}.
@item u
The script generated is used when the linker is invoked with the
@code{-Ur} option. The output has an extension of @file{.xu}.
@item shared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
@file{emulparams} file. The @file{emultempl} script must arrange to use
this script at the appropriate time, normally when the linker is invoked
with the @code{-shared} option. The output has an extension of
@file{.xs}.
@item c
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
@file{emultempl} script must arrange to use this script at the appropriate
time, normally when the linker is invoked with the @code{-z combreloc}
option. The output has an extension of
@file{.xc}.
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
@code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
The @file{emultempl} script must arrange to use this script at the
appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options. The output has an extension of
@file{.xsc}.
@item auto_import
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
@file{emulparams} file. The @file{emultempl} script must arrange to
use this script at the appropriate time, normally when the linker is
invoked with the @code{--enable-auto-import} option. The output has
an extension of @file{.xa}.
@end table

Besides the shell variables set by the @file{emulparams} script, and the
@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
certain variables for each run of the @file{scripttempl} script.

@table @code
@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., all scripts other than
@code{-r}).

@item DATA_ALIGNMENT
This will be set to an @code{ALIGN} expression when the output should be
page aligned, or to @samp{.} when generating the @code{-N} script.

@item CREATE_SHLIB
This will be set to a non-empty string when generating a @code{-shared}
script.

@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to a
non-empty string: the name of a temporary file which can be used during
script generation.
@end table

The conventional way to write a @file{scripttempl} script is to first
set a few shell variables, and then write out a linker script using
@code{cat} with a here document. The linker script will use variable
substitutions, based on the above variables and those set in the
@file{emulparams} script, to control its behaviour.

When there are parts of the @file{scripttempl} script which should only
be run when doing a final relocation, they should be enclosed within a
variable substitution based on @code{RELOCATING}. For example, on many
targets special symbols such as @code{_end} should be defined when doing
a final link. Naturally, those symbols should not be defined when doing
a relocatable link using @code{-r}. The @file{scripttempl} script
could use a construct like this to define those symbols:
@smallexample
  $@{RELOCATING+ _end = .;@}
@end smallexample
This will do the symbol assignment only if the @code{RELOCATING}
variable is defined.

The basic job of the linker script is to put the sections in the correct
order, and at the correct memory addresses. For some targets, the
linker script may have to do some other operations.

For example, on most MIPS platforms, the linker is responsible for
defining the special symbol @code{_gp}, used to initialize the
@code{$gp} register. It must be set to the start of the small data
section plus @code{0x8000}.
Naturally, it should only be defined when
doing a final relocation. This will typically be done like this:
@smallexample
  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
@end smallexample
This line would appear just before the sections which compose the small
data section (@samp{.sdata}, @samp{.sbss}). All those sections would be
contiguous in memory.

Many COFF systems build constructor tables in the linker script. The
compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
@code{gcc} configuration files). The @code{gcc} runtime support
routines expect the constructor table to be named @code{__CTOR_LIST__}.
They expect it to be a list of words, with the first word being the
count of the number of entries. There should be a trailing zero word.
(Actually, the count may be -1 if the trailing word is present, and the
trailing word may be omitted if the count is correct, but, as the
@code{gcc} behaviour has changed slightly over the years, it is safest
to provide both). Here is a typical way that might be handled in a
@file{scripttempl} file.
@smallexample
    $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.ctors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __CTOR_END__ = .;@}
    $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.dtors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __DTOR_END__ = .;@}
@end smallexample
The use of @code{CONSTRUCTING} ensures that these linker script commands
will only appear when the linker is supposed to be building the
constructor and destructor tables.
This example is written for a target
which uses 4 byte pointers.

Embedded systems often need to set a stack address. This is normally
best done by using the @code{PROVIDE} construct with a default stack
address. This permits the user to easily override the stack address
using the @code{--defsym} option. Here is an example:
@smallexample
  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
@end smallexample
The value of the symbol @code{__stack} would then be used in the startup
code to initialize the stack pointer.

@node linker emulations
@section @file{emultempl} scripts

Each linker target uses an @file{emultempl} script to generate the
emulation code. The name of the @file{emultempl} script is set by the
@code{TEMPLATE_NAME} variable in the @file{emulparams} script. If the
@code{TEMPLATE_NAME} variable is not set, the default is
@samp{generic}. If the value of @code{TEMPLATE_NAME} is @var{template},
@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.

Most targets use the generic @file{emultempl} script,
@file{emultempl/generic.em}. A different @file{emultempl} script is
only needed if the linker must support unusual actions, such as linking
against shared libraries.

The @file{emultempl} script is normally written as a simple invocation
of @code{cat} with a here document. The document will use a few
variable substitutions. Typically each function name uses a
substitution involving @code{EMULATION_NAME}, for ease of debugging when
the linker supports multiple emulations.

Every function and variable in the emitted file should be static. The
only globally visible object must be named
@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
the name of the emulation set in @file{configure.tgt} (this is also the
name of the @file{emulparams} file without the @file{.sh} extension).
The @file{genscripts.sh} script will set the shell variable
@code{EMULATION_NAME} before invoking the @file{emultempl} script.

The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name (normally
set from the shell variable @code{OUTPUT_FORMAT}, which is normally set
by the @file{emulparams} file).

The @file{genscripts.sh} script will set the shell variable
@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
default emulation. In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point. When the emulation is not the default,
the @code{get_script} entry point should just return a file name. See
@file{emultempl/generic.em} for an example of how this is done.

At some point, the linker emulation entry points should be documented.

@node Emulation Walkthrough
@chapter A Walkthrough of a Typical Emulation

This chapter is to help people who are new to the way emulations
interact with the linker, or who are suddenly thrust into the position
of having to work with existing emulations. It will discuss the files
you need to be aware of. It will tell you when the given "hooks" in
the emulation will be called. It will, hopefully, give you enough
information about when and how things happen that you'll be able to
get by. As always, the source is the definitive reference to this.

The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.
The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}. Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.

Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}. If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function. For
the rest of this section, the functions will be mentioned, but you
should assume the indirect reference to your emulation also.

We will also skip or gloss over parts of the link process that don't
relate to emulations, like setting up internationalization.

After initialization, @code{main} selects an emulation by pre-scanning
the command line arguments. It calls @code{ldemul_choose_target} to
choose a target. If you set @code{choose_target} to
@code{ldemul_default_target}, it picks your @code{target_name} by
default.

@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
@code{parse_args} calls @code{ldemul_parse_args} for each arg, which
must update the @code{getopt} globals if it recognizes the argument.
If the emulation doesn't recognize it, @code{parse_args} then checks
whether it recognizes the argument itself.

Now that the emulation has had access to all its command-line options,
@code{main} calls @code{ldemul_set_symbols}. This can be used for any
initialization that may be affected by options.
It is also supposed
to set up any variables needed by the emulation script.

@code{main} now calls @code{ldemul_get_script} to get the emulation
script to use (based on arguments, no doubt, @pxref{Emulations}) and
runs it. While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
commands. It may call @code{ldemul_unrecognized_file} if you asked
the linker to link a file it doesn't recognize. It will call
@code{ldemul_recognized_file} for each file it does recognize, in case
the emulation wants to handle some files specially. All the while,
it's loading the files (possibly calling
@code{ldemul_open_dynamic_archive}) and symbols and stuff. After it's
done reading the script, @code{main} calls @code{ldemul_after_parse}.
Use the after-parse hook to set up anything that depends on stuff the
script might have set up, like the entry point.

@code{main} next calls @code{lang_process} in @code{ldlang.c}. This
appears to be the main core of the linking itself, as far as emulation
hooks are concerned(*). It first opens the output file's BFD, calling
@code{ldemul_set_output_arch}, and calls
@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (i.e. shared libraries
found on a path, or fake stub objects). Despite the name, nobody
creates output sections here.

(*) In most cases, the BFD library does the bulk of the actual
linking, handling symbol tables, symbol resolution, relocations, and
building the final output file. See the BFD reference for all the
details. Your emulation is usually concerned more with managing
things at the file and section level, like "put this here, add this
section", etc.

Next, the objects to be linked are opened and BFDs created for them,
and @code{ldemul_after_open} is called.
At this point, you have all
the objects and symbols loaded, but none of the data has been placed
yet.

Next comes the Big Linking Thingy (except for the parts BFD does).
All input sections are mapped to output sections according to the
script. If a section doesn't get mapped by default,
@code{ldemul_place_orphan} will get called to figure out where it goes.
Next it figures out the offsets for each section, calling
@code{ldemul_before_allocation} before and
@code{ldemul_after_allocation} after deciding where each input section
ends up in the output sections.

The last part of @code{lang_process} is to figure out all the symbols'
values. After assigning final values to the symbols,
@code{ldemul_finish} is called, and after that, any undefined symbols
are turned into fatal errors.

OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}. @code{ldwrite} calls BFD's @code{final_link}, which does
all the relocation fixups and writes the output bfd to disk, and we're
done.

In summary,

@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{EMULATION}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item Parse argv, calls @code{ldemul_parse_args} for each
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects, create input bfds - all symbols exist, but have no values
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for remaining sections
@item @code{ldemul_before_allocation}
@item gives input sections offsets into output sections, places output sections
@item @code{ldemul_after_allocation} - section addresses valid
@item assigns values to symbols
@item @code{ldemul_finish} - symbol values valid
@end itemize

@item output bfd is written to disk

@end itemize

@node Architecture Specific
@chapter Some Architecture Specific Notes

This is the place for notes on the behavior of @code{ld} on
specific platforms. Currently, only Intel x86 is documented (and
of that, only the auto-import behavior for DLLs).

@menu
* ix86::                        Intel x86
@end menu

@node ix86
@section Intel x86

@table @emph
@code{ld} can create DLLs that operate with various runtimes available
on a common x86 operating system.
These runtimes include native (using
the mingw "platform"), cygwin, and pw.

@item auto-import from DLLs
@enumerate
@item
With this feature on, DLL clients can import variables from a DLL
without any effort on their side (for example, without any source
code modifications). Auto-import can be enabled using the
@code{--enable-auto-import} flag, or disabled via the
@code{--disable-auto-import} flag. Auto-import is disabled by default.

@item
This is done completely within the bounds of the PE specification (to be
fair, there's a minor violation of the spec at one point, but in practice
auto-import works on all known variants of that common x86 operating
system). So, the resulting DLL can be used with any other PE
compiler/linker.

@item
Auto-import is fully compatible with the standard import method, in which
variables are decorated using attribute modifiers. Libraries of either
type may be mixed together.

@item
Overhead (space): 8 bytes per imported symbol, plus 20 for each
reference to it; Overhead (load time): negligible; Overhead
(virtual/physical memory): should be less than effect of DLL
relocation.
@end enumerate

Motivation

The obvious and only way to get rid of dllimport insanity is
to make the client access variables directly in the DLL, bypassing
the extra dereference imposed by ordinary DLL runtime linking.
I.e., whenever the client contains something like

@code{mov dll_var,%eax},

the address of dll_var in the instruction should be relocated to point
into the loaded DLL. The aim is to make the OS loader do so, and then
make ld help with that. The import section of PE is laid out in the
following way: there's a vector of structures, each describing imports
from a particular DLL. Each such structure points to two other
parallel vectors: one holding imported names, and one which
will hold the address of the corresponding imported name.
So, the
solution is to de-vectorize these structures, making import
locations sparse and pointing directly into code.

Implementation

For each reference to a data symbol to be imported from a DLL (a
symbol named <sym> belongs to this set if __imp_<sym> is
found in the implib), an import fixup entry is generated. That
entry is of type IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3
subsection. Each fixup entry contains a pointer to the symbol's address
within the .text section (marked with a __fuN_<sym> symbol, where N is
an integer), a pointer to the DLL name (so the DLL name is referenced by
multiple entries), and a pointer to the symbol name thunk. The symbol name
thunk is a singleton vector (__nm_th_<symbol>) pointing to an
IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing the
imported name. Here comes the "on the edge" problem mentioned above: the
PE specification states that the name vector (OriginalFirstThunk) should
run in parallel with the address vector (FirstThunk), i.e. that they
should have the same number of elements and be terminated with zero. We
violate this, since FirstThunk points directly into machine code. But in
practice, the OS loader is implemented the sane way: it goes through
OriginalFirstThunk and puts addresses into FirstThunk, rather than doing
anything else. It should be noted once again that the DLL and symbol name
structures are reused across fixup entries and should be there
anyway to support the standard import stuff, so the sustained overhead is
20 bytes per reference. Another question is whether having several
IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. The answer is yes;
it is done even by the native compiler/linker (libth32's functions are in
fact resident in windows9x kernel32.dll, so if you use it, you have
two IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet another question is
whether referencing the same PE structures several times is valid.
The answer is: why not? Prohibiting that (detecting the violation) would
require more work on behalf of the loader than not doing it.

@end table

@node GNU Free Documentation License
@chapter GNU Free Documentation License

@include fdl.texi

@contents
@bye