.. _Readers:

Developing lld Readers
======================

Note: this document discusses the Mach-O port of LLD. For ELF and COFF,
see :doc:`index`.

Introduction
------------

The purpose of a "Reader" is to take an object file in a particular format
and create an `lld::File`:cpp:class: (which is a graph of Atoms)
representing the object file. A Reader inherits from
`lld::Reader`:cpp:class: which lives in
:file:`include/lld/Core/Reader.h` and
:file:`lib/Core/Reader.cpp`.

The Reader infrastructure for an object format ``Foo`` requires the
following pieces in order to fit into lld:

:file:`include/lld/ReaderWriter/ReaderFoo.h`

   .. cpp:class:: ReaderOptionsFoo : public ReaderOptions

      This Options class is the only way to configure how the Reader will
      parse any file into an `lld::File`:cpp:class: object. This class
      should be declared in the `lld`:cpp:class: namespace.

   .. cpp:function:: Reader *createReaderFoo(ReaderOptionsFoo &reader)

      This factory function configures and creates the Reader. This function
      should be declared in the `lld`:cpp:class: namespace.

:file:`lib/ReaderWriter/Foo/ReaderFoo.cpp`

   .. cpp:class:: ReaderFoo : public Reader

      This is the concrete Reader class which can be called to parse
      object files. It should be declared in an anonymous namespace or,
      if it shares code with `lld::WriterFoo`:cpp:class:, in a nested
      namespace (e.g. `lld::foo`:cpp:class:).
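
As a rough illustration of the layout above, the following sketch collapses
the header and the ``.cpp`` file into one listing. It does not use lld's real
headers: ``ReaderOptions`` and ``Reader`` are reduced to stand-ins, and
``addAllowedArch``, ``isAllowed``, and ``acceptsArch`` are hypothetical names
chosen only to show options flowing through the factory into a hidden Reader
class.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

namespace lld {

// Stand-in for the base options class (not lld's real declaration).
struct ReaderOptions {
  virtual ~ReaderOptions() = default;
};

// Stand-in for lld::Reader, reduced to one hypothetical query method.
class Reader {
public:
  virtual ~Reader() = default;
  virtual bool acceptsArch(const std::string &arch) const = 0;
};

// --- Roughly what include/lld/ReaderWriter/ReaderFoo.h would expose ---

class ReaderOptionsFoo : public ReaderOptions {
public:
  // Hypothetical option: restrict the Reader to certain architectures.
  void addAllowedArch(std::string arch) { allowed.insert(std::move(arch)); }
  bool isAllowed(const std::string &arch) const {
    return allowed.empty() || allowed.count(arch) != 0;
  }

private:
  std::set<std::string> allowed; // empty means "accept every architecture"
};

// The only public way to obtain a Foo Reader.
Reader *createReaderFoo(ReaderOptionsFoo &options);

} // namespace lld

// --- Roughly what lib/ReaderWriter/Foo/ReaderFoo.cpp would contain ---

namespace {

// The concrete class stays invisible outside this translation unit.
class ReaderFoo : public lld::Reader {
public:
  explicit ReaderFoo(const lld::ReaderOptionsFoo &options) : opts(options) {}

  bool acceptsArch(const std::string &arch) const override {
    return opts.isAllowed(arch);
  }

private:
  lld::ReaderOptionsFoo opts; // copied so the Reader owns its configuration
};

} // end anonymous namespace

namespace lld {

Reader *createReaderFoo(ReaderOptionsFoo &options) {
  return new ReaderFoo(options);
}

} // namespace lld
```

In this sketch a client only ever names ``ReaderOptionsFoo`` and
``createReaderFoo``; every behavioral choice travels through the options
object, and the caller takes ownership of the returned pointer.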

You may have noticed that :cpp:class:`ReaderFoo` is not declared in the
``.h`` file. An important design aspect of lld is that all Readers are
created *only* through an object-format-specific
:cpp:func:`createReaderFoo` factory function. The creation of the Reader is
parametrized through a :cpp:class:`ReaderOptionsFoo` class. This options
class is the one-and-only way to control how the Reader operates when
parsing an input file into an Atom graph. For instance, you may want the
Reader to only accept certain architectures. The options class can be
instantiated from command line options or be programmatically configured.

Where to start
--------------

The lld project already has a skeleton of source code for Readers for
``ELF``, ``PECOFF``, ``MachO``, and lld's native ``YAML`` graph format.
If your file format is a variant of one of those, you should modify the
existing Reader to support your variant. This is done by customizing the Options
class for the Reader and making appropriate changes to the ``.cpp`` file to
interpret those options and act accordingly.

If your object file format is not a variant of any existing Reader, you'll need
to create a new Reader subclass with the organization described above.

Readers are factories
---------------------

The linker will usually only instantiate your Reader once. That one Reader will
have its loadFile() method called many times with different input files.
To support multithreaded linking, the Reader may be parsing multiple input
files in parallel. Therefore, there should be no parsing state in your Reader
object.
Any parsing state should be in instance variables of your File subclass or in
some temporary object.

The key function to implement in a reader is::

  virtual error_code loadFile(LinkerInput &input,
                              std::vector<std::unique_ptr<File>> &result);

It takes a memory buffer (which contains the contents of the object file
being read) and returns an instantiated lld::File object which is
a collection of Atoms. The result is a vector of File pointers (instead of
simply a File pointer) because some file formats allow multiple object
"files" to be encoded in one file system file.


Memory Ownership
----------------

Atoms are always owned by their File object. During core linking when Atoms
are coalesced or stripped away, core linking does not delete them.
Core linking just removes those unused Atoms from its internal list.
The destructor of a File object is responsible for deleting all Atoms it
owns, and if ownership of the MemoryBuffer was passed to it, the File
destructor needs to delete that too.

Making Atoms
------------

The internal model of lld is purely Atom based. Most object files, however,
have no explicit concept of Atoms; instead, most have "sections". The way
to think of this is that a section is just a list of Atoms with common
attributes.

The first step in parsing section-based object files is to cleave each
section into a list of Atoms. The technique may vary by section type. For
code sections (e.g. .text), there are usually symbols at the start of each
function.
Those symbol addresses are the points at which the section is
cleaved into discrete Atoms. Some file formats (like ELF) also include the
length of each symbol in the symbol table. Otherwise, the length of each
Atom is calculated to run to the start of the next symbol or the end of the
section.

Other section types can be implicitly cleaved. For instance, C-string literals
or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
the content of the section. It is important to cleave sections into Atoms
to remove false dependencies. For instance, the .eh_frame section often
has no symbols, but contains "pointers" to the functions for which it
has unwind info. If the .eh_frame section were not cleaved (but left as one
big Atom), there would always be a reference (from the eh_frame Atom) to
each function. So the linker would be unable to coalesce or dead-strip
the function atoms.

The lld Atom model also requires that a reference to an undefined symbol be
modeled as a Reference to an UndefinedAtom. So the Reader also needs to
create an UndefinedAtom for each undefined symbol in the object file.

Once all Atoms have been created, the second step is to create References
(recall that Atoms are "nodes" and References are "edges"). Most References
are created by looking at the "relocation records" in the object file. If
a function contains a call to "malloc", there is usually a relocation record
specifying the address in the section and the symbol table index. Your
Reader will need to convert the address to an Atom and offset and the symbol
table index into a target Atom.
If "malloc" is not defined in the object file,
the target Atom of the Reference will be an UndefinedAtom.


Performance
-----------

Once you have the above working to parse an object file into Atoms and
References, you'll want to look at performance. Some techniques that can
help performance are:

* Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
  just have each atom point to its subrange of References in that vector.
  This can be faster than allocating each Reference as a separate object.
* Pre-scan the symbol table and determine how many atoms are in each section,
  then allocate space for all the Atom objects at once.
* Don't copy symbol names or section content to each Atom; instead, use
  StringRef and ArrayRef in each Atom to point to its name and content in the
  MemoryBuffer.


Testing
-------

We are still working on infrastructure to test Readers. The issue is that
you don't want to check in binary files to the test suite, and the tools
for creating your object file from assembly source may not be available on
every OS.

We are investigating a way to use YAML to describe the sections, symbols,
and content of a file, and then have code that writes out an object
file from that YAML description.

Once that is in place, you can write test cases that contain section/symbols
YAML and are run through the linker to produce Atom/References based YAML which
is then run through FileCheck to verify the Atoms and References are as
expected.