@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX. The simulator
@command{mmix}, available at
@url{http://mmix.cs.hm.edu/src/index.html},
understands this format. That package also includes a combined
assembler and linker called @command{mmixal}. The mmo format has
no advantages feature-wise compared to e.g.@: ELF. It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD). See
@url{http://mmix.cs.hm.edu/} for more
information about MMIX. The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node. A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments. Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections
as with e.g.@: ELF. Memory areas are formed by specifying the
location of the data that follows. Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants) and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data. @xref{mmo section mapping}.

There is provision for specifying ``special data'' of 65536
different types. We use type 80 (decimal), arbitrarily chosen to be
the same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.
@xref{mmo section mapping}.

Contents are entered as 32-bit words, xor:ed over the previous
contents, which are always zero-initialized.
A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes. The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes, different for each lopcode. As documented in
@url{http://mmix.cs.hm.edu/doc/mmixal.pdf},
the lopcodes are:

@table @code
@item lop_quote
0x98000001. The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2. This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}. Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment. Beware that the low bits of
non-tetrabyte-aligned values are silently discarded when being
automatically incremented and when storing contents (in contrast
to e.g.@: its use as current location when followed by lop_fixo
et al before the next possibly-quoted tetrabyte contents).

@item lop_skip
0x9802YYZZ. Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2. Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus
@math{Y * 2^56}.

@item lop_fixr
0x9804YYZZ. @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ. @samp{Z} is 16 or 24. A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is xor:ed into the current location
minus @math{4 * L}. The first byte of the word is 0 or 1. If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.
@samp{Y} is the file number, @samp{Z} is the count of
32-bit words. Set the file number to @samp{Y} and the line
counter to 0. The next @math{Z * 4} bytes contain the file name,
padded with zeros if its length is not a multiple of four. The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ. @samp{YZ} is the line number. Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed to be incremented by one.

@item lop_spec
0x9808YYZZ. @samp{YZ} is the type number. Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type. The flags for such a
section say not to allocate or load the data. The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s. The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ. The first lopcode in a file. The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ. @math{Z > 32}. This lopcode follows after all
content-generating lopcodes in a program. The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000. The next-to-last lopcode in a program. Must follow
immediately after the lop_post lopcode and its data.
All symbols follow this lopcode, in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ. The last lopcode in a program. It must follow the
lop_stab lopcode and its data. The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table

Note that the lopcode ``fixups'', @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo}, are not generated by BFD, but are handled. They are
generated by @code{mmixal}.

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example

@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in the
MMIXware package, which also contains the @command{mmix} simulator:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)
Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie. There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes. The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points. Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}. If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte,
               merge into current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol. The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable. The following
                         byte tells which register.
               j <= 8:   An absolute symbol. Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128). Add in the new byte, repeat
               until a byte has bit 7 set. The serial number
               is the computed value minus 128.

 (MMO3_MIDDLE)
 0x20 - Traverse middle trie. (Read a new command byte
        and recurse.) Decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie. (Read a new command byte and
        recurse.)
@end example

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information. If needed, any datum in the encapsulation will be
quoted using lop_quote. First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name. After the name there's a 32-bit word holding flags
describing the section type. Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address. Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary. For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents. This in effect forms a descriptor that must be emitted
before the actual contents. Sections described this way must not
overlap.
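
As a rough illustration of the descriptor layout just described, the
following Python sketch builds the 32-bit words of a type-80
descriptor. The function name @code{section_descriptor} and its
parameters are hypothetical and are not part of BFD. The sketch only
pads the name to a 32-bit boundary and does not add an extra zero word
when the name length is already a multiple of four; that is an
assumption, based on the two-word name in the @samp{thirdsec} example
in this subsection.

```python
import struct

LOP_QUOTE = 0x98000001    # the next word is contents, not a lopcode
LOP_SPEC_80 = 0x98080050  # lop_spec with YZ = 80 (decimal)


def section_descriptor(name, flags, length, vma):
    """Return the 32-bit words of a type-80 section descriptor.

    Hypothetical helper for illustration only; not a BFD function.
    """
    raw = name.encode("ascii")
    raw += b"\0" * (-len(raw) % 4)      # assumption: pad to a word boundary only
    words = [len(raw) // 4]             # name length in 32-bit words
    words += [w for (w,) in struct.iter_unpack(">I", raw)]
    words.append(flags)                 # section flags
    words += [length >> 32, length & 0xFFFFFFFF]   # 64-bit big-endian length
    words += [vma >> 32, vma & 0xFFFFFFFF]         # 64-bit big-endian address

    out = [LOP_SPEC_80]
    for w in words:
        if w >> 24 == 0x98:             # would otherwise be read as a lopcode
            out.append(LOP_QUOTE)
        out.append(w)
    return out
```

Calling @code{section_descriptor("secname", 0x33, 28, 4)} yields the
nine words that precede the lop_loc in the loadable-section example in
this subsection.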

For areas that don't have such descriptors, synthetic sections are
formed by BFD. Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively. If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled. For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}. Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.
Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data. The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
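
The difference between the two examples can be made concrete with a
small reader, sketched here in Python. It is only an illustration
under the layout described in this subsection, not the BFD
implementation; the names @code{read_spec_data} and
@code{parse_descriptor} are hypothetical. The first function collects
the words belonging to a lop_spec, honoring lop_quote; the second
splits them into the descriptor fields and whatever contents were
carried inside the special data.

```python
LOP_QUOTE = 0x98000001  # the next word is contents, not a lopcode


def read_spec_data(words, start):
    """Collect the special data that follows the lop_spec at words[start].

    Returns (data, next_index). Data ends at the first lopcode other
    than lop_quote; a quoted word counts as data even if it starts
    with 0x98.
    """
    data = []
    i = start + 1
    while i < len(words):
        w = words[i]
        if w == LOP_QUOTE and i + 1 < len(words):
            i += 1                      # skip the quote, keep the next word
            data.append(words[i])
        elif w >> 24 == 0x98:
            break                       # some other lopcode ends the data
        else:
            data.append(w)
        i += 1
    return data, i


def parse_descriptor(data):
    """Split type-80 special data into (name, flags, length, vma, inline)."""
    n = data[0]                         # name length in 32-bit words
    raw = b"".join(w.to_bytes(4, "big") for w in data[1:1 + n])
    name = raw.rstrip(b"\0").decode("ascii")
    flags = data[1 + n]
    length = (data[2 + n] << 32) | data[3 + n]
    vma = (data[4 + n] << 32) | data[5 + n]
    # Anything left over is contents carried inside the special data.
    inline = b"".join(w.to_bytes(4, "big") for w in data[6 + n:])
    return name, flags, length, vma, inline
```

For the @samp{secname} words, the special data stops at the lop_loc
and the inline contents are empty. For the @samp{thirdsec} words, the
twelve content bytes are returned as part of the special data.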