@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://mmix.cs.hm.edu/src/index.html},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD).  See
@url{http://mmix.cs.hm.edu/} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.
15
16@c We want to xref the symbol table node.  A feature in "chew"
17@c requires that "commands" do not contain spaces in the
18@c arguments.  Hence the hyphen in "Symbol-table".
19@menu
20* File layout::
21* Symbol-table::
22* mmo section mapping::
23@end menu
24
25@node File layout, Symbol-table, mmo, mmo
26@subsection File layout
The contents of an mmo file are not partitioned into named sections
as with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants), and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.
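
As an illustration of the two memory areas just described, the
following minimal sketch classifies a 64-bit address.  The function name
@code{area_of} and the expansion of the elided address bounds to full
64-bit constants are assumptions made for this example; they are not
part of BFD.

```python
# Bounds of the two areas, written out as full 64-bit values (assumed
# expansion of the elided "0x0000...00" style constants in the text).
TEXT_LO = 0x0000000000000000
TEXT_HI = 0x01FFFFFFFFFFFFFF
DATA_LO = 0x2000000000000000
DATA_HI = 0x20FFFFFFFFFFFFFF


def area_of(addr):
    """Return "text", "data" or None for a 64-bit address."""
    if TEXT_LO <= addr <= TEXT_HI:
        return "text"   # executable area: code and constants
    if DATA_LO <= addr <= DATA_HI:
        return "data"   # writable data
    return None         # outside both areas
```

An address such as @samp{0x200000000000001c} falls in the data area,
while address 4 falls in the text area.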
34
35There is provision for specifying ``special data'' of 65536
36different types.  We use type 80 (decimal), arbitrarily chosen the
37same as the ELF @code{e_machine} number for MMIX, filling it with
38section information normally found in ELF objects. @xref{mmo
39section mapping}.
40
Contents are entered as 32-bit words, xor:ed over previous
contents, which are always zero-initialized.  A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
various purposes different for each lopcode.  As documented in
@url{http://mmix.cs.hm.edu/doc/mmixal.pdf},
the lopcodes are:
50
51@table @code
52@item lop_quote
530x98000001.  The next word is contents, regardless of whether it
54starts with 0x98 or not.
55
56@item lop_loc
570x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
58directive, setting the location for the next data to the next
5932-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
60plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 2 for the data segment.  Beware that the low bits of
non-tetrabyte-aligned values are silently discarded when being
automatically incremented and when storing contents (in contrast
to e.g.@: its use as current location when followed by lop_fixo
et al before the next possibly-quoted tetrabyte contents).
66
67@item lop_skip
680x9802YYZZ.  Increase the current location by @samp{YZ} bytes.
69
70@item lop_fixo
710x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
72as 64 bits into the location pointed to by the next 32-bit
73(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y *
742^56}.
75
76@item lop_fixr
770x9804YYZZ.  @samp{YZ} is stored into the current location plus
78@math{2 - 4 * YZ}.
79
80@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
83@samp{YZ} in lop_fixr: it is xor:ed into the current location
84minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
85is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}, if 0,
86then @math{L = (@var{lowest 24 bits of word})}.
87
88@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
91counter to 0.  The next @math{Z * 4} bytes contain the file name,
92padded with zeros if the count is not a multiple of four.  The
93same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
94all but the first occurrence.
95
96@item lop_line
970x9807YYZZ.  @samp{YZ} is the line number.  Together with
98lop_file, it forms the source location for the next 32-bit word.
99Note that for each non-lopcode 32-bit word, line numbers are
100assumed incremented by one.
101
102@item lop_spec
1030x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
104lopcode other than lop_quote forms special data of type @samp{YZ}.
105@xref{mmo section mapping}.
106
Types other than 80 (or type 80 with contents that do not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type.  The flags for such
sections say not to allocate or load the data.  The vma is 0.
The contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.
114
115@item lop_pre
1160x980901ZZ.  The first lopcode in a file.  The @samp{Z} field forms the
117length of header information in 32-bit words, where the first word
118tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.
119
120@item lop_post
1210x980a00ZZ.  @math{Z > 32}.  This lopcode follows after all
122content-generating lopcodes in a program.  The @samp{Z} field
123denotes the value of @samp{rG} at the beginning of the program.
124The following @math{256 - Z} big-endian 64-bit words are loaded
125into global registers @samp{$G} @dots{} @samp{$255}.
126
127@item lop_stab
1280x980b0000.  The next-to-last lopcode in a program.  Must follow
129immediately after the lop_post lopcode and its data.  After this
130lopcode follows all symbols in a compressed format
131(@pxref{Symbol-table}).
132
133@item lop_end
1340x980cYYZZ.  The last lopcode in a program.  It must follow the
135lop_stab lopcode and its data.  The @samp{YZ} field contains the
136number of 32-bit words of symbol table information after the
137preceding lop_stab lopcode.
138@end table
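
The field layout shared by all the lopcodes in the table above can be
sketched as follows.  The names @code{split_word} and
@code{LOPCODE_NAMES} are hypothetical helpers for this example only;
the numbering follows the order of the table.

```python
# Lopcode names indexed by the second byte of the word (0x00..0x0c).
LOPCODE_NAMES = [
    "lop_quote", "lop_loc", "lop_skip", "lop_fixo", "lop_fixr",
    "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
    "lop_post", "lop_stab", "lop_end",
]
LOP_ESCAPE = 0x98  # first byte of every lopcode word


def split_word(word):
    """Split a 32-bit word into (name, Y, Z, YZ).

    Returns None when the word does not start with 0x98, that is,
    when it is plain contents.
    """
    if (word >> 24) & 0xFF != LOP_ESCAPE:
        return None
    op = (word >> 16) & 0xFF
    if op >= len(LOPCODE_NAMES):
        raise ValueError("unknown lopcode %d" % op)
    y = (word >> 8) & 0xFF
    z = word & 0xFF
    return (LOPCODE_NAMES[op], y, z, (y << 8) | z)
```

Note that this looks at one word in isolation: a word that follows
lop_quote is contents even if it starts with 0x98, so a real reader has
to track that context itself.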
139
Note that the ``fixup'' lopcodes @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo} are not generated by BFD, but are handled by it.  They are
generated by @command{mmixal}.
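
The address arithmetic given for lop_fixr and lop_fixrx in the table
above can be written out as a small sketch.  The function names are
made up for this example, and the code only transcribes the formulas
stated in this section; it is not taken from BFD or
@command{mmixal}.

```python
def fixr_address(loc, yz):
    """Address that lop_fixr stores YZ into: loc + 2 - 4 * YZ."""
    return loc + 2 - 4 * yz


def fixrx_delta(z, word):
    """Compute L for lop_fixrx from its Z field and the following word."""
    if z not in (16, 24):
        raise ValueError("Z must be 16 or 24")
    first_byte = (word >> 24) & 0xFF
    low24 = word & 0xFFFFFF
    if first_byte == 1:
        return low24 - (1 << z)   # L = low 24 bits - 2^Z
    if first_byte == 0:
        return low24              # L = low 24 bits
    raise ValueError("first byte must be 0 or 1")


def fixrx_address(loc, z, word):
    """Address that lop_fixrx xors into: loc - 4 * L."""
    return loc - 4 * fixrx_delta(z, word)
```

With @math{Z = 24} and a following word of 0x01ffffff, @math{L} is
@math{-1}, so the fixup applies four bytes after the current location.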
143
144This trivial one-label, one-instruction file:
145
146@example
147 :Main TRAP 1,2,3
148@end example
149
150can be represented this way in mmo:
151
152@example
153 0x98090101 - lop_pre, one 32-bit word with timestamp.
154 <timestamp>
155 0x98010002 - lop_loc, text segment, using a 64-bit address.
156              Note that mmixal does not emit this for the file above.
157 0x00000000 - Address, high 32 bits.
158 0x00000000 - Address, low 32 bits.
159 0x98060002 - lop_file, 2 32-bit words for file-name.
160 0x74657374 - "test"
161 0x2e730000 - ".s\0\0"
162 0x98070001 - lop_line, line 1.
163 0x00010203 - TRAP 1,2,3
164 0x980a00ff - lop_post, setting $255 to 0.
165 0x00000000
166 0x00000000
167 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
168 0x203a4040   @xref{Symbol-table}.
169 0x10404020
170 0x4d206120
171 0x69016e00
172 0x81000000
173 0x980c0005 - lop_end; symbol table contained five 32-bit words.
174@end example
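
To make the walk through such a file concrete, here is a minimal
reader sketch.  It handles only the lopcodes that appear in the
example above, plus lop_quote and lop_skip.  The function name
@code{load_mmo} and its return shape are assumptions for this sketch.
Input is a list of 32-bit words; locating the symbol table via the
count in the final lop_end word is a simplification.

```python
LOP_ESCAPE = 0x98


def load_mmo(words):
    """Return (memory, symtab_words) for a list of 32-bit words.

    memory maps tetrabyte-aligned addresses to 32-bit values.
    """
    memory = dict()
    loc = 0
    symtab = []
    i = 0
    while i < len(words):
        word = words[i]
        i += 1
        is_lopcode = (word >> 24) == LOP_ESCAPE
        op = (word >> 16) & 0xFF
        y = (word >> 8) & 0xFF
        z = word & 0xFF
        if is_lopcode and op == 0:      # lop_quote: next word is contents
            word = words[i]
            i += 1
            is_lopcode = False
        if not is_lopcode:
            addr = loc & ~3             # low bits are discarded
            memory[addr] = memory.get(addr, 0) ^ word   # xor over zero
            loc = addr + 4
        elif op == 1:                   # lop_loc: Z words of address
            value = 0
            for _ in range(z):
                value = (value << 32) | words[i]
                i += 1
            loc = (y << 56) + value
        elif op == 2:                   # lop_skip
            loc += (y << 8) | z
        elif op in (6, 9):              # lop_file, lop_pre: skip Z words
            i += z
        elif op == 7:                   # lop_line: ignored here
            pass
        elif op == 10:                  # lop_post: 256 - Z octabytes
            i += 2 * (256 - z)
        elif op == 11:                  # lop_stab: count is in lop_end
            count = words[-1] & 0xFFFF
            symtab = words[i:i + count]
            break
        else:
            raise ValueError("lopcode %d not handled in this sketch" % op)
    return memory, symtab
```

Fed the words of the example (with any value for the timestamp), this
yields a single tetrabyte 0x00010203 at address 0 and the five symbol
table words.
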
175@node Symbol-table, mmo section mapping, File layout, mmo
176@subsection Symbol table format
177From mmixal.w (or really, the generated mmixal.tex) in the
178MMIXware package which also contains the @command{mmix} simulator:
179``Symbols are stored and retrieved by means of a @samp{ternary
180search trie}, following ideas of Bentley and Sedgewick. (See
181ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
182R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
183Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
184character, and there are branches to subtries for the cases where
185a given character is less than, equal to, or greater than the
186character in the trie.  There also is a pointer to a symbol table
187entry if a symbol ends at the current node.''
188
189So it's a tree encoded as a stream of bytes.  The stream of bytes
190acts on a single virtual global symbol, adding and removing
191characters and signalling complete symbol points.  Here, we read
192the stream and create symbols at the completion points.
193
194First, there's a control byte @code{m}.  If any of the listed bits
195in @code{m} is nonzero, we execute what stands at the right, in
196the listed order:
197
198@example
199 (MMO3_LEFT)
200 0x40 - Traverse left trie.
201        (Read a new command byte and recurse.)
202
203 (MMO3_SYMBITS)
204 0x2f - Read the next byte as a character and store it in the
205        current character position; increment character position.
206        Test the bits of @code{m}:
207
208        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte and
               merge it into the current character).
211
212        (MMO3_TYPEBITS)
213        0xf  - We have a complete symbol; parse the type, value
214               and serial number and do what should be done
215               with a symbol.  The type and length information
216               is in j = (m & 0xf).
217
218               (MMO3_REGQUAL_BITS)
219               j == 0xf: A register variable.  The following
220                         byte tells which register.
221               j <= 8:   An absolute symbol.  Read j bytes as the
222                         big-endian number the symbol equals.
223                         A j = 2 with two zero bytes denotes an
224                         unknown symbol.
225               j > 8:    As with j <= 8, but add (0x20 << 56)
226                         to the value in the following j - 8
227                         bytes.
228
229               Then comes the serial number, as a variant of
230               uleb128, but better named ubeb128:
231               Read bytes and shift the previous value left 7
232               (multiply by 128).  Add in the new byte, repeat
233               until a byte has bit 7 set.  The serial number
234               is the computed value minus 128.
235
236        (MMO3_MIDDLE)
237        0x20 - Traverse middle trie.  (Read a new command byte
238               and recurse.)  Decrement character position.
239
240 (MMO3_RIGHT)
241 0x10 - Traverse right trie.  (Read a new command byte and
242        recurse.)
243@end example
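
The serial number encoding described above (``ubeb128'') can be
sketched on its own.  The decoder follows the description directly;
the encoder is simply the inverse worked out for this example, and both
function names are hypothetical.

```python
def decode_serial(data):
    """Decode a serial number; return (serial, bytes_consumed)."""
    value = 0
    for used, byte in enumerate(data, start=1):
        value = (value << 7) + byte     # shift left 7, add the new byte
        if byte & 0x80:                 # bit 7 set marks the last byte
            return value - 128, used
    raise ValueError("no terminating byte with bit 7 set")


def encode_serial(serial):
    """Encode a non-negative serial number, most significant group first."""
    digits = [0x80 | (serial & 0x7F)]   # last byte carries bit 7
    serial >>= 7
    while serial:
        digits.insert(0, serial & 0x7F)
        serial >>= 7
    return bytes(digits)
```

Serial number 1 is the single byte 0x81, and serial number 200 is the
two bytes 0x01 0xc8.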
244
245Let's look again at the @code{lop_stab} for the trivial file
246(@pxref{File layout}).
247
248@example
249 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
250 0x203a4040
251 0x10404020
252 0x4d206120
253 0x69016e00
254 0x81000000
255@end example
256
257This forms the trivial trie (note that the path between ``:'' and
258``M'' is redundant):
259
260@example
261 203a     ":"
262 40       /
263 40      /
264 10      \
265 40      /
266 40     /
267 204d  "M"
268 2061  "a"
269 2069  "i"
270 016e  "n" is the last character in a full symbol, and
271       with a value represented in one byte.
272 00    The value is 0.
273 81    The serial number is 1.
274@end example
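
The decoding steps can be checked against this example with a short
recursive sketch.  It follows the control-byte description given
earlier.  The function name, the tuple layout of the result and the
labels @samp{reg}, @samp{abs}, @samp{data} and @samp{undef} are
invented for this example.  The byte order used to merge a 16-bit
character (first byte high) is an assumption.

```python
import struct

MMO3_LEFT = 0x40
MMO3_SYMBITS = 0x2F
MMO3_WCHAR = 0x80
MMO3_TYPEBITS = 0x0F
MMO3_MIDDLE = 0x20
MMO3_RIGHT = 0x10


def decode_symbols(data):
    """Return a list of (name, kind, value, serial) tuples."""
    pos = 0
    chars = []      # the single "virtual" symbol being built
    symbols = []

    def next_byte():
        nonlocal pos
        value = data[pos]
        pos += 1
        return value

    def read_value(j):
        if j == 0xF:                        # register variable
            return "reg", next_byte()
        count = j - 8 if j > 8 else j
        value = 0
        for _ in range(count):              # big-endian value bytes
            value = (value << 8) | next_byte()
        if j == 2 and value == 0:
            return "undef", 0
        if j > 8:
            return "data", (0x20 << 56) + value
        return "abs", value

    def read_serial():
        serial = 0
        while True:
            byte = next_byte()
            serial = (serial << 7) + byte
            if byte & 0x80:
                return serial - 128

    def walk():
        m = next_byte()
        if m & MMO3_LEFT:
            walk()
        if m & MMO3_SYMBITS:
            c = next_byte()
            if m & MMO3_WCHAR:
                c = (c << 8) | next_byte()
            chars.append(chr(c))
            j = m & MMO3_TYPEBITS
            if j:                           # a complete symbol ends here
                kind, value = read_value(j)
                symbols.append(("".join(chars), kind, value, read_serial()))
            if m & MMO3_MIDDLE:
                walk()
            chars.pop()                     # step back one character
        if m & MMO3_RIGHT:
            walk()

    walk()
    return symbols


words = [0x203A4040, 0x10404020, 0x4D206120, 0x69016E00, 0x81000000]
symbols = decode_symbols(struct.pack(">5I", *words))
```

For the five words above, @code{symbols} holds one entry: the name
@samp{:Main}, an absolute value of 0 and serial number 1.  The three
trailing zero bytes are padding and are never read.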
275
276@node mmo section mapping, , Symbol-table, mmo
277@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.  For a loadable
section (such as data or code), the contents might instead follow at
some later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  The description in effect forms a descriptor that must be
emitted before the actual contents.  Sections described this way must
not overlap.
294
295For areas that don't have such descriptors, synthetic sections are
296formed by BFD.  Consecutive contents in the two memory areas
297@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
298@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
299sections named @code{.text} and @code{.data} respectively.  If an area
300is not otherwise described, but would together with a neighboring
301lower area be less than @samp{0x40000000} bytes long, it is joined
302with the lower area and the gap is zero-filled.  For other cases,
303a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
304@var{n} is a number, a running count through the mmo file,
305starting at 0.
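
The joining rule for neighboring areas can be sketched as below.  This
is only an illustration under stated assumptions: areas are given as
sorted, non-overlapping (start, length) pairs, ``together'' is read as
the span from the start of the lower area to the end of the upper one,
and the name @code{join_areas} is hypothetical.  BFD's actual
bookkeeping may differ.

```python
JOIN_LIMIT = 0x40000000


def join_areas(areas, limit=JOIN_LIMIT):
    """Join each area with its lower neighbor when the span stays small.

    areas is a list of (start, length) tuples sorted by start.  The
    joined length includes the gap, which would be zero-filled.
    """
    joined = []
    for start, length in areas:
        if joined:
            low_start, _low_length = joined[-1]
            span = start + length - low_start
            if span < limit:
                joined[-1] = (low_start, span)   # absorb area and gap
                continue
        joined.append((start, length))           # start a new section
    return joined
```

Two small areas at addresses 0 and 0x100 end up as one area, while an
area far away in the data segment stays separate.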
306
307A loadable section specified as:
308
309@example
310 .section secname,"ax"
311 TETRA 1,2,3,4,-1,-2009
312 BYTE 80
313@end example
314
315and linked to address @samp{0x4}, is represented by the sequence:
316
317@example
318 0x98080050 - lop_spec 80
319 0x00000002 - two 32-bit words for the section name
320 0x7365636e - "secn"
321 0x616d6500 - "ame\0"
322 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
323 0x00000000 - high 32 bits of section length
324 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
325 0x00000000 - high 32 bits of section address
326 0x00000004 - section address is 4
327 0x98010002 - 64 bits with address of following data
328 0x00000000 - high 32 bits of address
329 0x00000004 - low 32 bits: data starts at address 4
330 0x00000001 - 1
331 0x00000002 - 2
332 0x00000003 - 3
333 0x00000004 - 4
334 0xffffffff - -1
335 0xfffff827 - -2009
336 0x50000000 - 80 as a byte, padded with zeros.
337@end example
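
The descriptor part of this sequence (everything before the lop_loc)
can be generated with a short sketch.  The function names are
hypothetical, and the name word count is simply the name length rounded
up to whole 32-bit words, which is an assumption that matches this
example.

```python
import struct

LOP_QUOTE = 0x98000001
LOP_SPEC_80 = 0x98080050   # lop_spec with YZ = 80


def name_words(name):
    """Pack an ASCII name into zero-padded big-endian 32-bit words."""
    raw = name.encode("ascii")
    raw += b"\0" * (-len(raw) % 4)
    return list(struct.unpack(">%dI" % (len(raw) // 4), raw))


def section_descriptor(name, flags, length, vma):
    """Return the lop_spec 80 words describing one section."""
    names = name_words(name)
    body = [len(names)] + names + [
        flags,
        length >> 32, length & 0xFFFFFFFF,   # 64-bit big-endian length
        vma >> 32, vma & 0xFFFFFFFF,         # 64-bit big-endian address
    ]
    words = [LOP_SPEC_80]
    for word in body:
        if word >> 24 == 0x98:               # would look like a lopcode
            words.append(LOP_QUOTE)
        words.append(word)
    return words
```

Calling @code{section_descriptor("secname", 0x33, 28, 4)} reproduces
the first nine words of the listing above.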
338
339Note that the lop_spec wrapping does not include the section
340contents.  Compare this to a non-loaded section specified as:
341
342@example
343 .section thirdsec
344 TETRA 200001,100002
345 BYTE 38,40
346@end example
347
348This, when linked to address @samp{0x200000000000001c}, is
349represented by:
350
351@example
352 0x98080050 - lop_spec 80
353 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
356 0x00000010 - flag READONLY
357 0x00000000 - high 32 bits of section length
358 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
359 0x20000000 - high 32 bits of address
360 0x0000001c - low 32 bits of address 0x200000000000001c
361 0x00030d41 - 200001
362 0x000186a2 - 100002
363 0x26280000 - 38, 40 as bytes, padded with zeros
364@end example
365
For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
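
Reading such a descriptor back can be sketched as follows.  The
function name @code{parse_descriptor} and its return shape are
assumptions for this example; the input is the list of 32-bit words
that follow the lop_spec word, with any lop_quote words already
removed.

```python
import struct


def parse_descriptor(words):
    """Return (name, flags, length, vma, contents_words)."""
    count = words[0]                         # name length in words
    raw = struct.pack(">%dI" % count, *words[1:1 + count])
    name = raw.rstrip(b"\0").decode("ascii") # drop the zero padding
    i = 1 + count
    flags = words[i]
    length = (words[i + 1] << 32) | words[i + 2]
    vma = (words[i + 3] << 32) | words[i + 4]
    contents = words[i + 5:]                 # empty for loadable sections
    return name, flags, length, vma, contents
```

For a non-loaded section the remaining words are the section contents
themselves; for a loadable section the list is empty and the contents
arrive later through a lop_loc.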
370