@section mmo backend
The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX.  The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format.  That package also includes a combined
assembler and linker called @command{mmixal}.  The mmo format has
no advantages feature-wise compared to e.g.@: ELF.  It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (support for the latter is not yet implemented in BFD).  See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX.  The ELF format is used for intermediate
object files in the BFD implementation.

@c We want to xref the symbol table node.  A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments.  Hence the hyphen in "Symbol-table".
@menu
* File layout::
* Symbol-table::
* mmo section mapping::
@end menu

@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The contents of an mmo file are not partitioned into named sections as
with e.g.@: ELF.  Memory areas are formed by specifying the
location of the data that follows.  Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants), and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data.  @xref{mmo section mapping}.
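As a minimal illustration, the area an address falls in follows from its most significant byte.  The sketch below is in Python, used here only for illustration; @code{mmo_area} is a hypothetical helper, not a BFD function.

```python
def mmo_area(addr):
    """Return "text", "data" or None for a 64-bit MMIX address.

    Hypothetical helper: 0x0000...00 to 0x01ff...ff is the executable
    area, 0x2000...00 to 0x20ff...ff the writable data area.
    """
    if not 0 <= addr < 2 ** 64:
        raise ValueError("not a 64-bit address")
    top = addr >> 56          # most significant byte of the address
    if top <= 0x01:
        return "text"
    if top == 0x20:
        return "data"
    return None               # outside both areas


print(mmo_area(0x4))                 # text
print(mmo_area(0x200000000000001c))  # data
```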

There is provision for specifying ``special data'' of 65536
different types.  We use type 80 (decimal), arbitrarily chosen to be the
same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects.  @xref{mmo section mapping}.

Contents are entered as 32-bit words, xor:ed over the previous
contents, which are always zero-initialized.  A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes.  The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
together the @samp{YZ} field (a 16-bit big-endian number), are used for
purposes that differ for each lopcode.  As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
the lopcodes are:

@table @code
@item lop_quote
0x98000001.  The next word is contents, regardless of whether it
starts with 0x98 or not.

@item lop_loc
0x9801YYZZ, where @samp{Z} is 1 or 2.  This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}.  Normally @samp{Y} is 0 for the text segment
and 0x20 (32 decimal) for the data segment, which starts at
@samp{0x2000@dots{}00}.

@item lop_skip
0x9802YYZZ.  Increase the current location by @samp{YZ} bytes.

@item lop_fixo
0x9803YYZZ, where @samp{Z} is 1 or 2.  Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y * 2^56}.

@item lop_fixr
0x9804YYZZ.  @samp{YZ} is stored into the current location plus
@math{2 - 4 * YZ}.

@item lop_fixrx
0x980500ZZ.  @samp{Z} is 16 or 24.  A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is xor:ed into the current location
minus @math{4 * L}.  The first byte of the word is 0 or 1.  If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}; if 0,
then @math{L = (@var{lowest 24 bits of word})}.

@item lop_file
0x9806YYZZ.  @samp{Y} is the file number, @samp{Z} is the count of
32-bit words.  Set the file number to @samp{Y} and the line
counter to 0.  The next @math{Z * 4} bytes contain the file name,
padded with zeros if its length is not a multiple of four.  The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

@item lop_line
0x9807YYZZ.  @samp{YZ} is the line number.  Together with
lop_file, it forms the source location for the next 32-bit word.
Note that the line number is assumed to increase by one for each
non-lopcode 32-bit word.

@item lop_spec
0x9808YYZZ.  @samp{YZ} is the type number.  Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Special data of types other than 80 (or of type 80 with contents that
do not parse) is stored in sections named @code{.MMIX.spec_data.@var{n}},
where @var{n} is the @samp{YZ}-type.  The flags for such
sections say not to allocate or load the data.  The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s.  The
location in data or code at which the lop_spec occurred is lost.

@item lop_pre
0x980901ZZ.  The first lopcode in a file.  The @samp{Z} field gives the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

@item lop_post
0x980a00ZZ.  @math{Z >= 32}.  This lopcode follows after all
content-generating lopcodes in a program.  The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

@item lop_stab
0x980b0000.  The next-to-last lopcode in a program.  Must follow
immediately after the lop_post lopcode and its data.  After this
lopcode follow all symbols in a compressed format
(@pxref{Symbol-table}).

@item lop_end
0x980cYYZZ.  The last lopcode in a program.  It must follow the
lop_stab lopcode and its data.  The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.
@end table
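The word layout shared by all the lopcodes above can be decoded mechanically.  The following Python sketch (the names @code{LOPCODES} and @code{decode_word} are hypothetical, chosen for illustration) splits a 32-bit word into the lopcode name and its @samp{Y}, @samp{Z} and @samp{YZ} fields, numbering the lopcodes 0 through 12 in the order of the table.

```python
# Lopcode names, numbered by the byte after 0x98 (order of the table above).
LOPCODES = ["lop_quote", "lop_loc", "lop_skip", "lop_fixo", "lop_fixr",
            "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
            "lop_post", "lop_stab", "lop_end"]


def decode_word(word):
    """Return (name, Y, Z, YZ) for a lopcode word, or None for contents."""
    if word >> 24 != 0x98:
        return None                      # ordinary contents, xor:ed in
    lop = (word >> 16) & 0xff            # which of the thirteen lopcodes
    if lop >= len(LOPCODES):
        raise ValueError("unknown lopcode 0x%02x" % lop)
    y, z = (word >> 8) & 0xff, word & 0xff
    return LOPCODES[lop], y, z, word & 0xffff   # YZ: big-endian 16 bits


print(decode_word(0x98010002))   # ('lop_loc', 0, 2, 2)
print(decode_word(0x00010203))   # None
```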

Note that the fixup lopcodes @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo} are not generated by BFD, but are handled when reading
mmo files.  They are generated by @code{mmixal}.
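Handling these lopcodes when reading is mostly address arithmetic.  Below is a minimal Python sketch of the target computations for @code{lop_loc}, @code{lop_fixr} and @code{lop_fixrx}, as described in the table above.  The function names are hypothetical, addresses are assumed to wrap modulo @math{2^64}, and the sketch only computes where a fixup lands, not the value that is xor:ed in.

```python
MASK64 = (1 << 64) - 1   # addresses assumed to wrap modulo 2^64


def lop_loc_address(y, z, words):
    """lop_loc: Z (1 or 2) following words form the address, plus Y * 2^56."""
    if z not in (1, 2):
        raise ValueError("lop_loc needs Z = 1 or 2")
    addr = 0
    for w in words[:z]:          # high word first when Z = 2
        addr = (addr << 32) | w
    return (addr + (y << 56)) & MASK64


def lop_fixr_target(loc, yz):
    """lop_fixr: YZ goes to the current location plus 2 - 4 * YZ."""
    return (loc + 2 - 4 * yz) & MASK64


def lop_fixrx_l(z, word):
    """lop_fixrx: derive L from the following word (Z = 16 or 24)."""
    if z not in (16, 24):
        raise ValueError("lop_fixrx needs Z = 16 or 24")
    first, low = word >> 24, word & 0xffffff
    if first == 1:
        return low - (1 << z)
    if first == 0:
        return low
    raise ValueError("first byte must be 0 or 1")


def lop_fixrx_target(loc, z, word):
    """lop_fixrx: the xor goes to the current location minus 4 * L."""
    return (loc - 4 * lop_fixrx_l(z, word)) & MASK64


print(hex(lop_loc_address(0x20, 1, [0x10])))          # 0x2000000000000010
print(hex(lop_fixrx_target(0x100, 16, 0x0100ffff)))   # 0x104
```

With a leading byte of 1, @samp{L} is negative, so the target lies above the current location.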

This trivial one-label, one-instruction file:

@example
 :Main TRAP 1,2,3
@end example

can be represented this way in mmo:

@example
 0x98090101 - lop_pre, one 32-bit word with timestamp.
 <timestamp>
 0x98010002 - lop_loc, text segment, using a 64-bit address.
              Note that mmixal does not emit this for the file above.
 0x00000000 - Address, high 32 bits.
 0x00000000 - Address, low 32 bits.
 0x98060002 - lop_file, 2 32-bit words for file-name.
 0x74657374 - "test"
 0x2e730000 - ".s\0\0"
 0x98070001 - lop_line, line 1.
 0x00010203 - TRAP 1,2,3
 0x980a00ff - lop_post, setting $255 to 0.
 0x00000000
 0x00000000
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040   @xref{Symbol-table}.
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
 0x980c0005 - lop_end; symbol table contained five 32-bit words.
@end example
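The stream above can be cross-checked with a short Python sketch that groups each lopcode with the words belonging to it and verifies that the @samp{YZ} field of @code{lop_end} counts the symbol-table words after @code{lop_stab}.  The helper names are hypothetical, the per-lopcode word counts are derived from the lopcode table (two 32-bit words per 64-bit register for @code{lop_post}), the timestamp is an arbitrary placeholder, and special data after @code{lop_spec} is simply reported as contents.

```python
NAMES = ["lop_quote", "lop_loc", "lop_skip", "lop_fixo", "lop_fixr",
         "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
         "lop_post", "lop_stab", "lop_end"]


def operand_count(lop, z, remaining):
    """Words following a lopcode that belong to it (hypothetical helper)."""
    if lop in (0, 5):            # lop_quote, lop_fixrx: one word
        return 1
    if lop in (1, 3, 6, 9):      # lop_loc, lop_fixo, lop_file, lop_pre: Z words
        return z
    if lop == 10:                # lop_post: 256 - Z 64-bit register values
        return 2 * (256 - z)
    if lop == 11:                # lop_stab: all words before the final lop_end
        return max(remaining - 1, 0)
    return 0                     # lop_skip, lop_fixr, lop_line, lop_spec, lop_end


def walk(words):
    """Return (name, operand words) pairs for an mmo word stream."""
    out, i, stab_count = [], 0, None
    while i < len(words):
        w = words[i]
        if w >> 24 != 0x98:      # not a lopcode: plain contents
            out.append(("contents", [w]))
            i += 1
            continue
        lop, z = (w >> 16) & 0xff, w & 0xff
        if lop >= len(NAMES):
            raise ValueError("unknown lopcode")
        n = operand_count(lop, z, len(words) - i - 1)
        out.append((NAMES[lop], words[i + 1:i + 1 + n]))
        if lop == 11:
            stab_count = n
        elif lop == 12 and (w & 0xffff) != stab_count:
            raise ValueError("lop_end does not match the lop_stab data")
        i += 1 + n
    return out


EXAMPLE = [0x98090101, 0x5f5e1000,              # lop_pre, placeholder timestamp
           0x98010002, 0x00000000, 0x00000000,  # lop_loc, address 0
           0x98060002, 0x74657374, 0x2e730000,  # lop_file "test.s"
           0x98070001, 0x00010203,              # lop_line 1, TRAP 1,2,3
           0x980a00ff, 0x00000000, 0x00000000,  # lop_post, $255 = 0
           0x980b0000, 0x203a4040, 0x10404020, 0x4d206120,
           0x69016e00, 0x81000000,              # lop_stab and the trie
           0x980c0005]                          # lop_end, five words

print([name for name, _ in walk(EXAMPLE)])
```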
@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@:Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.)  Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie.  There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes.  The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points.  Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}.  If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in
the listed order:

@example
 (MMO3_LEFT)
 0x40 - Traverse left trie.
        (Read a new command byte and recurse.)

 (MMO3_SYMBITS)
 0x2f - Read the next byte as a character and store it in the
        current character position; increment character position.
        Test the bits of @code{m}:

        (MMO3_WCHAR)
        0x80 - The character is 16-bit (so read another byte,
               merge into current character).

        (MMO3_TYPEBITS)
        0xf  - We have a complete symbol; parse the type, value
               and serial number and do what should be done
               with a symbol.  The type and length information
               is in j = (m & 0xf).

               (MMO3_REGQUAL_BITS)
               j == 0xf: A register variable.  The following
                         byte tells which register.
               j <= 8:   An absolute symbol.  Read j bytes as the
                         big-endian number the symbol equals.
                         A j = 2 with two zero bytes denotes an
                         unknown symbol.
               j > 8:    As with j <= 8, but add (0x20 << 56)
                         to the value in the following j - 8
                         bytes.

               Then comes the serial number, as a variant of
               uleb128, but better named ubeb128:
               Read bytes and shift the previous value left 7
               (multiply by 128).  Add in the new byte, repeat
               until a byte has bit 7 set.  The serial number
               is the computed value minus 128.

        (MMO3_MIDDLE)
        0x20 - Traverse middle trie.  (Read a new command byte
               and recurse.)  Then, whether or not this bit is
               set, decrement character position.

 (MMO3_RIGHT)
 0x10 - Traverse right trie.  (Read a new command byte and
        recurse.)
@end example
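The control-byte rules above can be sketched as a recursive decoder.  This Python version is an illustration under stated assumptions, not the BFD implementation: the constant names mirror the labels above, a 16-bit character is assumed to be merged high byte first, register symbols are represented by a hypothetical @code{("reg", n)} tuple, and the character position is restored after each character's subtree whether or not a middle trie follows.

```python
MMO3_WCHAR = 0x80
MMO3_LEFT = 0x40
MMO3_MIDDLE = 0x20
MMO3_RIGHT = 0x10
MMO3_TYPEBITS = 0x0f
MMO3_REGQUAL_BITS = 0x0f
MMO3_SYMBITS = 0x2f


def decode_symbols(data):
    """Return (name, value, serial) tuples from a symbol-table byte stream."""
    pos = 0
    name = []            # the single "virtual global symbol" being edited
    symbols = []

    def byte():
        nonlocal pos
        b = data[pos]
        pos += 1
        return b

    def number(count):   # big-endian value of the next count bytes
        value = 0
        for _ in range(count):
            value = (value << 8) | byte()
        return value

    def serial():        # the "ubeb128" serial number
        value = 0
        while True:
            b = byte()
            value = (value << 7) + b
            if b & 0x80:
                return value - 128

    def node():
        m = byte()
        if m & MMO3_LEFT:
            node()
        if m & MMO3_SYMBITS:
            c = byte()
            if m & MMO3_WCHAR:           # assumed: high byte first
                c = (c << 8) | byte()
            name.append(chr(c))          # store char, advance position
            j = m & MMO3_TYPEBITS
            if j:                        # a symbol is complete here
                if j == MMO3_REGQUAL_BITS:
                    value = ("reg", byte())
                elif j > 8:
                    value = (0x20 << 56) + number(j - 8)
                else:
                    value = number(j)
                symbols.append(("".join(name), value, serial()))
            if m & MMO3_MIDDLE:
                node()
            name.pop()                   # decrement character position
        if m & MMO3_RIGHT:
            node()

    node()
    return symbols


stab = bytes.fromhex("203a4040104040204d20612069016e0081000000")
print(decode_symbols(stab))   # [(':Main', 0, 1)]
```

The three trailing zero bytes of the example are padding to a 32-bit boundary and are never read.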

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

@example
 0x980b0000 - lop_stab for ":Main" = 0, serial 1.
 0x203a4040
 0x10404020
 0x4d206120
 0x69016e00
 0x81000000
@end example

This forms the trivial trie (note that the path between ``:'' and
``M'' is redundant):

@example
 203a     ":"
 40       /
 40      /
 10      \
 40      /
 40     /
 204d  "M"
 2061  "a"
 2069  "i"
 016e  "n" is the last character in a full symbol, and
       with a value represented in one byte.
 00    The value is 0.
 81    The serial number is 1.
@end example
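The serial-number variant is easy to get wrong, so here is a minimal round-trip sketch in Python (the function names are hypothetical).  Decoding the example's final byte @samp{0x81} gives serial number 1, and by the same rule serial number 200 is written as the two bytes @samp{0x01 0xc8}.

```python
def encode_serial(serial):
    """Big-endian 7-bit groups; the last byte carries bit 7 as terminator."""
    if serial < 0:
        raise ValueError("serial numbers are non-negative")
    groups = [serial & 0x7f]
    serial >>= 7
    while serial:
        groups.append(serial & 0x7f)
        serial >>= 7
    groups.reverse()           # most significant group first
    groups[-1] |= 0x80         # mark the final byte
    return bytes(groups)


def decode_serial(data):
    """Shift left 7, add each byte, stop at bit 7, subtract 128."""
    value = 0
    for b in data:
        value = (value << 7) + b
        if b & 0x80:
            return value - 128
    raise ValueError("unterminated serial number")


print(decode_serial(b"\x81"))      # 1
print(encode_serial(200).hex())    # 01c8
```

Subtracting 128 at the end simply cancels the terminator bit added to the last byte, which is why the intermediate bytes are plain 7-bit digits.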

@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information.  If needed, any datum in the encapsulation will be
quoted using lop_quote.  First comes a 32-bit word holding the
number of 32-bit words containing the section name, zero-padded
to a 32-bit boundary.  After the name there's a 32-bit word holding flags
describing the section type.  Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address.  Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary.  For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents.  This in effect forms a descriptor that must be emitted
before the actual contents.  Sections described this way must not
overlap.
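A minimal Python sketch of building such a descriptor follows.  The function names are hypothetical; the flags are passed in as a number because only combined flag values appear in the examples; the name is assumed to be zero-padded to whole 32-bit words without an extra terminator word when its length is already a multiple of four (matching the two-word count in the @code{thirdsec} example); and any word starting with @samp{0x98} is quoted with @code{lop_quote}.

```python
LOP_QUOTE = 0x98000001
LOP_SPEC_80 = 0x98080050   # lop_spec, YZ = 80


def name_words(name):
    """Name as big-endian 32-bit words, zero-padded (assumption: no extra
    terminator word when the length is already a multiple of four)."""
    data = name.encode("ascii")
    data += b"\0" * (-len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "big")
            for i in range(0, len(data), 4)]


def quote(words):
    """Insert lop_quote before any word that would read as a lopcode."""
    out = []
    for w in words:
        if w >> 24 == 0x98:
            out.append(LOP_QUOTE)
        out.append(w)
    return out


def section_descriptor(name, flags, length, vma):
    """Type-80 data: name count, name, flags, 64-bit length, 64-bit vma."""
    names = name_words(name)
    body = [len(names)] + names + [flags,
                                   length >> 32, length & 0xffffffff,
                                   vma >> 32, vma & 0xffffffff]
    return [LOP_SPEC_80] + quote(body)


print([hex(w) for w in section_descriptor("secname", 0x33, 28, 4)])
```

For a loadable section the contents would follow later after a @code{lop_loc}; for a non-loaded one they would be appended inside the special data.  Neither case is modelled here.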

For areas that don't have such descriptors, synthetic sections are
formed by BFD.  Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively.  If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled.  For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}.  Here,
@var{n} is a number, a running count through the mmo file,
starting at 0.
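The joining rule can be sketched in Python as a merge over the areas of one memory region, sorted by start address.  This is a simplification under assumptions: the helper name is hypothetical, areas are given as (start, length) pairs, the size test is applied to the section accumulated so far, and section naming is not modelled.

```python
JOIN_LIMIT = 0x40000000


def merge_areas(areas):
    """Join (start, length) areas of one memory region into spans.

    An area is joined with the span below it when the joined span would
    be shorter than JOIN_LIMIT bytes; the gap is then zero-filled.
    Otherwise it starts a new span (a new synthetic section).
    """
    spans = []
    for start, length in sorted(areas):
        end = start + length
        if spans and end - spans[-1][0] < JOIN_LIMIT:
            low, high = spans[-1]
            spans[-1] = (low, max(high, end))
        else:
            spans.append((start, end))
    return spans


print([(hex(a), hex(b)) for a, b in merge_areas([(0x100, 8), (0x0, 16)])])
```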

A loadable section specified as:

@example
 .section secname,"ax"
 TETRA 1,2,3,4,-1,-2009
 BYTE 80
@end example

and linked to address @samp{0x4}, is represented by the sequence:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x7365636e - "secn"
 0x616d6500 - "ame\0"
 0x00000033 - flags CODE, READONLY, LOAD, ALLOC
 0x00000000 - high 32 bits of section length
 0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
 0x00000000 - high 32 bits of section address
 0x00000004 - section address is 4
 0x98010002 - lop_loc, 64 bits with address of following data
 0x00000000 - high 32 bits of address
 0x00000004 - low 32 bits: data starts at address 4
 0x00000001 - 1
 0x00000002 - 2
 0x00000003 - 3
 0x00000004 - 4
 0xffffffff - -1
 0xfffff827 - -2009
 0x50000000 - 80 as a byte, padded with zeros.
@end example

Note that the lop_spec wrapping does not include the section
contents.  Compare this to a non-loaded section specified as:

@example
 .section thirdsec
 TETRA 200001,100002
 BYTE 38,40
@end example

This, when linked to address @samp{0x200000000000001c}, is
represented by:

@example
 0x98080050 - lop_spec 80
 0x00000002 - two 32-bit words for the section name
 0x74686972 - "thir"
 0x64736563 - "dsec"
 0x00000010 - flag READONLY
 0x00000000 - high 32 bits of section length
 0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
 0x20000000 - high 32 bits of address
 0x0000001c - low 32 bits of address 0x200000000000001c
 0x00030d41 - 200001
 0x000186a2 - 100002
 0x26280000 - 38, 40 as bytes, padded with zeros
@end example

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data.  The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.

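The contents words in both examples follow the same packing: @code{TETRA} values as 32-bit two's-complement big-endian words, then @code{BYTE} values, with the last word zero-padded, and the section length counting that padding.  A hypothetical Python helper (the name @code{pack_contents} is illustrative only) that reproduces both listings:

```python
def pack_contents(tetras, byte_values):
    """Pack TETRA then BYTE values into big-endian 32-bit words.

    Returns (words, length), where length includes the zero padding
    of the last word, as in the section length fields above.
    """
    data = b"".join((t & 0xffffffff).to_bytes(4, "big") for t in tetras)
    data += bytes(b & 0xff for b in byte_values)
    data += b"\0" * (-len(data) % 4)          # pad to a 32-bit boundary
    words = [int.from_bytes(data[i:i + 4], "big")
             for i in range(0, len(data), 4)]
    return words, len(data)


words, length = pack_contents([1, 2, 3, 4, -1, -2009], [80])
print([hex(w) for w in words], length)   # length is 28 (0x1c)
```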