1[comment {-*- tcl -*- doctools manpage}]
2[manpage_begin grammar::me::cpu::core n 0.2]
3[copyright {2005-2006 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
4[moddesc   {Grammar operations and usage}]
5[titledesc {ME virtual machine state manipulation}]
6[category  {Grammars and finite automata}]
7[require Tcl 8.4]
8[require grammar::me::cpu::core [opt 0.2]]
9[description]
10[keywords {virtual machine} parsing grammar]
11[para]
12
13This package provides an implementation of the ME virtual machine.
14
15Please go and read the document [syscmd grammar::me_intro] first if
16you do not know what a ME virtual machine is.
17
18[para]
19
20This implementation represents each ME virtual machine as a Tcl value
21and provides commands to manipulate and query such values to show the
22effects of executing instructions, adding tokens, retrieving state,
23etc.
24
25[para]
26
The values fully follow the Tcl paradigm that every value is a
string, while still allowing C implementations to use a proper
Tcl_ObjType which keeps all the important data in native data structures.

Because of the latter it is recommended to access the state values
[emph only] through the commands of this package, to ensure that the
internal representation is not shimmered away.
34
35[para]
36
37The actual structure used by all state values is described in section
38[sectref {CPU STATE}].
39
40
41[section API]
42
43The package directly provides only a single command, and all the
44functionality is made available through its methods.
45
46[list_begin definitions]
47
48[call [cmd ::grammar::me::cpu::core] [method disasm] [arg asm]]
49
50This method returns a list containing a disassembly of the match
51instructions in [arg asm]. The format of [arg asm] is specified in the
52section [sectref {MATCH PROGRAM REPRESENTATION}].
53
54[para]
55
Each element of the result contains the instruction label, the
instruction name, and the instruction arguments, in this order. The
label can be the empty string. Jump destinations are shown as labels,
and strings and tokens are shown unencoded. Token names are prefixed
with their numeric id if, and only if, a tokmap is defined. The two
components are separated by a colon.
62
63
64[call [cmd ::grammar::me::cpu::core] [method asm] [arg asm]]
65
66This method returns code in the format as specified in section
67[sectref {MATCH PROGRAM REPRESENTATION}] generated from ME assembly
68code [arg asm], which is in the format as returned by the method
69[method disasm].
70
71
72[call [cmd ::grammar::me::cpu::core] [method new] [arg asm]]
73
This method creates a state value for a ME virtual machine in its
initial state and returns it as its result.
76
77[para]
78
The argument [arg asm] contains a Tcl representation of the
match instructions the machine has to execute while parsing the input
stream. Its format is specified in the section
[sectref {MATCH PROGRAM REPRESENTATION}].
83
84[para]
85
86The [arg tokmap] argument taken by the implementation provided by the
87package [package grammar::me::tcl] is here hidden inside of the match
88instructions and therefore not needed.
89
90
91[call [cmd ::grammar::me::cpu::core] [method lc] [arg state] [arg location]]
92
93This method takes the state value of a ME virtual machine and uses it
94to convert a location in the input stream (as offset) into a line
95number and column index. The result of the method is a 2-element list
96containing the two pieces in the order mentioned in the previous
97sentence.
98
99[para]
100
101[emph Note] that the method cannot convert locations which the machine
102has not yet read from the input stream. In other words, if the machine
103has read 7 characters so far it is possible to convert the offsets
[const 0] to [const 6], but nothing beyond that. Likewise, it is not
possible to convert offsets which refer to locations before the
beginning of the stream.
107
108[para]
109
110This utility allows higher levels to convert the location offsets
111found in the error status and the AST into more human readable data.
112
113
114[call [cmd ::grammar::me::cpu::core] [method tok] [arg state] [opt "[arg from] [opt [arg to]]"]]
115
116This method takes the state value of a ME virtual machine and returns
117a Tcl list containing the part of the input stream between the
118locations [arg from] and [arg to] (both inclusive). If [arg to] is not
119specified it will default to the value of [arg from]. If [arg from] is
120not specified either the whole input stream is returned.
121
122[para]
123
124This method places the same restrictions on its location arguments as
125the method [method lc].
126
127
128[call [cmd ::grammar::me::cpu::core] [method pc] [arg state]]
129
130This method takes the state value of a ME virtual machine and returns
131the current value of the stored program counter.
132
133
134[call [cmd ::grammar::me::cpu::core] [method iseof] [arg state]]
135
136This method takes the state value of a ME virtual machine and returns
137the current value of the stored eof flag.
138
139
140[call [cmd ::grammar::me::cpu::core] [method at] [arg state]]
141
142This method takes the state value of a ME virtual machine and returns
143the current location in the input stream.
144
145
146[call [cmd ::grammar::me::cpu::core] [method cc] [arg state]]
147
148This method takes the state value of a ME virtual machine and returns
149the current token.
150
151
152[call [cmd ::grammar::me::cpu::core] [method sv] [arg state]]
153
154This method takes the state value of a ME virtual machine and returns
155the current semantic value stored in it.
156
157This is an abstract syntax tree as specified in the document
158[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].
159
160
161[call [cmd ::grammar::me::cpu::core] [method ok] [arg state]]
162
163This method takes the state value of a ME virtual machine and returns
164the match status stored in it.
165
166
167[call [cmd ::grammar::me::cpu::core] [method error] [arg state]]
168
169This method takes the state value of a ME virtual machine and returns
170the current error status stored in it.
171
172
173[call [cmd ::grammar::me::cpu::core] [method lstk] [arg state]]
174
175This method takes the state value of a ME virtual machine and returns
176the location stack.
177
178
179[call [cmd ::grammar::me::cpu::core] [method astk] [arg state]]
180
181This method takes the state value of a ME virtual machine and returns
182the AST stack.
183
184
185[call [cmd ::grammar::me::cpu::core] [method mstk] [arg state]]
186
187This method takes the state value of a ME virtual machine and returns
188the AST marker stack.
189
190
191[call [cmd ::grammar::me::cpu::core] [method estk] [arg state]]
192
193This method takes the state value of a ME virtual machine and returns
194the error stack.
195
196
197[call [cmd ::grammar::me::cpu::core] [method rstk] [arg state]]
198
199This method takes the state value of a ME virtual machine and returns
200the subroutine return stack.
201
202
203[call [cmd ::grammar::me::cpu::core] [method nc] [arg state]]
204
205This method takes the state value of a ME virtual machine and returns
206the nonterminal match cache as a dictionary.
207
208
209[call [cmd ::grammar::me::cpu::core] [method ast] [arg state]]
210
211This method takes the state value of a ME virtual machine and returns
212the abstract syntax tree currently at the top of the AST stack stored
213in it.
214
215This is an abstract syntax tree as specified in the document
216[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].
217
218
219[call [cmd ::grammar::me::cpu::core] [method halted] [arg state]]
220
221This method takes the state value of a ME virtual machine and returns
222the current halt status stored in it, i.e. if the machine has stopped
223or not.
224
225
226[call [cmd ::grammar::me::cpu::core] [method code] [arg state]]
227
228This method takes the state value of a ME virtual machine and returns
229the code stored in it, i.e. the instructions executed by the machine.
230
231
232[call [cmd ::grammar::me::cpu::core] [method eof] [arg statevar]]
233
234This method takes the state value of a ME virtual machine as stored in
235the variable named by [arg statevar] and modifies it so that the eof
flag inside is set. This signals to the machine that the tokens
currently in the input queue are the last to be processed. No more
will follow.
239
240
241[call [cmd ::grammar::me::cpu::core] [method put] [arg statevar] [arg tok] [arg lex] [arg line] [arg col]]
242
243This method takes the state value of a ME virtual machine as stored in
244the variable named by [arg statevar] and modifies it so that the token
245[arg tok] is added to the end of the input queue, with associated
246lexeme data [arg lex] and [arg line]/[arg col]umn information.
247
248[para]
249
250The operation will fail with an error if the eof flag of the machine
251has been set through the method [method eof].
252
253
254[call [cmd ::grammar::me::cpu::core] [method run] [arg statevar] [opt [arg n]]]
255
256This method takes the state value of a ME virtual machine as stored in
257the variable named by [arg statevar], executes a number of
258instructions and stores the state resulting from their modifications
259back into the variable.
260
261[para]
262
263The execution loop will run until either
264
265[list_begin itemized]
266[item] [arg n] instructions have been executed, or
267[item] a halt instruction was executed, or
268[item]
269the input queue is empty and the code is asking for more tokens to
270process.
271[list_end]
272[para]
273
274If no limit [arg n] was set only the last two conditions are checked
275for.
276
277[list_end]
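
[para]

As a rough sketch of how the methods above fit together, the helper
below feeds a complete list of tokens into a fresh machine and runs
it. The name [cmd run_match] and its arguments are assumptions made
for this illustration and are not part of the package. [arg code] has
to be a match program in the format specified in section
[sectref {MATCH PROGRAM REPRESENTATION}], and each entry of
[arg tokens] is assumed to be a list of token, lexeme, line, and
column.

[example {
package require grammar::me::cpu::core

# Hypothetical helper, not part of the package.
proc run_match {code tokens} {
    set state [::grammar::me::cpu::core new $code]
    foreach entry $tokens {
        foreach {tok lex line col} $entry break
        # put, eof and run take the NAME of the state variable.
        ::grammar::me::cpu::core put state $tok $lex $line $col
    }
    # Nothing will be added to the input queue after this point.
    ::grammar::me::cpu::core eof state
    # No limit n given: run until the machine halts, or until it
    # asks for tokens while the input queue is empty.
    ::grammar::me::cpu::core run state
    return $state
}
}]

[para]

The returned state can then be inspected with the query methods, for
example [method halted], [method ok], [method sv], and [method error].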
278
279
280[subsection {MATCH PROGRAM REPRESENTATION}]
281
A match program is represented by a nested Tcl list. The first element,
283[term asm], is a list of integer numbers, the instructions to execute,
284and their arguments. The second element, [term pool], is a list of
285strings, referenced by the instructions, for error messages, token
286names, etc. The third element, [term tokmap], provides ordering
287information for the tokens, mapping their names to their numerical
288rank. This element can be empty, forcing lexicographic comparison when
289matching ranges.
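
[para]

The effect of an empty versus a non-empty [term tokmap] on range
matching can be sketched in plain Tcl as shown below. This is only an
illustration of the described semantics, not the code used by the
package. The command [cmd in_tokrange] is hypothetical, and it is an
assumption that the [term tokmap] is a flat list of token names and
ranks.

[example {
# Hypothetical helper: is token 'tok' inside the range begin..end?
proc in_tokrange {tok begin end tokmap} {
    if {[llength $tokmap]} {
        # Ordering information present: compare numerical ranks.
        array set rank $tokmap
        set tok   $rank($tok)
        set begin $rank($begin)
        set end   $rank($end)
        return [expr {($begin <= $tok) && ($tok <= $end)}]
    }
    # Empty tokmap: fall back to lexicographic comparison.
    set lo [string compare $begin $tok]
    set hi [string compare $tok $end]
    return [expr {($lo <= 0) && ($hi <= 0)}]
}
}]

[para]

With the map "NUM 0 PLUS 1 MINUS 2" the token PLUS lies inside the
range NUM to MINUS, whereas with an empty map it does not, because
PLUS sorts after MINUS lexicographically.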
290
291[para]
292
293All ME instructions are encoded as integer numbers, with the mapping
294given below. A number of the instructions, those which handle error
295messages, have been given an additional argument to supply that
296message explicitly instead of having it constructed from token names,
297etc. This allows the machine state to store only the message ids
298instead of the full strings.
299
300[para]
301
302Jump destination arguments are absolute indices into the [term asm]
element, referring to the instruction to jump to. Any string arguments
304are absolute indices into the [term pool] element. Tokens, characters,
305messages, and token (actually character) classes to match are coded as
306references into the [term pool] as well.
307
308[para]
309[list_begin enumerated]
310
311[enum] "[cmd ict_advance] [arg message]"
312[enum] "[cmd ict_match_token] [arg tok] [arg message]"
313[enum] "[cmd ict_match_tokrange] [arg tokbegin] [arg tokend] [arg message]"
314[enum] "[cmd ict_match_tokclass] [arg code] [arg message]"
315[enum] "[cmd inc_restore] [arg branchlabel] [arg nt]"
316[enum] "[cmd inc_save] [arg nt]"
317[enum] "[cmd icf_ntcall] [arg branchlabel]"
318[enum] "[cmd icf_ntreturn]"
319[enum] "[cmd iok_ok]"
320[enum] "[cmd iok_fail]"
321[enum] "[cmd iok_negate]"
322[enum] "[cmd icf_jalways] [arg branchlabel]"
323[enum] "[cmd icf_jok] [arg branchlabel]"
324[enum] "[cmd icf_jfail] [arg branchlabel]"
325[enum] "[cmd icf_halt]"
326[enum] "[cmd icl_push]"
327[enum] "[cmd icl_rewind]"
328[enum] "[cmd icl_pop]"
329[enum] "[cmd ier_push]"
330[enum] "[cmd ier_clear]"
331[enum] "[cmd ier_nonterminal] [arg message]"
332[enum] "[cmd ier_merge]"
333[enum] "[cmd isv_clear]"
334[enum] "[cmd isv_terminal]"
335[enum] "[cmd isv_nonterminal_leaf] [arg nt]"
336[enum] "[cmd isv_nonterminal_range] [arg nt]"
337[enum] "[cmd isv_nonterminal_reduce] [arg nt]"
338[enum] "[cmd ias_push]"
339[enum] "[cmd ias_mark]"
340[enum] "[cmd ias_mrewind]"
341[enum] "[cmd ias_mpop]"
342[list_end]
343
344
345[section {CPU STATE}]
346
347A state value is a list containing the following elements, in the order listed below:
348
349[list_begin enumerated]
350[enum] [term code]: Match instructions, see [sectref {MATCH PROGRAM REPRESENTATION}].
351[enum] [term pc]:   Program counter, [term int].
352[enum] [term halt]: Halt flag, [term boolean].
[enum] [term eof]:  Eof flag, [term boolean].
[enum] [term tc]:   Terminal cache and input queue. For its structure see below.
355[enum] [term cl]:   Current location, [term int].
356[enum] [term ct]:   Current token, [term string].
357[enum] [term ok]:   Match status, [term boolean].
358[enum] [term sv]:   Semantic value, [term list].
359[enum] [term er]:   Error status, [term list].
360[enum] [term ls]:   Location stack, [term list].
361[enum] [term as]:   AST stack, [term list].
362[enum] [term ms]:   AST marker stack, [term list].
363[enum] [term es]:   Error stack, [term list].
364[enum] [term rs]:   Return stack, [term list].
365[enum] [term nc]:   Nonterminal cache, [term dictionary].
366[list_end]
367[para]
368
[term tc]: the input queue of tokens waiting for processing and the
terminal cache containing the tokens already processed are one
unified data structure simply holding all tokens and their
information, with the current location separating that which has been
processed from that which is waiting.
374
375Each element of the queue/cache is a list containing the token, its
376lexeme information, line number, and column index, in this order.
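
[para]

The following sketch shows, on a hand-written value, how a location
offset can be mapped to a line number and column index through such a
structure. The tokens and variable names are made up for this
illustration, and it is only an assumption that the method
[method lc] works along these lines.

[example {
# Three elements, each {token lexeme line column}.
set tc {{ID foo 1 0} {PLUS + 1 4} {ID bar 2 0}}

# A location is an offset into the list.
set location 1
foreach {tok lex line col} [lindex $tc $location] break

# Line and column of the token at the chosen location.
set linecol [list $line $col]
}]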
377
378[para]
379
All stacks have their top element at the end, i.e. pushing an item is
381equivalent to appending to the list representing the stack, and
382popping it removes the last element.
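
[para]

In plain Tcl this convention can be sketched as follows. The variable
names are chosen for illustration only.

[example {
set stack {}

# Push: append to the end of the list.
lappend stack 3
lappend stack 7

# Top of the stack: the last element.
set top [lindex $stack end]

# Pop: drop the last element.
set stack [lrange $stack 0 end-1]
}]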
383
384[para]
385
[term er], the error status, is either empty or a list of two elements:
a location in the input, and a list of messages, encoded as references
into the [term pool] element of the [term code].
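
[para]

A minimal sketch of resolving such message references by hand is shown
below. Both the pool and the error status are hypothetical values
invented for this illustration.

[example {
# Hypothetical pool of strings and a hypothetical error status.
set pool {{expected digit} {expected letter} {unexpected eof}}
set er   {5 {0 2}}

set text {}
if {[llength $er]} {
    foreach {location msgids} $er break
    # Replace each message id by the string it refers to.
    foreach id $msgids {
        lappend text [lindex $pool $id]
    }
}
}]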
389
390[para]
391
[term nc], the nonterminal cache, is keyed by nonterminal name and
location. Each value is a four-element list containing current location,
match status, semantic value, and error status, in this order.
395
396[section {BUGS, IDEAS, FEEDBACK}]
397
398This document, and the package it describes, will undoubtedly contain
399bugs and other problems.
400
401Please report such in the category [emph grammar_me] of the
402[uri {http://sourceforge.net/tracker/?group_id=12883} {Tcllib SF Trackers}].
403
404Please also report any ideas for enhancements you may have for either
405package and/or documentation.
406
407
408[manpage_end]
409