[comment {-*- tcl -*- doctools manpage}]
[manpage_begin grammar::me::cpu::core n 0.2]
[copyright {2005-2006 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
[moddesc {Grammar operations and usage}]
[titledesc {ME virtual machine state manipulation}]
[category {Grammars and finite automata}]
[require Tcl 8.4]
[require grammar::me::cpu::core [opt 0.2]]
[description]
[keywords {virtual machine} parsing grammar]
[para]

This package provides an implementation of the ME virtual machine.

Please go and read the document [syscmd grammar::me_intro] first if
you do not know what a ME virtual machine is.

[para]

This implementation represents each ME virtual machine as a Tcl value
and provides commands to manipulate and query such values to show the
effects of executing instructions, adding tokens, retrieving state,
etc.

[para]

The values fully follow the Tcl paradigm that every value is a
string, while also allowing C implementations to use a proper
Tcl_ObjType and keep all the important data in native data structures.

Because of the latter it is recommended to access the state values
[emph only] through the commands of this package, to ensure that the
internal representation is not shimmered away.

[para]

The actual structure used by all state values is described in section
[sectref {CPU STATE}].


[section API]

The package directly provides only a single command, and all the
functionality is made available through its methods.

[list_begin definitions]

[call [cmd ::grammar::me::cpu::core] [method disasm] [arg asm]]

This method returns a list containing a disassembly of the match
instructions in [arg asm]. The format of [arg asm] is specified in the
section [sectref {MATCH PROGRAM REPRESENTATION}].

[para]

Each element of the result contains the instruction label, the instruction
name, and the instruction arguments, in this order.
The label can be
the empty string. Jump destinations are shown as labels, strings and
tokens unencoded. Token names are prefixed with their numeric id if
and only if a tokmap is defined. The two components are separated by a
colon.


[call [cmd ::grammar::me::cpu::core] [method asm] [arg asm]]

This method returns code in the format specified in section
[sectref {MATCH PROGRAM REPRESENTATION}], generated from the ME assembly
code [arg asm], which is in the format returned by the method
[method disasm].


[call [cmd ::grammar::me::cpu::core] [method new] [arg asm]]

This method creates the state value for a ME virtual machine in its
initial state and returns it as its result.

[para]

The argument [arg asm] contains a Tcl representation of the
match instructions the machine has to execute while parsing the input
stream. Its format is specified in the section
[sectref {MATCH PROGRAM REPRESENTATION}].

[para]

The [arg tokmap] argument taken by the implementation provided by the
package [package grammar::me::tcl] is here hidden inside of the match
instructions and therefore not needed.


[call [cmd ::grammar::me::cpu::core] [method lc] [arg state] [arg location]]

This method takes the state value of a ME virtual machine and uses it
to convert a location in the input stream (as offset) into a line
number and column index. The result of the method is a 2-element list
containing the line number and the column index, in this order.

[para]

[emph Note] that the method cannot convert locations which the machine
has not yet read from the input stream. In other words, if the machine
has read 7 characters so far it is possible to convert the offsets
[const 0] to [const 6], but nothing beyond that. This also shows that
it is not possible to convert offsets which refer to locations before
the beginning of the stream.

[para]

This utility allows higher levels to convert the location offsets
found in the error status and the AST into more human readable data.


[call [cmd ::grammar::me::cpu::core] [method tok] [arg state] [opt "[arg from] [opt [arg to]]"]]

This method takes the state value of a ME virtual machine and returns
a Tcl list containing the part of the input stream between the
locations [arg from] and [arg to] (both inclusive). If [arg to] is not
specified it will default to the value of [arg from]. If [arg from] is
not specified either the whole input stream is returned.

[para]

This method places the same restrictions on its location arguments as
the method [method lc].


[call [cmd ::grammar::me::cpu::core] [method pc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current value of the stored program counter.


[call [cmd ::grammar::me::cpu::core] [method iseof] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current value of the stored eof flag.


[call [cmd ::grammar::me::cpu::core] [method at] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current location in the input stream.


[call [cmd ::grammar::me::cpu::core] [method cc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current token.


[call [cmd ::grammar::me::cpu::core] [method sv] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current semantic value stored in it.

This is an abstract syntax tree as specified in the document
[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].


[call [cmd ::grammar::me::cpu::core] [method ok] [arg state]]

This method takes the state value of a ME virtual machine and returns
the match status stored in it.


[call [cmd ::grammar::me::cpu::core] [method error] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current error status stored in it.


[call [cmd ::grammar::me::cpu::core] [method lstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the location stack.


[call [cmd ::grammar::me::cpu::core] [method astk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the AST stack.


[call [cmd ::grammar::me::cpu::core] [method mstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the AST marker stack.


[call [cmd ::grammar::me::cpu::core] [method estk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the error stack.


[call [cmd ::grammar::me::cpu::core] [method rstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the subroutine return stack.


[call [cmd ::grammar::me::cpu::core] [method nc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the nonterminal match cache as a dictionary.


[call [cmd ::grammar::me::cpu::core] [method ast] [arg state]]

This method takes the state value of a ME virtual machine and returns
the abstract syntax tree currently at the top of the AST stack stored
in it.

This is an abstract syntax tree as specified in the document
[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].
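[para]

The stack accessors above return ordinary Tcl lists. As a minimal
sketch of the convention stated in section [sectref {CPU STATE}]
(the top of a stack is the last element of its list), the following
uses only core Tcl list commands on a made-up value. The variable
[var stack] and its contents are illustrative assumptions, not data
produced by this package.

[example {
# Hypothetical stack value; a real one would come from lstk, astk, etc.
set stack {10 20 30}

# The top of the stack is the last list element.
set top [lindex $stack end]          ;# 30

# Pushing an item appends it to the list.
lappend stack 40

# Popping reads the last element and then drops it from the list.
set popped [lindex $stack end]       ;# 40
set stack  [lrange $stack 0 end-1]   ;# 10 20 30
}]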


[call [cmd ::grammar::me::cpu::core] [method halted] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current halt status stored in it, i.e. whether the machine has
stopped or not.


[call [cmd ::grammar::me::cpu::core] [method code] [arg state]]

This method takes the state value of a ME virtual machine and returns
the code stored in it, i.e. the instructions executed by the machine.


[call [cmd ::grammar::me::cpu::core] [method eof] [arg statevar]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar] and modifies it so that the eof
flag inside is set. This signals to the machine that whatever tokens
are in the input queue are the last to be processed. There will be no
more.


[call [cmd ::grammar::me::cpu::core] [method put] [arg statevar] [arg tok] [arg lex] [arg line] [arg col]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar] and modifies it so that the token
[arg tok] is added to the end of the input queue, with associated
lexeme data [arg lex] and [arg line]/[arg col]umn information.

[para]

The operation will fail with an error if the eof flag of the machine
has been set through the method [method eof].


[call [cmd ::grammar::me::cpu::core] [method run] [arg statevar] [opt [arg n]]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar], executes a number of
instructions and stores the state resulting from their modifications
back into the variable.

[para]

The execution loop will run until either

[list_begin itemized]
[item] [arg n] instructions have been executed, or
[item] a halt instruction was executed, or
[item]
the input queue is empty and the code is asking for more tokens to
process.
[list_end]
[para]

If no limit [arg n] was set, only the last two conditions are checked.

[list_end]


[subsection {MATCH PROGRAM REPRESENTATION}]

A match program is represented by a nested Tcl list. The first element,
[term asm], is a list of integer numbers, the instructions to execute,
and their arguments. The second element, [term pool], is a list of
strings, referenced by the instructions, for error messages, token
names, etc. The third element, [term tokmap], provides ordering
information for the tokens, mapping their names to their numerical
rank. This element can be empty, forcing lexicographic comparison when
matching ranges.

[para]

All ME instructions are encoded as integer numbers, with the mapping
given below. A number of the instructions, those which handle error
messages, have been given an additional argument to supply that
message explicitly instead of having it constructed from token names,
etc. This allows the machine state to store only the message ids
instead of the full strings.

[para]

Jump destination arguments are absolute indices into the [term asm]
element, referring to the instruction to jump to. Any string arguments
are absolute indices into the [term pool] element. Tokens, characters,
messages, and token (actually character) classes to match are coded as
references into the [term pool] as well.
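[para]

As a minimal sketch of how references into the [term pool] can be
resolved, the following uses only core Tcl list commands. The program
value is hand-written for illustration: the integers in its
[term asm] element are arbitrary placeholders rather than a valid
instruction sequence, and the procedure name [cmd poolStrings] is an
assumption of this sketch, not part of the package.

[example {
# Hypothetical match program in the {asm pool tokmap} layout.
# The asm integers are placeholders, not real instruction codes.
set asm    {0 0 0}
set pool   {{Expected digit} digit {Expected letter}}
set tokmap {}
set code   [list $asm $pool $tokmap]

# Map a list of pool references (absolute indices) to their strings.
proc poolStrings {code refs} {
    set pool [lindex $code 1]
    set res  {}
    foreach ref $refs {
        lappend res [lindex $pool $ref]
    }
    return $res
}

set strings [poolStrings $code {0 2}]
# strings is now: {Expected digit} {Expected letter}
}]

The same kind of lookup would apply to the message references held in
the error status, which section [sectref {CPU STATE}] describes as
references into the [term pool] element of the code.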

[para]
[list_begin enumerated]

[enum] "[cmd ict_advance] [arg message]"
[enum] "[cmd ict_match_token] [arg tok] [arg message]"
[enum] "[cmd ict_match_tokrange] [arg tokbegin] [arg tokend] [arg message]"
[enum] "[cmd ict_match_tokclass] [arg code] [arg message]"
[enum] "[cmd inc_restore] [arg branchlabel] [arg nt]"
[enum] "[cmd inc_save] [arg nt]"
[enum] "[cmd icf_ntcall] [arg branchlabel]"
[enum] "[cmd icf_ntreturn]"
[enum] "[cmd iok_ok]"
[enum] "[cmd iok_fail]"
[enum] "[cmd iok_negate]"
[enum] "[cmd icf_jalways] [arg branchlabel]"
[enum] "[cmd icf_jok] [arg branchlabel]"
[enum] "[cmd icf_jfail] [arg branchlabel]"
[enum] "[cmd icf_halt]"
[enum] "[cmd icl_push]"
[enum] "[cmd icl_rewind]"
[enum] "[cmd icl_pop]"
[enum] "[cmd ier_push]"
[enum] "[cmd ier_clear]"
[enum] "[cmd ier_nonterminal] [arg message]"
[enum] "[cmd ier_merge]"
[enum] "[cmd isv_clear]"
[enum] "[cmd isv_terminal]"
[enum] "[cmd isv_nonterminal_leaf] [arg nt]"
[enum] "[cmd isv_nonterminal_range] [arg nt]"
[enum] "[cmd isv_nonterminal_reduce] [arg nt]"
[enum] "[cmd ias_push]"
[enum] "[cmd ias_mark]"
[enum] "[cmd ias_mrewind]"
[enum] "[cmd ias_mpop]"
[list_end]


[section {CPU STATE}]

A state value is a list containing the following elements, in the order listed below:

[list_begin enumerated]
[enum] [term code]: Match instructions, see [sectref {MATCH PROGRAM REPRESENTATION}].
[enum] [term pc]: Program counter, [term int].
[enum] [term halt]: Halt flag, [term boolean].
[enum] [term eof]: Eof flag, [term boolean].
[enum] [term tc]: Terminal cache, and input queue. Structure see below.
[enum] [term cl]: Current location, [term int].
[enum] [term ct]: Current token, [term string].
[enum] [term ok]: Match status, [term boolean].
[enum] [term sv]: Semantic value, [term list].
[enum] [term er]: Error status, [term list].
[enum] [term ls]: Location stack, [term list].
[enum] [term as]: AST stack, [term list].
[enum] [term ms]: AST marker stack, [term list].
[enum] [term es]: Error stack, [term list].
[enum] [term rs]: Return stack, [term list].
[enum] [term nc]: Nonterminal cache, [term dictionary].
[list_end]
[para]

[term tc], the input queue of tokens waiting for processing and the
terminal cache containing the tokens already processed, are one
unified data structure simply holding all tokens and their
information, with the current location separating that which has been
processed from that which is waiting.

Each element of the queue/cache is a list containing the token, its
lexeme information, line number, and column index, in this order.

[para]

All stacks have their top element at the end, i.e. pushing an item is
equivalent to appending to the list representing the stack, and
popping it removes the last element.

[para]

[term er], the error status, is either empty or a list of two elements,
a location in the input, and a list of messages, encoded as references
into the [term pool] element of the [term code].

[para]

[term nc], the nonterminal cache, is keyed by nonterminal name and
location, each value a four-element list containing current location,
match status, semantic value, and error status, in this order.

[section {BUGS, IDEAS, FEEDBACK}]

This document, and the package it describes, will undoubtedly contain
bugs and other problems.

Please report such in the category [emph grammar_me] of the
[uri {http://sourceforge.net/tracker/?group_id=12883} {Tcllib SF Trackers}].

Please also report any ideas for enhancements you may have for either
package and/or documentation.


[manpage_end]