1(* 2 * @TAG(OTHER_PRINCETON_OSS) 3 *) 4(* This is an -*- sml -*- file *) 5(* Modified by Michael.Norrish@nicta.com.au on 2005-04-13 so that it compiles 6 with both mlton and mosml. *) 7(* Modified by sweeks@acm.org on 2000-8-24. 8 * Ported to MLton. 9 *) 10(* Lexical analyzer generator for Standard ML. 11 Version 1.7.0, June 1998 12 13Copyright (c) 1989-1992 by Andrew W. Appel, 14 David R. Tarditi, James S. Mattson 15 16This software comes with ABSOLUTELY NO WARRANTY. 17This software is subject only to the PRINCETON STANDARD ML SOFTWARE LIBRARY 18COPYRIGHT NOTICE, LICENSE AND DISCLAIMER, (in the file "COPYRIGHT", 19distributed with this software). You may copy and distribute this software; 20see the COPYRIGHT NOTICE for details and restrictions. 21 22 Changes: 23 07/25/89 (drt): added %header declaration, code to place 24 user declarations at same level as makeLexer, etc. 25 This is needed for the parser generator. 26 /10/89 (appel): added %arg declaration (see lexgen.doc). 27 /04/90 (drt): fixed following bug: couldn't use the lexer after an 28 error occurred -- NextTok and inquote weren't being reset 29 10/22/91 (drt): disabled use of lookahead 30 10/23/92 (drt): disabled use of $ operator (which involves lookahead), 31 added handlers for dictionary lookup routine 32 11/02/92 (drt): changed handler for exception Reject in generated lexer 33 to Internal.Reject 34 02/01/94 (appel): Moved the exception handler for Reject in such 35 a way as to allow tail-recursion (improves performance 36 wonderfully!). 37 02/01/94 (appel): Fixed a bug in parsing of state names. 38 05/19/94 (Mikael Pettersson, mpe@ida.liu.se): 39 Transition tables are usually represented as strings, but 40 when the range is too large, int vectors constructed by 41 code like "Vector.vector[1,2,3,...]" are used instead. 42 The problem with this isn't that the vector itself takes 43 a lot of space, but that the code generated by SML/NJ to 44 construct the intermediate list at run-time is *HUGE*. My 45 fix is to encode an int vector as a string literal (using 46 two bytes per int) and emit code to decode the string to 47 a vector at run-time. SML/NJ compiles string literals into 48 substrings in the code, so this uses much less space. 49 06/02/94 (jhr): Modified export-lex.sml to conform to new installation 50 scheme. Also removed tab characters from string literals. 51 10/05/94 (jhr): Changed generator to produce code that uses the new 52 basis style strings and characters. 53 10/06/94 (jhr) Modified code to compile under new basis style strings 54 and characters. 55 02/08/95 (jhr) Modified to use new List module interface. 56 05/18/95 (jhr) changed Vector.vector to Vector.fromList 57* 58 * $Log$ 59 * Revision 1.3 2005/07/21 07:01:27 michaeln 60 * Get mllex to cope with actions that include strings with unbalanced 61 * parentheses. (Code taken from SML/NJ's mllex.) 62 * 63 * Revision 1.2 2005/04/14 05:42:08 michaeln 64 * Slight change to allow the product of mllex foo to be compiled by mosml 65 * without having to use the -toplevel option. Also a "fix" for an off-by-one 66 * issue that I think is a bug. 67 * 68 * Revision 1.1 2005/04/13 05:31:30 michaeln 69 * A MoscowML compilable version of the "standard" mllex tool, as used by 70 * both SML/NJ and MLton. The source code is also compilable by mlton, 71 * though this is cute more than useful as mlton comes with a version of mllex 72 * anyway. 73 * 74 * Revision 1.1.1.1 1998/04/08 18:40:10 george 75 * Version 110.5 76 * 77 * Revision 1.9 1998/01/06 19:23:53 appel 78 * added %posarg feature to permit position-within-file to be passed 79 * as a parameter to makeLexer 80 * 81# Revision 1.8 1998/01/06 19:01:48 appel 82# repaired error messages like "cannot have both %structure and %header" 83# 84# Revision 1.7 1998/01/06 18:55:49 appel 85# permit %% to be unescaped within regular expressions 86# 87# Revision 1.6 1998/01/06 18:46:13 appel 88# removed undocumented feature that permitted extra %% at end of rules 89# 90# Revision 1.5 1998/01/06 18:29:23 appel 91# put yylineno variable inside makeLexer function 92# 93# Revision 1.4 1998/01/06 18:19:59 appel 94# check for newline inside quoted string 95# 96# Revision 1.3 1997/10/04 03:52:13 dbm 97# Fix to remove output file if ml-lex fails. 98# 99# Revision 1.2 1997/05/06 01:12:38 george 100# *** empty log message *** 101# 102 * Revision 1.2 1996/02/26 15:02:27 george 103 * print no longer overloaded. 104 * use of makestring has been removed and replaced with Int.toString .. 105 * use of IO replaced with TextIO 106 * 107 * Revision 1.1.1.1 1996/01/31 16:01:15 george 108 * Version 109 109 * 110 *) 111 112(* Subject: lookahead in sml-lex 113 Reply-to: david.tarditi@CS.CMU.EDU 114 Date: Mon, 21 Oct 91 14:13:26 -0400 115 116There is a serious bug in the implementation of lookahead, 117as done in sml-lex, and described in Aho, Sethi, and Ullman, 118p. 134 "Implementing the Lookahead Operator" 119 120We have disallowed the use of lookahead for now because 121of this bug. 122 123As a counter-example to the implementation described in 124ASU, consider the following specification with the 125input string "aba" (this example is taken from 126a comp.compilers message from Dec. 1989, I think): 127 128type lexresult=unit 129val linenum = ref 1 130fun error x = TextIO.output(TextIO.stdErr, x ^ "\n") 131val eof = fn () => () 132%% 133%structure Lex 134%% 135(a|ab)/ba => (print yytext; print "\n"; ()); 136 137The ASU proposal works as follows. Suppose that we are 138using NFA's to represent our regular expressions. Then to 139build an NFA for e1 / e2, we build an NFA n1 for e1 140and an NFA n2 for e2, and add an epsilon transition 141from e1 to e2. 142 143When lexing, when we encounter the end state of e1e2, 144we take as the end of the string the position in 145the string that was the last occurrence of the state of 146the NFA having a transition on the epsilon introduced 147for /. 148 149Using the example we have above, we'll have an NFA 150with the following states: 151 152 153 1 -- a --> 2 -- b --> 3 154 | | 155 | epsilon | epsilon 156 | | 157 |------------> 4 -- b --> 5 -- a --> 6 158 159On our example, we get the following list of transitions: 160 161a : 2, 4 (make an epsilon transition from 2 to 4) 162ab : 3, 4, 5 (make an epsilon transition from 3 to 4) 163aba : 6 164 165If we chose the last state in which we made an epsilon transition, 166we'll chose the transition from 3 to 4, and end up with "ab" 167as our token, when we should have "a" as our token. 168 169*) 170 171functor RedBlack(B : sig type key 172 val > : key*key->bool 173 end): 174 sig type tree 175 type key 176 val empty : tree 177 val insert : key * tree -> tree 178 val lookup : key * tree -> key 179 exception notfound of key 180 end = 181struct 182 open B 183 datatype color = RED | BLACK 184 datatype tree = empty | tree of key * color * tree * tree 185 exception notfound of key 186 187 fun insert (key,t) = 188 let fun f empty = tree(key,RED,empty,empty) 189 | f (tree(k,BLACK,l,r)) = 190 if key>k 191 then case f r 192 of r as tree(rk,RED, rl as tree(rlk,RED,rll,rlr),rr) => 193 (case l 194 of tree(lk,RED,ll,lr) => 195 tree(k,RED,tree(lk,BLACK,ll,lr), 196 tree(rk,BLACK,rl,rr)) 197 | _ => tree(rlk,BLACK,tree(k,RED,l,rll), 198 tree(rk,RED,rlr,rr))) 199 | r as tree(rk,RED,rl, rr as tree(rrk,RED,rrl,rrr)) => 200 (case l 201 of tree(lk,RED,ll,lr) => 202 tree(k,RED,tree(lk,BLACK,ll,lr), 203 tree(rk,BLACK,rl,rr)) 204 | _ => tree(rk,BLACK,tree(k,RED,l,rl),rr)) 205 | r => tree(k,BLACK,l,r) 206 else if k>key 207 then case f l 208 of l as tree(lk,RED,ll, lr as tree(lrk,RED,lrl,lrr)) => 209 (case r 210 of tree(rk,RED,rl,rr) => 211 tree(k,RED,tree(lk,BLACK,ll,lr), 212 tree(rk,BLACK,rl,rr)) 213 | _ => tree(lrk,BLACK,tree(lk,RED,ll,lrl), 214 tree(k,RED,lrr,r))) 215 | l as tree(lk,RED, ll as tree(llk,RED,lll,llr), lr) => 216 (case r 217 of tree(rk,RED,rl,rr) => 218 tree(k,RED,tree(lk,BLACK,ll,lr), 219 tree(rk,BLACK,rl,rr)) 220 | _ => tree(lk,BLACK,ll,tree(k,RED,lr,r))) 221 | l => tree(k,BLACK,l,r) 222 else tree(key,BLACK,l,r) 223 | f (tree(k,RED,l,r)) = 224 if key>k then tree(k,RED,l, f r) 225 else if k>key then tree(k,RED, f l, r) 226 else tree(key,RED,l,r) 227 in case f t 228 of tree(k,RED, l as tree(_,RED,_,_), r) => tree(k,BLACK,l,r) 229 | tree(k,RED, l, r as tree(_,RED,_,_)) => tree(k,BLACK,l,r) 230 | t => t 231 end 232 233 234 fun lookup (key,t) = 235 let fun look empty = raise (notfound key) 236 | look (tree(k,_,l,r)) = 237 if k>key then look l 238 else if key>k then look r 239 else k 240 in look t 241 end 242 243end 244 245signature LEXGEN = 246 sig 247 val lexGen: string -> unit 248 end 249 250structure LexGen: LEXGEN = 251 struct 252 open Array List 253 infix 9 sub 254 255 datatype token = CHARS of bool array | QMARK | STAR | PLUS | BAR 256 | LP | RP | CARAT | DOLLAR | SLASH | STATE of string list 257 | REPS of int * int | ID of string | ACTION of string 258 | BOF | EOF | ASSIGN | SEMI | ARROW | LEXMARK | LEXSTATES 259 | COUNT | REJECT | FULLCHARSET | STRUCT | HEADER | ARG | POSARG 260 261 datatype exp = EPS | CLASS of bool array * int | CLOSURE of exp 262 | ALT of exp * exp | CAT of exp * exp | TRAIL of int 263 | END of int 264 265 (* flags describing input Lex spec. - unnecessary code is omitted *) 266 (* if possible *) 267 268 val CharFormat = ref false; 269 val UsesTrailingContext = ref false; 270 val UsesPrevNewLine = ref false; 271 272 (* flags for various bells & whistles that Lex has. These slow the 273 lexer down and should be omitted from production lexers (if you 274 really want speed) *) 275 276 val CountNewLines = ref false; 277 val PosArg = ref false; 278 val HaveReject = ref false; 279 280 (* Can increase size of character set *) 281 282 val CharSetSize: int ref = ref 129; 283 284 (* Can name structure or declare header code *) 285 286 val StrName = ref "Mlex" 287 val HeaderCode = ref "" 288 val HeaderDecl = ref false 289 val ArgCode = ref (NONE: string option) 290 val StrDecl = ref false 291 292 val ResetFlags = fn () => (CountNewLines := false; HaveReject := false; 293 PosArg := false; 294 UsesTrailingContext := false; 295 CharSetSize := 129; StrName := "Mlex"; 296 HeaderCode := ""; HeaderDecl:= false; 297 ArgCode := NONE; 298 StrDecl := false) 299 300 val LexOut = ref(TextIO.stdOut) 301 fun say x = TextIO.output(!LexOut, x) 302 303(* Union: merge two sorted lists of integers *) 304 305fun union(a,b) = let val rec merge = fn 306 (nil,nil,z) => z 307 | (nil,el::more,z) => merge(nil,more,el::z) 308 | (el::more,nil,z) => merge(more,nil,el::z) 309 | (x::morex,y::morey,z) => if (x:int)=(y:int) 310 then merge(morex,morey,x::z) 311 else if x>y then merge(morex,y::morey,x::z) 312 else merge(x::morex,morey,y::z) 313 in merge(rev a,rev b,nil) 314end 315 316(* Nullable: compute if a important expression parse tree node is nullable *) 317 318val rec nullable = fn 319 EPS => true 320 | CLASS(_) => false 321 | CLOSURE(_) => true 322 | ALT(n1,n2) => nullable(n1) orelse nullable(n2) 323 | CAT(n1,n2) => nullable(n1) andalso nullable(n2) 324 | TRAIL(_) => true 325 | END(_) => false 326 327(* FIRSTPOS: firstpos function for parse tree expressions *) 328 329and firstpos = fn 330 EPS => nil 331 | CLASS(_,i) => [i] 332 | CLOSURE(n) => firstpos(n) 333 | ALT(n1,n2) => union(firstpos(n1),firstpos(n2)) 334 | CAT(n1,n2) => if nullable(n1) then union(firstpos(n1),firstpos(n2)) 335 else firstpos(n1) 336 | TRAIL(i) => [i] 337 | END(i) => [i] 338 339(* LASTPOS: Lastpos function for parse tree expressions *) 340 341and lastpos = fn 342 EPS => nil 343 | CLASS(_,i) => [i] 344 | CLOSURE(n) => lastpos(n) 345 | ALT(n1,n2) => union(lastpos(n1),lastpos(n2)) 346 | CAT(n1,n2) => if nullable(n2) then union(lastpos(n1),lastpos(n2)) 347 else lastpos(n2) 348 | TRAIL(i) => [i] 349 | END(i) => [i] 350 ; 351 352(* ++: Increment an integer reference *) 353 354fun ++(x) : int = (x := !x + 1; !x); 355 356structure dict = 357 struct 358 type 'a relation = 'a * 'a -> bool 359 abstype ('b,'a) dictionary = DATA of { Table : ('b * 'a) list, 360 Leq : 'b * 'b -> bool } 361 with 362 exception LOOKUP 363 fun create Leqfunc = DATA { Table = nil, Leq = Leqfunc } 364 fun lookup (DATA { Table = entrylist, Leq = leq }) key = 365 let fun search [] = raise LOOKUP 366 | search((k,item)::entries) = 367 if leq(key,k) 368 then if leq(k,key) then item else raise LOOKUP 369 else search entries 370 in search entrylist 371 end 372 fun enter (DATA { Table = entrylist, Leq = leq }) 373 (newentry as (key : 'b,item :'a)) : ('b,'a) dictionary = 374 let val gt = fn a => fn b => not (leq(a,b)) 375 val eq = fn k => fn k' => (leq(k,k')) andalso (leq(k',k)) 376 fun update nil = [ newentry ] 377 | update ((entry as (k,_))::entries) = 378 if (eq key k) then newentry::entries 379 else if gt k key then newentry::(entry::entries) 380 else entry::(update entries) 381 in DATA { Table = update entrylist, Leq = leq } 382 end 383 fun listofdict (DATA { Table = entrylist,Leq = leq}) = 384 let fun f (nil,r) = rev r 385 | f (a::b,r) = f (b,a::r) 386 in f(entrylist,nil) 387 end 388 end 389end 390 391open dict; 392 393(* INPUT.ML : Input w/ one character push back capability *) 394 395val LineNum: int ref = ref 1; 396 397abstype ibuf = 398 BUF of TextIO.instream * {b : string ref, p : int ref} 399with 400 fun make_ibuf(s) = BUF (s, {b=ref"", p = ref 0}) 401 fun close_ibuf (BUF (s,_)) = TextIO.closeIn(s) 402 exception eof 403 fun getch (a as (BUF(s,{b,p}))) = 404 if (!p = (size (!b))) 405 then (b := TextIO.inputN(s, 1024); 406 p := 0; 407 if (size (!b))=0 408 then raise eof 409 else getch a) 410 else (let val ch = String.sub(!b,!p) 411 in (if ch = #"\n" 412 then LineNum := !LineNum + 1 413 else (); 414 p := !p + 1; 415 ch) 416 end) 417 fun ungetch(BUF(s,{b,p})) = ( 418 p := !p - 1; 419 if String.sub(!b,!p) = #"\n" 420 then LineNum := !LineNum - 1 421 else ()) 422end; 423 424exception Error 425 426fun prErr x = ( 427 TextIO.output (TextIO.stdErr, String.concat [ 428 "ml-lex: error, line ", (Int.toString (!LineNum)), ": ", x, "\n" 429 ]); 430 raise Error) 431fun prSynErr x = ( 432 TextIO.output (TextIO.stdErr, String.concat [ 433 "ml-lex: syntax error, line ", (Int.toString (!LineNum)), ": ", x, "\n" 434 ]); 435 raise Error) 436 437exception SyntaxError; (* error in user's input file *) 438 439exception LexError; (* unexpected error in lexer *) 440 441val LexBuf = ref(make_ibuf(TextIO.stdIn)); 442val LexState = ref 0; 443val NextTok = ref BOF; 444val inquote = ref false; 445 446fun AdvanceTok () : unit = let 447 fun isLetter c = 448 ((c >= #"a") andalso (c <= #"z")) orelse 449 ((c >= #"A") andalso (c <= #"Z")) 450 fun isDigit c = (c >= #"0") andalso (c <= #"9") 451 (* check for valid (non-leading) identifier character (added by JHR) *) 452 fun isIdentChr c = 453 ((isLetter c) orelse (isDigit c) orelse (c = #"_") orelse (c = #"'")) 454 fun atoi s = let 455 fun num (c::r, n) = if isDigit c 456 then num (r, 10*n + (Char.ord c - Char.ord #"0")) 457 else n 458 | num ([], n) = n 459 in 460 num (explode s, 0) 461 end 462 463 fun skipws () = (case nextch() 464 of #" " => skipws() 465 | #"\t" => skipws() 466 | #"\n" => skipws() 467 | x => x 468 (* end case *)) 469 470 and nextch () = getch(!LexBuf) 471 472 and escaped () = (case nextch() 473 of #"b" => #"\008" 474 | #"n" => #"\n" 475 | #"t" => #"\t" 476 | #"h" => #"\128" 477 | x => let 478 fun err t = prErr("illegal ascii escape '"^(implode(rev t))^"'") 479 fun cvt c = (Char.ord c - Char.ord #"0") 480 fun f (n: int, c, t) = if c=3 481 then if n >= (!CharSetSize) 482 then err t 483 else Char.chr n 484 else let val ch=nextch() 485 in 486 if isDigit ch 487 then f(n*10+(cvt ch), c+1, ch::t) 488 else err t 489 end 490 in 491 if isDigit x then f(cvt x, 1, [x]) else x 492 end 493 (* end case *)) 494 495 and onechar x = let val c = array(!CharSetSize, false) 496 in 497 update(c, Char.ord(x), true); CHARS(c) 498 end 499 500 in case !LexState of 0 => let val makeTok = fn () => 501 case skipws() 502 (* Lex % operators *) 503 of #"%" => (case nextch() of 504 #"%" => LEXMARK 505 | a => let fun f s = 506 let val a = nextch() 507 in if isLetter a then f(a::s) 508 else (ungetch(!LexBuf); 509 implode(rev s)) 510 end 511 in case f [a] 512 of "reject" => REJECT 513 | "count" => COUNT 514 | "full" => FULLCHARSET 515 | "s" => LEXSTATES 516 | "S" => LEXSTATES 517 | "structure" => STRUCT 518 | "header" => HEADER 519 | "arg" => ARG 520 | "posarg" => POSARG 521 | _ => prErr "unknown % operator " 522 end 523 ) 524 (* semicolon (for end of LEXSTATES) *) 525 | #";" => SEMI 526 (* anything else *) 527 | ch => if isLetter(ch) then 528 let fun getID matched = 529 let val x = nextch() 530(**** fix by JHR 531 in if isLetter(x) orelse isDigit(x) orelse 532 x = "_" orelse x = "'" 533****) 534 in if (isIdentChr x) 535 then getID (x::matched) 536 else (ungetch(!LexBuf); implode(rev matched)) 537 end 538 in ID(getID [ch]) 539 end 540 else (prSynErr ("bad character: " ^ String.str ch)) 541 in NextTok := makeTok() 542 end 543 | 1 => let val rec makeTok = fn () => 544 if !inquote then case nextch() of 545 (* inside quoted string *) 546 #"\\" => onechar(escaped()) 547 | #"\"" => (inquote := false; makeTok()) 548 | #"\n" => (prSynErr "end-of-line inside quoted string"; 549 inquote := false; makeTok()) 550 | x => onechar(x) 551 else case skipws() of 552 (* single character operators *) 553 #"?" => QMARK 554 | #"*" => STAR 555 | #"+" => PLUS 556 | #"|" => BAR 557 | #"(" => LP 558 | #")" => RP 559 | #"^" => CARAT 560 | #"$" => DOLLAR 561 | #"/" => SLASH 562 | #";" => SEMI 563 | #"." => let val c = array(!CharSetSize,true) in 564 update(c,10,false); CHARS(c) 565 end 566 (* assign and arrow *) 567 | #"=" => let val c = nextch() in 568 if c = #">" then ARROW else (ungetch(!LexBuf); ASSIGN) 569 end 570 (* character set *) 571 | #"[" => let val rec classch = fn () => let val x = skipws() 572 in if x = #"\\" then escaped() else x 573 end; 574 val first = classch(); 575 val flag = (first <> #"^"); 576 val c = array(!CharSetSize,not flag); 577 fun add NONE = () 578 | add (SOME x) = update(c, Char.ord(x), flag) 579 and range (x, y) = if x>y 580 then (prErr "bad char. range") 581 else let 582 val i = ref(Char.ord(x)) and j = Char.ord(y) 583 in while !i<=j do ( 584 add (SOME(Char.chr(!i))); 585 i := !i + 1) 586 end 587 and getClass last = (case classch() 588 of #"]" => (add(last); c) 589 | #"-" => (case last 590 of NONE => getClass(SOME #"-") 591 | (SOME last') => let val x = classch() 592 in 593 if x = #"]" 594 then (add(last); add(SOME #"-"); c) 595 else (range(last',x); getClass(NONE)) 596 end 597 (* end case *)) 598 | x => (add(last); getClass(SOME x)) 599 (* end case *)) 600 in CHARS(getClass(if first = #"^" then NONE else SOME first)) 601 end 602 (* Start States specification *) 603 | #"<" => let val rec get_state = fn (prev,matched) => 604 case nextch() of 605 #">" => matched::prev 606 | #"," => get_state(matched::prev,"") 607 | x => if isIdentChr(x) 608 then get_state(prev,matched ^ String.str x) 609 else (prSynErr "bad start state list") 610 in STATE(get_state(nil,"")) 611 end 612 (* {id} or repititions *) 613 | #"{" => let val ch = nextch() in if isLetter(ch) then 614 let fun getID matched = (case nextch() 615 of #"}" => matched 616 | x => if (isIdentChr x) then 617 getID(matched ^ String.str x) 618 else (prErr "invalid char. class name") 619 (* end case *)) 620 in ID(getID(String.str ch)) 621 end 622 else if isDigit(ch) then 623 let fun get_r (matched, r1) = (case nextch() 624 of #"}" => let val n = atoi(matched) in 625 if r1 = ~1 then (n,n) else (r1,n) 626 end 627 | #"," => if r1 = ~1 then get_r("",atoi(matched)) 628 else (prErr "invalid repetitions spec.") 629 | x => if isDigit(x) 630 then get_r(matched ^ String.str x,r1) 631 else (prErr "invalid char in repetitions spec") 632 (* end case *)) 633 in REPS(get_r(String.str ch,~1)) 634 end 635 else (prErr "bad repetitions spec") 636 end 637 (* Lex % operators *) 638 | #"\\" => onechar(escaped()) 639 (* start quoted string *) 640 | #"\"" => (inquote := true; makeTok()) 641 (* anything else *) 642 | ch => onechar(ch) 643 in NextTok := makeTok() 644 end 645 | 2 => NextTok := 646 (case skipws() of 647 #"(" => 648 let 649 fun loop_to_end (backslash, x) = 650 let 651 val c = getch (! LexBuf) 652 val notb = not backslash 653 val nstr = c :: x 654 in 655 case c of 656 #"\"" => if notb then nstr 657 else loop_to_end (false, nstr) 658 | _ => loop_to_end (c = #"\\" andalso notb, nstr) 659 end 660 fun GetAct (lpct, x) = 661 let 662 val c = getch (! LexBuf) 663 val nstr = c :: x 664 in 665 case c of 666 #"\"" => GetAct (lpct, loop_to_end (false, nstr)) 667 | #"(" => GetAct (lpct + 1, nstr) 668 | #")" => if lpct = 0 then implode (rev x) 669 else GetAct(lpct - 1, nstr) 670 | _ => GetAct(lpct, nstr) 671 end 672 in 673 ACTION (GetAct (0,nil)) 674 end 675 | #";" => SEMI 676 | c => (prSynErr ("invalid character " ^ String.str c))) 677 | _ => raise LexError 678end 679handle eof => NextTok := EOF ; 680 681fun GetTok (_:unit) : token = 682 let val t = !NextTok in AdvanceTok(); t 683 end; 684val SymTab = ref (create String.<=) : (string,exp) dictionary ref 685 686fun GetExp () : exp = 687 688 let val rec optional = fn e => ALT(EPS,e) 689 690 and lookup' = fn name => 691 lookup(!SymTab) name 692 handle LOOKUP => prErr ("bad regular expression name: "^ 693 name) 694 695 and newline = fn () => let val c = array(!CharSetSize,false) in 696 update(c,10,true); c 697 end 698 699 and endline = fn e => trail(e,CLASS(newline(),0)) 700 701 and trail = fn (e1,e2) => CAT(CAT(e1,TRAIL(0)),e2) 702 703 and closure1 = fn e => CAT(e,CLOSURE(e)) 704 705 and repeat = fn (min,max,e) => let val rec rep = fn 706 (0,0) => EPS 707 | (0,1) => ALT(e,EPS) 708 | (0,i) => CAT(rep(0,1),rep(0,i-1)) 709 | (i,j) => CAT(e,rep(i-1,j-1)) 710 in rep(min,max) 711 end 712 713 and exp0 = fn () => case GetTok() of 714 CHARS(c) => exp1(CLASS(c,0)) 715 | LP => let val e = exp0() in 716 if !NextTok = RP then 717 (AdvanceTok(); exp1(e)) 718 else (prSynErr "missing '('") end 719 | ID(name) => exp1(lookup' name) 720 | _ => raise SyntaxError 721 722 and exp1 = fn (e) => case !NextTok of 723 SEMI => e 724 | ARROW => e 725 | EOF => e 726 | LP => exp2(e,exp0()) 727 | RP => e 728 | t => (AdvanceTok(); case t of 729 QMARK => exp1(optional(e)) 730 | STAR => exp1(CLOSURE(e)) 731 | PLUS => exp1(closure1(e)) 732 | CHARS(c) => exp2(e,CLASS(c,0)) 733 | BAR => ALT(e,exp0()) 734 | DOLLAR => (UsesTrailingContext := true; endline(e)) 735 | SLASH => (UsesTrailingContext := true; 736 trail(e,exp0())) 737 | REPS(i,j) => exp1(repeat(i,j,e)) 738 | ID(name) => exp2(e,lookup' name) 739 | _ => raise SyntaxError) 740 741 and exp2 = fn (e1,e2) => case !NextTok of 742 SEMI => CAT(e1,e2) 743 | ARROW => CAT(e1,e2) 744 | EOF => CAT(e1,e2) 745 | LP => exp2(CAT(e1,e2),exp0()) 746 | RP => CAT(e1,e2) 747 | t => (AdvanceTok(); case t of 748 QMARK => exp1(CAT(e1,optional(e2))) 749 | STAR => exp1(CAT(e1,CLOSURE(e2))) 750 | PLUS => exp1(CAT(e1,closure1(e2))) 751 | CHARS(c) => exp2(CAT(e1,e2),CLASS(c,0)) 752 | BAR => ALT(CAT(e1,e2),exp0()) 753 | DOLLAR => (UsesTrailingContext := true; 754 endline(CAT(e1,e2))) 755 | SLASH => (UsesTrailingContext := true; 756 trail(CAT(e1,e2),exp0())) 757 | REPS(i,j) => exp1(CAT(e1,repeat(i,j,e2))) 758 | ID(name) => exp2(CAT(e1,e2),lookup' name) 759 | _ => raise SyntaxError) 760in exp0() 761end; 762val StateTab = ref(create(String.<=)) : (string,int) dictionary ref 763 764val StateNum: int ref = ref 0; 765 766fun GetStates () : int list = 767 768 let fun add nil sl = sl 769 | add (x::y) sl = add y (union ([lookup (!StateTab)(x) 770 handle LOOKUP => 771 prErr ("bad state name: "^x) 772 ],sl)) 773 774 fun addall i sl = 775 if i <= !StateNum then addall (i+2) (union ([i],sl)) 776 else sl 777 778 fun incall (x::y) = (x+1)::incall y 779 | incall nil = nil 780 781 fun addincs nil = nil 782 | addincs (x::y) = x::(x+1)::addincs y 783 784 val state_list = 785 case !NextTok of 786 STATE s => (AdvanceTok(); LexState := 1; add s nil) 787 | _ => addall 1 nil 788 789 in case !NextTok 790 of CARAT => (LexState := 1; AdvanceTok(); UsesPrevNewLine := true; 791 incall state_list) 792 | _ => addincs state_list 793 end 794 795val LeafNum: int ref = ref ~1; 796 797fun renum(e : exp) : exp = 798 let val rec label = fn 799 EPS => EPS 800 | CLASS(x,_) => CLASS(x,++LeafNum) 801 | CLOSURE(e) => CLOSURE(label(e)) 802 | ALT(e1,e2) => ALT(label(e1),label(e2)) 803 | CAT(e1,e2) => CAT(label(e1),label(e2)) 804 | TRAIL(i) => TRAIL(++LeafNum) 805 | END(i) => END(++LeafNum) 806in label(e) 807end; 808 809exception ParseError; 810 811fun parse() : (string * (int list * exp) list * ((string,string) dictionary)) = 812 let val Accept = ref (create String.<=) : (string,string) dictionary ref 813 val rec ParseRtns = fn l => case getch(!LexBuf) of 814 #"%" => let val c = getch(!LexBuf) in 815 if c = #"%" then (implode (rev l)) 816 else ParseRtns(c :: #"%" :: l) 817 end 818 | c => ParseRtns(c::l) 819 and ParseDefs = fn () => 820 (LexState:=0; AdvanceTok(); case !NextTok of 821 LEXMARK => () 822 | LEXSTATES => 823 let fun f () = (case !NextTok of (ID i) => 824 (StateTab := enter(!StateTab)(i,++StateNum); 825 ++StateNum; AdvanceTok(); f()) 826 | _ => ()) 827 in AdvanceTok(); f (); 828 if !NextTok=SEMI then ParseDefs() else 829 (prSynErr "expected ';'") 830 end 831 | ID x => (LexState:=1; AdvanceTok(); if GetTok() = ASSIGN 832 then (SymTab := enter(!SymTab)(x,GetExp()); 833 if !NextTok = SEMI then ParseDefs() 834 else (prSynErr "expected ';'")) 835 else raise SyntaxError) 836 | REJECT => (HaveReject := true; ParseDefs()) 837 | COUNT => (CountNewLines := true; ParseDefs()) 838 | FULLCHARSET => (CharSetSize := 256; ParseDefs()) 839 | HEADER => (LexState := 2; AdvanceTok(); 840 case GetTok() 841 of ACTION s => 842 if (!StrDecl) then 843 (prErr "cannot have both %structure and %header \ 844 \declarations") 845 else if (!HeaderDecl) then 846 (prErr "duplicate %header declarations") 847 else 848 (HeaderCode := s; LexState := 0; 849 HeaderDecl := true; ParseDefs()) 850 | _ => raise SyntaxError) 851 | POSARG => (PosArg := true; ParseDefs()) 852 | ARG => (LexState := 2; AdvanceTok(); 853 case GetTok() 854 of ACTION s => 855 (case !ArgCode 856 of SOME _ => prErr "duplicate %arg declarations" 857 | NONE => ArgCode := SOME s; 858 LexState := 0; 859 ParseDefs()) 860 | _ => raise SyntaxError) 861 | STRUCT => (AdvanceTok(); 862 case !NextTok of 863 (ID i) => 864 if (!HeaderDecl) then 865 (prErr "cannot have both %structure and %header \ 866 \declarations") 867 else if (!StrDecl) then 868 (prErr "duplicate %structure declarations") 869 else (StrName := i; StrDecl := true) 870 | _ => (prErr "expected ID"); 871 ParseDefs()) 872 | _ => raise SyntaxError) 873 and ParseRules = 874 fn rules => (LexState:=1; AdvanceTok(); case !NextTok of 875 EOF => rules 876 | _ => 877 let val s = GetStates() 878 val e = renum(CAT(GetExp(),END(0))) 879 in 880 if !NextTok = ARROW then 881 (LexState:=2; AdvanceTok(); 882 case GetTok() of ACTION(act) => 883 if !NextTok=SEMI then 884 (Accept:=enter(!Accept) (Int.toString (!LeafNum),act); 885 ParseRules((s,e)::rules)) 886 else (prSynErr "expected ';'") 887 | _ => raise SyntaxError) 888 else (prSynErr "expected '=>'") 889 end) 890in let val usercode = ParseRtns nil 891 in (ParseDefs(); (usercode,ParseRules(nil),!Accept)) 892 end 893end handle SyntaxError => (prSynErr "") 894 895fun makebegin () : unit = 896 let fun make nil = () 897 | make ((x,n:int)::y)=(say "val "; say x; say " = " ; 898 say "STARTSTATE "; 899 say (Int.toString n); say ";\n"; make y) 900 in say "\n(* start state definitions *)\n\n"; make(listofdict(!StateTab)) 901 end 902 903structure L = 904 struct 905 nonfix > 906 type key = int list * string 907 fun > ((key,item:string),(key',item')) = 908 let fun f ((a:int)::a') (b::b') = if Int.> (a,b) then true 909 else if a=b then f a' b' 910 else false 911 | f _ _ = false 912 in f key key' 913 end 914 end 915 916structure RB = RedBlack(L) 917 918fun maketable (fins:(int * (int list)) list, 919 tcs :(int * (int list)) list, 920 tcpairs: (int * int) list, 921 trans : (int*(int list)) list) : unit = 922 923(* Fins = (state #, list of final leaves for the state) list 924 tcs = (state #, list of trailing context leaves which begin in this state) 925 list 926 tcpairs = (trailing context leaf, end leaf) list 927 trans = (state #,list of transitions for state) list *) 928 929 let datatype elem = N of int | T of int | D of int 930 val count = ref 0 931 val _ = (if length(trans)<256 then CharFormat := true 932 else CharFormat := false; 933 if !UsesTrailingContext then 934 (say "\ndatatype yyfinstate = N of int | \ 935 \ T of int | D of int\n") 936 else say "\ndatatype yyfinstate = N of int"; 937 say "\ntype statedata = {fin : yyfinstate list, trans: "; 938 case !CharFormat of 939 true => say "string}" 940 | false => say "int Vector.vector}"; 941 say "\n(* transition & final state table *)\nval tab = let\n"; 942 case !CharFormat of 943 true => () 944 | false => 945 (say "fun decode s k =\n"; 946 say " let val k' = k + k\n"; 947 say " val hi = Char.ord(String.sub(s, k'))\n"; 948 say " val lo = Char.ord(String.sub(s, k' + 1))\n"; 949 say " in hi * 256 + lo end\n")) 950 951 val newfins = 952 let fun IsEndLeaf t = 953 let fun f ((l,e)::r) = if (e=t) then true else f r 954 | f nil = false in f tcpairs end 955 956 fun GetEndLeaf t = 957 let fun f ((tl,el)::r) = if (tl=t) then el else f r 958 | f [] = raise Fail "GetEndLeaf" 959 in f tcpairs 960 end 961 fun GetTrConLeaves s = 962 let fun f ((s',l)::r) = if (s = s') then l else f r 963 | f nil = nil 964 in f tcs 965 end 966 fun sort_leaves s = 967 let fun insert (x:int) (a::b) = 968 if (x <= a) then x::(a::b) 969 else a::(insert x b) 970 | insert x nil = [x] 971 in List.foldr (fn (x,r) => insert x r) [] s 972 end 973 fun conv a = if (IsEndLeaf a) then (D a) else (N a) 974 fun merge (a::a',b::b') = 975 if (a <= b) then (conv a)::merge(a',b::b') 976 else (T b)::(merge(a::a',b')) 977 | merge (a::a',nil) = (conv a)::(merge (a',nil)) 978 | merge (nil,b::b') = (T b)::(merge (b',nil)) 979 | merge (nil,nil) = nil 980 981 in map (fn (x,l) => 982 rev (merge (l, 983 sort_leaves (map (fn x => GetEndLeaf x) (GetTrConLeaves x))))) 984 fins 985 end 986 987 val rs = 988 let open RB 989 fun makeItems x = 990 let fun emit8(x, pos) = 991 let val s = StringCvt.padLeft #"0" 3 (Int.toString x) 992 in 993 case pos 994 of 16 => (say "\\\n\\\\"; say s; 1) 995 | _ => (say "\\"; say s; pos+1) 996 end 997 fun emit16(x, pos) = 998 let val hi8 = x div 256 999 val lo8 = x - hi8 * 256 (* x rem 256 *) 1000 in 1001 emit8(lo8, emit8(hi8, pos)) 1002 end 1003 fun MakeString([], _, _) = () 1004 | MakeString(x::xs, emitter, pos) = 1005 MakeString(xs, emitter, emitter(x, pos)) 1006 in case !CharFormat of 1007 true => (say " \n\""; MakeString(x,emit8,0); say "\"\n") 1008 | false => (say (Int.toString(length x)); 1009 say ", \n\""; MakeString(x,emit16,0); say "\"\n") 1010 end 1011 1012 fun makeEntry(nil,rs,t) = rev rs 1013 | makeEntry(((l:int,x)::y),rs,t) = 1014 let val name = (Int.toString l) 1015 in let val (r,n) = lookup ((x,name),t) 1016 in makeEntry(y,(n::rs),t) 1017 end handle notfound _ => 1018 (count := !count+1; 1019 say " ("; say name; say ","; 1020 makeItems x; say "),\n"; 1021 makeEntry(y,(name::rs),(insert ((x,name),t)))) 1022 end 1023 1024 val _ = say "val s = [ \n" 1025 val res = makeEntry(trans,nil,empty) 1026 val _ = 1027 case !CharFormat 1028 of true => (say "(0, \"\")]\n"; say "fun f x = x \n") 1029 | false => (say "(0, 0, \"\")]\n"; 1030 say "fun f(n, i, x) = (n, Vector.tabulate(i, decode x)) \n") 1031 1032 val _ = say "val s = map f (rev (tl (rev s))) \n" 1033 val _ = say "exception LexHackingError \n" 1034 val _ = say "fun look ((j,x)::r, i) = if i = j then x else look(r, i) \n" 1035 val _ = say " | look ([], i) = raise LexHackingError\n" 1036 1037 val _ = say "fun g {fin=x, trans=i} = {fin=x, trans=look(s,i)} \n" 1038 in res 1039 end 1040 1041 fun makeTable(nil,nil) = () 1042 | makeTable(a::a',b::b') = 1043 let fun makeItems nil = () 1044 | makeItems (hd::tl) = 1045 let val (t,n) = 1046 case hd of 1047 (N i) => ("(N ",i) 1048 | (T i) => ("(T ",i) 1049 | (D i) => ("(D ",i) 1050 in (say t; say (Int.toString n); say ")"; 1051 if null tl 1052 then () 1053 else (say ","; makeItems tl)) 1054 end 1055 in (say "{fin = ["; makeItems b; 1056 say "], trans = "; say a; say "}"; 1057 if null a' 1058 then () 1059 else (say ",\n"; makeTable(a',b'))) 1060 end 1061 | makeTable _ = raise Fail "makeTable" 1062 1063 fun msg x = TextIO.output(TextIO.stdOut, x) 1064 1065 in (say "in Vector.fromList(map g \n["; makeTable(rs,newfins); 1066 say "])\nend\n"; 1067 msg ("\nNumber of states = " ^ (Int.toString (length trans))); 1068 msg ("\nNumber of distinct rows = " ^ (Int.toString (!count))); 1069 msg ("\nApprox. memory size of trans. table = " ^ 1070 (Int.toString (!count*(!CharSetSize)*(if !CharFormat then 1 else 8)))); 1071 msg " bytes\n") 1072end 1073 1074(* makeaccept: Takes a (string,string) dictionary, prints case statement for 1075 accepting leaf actions. The key strings are the leaf #'s, the data strings 1076 are the actions *) 1077 1078fun makeaccept ends = 1079 let fun startline f = if f then say " " else say "| " 1080 fun make(nil,f) = (startline f; say "_ => raise Internal.LexerError\n") 1081 | make((x,a)::y,f) = (startline f; say x; say " => "; 1082 if Substring.size(#2 (Substring.position "yytext" (Substring.full a))) = 0 1083 then 1084 (say "("; say a; say ")") 1085 else (say "let val yytext=yymktext() in "; 1086 say a; say " end"); 1087 say "\n"; make(y,false)) 1088 in make (listofdict(ends),true) 1089 end 1090 1091fun leafdata(e:(int list * exp) list) = 1092 let val fp = array(!LeafNum + 1,nil) 1093 and leaf = array(!LeafNum + 1,EPS) 1094 and tcpairs = ref nil 1095 and trailmark = ref ~1; 1096 val rec add = fn 1097 (nil,x) => () 1098 | (hd::tl,x) => (update(fp,hd,union(fp sub hd,x)); 1099 add(tl,x)) 1100 and moredata = fn 1101 CLOSURE(e1) => 1102 (moredata(e1); add(lastpos(e1),firstpos(e1))) 1103 | ALT(e1,e2) => (moredata(e1); moredata(e2)) 1104 | CAT(e1,e2) => (moredata(e1); moredata(e2); 1105 add(lastpos(e1),firstpos(e2))) 1106 | CLASS(x,i) => update(leaf,i,CLASS(x,i)) 1107 | TRAIL(i) => (update(leaf,i,TRAIL(i)); if !trailmark = ~1 1108 then trailmark := i else ()) 1109 | END(i) => (update(leaf,i,END(i)); if !trailmark <> ~1 1110 then (tcpairs := (!trailmark,i)::(!tcpairs); 1111 trailmark := ~1) else ()) 1112 | _ => () 1113 and makedata = fn 1114 nil => () 1115 | (_,x)::tl => (moredata(x);makedata(tl)) 1116 in trailmark := ~1; makedata(e); (fp,leaf,!tcpairs) 1117 end; 1118 1119fun makedfa(rules) = 1120let val StateTab = ref (create(String.<=)) : (string,int) dictionary ref 1121 val fintab = ref (create(Int.<=)) : (int,(int list)) dictionary ref 1122 val transtab = ref (create(Int.<=)) : (int,int list) dictionary ref 1123 val tctab = ref (create(Int.<=)) : (int,(int list)) dictionary ref 1124 val (fp, leaf, tcpairs) = leafdata(rules); 1125 1126fun visit (state,statenum) = 1127 let val transitions = gettrans(state) in 1128 fintab := enter(!fintab)(statenum,getfin(state)); 1129 tctab := enter(!tctab)(statenum,gettc(state)); 1130 transtab := enter(!transtab)(statenum,transitions) 1131 end 1132 1133and visitstarts (states) = 1134 let fun vs nil i = () 1135 | vs (hd::tl) i = (visit (hd,i); vs tl (i+1)) 1136 in vs states 0 1137 end 1138 1139and hashstate(s: int list) = 1140 let val rec hs = 1141 fn (nil,z) => z 1142 | ((x:int)::y,z) => hs(y,z ^ " " ^ (Int.toString x)) 1143 in hs(s,"") 1144 end 1145 1146and find(s) = lookup(!StateTab)(hashstate(s)) 1147 1148and add(s,n) = StateTab := enter(!StateTab)(hashstate(s),n) 1149 1150and getstate (state) = 1151 find(state) 1152 handle LOOKUP => let val n = ++StateNum in 1153 add(state,n); visit(state,n); n 1154 end 1155 1156and getfin state = 1157 let fun f nil fins = fins 1158 | f (hd::tl) fins = 1159 case (leaf sub hd) 1160 of END _ => f tl (hd::fins) 1161 | _ => f tl fins 1162 in f state nil 1163 end 1164 1165and gettc state = 1166 let fun f nil fins = fins 1167 | f (hd::tl) fins = 1168 case (leaf sub hd) 1169 of TRAIL _ => f tl (hd::fins) 1170 | _ => f tl fins 1171 in f state nil 1172 end 1173 1174and gettrans (state) = 1175 let fun loop c tlist = 1176 let fun cktrans nil r = r 1177 | cktrans (hd::tl) r = 1178 case (leaf sub hd) of 1179 CLASS(i,_)=> 1180 (if (i sub c) then cktrans tl (union(r,fp sub hd)) 1181 else cktrans tl r handle Subscript => 1182 cktrans tl r 1183 ) 1184 | _ => cktrans tl r 1185 in if c >= 0 then 1186 let val v=cktrans state nil 1187 in loop (c-1) (if v=nil then 0::tlist else (getstate v)::tlist) 1188 end 1189 else tlist 1190 end 1191 in loop ((!CharSetSize) - 1) nil 1192 end 1193 1194and startstates() = 1195 let val startarray = array(!StateNum + 1, nil); 1196 fun listofarray(a,n) = 1197 let fun f i l = if i >= 0 then f (i-1) ((a sub i)::l) else l 1198 in f (n-1) nil end 1199 val rec makess = fn 1200 nil => () 1201 | (startlist,e)::tl => (fix(startlist,firstpos(e));makess(tl)) 1202 and fix = fn 1203 (nil,_) => () 1204 | (s::tl,firsts) => (update(startarray,s, 1205 union(firsts,startarray sub s)); 1206 fix(tl,firsts)) 1207 in makess(rules);listofarray(startarray, !StateNum + 1) 1208 end 1209 1210in visitstarts(startstates()); 1211(listofdict(!fintab),listofdict(!transtab),listofdict(!tctab),tcpairs) 1212end 1213 1214val skel_hd = 1215" struct\n\ 1216\ type int = Int.int\n\ 1217\ structure UserDeclarations =\n\ 1218\ struct\n\ 1219\" 1220 1221val skel_mid2 = 1222" | Internal.D k => action (i,(acts::l),k::rs)\n\ 1223\ | Internal.T k =>\n\ 1224\ let fun f (a::b,r) =\n\ 1225\ if a=k\n\ 1226\ then action(i,(((Internal.N a)::acts)::l),(b@r))\n\ 1227\ else f (b,a::r)\n\ 1228\ | f (nil,r) = action(i,(acts::l),rs)\n\ 1229\ in f (rs,nil)\n\ 1230\ end\n\ 1231\" 1232 1233fun lexGen infile = 1234 let val outfile = infile ^ ".sml" 1235 fun PrintLexer (ends) = 1236 let val sayln = fn x => (say x; say "\n") 1237 in case !ArgCode 1238 of NONE => (sayln "fun lex () : Internal.result ="; 1239 sayln "let fun continue() = lex() in") 1240 | SOME s => (say "fun lex "; say "(yyarg as ("; say s; sayln ")) ="; 1241 sayln "let fun continue() : Internal.result = "); 1242 say " let fun scan (s,AcceptingLeaves : Internal.yyfinstate"; 1243 sayln " list list,l,i0: int) ="; 1244 if !UsesTrailingContext 1245 then say "\tlet fun action (i: int,nil,rs)" 1246 else say "\tlet fun action (i: int,nil)"; 1247 sayln " = raise LexError"; 1248 if !UsesTrailingContext 1249 then sayln "\t| action (i,nil::l,rs) = action(i-1,l,rs)" 1250 else sayln "\t| action (i,nil::l) = action (i-1,l)"; 1251 if !UsesTrailingContext 1252 then sayln "\t| action (i,(node::acts)::l,rs) =" 1253 else sayln "\t| action (i,(node::acts)::l) ="; 1254 sayln "\t\tcase node of"; 1255 sayln "\t\t Internal.N yyk => "; 1256 sayln "\t\t\t(let fun yymktext() = String.substring(!yyb,i0,i-i0)\n\ 1257 \\t\t\t val yypos: int = i0+ !yygone"; 1258 if !CountNewLines 1259 then (sayln "\t\t\tval _ = yylineno := CharVector.foldl"; 1260 sayln "\t\t\t\t(fn (_,#\"\\n\", n) => n+1 | (_,_, n) => n) (!yylineno) (!yyb,i0,SOME(i-i0))") 1261 else (); 1262 if !HaveReject 1263 then (say "\t\t\tfun REJECT() = action(i,acts::l"; 1264 if !UsesTrailingContext 1265 then sayln ",rs)" else sayln ")") 1266 else (); 1267 sayln "\t\t\topen UserDeclarations Internal.StartStates"; 1268 sayln " in (yybufpos := i; case yyk of "; 1269 sayln ""; 1270 sayln "\t\t\t(* Application actions *)\n"; 1271 makeaccept(ends); 1272 say "\n\t\t) end "; 1273 say ")\n\n"; 1274 if (!UsesTrailingContext) then say skel_mid2 else (); 1275 sayln "\tval {fin,trans} = Vector.sub (Internal.tab, s)"; 1276 sayln "\tval NewAcceptingLeaves = fin::AcceptingLeaves"; 1277 sayln "\tin if l = !yybl then"; 1278 sayln "\t if trans = #trans(Vector.sub(Internal.tab,0))"; 1279 sayln "\t then action(l,NewAcceptingLeaves"; 1280 if !UsesTrailingContext then say ",nil" else (); 1281 say ") else"; 1282 sayln "\t let val newchars= if !yydone then \"\" else yyinput 1024"; 1283 sayln "\t in if (String.size newchars)=0"; 1284 sayln "\t\t then (yydone := true;"; 1285 say "\t\t if (l=i0) then UserDeclarations.eof "; 1286 sayln (case !ArgCode of NONE => "()" | SOME _ => "yyarg"); 1287 say "\t\t else action(l,NewAcceptingLeaves"; 1288 if !UsesTrailingContext then 1289 sayln ",nil))" else sayln "))"; 1290 sayln "\t\t else (if i0=l then yyb := newchars"; 1291 sayln "\t\t else yyb := String.substring(!yyb,i0,l-i0)^newchars;"; 1292 sayln "\t\t yygone := !yygone+i0;"; 1293 sayln "\t\t yybl := String.size (!yyb);"; 1294 sayln "\t\t scan (s,AcceptingLeaves,l-i0,0))"; 1295 sayln "\t end"; 1296 sayln "\t else let val NewChar = Char.ord (CharVector.sub (!yyb,l))"; 1297 if !CharSetSize=129 1298 then sayln "\t\tval NewChar = if NewChar<128 then NewChar else 128" 1299 else (); 1300 say "\t\tval NewState = "; 1301 sayln (if !CharFormat 1302 then "Char.ord (CharVector.sub (trans,NewChar))" 1303 else "Vector.sub (trans, NewChar)"); 1304 say "\t\tin if NewState=0 then action(l,NewAcceptingLeaves"; 1305 if !UsesTrailingContext then sayln ",nil)" else sayln ")"; 1306 sayln "\t\telse scan(NewState,NewAcceptingLeaves,l+1,i0)"; 1307 sayln "\tend"; 1308 sayln "\tend"; 1309 if !UsesPrevNewLine then () else sayln "(*"; 1310 sayln "\tval start= if String.substring(!yyb,!yybufpos-1,1)=\"\\n\""; 1311 sayln "then !yybegin+1 else !yybegin"; 1312 if !UsesPrevNewLine then () else sayln "*)"; 1313 say "\tin scan("; 1314 if !UsesPrevNewLine then say "start" 1315 else say "!yybegin (* start *)"; 1316 sayln ",nil,!yybufpos,!yybufpos)"; 1317 sayln " end"; 1318 sayln (case !ArgCode of NONE => "end" | SOME _ => "in continue end"); 1319 sayln " in lex"; 1320 sayln " end"; 1321 sayln "end" 1322 end 1323 1324 in (UsesPrevNewLine := false; 1325 ResetFlags(); 1326 LexBuf := make_ibuf(TextIO.openIn infile); 1327 NextTok := BOF; 1328 inquote := false; 1329 LexOut := TextIO.openOut(outfile); 1330 StateNum := 2; 1331 LineNum := 1; 1332 StateTab := enter(create(String.<=))("INITIAL",1); 1333 LeafNum := ~1; 1334 let 1335 val (user_code,rules,ends) = 1336 parse() handle x => 1337 (close_ibuf(!LexBuf); 1338 TextIO.closeOut(!LexOut); 1339 OS.FileSys.remove outfile; 1340 raise x) 1341 val (fins,trans,tctab,tcpairs) = makedfa(rules) 1342 val _ = if !UsesTrailingContext then 1343 (close_ibuf(!LexBuf); 1344 TextIO.closeOut(!LexOut); 1345 OS.FileSys.remove outfile; 1346 prErr "lookahead is unimplemented") 1347 else () 1348 in 1349 if (!HeaderDecl) 1350 then say (!HeaderCode) 1351 else say ("structure " ^ (!StrName)); 1352 say "=\n"; 1353 say skel_hd; 1354 say user_code; 1355 say "end (* end of user routines *)\n"; 1356 say "exception LexError (* raised if illegal leaf "; 1357 say "action tried *)\n"; 1358 say "structure Internal =\n\tstruct\n"; 1359 maketable(fins,tctab,tcpairs,trans); 1360 say "structure StartStates =\n\tstruct\n"; 1361 say "\tdatatype yystartstate = STARTSTATE of int\n"; 1362 makebegin(); 1363 say "\nend\n"; 1364 say "type result = UserDeclarations.lexresult\n"; 1365 say "\texception LexerError (* raised if illegal leaf "; 1366 say "action tried *)\n"; 1367 say "end\n\n"; 1368 say "type int = Int.int\n"; 1369 say (if (!PosArg) then "fun makeLexer (yyinput: int -> string,yygone0:int) =\nlet\n" 1370 else "fun makeLexer (yyinput: int -> string) =\nlet\tval yygone0:int=0\n"); 1371 if !CountNewLines then say "\tval yylineno: int ref = ref 0\n\n" else (); 1372 say "\tval yyb = ref \"\\n\" \t\t(* buffer *)\n\ 1373 \\tval yybl: int ref = ref 1\t\t(*buffer length *)\n\ 1374 \\tval yybufpos: int ref = ref 1\t\t(* location of next character to use *)\n\ 1375 \\tval yygone: int ref = ref yygone0\t(* position in file of beginning of buffer *)\n\ 1376 \\tval yydone = ref false\t\t(* eof found yet? *)\n\ 1377 \\tval yybegin: int ref = ref 1\t\t(*Current 'start state' for lexer *)\n\ 1378 \\n\tval YYBEGIN = fn (Internal.StartStates.STARTSTATE x) =>\n\ 1379 \\t\t yybegin := x\n\n"; 1380 PrintLexer(ends); 1381 close_ibuf(!LexBuf); 1382 TextIO.closeOut(!LexOut) 1383 end) 1384 end 1385end 1386 1387