1(*
2 * @TAG(OTHER_PRINCETON_OSS)
3 *)
4(* This is an -*- sml -*- file *)
5(* Modified by Michael.Norrish@nicta.com.au on 2005-04-13 so that it compiles
6   with both mlton and mosml. *)
7(* Modified by sweeks@acm.org on 2000-8-24.
8 * Ported to MLton.
9 *)
10(*  Lexical analyzer generator for Standard ML.
11        Version 1.7.0, June 1998
12
13Copyright (c) 1989-1992 by Andrew W. Appel,
14   David R. Tarditi, James S. Mattson
15
16This software comes with ABSOLUTELY NO WARRANTY.
17This software is subject only to the PRINCETON STANDARD ML SOFTWARE LIBRARY
18COPYRIGHT NOTICE, LICENSE AND DISCLAIMER, (in the file "COPYRIGHT",
19distributed with this software). You may copy and distribute this software;
20see the COPYRIGHT NOTICE for details and restrictions.
21
22    Changes:
23        07/25/89 (drt): added %header declaration, code to place
24                user declarations at same level as makeLexer, etc.
25                This is needed for the parser generator.
26          /10/89 (appel): added %arg declaration (see lexgen.doc).
27          /04/90 (drt): fixed following bug: couldn't use the lexer after an
28                error occurred -- NextTok and inquote weren't being reset
29        10/22/91 (drt): disabled use of lookahead
30        10/23/92 (drt): disabled use of $ operator (which involves lookahead),
31                added handlers for dictionary lookup routine
32        11/02/92 (drt): changed handler for exception Reject in generated lexer
33                to Internal.Reject
34        02/01/94 (appel): Moved the exception handler for Reject in such
35                a way as to allow tail-recursion (improves performance
36                wonderfully!).
37        02/01/94 (appel): Fixed a bug in parsing of state names.
38        05/19/94 (Mikael Pettersson, mpe@ida.liu.se):
39                Transition tables are usually represented as strings, but
40                when the range is too large, int vectors constructed by
41                code like "Vector.vector[1,2,3,...]" are used instead.
42                The problem with this isn't that the vector itself takes
43                a lot of space, but that the code generated by SML/NJ to
44                construct the intermediate list at run-time is *HUGE*. My
45                fix is to encode an int vector as a string literal (using
46                two bytes per int) and emit code to decode the string to
47                a vector at run-time. SML/NJ compiles string literals into
48                substrings in the code, so this uses much less space.
49        06/02/94 (jhr): Modified export-lex.sml to conform to new installation
50                scheme.  Also removed tab characters from string literals.
51        10/05/94 (jhr): Changed generator to produce code that uses the new
52                basis style strings and characters.
53        10/06/94 (jhr) Modified code to compile under new basis style strings
54                and characters.
55        02/08/95 (jhr) Modified to use new List module interface.
56        05/18/95 (jhr) changed Vector.vector to Vector.fromList
57*
58 * $Log$
59 * Revision 1.3  2005/07/21 07:01:27  michaeln
60 * Get mllex to cope with actions that include strings with unbalanced
61 * parentheses.  (Code taken from SML/NJ's mllex.)
62 *
63 * Revision 1.2  2005/04/14 05:42:08  michaeln
64 * Slight change to allow the product of mllex foo to be compiled by mosml
65 * without having to use the -toplevel option.  Also a "fix" for an off-by-one
66 * issue that I think is a bug.
67 *
68 * Revision 1.1  2005/04/13 05:31:30  michaeln
69 * A MoscowML compilable version of the "standard" mllex tool, as used by
70 * both SML/NJ and MLton.  The source code is also compilable by mlton,
71 * though this is cute more than useful as mlton comes with a version of mllex
72 * anyway.
73 *
74 * Revision 1.1.1.1  1998/04/08 18:40:10  george
75 * Version 110.5
76 *
77 * Revision 1.9  1998/01/06 19:23:53  appel
78 *   added %posarg feature to permit position-within-file to be passed
79 *   as a parameter to makeLexer
80 *
81# Revision 1.8  1998/01/06  19:01:48  appel
82#   repaired error messages like "cannot have both %structure and %header"
83#
84# Revision 1.7  1998/01/06  18:55:49  appel
85#   permit %% to be unescaped within regular expressions
86#
87# Revision 1.6  1998/01/06  18:46:13  appel
88#   removed undocumented feature that permitted extra %% at end of rules
89#
90# Revision 1.5  1998/01/06  18:29:23  appel
91#   put yylineno variable inside makeLexer function
92#
93# Revision 1.4  1998/01/06  18:19:59  appel
94#   check for newline inside quoted string
95#
96# Revision 1.3  1997/10/04  03:52:13  dbm
97#   Fix to remove output file if ml-lex fails.
98#
99# Revision 1.2  1997/05/06  01:12:38  george
100# *** empty log message ***
101#
102 * Revision 1.2  1996/02/26  15:02:27  george
103 *    print no longer overloaded.
104 *    use of makestring has been removed and replaced with Int.toString ..
105 *    use of IO replaced with TextIO
106 *
107 * Revision 1.1.1.1  1996/01/31  16:01:15  george
108 * Version 109
109 *
110 *)
111
112(* Subject: lookahead in sml-lex
113   Reply-to: david.tarditi@CS.CMU.EDU
114   Date: Mon, 21 Oct 91 14:13:26 -0400
115
116There is a serious bug in the implementation of lookahead,
117as done in sml-lex, and described in Aho, Sethi, and Ullman,
118p. 134 "Implementing the Lookahead Operator"
119
120We have disallowed the use of lookahead for now because
121of this bug.
122
123As a counter-example to the implementation described in
124ASU, consider the following specification with the
125input string "aba" (this example is taken from
126a comp.compilers message from Dec. 1989, I think):
127
128type lexresult=unit
129val linenum = ref 1
130fun error x = TextIO.output(TextIO.stdErr, x ^ "\n")
131val eof = fn () => ()
132%%
133%structure Lex
134%%
135(a|ab)/ba => (print yytext; print "\n"; ());
136
137The ASU proposal works as follows. Suppose that we are
138using NFA's to represent our regular expressions.  Then to
139build an NFA for e1 / e2, we build an NFA n1 for e1
140and an NFA n2 for e2, and add an epsilon transition
141from e1 to e2.
142
143When lexing, when we encounter the end state of e1e2,
144we take as the end of the string the position in
145the string that was the last occurrence of the state of
146the NFA having a transition on the epsilon introduced
147for /.
148
149Using the example we have above, we'll have an NFA
150with the following states:
151
152
153   1 -- a --> 2 -- b --> 3
154              |          |
155              | epsilon  | epsilon
156              |          |
157              |------------> 4 -- b --> 5 -- a --> 6
158
159On our example, we get the following list of transitions:
160
161a   :   2, 4      (make an epsilon transition from 2 to 4)
162ab  :   3, 4, 5   (make an epsilon transition from 3 to 4)
163aba :   6
164
165If we chose the last state in which we made an epsilon transition,
166we'll chose the transition from 3 to 4, and end up with "ab"
167as our token, when we should have "a" as our token.
168
169*)
170
171functor RedBlack(B : sig type key
172             val > : key*key->bool
173             end):
174        sig type tree
175        type key
176        val empty : tree
177        val insert : key * tree -> tree
178        val lookup : key * tree -> key
179        exception notfound of key
180        end =
181struct
182 open B
183 datatype color = RED | BLACK
184 datatype tree = empty | tree of key * color * tree * tree
185 exception notfound of key
186
187 fun insert (key,t) =
188  let fun f empty = tree(key,RED,empty,empty)
189        | f (tree(k,BLACK,l,r)) =
190            if key>k
191            then case f r
192                 of r as tree(rk,RED, rl as tree(rlk,RED,rll,rlr),rr) =>
193                        (case l
194                         of tree(lk,RED,ll,lr) =>
195                                tree(k,RED,tree(lk,BLACK,ll,lr),
196                                           tree(rk,BLACK,rl,rr))
197                          | _ => tree(rlk,BLACK,tree(k,RED,l,rll),
198                                                tree(rk,RED,rlr,rr)))
199                  | r as tree(rk,RED,rl, rr as tree(rrk,RED,rrl,rrr)) =>
200                        (case l
201                         of tree(lk,RED,ll,lr) =>
202                                tree(k,RED,tree(lk,BLACK,ll,lr),
203                                           tree(rk,BLACK,rl,rr))
204                          | _ => tree(rk,BLACK,tree(k,RED,l,rl),rr))
205                  | r => tree(k,BLACK,l,r)
206            else if k>key
207            then case f l
208                 of l as tree(lk,RED,ll, lr as tree(lrk,RED,lrl,lrr)) =>
209                        (case r
210                         of tree(rk,RED,rl,rr) =>
211                                tree(k,RED,tree(lk,BLACK,ll,lr),
212                                           tree(rk,BLACK,rl,rr))
213                          | _ => tree(lrk,BLACK,tree(lk,RED,ll,lrl),
214                                                tree(k,RED,lrr,r)))
215                  | l as tree(lk,RED, ll as tree(llk,RED,lll,llr), lr) =>
216                        (case r
217                         of tree(rk,RED,rl,rr) =>
218                                tree(k,RED,tree(lk,BLACK,ll,lr),
219                                           tree(rk,BLACK,rl,rr))
220                          | _ => tree(lk,BLACK,ll,tree(k,RED,lr,r)))
221                  | l => tree(k,BLACK,l,r)
222            else tree(key,BLACK,l,r)
223        | f (tree(k,RED,l,r)) =
224            if key>k then tree(k,RED,l, f r)
225            else if k>key then tree(k,RED, f l, r)
226            else tree(key,RED,l,r)
227   in case f t
228      of tree(k,RED, l as tree(_,RED,_,_), r) => tree(k,BLACK,l,r)
229       | tree(k,RED, l, r as tree(_,RED,_,_)) => tree(k,BLACK,l,r)
230       | t => t
231  end
232
233
234 fun lookup (key,t) =
235  let fun look empty = raise (notfound key)
236        | look (tree(k,_,l,r)) =
237                if k>key then look l
238                else if key>k then look r
239                else k
240   in look t
241  end
242
243end
244
245signature LEXGEN =
246  sig
247     val lexGen: string -> unit
248  end
249
250structure LexGen: LEXGEN =
251   struct
252   open Array List
253   infix 9 sub
254
255   datatype token = CHARS of bool array | QMARK | STAR | PLUS | BAR
256          | LP | RP | CARAT | DOLLAR | SLASH | STATE of string list
257          | REPS of int * int | ID of string | ACTION of string
258          | BOF | EOF | ASSIGN | SEMI | ARROW | LEXMARK | LEXSTATES
259          | COUNT | REJECT | FULLCHARSET | STRUCT | HEADER | ARG | POSARG
260
261   datatype exp = EPS | CLASS of bool array * int | CLOSURE of exp
262                | ALT of exp * exp | CAT of exp * exp | TRAIL of int
263                | END of int
264
265   (* flags describing input Lex spec. - unnecessary code is omitted *)
266   (* if possible *)
267
268   val CharFormat = ref false;
269   val UsesTrailingContext = ref false;
270   val UsesPrevNewLine = ref false;
271
272   (* flags for various bells & whistles that Lex has.  These slow the
273      lexer down and should be omitted from production lexers (if you
274      really want speed) *)
275
276   val CountNewLines = ref false;
277   val PosArg = ref false;
278   val HaveReject = ref false;
279
280   (* Can increase size of character set *)
281
282   val CharSetSize: int ref = ref 129;
283
284   (* Can name structure or declare header code *)
285
286   val StrName = ref "Mlex"
287   val HeaderCode = ref ""
288   val HeaderDecl = ref false
289   val ArgCode = ref (NONE: string option)
290   val StrDecl = ref false
291
292   val ResetFlags = fn () => (CountNewLines := false; HaveReject := false;
293                              PosArg := false;
294                              UsesTrailingContext := false;
295                              CharSetSize := 129; StrName := "Mlex";
296                              HeaderCode := ""; HeaderDecl:= false;
297                              ArgCode := NONE;
298                              StrDecl := false)
299
300   val LexOut = ref(TextIO.stdOut)
301   fun say x = TextIO.output(!LexOut, x)
302
303(* Union: merge two sorted lists of integers *)
304
305fun union(a,b) = let val rec merge = fn
306          (nil,nil,z) => z
307        | (nil,el::more,z) => merge(nil,more,el::z)
308        | (el::more,nil,z) => merge(more,nil,el::z)
309        | (x::morex,y::morey,z) => if (x:int)=(y:int)
310                then merge(morex,morey,x::z)
311                else if x>y then merge(morex,y::morey,x::z)
312                else merge(x::morex,morey,y::z)
313        in merge(rev a,rev b,nil)
314end
315
316(* Nullable: compute if a important expression parse tree node is nullable *)
317
318val rec nullable = fn
319          EPS => true
320        | CLASS(_) => false
321        | CLOSURE(_) => true
322        | ALT(n1,n2) => nullable(n1) orelse nullable(n2)
323        | CAT(n1,n2) => nullable(n1) andalso nullable(n2)
324        | TRAIL(_) => true
325        | END(_) => false
326
327(* FIRSTPOS: firstpos function for parse tree expressions *)
328
329and firstpos = fn
330          EPS => nil
331        | CLASS(_,i) => [i]
332        | CLOSURE(n) => firstpos(n)
333        | ALT(n1,n2) => union(firstpos(n1),firstpos(n2))
334        | CAT(n1,n2) => if nullable(n1) then union(firstpos(n1),firstpos(n2))
335                else firstpos(n1)
336        | TRAIL(i) => [i]
337        | END(i) => [i]
338
339(* LASTPOS: Lastpos function for parse tree expressions *)
340
341and lastpos = fn
342          EPS => nil
343        | CLASS(_,i) => [i]
344        | CLOSURE(n) => lastpos(n)
345        | ALT(n1,n2) => union(lastpos(n1),lastpos(n2))
346        | CAT(n1,n2) => if nullable(n2) then union(lastpos(n1),lastpos(n2))
347                else lastpos(n2)
348        | TRAIL(i) => [i]
349        | END(i) => [i]
350        ;
351
352(* ++: Increment an integer reference *)
353
354fun ++(x) : int = (x := !x + 1; !x);
355
356structure dict =
357    struct
358        type 'a relation = 'a * 'a -> bool
359        abstype ('b,'a) dictionary = DATA of { Table : ('b * 'a) list,
360                                               Leq : 'b * 'b -> bool }
361        with
362                exception LOOKUP
363            fun create Leqfunc = DATA { Table = nil, Leq = Leqfunc }
364            fun lookup (DATA { Table = entrylist, Leq = leq }) key =
365                let fun search [] = raise LOOKUP
366                      | search((k,item)::entries) =
367                        if leq(key,k)
368                        then if leq(k,key) then item else raise LOOKUP
369                        else search entries
370                in search entrylist
371                end
372             fun enter (DATA { Table = entrylist, Leq = leq })
373                (newentry as (key : 'b,item :'a)) : ('b,'a) dictionary =
374                   let val gt = fn a => fn b => not (leq(a,b))
375                       val eq = fn k => fn k' => (leq(k,k')) andalso (leq(k',k))
376                       fun update nil = [ newentry ]
377                         | update ((entry as (k,_))::entries) =
378                              if (eq  key k) then newentry::entries
379                              else if gt k key then newentry::(entry::entries)
380                              else entry::(update entries)
381                   in DATA { Table = update entrylist, Leq = leq }
382                   end
383             fun listofdict (DATA { Table = entrylist,Leq = leq}) =
384                let fun f (nil,r) = rev r
385                      | f (a::b,r) = f (b,a::r)
386                   in f(entrylist,nil)
387                end
388      end
389end
390
391open dict;
392
393(* INPUT.ML : Input w/ one character push back capability *)
394
395val LineNum: int ref = ref 1;
396
397abstype ibuf =
398        BUF of TextIO.instream * {b : string ref, p : int ref}
399with
400        fun make_ibuf(s) = BUF (s, {b=ref"", p = ref 0})
401        fun close_ibuf (BUF (s,_)) = TextIO.closeIn(s)
402        exception eof
403        fun getch (a as (BUF(s,{b,p}))) =
404                 if (!p = (size (!b)))
405                   then (b := TextIO.inputN(s, 1024);
406                         p := 0;
407                         if (size (!b))=0
408                            then raise eof
409                            else getch a)
410                   else (let val ch = String.sub(!b,!p)
411                         in (if ch = #"\n"
412                                 then LineNum := !LineNum + 1
413                                 else ();
414                             p := !p + 1;
415                             ch)
416                         end)
417        fun ungetch(BUF(s,{b,p})) = (
418           p := !p - 1;
419           if String.sub(!b,!p) = #"\n"
420              then LineNum := !LineNum - 1
421              else ())
422end;
423
424exception Error
425
426fun prErr x = (
427      TextIO.output (TextIO.stdErr, String.concat [
428          "ml-lex: error, line ", (Int.toString (!LineNum)), ": ", x, "\n"
429        ]);
430      raise Error)
431fun prSynErr x = (
432      TextIO.output (TextIO.stdErr, String.concat [
433          "ml-lex: syntax error, line ", (Int.toString (!LineNum)), ": ", x, "\n"
434        ]);
435      raise Error)
436
437exception SyntaxError; (* error in user's input file *)
438
439exception LexError; (* unexpected error in lexer *)
440
441val LexBuf = ref(make_ibuf(TextIO.stdIn));
442val LexState = ref 0;
443val NextTok = ref BOF;
444val inquote = ref false;
445
446fun AdvanceTok () : unit = let
447      fun isLetter c =
448            ((c >= #"a") andalso (c <= #"z")) orelse
449            ((c >= #"A") andalso (c <= #"Z"))
450      fun isDigit c = (c >= #"0") andalso (c <= #"9")
451    (* check for valid (non-leading) identifier character (added by JHR) *)
452      fun isIdentChr c =
453            ((isLetter c) orelse (isDigit c) orelse (c = #"_") orelse (c = #"'"))
454      fun atoi s = let
455            fun num (c::r, n) = if isDigit c
456                  then num (r, 10*n + (Char.ord c - Char.ord #"0"))
457                  else n
458              | num ([], n) = n
459            in
460              num (explode s, 0)
461            end
462
463      fun skipws () = (case nextch()
464             of #" " => skipws()
465              | #"\t" => skipws()
466              | #"\n" => skipws()
467              | x => x
468            (* end case *))
469
470      and nextch () = getch(!LexBuf)
471
472      and escaped () = (case nextch()
473             of #"b" => #"\008"
474              | #"n" => #"\n"
475              | #"t" => #"\t"
476              | #"h" => #"\128"
477              | x => let
478                  fun err t = prErr("illegal ascii escape '"^(implode(rev t))^"'")
479                  fun cvt c = (Char.ord c - Char.ord #"0")
480                  fun f (n: int, c, t) = if c=3
481                        then if n >= (!CharSetSize)
482                          then err t
483                          else Char.chr n
484                        else let val ch=nextch()
485                          in
486                            if isDigit ch
487                              then f(n*10+(cvt ch), c+1, ch::t)
488                              else err t
489                          end
490                  in
491                    if isDigit x then f(cvt x, 1, [x]) else x
492                  end
493            (* end case *))
494
495      and onechar x = let val c = array(!CharSetSize, false)
496              in
497                update(c, Char.ord(x), true); CHARS(c)
498              end
499
500      in case !LexState of 0 => let val makeTok = fn () =>
501                case skipws()
502                        (* Lex % operators *)
503                 of #"%" => (case nextch() of
504                            #"%" => LEXMARK
505                        | a => let fun f s =
506                                    let val a = nextch()
507                                    in if isLetter a then f(a::s)
508                                       else (ungetch(!LexBuf);
509                                             implode(rev s))
510                                    end
511                                in case f [a]
512                                 of "reject" => REJECT
513                                  | "count"  => COUNT
514                                  | "full"   => FULLCHARSET
515                                  | "s"      => LEXSTATES
516                                  | "S"      => LEXSTATES
517                                  | "structure" => STRUCT
518                                  | "header" => HEADER
519                                  | "arg"    => ARG
520                                  | "posarg" => POSARG
521                                  | _ => prErr "unknown % operator "
522                               end
523                             )
524                        (* semicolon (for end of LEXSTATES) *)
525                | #";" => SEMI
526                        (* anything else *)
527                | ch => if isLetter(ch) then
528                         let fun getID matched =
529                             let val x = nextch()
530(**** fix by JHR
531                             in if isLetter(x) orelse isDigit(x) orelse
532                                   x = "_" orelse x = "'"
533****)
534                             in if (isIdentChr x)
535                                 then getID (x::matched)
536                                 else (ungetch(!LexBuf); implode(rev matched))
537                             end
538                        in ID(getID [ch])
539                        end
540                      else (prSynErr ("bad character: " ^ String.str ch))
541        in NextTok := makeTok()
542        end
543        | 1 => let val rec makeTok = fn () =>
544                if !inquote then case nextch() of
545                        (* inside quoted string *)
546                  #"\\" => onechar(escaped())
547                | #"\"" => (inquote := false; makeTok())
548                | #"\n" => (prSynErr "end-of-line inside quoted string";
549                            inquote := false; makeTok())
550                | x => onechar(x)
551                else case skipws() of
552                        (* single character operators *)
553                  #"?" => QMARK
554                | #"*" => STAR
555                | #"+" => PLUS
556                | #"|" => BAR
557                | #"(" => LP
558                | #")" => RP
559                | #"^" => CARAT
560                | #"$" => DOLLAR
561                | #"/" => SLASH
562                | #";" => SEMI
563                | #"." => let val c = array(!CharSetSize,true) in
564                                update(c,10,false); CHARS(c)
565                        end
566                        (* assign and arrow *)
567                | #"=" => let val c = nextch() in
568                        if c = #">" then ARROW else (ungetch(!LexBuf); ASSIGN)
569                end
570                        (* character set *)
571                | #"[" => let val rec classch = fn () => let val x = skipws()
572                                in if x = #"\\" then escaped() else x
573                                end;
574                        val first = classch();
575                        val flag = (first <> #"^");
576                        val c = array(!CharSetSize,not flag);
577                        fun add NONE = ()
578                          | add (SOME x) = update(c, Char.ord(x), flag)
579                        and range (x, y) = if x>y
580                              then (prErr "bad char. range")
581                              else let
582                                val i = ref(Char.ord(x)) and j = Char.ord(y)
583                                in while !i<=j do (
584                                  add (SOME(Char.chr(!i)));
585                                  i := !i + 1)
586                                end
587                        and getClass last = (case classch()
588                             of #"]" => (add(last); c)
589                              | #"-" => (case last
590                                   of NONE => getClass(SOME #"-")
591                                    | (SOME last') => let val x = classch()
592                                        in
593                                          if x = #"]"
594                                            then (add(last); add(SOME #"-"); c)
595                                            else (range(last',x); getClass(NONE))
596                                        end
597                                  (* end case *))
598                              | x => (add(last); getClass(SOME x))
599                            (* end case *))
600                in CHARS(getClass(if first = #"^" then NONE else SOME first))
601                end
602                        (* Start States specification *)
603                | #"<" => let val rec get_state = fn (prev,matched) =>
604                        case nextch() of
605                          #">" => matched::prev
606                        | #"," => get_state(matched::prev,"")
607                        | x => if isIdentChr(x)
608                                then get_state(prev,matched ^ String.str x)
609                                else (prSynErr "bad start state list")
610                in STATE(get_state(nil,""))
611                end
612                        (* {id} or repititions *)
613                | #"{" => let val ch = nextch() in if isLetter(ch) then
614                        let fun getID matched = (case nextch()
615                          of #"}" => matched
616                           | x => if (isIdentChr x) then
617                                getID(matched ^ String.str x)
618                                else (prErr "invalid char. class name")
619                         (* end case *))
620                        in ID(getID(String.str ch))
621                        end
622                        else if isDigit(ch) then
623                         let fun get_r (matched, r1) = (case nextch()
624                                 of #"}" => let val n = atoi(matched) in
625                                        if r1 = ~1 then (n,n) else (r1,n)
626                                        end
627                                  | #"," => if r1 = ~1 then get_r("",atoi(matched))
628                                       else (prErr "invalid repetitions spec.")
629                                  | x => if isDigit(x)
630                                    then get_r(matched ^ String.str x,r1)
631                                    else (prErr "invalid char in repetitions spec")
632                                (* end case *))
633                         in REPS(get_r(String.str ch,~1))
634                         end
635                        else (prErr "bad repetitions spec")
636                end
637                        (* Lex % operators *)
638                | #"\\" => onechar(escaped())
639                        (* start quoted string *)
640                | #"\"" => (inquote := true; makeTok())
641                        (* anything else *)
642                | ch => onechar(ch)
643        in NextTok := makeTok()
644        end
645        | 2 => NextTok :=
646               (case skipws() of
647                  #"(" =>
648                  let
649                    fun loop_to_end (backslash, x) =
650                      let
651                        val c    = getch (! LexBuf)
652                        val notb = not backslash
653                        val nstr = c :: x
654                      in
655                        case c of
656                          #"\"" => if notb then nstr
657                                   else loop_to_end (false, nstr)
658                        | _ => loop_to_end (c = #"\\" andalso notb, nstr)
659                      end
660                    fun GetAct (lpct, x) =
661                      let
662                        val c    = getch (! LexBuf)
663                        val nstr = c :: x
664                      in
665                        case c of
666                          #"\"" => GetAct (lpct, loop_to_end (false, nstr))
667                        | #"(" => GetAct (lpct + 1, nstr)
668                        | #")" => if lpct = 0 then implode (rev x)
669                                  else GetAct(lpct - 1, nstr)
670                        | _ => GetAct(lpct, nstr)
671                      end
672                  in
673                    ACTION (GetAct (0,nil))
674                  end
675                | #";" => SEMI
676                | c => (prSynErr ("invalid character " ^ String.str c)))
677        | _ => raise LexError
678end
679handle eof => NextTok := EOF ;
680
681fun GetTok (_:unit) : token =
682        let val t = !NextTok in AdvanceTok(); t
683        end;
684val SymTab = ref (create String.<=) : (string,exp) dictionary ref
685
686fun GetExp () : exp =
687
688        let val rec optional = fn e => ALT(EPS,e)
689
690            and lookup' = fn name =>
691                lookup(!SymTab) name
692                handle LOOKUP => prErr ("bad regular expression name: "^
693                                            name)
694
695        and newline = fn () => let val c = array(!CharSetSize,false) in
696                update(c,10,true); c
697                end
698
699        and endline = fn e => trail(e,CLASS(newline(),0))
700
701        and trail = fn (e1,e2) => CAT(CAT(e1,TRAIL(0)),e2)
702
703        and closure1 = fn e => CAT(e,CLOSURE(e))
704
705        and repeat = fn (min,max,e) => let val rec rep = fn
706                  (0,0) => EPS
707                | (0,1) => ALT(e,EPS)
708                | (0,i) => CAT(rep(0,1),rep(0,i-1))
709                | (i,j) => CAT(e,rep(i-1,j-1))
710        in rep(min,max)
711        end
712
713        and exp0 = fn () => case GetTok() of
714                  CHARS(c) => exp1(CLASS(c,0))
715                | LP => let val e = exp0() in
716                 if !NextTok = RP then
717                  (AdvanceTok(); exp1(e))
718                 else (prSynErr "missing '('") end
719                | ID(name) => exp1(lookup' name)
720                | _ => raise SyntaxError
721
722        and exp1 = fn (e) => case !NextTok of
723                  SEMI => e
724                | ARROW => e
725                | EOF => e
726                | LP => exp2(e,exp0())
727                | RP => e
728                | t => (AdvanceTok(); case t of
729                          QMARK => exp1(optional(e))
730                        | STAR => exp1(CLOSURE(e))
731                        | PLUS => exp1(closure1(e))
732                        | CHARS(c) => exp2(e,CLASS(c,0))
733                        | BAR => ALT(e,exp0())
734                        | DOLLAR => (UsesTrailingContext := true; endline(e))
735                        | SLASH => (UsesTrailingContext := true;
736                                    trail(e,exp0()))
737                        | REPS(i,j) => exp1(repeat(i,j,e))
738                        | ID(name) => exp2(e,lookup' name)
739                        | _ => raise SyntaxError)
740
741        and exp2 = fn (e1,e2) => case !NextTok of
742                  SEMI => CAT(e1,e2)
743                | ARROW => CAT(e1,e2)
744                | EOF => CAT(e1,e2)
745                | LP => exp2(CAT(e1,e2),exp0())
746                | RP => CAT(e1,e2)
747                | t => (AdvanceTok(); case t of
748                            QMARK => exp1(CAT(e1,optional(e2)))
749                        | STAR => exp1(CAT(e1,CLOSURE(e2)))
750                        | PLUS => exp1(CAT(e1,closure1(e2)))
751                        | CHARS(c) => exp2(CAT(e1,e2),CLASS(c,0))
752                        | BAR => ALT(CAT(e1,e2),exp0())
753                        | DOLLAR => (UsesTrailingContext := true;
754                                     endline(CAT(e1,e2)))
755                        | SLASH => (UsesTrailingContext := true;
756                                    trail(CAT(e1,e2),exp0()))
757                        | REPS(i,j) => exp1(CAT(e1,repeat(i,j,e2)))
758                        | ID(name) => exp2(CAT(e1,e2),lookup' name)
759                        | _ => raise SyntaxError)
760in exp0()
761end;
762val StateTab = ref(create(String.<=)) : (string,int) dictionary ref
763
764val StateNum: int ref = ref 0;
765
766fun GetStates () : int list =
767
768   let fun add nil sl = sl
769            | add (x::y) sl = add y (union ([lookup (!StateTab)(x)
770                                           handle LOOKUP =>
771                                              prErr ("bad state name: "^x)
772                                          ],sl))
773
774        fun addall i sl =
775            if i <= !StateNum then addall (i+2) (union ([i],sl))
776            else sl
777
778        fun incall (x::y) = (x+1)::incall y
779          | incall nil = nil
780
781        fun addincs nil = nil
782            | addincs (x::y) = x::(x+1)::addincs y
783
784        val state_list =
785           case !NextTok of
786             STATE s => (AdvanceTok(); LexState := 1; add s nil)
787             | _ => addall 1 nil
788
789      in case !NextTok
790           of CARAT => (LexState := 1; AdvanceTok(); UsesPrevNewLine := true;
791                        incall state_list)
792            | _ => addincs state_list
793      end
794
795val LeafNum: int ref = ref ~1;
796
797fun renum(e : exp) : exp =
798        let val rec label = fn
799          EPS => EPS
800        | CLASS(x,_) => CLASS(x,++LeafNum)
801        | CLOSURE(e) => CLOSURE(label(e))
802        | ALT(e1,e2) => ALT(label(e1),label(e2))
803        | CAT(e1,e2) => CAT(label(e1),label(e2))
804        | TRAIL(i) => TRAIL(++LeafNum)
805        | END(i) => END(++LeafNum)
806in label(e)
807end;
808
809exception ParseError;
810
811fun parse() : (string * (int list * exp) list * ((string,string) dictionary)) =
812        let val Accept = ref (create String.<=) : (string,string) dictionary ref
813        val rec ParseRtns = fn l => case getch(!LexBuf) of
814                  #"%" => let val c = getch(!LexBuf) in
815                               if c = #"%" then (implode (rev l))
816                           else ParseRtns(c :: #"%" :: l)
817                        end
818                | c => ParseRtns(c::l)
819        and ParseDefs = fn () =>
820                (LexState:=0; AdvanceTok(); case !NextTok of
821                  LEXMARK => ()
822                | LEXSTATES =>
823                   let fun f () = (case !NextTok of (ID i) =>
824                                    (StateTab := enter(!StateTab)(i,++StateNum);
825                                     ++StateNum; AdvanceTok(); f())
826                                        | _ => ())
827                   in AdvanceTok(); f ();
828                      if !NextTok=SEMI then ParseDefs() else
829                        (prSynErr "expected ';'")
830                   end
831                | ID x => (LexState:=1; AdvanceTok(); if GetTok() = ASSIGN
832                          then (SymTab := enter(!SymTab)(x,GetExp());
833                               if !NextTok = SEMI then ParseDefs()
834                               else (prSynErr "expected ';'"))
835                        else raise SyntaxError)
836                | REJECT => (HaveReject := true; ParseDefs())
837                | COUNT => (CountNewLines := true; ParseDefs())
838                | FULLCHARSET => (CharSetSize := 256; ParseDefs())
839                | HEADER => (LexState := 2; AdvanceTok();
840                             case GetTok()
841                             of ACTION s =>
842                                if (!StrDecl) then
843                                   (prErr "cannot have both %structure and %header \
844                                    \declarations")
845                                else if (!HeaderDecl) then
846                                   (prErr "duplicate %header declarations")
847                                else
848                                    (HeaderCode := s; LexState := 0;
849                                     HeaderDecl := true; ParseDefs())
850                                | _ => raise SyntaxError)
851                | POSARG => (PosArg := true; ParseDefs())
852                | ARG => (LexState := 2; AdvanceTok();
853                             case GetTok()
854                             of ACTION s =>
855                                (case !ArgCode
856                                   of SOME _ => prErr "duplicate %arg declarations"
857                                    | NONE => ArgCode := SOME s;
858                                 LexState := 0;
859                                 ParseDefs())
860                                | _ => raise SyntaxError)
861                | STRUCT => (AdvanceTok();
862                            case !NextTok of
863                               (ID i) =>
864                                if (!HeaderDecl) then
865                                   (prErr "cannot have both %structure and %header \
866                                    \declarations")
867                                else if (!StrDecl) then
868                                   (prErr "duplicate %structure declarations")
869                                else (StrName := i; StrDecl := true)
870                                 | _  => (prErr "expected ID");
871                                ParseDefs())
872                | _ => raise SyntaxError)
873        and ParseRules =
874                fn rules => (LexState:=1; AdvanceTok(); case !NextTok of
875                  EOF => rules
876                | _ =>
877                 let val s = GetStates()
878                     val e = renum(CAT(GetExp(),END(0)))
879                 in
880                 if !NextTok = ARROW then
881                   (LexState:=2; AdvanceTok();
882                    case GetTok() of ACTION(act) =>
883                      if !NextTok=SEMI then
884                        (Accept:=enter(!Accept) (Int.toString (!LeafNum),act);
885                         ParseRules((s,e)::rules))
886                      else (prSynErr "expected ';'")
887                    | _ => raise SyntaxError)
888                  else (prSynErr "expected '=>'")
889                end)
890in let val usercode = ParseRtns nil
891   in (ParseDefs(); (usercode,ParseRules(nil),!Accept))
892   end
893end handle SyntaxError => (prSynErr "")
894
895fun makebegin () : unit =
896   let fun make nil = ()
897         | make ((x,n:int)::y)=(say "val "; say x; say " = " ;
898                                say "STARTSTATE ";
899                                say (Int.toString n); say ";\n"; make y)
900   in say "\n(* start state definitions *)\n\n"; make(listofdict(!StateTab))
901   end
902
903structure L =
904        struct
905          nonfix >
906          type key = int list * string
907          fun > ((key,item:string),(key',item')) =
908            let fun f ((a:int)::a') (b::b') = if Int.> (a,b) then true
909                                           else if a=b then f a' b'
910                                           else false
911                  | f _ _ = false
912            in f key key'
913            end
914        end
915
916structure RB = RedBlack(L)
917
918fun maketable (fins:(int * (int list)) list,
919             tcs :(int * (int list)) list,
920             tcpairs: (int * int) list,
921             trans : (int*(int list)) list) : unit =
922
923(* Fins = (state #, list of final leaves for the state) list
924   tcs = (state #, list of trailing context leaves which begin in this state)
925         list
926   tcpairs = (trailing context leaf, end leaf) list
927   trans = (state #,list of transitions for state) list *)
928
929   let datatype elem = N of int | T of int | D of int
930       val count = ref 0
931       val _ = (if length(trans)<256 then CharFormat := true
932                 else CharFormat := false;
933                 if !UsesTrailingContext then
934                         (say "\ndatatype yyfinstate = N of int | \
935                           \ T of int | D of int\n")
936                 else say "\ndatatype yyfinstate = N of int";
937                 say "\ntype statedata = {fin : yyfinstate list, trans: ";
938                 case !CharFormat of
939                       true => say "string}"
940                     | false => say "int Vector.vector}";
941                 say "\n(* transition & final state table *)\nval tab = let\n";
942                 case !CharFormat of
943                       true => ()
944                     | false =>
945                       (say "fun decode s k =\n";
946                        say "  let val k' = k + k\n";
947                        say "      val hi = Char.ord(String.sub(s, k'))\n";
948                        say "      val lo = Char.ord(String.sub(s, k' + 1))\n";
949                        say "  in hi * 256 + lo end\n"))
950
951      val newfins =
952        let fun IsEndLeaf t =
953             let fun f ((l,e)::r) = if (e=t) then true else f r
954                   | f nil = false in f tcpairs end
955
956         fun GetEndLeaf t =
957           let fun f ((tl,el)::r) = if (tl=t) then el else f r
958                 | f [] = raise Fail "GetEndLeaf"
959           in f tcpairs
960           end
961         fun GetTrConLeaves s =
962           let fun f ((s',l)::r) = if (s = s') then l else f r
963                 | f nil = nil
964           in f tcs
965           end
966         fun sort_leaves s =
967           let fun insert (x:int) (a::b) =
968                 if (x <= a) then x::(a::b)
969                 else a::(insert x b)
970                 | insert x nil = [x]
971           in List.foldr (fn (x,r) => insert x r) [] s
972           end
973         fun conv a = if (IsEndLeaf a) then (D a) else (N a)
974         fun merge (a::a',b::b') =
975           if (a <= b) then (conv a)::merge(a',b::b')
976           else (T b)::(merge(a::a',b'))
977           | merge (a::a',nil) = (conv a)::(merge (a',nil))
978           | merge (nil,b::b') = (T b)::(merge (b',nil))
979           | merge (nil,nil) = nil
980
981        in map (fn (x,l) =>
982          rev (merge (l,
983                sort_leaves (map (fn x => GetEndLeaf x) (GetTrConLeaves x)))))
984                    fins
985        end
986
987        val rs =
988         let open RB
989             fun makeItems x =
990               let fun emit8(x, pos) =
991                     let val s = StringCvt.padLeft #"0" 3 (Int.toString x)
992                     in
993                       case pos
994                         of 16        => (say "\\\n\\\\"; say s; 1)
995                          | _        => (say "\\"; say s; pos+1)
996                     end
997                   fun emit16(x, pos) =
998                     let val hi8 = x div 256
999                         val lo8 = x - hi8 * 256        (* x rem 256 *)
1000                     in
1001                       emit8(lo8, emit8(hi8, pos))
1002                     end
1003                   fun MakeString([], _, _) = ()
1004                     | MakeString(x::xs, emitter, pos) =
1005                        MakeString(xs, emitter, emitter(x, pos))
1006                in case !CharFormat of
1007                    true => (say " \n\""; MakeString(x,emit8,0); say "\"\n")
1008                  | false => (say (Int.toString(length x));
1009                     say ", \n\""; MakeString(x,emit16,0); say "\"\n")
1010                end
1011
1012            fun makeEntry(nil,rs,t) = rev rs
1013              | makeEntry(((l:int,x)::y),rs,t) =
1014                  let val name = (Int.toString l)
1015                  in let val (r,n) = lookup ((x,name),t)
1016                      in makeEntry(y,(n::rs),t)
1017                      end handle notfound _ =>
1018                        (count := !count+1;
1019                          say " ("; say name; say ",";
1020                          makeItems x; say "),\n";
1021                         makeEntry(y,(name::rs),(insert ((x,name),t))))
1022                     end
1023
1024            val _ = say "val s = [ \n"
1025            val res =  makeEntry(trans,nil,empty)
1026            val _ =
1027              case !CharFormat
1028               of true => (say "(0, \"\")]\n"; say "fun f x = x \n")
1029                | false => (say "(0, 0, \"\")]\n";
1030                    say "fun f(n, i, x) = (n, Vector.tabulate(i, decode x)) \n")
1031
1032            val _ = say "val s = map f (rev (tl (rev s))) \n"
1033            val _ = say "exception LexHackingError \n"
1034            val _ = say "fun look ((j,x)::r, i) = if i = j then x else look(r, i) \n"
1035            val _ = say "  | look ([], i) = raise LexHackingError\n"
1036
1037        val _ = say "fun g {fin=x, trans=i} = {fin=x, trans=look(s,i)} \n"
1038          in res
1039        end
1040
1041        fun makeTable(nil,nil) = ()
1042          | makeTable(a::a',b::b') =
1043             let fun makeItems nil = ()
1044                   | makeItems (hd::tl) =
1045                     let val (t,n) =
1046                         case hd of
1047                           (N i) => ("(N ",i)
1048                         | (T i) => ("(T ",i)
1049                         | (D i) => ("(D ",i)
1050                     in (say t; say (Int.toString n); say ")";
1051                         if null tl
1052                         then ()
1053                         else (say ","; makeItems tl))
1054                     end
1055              in (say "{fin = ["; makeItems b;
1056                  say "], trans = "; say a; say "}";
1057                  if null a'
1058                  then ()
1059                  else (say ",\n"; makeTable(a',b')))
1060              end
1061          | makeTable _ = raise Fail "makeTable"
1062
1063        fun msg x = TextIO.output(TextIO.stdOut, x)
1064
1065  in (say "in Vector.fromList(map g \n["; makeTable(rs,newfins);
1066      say "])\nend\n";
1067    msg ("\nNumber of states = " ^ (Int.toString (length trans)));
1068    msg ("\nNumber of distinct rows = " ^ (Int.toString (!count)));
1069    msg ("\nApprox. memory size of trans. table = " ^
1070          (Int.toString (!count*(!CharSetSize)*(if !CharFormat then 1 else 8))));
1071    msg " bytes\n")
1072end
1073
1074(* makeaccept: Takes a (string,string) dictionary, prints case statement for
1075   accepting leaf actions.  The key strings are the leaf #'s, the data strings
1076   are the actions *)
1077
1078fun makeaccept ends =
1079    let fun startline f = if f then say "  " else say "| "
1080         fun make(nil,f) = (startline f; say "_ => raise Internal.LexerError\n")
1081          | make((x,a)::y,f) = (startline f; say x; say " => ";
1082                                if Substring.size(#2 (Substring.position "yytext" (Substring.full a))) = 0
1083 then
1084                                     (say "("; say a; say ")")
1085                                else (say "let val yytext=yymktext() in ";
1086                                      say a; say " end");
1087                                say "\n"; make(y,false))
1088    in make (listofdict(ends),true)
1089    end
1090
1091fun leafdata(e:(int list * exp) list) =
1092        let val fp = array(!LeafNum + 1,nil)
1093        and leaf = array(!LeafNum + 1,EPS)
1094        and tcpairs = ref nil
1095        and trailmark = ref ~1;
1096        val rec add = fn
1097                  (nil,x) => ()
1098                | (hd::tl,x) => (update(fp,hd,union(fp sub hd,x));
1099                        add(tl,x))
1100        and moredata = fn
1101                  CLOSURE(e1) =>
1102                        (moredata(e1); add(lastpos(e1),firstpos(e1)))
1103                | ALT(e1,e2) => (moredata(e1); moredata(e2))
1104                | CAT(e1,e2) => (moredata(e1); moredata(e2);
1105                        add(lastpos(e1),firstpos(e2)))
1106                | CLASS(x,i) => update(leaf,i,CLASS(x,i))
1107                | TRAIL(i) => (update(leaf,i,TRAIL(i)); if !trailmark = ~1
1108                        then trailmark := i else ())
1109                | END(i) => (update(leaf,i,END(i)); if !trailmark <> ~1
1110                        then (tcpairs := (!trailmark,i)::(!tcpairs);
1111                        trailmark := ~1) else ())
1112                | _ => ()
1113        and makedata = fn
1114                  nil => ()
1115                | (_,x)::tl => (moredata(x);makedata(tl))
1116        in trailmark := ~1; makedata(e); (fp,leaf,!tcpairs)
1117        end;
1118
1119fun makedfa(rules) =
1120let val StateTab = ref (create(String.<=)) : (string,int) dictionary ref
1121    val fintab = ref (create(Int.<=)) : (int,(int list)) dictionary ref
1122    val transtab = ref (create(Int.<=)) : (int,int list) dictionary ref
1123    val tctab = ref (create(Int.<=)) : (int,(int list)) dictionary ref
1124    val (fp, leaf, tcpairs) = leafdata(rules);
1125
1126fun visit (state,statenum) =
1127        let val transitions = gettrans(state) in
1128           fintab := enter(!fintab)(statenum,getfin(state));
1129           tctab := enter(!tctab)(statenum,gettc(state));
1130           transtab := enter(!transtab)(statenum,transitions)
1131        end
1132
1133and visitstarts (states) =
1134        let fun vs nil i = ()
1135              | vs (hd::tl) i = (visit (hd,i); vs tl (i+1))
1136        in vs states 0
1137        end
1138
1139and hashstate(s: int list) =
1140        let val rec hs =
1141                fn (nil,z) => z
1142                 | ((x:int)::y,z) => hs(y,z ^ " " ^ (Int.toString x))
1143        in hs(s,"")
1144        end
1145
1146and find(s) = lookup(!StateTab)(hashstate(s))
1147
1148and add(s,n) = StateTab := enter(!StateTab)(hashstate(s),n)
1149
1150and getstate (state) =
1151        find(state)
1152        handle LOOKUP => let val n = ++StateNum in
1153                add(state,n); visit(state,n); n
1154                end
1155
1156and getfin state =
1157        let fun f nil fins = fins
1158              | f (hd::tl) fins =
1159                 case (leaf sub hd)
1160                    of END _ => f tl (hd::fins)
1161                     | _ => f tl fins
1162        in f state nil
1163        end
1164
1165and gettc state =
1166        let fun f nil fins = fins
1167              | f (hd::tl) fins =
1168                 case (leaf sub hd)
1169                    of TRAIL _ => f tl (hd::fins)
1170                     | _ => f tl fins
1171        in f state nil
1172        end
1173
1174and gettrans (state) =
1175      let fun loop c tlist =
1176         let fun cktrans nil r = r
1177               | cktrans (hd::tl) r =
1178                  case (leaf sub hd) of
1179                   CLASS(i,_)=>
1180                        (if (i sub c) then cktrans tl (union(r,fp sub hd))
1181                         else cktrans tl r handle Subscript =>
1182                                                cktrans tl r
1183                        )
1184                   | _ => cktrans tl r
1185         in if c >= 0 then
1186              let val v=cktrans state nil
1187              in loop (c-1) (if v=nil then 0::tlist else (getstate v)::tlist)
1188              end
1189            else tlist
1190         end
1191     in loop ((!CharSetSize) - 1) nil
1192     end
1193
1194and startstates() =
1195        let val startarray = array(!StateNum + 1, nil);
1196            fun listofarray(a,n) =
1197                  let fun f i l = if i >= 0 then  f (i-1) ((a sub i)::l) else l
1198                 in f (n-1) nil end
1199        val rec makess = fn
1200                  nil => ()
1201                | (startlist,e)::tl => (fix(startlist,firstpos(e));makess(tl))
1202        and fix = fn
1203                  (nil,_) => ()
1204                | (s::tl,firsts) => (update(startarray,s,
1205                        union(firsts,startarray sub s));
1206                        fix(tl,firsts))
1207        in makess(rules);listofarray(startarray, !StateNum + 1)
1208        end
1209
1210in visitstarts(startstates());
1211(listofdict(!fintab),listofdict(!transtab),listofdict(!tctab),tcpairs)
1212end
1213
1214val skel_hd =
1215"   struct\n\
1216\    type int = Int.int\n\
1217\    structure UserDeclarations =\n\
1218\      struct\n\
1219\"
1220
1221val skel_mid2 =
1222"                       | Internal.D k => action (i,(acts::l),k::rs)\n\
1223\                       | Internal.T k =>\n\
1224\                         let fun f (a::b,r) =\n\
1225\                              if a=k\n\
1226\                                then action(i,(((Internal.N a)::acts)::l),(b@r))\n\
1227\                                else f (b,a::r)\n\
1228\                               | f (nil,r) = action(i,(acts::l),rs)\n\
1229\                          in f (rs,nil)\n\
1230\                          end\n\
1231\"
1232
1233fun lexGen infile =
1234    let val outfile = infile ^ ".sml"
1235      fun PrintLexer (ends) =
1236    let val sayln = fn x => (say x; say "\n")
1237     in case !ArgCode
1238         of NONE => (sayln "fun lex () : Internal.result =";
1239                     sayln "let fun continue() = lex() in")
1240          | SOME s => (say "fun lex "; say "(yyarg as ("; say s; sayln ")) =";
1241                       sayln "let fun continue() : Internal.result = ");
1242         say "  let fun scan (s,AcceptingLeaves : Internal.yyfinstate";
1243         sayln " list list,l,i0: int) =";
1244         if !UsesTrailingContext
1245             then say "\tlet fun action (i: int,nil,rs)"
1246             else say "\tlet fun action (i: int,nil)";
1247         sayln " = raise LexError";
1248         if !UsesTrailingContext
1249             then sayln "\t| action (i,nil::l,rs) = action(i-1,l,rs)"
1250             else sayln "\t| action (i,nil::l) = action (i-1,l)";
1251         if !UsesTrailingContext
1252             then sayln "\t| action (i,(node::acts)::l,rs) ="
1253             else sayln "\t| action (i,(node::acts)::l) =";
1254         sayln "\t\tcase node of";
1255         sayln "\t\t    Internal.N yyk => ";
1256         sayln "\t\t\t(let fun yymktext() = String.substring(!yyb,i0,i-i0)\n\
1257               \\t\t\t     val yypos: int = i0+ !yygone";
1258         if !CountNewLines
1259            then (sayln "\t\t\tval _ = yylineno := CharVector.foldl";
1260                    sayln "\t\t\t\t(fn (_,#\"\\n\", n) => n+1 | (_,_, n) => n) (!yylineno) (!yyb,i0,SOME(i-i0))")
1261            else ();
1262         if !HaveReject
1263             then (say "\t\t\tfun REJECT() = action(i,acts::l";
1264                   if !UsesTrailingContext
1265                       then sayln ",rs)" else sayln ")")
1266             else ();
1267         sayln "\t\t\topen UserDeclarations Internal.StartStates";
1268         sayln " in (yybufpos := i; case yyk of ";
1269         sayln "";
1270         sayln "\t\t\t(* Application actions *)\n";
1271         makeaccept(ends);
1272         say "\n\t\t) end ";
1273         say ")\n\n";
1274         if (!UsesTrailingContext) then say skel_mid2 else ();
1275         sayln "\tval {fin,trans} = Vector.sub (Internal.tab, s)";
1276         sayln "\tval NewAcceptingLeaves = fin::AcceptingLeaves";
1277         sayln "\tin if l = !yybl then";
1278         sayln "\t     if trans = #trans(Vector.sub(Internal.tab,0))";
1279         sayln "\t       then action(l,NewAcceptingLeaves";
1280         if !UsesTrailingContext then say ",nil" else ();
1281         say ") else";
1282         sayln "\t    let val newchars= if !yydone then \"\" else yyinput 1024";
1283         sayln "\t    in if (String.size newchars)=0";
1284         sayln "\t\t  then (yydone := true;";
1285         say "\t\t        if (l=i0) then UserDeclarations.eof ";
1286         sayln (case !ArgCode of NONE => "()" | SOME _ => "yyarg");
1287         say   "\t\t                  else action(l,NewAcceptingLeaves";
1288         if !UsesTrailingContext then
1289            sayln ",nil))" else sayln "))";
1290         sayln "\t\t  else (if i0=l then yyb := newchars";
1291         sayln "\t\t     else yyb := String.substring(!yyb,i0,l-i0)^newchars;";
1292         sayln "\t\t     yygone := !yygone+i0;";
1293         sayln "\t\t     yybl := String.size (!yyb);";
1294         sayln "\t\t     scan (s,AcceptingLeaves,l-i0,0))";
1295         sayln "\t    end";
1296         sayln "\t  else let val NewChar = Char.ord (CharVector.sub (!yyb,l))";
1297         if !CharSetSize=129
1298           then sayln "\t\tval NewChar = if NewChar<128 then NewChar else 128"
1299           else ();
1300         say "\t\tval NewState = ";
1301         sayln (if !CharFormat
1302                then "Char.ord (CharVector.sub (trans,NewChar))"
1303                else "Vector.sub (trans, NewChar)");
1304         say "\t\tin if NewState=0 then action(l,NewAcceptingLeaves";
1305         if !UsesTrailingContext then sayln ",nil)" else sayln ")";
1306         sayln "\t\telse scan(NewState,NewAcceptingLeaves,l+1,i0)";
1307         sayln "\tend";
1308         sayln "\tend";
1309         if !UsesPrevNewLine then () else sayln "(*";
1310         sayln "\tval start= if String.substring(!yyb,!yybufpos-1,1)=\"\\n\"";
1311         sayln "then !yybegin+1 else !yybegin";
1312         if !UsesPrevNewLine then () else sayln "*)";
1313         say "\tin scan(";
1314         if !UsesPrevNewLine then say "start"
1315         else say "!yybegin (* start *)";
1316         sayln ",nil,!yybufpos,!yybufpos)";
1317         sayln "    end";
1318         sayln (case !ArgCode of NONE => "end" | SOME _ => "in continue end");
1319         sayln "  in lex";
1320         sayln "  end";
1321         sayln "end"
1322        end
1323
1324    in (UsesPrevNewLine := false;
1325        ResetFlags();
1326        LexBuf := make_ibuf(TextIO.openIn infile);
1327        NextTok := BOF;
1328        inquote := false;
1329        LexOut := TextIO.openOut(outfile);
1330        StateNum := 2;
1331        LineNum := 1;
1332        StateTab := enter(create(String.<=))("INITIAL",1);
1333        LeafNum := ~1;
1334        let
1335           val (user_code,rules,ends) =
1336               parse() handle x =>
1337                         (close_ibuf(!LexBuf);
1338                   TextIO.closeOut(!LexOut);
1339                   OS.FileSys.remove outfile;
1340                   raise x)
1341           val (fins,trans,tctab,tcpairs) = makedfa(rules)
1342           val _ = if !UsesTrailingContext then
1343                      (close_ibuf(!LexBuf);
1344                       TextIO.closeOut(!LexOut);
1345                       OS.FileSys.remove outfile;
1346                       prErr "lookahead is unimplemented")
1347                   else ()
1348        in
1349          if (!HeaderDecl)
1350              then say (!HeaderCode)
1351              else say ("structure " ^ (!StrName));
1352          say "=\n";
1353          say skel_hd;
1354          say user_code;
1355          say "end (* end of user routines *)\n";
1356          say "exception LexError (* raised if illegal leaf ";
1357          say "action tried *)\n";
1358          say "structure Internal =\n\tstruct\n";
1359          maketable(fins,tctab,tcpairs,trans);
1360          say "structure StartStates =\n\tstruct\n";
1361          say "\tdatatype yystartstate = STARTSTATE of int\n";
1362          makebegin();
1363          say "\nend\n";
1364          say "type result = UserDeclarations.lexresult\n";
1365          say "\texception LexerError (* raised if illegal leaf ";
1366          say "action tried *)\n";
1367          say "end\n\n";
1368          say "type int = Int.int\n";
1369          say (if (!PosArg) then "fun makeLexer (yyinput: int -> string,yygone0:int) =\nlet\n"
1370                else "fun makeLexer (yyinput: int -> string) =\nlet\tval yygone0:int=0\n");
1371          if !CountNewLines then say "\tval yylineno: int ref = ref 0\n\n" else ();
1372          say "\tval yyb = ref \"\\n\" \t\t(* buffer *)\n\
1373          \\tval yybl: int ref = ref 1\t\t(*buffer length *)\n\
1374          \\tval yybufpos: int ref = ref 1\t\t(* location of next character to use *)\n\
1375          \\tval yygone: int ref = ref yygone0\t(* position in file of beginning of buffer *)\n\
1376          \\tval yydone = ref false\t\t(* eof found yet? *)\n\
1377          \\tval yybegin: int ref = ref 1\t\t(*Current 'start state' for lexer *)\n\
1378            \\n\tval YYBEGIN = fn (Internal.StartStates.STARTSTATE x) =>\n\
1379          \\t\t yybegin := x\n\n";
1380          PrintLexer(ends);
1381          close_ibuf(!LexBuf);
1382           TextIO.closeOut(!LexOut)
1383         end)
1384    end
1385end
1386
1387