1
2
3
4FLEX(1)                  USER COMMANDS                    FLEX(1)
5
6
7
8NAME
9     flex - fast lexical analyzer generator
10
11SYNOPSIS
12     flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput  -Pprefix
13     -Sskeleton] [--help --version] [filename ...]
14
15OVERVIEW
16     This manual describes flex, a tool for  generating  programs
17     that  perform pattern-matching on text.  The manual includes
18     both tutorial and reference sections:
19
20         Description
21             a brief overview of the tool
22
23         Some Simple Examples
24
25         Format Of The Input File
26
27         Patterns
28             the extended regular expressions used by flex
29
30         How The Input Is Matched
31             the rules for determining what has been matched
32
33         Actions
34             how to specify what to do when a pattern is matched
35
36         The Generated Scanner
37             details regarding the scanner that flex produces;
38             how to control the input source
39
40         Start Conditions
41             introducing context into your scanners, and
42             managing "mini-scanners"
43
44         Multiple Input Buffers
45             how to manipulate multiple input sources; how to
46             scan from strings instead of files
47
48         End-of-file Rules
49             special rules for matching the end of the input
50
51         Miscellaneous Macros
52             a summary of macros available to the actions
53
54         Values Available To The User
55             a summary of values available to the actions
56
57         Interfacing With Yacc
58             connecting flex scanners together with yacc parsers
59
60
61
62
63Version 2.5          Last change: April 1995                    1
64
65
66
67
68
69
70FLEX(1)                  USER COMMANDS                    FLEX(1)
71
72
73
74         Options
75             flex command-line options, and the "%option"
76             directive
77
78         Performance Considerations
79             how to make your scanner go as fast as possible
80
81         Generating C++ Scanners
82             the (experimental) facility for generating C++
83             scanner classes
84
85         Incompatibilities With Lex And POSIX
86             how flex differs from AT&T lex and the POSIX lex
87             standard
88
89         Diagnostics
90             those error messages produced by flex (or scanners
91             it generates) whose meanings might not be apparent
92
93         Files
94             files used by flex
95
96         Deficiencies / Bugs
97             known problems with flex
98
99         See Also
100             other documentation, related tools
101
102         Author
103             includes contact information
104
105
106DESCRIPTION
107     flex is a  tool  for  generating  scanners:  programs  which
108     recognized  lexical  patterns in text.  flex reads the given
109     input files, or its standard input  if  no  file  names  are
110     given,  for  a  description  of  a scanner to generate.  The
111     description is in the form of pairs of  regular  expressions
112     and  C  code,  called  rules.  flex  generates as output a C
113     source file, lex.yy.c, which defines a routine yylex(). This
114     file is compiled and linked with the -lfl library to produce
115     an executable.  When the executable is run, it analyzes  its
116     input  for occurrences of the regular expressions.  Whenever
117     it finds one, it executes the corresponding C code.
118
119SOME SIMPLE EXAMPLES
120     First some simple examples to get the flavor of how one uses
121     flex.  The  following  flex  input specifies a scanner which
122     whenever it encounters the string "username" will replace it
123     with the user's login name:
124
125         %%
126
127
128
129Version 2.5          Last change: April 1995                    2
130
131
132
133
134
135
136FLEX(1)                  USER COMMANDS                    FLEX(1)
137
138
139
140         username    printf( "%s", getlogin() );
141
142     By default, any text not matched by a flex scanner is copied
143     to  the output, so the net effect of this scanner is to copy
144     its input file to its output with each occurrence of  "user-
145     name"  expanded.   In  this  input,  there is just one rule.
146     "username" is the pattern and the "printf"  is  the  action.
147     The "%%" marks the beginning of the rules.
148
149     Here's another simple example:
150
151                 int num_lines = 0, num_chars = 0;
152
153         %%
154         \n      ++num_lines; ++num_chars;
155         .       ++num_chars;
156
157         %%
158         main()
159                 {
160                 yylex();
161                 printf( "# of lines = %d, # of chars = %d\n",
162                         num_lines, num_chars );
163                 }
164
165     This scanner counts the number of characters and the  number
166     of  lines in its input (it produces no output other than the
167     final report on the counts).  The first  line  declares  two
168     globals,  "num_lines"  and "num_chars", which are accessible
169     both inside yylex() and in the main() routine declared after
170     the  second  "%%".  There are two rules, one which matches a
171     newline ("\n") and increments both the line  count  and  the
172     character  count,  and one which matches any character other
173     than a newline (indicated by the "." regular expression).
174
175     A somewhat more complicated example:
176
177         /* scanner for a toy Pascal-like language */
178
179         %{
180         /* need this for the call to atof() below */
181         #include <math.h>
182         %}
183
184         DIGIT    [0-9]
185         ID       [a-z][a-z0-9]*
186
187         %%
188
189         {DIGIT}+    {
190                     printf( "An integer: %s (%d)\n", yytext,
191                             atoi( yytext ) );
192
193
194
195Version 2.5          Last change: April 1995                    3
196
197
198
199
200
201
202FLEX(1)                  USER COMMANDS                    FLEX(1)
203
204
205
206                     }
207
208         {DIGIT}+"."{DIGIT}*        {
209                     printf( "A float: %s (%g)\n", yytext,
210                             atof( yytext ) );
211                     }
212
213         if|then|begin|end|procedure|function        {
214                     printf( "A keyword: %s\n", yytext );
215                     }
216
217         {ID}        printf( "An identifier: %s\n", yytext );
218
219         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
220
221         "{"[^}\n]*"}"     /* eat up one-line comments */
222
223         [ \t\n]+          /* eat up whitespace */
224
225         .           printf( "Unrecognized character: %s\n", yytext );
226
227         %%
228
229         main( argc, argv )
230         int argc;
231         char **argv;
232             {
233             ++argv, --argc;  /* skip over program name */
234             if ( argc > 0 )
235                     yyin = fopen( argv[0], "r" );
236             else
237                     yyin = stdin;
238
239             yylex();
240             }
241
242     This is the beginnings of a simple scanner  for  a  language
243     like  Pascal.   It  identifies different types of tokens and
244     reports on what it has seen.
245
246     The details of this example will be explained in the follow-
247     ing sections.
248
249FORMAT OF THE INPUT FILE
250     The flex input file consists of three sections, separated by
251     a line with just %% in it:
252
253         definitions
254         %%
255         rules
256         %%
257         user code
258
259
260
261Version 2.5          Last change: April 1995                    4
262
263
264
265
266
267
268FLEX(1)                  USER COMMANDS                    FLEX(1)
269
270
271
272     The definitions section contains declarations of simple name
273     definitions  to  simplify  the  scanner  specification,  and
274     declarations of start conditions, which are explained  in  a
275     later section.
276
277     Name definitions have the form:
278
279         name definition
280
281     The "name" is a word beginning with a letter  or  an  under-
282     score  ('_')  followed by zero or more letters, digits, '_',
283     or '-' (dash).  The definition is  taken  to  begin  at  the
284     first  non-white-space character following the name and con-
285     tinuing to the end of the line.  The definition  can  subse-
286     quently  be referred to using "{name}", which will expand to
287     "(definition)".  For example,
288
289         DIGIT    [0-9]
290         ID       [a-z][a-z0-9]*
291
292     defines "DIGIT" to be a regular expression which  matches  a
293     single  digit,  and  "ID"  to  be a regular expression which
294     matches a letter followed by zero-or-more letters-or-digits.
295     A subsequent reference to
296
297         {DIGIT}+"."{DIGIT}*
298
299     is identical to
300
301         ([0-9])+"."([0-9])*
302
303     and matches one-or-more digits followed by a '.' followed by
304     zero-or-more digits.
305
306     The rules section of the flex input  contains  a  series  of
307     rules of the form:
308
309         pattern   action
310
311     where the pattern must be unindented  and  the  action  must
312     begin on the same line.
313
314     See below for a further description of patterns and actions.
315
316     Finally, the user code section is simply copied to  lex.yy.c
317     verbatim.   It  is used for companion routines which call or
318     are called by the scanner.  The presence of this section  is
319     optional;  if it is missing, the second %% in the input file
320     may be skipped, too.
321
322     In the definitions and rules sections, any indented text  or
323     text  enclosed in %{ and %} is copied verbatim to the output
324
325
326
327Version 2.5          Last change: April 1995                    5
328
329
330
331
332
333
334FLEX(1)                  USER COMMANDS                    FLEX(1)
335
336
337
338     (with the %{}'s removed).  The %{}'s must appear  unindented
339     on lines by themselves.
340
341     In the rules section, any indented  or  %{}  text  appearing
342     before the first rule may be used to declare variables which
343     are local to the scanning routine and  (after  the  declara-
344     tions)  code  which  is to be executed whenever the scanning
345     routine is entered.  Other indented or %{} text in the  rule
346     section  is  still  copied to the output, but its meaning is
347     not well-defined and it may well cause  compile-time  errors
348     (this feature is present for POSIX compliance; see below for
349     other such features).
350
351     In the definitions section (but not in the  rules  section),
352     an  unindented comment (i.e., a line beginning with "/*") is
353     also copied verbatim to the output up to the next "*/".
354
355PATTERNS
356     The patterns in the input are written using an extended  set
357     of regular expressions.  These are:
358
359         x          match the character 'x'
360         .          any character (byte) except newline
361         [xyz]      a "character class"; in this case, the pattern
362                      matches either an 'x', a 'y', or a 'z'
363         [abj-oZ]   a "character class" with a range in it; matches
364                      an 'a', a 'b', any letter from 'j' through 'o',
365                      or a 'Z'
366         [^A-Z]     a "negated character class", i.e., any character
367                      but those in the class.  In this case, any
368                      character EXCEPT an uppercase letter.
369         [^A-Z\n]   any character EXCEPT an uppercase letter or
370                      a newline
371         r*         zero or more r's, where r is any regular expression
372         r+         one or more r's
373         r?         zero or one r's (that is, "an optional r")
374         r{2,5}     anywhere from two to five r's
375         r{2,}      two or more r's
376         r{4}       exactly 4 r's
377         {name}     the expansion of the "name" definition
378                    (see above)
379         "[xyz]\"foo"
380                    the literal string: [xyz]"foo
381         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
382                      then the ANSI-C interpretation of \x.
383                      Otherwise, a literal 'X' (used to escape
384                      operators such as '*')
385         \0         a NUL character (ASCII code 0)
386         \123       the character with octal value 123
387         \x2a       the character with hexadecimal value 2a
388         (r)        match an r; parentheses are used to override
389                      precedence (see below)
390
391
392
393Version 2.5          Last change: April 1995                    6
394
395
396
397
398
399
400FLEX(1)                  USER COMMANDS                    FLEX(1)
401
402
403
404         rs         the regular expression r followed by the
405                      regular expression s; called "concatenation"
406
407
408         r|s        either an r or an s
409
410
411         r/s        an r but only if it is followed by an s.  The
412                      text matched by s is included when determining
413                      whether this rule is the "longest match",
414                      but is then returned to the input before
415                      the action is executed.  So the action only
416                      sees the text matched by r.  This type
417                      of pattern is called trailing context".
418                      (There are some combinations of r/s that flex
419                      cannot match correctly; see notes in the
420                      Deficiencies / Bugs section below regarding
421                      "dangerous trailing context".)
422         ^r         an r, but only at the beginning of a line (i.e.,
423                      which just starting to scan, or right after a
424                      newline has been scanned).
425         r$         an r, but only at the end of a line (i.e., just
426                      before a newline).  Equivalent to "r/\n".
427
428                    Note that flex's notion of "newline" is exactly
429                    whatever the C compiler used to compile flex
430                    interprets '\n' as; in particular, on some DOS
431                    systems you must either filter out \r's in the
432                    input yourself, or explicitly use r/\r\n for "r$".
433
434
435         <s>r       an r, but only in start condition s (see
436                      below for discussion of start conditions)
437         <s1,s2,s3>r
438                    same, but in any of start conditions s1,
439                      s2, or s3
440         <*>r       an r in any start condition, even an exclusive one.
441
442
443         <<EOF>>    an end-of-file
444         <s1,s2><<EOF>>
445                    an end-of-file when in start condition s1 or s2
446
447     Note that inside of a character class, all  regular  expres-
448     sion  operators  lose  their  special  meaning except escape
449     ('\') and the character class operators, '-', ']',  and,  at
450     the beginning of the class, '^'.
451
452     The regular expressions listed above are  grouped  according
453     to  precedence, from highest precedence at the top to lowest
454     at the bottom.   Those  grouped  together  have  equal  pre-
455     cedence.  For example,
456
457
458
459Version 2.5          Last change: April 1995                    7
460
461
462
463
464
465
466FLEX(1)                  USER COMMANDS                    FLEX(1)
467
468
469
470         foo|bar*
471
472     is the same as
473
474         (foo)|(ba(r*))
475
476     since the '*' operator has higher precedence than concatena-
477     tion, and concatenation higher than alternation ('|').  This
478     pattern therefore matches either the  string  "foo"  or  the
479     string "ba" followed by zero-or-more r's.  To match "foo" or
480     zero-or-more "bar"'s, use:
481
482         foo|(bar)*
483
484     and to match zero-or-more "foo"'s-or-"bar"'s:
485
486         (foo|bar)*
487
488
489     In addition to characters and ranges of characters,  charac-
490     ter  classes  can  also contain character class expressions.
491     These are expressions enclosed inside [: and  :]  delimiters
492     (which themselves must appear between the '[' and ']' of the
493     character class; other elements may occur inside the charac-
494     ter class, too).  The valid expressions are:
495
496         [:alnum:] [:alpha:] [:blank:]
497         [:cntrl:] [:digit:] [:graph:]
498         [:lower:] [:print:] [:punct:]
499         [:space:] [:upper:] [:xdigit:]
500
501     These  expressions  all  designate  a  set   of   characters
502     equivalent  to  the corresponding standard C isXXX function.
503     For example, [:alnum:] designates those characters for which
504     isalnum()  returns  true  - i.e., any alphabetic or numeric.
505     Some  systems  don't  provide  isblank(),  so  flex  defines
506     [:blank:] as a blank or a tab.
507
508     For  example,  the  following  character  classes  are   all
509     equivalent:
510
511         [[:alnum:]]
512         [[:alpha:][:digit:]
513         [[:alpha:]0-9]
514         [a-zA-Z0-9]
515
516     If your scanner is  case-insensitive  (the  -i  flag),  then
517     [:upper:] and [:lower:] are equivalent to [:alpha:].
518
519     Some notes on patterns:
520
521     -    A negated character class such as the example  "[^A-Z]"
522
523
524
525Version 2.5          Last change: April 1995                    8
526
527
528
529
530
531
532FLEX(1)                  USER COMMANDS                    FLEX(1)
533
534
535
536          above   will   match  a  newline  unless  "\n"  (or  an
537          equivalent escape sequence) is one  of  the  characters
538          explicitly  present  in  the  negated  character  class
539          (e.g., "[^A-Z\n]").  This is unlike how many other reg-
540          ular  expression tools treat negated character classes,
541          but unfortunately  the  inconsistency  is  historically
542          entrenched.   Matching  newlines  means  that a pattern
543          like [^"]* can match the entire  input  unless  there's
544          another quote in the input.
545
546     -    A rule can have at most one instance of  trailing  con-
547          text (the '/' operator or the '$' operator).  The start
548          condition, '^', and "<<EOF>>" patterns can  only  occur
549          at the beginning of a pattern, and, as well as with '/'
550          and '$', cannot be grouped inside parentheses.   A  '^'
551          which  does  not  occur at the beginning of a rule or a
552          '$' which does not occur at the end of a rule loses its
553          special  properties  and is treated as a normal charac-
554          ter.
555
556          The following are illegal:
557
558              foo/bar$
559              <sc1>foo<sc2>bar
560
561          Note  that  the  first  of  these,   can   be   written
562          "foo/bar\n".
563
564          The following will result in '$' or '^'  being  treated
565          as a normal character:
566
567              foo|(bar$)
568              foo|^bar
569
570          If what's wanted is a  "foo"  or  a  bar-followed-by-a-
571          newline,  the  following could be used (the special '|'
572          action is explained below):
573
574              foo      |
575              bar$     /* action goes here */
576
577          A similar trick will work for matching a foo or a  bar-
578          at-the-beginning-of-a-line.
579
580HOW THE INPUT IS MATCHED
581     When the generated scanner is run,  it  analyzes  its  input
582     looking  for strings which match any of its patterns.  If it
583     finds more than one match, it takes  the  one  matching  the
584     most  text  (for  trailing  context rules, this includes the
585     length of the trailing part, even though  it  will  then  be
586     returned  to the input).  If it finds two or more matches of
587     the same length, the rule listed first  in  the  flex  input
588
589
590
591Version 2.5          Last change: April 1995                    9
592
593
594
595
596
597
598FLEX(1)                  USER COMMANDS                    FLEX(1)
599
600
601
602     file is chosen.
603
604     Once the match is determined, the text corresponding to  the
605     match  (called  the  token)  is made available in the global
606     character pointer yytext,  and  its  length  in  the  global
607     integer yyleng. The action corresponding to the matched pat-
608     tern is  then  executed  (a  more  detailed  description  of
609     actions  follows),  and  then the remaining input is scanned
610     for another match.
611
612     If no match is found, then the default rule is executed: the
613     next character in the input is considered matched and copied
614     to the standard output.  Thus, the simplest legal flex input
615     is:
616
617         %%
618
619     which generates a scanner that simply copies its input  (one
620     character at a time) to its output.
621
622     Note that yytext can  be  defined  in  two  different  ways:
623     either  as  a character pointer or as a character array. You
624     can control which definition flex uses by including  one  of
625     the  special  directives  %pointer  or  %array  in the first
626     (definitions) section of your flex input.   The  default  is
627     %pointer, unless you use the -l lex compatibility option, in
628     which case yytext will be an array.  The advantage of  using
629     %pointer  is  substantially  faster  scanning  and no buffer
630     overflow when matching very large tokens (unless you run out
631     of  dynamic  memory).  The disadvantage is that you are res-
632     tricted in how your actions can modify yytext (see the  next
633     section),  and  calls  to  the unput() function destroys the
634     present contents of yytext,  which  can  be  a  considerable
635     porting headache when moving between different lex versions.
636
637     The advantage of %array is that you can then  modify  yytext
638     to your heart's content, and calls to unput() do not destroy
639     yytext (see  below).   Furthermore,  existing  lex  programs
640     sometimes access yytext externally using declarations of the
641     form:
642         extern char yytext[];
643     This definition is erroneous when used  with  %pointer,  but
644     correct for %array.
645
646     %array defines yytext to be an array of  YYLMAX  characters,
647     which  defaults to a fairly large value.  You can change the
648     size by simply #define'ing YYLMAX to a  different  value  in
649     the  first  section of your flex input.  As mentioned above,
650     with %pointer yytext grows dynamically to accommodate  large
651     tokens.  While this means your %pointer scanner can accommo-
652     date very large tokens (such as matching  entire  blocks  of
653     comments),  bear  in  mind  that  each time the scanner must
654
655
656
657Version 2.5          Last change: April 1995                   10
658
659
660
661
662
663
664FLEX(1)                  USER COMMANDS                    FLEX(1)
665
666
667
668     resize yytext it also must rescan the entire token from  the
669     beginning,  so  matching such tokens can prove slow.  yytext
670     presently does not dynamically grow if  a  call  to  unput()
671     results  in too much text being pushed back; instead, a run-
672     time error results.
673
674     Also note that  you  cannot  use  %array  with  C++  scanner
675     classes (the c++ option; see below).
676
677ACTIONS
678     Each pattern in a rule has a corresponding action, which can
679     be any arbitrary C statement.  The pattern ends at the first
680     non-escaped whitespace character; the remainder of the  line
681     is  its  action.  If the action is empty, then when the pat-
682     tern is matched the input token is  simply  discarded.   For
683     example,  here  is  the  specification  for  a program which
684     deletes all occurrences of "zap me" from its input:
685
686         %%
687         "zap me"
688
689     (It will copy all other characters in the input to the  out-
690     put since they will be matched by the default rule.)
691
692     Here is a program which compresses multiple blanks and  tabs
693     down  to a single blank, and throws away whitespace found at
694     the end of a line:
695
696         %%
697         [ \t]+        putchar( ' ' );
698         [ \t]+$       /* ignore this token */
699
700
701     If the action contains a '{', then the action spans till the
702     balancing  '}'  is  found, and the action may cross multiple
703     lines.  flex knows about C strings and comments and won't be
704     fooled  by braces found within them, but also allows actions
705     to begin with %{ and will consider the action to be all  the
706     text up to the next %} (regardless of ordinary braces inside
707     the action).
708
709     An action consisting solely of a vertical  bar  ('|')  means
710     "same  as  the  action for the next rule."  See below for an
711     illustration.
712
713     Actions can  include  arbitrary  C  code,  including  return
714     statements  to  return  a  value  to whatever routine called
715     yylex(). Each time yylex() is called it continues processing
716     tokens  from  where it last left off until it either reaches
717     the end of the file or executes a return.
718
719
720
721
722
723Version 2.5          Last change: April 1995                   11
724
725
726
727
728
729
730FLEX(1)                  USER COMMANDS                    FLEX(1)
731
732
733
734     Actions are free to modify yytext except for lengthening  it
735     (adding  characters  to  its end--these will overwrite later
736     characters in the input  stream).   This  however  does  not
737     apply  when  using  %array (see above); in that case, yytext
738     may be freely modified in any way.
739
740     Actions are free to modify yyleng except they should not  do
741     so if the action also includes use of yymore() (see below).
742
743     There are a  number  of  special  directives  which  can  be
744     included within an action:
745
746     -    ECHO copies yytext to the scanner's output.
747
748     -    BEGIN followed by the name of a start condition  places
749          the  scanner  in the corresponding start condition (see
750          below).
751
752     -    REJECT directs the scanner to proceed on to the "second
753          best"  rule which matched the input (or a prefix of the
754          input).  The rule is chosen as described above in  "How
755          the  Input  is  Matched",  and yytext and yyleng set up
756          appropriately.  It may either be one which  matched  as
757          much  text as the originally chosen rule but came later
758          in the flex input file, or one which matched less text.
759          For example, the following will both count the words in
760          the input  and  call  the  routine  special()  whenever
761          "frob" is seen:
762
763                      int word_count = 0;
764              %%
765
766              frob        special(); REJECT;
767              [^ \t\n]+   ++word_count;
768
769          Without the REJECT, any "frob"'s in the input would not
770          be  counted  as  words, since the scanner normally exe-
771          cutes only one action per token.  Multiple REJECT's are
772          allowed,  each  one finding the next best choice to the
773          currently active rule.  For example, when the following
774          scanner  scans the token "abcd", it will write "abcdab-
775          caba" to the output:
776
777              %%
778              a        |
779              ab       |
780              abc      |
781              abcd     ECHO; REJECT;
782              .|\n     /* eat up any unmatched character */
783
784          (The first three rules share the fourth's action  since
785          they   use   the  special  '|'  action.)  REJECT  is  a
786
787
788
789Version 2.5          Last change: April 1995                   12
790
791
792
793
794
795
796FLEX(1)                  USER COMMANDS                    FLEX(1)
797
798
799
800          particularly expensive feature in terms of scanner per-
801          formance; if it is used in any of the scanner's actions
802          it will  slow  down  all  of  the  scanner's  matching.
803          Furthermore,  REJECT cannot be used with the -Cf or -CF
804          options (see below).
805
806          Note also that unlike the other special actions, REJECT
807          is  a  branch;  code  immediately  following  it in the
808          action will not be executed.
809
810     -    yymore() tells  the  scanner  that  the  next  time  it
811          matches  a  rule,  the  corresponding  token  should be
812          appended onto the current value of yytext  rather  than
813          replacing  it.   For  example,  given  the input "mega-
814          kludge" the following will write "mega-mega-kludge"  to
815          the output:
816
817              %%
818              mega-    ECHO; yymore();
819              kludge   ECHO;
820
821          First "mega-" is matched  and  echoed  to  the  output.
822          Then  "kludge"  is matched, but the previous "mega-" is
823          still hanging around at the beginning of yytext so  the
824          ECHO  for  the "kludge" rule will actually write "mega-
825          kludge".
826
827     Two notes regarding use of yymore(). First, yymore() depends
828     on  the value of yyleng correctly reflecting the size of the
829     current token, so you must not  modify  yyleng  if  you  are
830     using  yymore().  Second,  the  presence  of yymore() in the
831     scanner's action entails a minor performance penalty in  the
832     scanner's matching speed.
833
834     -    yyless(n) returns all but the first n characters of the
835          current token back to the input stream, where they will
836          be rescanned when the scanner looks for the next match.
837          yytext  and  yyleng  are  adjusted appropriately (e.g.,
838          yyleng will now be equal to n ).  For example,  on  the
839          input  "foobar"  the  following will write out "foobar-
840          bar":
841
842              %%
843              foobar    ECHO; yyless(3);
844              [a-z]+    ECHO;
845
846          An argument of  0  to  yyless  will  cause  the  entire
847          current  input  string  to  be  scanned  again.  Unless
848          you've changed how the scanner will  subsequently  pro-
849          cess  its  input  (using BEGIN, for example), this will
850          result in an endless loop.
851
852
853
854
855Version 2.5          Last change: April 1995                   13
856
857
858
859
860
861
862FLEX(1)                  USER COMMANDS                    FLEX(1)
863
864
865
866     Note that yyless is a macro and can only be used in the flex
867     input file, not from other source files.
868
869     -    unput(c) puts the  character  c  back  onto  the  input
870          stream.   It  will  be the next character scanned.  The
871          following action will take the current token and  cause
872          it to be rescanned enclosed in parentheses.
873
874              {
875              int i;
876              /* Copy yytext because unput() trashes yytext */
877              char *yycopy = strdup( yytext );
878              unput( ')' );
879              for ( i = yyleng - 1; i >= 0; --i )
880                  unput( yycopy[i] );
881              unput( '(' );
882              free( yycopy );
883              }
884
885          Note that since each unput() puts the  given  character
886          back at the beginning of the input stream, pushing back
887          strings must be done back-to-front.
888
889     An important potential problem when using unput() is that if
890     you are using %pointer (the default), a call to unput() des-
891     troys the contents of yytext, starting  with  its  rightmost
892     character  and devouring one character to the left with each
893     call.  If you need the value of  yytext  preserved  after  a
894     call  to  unput() (as in the above example), you must either
895     first copy it elsewhere, or build your scanner using  %array
896     instead (see How The Input Is Matched).
897
898     Finally, note that you cannot put back  EOF  to  attempt  to
899     mark the input stream with an end-of-file.
900
901     -    input() reads the next character from the input stream.
902          For  example, the following is one way to eat up C com-
903          ments:
904
905              %%
906              "/*"        {
907                          register int c;
908
909                          for ( ; ; )
910                              {
911                              while ( (c = input()) != '*' &&
912                                      c != EOF )
913                                  ;    /* eat up text of comment */
914
915                              if ( c == '*' )
916                                  {
917                                  while ( (c = input()) == '*' )
918
919
920
921Version 2.5          Last change: April 1995                   14
922
923
924
925
926
927
928FLEX(1)                  USER COMMANDS                    FLEX(1)
929
930
931
932                                      ;
933                                  if ( c == '/' )
934                                      break;    /* found the end */
935                                  }
936
937                              if ( c == EOF )
938                                  {
939                                  error( "EOF in comment" );
940                                  break;
941                                  }
942                              }
943                          }
944
945          (Note that if the scanner is compiled using  C++,  then
946          input()  is  instead referred to as yyinput(), in order
947          to avoid a name clash with the C++ stream by  the  name
948          of input.)
949
950     -    YY_FLUSH_BUFFER flushes the scanner's  internal  buffer
951          so  that  the next time the scanner attempts to match a
952          token, it will first refill the buffer  using  YY_INPUT
953          (see  The  Generated Scanner, below).  This action is a
954          special case  of  the  more  general  yy_flush_buffer()
955          function, described below in the section Multiple Input
956          Buffers.
957
958     -    yyterminate() can be used in lieu of a return statement
959          in  an action.  It terminates the scanner and returns a
960          0 to the scanner's caller, indicating "all  done".   By
961          default,  yyterminate()  is also called when an end-of-
962          file is encountered.  It is a macro and  may  be  rede-
963          fined.
964
965THE GENERATED SCANNER
966     The output of flex is the file lex.yy.c, which contains  the
967     scanning  routine yylex(), a number of tables used by it for
968     matching tokens, and a number of auxiliary routines and mac-
969     ros.  By default, yylex() is declared as follows:
970
971         int yylex()
972             {
973             ... various definitions and the actions in here ...
974             }
975
976     (If your environment supports function prototypes,  then  it
977     will  be  "int  yylex(  void  )".)   This  definition may be
978     changed by defining the "YY_DECL" macro.  For  example,  you
979     could use:
980
981         #define YY_DECL float lexscan( a, b ) float a, b;
982
983     to give the scanning routine the name lexscan,  returning  a
984
985
986
987Version 2.5          Last change: April 1995                   15
988
989
990
991
992
993
994FLEX(1)                  USER COMMANDS                    FLEX(1)
995
996
997
998     float, and taking two floats as arguments.  Note that if you
999     give  arguments  to  the  scanning  routine  using  a   K&R-
1000     style/non-prototyped  function  declaration,  you  must ter-
1001     minate the definition with a semi-colon (;).
1002
1003     Whenever yylex() is called, it scans tokens from the  global
1004     input  file  yyin  (which  defaults to stdin).  It continues
1005     until it either reaches an end-of-file (at  which  point  it
1006     returns the value 0) or one of its actions executes a return
1007     statement.
1008
1009     If the scanner reaches an end-of-file, subsequent calls  are
1010     undefined  unless either yyin is pointed at a new input file
1011     (in which case scanning continues from that file), or yyres-
1012     tart()  is called.  yyrestart() takes one argument, a FILE *
1013     pointer (which can be nil, if you've set up YY_INPUT to scan
1014     from  a  source  other  than yyin), and initializes yyin for
1015     scanning from that file.  Essentially there is no difference
1016     between  just  assigning  yyin  to a new input file or using
1017     yyrestart() to do so; the latter is available  for  compati-
1018     bility with previous versions of flex, and because it can be
1019     used to switch input files in the middle  of  scanning.   It
1020     can  also be used to throw away the current input buffer, by
1021     calling it with an argument of yyin; but better  is  to  use
1022     YY_FLUSH_BUFFER (see above).  Note that yyrestart() does not
1023     reset the start condition to INITIAL (see Start  Conditions,
1024     below).
1025
1026     If yylex() stops scanning due to executing a  return  state-
1027     ment  in  one of the actions, the scanner may then be called
1028     again and it will resume scanning where it left off.
1029
1030     By default (and for purposes  of  efficiency),  the  scanner
1031     uses  block-reads  rather  than  simple getc() calls to read
1032     characters from yyin. The nature of how it  gets  its  input
1033     can   be   controlled   by   defining  the  YY_INPUT  macro.
1034     YY_INPUT's           calling           sequence           is
1035     "YY_INPUT(buf,result,max_size)".   Its action is to place up
1036     to max_size characters in the character array buf and return
1037     in  the integer variable result either the number of charac-
1038     ters read or the constant YY_NULL (0  on  Unix  systems)  to
1039     indicate  EOF.   The  default YY_INPUT reads from the global
1040     file-pointer "yyin".
1041
1042     A sample definition of YY_INPUT (in the definitions  section
1043     of the input file):
1044
1045         %{
1046         #define YY_INPUT(buf,result,max_size) \
1047             { \
1048             int c = getchar(); \
1049             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
1050
1051
1052
1053Version 2.5          Last change: April 1995                   16
1054
1055
1056
1057
1058
1059
1060FLEX(1)                  USER COMMANDS                    FLEX(1)
1061
1062
1063
1064             }
1065         %}
1066
1067     This definition will change the input  processing  to  occur
1068     one character at a time.
1069
1070     When the scanner receives  an  end-of-file  indication  from
1071     YY_INPUT, it then checks the yywrap() function.  If yywrap()
1072     returns false (zero), then it is assumed that  the  function
1073     has  gone  ahead  and  set up yyin to point to another input
1074     file, and scanning continues.   If  it  returns  true  (non-
1075     zero),  then  the  scanner  terminates,  returning  0 to its
1076     caller.  Note that  in  either  case,  the  start  condition
1077     remains unchanged; it does not revert to INITIAL.
1078
1079     If you do not supply your own version of yywrap(), then  you
1080     must  either use %option noyywrap (in which case the scanner
1081     behaves as though yywrap() returned 1),  or  you  must  link
1082     with  -lfl  to  obtain  the  default version of the routine,
1083     which always returns 1.
1084
1085     Three routines are available  for  scanning  from  in-memory
1086     buffers     rather     than     files:     yy_scan_string(),
1087     yy_scan_bytes(), and yy_scan_buffer(). See the discussion of
1088     them below in the section Multiple Input Buffers.
1089
1090     The scanner writes its  ECHO  output  to  the  yyout  global
1091     (default, stdout), which may be redefined by the user simply
1092     by assigning it to some other FILE pointer.
1093
1094START CONDITIONS
1095     flex  provides  a  mechanism  for  conditionally  activating
1096     rules.   Any rule whose pattern is prefixed with "<sc>" will
1097     only be active when the scanner is in  the  start  condition
1098     named "sc".  For example,
1099
1100         <STRING>[^"]*        { /* eat up the string body ... */
1101                     ...
1102                     }
1103
1104     will be active only when the  scanner  is  in  the  "STRING"
1105     start condition, and
1106
1107         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
1108                     ...
1109                     }
1110
1111     will be active only when  the  current  start  condition  is
1112     either "INITIAL", "STRING", or "QUOTE".
1113
1114     Start conditions are declared  in  the  definitions  (first)
1115     section  of  the input using unindented lines beginning with
1116
1117
1118
1119Version 2.5          Last change: April 1995                   17
1120
1121
1122
1123
1124
1125
1126FLEX(1)                  USER COMMANDS                    FLEX(1)
1127
1128
1129
1130     either %s or %x followed by a list  of  names.   The  former
1131     declares  inclusive  start  conditions, the latter exclusive
1132     start conditions.  A start condition is activated using  the
1133     BEGIN  action.   Until  the  next  BEGIN action is executed,
1134     rules with the given start  condition  will  be  active  and
1135     rules  with other start conditions will be inactive.  If the
1136     start condition is inclusive, then rules with no start  con-
1137     ditions  at  all  will  also be active.  If it is exclusive,
1138     then only rules qualified with the start condition  will  be
1139     active.   A  set  of  rules contingent on the same exclusive
1140     start condition describe a scanner which is  independent  of
1141     any  of the other rules in the flex input.  Because of this,
1142     exclusive start conditions make it easy  to  specify  "mini-
1143     scanners"  which scan portions of the input that are syntac-
1144     tically different from the rest (e.g., comments).
1145
1146     If the distinction between  inclusive  and  exclusive  start
1147     conditions  is still a little vague, here's a simple example
1148     illustrating the connection between the  two.   The  set  of
1149     rules:
1150
1151         %s example
1152         %%
1153
1154         <example>foo   do_something();
1155
1156         bar            something_else();
1157
1158     is equivalent to
1159
1160         %x example
1161         %%
1162
1163         <example>foo   do_something();
1164
1165         <INITIAL,example>bar    something_else();
1166
1167     Without the <INITIAL,example> qualifier, the bar pattern  in
1168     the second example wouldn't be active (i.e., couldn't match)
1169     when in start condition example. If we just  used  <example>
1170     to  qualify  bar,  though,  then  it would only be active in
1171     example and not in INITIAL, while in the first example  it's
1172     active  in  both,  because  in the first example the example
1173     startion condition is an inclusive (%s) start condition.
1174
1175     Also note that the  special  start-condition  specifier  <*>
1176     matches  every  start  condition.   Thus,  the above example
1177     could also have been written;
1178
1179         %x example
1180         %%
1181
1182
1183
1184
1185Version 2.5          Last change: April 1995                   18
1186
1187
1188
1189
1190
1191
1192FLEX(1)                  USER COMMANDS                    FLEX(1)
1193
1194
1195
1196         <example>foo   do_something();
1197
1198         <*>bar    something_else();
1199
1200
1201     The default rule (to ECHO any unmatched  character)  remains
1202     active in start conditions.  It is equivalent to:
1203
1204         <*>.|\n     ECHO;
1205
1206
1207     BEGIN(0) returns to the original state where only the  rules
1208     with no start conditions are active.  This state can also be
1209     referred   to   as   the   start-condition   "INITIAL",   so
1210     BEGIN(INITIAL)  is  equivalent to BEGIN(0). (The parentheses
1211     around the start condition name are  not  required  but  are
1212     considered good style.)
1213
1214     BEGIN actions can also be given  as  indented  code  at  the
1215     beginning  of the rules section.  For example, the following
1216     will cause the scanner to enter the "SPECIAL"  start  condi-
1217     tion  whenever  yylex()  is  called  and the global variable
1218     enter_special is true:
1219
1220                 int enter_special;
1221
1222         %x SPECIAL
1223         %%
1224                 if ( enter_special )
1225                     BEGIN(SPECIAL);
1226
1227         <SPECIAL>blahblahblah
1228         ...more rules follow...
1229
1230
1231     To illustrate the  uses  of  start  conditions,  here  is  a
1232     scanner  which  provides  two different interpretations of a
1233     string like "123.456".  By default it will treat it as three
1234     tokens,  the  integer  "123",  a  dot ('.'), and the integer
1235     "456".  But if the string is preceded earlier in the line by
1236     the  string  "expect-floats"  it  will  treat it as a single
1237     token, the floating-point number 123.456:
1238
1239         %{
1240         #include <math.h>
1241         %}
1242         %s expect
1243
1244         %%
1245         expect-floats        BEGIN(expect);
1246
1247         <expect>[0-9]+"."[0-9]+      {
1248
1249
1250
1251Version 2.5          Last change: April 1995                   19
1252
1253
1254
1255
1256
1257
1258FLEX(1)                  USER COMMANDS                    FLEX(1)
1259
1260
1261
1262                     printf( "found a float, = %f\n",
1263                             atof( yytext ) );
1264                     }
1265         <expect>\n           {
1266                     /* that's the end of the line, so
1267                      * we need another "expect-number"
1268                      * before we'll recognize any more
1269                      * numbers
1270                      */
1271                     BEGIN(INITIAL);
1272                     }
1273
1274         [0-9]+      {
1275                     printf( "found an integer, = %d\n",
1276                             atoi( yytext ) );
1277                     }
1278
1279         "."         printf( "found a dot\n" );
1280
1281     Here is a scanner which recognizes (and discards) C comments
1282     while maintaining a count of the current input line.
1283
1284         %x comment
1285         %%
1286                 int line_num = 1;
1287
1288         "/*"         BEGIN(comment);
1289
1290         <comment>[^*\n]*        /* eat anything that's not a '*' */
1291         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1292         <comment>\n             ++line_num;
1293         <comment>"*"+"/"        BEGIN(INITIAL);
1294
1295     This scanner goes to a bit of trouble to match as much  text
1296     as  possible with each rule.  In general, when attempting to
1297     write a high-speed scanner try to match as much possible  in
1298     each rule, as it's a big win.
1299
1300     Note that start-conditions names are really  integer  values
1301     and  can  be  stored  as  such.   Thus,  the  above could be
1302     extended in the following fashion:
1303
1304         %x comment foo
1305         %%
1306                 int line_num = 1;
1307                 int comment_caller;
1308
1309         "/*"         {
1310                      comment_caller = INITIAL;
1311                      BEGIN(comment);
1312                      }
1313
1314
1315
1316
1317Version 2.5          Last change: April 1995                   20
1318
1319
1320
1321
1322
1323
1324FLEX(1)                  USER COMMANDS                    FLEX(1)
1325
1326
1327
1328         ...
1329
1330         <foo>"/*"    {
1331                      comment_caller = foo;
1332                      BEGIN(comment);
1333                      }
1334
1335         <comment>[^*\n]*        /* eat anything that's not a '*' */
1336         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1337         <comment>\n             ++line_num;
1338         <comment>"*"+"/"        BEGIN(comment_caller);
1339
1340     Furthermore, you can  access  the  current  start  condition
1341     using  the  integer-valued YY_START macro.  For example, the
1342     above assignments to comment_caller could instead be written
1343
1344         comment_caller = YY_START;
1345
1346     Flex provides YYSTATE as an alias for YY_START  (since  that
1347     is what's used by AT&T lex).
1348
1349     Note that start conditions do not have their own name-space;
1350     %s's   and  %x's  declare  names  in  the  same  fashion  as
1351     #define's.
1352
1353     Finally, here's an example of how to  match  C-style  quoted
1354     strings using exclusive start conditions, including expanded
1355     escape sequences (but not including checking  for  a  string
1356     that's too long):
1357
1358         %x str
1359
1360         %%
1361                 char string_buf[MAX_STR_CONST];
1362                 char *string_buf_ptr;
1363
1364
1365         \"      string_buf_ptr = string_buf; BEGIN(str);
1366
1367         <str>\"        { /* saw closing quote - all done */
1368                 BEGIN(INITIAL);
1369                 *string_buf_ptr = '\0';
1370                 /* return string constant token type and
1371                  * value to parser
1372                  */
1373                 }
1374
1375         <str>\n        {
1376                 /* error - unterminated string constant */
1377                 /* generate error message */
1378                 }
1379
1380
1381
1382
1383Version 2.5          Last change: April 1995                   21
1384
1385
1386
1387
1388
1389
1390FLEX(1)                  USER COMMANDS                    FLEX(1)
1391
1392
1393
1394         <str>\\[0-7]{1,3} {
1395                 /* octal escape sequence */
1396                 int result;
1397
1398                 (void) sscanf( yytext + 1, "%o", &result );
1399
1400                 if ( result > 0xff )
1401                         /* error, constant is out-of-bounds */
1402
1403                 *string_buf_ptr++ = result;
1404                 }
1405
1406         <str>\\[0-9]+ {
1407                 /* generate error - bad escape sequence; something
1408                  * like '\48' or '\0777777'
1409                  */
1410                 }
1411
1412         <str>\\n  *string_buf_ptr++ = '\n';
1413         <str>\\t  *string_buf_ptr++ = '\t';
1414         <str>\\r  *string_buf_ptr++ = '\r';
1415         <str>\\b  *string_buf_ptr++ = '\b';
1416         <str>\\f  *string_buf_ptr++ = '\f';
1417
1418         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1419
1420         <str>[^\\\n\"]+        {
1421                 char *yptr = yytext;
1422
1423                 while ( *yptr )
1424                         *string_buf_ptr++ = *yptr++;
1425                 }
1426
1427
1428     Often, such as in some of the examples above,  you  wind  up
1429     writing  a  whole  bunch  of  rules all preceded by the same
1430     start condition(s).  Flex makes this  a  little  easier  and
1431     cleaner  by introducing a notion of start condition scope. A
1432     start condition scope is begun with:
1433
1434         <SCs>{
1435
1436     where SCs is a list of one or more start conditions.  Inside
1437     the  start condition scope, every rule automatically has the
1438     prefix <SCs> applied to it, until a '}'  which  matches  the
1439     initial '{'. So, for example,
1440
1441         <ESC>{
1442             "\\n"   return '\n';
1443             "\\r"   return '\r';
1444             "\\f"   return '\f';
1445             "\\0"   return '\0';
1446
1447
1448
1449Version 2.5          Last change: April 1995                   22
1450
1451
1452
1453
1454
1455
1456FLEX(1)                  USER COMMANDS                    FLEX(1)
1457
1458
1459
1460         }
1461
1462     is equivalent to:
1463
1464         <ESC>"\\n"  return '\n';
1465         <ESC>"\\r"  return '\r';
1466         <ESC>"\\f"  return '\f';
1467         <ESC>"\\0"  return '\0';
1468
1469     Start condition scopes may be nested.
1470
1471     Three routines are  available  for  manipulating  stacks  of
1472     start conditions:
1473
1474     void yy_push_state(int new_state)
1475          pushes the current start condition onto the top of  the
1476          start  condition  stack  and  switches  to new_state as
1477          though you had used BEGIN new_state (recall that  start
1478          condition names are also integers).
1479
1480     void yy_pop_state()
1481          pops the top of the stack and switches to it via BEGIN.
1482
1483     int yy_top_state()
1484          returns the top  of  the  stack  without  altering  the
1485          stack's contents.
1486
1487     The start condition stack grows dynamically and  so  has  no
1488     built-in  size  limitation.  If memory is exhausted, program
1489     execution aborts.
1490
1491     To use start condition stacks, your scanner must  include  a
1492     %option stack directive (see Options below).
1493
1494MULTIPLE INPUT BUFFERS
1495     Some scanners (such as those which support "include"  files)
1496     require   reading  from  several  input  streams.   As  flex
1497     scanners do a large amount of buffering, one cannot  control
1498     where  the  next input will be read from by simply writing a
1499     YY_INPUT  which  is  sensitive  to  the  scanning   context.
1500     YY_INPUT  is only called when the scanner reaches the end of
1501     its buffer, which may be a long time after scanning a state-
1502     ment such as an "include" which requires switching the input
1503     source.
1504
1505     To negotiate  these  sorts  of  problems,  flex  provides  a
1506     mechanism  for creating and switching between multiple input
1507     buffers.  An input buffer is created by using:
1508
1509         YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1510
1511     which takes a FILE pointer and a size and creates  a  buffer
1512
1513
1514
1515Version 2.5          Last change: April 1995                   23
1516
1517
1518
1519
1520
1521
1522FLEX(1)                  USER COMMANDS                    FLEX(1)
1523
1524
1525
1526     associated with the given file and large enough to hold size
1527     characters (when in doubt, use YY_BUF_SIZE  for  the  size).
1528     It  returns  a  YY_BUFFER_STATE  handle,  which  may then be
1529     passed to other routines (see below).   The  YY_BUFFER_STATE
1530     type is a pointer to an opaque struct yy_buffer_state struc-
1531     ture, so you may safely initialize YY_BUFFER_STATE variables
1532     to  ((YY_BUFFER_STATE) 0) if you wish, and also refer to the
1533     opaque structure in order to correctly declare input buffers
1534     in  source files other than that of your scanner.  Note that
1535     the FILE pointer in the call  to  yy_create_buffer  is  only
1536     used  as the value of yyin seen by YY_INPUT; if you redefine
1537     YY_INPUT so it no longer uses yyin, then you can safely pass
1538     a nil FILE pointer to yy_create_buffer. You select a partic-
1539     ular buffer to scan from using:
1540
1541         void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1542
1543     switches the scanner's input  buffer  so  subsequent  tokens
1544     will  come  from new_buffer. Note that yy_switch_to_buffer()
1545     may be used by yywrap() to set things up for continued scan-
1546     ning, instead of opening a new file and pointing yyin at it.
1547     Note  also  that  switching   input   sources   via   either
1548     yy_switch_to_buffer()  or yywrap() does not change the start
1549     condition.
1550
1551         void yy_delete_buffer( YY_BUFFER_STATE buffer )
1552
1553     is used to reclaim the storage associated with a buffer.   (
1554     buffer  can be nil, in which case the routine does nothing.)
1555     You can also clear the current contents of a buffer using:
1556
1557         void yy_flush_buffer( YY_BUFFER_STATE buffer )
1558
1559     This function discards the buffer's contents,  so  the  next
1560     time  the scanner attempts to match a token from the buffer,
1561     it will first fill the buffer anew using YY_INPUT.
1562
1563     yy_new_buffer() is an alias for yy_create_buffer(), provided
1564     for  compatibility  with  the  C++ use of new and delete for
1565     creating and destroying dynamic objects.
1566
1567     Finally,   the    YY_CURRENT_BUFFER    macro    returns    a
1568     YY_BUFFER_STATE handle to the current buffer.
1569
1570     Here is an example of using these  features  for  writing  a
1571     scanner  which expands include files (the <<EOF>> feature is
1572     discussed below):
1573
1574         /* the "incl" state is used for picking up the name
1575          * of an include file
1576          */
1577         %x incl
1578
1579
1580
1581Version 2.5          Last change: April 1995                   24
1582
1583
1584
1585
1586
1587
1588FLEX(1)                  USER COMMANDS                    FLEX(1)
1589
1590
1591
1592         %{
1593         #define MAX_INCLUDE_DEPTH 10
1594         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1595         int include_stack_ptr = 0;
1596         %}
1597
1598         %%
1599         include             BEGIN(incl);
1600
1601         [a-z]+              ECHO;
1602         [^a-z\n]*\n?        ECHO;
1603
1604         <incl>[ \t]*      /* eat the whitespace */
1605         <incl>[^ \t\n]+   { /* got the include file name */
1606                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1607                     {
1608                     fprintf( stderr, "Includes nested too deeply" );
1609                     exit( 1 );
1610                     }
1611
1612                 include_stack[include_stack_ptr++] =
1613                     YY_CURRENT_BUFFER;
1614
1615                 yyin = fopen( yytext, "r" );
1616
1617                 if ( ! yyin )
1618                     error( ... );
1619
1620                 yy_switch_to_buffer(
1621                     yy_create_buffer( yyin, YY_BUF_SIZE ) );
1622
1623                 BEGIN(INITIAL);
1624                 }
1625
1626         <<EOF>> {
1627                 if ( --include_stack_ptr < 0 )
1628                     {
1629                     yyterminate();
1630                     }
1631
1632                 else
1633                     {
1634                     yy_delete_buffer( YY_CURRENT_BUFFER );
1635                     yy_switch_to_buffer(
1636                          include_stack[include_stack_ptr] );
1637                     }
1638                 }
1639
1640     Three routines are available for setting  up  input  buffers
1641     for  scanning  in-memory  strings  instead of files.  All of
1642     them create a new input buffer for scanning the string,  and
1643     return  a  corresponding  YY_BUFFER_STATE  handle (which you
1644
1645
1646
1647Version 2.5          Last change: April 1995                   25
1648
1649
1650
1651
1652
1653
1654FLEX(1)                  USER COMMANDS                    FLEX(1)
1655
1656
1657
1658     should delete with yy_delete_buffer() when  done  with  it).
1659     They    also    switch    to    the    new    buffer   using
1660     yy_switch_to_buffer(), so the  next  call  to  yylex()  will
1661     start scanning the string.
1662
1663     yy_scan_string(const char *str)
1664          scans a NUL-terminated string.
1665
1666     yy_scan_bytes(const char *bytes, int len)
1667          scans len bytes (including possibly NUL's) starting  at
1668          location bytes.
1669
1670     Note that both of these functions create and scan a copy  of
1671     the  string or bytes.  (This may be desirable, since yylex()
1672     modifies the contents of the buffer it  is  scanning.)   You
1673     can avoid the copy by using:
1674
1675     yy_scan_buffer(char *base, yy_size_t size)
1676          which scans in place the buffer starting at base,  con-
1677          sisting of size bytes, the last two bytes of which must
1678          be YY_END_OF_BUFFER_CHAR (ASCII NUL).  These  last  two
1679          bytes  are  not  scanned;  thus,  scanning  consists of
1680          base[0] through base[size-2], inclusive.
1681
1682          If you fail to set up base in this manner (i.e., forget
1683          the   final   two  YY_END_OF_BUFFER_CHAR  bytes),  then
1684          yy_scan_buffer()  returns  a  nil  pointer  instead  of
1685          creating a new input buffer.
1686
1687          The type yy_size_t is an integral type to which you can
1688          cast  an  integer expression reflecting the size of the
1689          buffer.
1690
1691END-OF-FILE RULES
1692     The special rule "<<EOF>>" indicates actions which are to be
1693     taken  when  an  end-of-file  is  encountered  and  yywrap()
1694     returns non-zero (i.e., indicates no further files  to  pro-
1695     cess).  The action must finish by doing one of four things:
1696
1697     -    assigning yyin to a new input file  (in  previous  ver-
1698          sions  of  flex,  after doing the assignment you had to
1699          call the special action YY_NEW_FILE; this is no  longer
1700          necessary);
1701
1702     -    executing a return statement;
1703
1704     -    executing the special yyterminate() action;
1705
1706     -    or,    switching    to    a    new     buffer     using
1707          yy_switch_to_buffer() as shown in the example above.
1708
1709
1710
1711
1712
1713Version 2.5          Last change: April 1995                   26
1714
1715
1716
1717
1718
1719
1720FLEX(1)                  USER COMMANDS                    FLEX(1)
1721
1722
1723
1724     <<EOF>> rules may not be used with other patterns; they  may
1725     only  be  qualified  with a list of start conditions.  If an
1726     unqualified <<EOF>> rule is given, it applies to  all  start
1727     conditions  which  do  not already have <<EOF>> actions.  To
1728     specify an <<EOF>> rule for only the  initial  start  condi-
1729     tion, use
1730
1731         <INITIAL><<EOF>>
1732
1733
1734     These rules are useful for  catching  things  like  unclosed
1735     comments.  An example:
1736
1737         %x quote
1738         %%
1739
1740         ...other rules for dealing with quotes...
1741
1742         <quote><<EOF>>   {
1743                  error( "unterminated quote" );
1744                  yyterminate();
1745                  }
1746         <<EOF>>  {
1747                  if ( *++filelist )
1748                      yyin = fopen( *filelist, "r" );
1749                  else
1750                     yyterminate();
1751                  }
1752
1753
1754MISCELLANEOUS MACROS
1755     The macro YY_USER_ACTION can be defined to provide an action
1756     which is always executed prior to the matched rule's action.
1757     For example, it could be #define'd to call a routine to con-
1758     vert  yytext to lower-case.  When YY_USER_ACTION is invoked,
1759     the variable yy_act gives the number  of  the  matched  rule
1760     (rules  are  numbered starting with 1).  Suppose you want to
1761     profile how often each of your rules is matched.   The  fol-
1762     lowing would do the trick:
1763
1764         #define YY_USER_ACTION ++ctr[yy_act]
1765
1766     where ctr is an array to hold the counts for  the  different
1767     rules.   Note  that  the  macro YY_NUM_RULES gives the total
1768     number of rules (including the default rule, even if you use
1769     -s), so a correct declaration for ctr is:
1770
1771         int ctr[YY_NUM_RULES];
1772
1773
1774     The macro YY_USER_INIT may be defined to provide  an  action
1775     which  is  always executed before the first scan (and before
1776
1777
1778
1779Version 2.5          Last change: April 1995                   27
1780
1781
1782
1783
1784
1785
1786FLEX(1)                  USER COMMANDS                    FLEX(1)
1787
1788
1789
1790     the scanner's internal initializations are done).  For exam-
1791     ple,  it  could  be used to call a routine to read in a data
1792     table or open a logging file.
1793
1794     The macro yy_set_interactive(is_interactive) can be used  to
1795     control  whether  the  current buffer is considered interac-
1796     tive. An interactive buffer is processed  more  slowly,  but
1797     must  be  used  when  the  scanner's  input source is indeed
1798     interactive to avoid problems due to waiting to fill buffers
1799     (see the discussion of the -I flag below).  A non-zero value
1800     in the macro invocation marks the buffer as  interactive,  a
1801     zero  value as non-interactive.  Note that use of this macro
1802     overrides  %option  always-interactive  or  %option   never-
1803     interactive  (see Options below).  yy_set_interactive() must
1804     be invoked prior to beginning to scan the buffer that is (or
1805     is not) to be considered interactive.
1806
1807     The macro yy_set_bol(at_bol) can be used to control  whether
1808     the  current  buffer's  scanning  context for the next token
1809     match is done as though at the beginning of a line.  A  non-
1810     zero macro argument makes rules anchored with
1811
1812     The macro YY_AT_BOL() returns true if the next token scanned
1813     from  the  current  buffer will have '^' rules active, false
1814     otherwise.
1815
1816     In the generated scanner, the actions are  all  gathered  in
1817     one  large  switch  statement  and separated using YY_BREAK,
1818     which may be redefined.  By default, it is simply a "break",
1819     to  separate  each  rule's action from the following rule's.
1820     Redefining  YY_BREAK  allows,  for  example,  C++  users  to
1821     #define  YY_BREAK  to  do  nothing (while being very careful
1822     that every rule ends with a "break" or a "return"!) to avoid
1823     suffering  from unreachable statement warnings where because
1824     a rule's action ends with "return", the YY_BREAK is inacces-
1825     sible.
1826
1827VALUES AVAILABLE TO THE USER
1828     This section summarizes the various values available to  the
1829     user in the rule actions.
1830
1831     -    char *yytext holds the text of the current  token.   It
1832          may  be  modified but not lengthened (you cannot append
1833          characters to the end).
1834
1835          If the special directive %array appears  in  the  first
1836          section  of  the  scanner  description,  then yytext is
1837          instead declared char yytext[YYLMAX], where YYLMAX is a
1838          macro  definition  that  you  can redefine in the first
1839          section if you don't like the default value  (generally
1840          8KB).    Using   %array   results  in  somewhat  slower
1841          scanners, but the value of  yytext  becomes  immune  to
1842
1843
1844
1845Version 2.5          Last change: April 1995                   28
1846
1847
1848
1849
1850
1851
1852FLEX(1)                  USER COMMANDS                    FLEX(1)
1853
1854
1855
1856          calls to input() and unput(), which potentially destroy
1857          its value when yytext  is  a  character  pointer.   The
1858          opposite of %array is %pointer, which is the default.
1859
1860          You cannot  use  %array  when  generating  C++  scanner
1861          classes (the -+ flag).
1862
1863     -    int yyleng holds the length of the current token.
1864
1865     -    FILE *yyin is the file  which  by  default  flex  reads
1866          from.   It  may  be  redefined  but doing so only makes
1867          sense before scanning begins or after an EOF  has  been
1868          encountered.  Changing it in the midst of scanning will
1869          have unexpected results since flex buffers  its  input;
1870          use  yyrestart()  instead.   Once  scanning  terminates
1871          because an end-of-file has been seen,  you  can  assign
1872          yyin  at  the  new input file and then call the scanner
1873          again to continue scanning.
1874
1875     -    void yyrestart( FILE *new_file ) may be called to point
1876          yyin at the new input file.  The switch-over to the new
1877          file is immediate (any previously buffered-up input  is
1878          lost).   Note  that calling yyrestart() with yyin as an
1879          argument thus throws away the current input buffer  and
1880          continues scanning the same input file.
1881
1882     -    FILE *yyout is the file to which ECHO actions are done.
1883          It can be reassigned by the user.
1884
1885     -    YY_CURRENT_BUFFER returns a YY_BUFFER_STATE  handle  to
1886          the current buffer.
1887
1888     -    YY_START returns an integer value corresponding to  the
1889          current start condition.  You can subsequently use this
1890          value with BEGIN to return to that start condition.
1891
1892INTERFACING WITH YACC
1893     One of the main uses of flex is as a companion to  the  yacc
1894     parser-generator.   yacc  parsers  expect  to call a routine
1895     named yylex() to find the next input token.  The routine  is
1896     supposed  to  return  the  type of the next token as well as
1897     putting any associated value in the global  yylval.  To  use
1898     flex  with  yacc,  one  specifies  the  -d option to yacc to
1899     instruct it to generate the file y.tab.h containing  defini-
1900     tions  of all the %tokens appearing in the yacc input.  This
1901     file is then included in the flex scanner.  For example,  if
1902     one of the tokens is "TOK_NUMBER", part of the scanner might
1903     look like:
1904
1905         %{
1906         #include "y.tab.h"
1907         %}
1908
1909
1910
1911Version 2.5          Last change: April 1995                   29
1912
1913
1914
1915
1916
1917
1918FLEX(1)                  USER COMMANDS                    FLEX(1)
1919
1920
1921
1922         %%
1923
1924         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
1925
1926
1927OPTIONS
1928     flex has the following options:
1929
1930     -b   Generate backing-up information to lex.backup. This  is
1931          a  list  of scanner states which require backing up and
1932          the input characters on which they do  so.   By  adding
1933          rules   one  can  remove  backing-up  states.   If  all
1934          backing-up states are eliminated  and  -Cf  or  -CF  is
1935          used, the generated scanner will run faster (see the -p
1936          flag).  Only users who wish to squeeze every last cycle
1937          out  of  their  scanners  need worry about this option.
1938          (See the section on Performance Considerations below.)
1939
1940     -c   is a do-nothing, deprecated option included  for  POSIX
1941          compliance.
1942
1943     -d   makes the generated scanner run in debug  mode.   When-
1944          ever   a   pattern   is   recognized   and  the  global
1945          yy_flex_debug is non-zero (which is the  default),  the
1946          scanner will write to stderr a line of the form:
1947
1948              --accepting rule at line 53 ("the matched text")
1949
1950          The line number refers to the location of the  rule  in
1951          the  file defining the scanner (i.e., the file that was
1952          fed to flex).  Messages are  also  generated  when  the
1953          scanner backs up, accepts the default rule, reaches the
1954          end of its input buffer (or encounters a NUL;  at  this
1955          point,  the  two  look the same as far as the scanner's
1956          concerned), or reaches an end-of-file.
1957
1958     -f   specifies fast scanner. No table  compression  is  done
1959          and  stdio  is bypassed.  The result is large but fast.
1960          This option is equivalent to -Cfr (see below).
1961
1962     -h   generates a "help" summary of flex's options to  stdout
1963          and then exits.  -? and --help are synonyms for -h.
1964
1965     -i   instructs flex to generate a case-insensitive  scanner.
1966          The  case  of  letters given in the flex input patterns
1967          will be ignored,  and  tokens  in  the  input  will  be
1968          matched  regardless of case.  The matched text given in
1969          yytext will have the preserved case (i.e., it will  not
1970          be folded).
1971
1972     -l   turns on maximum compatibility with the  original  AT&T
1973          lex  implementation.  Note that this does not mean full
1974
1975
1976
1977Version 2.5          Last change: April 1995                   30
1978
1979
1980
1981
1982
1983
1984FLEX(1)                  USER COMMANDS                    FLEX(1)
1985
1986
1987
1988          compatibility.  Use of this option costs a considerable
1989          amount  of  performance, and it cannot be used with the
1990          -+, -f, -F, -Cf, or -CF options.  For  details  on  the
1991          compatibilities  it provides, see the section "Incompa-
1992          tibilities With Lex And POSIX" below.  This option also
1993          results  in the name YY_FLEX_LEX_COMPAT being #define'd
1994          in the generated scanner.
1995
1996     -n   is another do-nothing, deprecated option included  only
1997          for POSIX compliance.
1998
1999     -p   generates a performance report to stderr.   The  report
2000          consists  of  comments  regarding  features of the flex
2001          input file which will cause a serious loss  of  perfor-
2002          mance  in  the resulting scanner.  If you give the flag
2003          twice, you will also get  comments  regarding  features
2004          that lead to minor performance losses.
2005
2006          Note that the use  of  REJECT,  %option  yylineno,  and
2007          variable  trailing context (see the Deficiencies / Bugs
2008          section  below)  entails  a   substantial   performance
2009          penalty;  use  of  yymore(), the ^ operator, and the -I
2010          flag entail minor performance penalties.
2011
2012     -s   causes the default rule (that unmatched  scanner  input
2013          is  echoed to stdout) to be suppressed.  If the scanner
2014          encounters input that does not match any of its  rules,
2015          it  aborts  with  an  error.  This option is useful for
2016          finding holes in a scanner's rule set.
2017
2018     -t   instructs flex to write the  scanner  it  generates  to
2019          standard output instead of lex.yy.c.
2020
2021     -v   specifies that flex should write to stderr a summary of
2022          statistics regarding the scanner it generates.  Most of
2023          the statistics are meaningless to the casual flex user,
2024          but the first line identifies the version of flex (same
2025          as reported by -V), and the next line  the  flags  used
2026          when  generating  the scanner, including those that are
2027          on by default.
2028
2029     -w   suppresses warning messages.
2030
2031     -B   instructs flex to generate a batch scanner,  the  oppo-
2032          site  of  interactive  scanners  generated  by  -I (see
2033          below).  In general, you use -B when  you  are  certain
2034          that your scanner will never be used interactively, and
2035          you want to squeeze a little more  performance  out  of
2036          it.   If your goal is instead to squeeze out a lot more
2037          performance, you  should   be  using  the  -Cf  or  -CF
2038          options  (discussed  below), which turn on -B automati-
2039          cally anyway.
2040
2041
2042
2043Version 2.5          Last change: April 1995                   31
2044
2045
2046
2047
2048
2049
2050FLEX(1)                  USER COMMANDS                    FLEX(1)
2051
2052
2053
2054     -F   specifies that the fast  scanner  table  representation
2055          should  be used (and stdio bypassed).  This representa-
2056          tion is about as fast as the full table  representation
2057          (-f),  and  for some sets of patterns will be consider-
2058          ably smaller (and for others, larger).  In general,  if
2059          the  pattern  set contains both "keywords" and a catch-
2060          all, "identifier" rule, such as in the set:
2061
2062              "case"    return TOK_CASE;
2063              "switch"  return TOK_SWITCH;
2064              ...
2065              "default" return TOK_DEFAULT;
2066              [a-z]+    return TOK_ID;
2067
2068          then you're better off using the full table representa-
2069          tion.  If only the "identifier" rule is present and you
2070          then use a hash table or some such to detect  the  key-
2071          words, you're better off using -F.
2072
2073          This option is equivalent to -CFr (see below).  It can-
2074          not be used with -+.
2075
2076     -I   instructs flex to generate an interactive scanner.   An
2077          interactive  scanner  is  one  that only looks ahead to
2078          decide what token has been  matched  if  it  absolutely
2079          must.  It turns out that always looking one extra char-
2080          acter ahead, even  if  the  scanner  has  already  seen
2081          enough text to disambiguate the current token, is a bit
2082          faster than only looking  ahead  when  necessary.   But
2083          scanners  that always look ahead give dreadful interac-
2084          tive performance; for example, when a user types a new-
2085          line,  it  is  not  recognized as a newline token until
2086          they enter another token, which often means  typing  in
2087          another whole line.
2088
2089          Flex scanners default to interactive unless you use the
2090          -Cf  or  -CF  table-compression  options  (see  below).
2091          That's because if you're looking  for  high-performance
2092          you  should  be  using  one of these options, so if you
2093          didn't, flex assumes you'd rather trade off  a  bit  of
2094          run-time    performance   for   intuitive   interactive
2095          behavior.  Note also that you cannot use -I in conjunc-
2096          tion  with  -Cf or -CF. Thus, this option is not really
2097          needed; it is on by default  for  all  those  cases  in
2098          which it is allowed.
2099
2100          You can force a scanner to not be interactive by  using
2101          -B (see above).
2102
2103     -L   instructs  flex  not  to  generate  #line   directives.
2104          Without this option, flex peppers the generated scanner
2105          with #line directives so error messages in the  actions
2106
2107
2108
2109Version 2.5          Last change: April 1995                   32
2110
2111
2112
2113
2114
2115
2116FLEX(1)                  USER COMMANDS                    FLEX(1)
2117
2118
2119
2120          will  be  correctly  located with respect to either the
2121          original flex input file (if the errors are due to code
2122          in  the  input  file),  or  lex.yy.c (if the errors are
2123          flex's fault -- you should report these sorts of errors
2124          to the email address given below).
2125
2126     -T   makes flex run in trace mode.  It will generate  a  lot
2127          of  messages to stderr concerning the form of the input
2128          and the resultant non-deterministic  and  deterministic
2129          finite  automata.   This  option  is  mostly for use in
2130          maintaining flex.
2131
2132     -V   prints the version number to stdout and exits.   --ver-
2133          sion is a synonym for -V.
2134
2135     -7   instructs flex to generate a 7-bit scanner,  i.e.,  one
2136          which  can  only  recognized  7-bit  characters  in its
2137          input.  The advantage of using -7 is that the scanner's
2138          tables  can  be  up to half the size of those generated
2139          using the -8 option (see below).  The  disadvantage  is
2140          that  such  scanners often hang or crash if their input
2141          contains an 8-bit character.
2142
2143          Note, however, that unless you  generate  your  scanner
2144          using  the -Cf or -CF table compression options, use of
2145          -7 will save only a small amount of  table  space,  and
2146          make  your  scanner considerably less portable.  Flex's
2147          default behavior is to generate an 8-bit scanner unless
2148          you  use the -Cf or -CF, in which case flex defaults to
2149          generating 7-bit scanners unless your site  was  always
2150          configured to generate 8-bit scanners (as will often be
2151          the case with non-USA sites).   You  can  tell  whether
2152          flex  generated a 7-bit or an 8-bit scanner by inspect-
2153          ing the flag summary in  the  -v  output  as  described
2154          above.
2155
2156          Note that if you use -Cfe or -CFe (those table compres-
2157          sion  options,  but  also  using equivalence classes as
2158          discussed see below), flex still defaults to generating
2159          an  8-bit scanner, since usually with these compression
2160          options full 8-bit tables are not much  more  expensive
2161          than 7-bit tables.
2162
2163     -8   instructs flex to generate an 8-bit scanner, i.e.,  one
2164          which  can  recognize  8-bit  characters.  This flag is
2165          only needed for scanners generated using -Cf or -CF, as
2166          otherwise  flex defaults to generating an 8-bit scanner
2167          anyway.
2168
2169          See the discussion  of  -7  above  for  flex's  default
2170          behavior  and  the  tradeoffs  between  7-bit and 8-bit
2171          scanners.
2172
2173
2174
2175Version 2.5          Last change: April 1995                   33
2176
2177
2178
2179
2180
2181
2182FLEX(1)                  USER COMMANDS                    FLEX(1)
2183
2184
2185
2186     -+   specifies that you want flex to generate a C++  scanner
2187          class.   See  the  section  on  Generating C++ Scanners
2188          below for details.
2189
2190     -C[aefFmr]
2191          controls the degree of table compression and, more gen-
2192          erally,  trade-offs  between  small  scanners  and fast
2193          scanners.
2194
2195          -Ca ("align") instructs flex to trade off larger tables
2196          in the generated scanner for faster performance because
2197          the elements of  the  tables  are  better  aligned  for
2198          memory  access and computation.  On some RISC architec-
2199          tures, fetching  and  manipulating  longwords  is  more
2200          efficient  than with smaller-sized units such as short-
2201          words.  This option can double the size of  the  tables
2202          used by your scanner.
2203
2204          -Ce directs  flex  to  construct  equivalence  classes,
2205          i.e.,  sets  of characters which have identical lexical
2206          properties (for example,  if  the  only  appearance  of
2207          digits  in  the  flex  input  is in the character class
2208          "[0-9]" then the digits '0', '1', ..., '9' will all  be
2209          put   in  the  same  equivalence  class).   Equivalence
2210          classes usually give dramatic reductions in  the  final
2211          table/object file sizes (typically a factor of 2-5) and
2212          are pretty cheap performance-wise  (one  array  look-up
2213          per character scanned).
2214
2215          -Cf specifies that the full scanner  tables  should  be
2216          generated - flex should not compress the tables by tak-
2217          ing advantages of similar transition functions for dif-
2218          ferent states.
2219
2220          -CF specifies that the alternate fast scanner represen-
2221          tation  (described  above  under the -F flag) should be
2222          used.  This option cannot be used with -+.
2223
2224          -Cm directs flex to construct meta-equivalence classes,
2225          which  are  sets of equivalence classes (or characters,
2226          if equivalence classes are not  being  used)  that  are
2227          commonly  used  together.  Meta-equivalence classes are
2228          often a big win when using compressed tables, but  they
2229          have  a  moderate  performance  impact (one or two "if"
2230          tests and one array look-up per character scanned).
2231
2232          -Cr causes the generated scanner to bypass use  of  the
2233          standard  I/O  library  (stdio)  for input.  Instead of
2234          calling fread() or getc(), the  scanner  will  use  the
2235          read()  system  call,  resulting  in a performance gain
2236          which varies from system to system, but in  general  is
2237          probably  negligible  unless  you are also using -Cf or
2238
2239
2240
2241Version 2.5          Last change: April 1995                   34
2242
2243
2244
2245
2246
2247
2248FLEX(1)                  USER COMMANDS                    FLEX(1)
2249
2250
2251
2252          -CF. Using -Cr can cause strange behavior if, for exam-
2253          ple,  you  read  from yyin using stdio prior to calling
2254          the scanner (because the  scanner  will  miss  whatever
2255          text  your  previous  reads  left  in  the  stdio input
2256          buffer).
2257
2258          -Cr has no effect if you define YY_INPUT (see The  Gen-
2259          erated Scanner above).
2260
2261          A lone -C specifies that the scanner tables  should  be
2262          compressed  but  neither  equivalence classes nor meta-
2263          equivalence classes should be used.
2264
2265          The options -Cf or  -CF  and  -Cm  do  not  make  sense
2266          together - there is no opportunity for meta-equivalence
2267          classes if the table is not being  compressed.   Other-
2268          wise  the  options may be freely mixed, and are cumula-
2269          tive.
2270
2271          The default setting is -Cem, which specifies that  flex
2272          should   generate   equivalence   classes   and   meta-
2273          equivalence classes.  This setting provides the highest
2274          degree   of  table  compression.   You  can  trade  off
2275          faster-executing scanners at the cost of larger  tables
2276          with the following generally being true:
2277
2278              slowest & smallest
2279                    -Cem
2280                    -Cm
2281                    -Ce
2282                    -C
2283                    -C{f,F}e
2284                    -C{f,F}
2285                    -C{f,F}a
2286              fastest & largest
2287
2288          Note that scanners with the smallest tables are usually
2289          generated and compiled the quickest, so during develop-
2290          ment you will usually want to use the default,  maximal
2291          compression.
2292
2293          -Cfe is often a good compromise between speed and  size
2294          for production scanners.
2295
2296     -ooutput
2297          directs flex to write the scanner to  the  file  output
2298          instead  of  lex.yy.c.  If  you  combine -o with the -t
2299          option, then the scanner is written to stdout  but  its
2300          #line directives (see the -L option above) refer to the
2301          file output.
2302
2303     -Pprefix
2304
2305
2306
2307Version 2.5          Last change: April 1995                   35
2308
2309
2310
2311
2312
2313
2314FLEX(1)                  USER COMMANDS                    FLEX(1)
2315
2316
2317
2318          changes the default yy prefix  used  by  flex  for  all
2319          globally-visible variable and function names to instead
2320          be prefix. For  example,  -Pfoo  changes  the  name  of
2321          yytext  to  footext.  It  also  changes the name of the
2322          default output file from lex.yy.c  to  lex.foo.c.  Here
2323          are all of the names affected:
2324
2325              yy_create_buffer
2326              yy_delete_buffer
2327              yy_flex_debug
2328              yy_init_buffer
2329              yy_flush_buffer
2330              yy_load_buffer_state
2331              yy_switch_to_buffer
2332              yyin
2333              yyleng
2334              yylex
2335              yylineno
2336              yyout
2337              yyrestart
2338              yytext
2339              yywrap
2340
2341          (If you are using a C++ scanner, then only  yywrap  and
2342          yyFlexLexer  are affected.) Within your scanner itself,
2343          you can still refer to the global variables  and  func-
2344          tions  using  either  version of their name; but exter-
2345          nally, they have the modified name.
2346
2347          This option lets you easily link together multiple flex
2348          programs  into the same executable.  Note, though, that
2349          using this option also renames  yywrap(),  so  you  now
2350          must either provide your own (appropriately-named) ver-
2351          sion of the routine for your scanner,  or  use  %option
2352          noyywrap,  as  linking with -lfl no longer provides one
2353          for you by default.
2354
2355     -Sskeleton_file
2356          overrides the default skeleton  file  from  which  flex
2357          constructs its scanners.  You'll never need this option
2358          unless you are doing flex maintenance or development.
2359
2360     flex also  provides  a  mechanism  for  controlling  options
2361     within  the  scanner  specification itself, rather than from
2362     the flex command-line.  This is done  by  including  %option
2363     directives  in  the  first section of the scanner specifica-
2364     tion.  You  can  specify  multiple  options  with  a  single
2365     %option directive, and multiple directives in the first sec-
2366     tion of your flex input file.
2367
2368     Most options are given simply as names, optionally  preceded
2369     by  the word "no" (with no intervening whitespace) to negate
2370
2371
2372
2373Version 2.5          Last change: April 1995                   36
2374
2375
2376
2377
2378
2379
2380FLEX(1)                  USER COMMANDS                    FLEX(1)
2381
2382
2383
2384     their meaning.  A number are equivalent  to  flex  flags  or
2385     their negation:
2386
2387         7bit            -7 option
2388         8bit            -8 option
2389         align           -Ca option
2390         backup          -b option
2391         batch           -B option
2392         c++             -+ option
2393
2394         caseful or
2395         case-sensitive  opposite of -i (default)
2396
2397         case-insensitive or
2398         caseless        -i option
2399
2400         debug           -d option
2401         default         opposite of -s option
2402         ecs             -Ce option
2403         fast            -F option
2404         full            -f option
2405         interactive     -I option
2406         lex-compat      -l option
2407         meta-ecs        -Cm option
2408         perf-report     -p option
2409         read            -Cr option
2410         stdout          -t option
2411         verbose         -v option
2412         warn            opposite of -w option
2413                         (use "%option nowarn" for -w)
2414
2415         array           equivalent to "%array"
2416         pointer         equivalent to "%pointer" (default)
2417
2418     Some %option's provide features otherwise not available:
2419
2420     always-interactive
2421          instructs flex to generate a scanner which always  con-
2422          siders  its input "interactive".  Normally, on each new
2423          input file the scanner calls isatty() in an attempt  to
2424          determine   whether   the  scanner's  input  source  is
2425          interactive and thus should be read a  character  at  a
2426          time.   When this option is used, however, then no such
2427          call is made.
2428
2429     main directs flex to provide a default  main()  program  for
2430          the  scanner,  which  simply calls yylex(). This option
2431          implies noyywrap (see below).
2432
2433     never-interactive
2434          instructs flex to generate a scanner which  never  con-
2435          siders  its input "interactive" (again, no call made to
2436
2437
2438
2439Version 2.5          Last change: April 1995                   37
2440
2441
2442
2443
2444
2445
2446FLEX(1)                  USER COMMANDS                    FLEX(1)
2447
2448
2449
2450          isatty()). This is the opposite of always-interactive.
2451
2452     stack
2453          enables the use of start condition  stacks  (see  Start
2454          Conditions above).
2455
2456     stdinit
2457          if set (i.e., %option  stdinit)  initializes  yyin  and
2458          yyout  to  stdin  and stdout, instead of the default of
2459          nil.  Some  existing  lex  programs  depend   on   this
2460          behavior,  even though it is not compliant with ANSI C,
2461          which does not require stdin and stdout to be  compile-
2462          time constant.
2463
2464     yylineno
2465          directs flex to generate a scanner that  maintains  the
2466          number  of  the current line read from its input in the
2467          global variable yylineno. This  option  is  implied  by
2468          %option lex-compat.
2469
2470     yywrap
2471          if unset (i.e., %option noyywrap),  makes  the  scanner
2472          not  call  yywrap()  upon  an  end-of-file,  but simply
2473          assume that there are no more files to scan (until  the
2474          user  points  yyin  at  a  new  file  and calls yylex()
2475          again).
2476
2477     flex scans your rule actions to determine  whether  you  use
2478     the  REJECT  or  yymore()  features.   The reject and yymore
2479     options are available to override its decision as to whether
2480     you  use  the options, either by setting them (e.g., %option
2481     reject) to indicate the feature is indeed used, or unsetting
2482     them  to  indicate  it  actually  is not used (e.g., %option
2483     noyymore).
2484
2485     Three options take string-delimited values, offset with '=':
2486
2487         %option outfile="ABC"
2488
2489     is equivalent to -oABC, and
2490
2491         %option prefix="XYZ"
2492
2493     is equivalent to -PXYZ. Finally,
2494
2495         %option yyclass="foo"
2496
2497     only applies when generating a C++ scanner ( -+ option).  It
2498     informs  flex  that  you  have  derived foo as a subclass of
2499     yyFlexLexer, so flex will place your actions in  the  member
2500     function  foo::yylex()  instead  of yyFlexLexer::yylex(). It
2501     also generates a yyFlexLexer::yylex() member  function  that
2502
2503
2504
2505Version 2.5          Last change: April 1995                   38
2506
2507
2508
2509
2510
2511
2512FLEX(1)                  USER COMMANDS                    FLEX(1)
2513
2514
2515
2516     emits      a      run-time      error      (by      invoking
2517     yyFlexLexer::LexerError()) if called.   See  Generating  C++
2518     Scanners, below, for additional information.
2519
2520     A number of options are available for lint purists who  want
2521     to  suppress the appearance of unneeded routines in the gen-
2522     erated scanner.  Each of  the  following,  if  unset  (e.g.,
2523     %option  nounput ), results in the corresponding routine not
2524     appearing in the generated scanner:
2525
2526         input, unput
2527         yy_push_state, yy_pop_state, yy_top_state
2528         yy_scan_buffer, yy_scan_bytes, yy_scan_string
2529
2530     (though yy_push_state()  and  friends  won't  appear  anyway
2531     unless you use %option stack).
2532
2533PERFORMANCE CONSIDERATIONS
2534     The main design goal of  flex  is  that  it  generate  high-
2535     performance  scanners.   It  has  been optimized for dealing
2536     well with large sets of rules.  Aside from  the  effects  on
2537     scanner  speed  of the table compression -C options outlined
2538     above, there are a number of options/actions  which  degrade
2539     performance.  These are, from most expensive to least:
2540
2541         REJECT
2542         %option yylineno
2543         arbitrary trailing context
2544
2545         pattern sets that require backing up
2546         %array
2547         %option interactive
2548         %option always-interactive
2549
2550         '^' beginning-of-line operator
2551         yymore()
2552
2553     with the first three all being quite expensive and the  last
2554     two  being  quite  cheap.   Note also that unput() is imple-
2555     mented as a routine call that potentially does quite  a  bit
2556     of  work,  while yyless() is a quite-cheap macro; so if just
2557     putting back some excess text you scanned, use yyless().
2558
2559     REJECT should be avoided at all costs  when  performance  is
2560     important.  It is a particularly expensive option.
2561
2562     Getting rid of backing up is messy and often may be an enor-
2563     mous  amount  of work for a complicated scanner.  In princi-
2564     pal,  one  begins  by  using  the  -b  flag  to  generate  a
2565     lex.backup file.  For example, on the input
2566
2567         %%
2568
2569
2570
2571Version 2.5          Last change: April 1995                   39
2572
2573
2574
2575
2576
2577
2578FLEX(1)                  USER COMMANDS                    FLEX(1)
2579
2580
2581
2582         foo        return TOK_KEYWORD;
2583         foobar     return TOK_KEYWORD;
2584
2585     the file looks like:
2586
2587         State #6 is non-accepting -
2588          associated rule line numbers:
2589                2       3
2590          out-transitions: [ o ]
2591          jam-transitions: EOF [ \001-n  p-\177 ]
2592
2593         State #8 is non-accepting -
2594          associated rule line numbers:
2595                3
2596          out-transitions: [ a ]
2597          jam-transitions: EOF [ \001-`  b-\177 ]
2598
2599         State #9 is non-accepting -
2600          associated rule line numbers:
2601                3
2602          out-transitions: [ r ]
2603          jam-transitions: EOF [ \001-q  s-\177 ]
2604
2605         Compressed tables always back up.
2606
2607     The first few lines tell us that there's a scanner state  in
2608     which  it  can  make  a  transition on an 'o' but not on any
2609     other character,  and  that  in  that  state  the  currently
2610     scanned text does not match any rule.  The state occurs when
2611     trying to match the rules found at lines  2  and  3  in  the
2612     input  file.  If the scanner is in that state and then reads
2613     something other than an 'o', it will have to back up to find
2614     a  rule  which is matched.  With a bit of headscratching one
2615     can see that this must be the state it's in when it has seen
2616     "fo".   When  this  has  happened,  if  anything  other than
2617     another 'o' is seen, the scanner will have  to  back  up  to
2618     simply match the 'f' (by the default rule).
2619
2620     The comment regarding State #8 indicates there's  a  problem
2621     when  "foob"  has  been  scanned.   Indeed, on any character
2622     other than an 'a', the scanner  will  have  to  back  up  to
2623     accept  "foo".  Similarly, the comment for State #9 concerns
2624     when "fooba" has been scanned and an 'r' does not follow.
2625
2626     The final comment reminds us that there's no point going  to
2627     all the trouble of removing backing up from the rules unless
2628     we're using -Cf or -CF, since there's  no  performance  gain
2629     doing so with compressed scanners.
2630
2631     The way to remove the backing up is to add "error" rules:
2632
2633         %%
2634
2635
2636
2637Version 2.5          Last change: April 1995                   40
2638
2639
2640
2641
2642
2643
2644FLEX(1)                  USER COMMANDS                    FLEX(1)
2645
2646
2647
2648         foo         return TOK_KEYWORD;
2649         foobar      return TOK_KEYWORD;
2650
2651         fooba       |
2652         foob        |
2653         fo          {
2654                     /* false alarm, not really a keyword */
2655                     return TOK_ID;
2656                     }
2657
2658
2659     Eliminating backing up among a list of keywords can also  be
2660     done using a "catch-all" rule:
2661
2662         %%
2663         foo         return TOK_KEYWORD;
2664         foobar      return TOK_KEYWORD;
2665
2666         [a-z]+      return TOK_ID;
2667
2668     This is usually the best solution when appropriate.
2669
2670     Backing up messages tend to cascade.  With a complicated set
2671     of  rules it's not uncommon to get hundreds of messages.  If
2672     one can decipher them, though, it often only takes  a  dozen
2673     or so rules to eliminate the backing up (though it's easy to
2674     make a mistake and have an error rule accidentally  match  a
2675     valid  token.   A  possible  future  flex feature will be to
2676     automatically add rules to eliminate backing up).
2677
2678     It's important to keep in mind that you gain the benefits of
2679     eliminating  backing up only if you eliminate every instance
2680     of backing up.  Leaving just one means you gain nothing.
2681
2682     Variable trailing context (where both the leading and trail-
2683     ing  parts  do  not  have a fixed length) entails almost the
2684     same performance loss as  REJECT  (i.e.,  substantial).   So
2685     when possible a rule like:
2686
2687         %%
2688         mouse|rat/(cat|dog)   run();
2689
2690     is better written:
2691
2692         %%
2693         mouse/cat|dog         run();
2694         rat/cat|dog           run();
2695
2696     or as
2697
2698         %%
2699         mouse|rat/cat         run();
2700
2701
2702
2703Version 2.5          Last change: April 1995                   41
2704
2705
2706
2707
2708
2709
2710FLEX(1)                  USER COMMANDS                    FLEX(1)
2711
2712
2713
2714         mouse|rat/dog         run();
2715
2716     Note that here the special '|' action does not  provide  any
2717     savings,  and can even make things worse (see Deficiencies /
2718     Bugs below).
2719
2720     Another area where the user can increase a scanner's perfor-
2721     mance  (and  one that's easier to implement) arises from the
2722     fact that the longer the  tokens  matched,  the  faster  the
2723     scanner will run.  This is because with long tokens the pro-
2724     cessing of most input characters takes place in the  (short)
2725     inner  scanning  loop, and does not often have to go through
2726     the additional work of setting up the  scanning  environment
2727     (e.g.,  yytext)  for  the  action.  Recall the scanner for C
2728     comments:
2729
2730         %x comment
2731         %%
2732                 int line_num = 1;
2733
2734         "/*"         BEGIN(comment);
2735
2736         <comment>[^*\n]*
2737         <comment>"*"+[^*/\n]*
2738         <comment>\n             ++line_num;
2739         <comment>"*"+"/"        BEGIN(INITIAL);
2740
2741     This could be sped up by writing it as:
2742
2743         %x comment
2744         %%
2745                 int line_num = 1;
2746
2747         "/*"         BEGIN(comment);
2748
2749         <comment>[^*\n]*
2750         <comment>[^*\n]*\n      ++line_num;
2751         <comment>"*"+[^*/\n]*
2752         <comment>"*"+[^*/\n]*\n ++line_num;
2753         <comment>"*"+"/"        BEGIN(INITIAL);
2754
2755     Now instead of each  newline  requiring  the  processing  of
2756     another  action,  recognizing  the newlines is "distributed"
2757     over the other rules to keep the matched  text  as  long  as
2758     possible.   Note  that  adding  rules does not slow down the
2759     scanner!  The speed of the scanner  is  independent  of  the
2760     number  of  rules or (modulo the considerations given at the
2761     beginning of this section) how  complicated  the  rules  are
2762     with regard to operators such as '*' and '|'.
2763
2764     A final example in speeding up a scanner: suppose  you  want
2765     to  scan through a file containing identifiers and keywords,
2766
2767
2768
2769Version 2.5          Last change: April 1995                   42
2770
2771
2772
2773
2774
2775
2776FLEX(1)                  USER COMMANDS                    FLEX(1)
2777
2778
2779
2780     one per line and with no other  extraneous  characters,  and
2781     recognize all the keywords.  A natural first approach is:
2782
2783         %%
2784         asm      |
2785         auto     |
2786         break    |
2787         ... etc ...
2788         volatile |
2789         while    /* it's a keyword */
2790
2791         .|\n     /* it's not a keyword */
2792
2793     To eliminate the back-tracking, introduce a catch-all rule:
2794
2795         %%
2796         asm      |
2797         auto     |
2798         break    |
2799         ... etc ...
2800         volatile |
2801         while    /* it's a keyword */
2802
2803         [a-z]+   |
2804         .|\n     /* it's not a keyword */
2805
2806     Now, if it's guaranteed that there's exactly  one  word  per
2807     line,  then  we  can reduce the total number of matches by a
2808     half by merging in the recognition of newlines with that  of
2809     the other tokens:
2810
2811         %%
2812         asm\n    |
2813         auto\n   |
2814         break\n  |
2815         ... etc ...
2816         volatile\n |
2817         while\n  /* it's a keyword */
2818
2819         [a-z]+\n |
2820         .|\n     /* it's not a keyword */
2821
2822     One has to be careful here,  as  we  have  now  reintroduced
2823     backing  up  into the scanner.  In particular, while we know
2824     that there will never be any characters in the input  stream
2825     other  than letters or newlines, flex can't figure this out,
2826     and it will plan for possibly needing to back up when it has
2827     scanned  a  token like "auto" and then the next character is
2828     something other than a newline or a letter.   Previously  it
2829     would  then  just match the "auto" rule and be done, but now
2830     it has no "auto" rule, only a "auto\n" rule.   To  eliminate
2831     the possibility of backing up, we could either duplicate all
2832
2833
2834
2835Version 2.5          Last change: April 1995                   43
2836
2837
2838
2839
2840
2841
2842FLEX(1)                  USER COMMANDS                    FLEX(1)
2843
2844
2845
2846     rules but without final newlines, or, since we never  expect
2847     to  encounter  such  an  input  and therefore don't how it's
2848     classified, we can introduce one more catch-all  rule,  this
2849     one which doesn't include a newline:
2850
2851         %%
2852         asm\n    |
2853         auto\n   |
2854         break\n  |
2855         ... etc ...
2856         volatile\n |
2857         while\n  /* it's a keyword */
2858
2859         [a-z]+\n |
2860         [a-z]+   |
2861         .|\n     /* it's not a keyword */
2862
2863     Compiled with -Cf, this is about as fast as one  can  get  a
2864     flex scanner to go for this particular problem.
2865
2866     A final note: flex is slow when matching NUL's, particularly
2867     when  a  token  contains multiple NUL's.  It's best to write
2868     rules which match short amounts of text if it's  anticipated
2869     that the text will often include NUL's.
2870
2871     Another final note regarding performance: as mentioned above
2872     in  the section How the Input is Matched, dynamically resiz-
2873     ing yytext to accommodate huge  tokens  is  a  slow  process
2874     because  it presently requires that the (huge) token be res-
2875     canned from the beginning.  Thus if  performance  is  vital,
2876     you  should  attempt to match "large" quantities of text but
2877     not "huge" quantities, where the cutoff between the  two  is
2878     at about 8K characters/token.
2879
2880GENERATING C++ SCANNERS
2881     flex provides two different ways to  generate  scanners  for
2882     use  with C++.  The first way is to simply compile a scanner
2883     generated by flex using a C++ compiler instead of a  C  com-
2884     piler.   You  should  not  encounter any compilations errors
2885     (please report any you find to the email  address  given  in
2886     the  Author  section  below).   You can then use C++ code in
2887     your rule actions instead of C code.  Note that the  default
2888     input  source  for  your  scanner  remains yyin, and default
2889     echoing is still done to yyout. Both of these remain FILE  *
2890     variables and not C++ streams.
2891
2892     You can also use flex to generate a C++ scanner class, using
2893     the  -+  option  (or,  equivalently,  %option c++), which is
2894     automatically specified if the name of the  flex  executable
2895     ends  in a '+', such as flex++. When using this option, flex
2896     defaults to generating the scanner  to  the  file  lex.yy.cc
2897     instead  of  lex.yy.c.  The  generated  scanner includes the
2898
2899
2900
2901Version 2.5          Last change: April 1995                   44
2902
2903
2904
2905
2906
2907
2908FLEX(1)                  USER COMMANDS                    FLEX(1)
2909
2910
2911
2912     header file FlexLexer.h, which defines the interface to  two
2913     C++ classes.
2914
2915     The first class, FlexLexer, provides an abstract base  class
2916     defining  the  general scanner class interface.  It provides
2917     the following member functions:
2918
2919     const char* YYText()
2920          returns the text of the most  recently  matched  token,
2921          the equivalent of yytext.
2922
2923     int YYLeng()
2924          returns the length of the most recently matched  token,
2925          the equivalent of yyleng.
2926
2927     int lineno() const
2928          returns the current  input  line  number  (see  %option
2929          yylineno), or 1 if %option yylineno was not used.
2930
2931     void set_debug( int flag )
2932          sets the debugging flag for the scanner, equivalent  to
2933          assigning  to  yy_flex_debug  (see  the Options section
2934          above).  Note that you must  build  the  scanner  using
2935          %option debug to include debugging information in it.
2936
2937     int debug() const
2938          returns the current setting of the debugging flag.
2939
2940     Also   provided   are   member   functions   equivalent   to
2941     yy_switch_to_buffer(),  yy_create_buffer() (though the first
2942     argument is an istream* object pointer  and  not  a  FILE*),
2943     yy_flush_buffer(),   yy_delete_buffer(),   and   yyrestart()
2944     (again, the first argument is a istream* object pointer).
2945
2946     The second class  defined  in  FlexLexer.h  is  yyFlexLexer,
2947     which  is  derived  from FlexLexer. It defines the following
2948     additional member functions:
2949
2950     yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
2951          constructs a yyFlexLexer object using the given streams
2952          for  input  and  output.  If not specified, the streams
2953          default to cin and cout, respectively.
2954
2955     virtual int yylex()
2956          performs the same role is  yylex()  does  for  ordinary
2957          flex  scanners:  it  scans  the input stream, consuming
2958          tokens, until a rule's action returns a value.  If  you
2959          derive a subclass S from yyFlexLexer and want to access
2960          the member functions and variables of S inside yylex(),
2961          then you need to use %option yyclass="S" to inform flex
2962          that you will be using that subclass instead of yyFlex-
2963          Lexer.   In   this   case,   rather   than   generating
2964
2965
2966
2967Version 2.5          Last change: April 1995                   45
2968
2969
2970
2971
2972
2973
2974FLEX(1)                  USER COMMANDS                    FLEX(1)
2975
2976
2977
2978          yyFlexLexer::yylex(), flex  generates  S::yylex()  (and
2979          also  generates a dummy yyFlexLexer::yylex() that calls
2980          yyFlexLexer::LexerError() if called).
2981
2982     virtual void switch_streams(istream* new_in = 0,
2983          ostream* new_out = 0)  reassigns  yyin  to  new_in  (if
2984          non-nil)  and  yyout  to  new_out (ditto), deleting the
2985          previous input buffer if yyin is reassigned.
2986
2987     int yylex( istream* new_in, ostream* new_out = 0 )
2988          first switches the input  streams  via  switch_streams(
2989          new_in,  new_out  )  and  then  returns  the  value  of
2990          yylex().
2991
2992     In addition, yyFlexLexer  defines  the  following  protected
2993     virtual  functions which you can redefine in derived classes
2994     to tailor the scanner:
2995
2996     virtual int LexerInput( char* buf, int max_size )
2997          reads up to max_size characters into  buf  and  returns
2998          the  number  of  characters  read.  To indicate end-of-
2999          input, return 0 characters.   Note  that  "interactive"
3000          scanners  (see  the  -B  and -I flags) define the macro
3001          YY_INTERACTIVE. If you redefine LexerInput()  and  need
3002          to  take  different actions depending on whether or not
3003          the scanner might  be  scanning  an  interactive  input
3004          source,  you can test for the presence of this name via
3005          #ifdef.
3006
3007     virtual void LexerOutput( const char* buf, int size )
3008          writes out size characters from the buffer buf,  which,
3009          while NUL-terminated, may also contain "internal" NUL's
3010          if the scanner's rules can match  text  with  NUL's  in
3011          them.
3012
3013     virtual void LexerError( const char* msg )
3014          reports a fatal error message.  The default version  of
3015          this function writes the message to the stream cerr and
3016          exits.
3017
3018     Note that a yyFlexLexer object contains its entire  scanning
3019     state.   Thus  you  can use such objects to create reentrant
3020     scanners.  You can instantiate  multiple  instances  of  the
3021     same  yyFlexLexer  class,  and you can also combine multiple
3022     C++ scanner classes together in the same program  using  the
3023     -P option discussed above.
3024
3025     Finally, note that the %array feature is  not  available  to
3026     C++ scanner classes; you must use %pointer (the default).
3027
3028     Here is an example of a simple C++ scanner:
3029
3030
3031
3032
3033Version 2.5          Last change: April 1995                   46
3034
3035
3036
3037
3038
3039
3040FLEX(1)                  USER COMMANDS                    FLEX(1)
3041
3042
3043
3044             // An example of using the flex C++ scanner class.
3045
3046         %{
3047         int mylineno = 0;
3048         %}
3049
3050         string  \"[^\n"]+\"
3051
3052         ws      [ \t]+
3053
3054         alpha   [A-Za-z]
3055         dig     [0-9]
3056         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
3057         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
3058         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
3059         number  {num1}|{num2}
3060
3061         %%
3062
3063         {ws}    /* skip blanks and tabs */
3064
3065         "/*"    {
3066                 int c;
3067
3068                 while((c = yyinput()) != 0)
3069                     {
3070                     if(c == '\n')
3071                         ++mylineno;
3072
3073                     else if(c == '*')
3074                         {
3075                         if((c = yyinput()) == '/')
3076                             break;
3077                         else
3078                             unput(c);
3079                         }
3080                     }
3081                 }
3082
3083         {number}  cout << "number " << YYText() << '\n';
3084
3085         \n        mylineno++;
3086
3087         {name}    cout << "name " << YYText() << '\n';
3088
3089         {string}  cout << "string " << YYText() << '\n';
3090
3091         %%
3092
3093         int main( int /* argc */, char** /* argv */ )
3094             {
3095             FlexLexer* lexer = new yyFlexLexer;
3096
3097
3098
3099Version 2.5          Last change: April 1995                   47
3100
3101
3102
3103
3104
3105
3106FLEX(1)                  USER COMMANDS                    FLEX(1)
3107
3108
3109
3110             while(lexer->yylex() != 0)
3111                 ;
3112             return 0;
3113             }
3114     If you want to create multiple  (different)  lexer  classes,
3115     you  use  the -P flag (or the prefix= option) to rename each
3116     yyFlexLexer to some other xxFlexLexer. You then can  include
3117     <FlexLexer.h>  in  your  other sources once per lexer class,
3118     first renaming yyFlexLexer as follows:
3119
3120         #undef yyFlexLexer
3121         #define yyFlexLexer xxFlexLexer
3122         #include <FlexLexer.h>
3123
3124         #undef yyFlexLexer
3125         #define yyFlexLexer zzFlexLexer
3126         #include <FlexLexer.h>
3127
3128     if, for example, you used %option  prefix="xx"  for  one  of
3129     your scanners and %option prefix="zz" for the other.
3130
3131     IMPORTANT: the present form of the scanning class is experi-
3132     mental and may change considerably between major releases.
3133
3134INCOMPATIBILITIES WITH LEX AND POSIX
3135     flex is a rewrite of the AT&T Unix lex tool (the two  imple-
3136     mentations  do not share any code, though), with some exten-
3137     sions and incompatibilities, both of which are of concern to
3138     those who wish to write scanners acceptable to either imple-
3139     mentation.  Flex is  fully  compliant  with  the  POSIX  lex
3140     specification,   except   that   when  using  %pointer  (the
3141     default), a call to unput() destroys the contents of yytext,
3142     which is counter to the POSIX specification.
3143
3144     In this section we discuss all of the known areas of  incom-
3145     patibility  between flex, AT&T lex, and the POSIX specifica-
3146     tion.
3147
3148     flex's -l option turns on  maximum  compatibility  with  the
3149     original  AT&T  lex  implementation,  at the cost of a major
3150     loss in the generated scanner's performance.  We note  below
3151     which incompatibilities can be overcome using the -l option.
3152
3153     flex is fully compatible with lex with the following  excep-
3154     tions:
3155
3156     -    The undocumented lex scanner internal variable yylineno
3157          is not supported unless -l or %option yylineno is used.
3158
3159          yylineno should be maintained on  a  per-buffer  basis,
3160          rather  than  a  per-scanner  (single  global variable)
3161          basis.
3162
3163
3164
3165Version 2.5          Last change: April 1995                   48
3166
3167
3168
3169
3170
3171
3172FLEX(1)                  USER COMMANDS                    FLEX(1)
3173
3174
3175
3176          yylineno is not part of the POSIX specification.
3177
3178     -    The input() routine is not redefinable, though  it  may
3179          be  called  to  read  characters following whatever has
3180          been matched by a rule.  If input() encounters an  end-
3181          of-file  the  normal  yywrap()  processing  is done.  A
3182          ``real'' end-of-file is returned by input() as EOF.
3183
3184          Input is instead controlled by  defining  the  YY_INPUT
3185          macro.
3186
3187          The flex restriction that input() cannot  be  redefined
3188          is  in  accordance  with the POSIX specification, which
3189          simply does not specify  any  way  of  controlling  the
3190          scanner's input other than by making an initial assign-
3191          ment to yyin.
3192
3193     -    The unput() routine is not redefinable.  This  restric-
3194          tion is in accordance with POSIX.
3195
3196     -    flex scanners are not as reentrant as lex scanners.  In
3197          particular,  if  you have an interactive scanner and an
3198          interrupt handler which long-jumps out of the  scanner,
3199          and  the  scanner is subsequently called again, you may
3200          get the following message:
3201
3202              fatal flex scanner internal error--end of buffer missed
3203
3204          To reenter the scanner, first use
3205
3206              yyrestart( yyin );
3207
3208          Note that this call will throw away any buffered input;
3209          usually  this  isn't  a  problem  with  an  interactive
3210          scanner.
3211
3212          Also note that flex C++ scanner classes are  reentrant,
3213          so  if  using  C++ is an option for you, you should use
3214          them instead.  See "Generating C++ Scanners" above  for
3215          details.
3216
3217     -    output() is not supported.  Output from the ECHO  macro
3218          is done to the file-pointer yyout (default stdout).
3219
3220          output() is not part of the POSIX specification.
3221
3222     -    lex does not support exclusive start  conditions  (%x),
3223          though they are in the POSIX specification.
3224
3225     -    When definitions are expanded, flex  encloses  them  in
3226          parentheses.  With lex, the following:
3227
3228
3229
3230
3231Version 2.5          Last change: April 1995                   49
3232
3233
3234
3235
3236
3237
3238FLEX(1)                  USER COMMANDS                    FLEX(1)
3239
3240
3241
3242              NAME    [A-Z][A-Z0-9]*
3243              %%
3244              foo{NAME}?      printf( "Found it\n" );
3245              %%
3246
3247          will not match the string "foo" because when the  macro
3248          is  expanded  the rule is equivalent to "foo[A-Z][A-Z0-
3249          9]*?" and the precedence is such that the '?' is  asso-
3250          ciated  with  "[A-Z0-9]*".  With flex, the rule will be
3251          expanded to "foo([A-Z][A-Z0-9]*)?" and  so  the  string
3252          "foo" will match.
3253
3254          Note that if the definition begins with ^ or ends  with
3255          $  then  it  is not expanded with parentheses, to allow
3256          these operators to appear in definitions without losing
3257          their  special  meanings.   But the <s>, /, and <<EOF>>
3258          operators cannot be used in a flex definition.
3259
3260          Using -l results in the lex behavior of no  parentheses
3261          around the definition.
3262
3263          The POSIX  specification  is  that  the  definition  be
3264          enclosed in parentheses.
3265
3266     -    Some implementations of lex allow a  rule's  action  to
3267          begin  on  a  separate  line, if the rule's pattern has
3268          trailing whitespace:
3269
3270              %%
3271              foo|bar<space here>
3272                { foobar_action(); }
3273
3274          flex does not support this feature.
3275
3276     -    The lex %r (generate a Ratfor scanner)  option  is  not
3277          supported.  It is not part of the POSIX specification.
3278
3279     -    After a call to unput(), yytext is undefined until  the
3280          next  token  is  matched,  unless the scanner was built
3281          using %array. This is not the  case  with  lex  or  the
3282          POSIX specification.  The -l option does away with this
3283          incompatibility.
3284
3285     -    The precedence of the {} (numeric  range)  operator  is
3286          different.   lex  interprets  "abc{1,3}" as "match one,
3287          two, or  three  occurrences  of  'abc'",  whereas  flex
3288          interprets  it  as "match 'ab' followed by one, two, or
3289          three occurrences of 'c'".  The latter is in  agreement
3290          with the POSIX specification.
3291
3292     -    The precedence of the ^  operator  is  different.   lex
3293          interprets  "^foo|bar"  as  "match  either 'foo' at the
3294
3295
3296
3297Version 2.5          Last change: April 1995                   50
3298
3299
3300
3301
3302
3303
3304FLEX(1)                  USER COMMANDS                    FLEX(1)
3305
3306
3307
3308          beginning of a line, or 'bar' anywhere",  whereas  flex
3309          interprets  it  as "match either 'foo' or 'bar' if they
3310          come at the beginning of a line".   The  latter  is  in
3311          agreement with the POSIX specification.
3312
3313     -    The special table-size declarations  such  as  %a  sup-
3314          ported  by  lex are not required by flex scanners; flex
3315          ignores them.
3316
3317     -    The name FLEX_SCANNER is #define'd so scanners  may  be
3318          written  for use with either flex or lex. Scanners also
3319          include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
3320          indicating  which version of flex generated the scanner
3321          (for example, for the 2.5 release, these defines  would
3322          be 2 and 5 respectively).
3323
3324     The following flex features are not included in lex  or  the
3325     POSIX specification:
3326
3327         C++ scanners
3328         %option
3329         start condition scopes
3330         start condition stacks
3331         interactive/non-interactive scanners
3332         yy_scan_string() and friends
3333         yyterminate()
3334         yy_set_interactive()
3335         yy_set_bol()
3336         YY_AT_BOL()
3337         <<EOF>>
3338         <*>
3339         YY_DECL
3340         YY_START
3341         YY_USER_ACTION
3342         YY_USER_INIT
3343         #line directives
3344         %{}'s around actions
3345         multiple actions on a line
3346
3347     plus almost all of the flex flags.  The last feature in  the
3348     list  refers to the fact that with flex you can put multiple
3349     actions on the same line, separated with semi-colons,  while
3350     with lex, the following
3351
3352         foo    handle_foo(); ++num_foos_seen;
3353
3354     is (rather surprisingly) truncated to
3355
3356         foo    handle_foo();
3357
3358     flex does not truncate the action.   Actions  that  are  not
3359     enclosed  in  braces are simply terminated at the end of the
3360
3361
3362
3363Version 2.5          Last change: April 1995                   51
3364
3365
3366
3367
3368
3369
3370FLEX(1)                  USER COMMANDS                    FLEX(1)
3371
3372
3373
3374     line.
3375
3376DIAGNOSTICS
3377     warning, rule cannot be matched  indicates  that  the  given
3378     rule  cannot  be matched because it follows other rules that
3379     will always match the same text as it.  For example, in  the
3380     following  "foo" cannot be matched because it comes after an
3381     identifier "catch-all" rule:
3382
3383         [a-z]+    got_identifier();
3384         foo       got_foo();
3385
3386     Using REJECT in a scanner suppresses this warning.
3387
3388     warning, -s option given but default  rule  can  be  matched
3389     means  that  it  is  possible  (perhaps only in a particular
3390     start condition) that the default  rule  (match  any  single
3391     character)  is  the  only  one  that will match a particular
3392     input.  Since -s was given, presumably this is not intended.
3393
3394     reject_used_but_not_detected          undefined           or
3395     yymore_used_but_not_detected  undefined  -  These errors can
3396     occur at compile time.  They indicate that the scanner  uses
3397     REJECT  or yymore() but that flex failed to notice the fact,
3398     meaning that flex scanned the first two sections looking for
3399     occurrences  of  these  actions  and failed to find any, but
3400     somehow you snuck some in (via a #include  file,  for  exam-
3401     ple).   Use  %option reject or %option yymore to indicate to
3402     flex that you really do use these features.
3403
3404     flex scanner jammed - a scanner compiled with -s has encoun-
3405     tered  an  input  string  which wasn't matched by any of its
3406     rules.  This error can also occur due to internal problems.
3407
3408     token too large, exceeds YYLMAX - your scanner  uses  %array
3409     and one of its rules matched a string longer than the YYLMAX
3410     constant (8K bytes by default).  You can increase the  value
3411     by  #define'ing  YYLMAX  in  the definitions section of your
3412     flex input.
3413
3414     scanner requires -8 flag to use the  character  'x'  -  Your
3415     scanner specification includes recognizing the 8-bit charac-
3416     ter 'x' and you did  not  specify  the  -8  flag,  and  your
3417     scanner  defaulted  to 7-bit because you used the -Cf or -CF
3418     table compression options.  See the  discussion  of  the  -7
3419     flag for details.
3420
3421     flex scanner push-back overflow - you used unput()  to  push
3422     back  so  much text that the scanner's buffer could not hold
3423     both the pushed-back text and the current token  in  yytext.
3424     Ideally  the scanner should dynamically resize the buffer in
3425     this case, but at present it does not.
3426
3427
3428
3429Version 2.5          Last change: April 1995                   52
3430
3431
3432
3433
3434
3435
3436FLEX(1)                  USER COMMANDS                    FLEX(1)
3437
3438
3439
3440     input buffer overflow, can't enlarge buffer because  scanner
3441     uses  REJECT  -  the  scanner  was  working  on  matching an
3442     extremely large token and needed to expand the input buffer.
3443     This doesn't work with scanners that use REJECT.
3444
3445     fatal flex scanner internal error--end of  buffer  missed  -
3446     This  can  occur  in  an  scanner which is reentered after a
3447     long-jump has jumped out (or over) the scanner's  activation
3448     frame.  Before reentering the scanner, use:
3449
3450         yyrestart( yyin );
3451
3452     or, as noted above, switch to using the C++ scanner class.
3453
3454     too many start conditions in <> you listed more start condi-
3455     tions  in a <> construct than exist (so you must have listed
3456     at least one of them twice).
3457
3458FILES
3459     -lfl library with which scanners must be linked.
3460
3461     lex.yy.c
3462          generated scanner (called lexyy.c on some systems).
3463
3464     lex.yy.cc
3465          generated C++ scanner class, when using -+.
3466
3467     <FlexLexer.h>
3468          header file defining the C++ scanner base class,  Flex-
3469          Lexer, and its derived class, yyFlexLexer.
3470
3471     flex.skl
3472          skeleton scanner.  This file is only used when building
3473          flex, not when flex executes.
3474
3475     lex.backup
3476          backing-up information for -b flag (called  lex.bck  on
3477          some systems).
3478
3479DEFICIENCIES / BUGS
3480     Some trailing context patterns cannot  be  properly  matched
3481     and  generate  warning  messages  ("dangerous  trailing con-
3482     text").  These are patterns where the ending  of  the  first
3483     part  of  the rule matches the beginning of the second part,
3484     such as "zx*/xy*", where the 'x*' matches  the  'x'  at  the
3485     beginning  of  the  trailing  context.  (Note that the POSIX
3486     draft states that the text matched by such patterns is unde-
3487     fined.)
3488
3489     For some trailing context rules, parts  which  are  actually
3490     fixed-length  are  not  recognized  as  such, leading to the
3491     abovementioned performance loss.  In particular, parts using
3492
3493
3494
3495Version 2.5          Last change: April 1995                   53
3496
3497
3498
3499
3500
3501
3502FLEX(1)                  USER COMMANDS                    FLEX(1)
3503
3504
3505
3506     '|'   or  {n}  (such  as  "foo{3}")  are  always  considered
3507     variable-length.
3508
3509     Combining trailing context with the special '|'  action  can
3510     result  in fixed trailing context being turned into the more
3511     expensive variable trailing context.  For  example,  in  the
3512     following:
3513
3514         %%
3515         abc      |
3516         xyz/def
3517
3518
3519     Use of unput() invalidates yytext  and  yyleng,  unless  the
3520     %array directive or the -l option has been used.
3521
3522     Pattern-matching  of  NUL's  is  substantially  slower  than
3523     matching other characters.
3524
3525     Dynamic resizing of the input buffer is slow, as it  entails
3526     rescanning  all the text matched so far by the current (gen-
3527     erally huge) token.
3528
3529     Due to both buffering of input and  read-ahead,  you  cannot
3530     intermix  calls to <stdio.h> routines, such as, for example,
3531     getchar(), with flex rules and  expect  it  to  work.   Call
3532     input() instead.
3533
3534     The total table entries listed by the -v flag  excludes  the
3535     number  of  table  entries needed to determine what rule has
3536     been matched.  The number of entries is equal to the  number
3537     of  DFA states if the scanner does not use REJECT, and some-
3538     what greater than the number of states if it does.
3539
3540     REJECT cannot be used with the -f or -F options.
3541
3542     The flex internal algorithms need documentation.
3543
3544SEE ALSO
3545     lex(1), yacc(1), sed(1), awk(1).
3546
3547     John Levine,  Tony  Mason,  and  Doug  Brown,  Lex  &  Yacc,
3548     O'Reilly and Associates.  Be sure to get the 2nd edition.
3549
3550     M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
3551
3552     Alfred Aho, Ravi Sethi and Jeffrey Ullman, Compilers:  Prin-
3553     ciples,   Techniques   and   Tools,  Addison-Wesley  (1986).
3554     Describes  the  pattern-matching  techniques  used  by  flex
3555     (deterministic finite automata).
3556
3557
3558
3559
3560
3561Version 2.5          Last change: April 1995                   54
3562
3563
3564
3565
3566
3567
3568FLEX(1)                  USER COMMANDS                    FLEX(1)
3569
3570
3571
3572AUTHOR
3573     Vern Paxson, with the help of many ideas and  much  inspira-
3574     tion  from Van Jacobson.  Original version by Jef Poskanzer.
3575     The fast table representation is a partial implementation of
3576     a  design done by Van Jacobson.  The implementation was done
3577     by Kevin Gong and Vern Paxson.
3578
3579     Thanks to the many flex beta-testers, feedbackers, and  con-
3580     tributors,  especially Francois Pinard, Casey Leedom, Robert
3581     Abramovitz,  Stan  Adermann,  Terry  Allen,  David   Barker-
3582     Plummer,  John  Basrai,  Neal  Becker,  Nelson  H.F.  Beebe,
3583     benson@odi.com, Karl Berry, Peter A. Bigot, Simon Blanchard,
3584     Keith  Bostic,  Frederic Brehm, Ian Brockbank, Kin Cho, Nick
3585     Christopher, Brian Clapper, J.T.  Conklin,  Jason  Coughlin,
3586     Bill  Cox,  Nick  Cropper, Dave Curtis, Scott David Daniels,
3587     Chris  G.  Demetriou,  Theo  Deraadt,  Mike  Donahue,  Chuck
3588     Doucette,  Tom  Epperly,  Leo  Eskin,  Chris  Faylor,  Chris
3589     Flatters, Jon Forrest, Jeffrey Friedl, Joe Gayda,  Kaveh  R.
3590     Ghazi,  Wolfgang  Glunz, Eric Goldman, Christopher M. Gould,
3591     Ulrich Grepel, Peer Griebel, Jan  Hajic,  Charles  Hemphill,
3592     NORO  Hideo,  Jarkko  Hietaniemi, Scott Hofmann, Jeff Honig,
3593     Dana Hudes, Eric Hughes,  John  Interrante,  Ceriel  Jacobs,
3594     Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry
3595     Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O  Kane,
3596     Amir  Katz, ken@ken.hilco.com, Kevin B. Kenny, Steve Kirsch,
3597     Winfried Koenig, Marq  Kole,  Ronald  Lamprecht,  Greg  Lee,
3598     Rohan  Lenard, Craig Leres, John Levine, Steve Liddle, David
3599     Loffredo, Mike Long, Mohamed el Lozy, Brian  Madsen,  Malte,
3600     Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn,
3601     Jim Meyering,  R.  Alexander  Milowski,  Erik  Naggum,  G.T.
3602     Nicol,  Landon  Noll,  James  Nordby,  Marc  Nozell, Richard
3603     Ohnemus, Karsten Pahnke, Sven Panne,  Roland  Pesch,  Walter
3604     Pelissero,  Gaumond  Pierre, Esmond Pitt, Jef Poskanzer, Joe
3605     Rahmeh, Jarmo Raiha, Frederic Raimbault,  Pat  Rankin,  Rick
3606     Richardson,  Kevin  Rodgers,  Kai  Uwe  Rommel, Jim Roskind,
3607     Alberto Santini,  Andreas  Scherer,  Darrell  Schiebel,  Raf
3608     Schietekat,  Doug  Schmidt,  Philippe  Schnoebelen,  Andreas
3609     Schwab, Larry Schwimmer, Alex Siegel, Eckehard  Stolz,  Jan-
3610     Erik  Strvmquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
3611     Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi  Tsai,
3612     Paul  Tuinenga,  Gary  Weik, Frank Whaley, Gerhard Wilhelms,
3613     Kent Williams, Ken Yap,  Ron  Zellar,  Nathan  Zelle,  David
3614     Zuhn,  and  those whose names have slipped my marginal mail-
3615     archiving skills but whose contributions are appreciated all
3616     the same.
3617
3618     Thanks to Keith Bostic, Jon  Forrest,  Noah  Friedman,  John
3619     Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol,
3620     Francois Pinard, Rich Salz, and Richard  Stallman  for  help
3621     with various distribution headaches.
3622
3623
3624
3625
3626
3627Version 2.5          Last change: April 1995                   55
3628
3629
3630
3631
3632
3633
3634FLEX(1)                  USER COMMANDS                    FLEX(1)
3635
3636
3637
3638     Thanks to Esmond Pitt and Earle Horton for  8-bit  character
3639     support; to Benson Margulies and Fred Burke for C++ support;
3640     to Kent Williams and Tom Epperly for C++ class  support;  to
3641     Ove  Ewerlid  for  support  of NUL's; and to Eric Hughes for
3642     support of multiple buffers.
3643
3644     This work was primarily done when I was with the  Real  Time
3645     Systems  Group at the Lawrence Berkeley Laboratory in Berke-
3646     ley, CA.  Many  thanks  to  all  there  for  the  support  I
3647     received.
3648
3649     Send comments to vern@ee.lbl.gov.
3650
3651
3652
3653
3654
3655
3656
3657
3658
3659
3660
3661
3662
3663
3664
3665
3666
3667
3668
3669
3670
3671
3672
3673
3674
3675
3676
3677
3678
3679
3680
3681
3682
3683
3684
3685
3686
3687
3688
3689
3690
3691
3692
3693Version 2.5          Last change: April 1995                   56
3694