1This is Info file flex.info, produced by Makeinfo-1.55 from the input
2file flex.texi.
3
4START-INFO-DIR-ENTRY
5* Flex: (flex).         A fast scanner generator.
6END-INFO-DIR-ENTRY
7
8   This file documents Flex.
9
10   Copyright (c) 1990 The Regents of the University of California.  All
11rights reserved.
12
13   This code is derived from software contributed to Berkeley by Vern
14Paxson.
15
16   The United States Government has rights in this work pursuant to
17contract no. DE-AC03-76SF00098 between the United States Department of
18Energy and the University of California.
19
20   Redistribution and use in source and binary forms with or without
21modification are permitted provided that: (1) source distributions
22retain this entire copyright notice and comment, and (2) distributions
23including binaries display the following acknowledgement:  "This
24product includes software developed by the University of California,
25Berkeley and its contributors" in the documentation or other materials
26provided with the distribution and in all advertising materials
27mentioning features or use of this software.  Neither the name of the
28University nor the names of its contributors may be used to endorse or
29promote products derived from this software without specific prior
30written permission.
31
32   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
33WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
34MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
35
36
37File: flex.info,  Node: Top,  Next: Name,  Prev: (dir),  Up: (dir)
38
39flex
40****
41
42   This manual documents `flex'.  It covers release 2.5.
43
44* Menu:
45
46* Name::                        Name
47* Synopsis::                    Synopsis
48* Overview::                    Overview
49* Description::                 Description
50* Examples::                    Some simple examples
51* Format::                      Format of the input file
52* Patterns::                    Patterns
53* Matching::                    How the input is matched
54* Actions::                     Actions
55* Generated scanner::           The generated scanner
56* Start conditions::            Start conditions
57* Multiple buffers::            Multiple input buffers
58* End-of-file rules::           End-of-file rules
59* Miscellaneous::               Miscellaneous macros
60* User variables::              Values available to the user
61* YACC interface::              Interfacing with `yacc'
62* Options::                     Options
63* Performance::                 Performance considerations
64* C++::                         Generating C++ scanners
65* Incompatibilities::           Incompatibilities with `lex' and POSIX
66* Diagnostics::                 Diagnostics
67* Files::                       Files
68* Deficiencies::                Deficiencies / Bugs
69* See also::                    See also
70* Author::                      Author
71
72
73File: flex.info,  Node: Name,  Next: Synopsis,  Prev: Top,  Up: Top
74
75Name
76====
77
78   flex - fast lexical analyzer generator
79
80
81File: flex.info,  Node: Synopsis,  Next: Overview,  Prev: Name,  Up: Top
82
83Synopsis
84========
85
86     flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
87     [--help --version] [FILENAME ...]
88
89
90File: flex.info,  Node: Overview,  Next: Description,  Prev: Synopsis,  Up: Top
91
92Overview
93========
94
95   This manual describes `flex', a tool for generating programs that
96perform pattern-matching on text.  The manual includes both tutorial
97and reference sections:
98
99Description
100     a brief overview of the tool
101
102Some Simple Examples
103Format Of The Input File
104Patterns
105     the extended regular expressions used by flex
106
107How The Input Is Matched
108     the rules for determining what has been matched
109
110Actions
111     how to specify what to do when a pattern is matched
112
113The Generated Scanner
114     details regarding the scanner that flex produces; how to control
115     the input source
116
117Start Conditions
118     introducing context into your scanners, and managing
119     "mini-scanners"
120
121Multiple Input Buffers
122     how to manipulate multiple input sources; how to scan from strings
123     instead of files
124
125End-of-file Rules
126     special rules for matching the end of the input
127
128Miscellaneous Macros
129     a summary of macros available to the actions
130
131Values Available To The User
132     a summary of values available to the actions
133
134Interfacing With Yacc
135     connecting flex scanners together with yacc parsers
136
137Options
138     flex command-line options, and the "%option" directive
139
140Performance Considerations
141     how to make your scanner go as fast as possible
142
143Generating C++ Scanners
144     the (experimental) facility for generating C++ scanner classes
145
146Incompatibilities With Lex And POSIX
147     how flex differs from AT&T lex and the POSIX lex standard
148
149Diagnostics
150     those error messages produced by flex (or scanners it generates)
151     whose meanings might not be apparent
152
153Files
154     files used by flex
155
156Deficiencies / Bugs
157     known problems with flex
158
159See Also
160     other documentation, related tools
161
162Author
163     includes contact information
164
165
166File: flex.info,  Node: Description,  Next: Examples,  Prev: Overview,  Up: Top
167
168Description
169===========
170
171   `flex' is a tool for generating "scanners": programs which
172recognized lexical patterns in text.  `flex' reads the given input
173files, or its standard input if no file names are given, for a
174description of a scanner to generate.  The description is in the form
175of pairs of regular expressions and C code, called "rules". `flex'
176generates as output a C source file, `lex.yy.c', which defines a
177routine `yylex()'.  This file is compiled and linked with the `-lfl'
178library to produce an executable.  When the executable is run, it
179analyzes its input for occurrences of the regular expressions.
180Whenever it finds one, it executes the corresponding C code.
181
182
183File: flex.info,  Node: Examples,  Next: Format,  Prev: Description,  Up: Top
184
185Some simple examples
186====================
187
188   First some simple examples to get the flavor of how one uses `flex'.
189The following `flex' input specifies a scanner which whenever it
190encounters the string "username" will replace it with the user's login
191name:
192
193     %%
194     username    printf( "%s", getlogin() );
195
196   By default, any text not matched by a `flex' scanner is copied to
197the output, so the net effect of this scanner is to copy its input file
198to its output with each occurrence of "username" expanded.  In this
199input, there is just one rule.  "username" is the PATTERN and the
200"printf" is the ACTION.  The "%%" marks the beginning of the rules.
201
202   Here's another simple example:
203
204             int num_lines = 0, num_chars = 0;
205     
206     %%
207     \n      ++num_lines; ++num_chars;
208     .       ++num_chars;
209     
210     %%
211     main()
212             {
213             yylex();
214             printf( "# of lines = %d, # of chars = %d\n",
215                     num_lines, num_chars );
216             }
217
218   This scanner counts the number of characters and the number of lines
219in its input (it produces no output other than the final report on the
220counts).  The first line declares two globals, "num_lines" and
221"num_chars", which are accessible both inside `yylex()' and in the
222`main()' routine declared after the second "%%".  There are two rules,
223one which matches a newline ("\n") and increments both the line count
224and the character count, and one which matches any character other than
225a newline (indicated by the "." regular expression).
226
227   A somewhat more complicated example:
228
229     /* scanner for a toy Pascal-like language */
230     
231     %{
232     /* need this for the call to atof() below */
233     #include <math.h>
234     %}
235     
236     DIGIT    [0-9]
237     ID       [a-z][a-z0-9]*
238     
239     %%
240     
241     {DIGIT}+    {
242                 printf( "An integer: %s (%d)\n", yytext,
243                         atoi( yytext ) );
244                 }
245     
246     {DIGIT}+"."{DIGIT}*        {
247                 printf( "A float: %s (%g)\n", yytext,
248                         atof( yytext ) );
249                 }
250     
251     if|then|begin|end|procedure|function        {
252                 printf( "A keyword: %s\n", yytext );
253                 }
254     
255     {ID}        printf( "An identifier: %s\n", yytext );
256     
257     "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
258     
259     "{"[^}\n]*"}"     /* eat up one-line comments */
260     
261     [ \t\n]+          /* eat up whitespace */
262     
263     .           printf( "Unrecognized character: %s\n", yytext );
264     
265     %%
266     
267     main( argc, argv )
268     int argc;
269     char **argv;
270         {
271         ++argv, --argc;  /* skip over program name */
272         if ( argc > 0 )
273                 yyin = fopen( argv[0], "r" );
274         else
275                 yyin = stdin;
276     
277         yylex();
278         }
279
280   This is the beginnings of a simple scanner for a language like
281Pascal.  It identifies different types of TOKENS and reports on what it
282has seen.
283
284   The details of this example will be explained in the following
285sections.
286
287
288File: flex.info,  Node: Format,  Next: Patterns,  Prev: Examples,  Up: Top
289
290Format of the input file
291========================
292
293   The `flex' input file consists of three sections, separated by a
294line with just `%%' in it:
295
296     definitions
297     %%
298     rules
299     %%
300     user code
301
302   The "definitions" section contains declarations of simple "name"
303definitions to simplify the scanner specification, and declarations of
304"start conditions", which are explained in a later section.  Name
305definitions have the form:
306
307     name definition
308
309   The "name" is a word beginning with a letter or an underscore ('_')
310followed by zero or more letters, digits, '_', or '-' (dash).  The
311definition is taken to begin at the first non-white-space character
312following the name and continuing to the end of the line.  The
313definition can subsequently be referred to using "{name}", which will
314expand to "(definition)".  For example,
315
316     DIGIT    [0-9]
317     ID       [a-z][a-z0-9]*
318
319defines "DIGIT" to be a regular expression which matches a single
320digit, and "ID" to be a regular expression which matches a letter
321followed by zero-or-more letters-or-digits.  A subsequent reference to
322
323     {DIGIT}+"."{DIGIT}*
324
325is identical to
326
327     ([0-9])+"."([0-9])*
328
329and matches one-or-more digits followed by a '.' followed by
330zero-or-more digits.
331
332   The RULES section of the `flex' input contains a series of rules of
333the form:
334
335     pattern   action
336
337where the pattern must be unindented and the action must begin on the
338same line.
339
340   See below for a further description of patterns and actions.
341
342   Finally, the user code section is simply copied to `lex.yy.c'
343verbatim.  It is used for companion routines which call or are called
344by the scanner.  The presence of this section is optional; if it is
345missing, the second `%%' in the input file may be skipped, too.
346
347   In the definitions and rules sections, any *indented* text or text
348enclosed in `%{' and `%}' is copied verbatim to the output (with the
349`%{}''s removed).  The `%{}''s must appear unindented on lines by
350themselves.
351
352   In the rules section, any indented or %{} text appearing before the
353first rule may be used to declare variables which are local to the
354scanning routine and (after the declarations) code which is to be
355executed whenever the scanning routine is entered.  Other indented or
356%{} text in the rule section is still copied to the output, but its
357meaning is not well-defined and it may well cause compile-time errors
358(this feature is present for `POSIX' compliance; see below for other
359such features).
360
361   In the definitions section (but not in the rules section), an
362unindented comment (i.e., a line beginning with "/*") is also copied
363verbatim to the output up to the next "*/".
364
365
366File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
367
368Patterns
369========
370
371   The patterns in the input are written using an extended set of
372regular expressions.  These are:
373
374`x'
375     match the character `x'
376
377`.'
378     any character (byte) except newline
379
380`[xyz]'
381     a "character class"; in this case, the pattern matches either an
382     `x', a `y', or a `z'
383
384`[abj-oZ]'
385     a "character class" with a range in it; matches an `a', a `b', any
386     letter from `j' through `o', or a `Z'
387
388`[^A-Z]'
389     a "negated character class", i.e., any character but those in the
390     class.  In this case, any character EXCEPT an uppercase letter.
391
392`[^A-Z\n]'
393     any character EXCEPT an uppercase letter or a newline
394
395`R*'
396     zero or more R's, where R is any regular expression
397
398`R+'
399     one or more R's
400
401`R?'
402     zero or one R's (that is, "an optional R")
403
404`R{2,5}'
405     anywhere from two to five R's
406
407`R{2,}'
408     two or more R's
409
410`R{4}'
411     exactly 4 R's
412
413`{NAME}'
414     the expansion of the "NAME" definition (see above)
415
416`"[xyz]\"foo"'
417     the literal string: `[xyz]"foo'
418
419`\X'
420     if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
421     interpretation of \X.  Otherwise, a literal `X' (used to escape
422     operators such as `*')
423
424`\0'
425     a NUL character (ASCII code 0)
426
427`\123'
428     the character with octal value 123
429
430`\x2a'
431     the character with hexadecimal value `2a'
432
433`(R)'
434     match an R; parentheses are used to override precedence (see below)
435
436`RS'
437     the regular expression R followed by the regular expression S;
438     called "concatenation"
439
440`R|S'
441     either an R or an S
442
443`R/S'
444     an R but only if it is followed by an S.  The text matched by S is
445     included when determining whether this rule is the "longest
446     match", but is then returned to the input before the action is
447     executed.  So the action only sees the text matched by R.  This
448     type of pattern is called "trailing context".  (There are some
449     combinations of `R/S' that `flex' cannot match correctly; see
450     notes in the Deficiencies / Bugs section below regarding
451     "dangerous trailing context".)
452
453`^R'
454     an R, but only at the beginning of a line (i.e., which just
455     starting to scan, or right after a newline has been scanned).
456
457`R$'
458     an R, but only at the end of a line (i.e., just before a newline).
459     Equivalent to "R/\n".
460
461     Note that flex's notion of "newline" is exactly whatever the C
462     compiler used to compile flex interprets '\n' as; in particular,
463     on some DOS systems you must either filter out \r's in the input
464     yourself, or explicitly use R/\r\n for "r$".
465
466`<S>R'
467     an R, but only in start condition S (see below for discussion of
468     start conditions) <S1,S2,S3>R same, but in any of start conditions
469     S1, S2, or S3
470
471`<*>R'
472     an R in any start condition, even an exclusive one.
473
474`<<EOF>>'
475     an end-of-file <S1,S2><<EOF>> an end-of-file when in start
476     condition S1 or S2
477
478   Note that inside of a character class, all regular expression
479operators lose their special meaning except escape ('\') and the
480character class operators, '-', ']', and, at the beginning of the
481class, '^'.
482
483   The regular expressions listed above are grouped according to
484precedence, from highest precedence at the top to lowest at the bottom.
485Those grouped together have equal precedence.  For example,
486
487     foo|bar*
488
489is the same as
490
491     (foo)|(ba(r*))
492
493since the '*' operator has higher precedence than concatenation, and
494concatenation higher than alternation ('|').  This pattern therefore
495matches *either* the string "foo" *or* the string "ba" followed by
496zero-or-more r's.  To match "foo" or zero-or-more "bar"'s, use:
497
498     foo|(bar)*
499
500and to match zero-or-more "foo"'s-or-"bar"'s:
501
502     (foo|bar)*
503
504   In addition to characters and ranges of characters, character
505classes can also contain character class "expressions".  These are
506expressions enclosed inside `[': and `:'] delimiters (which themselves
507must appear between the '[' and ']' of the character class; other
508elements may occur inside the character class, too).  The valid
509expressions are:
510
511     [:alnum:] [:alpha:] [:blank:]
512     [:cntrl:] [:digit:] [:graph:]
513     [:lower:] [:print:] [:punct:]
514     [:space:] [:upper:] [:xdigit:]
515
516   These expressions all designate a set of characters equivalent to
517the corresponding standard C `isXXX' function.  For example,
518`[:alnum:]' designates those characters for which `isalnum()' returns
519true - i.e., any alphabetic or numeric.  Some systems don't provide
520`isblank()', so flex defines `[:blank:]' as a blank or a tab.
521
522   For example, the following character classes are all equivalent:
523
524     [[:alnum:]]
525     [[:alpha:][:digit:]
526     [[:alpha:]0-9]
527     [a-zA-Z0-9]
528
529   If your scanner is case-insensitive (the `-i' flag), then
530`[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
531
532   Some notes on patterns:
533
534   - A negated character class such as the example "[^A-Z]" above *will
535     match a newline* unless "\n" (or an equivalent escape sequence) is
536     one of the characters explicitly present in the negated character
537     class (e.g., "[^A-Z\n]").  This is unlike how many other regular
538     expression tools treat negated character classes, but
539     unfortunately the inconsistency is historically entrenched.
540     Matching newlines means that a pattern like [^"]* can match the
541     entire input unless there's another quote in the input.
542
543   - A rule can have at most one instance of trailing context (the '/'
544     operator or the '$' operator).  The start condition, '^', and
545     "<<EOF>>" patterns can only occur at the beginning of a pattern,
546     and, as well as with '/' and '$', cannot be grouped inside
547     parentheses.  A '^' which does not occur at the beginning of a
548     rule or a '$' which does not occur at the end of a rule loses its
549     special properties and is treated as a normal character.
550
551     The following are illegal:
552
553          foo/bar$
554          <sc1>foo<sc2>bar
555
556     Note that the first of these, can be written "foo/bar\n".
557
558     The following will result in '$' or '^' being treated as a normal
559     character:
560
561          foo|(bar$)
562          foo|^bar
563
564     If what's wanted is a "foo" or a bar-followed-by-a-newline, the
565     following could be used (the special '|' action is explained
566     below):
567
568          foo      |
569          bar$     /* action goes here */
570
571     A similar trick will work for matching a foo or a
572     bar-at-the-beginning-of-a-line.
573
574
575File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
576
577How the input is matched
578========================
579
580   When the generated scanner is run, it analyzes its input looking for
581strings which match any of its patterns.  If it finds more than one
582match, it takes the one matching the most text (for trailing context
583rules, this includes the length of the trailing part, even though it
584will then be returned to the input).  If it finds two or more matches
585of the same length, the rule listed first in the `flex' input file is
586chosen.
587
588   Once the match is determined, the text corresponding to the match
589(called the TOKEN) is made available in the global character pointer
590`yytext', and its length in the global integer `yyleng'.  The ACTION
591corresponding to the matched pattern is then executed (a more detailed
592description of actions follows), and then the remaining input is
593scanned for another match.
594
595   If no match is found, then the "default rule" is executed: the next
596character in the input is considered matched and copied to the standard
597output.  Thus, the simplest legal `flex' input is:
598
599     %%
600
601   which generates a scanner that simply copies its input (one
602character at a time) to its output.
603
604   Note that `yytext' can be defined in two different ways: either as a
605character *pointer* or as a character *array*.  You can control which
606definition `flex' uses by including one of the special directives
607`%pointer' or `%array' in the first (definitions) section of your flex
608input.  The default is `%pointer', unless you use the `-l' lex
609compatibility option, in which case `yytext' will be an array.  The
610advantage of using `%pointer' is substantially faster scanning and no
611buffer overflow when matching very large tokens (unless you run out of
612dynamic memory).  The disadvantage is that you are restricted in how
613your actions can modify `yytext' (see the next section), and calls to
614the `unput()' function destroys the present contents of `yytext', which
615can be a considerable porting headache when moving between different
616`lex' versions.
617
618   The advantage of `%array' is that you can then modify `yytext' to
619your heart's content, and calls to `unput()' do not destroy `yytext'
620(see below).  Furthermore, existing `lex' programs sometimes access
621`yytext' externally using declarations of the form:
622     extern char yytext[];
623   This definition is erroneous when used with `%pointer', but correct
624for `%array'.
625
626   `%array' defines `yytext' to be an array of `YYLMAX' characters,
627which defaults to a fairly large value.  You can change the size by
628simply #define'ing `YYLMAX' to a different value in the first section
629of your `flex' input.  As mentioned above, with `%pointer' yytext grows
630dynamically to accommodate large tokens.  While this means your
631`%pointer' scanner can accommodate very large tokens (such as matching
632entire blocks of comments), bear in mind that each time the scanner
633must resize `yytext' it also must rescan the entire token from the
634beginning, so matching such tokens can prove slow.  `yytext' presently
635does *not* dynamically grow if a call to `unput()' results in too much
636text being pushed back; instead, a run-time error results.
637
638   Also note that you cannot use `%array' with C++ scanner classes (the
639`c++' option; see below).
640
641
642File: flex.info,  Node: Actions,  Next: Generated scanner,  Prev: Matching,  Up: Top
643
644Actions
645=======
646
647   Each pattern in a rule has a corresponding action, which can be any
648arbitrary C statement.  The pattern ends at the first non-escaped
649whitespace character; the remainder of the line is its action.  If the
650action is empty, then when the pattern is matched the input token is
651simply discarded.  For example, here is the specification for a program
652which deletes all occurrences of "zap me" from its input:
653
654     %%
655     "zap me"
656
657   (It will copy all other characters in the input to the output since
658they will be matched by the default rule.)
659
660   Here is a program which compresses multiple blanks and tabs down to
661a single blank, and throws away whitespace found at the end of a line:
662
663     %%
664     [ \t]+        putchar( ' ' );
665     [ \t]+$       /* ignore this token */
666
667   If the action contains a '{', then the action spans till the
668balancing '}' is found, and the action may cross multiple lines.
669`flex' knows about C strings and comments and won't be fooled by braces
670found within them, but also allows actions to begin with `%{' and will
671consider the action to be all the text up to the next `%}' (regardless
672of ordinary braces inside the action).
673
674   An action consisting solely of a vertical bar ('|') means "same as
675the action for the next rule." See below for an illustration.
676
677   Actions can include arbitrary C code, including `return' statements
678to return a value to whatever routine called `yylex()'.  Each time
679`yylex()' is called it continues processing tokens from where it last
680left off until it either reaches the end of the file or executes a
681return.
682
683   Actions are free to modify `yytext' except for lengthening it
684(adding characters to its end-these will overwrite later characters in
685the input stream).  This however does not apply when using `%array'
686(see above); in that case, `yytext' may be freely modified in any way.
687
688   Actions are free to modify `yyleng' except they should not do so if
689the action also includes use of `yymore()' (see below).
690
691   There are a number of special directives which can be included
692within an action:
693
694   - `ECHO' copies yytext to the scanner's output.
695
696   - `BEGIN' followed by the name of a start condition places the
697     scanner in the corresponding start condition (see below).
698
699   - `REJECT' directs the scanner to proceed on to the "second best"
700     rule which matched the input (or a prefix of the input).  The rule
701     is chosen as described above in "How the Input is Matched", and
702     `yytext' and `yyleng' set up appropriately.  It may either be one
703     which matched as much text as the originally chosen rule but came
704     later in the `flex' input file, or one which matched less text.
705     For example, the following will both count the words in the input
706     and call the routine special() whenever "frob" is seen:
707
708                  int word_count = 0;
709          %%
710          
711          frob        special(); REJECT;
712          [^ \t\n]+   ++word_count;
713
714     Without the `REJECT', any "frob"'s in the input would not be
715     counted as words, since the scanner normally executes only one
716     action per token.  Multiple `REJECT's' are allowed, each one
717     finding the next best choice to the currently active rule.  For
718     example, when the following scanner scans the token "abcd", it
719     will write "abcdabcaba" to the output:
720
721          %%
722          a        |
723          ab       |
724          abc      |
725          abcd     ECHO; REJECT;
726          .|\n     /* eat up any unmatched character */
727
728     (The first three rules share the fourth's action since they use
729     the special '|' action.)  `REJECT' is a particularly expensive
730     feature in terms of scanner performance; if it is used in *any* of
731     the scanner's actions it will slow down *all* of the scanner's
732     matching.  Furthermore, `REJECT' cannot be used with the `-Cf' or
733     `-CF' options (see below).
734
735     Note also that unlike the other special actions, `REJECT' is a
736     *branch*; code immediately following it in the action will *not*
737     be executed.
738
739   - `yymore()' tells the scanner that the next time it matches a rule,
740     the corresponding token should be *appended* onto the current
741     value of `yytext' rather than replacing it.  For example, given
742     the input "mega-kludge" the following will write
743     "mega-mega-kludge" to the output:
744
745          %%
746          mega-    ECHO; yymore();
747          kludge   ECHO;
748
749     First "mega-" is matched and echoed to the output.  Then "kludge"
750     is matched, but the previous "mega-" is still hanging around at
751     the beginning of `yytext' so the `ECHO' for the "kludge" rule will
752     actually write "mega-kludge".
753
754   Two notes regarding use of `yymore()'.  First, `yymore()' depends on
755the value of `yyleng' correctly reflecting the size of the current
756token, so you must not modify `yyleng' if you are using `yymore()'.
757Second, the presence of `yymore()' in the scanner's action entails a
758minor performance penalty in the scanner's matching speed.
759
760   - `yyless(n)' returns all but the first N characters of the current
761     token back to the input stream, where they will be rescanned when
762     the scanner looks for the next match.  `yytext' and `yyleng' are
763     adjusted appropriately (e.g., `yyleng' will now be equal to N ).
764     For example, on the input "foobar" the following will write out
765     "foobarbar":
766
767          %%
768          foobar    ECHO; yyless(3);
769          [a-z]+    ECHO;
770
771     An argument of 0 to `yyless' will cause the entire current input
772     string to be scanned again.  Unless you've changed how the scanner
773     will subsequently process its input (using `BEGIN', for example),
774     this will result in an endless loop.
775
776     Note that `yyless' is a macro and can only be used in the flex
777     input file, not from other source files.
778
779   - `unput(c)' puts the character `c' back onto the input stream.  It
780     will be the next character scanned.  The following action will
781     take the current token and cause it to be rescanned enclosed in
782     parentheses.
783
784          {
785          int i;
786          /* Copy yytext because unput() trashes yytext */
787          char *yycopy = strdup( yytext );
788          unput( ')' );
789          for ( i = yyleng - 1; i >= 0; --i )
790              unput( yycopy[i] );
791          unput( '(' );
792          free( yycopy );
793          }
794
795     Note that since each `unput()' puts the given character back at
796     the *beginning* of the input stream, pushing back strings must be
797     done back-to-front.  An important potential problem when using
798     `unput()' is that if you are using `%pointer' (the default), a
799     call to `unput()' *destroys* the contents of `yytext', starting
800     with its rightmost character and devouring one character to the
801     left with each call.  If you need the value of yytext preserved
802     after a call to `unput()' (as in the above example), you must
803     either first copy it elsewhere, or build your scanner using
804     `%array' instead (see How The Input Is Matched).
805
806     Finally, note that you cannot put back `EOF' to attempt to mark
807     the input stream with an end-of-file.
808
809   - `input()' reads the next character from the input stream.  For
810     example, the following is one way to eat up C comments:
811
812          %%
813          "/*"        {
814                      register int c;
815          
816                      for ( ; ; )
817                          {
818                          while ( (c = input()) != '*' &&
819                                  c != EOF )
820                              ;    /* eat up text of comment */
821          
822                          if ( c == '*' )
823                              {
824                              while ( (c = input()) == '*' )
825                                  ;
826                              if ( c == '/' )
827                                  break;    /* found the end */
828                              }
829          
830                          if ( c == EOF )
831                              {
832                              error( "EOF in comment" );
833                              break;
834                              }
835                          }
836                      }
837
838     (Note that if the scanner is compiled using `C++', then `input()'
839     is instead referred to as `yyinput()', in order to avoid a name
840     clash with the `C++' stream by the name of `input'.)
841
842   - YY_FLUSH_BUFFER flushes the scanner's internal buffer so that the
843     next time the scanner attempts to match a token, it will first
844     refill the buffer using `YY_INPUT' (see The Generated Scanner,
845     below).  This action is a special case of the more general
846     `yy_flush_buffer()' function, described below in the section
847     Multiple Input Buffers.
848
849   - `yyterminate()' can be used in lieu of a return statement in an
850     action.  It terminates the scanner and returns a 0 to the
851     scanner's caller, indicating "all done".  By default,
852     `yyterminate()' is also called when an end-of-file is encountered.
853     It is a macro and may be redefined.
854
855
856File: flex.info,  Node: Generated scanner,  Next: Start conditions,  Prev: Actions,  Up: Top
857
858The generated scanner
859=====================
860
861   The output of `flex' is the file `lex.yy.c', which contains the
862scanning routine `yylex()', a number of tables used by it for matching
863tokens, and a number of auxiliary routines and macros.  By default,
864`yylex()' is declared as follows:
865
866     int yylex()
867         {
868         ... various definitions and the actions in here ...
869         }
870
871   (If your environment supports function prototypes, then it will be
872"int yylex( void  )".)   This  definition  may  be changed by defining
873the "YY_DECL" macro.  For example, you could use:
874
875     #define YY_DECL float lexscan( a, b ) float a, b;
876
877   to give the scanning routine the name `lexscan', returning a float,
878and taking two floats as arguments.  Note that if you give arguments to
879the scanning routine using a K&R-style/non-prototyped function
880declaration, you must terminate the definition with a semi-colon (`;').
881
882   Whenever `yylex()' is called, it scans tokens from the global input
883file `yyin' (which defaults to stdin).  It continues until it either
884reaches an end-of-file (at which point it returns the value 0) or one
885of its actions executes a `return' statement.
886
887   If the scanner reaches an end-of-file, subsequent calls are undefined
888unless either `yyin' is pointed at a new input file (in which case
889scanning continues from that file), or `yyrestart()' is called.
890`yyrestart()' takes one argument, a `FILE *' pointer (which can be nil,
891if you've set up `YY_INPUT' to scan from a source other than `yyin'),
892and initializes `yyin' for scanning from that file.  Essentially there
893is no difference between just assigning `yyin' to a new input file or
894using `yyrestart()' to do so; the latter is available for compatibility
895with previous versions of `flex', and because it can be used to switch
896input files in the middle of scanning.  It can also be used to throw
897away the current input buffer, by calling it with an argument of
898`yyin'; but better is to use `YY_FLUSH_BUFFER' (see above).  Note that
899`yyrestart()' does *not* reset the start condition to `INITIAL' (see
900Start Conditions, below).
901
902   If `yylex()' stops scanning due to executing a `return' statement in
903one of the actions, the scanner may then be called again and it will
904resume scanning where it left off.
905
906   By default (and for purposes of efficiency), the scanner uses
907block-reads rather than simple `getc()' calls to read characters from
908`yyin'.  The nature of how it gets its input can be controlled by
909defining the `YY_INPUT' macro.  YY_INPUT's calling sequence is
910"YY_INPUT(buf,result,max_size)".  Its action is to place up to MAX_SIZE
911characters in the character array BUF and return in the integer
912variable RESULT either the number of characters read or the constant
913YY_NULL (0 on Unix systems) to indicate EOF.  The default YY_INPUT
914reads from the global file-pointer "yyin".
915
916   A sample definition of YY_INPUT (in the definitions section of the
917input file):
918
919     %{
920     #define YY_INPUT(buf,result,max_size) \
921         { \
922         int c = getchar(); \
923         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
924         }
925     %}
926
927   This definition will change the input processing to occur one
928character at a time.
929
930   When the scanner receives an end-of-file indication from YY_INPUT,
931it then checks the `yywrap()' function.  If `yywrap()' returns false
932(zero), then it is assumed that the function has gone ahead and set up
933`yyin' to point to another input file, and scanning continues.  If it
934returns true (non-zero), then the scanner terminates, returning 0 to
935its caller.  Note that in either case, the start condition remains
936unchanged; it does *not* revert to `INITIAL'.
937
938   If you do not supply your own version of `yywrap()', then you must
939either use `%option noyywrap' (in which case the scanner behaves as
940though `yywrap()' returned 1), or you must link with `-lfl' to obtain
941the default version of the routine, which always returns 1.
942
943   Three routines are available for scanning from in-memory buffers
944rather than files: `yy_scan_string()', `yy_scan_bytes()', and
945`yy_scan_buffer()'.  See the discussion of them below in the section
946Multiple Input Buffers.
947
948   The scanner writes its `ECHO' output to the `yyout' global (default,
949stdout), which may be redefined by the user simply by assigning it to
950some other `FILE' pointer.
951
952
953File: flex.info,  Node: Start conditions,  Next: Multiple buffers,  Prev: Generated scanner,  Up: Top
954
955Start conditions
956================
957
958   `flex' provides a mechanism for conditionally activating rules.  Any
959rule whose pattern is prefixed with "<sc>" will only be active when the
960scanner is in the start condition named "sc".  For example,
961
962     <STRING>[^"]*        { /* eat up the string body ... */
963                 ...
964                 }
965
966will be active only when the scanner is in the "STRING" start
967condition, and
968
969     <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
970                 ...
971                 }
972
973will be active only when the current start condition is either
974"INITIAL", "STRING", or "QUOTE".
975
976   Start conditions are declared in the definitions (first) section of
977the input using unindented lines beginning with either `%s' or `%x'
978followed by a list of names.  The former declares *inclusive* start
979conditions, the latter *exclusive* start conditions.  A start condition
980is activated using the `BEGIN' action.  Until the next `BEGIN' action is
981executed, rules with the given start condition will be active and rules
982with other start conditions will be inactive.  If the start condition
983is *inclusive*, then rules with no start conditions at all will also be
984active.  If it is *exclusive*, then *only* rules qualified with the
985start condition will be active.  A set of rules contingent on the same
986exclusive start condition describe a scanner which is independent of
987any of the other rules in the `flex' input.  Because of this, exclusive
988start conditions make it easy to specify "mini-scanners" which scan
989portions of the input that are syntactically different from the rest
990(e.g., comments).
991
992   If the distinction between inclusive and exclusive start conditions
993is still a little vague, here's a simple example illustrating the
994connection between the two.  The set of rules:
995
996     %s example
997     %%
998     
999     <example>foo   do_something();
1000     
1001     bar            something_else();
1002
1003is equivalent to
1004
1005     %x example
1006     %%
1007     
1008     <example>foo   do_something();
1009     
1010     <INITIAL,example>bar    something_else();
1011
1012   Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
1013second example wouldn't be active (i.e., couldn't match) when in start
1014condition `example'.  If we just used `<example>' to qualify `bar',
1015though, then it would only be active in `example' and not in `INITIAL',
1016while in the first example it's active in both, because in the first
1017example the `example' starting condition is an *inclusive* (`%s') start
1018condition.
1019
1020   Also note that the special start-condition specifier `<*>' matches
1021every start condition.  Thus, the above example could also have been
1022written;
1023
1024     %x example
1025     %%
1026     
1027     <example>foo   do_something();
1028     
1029     <*>bar    something_else();
1030
1031   The default rule (to `ECHO' any unmatched character) remains active
1032in start conditions.  It is equivalent to:
1033
1034     <*>.|\\n     ECHO;
1035
1036   `BEGIN(0)' returns to the original state where only the rules with
1037no start conditions are active.  This state can also be referred to as
1038the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to
1039`BEGIN(0)'.  (The parentheses around the start condition name are not
1040required but are considered good style.)
1041
1042   `BEGIN' actions can also be given as indented code at the beginning
1043of the rules section.  For example, the following will cause the
1044scanner to enter the "SPECIAL" start condition whenever `yylex()' is
1045called and the global variable `enter_special' is true:
1046
1047             int enter_special;
1048     
1049     %x SPECIAL
1050     %%
1051             if ( enter_special )
1052                 BEGIN(SPECIAL);
1053     
1054     <SPECIAL>blahblahblah
1055     ...more rules follow...
1056
1057   To illustrate the uses of start conditions, here is a scanner which
1058provides two different interpretations of a string like "123.456".  By
1059default it will treat it as as three tokens, the integer "123", a dot
1060('.'), and the integer "456".  But if the string is preceded earlier in
1061the line by the string "expect-floats" it will treat it as a single
1062token, the floating-point number 123.456:
1063
1064     %{
1065     #include <math.h>
1066     %}
1067     %s expect
1068     
1069     %%
1070     expect-floats        BEGIN(expect);
1071     
1072     <expect>[0-9]+"."[0-9]+      {
1073                 printf( "found a float, = %f\n",
1074                         atof( yytext ) );
1075                 }
1076     <expect>\n           {
1077                 /* that's the end of the line, so
1078                  * we need another "expect-number"
1079                  * before we'll recognize any more
1080                  * numbers
1081                  */
1082                 BEGIN(INITIAL);
1083                 }
1084     
1085     [0-9]+      {
1086     
1087     Version 2.5               December 1994                        18
1088     
1089                 printf( "found an integer, = %d\n",
1090                         atoi( yytext ) );
1091                 }
1092     
1093     "."         printf( "found a dot\n" );
1094
1095   Here is a scanner which recognizes (and discards) C comments while
1096maintaining a count of the current input line.
1097
1098     %x comment
1099     %%
1100             int line_num = 1;
1101     
1102     "/*"         BEGIN(comment);
1103     
1104     <comment>[^*\n]*        /* eat anything that's not a '*' */
1105     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1106     <comment>\n             ++line_num;
1107     <comment>"*"+"/"        BEGIN(INITIAL);
1108
1109   This scanner goes to a bit of trouble to match as much text as
1110possible with each rule.  In general, when attempting to write a
1111high-speed scanner try to match as much possible in each rule, as it's
1112a big win.
1113
1114   Note that start-conditions names are really integer values and can
1115be stored as such.  Thus, the above could be extended in the following
1116fashion:
1117
1118     %x comment foo
1119     %%
1120             int line_num = 1;
1121             int comment_caller;
1122     
1123     "/*"         {
1124                  comment_caller = INITIAL;
1125                  BEGIN(comment);
1126                  }
1127     
1128     ...
1129     
1130     <foo>"/*"    {
1131                  comment_caller = foo;
1132                  BEGIN(comment);
1133                  }
1134     
1135     <comment>[^*\n]*        /* eat anything that's not a '*' */
1136     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1137     <comment>\n             ++line_num;
1138     <comment>"*"+"/"        BEGIN(comment_caller);
1139
1140   Furthermore, you can access the current start condition using the
1141integer-valued `YY_START' macro.  For example, the above assignments to
1142`comment_caller' could instead be written
1143
1144     comment_caller = YY_START;
1145
1146   Flex provides `YYSTATE' as an alias for `YY_START' (since that is
1147what's used by AT&T `lex').
1148
1149   Note that start conditions do not have their own name-space; %s's
1150and %x's declare names in the same fashion as #define's.
1151
1152   Finally, here's an example of how to match C-style quoted strings
1153using exclusive start conditions, including expanded escape sequences
1154(but not including checking for a string that's too long):
1155
1156     %x str
1157     
1158     %%
1159             char string_buf[MAX_STR_CONST];
1160             char *string_buf_ptr;
1161     
1162     \"      string_buf_ptr = string_buf; BEGIN(str);
1163     
1164     <str>\"        { /* saw closing quote - all done */
1165             BEGIN(INITIAL);
1166             *string_buf_ptr = '\0';
1167             /* return string constant token type and
1168              * value to parser
1169              */
1170             }
1171     
1172     <str>\n        {
1173             /* error - unterminated string constant */
1174             /* generate error message */
1175             }
1176     
1177     <str>\\[0-7]{1,3} {
1178             /* octal escape sequence */
1179             int result;
1180     
1181             (void) sscanf( yytext + 1, "%o", &result );
1182     
1183             if ( result > 0xff )
1184                     /* error, constant is out-of-bounds */
1185     
1186             *string_buf_ptr++ = result;
1187             }
1188     
1189     <str>\\[0-9]+ {
1190             /* generate error - bad escape sequence; something
1191              * like '\48' or '\0777777'
1192              */
1193             }
1194     
1195     <str>\\n  *string_buf_ptr++ = '\n';
1196     <str>\\t  *string_buf_ptr++ = '\t';
1197     <str>\\r  *string_buf_ptr++ = '\r';
1198     <str>\\b  *string_buf_ptr++ = '\b';
1199     <str>\\f  *string_buf_ptr++ = '\f';
1200     
1201     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1202     
1203     <str>[^\\\n\"]+        {
1204             char *yptr = yytext;
1205     
1206             while ( *yptr )
1207                     *string_buf_ptr++ = *yptr++;
1208             }
1209
1210   Often, such as in some of the examples above, you wind up writing a
1211whole bunch of rules all preceded by the same start condition(s).  Flex
1212makes this a little easier and cleaner by introducing a notion of start
1213condition "scope".  A start condition scope is begun with:
1214
1215     <SCs>{
1216
1217where SCs is a list of one or more start conditions.  Inside the start
1218condition scope, every rule automatically has the prefix `<SCs>'
1219applied to it, until a `}' which matches the initial `{'.  So, for
1220example,
1221
1222     <ESC>{
1223         "\\n"   return '\n';
1224         "\\r"   return '\r';
1225         "\\f"   return '\f';
1226         "\\0"   return '\0';
1227     }
1228
1229is equivalent to:
1230
1231     <ESC>"\\n"  return '\n';
1232     <ESC>"\\r"  return '\r';
1233     <ESC>"\\f"  return '\f';
1234     <ESC>"\\0"  return '\0';
1235
1236   Start condition scopes may be nested.
1237
1238   Three routines are available for manipulating stacks of start
1239conditions:
1240
1241`void yy_push_state(int new_state)'
1242     pushes the current start condition onto the top of the start
1243     condition stack and switches to NEW_STATE as though you had used
1244     `BEGIN new_state' (recall that start condition names are also
1245     integers).
1246
1247`void yy_pop_state()'
1248     pops the top of the stack and switches to it via `BEGIN'.
1249
1250`int yy_top_state()'
1251     returns the top of the stack without altering the stack's contents.
1252
1253   The start condition stack grows dynamically and so has no built-in
1254size limitation.  If memory is exhausted, program execution aborts.
1255
1256   To use start condition stacks, your scanner must include a `%option
1257stack' directive (see Options below).
1258
1259
1260File: flex.info,  Node: Multiple buffers,  Next: End-of-file rules,  Prev: Start conditions,  Up: Top
1261
1262Multiple input buffers
1263======================
1264
1265   Some scanners (such as those which support "include" files) require
1266reading from several input streams.  As `flex' scanners do a large
1267amount of buffering, one cannot control where the next input will be
1268read from by simply writing a `YY_INPUT' which is sensitive to the
1269scanning context.  `YY_INPUT' is only called when the scanner reaches
1270the end of its buffer, which may be a long time after scanning a
1271statement such as an "include" which requires switching the input
1272source.
1273
1274   To negotiate these sorts of problems, `flex' provides a mechanism
1275for creating and switching between multiple input buffers.  An input
1276buffer is created by using:
1277
1278     YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1279
1280which takes a `FILE' pointer and a size and creates a buffer associated
1281with the given file and large enough to hold SIZE characters (when in
1282doubt, use `YY_BUF_SIZE' for the size).  It returns a `YY_BUFFER_STATE'
1283handle, which may then be passed to other routines (see below).  The
1284`YY_BUFFER_STATE' type is a pointer to an opaque `struct'
1285`yy_buffer_state' structure, so you may safely initialize
1286YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish, and
1287also refer to the opaque structure in order to correctly declare input
1288buffers in source files other than that of your scanner.  Note that the
1289`FILE' pointer in the call to `yy_create_buffer' is only used as the
1290value of `yyin' seen by `YY_INPUT'; if you redefine `YY_INPUT' so it no
1291longer uses `yyin', then you can safely pass a nil `FILE' pointer to
1292`yy_create_buffer'.  You select a particular buffer to scan from using:
1293
1294     void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1295
1296   switches the scanner's input buffer so subsequent tokens will come
1297from NEW_BUFFER.  Note that `yy_switch_to_buffer()' may be used by
1298`yywrap()' to set things up for continued scanning, instead of opening
1299a new file and pointing `yyin' at it.  Note also that switching input
1300sources via either `yy_switch_to_buffer()' or `yywrap()' does *not*
1301change the start condition.
1302
1303     void yy_delete_buffer( YY_BUFFER_STATE buffer )
1304
1305is used to reclaim the storage associated with a buffer.  You can also
1306clear the current contents of a buffer using:
1307
1308     void yy_flush_buffer( YY_BUFFER_STATE buffer )
1309
1310   This function discards the buffer's contents, so the next time the
1311scanner attempts to match a token from the buffer, it will first fill
1312the buffer anew using `YY_INPUT'.
1313
1314   `yy_new_buffer()' is an alias for `yy_create_buffer()', provided for
1315compatibility with the C++ use of `new' and `delete' for creating and
1316destroying dynamic objects.
1317
1318   Finally, the `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE'
1319handle to the current buffer.
1320
1321   Here is an example of using these features for writing a scanner
1322which expands include files (the `<<EOF>>' feature is discussed below):
1323
1324     /* the "incl" state is used for picking up the name
1325      * of an include file
1326      */
1327     %x incl
1328     
1329     %{
1330     #define MAX_INCLUDE_DEPTH 10
1331     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1332     int include_stack_ptr = 0;
1333     %}
1334     
1335     %%
1336     include             BEGIN(incl);
1337     
1338     [a-z]+              ECHO;
1339     [^a-z\n]*\n?        ECHO;
1340     
1341     <incl>[ \t]*      /* eat the whitespace */
1342     <incl>[^ \t\n]+   { /* got the include file name */
1343             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1344                 {
1345                 fprintf( stderr, "Includes nested too deeply" );
1346                 exit( 1 );
1347                 }
1348     
1349             include_stack[include_stack_ptr++] =
1350                 YY_CURRENT_BUFFER;
1351     
1352             yyin = fopen( yytext, "r" );
1353     
1354             if ( ! yyin )
1355                 error( ... );
1356     
1357             yy_switch_to_buffer(
1358                 yy_create_buffer( yyin, YY_BUF_SIZE ) );
1359     
1360             BEGIN(INITIAL);
1361             }
1362     
1363     <<EOF>> {
1364             if ( --include_stack_ptr < 0 )
1365                 {
1366                 yyterminate();
1367                 }
1368     
1369             else
1370                 {
1371                 yy_delete_buffer( YY_CURRENT_BUFFER );
1372                 yy_switch_to_buffer(
1373                      include_stack[include_stack_ptr] );
1374                 }
1375             }
1376
1377   Three routines are available for setting up input buffers for
1378scanning in-memory strings instead of files.  All of them create a new
1379input buffer for scanning the string, and return a corresponding
1380`YY_BUFFER_STATE' handle (which you should delete with
1381`yy_delete_buffer()' when done with it).  They also switch to the new
1382buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will
1383start scanning the string.
1384
1385`yy_scan_string(const char *str)'
1386     scans a NUL-terminated string.
1387
1388`yy_scan_bytes(const char *bytes, int len)'
1389     scans `len' bytes (including possibly NUL's) starting at location
1390     BYTES.
1391
1392   Note that both of these functions create and scan a *copy* of the
1393string or bytes.  (This may be desirable, since `yylex()' modifies the
1394contents of the buffer it is scanning.) You can avoid the copy by using:
1395
1396`yy_scan_buffer(char *base, yy_size_t size)'
1397     which scans in place the buffer starting at BASE, consisting of
1398     SIZE bytes, the last two bytes of which *must* be
1399     `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
1400     scanned; thus, scanning consists of `base[0]' through
1401     `base[size-2]', inclusive.
1402
1403     If you fail to set up BASE in this manner (i.e., forget the final
1404     two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()'
1405     returns a nil pointer instead of creating a new input buffer.
1406
1407     The type `yy_size_t' is an integral type to which you can cast an
1408     integer expression reflecting the size of the buffer.
1409
1410
1411File: flex.info,  Node: End-of-file rules,  Next: Miscellaneous,  Prev: Multiple buffers,  Up: Top
1412
1413End-of-file rules
1414=================
1415
1416   The special rule "<<EOF>>" indicates actions which are to be taken
1417when an end-of-file is encountered and yywrap() returns non-zero (i.e.,
1418indicates no further files to process).  The action must finish by
1419doing one of four things:
1420
1421   - assigning `yyin' to a new input file (in previous versions of
1422     flex, after doing the assignment you had to call the special
1423     action `YY_NEW_FILE'; this is no longer necessary);
1424
1425   - executing a `return' statement;
1426
1427   - executing the special `yyterminate()' action;
1428
1429   - or, switching to a new buffer using `yy_switch_to_buffer()' as
1430     shown in the example above.
1431
1432   <<EOF>> rules may not be used with other patterns; they may only be
1433qualified with a list of start conditions.  If an unqualified <<EOF>>
1434rule is given, it applies to *all* start conditions which do not
1435already have <<EOF>> actions.  To specify an <<EOF>> rule for only the
1436initial start condition, use
1437
1438     <INITIAL><<EOF>>
1439
1440   These rules are useful for catching things like unclosed comments.
1441An example:
1442
1443     %x quote
1444     %%
1445     
1446     ...other rules for dealing with quotes...
1447     
1448     <quote><<EOF>>   {
1449              error( "unterminated quote" );
1450              yyterminate();
1451              }
1452     <<EOF>>  {
1453              if ( *++filelist )
1454                  yyin = fopen( *filelist, "r" );
1455              else
1456                 yyterminate();
1457              }
1458
1459
1460File: flex.info,  Node: Miscellaneous,  Next: User variables,  Prev: End-of-file rules,  Up: Top
1461
1462Miscellaneous macros
1463====================
1464
1465   The macro `YY_USER_ACTION' can be defined to provide an action which
1466is always executed prior to the matched rule's action.  For example, it
1467could be #define'd to call a routine to convert yytext to lower-case.
1468When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
1469number of the matched rule (rules are numbered starting with 1).
1470Suppose you want to profile how often each of your rules is matched.
1471The following would do the trick:
1472
1473     #define YY_USER_ACTION ++ctr[yy_act]
1474
1475   where `ctr' is an array to hold the counts for the different rules.
1476Note that the macro `YY_NUM_RULES' gives the total number of rules
1477(including the default rule, even if you use `-s', so a correct
1478declaration for `ctr' is:
1479
1480     int ctr[YY_NUM_RULES];
1481
1482   The macro `YY_USER_INIT' may be defined to provide an action which
1483is always executed before the first scan (and before the scanner's
1484internal initializations are done).  For example, it could be used to
1485call a routine to read in a data table or open a logging file.
1486
1487   The macro `yy_set_interactive(is_interactive)' can be used to
1488control whether the current buffer is considered *interactive*.  An
1489interactive buffer is processed more slowly, but must be used when the
1490scanner's input source is indeed interactive to avoid problems due to
1491waiting to fill buffers (see the discussion of the `-I' flag below).  A
1492non-zero value in the macro invocation marks the buffer as interactive,
1493a zero value as non-interactive.  Note that use of this macro overrides
1494`%option always-interactive' or `%option never-interactive' (see
1495Options below).  `yy_set_interactive()' must be invoked prior to
1496beginning to scan the buffer that is (or is not) to be considered
1497interactive.
1498
1499   The macro `yy_set_bol(at_bol)' can be used to control whether the
1500current buffer's scanning context for the next token match is done as
1501though at the beginning of a line.  A non-zero macro argument makes
1502rules anchored with
1503
1504   The macro `YY_AT_BOL()' returns true if the next token scanned from
1505the current buffer will have '^' rules active, false otherwise.
1506
1507   In the generated scanner, the actions are all gathered in one large
1508switch statement and separated using `YY_BREAK', which may be
1509redefined.  By default, it is simply a "break", to separate each rule's
1510action from the following rule's.  Redefining `YY_BREAK' allows, for
1511example, C++ users to #define YY_BREAK to do nothing (while being very
1512careful that every rule ends with a "break" or a "return"!) to avoid
1513suffering from unreachable statement warnings where because a rule's
1514action ends with "return", the `YY_BREAK' is inaccessible.
1515
1516
1517File: flex.info,  Node: User variables,  Next: YACC interface,  Prev: Miscellaneous,  Up: Top
1518
1519Values available to the user
1520============================
1521
1522   This section summarizes the various values available to the user in
1523the rule actions.
1524
1525   - `char *yytext' holds the text of the current token.  It may be
1526     modified but not lengthened (you cannot append characters to the
1527     end).
1528
1529     If the special directive `%array' appears in the first section of
1530     the scanner description, then `yytext' is instead declared `char
1531     yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
1532     redefine in the first section if you don't like the default value
1533     (generally 8KB).  Using `%array' results in somewhat slower
1534     scanners, but the value of `yytext' becomes immune to calls to
1535     `input()' and `unput()', which potentially destroy its value when
1536     `yytext' is a character pointer.  The opposite of `%array' is
1537     `%pointer', which is the default.
1538
1539     You cannot use `%array' when generating C++ scanner classes (the
1540     `-+' flag).
1541
1542   - `int yyleng' holds the length of the current token.
1543
1544   - `FILE *yyin' is the file which by default `flex' reads from.  It
1545     may be redefined but doing so only makes sense before scanning
1546     begins or after an EOF has been encountered.  Changing it in the
1547     midst of scanning will have unexpected results since `flex'
1548     buffers its input; use `yyrestart()' instead.  Once scanning
1549     terminates because an end-of-file has been seen, you can assign
1550     `yyin' at the new input file and then call the scanner again to
1551     continue scanning.
1552
1553   - `void yyrestart( FILE *new_file )' may be called to point `yyin'
1554     at the new input file.  The switch-over to the new file is
1555     immediate (any previously buffered-up input is lost).  Note that
1556     calling `yyrestart()' with `yyin' as an argument thus throws away
1557     the current input buffer and continues scanning the same input
1558     file.
1559
1560   - `FILE *yyout' is the file to which `ECHO' actions are done.  It
1561     can be reassigned by the user.
1562
1563   - `YY_CURRENT_BUFFER' returns a `YY_BUFFER_STATE' handle to the
1564     current buffer.
1565
1566   - `YY_START' returns an integer value corresponding to the current
1567     start condition.  You can subsequently use this value with `BEGIN'
1568     to return to that start condition.
1569
1570
1571File: flex.info,  Node: YACC interface,  Next: Options,  Prev: User variables,  Up: Top
1572
1573Interfacing with `yacc'
1574=======================
1575
1576   One of the main uses of `flex' is as a companion to the `yacc'
1577parser-generator.  `yacc' parsers expect to call a routine named
1578`yylex()' to find the next input token.  The routine is supposed to
1579return the type of the next token as well as putting any associated
1580value in the global `yylval'.  To use `flex' with `yacc', one specifies
1581the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
1582containing definitions of all the `%tokens' appearing in the `yacc'
1583input.  This file is then included in the `flex' scanner.  For example,
1584if one of the tokens is "TOK_NUMBER", part of the scanner might look
1585like:
1586
1587     %{
1588     #include "y.tab.h"
1589     %}
1590     
1591     %%
1592     
1593     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
1594
1595
1596File: flex.info,  Node: Options,  Next: Performance,  Prev: YACC interface,  Up: Top
1597
1598Options
1599=======
1600
1601   `flex' has the following options:
1602
1603`-b'
1604     Generate backing-up information to `lex.backup'.  This is a list
1605     of scanner states which require backing up and the input
1606     characters on which they do so.  By adding rules one can remove
1607     backing-up states.  If *all* backing-up states are eliminated and
1608     `-Cf' or `-CF' is used, the generated scanner will run faster (see
1609     the `-p' flag).  Only users who wish to squeeze every last cycle
1610     out of their scanners need worry about this option.  (See the
1611     section on Performance Considerations below.)
1612
1613`-c'
1614     is a do-nothing, deprecated option included for POSIX compliance.
1615
1616`-d'
1617     makes the generated scanner run in "debug" mode.  Whenever a
1618     pattern is recognized and the global `yy_flex_debug' is non-zero
1619     (which is the default), the scanner will write to `stderr' a line
1620     of the form:
1621
1622          --accepting rule at line 53 ("the matched text")
1623
1624     The line number refers to the location of the rule in the file
1625     defining the scanner (i.e., the file that was fed to flex).
1626     Messages are also generated when the scanner backs up, accepts the
1627     default rule, reaches the end of its input buffer (or encounters a
1628     NUL; at this point, the two look the same as far as the scanner's
1629     concerned), or reaches an end-of-file.
1630
1631`-f'
1632     specifies "fast scanner".  No table compression is done and stdio
1633     is bypassed.  The result is large but fast.  This option is
1634     equivalent to `-Cfr' (see below).
1635
1636`-h'
1637     generates a "help" summary of `flex's' options to `stdout' and
1638     then exits.  `-?' and `--help' are synonyms for `-h'.
1639
1640`-i'
1641     instructs `flex' to generate a *case-insensitive* scanner.  The
1642     case of letters given in the `flex' input patterns will be
1643     ignored, and tokens in the input will be matched regardless of
1644     case.  The matched text given in `yytext' will have the preserved
1645     case (i.e., it will not be folded).
1646
1647`-l'
1648     turns on maximum compatibility with the original AT&T `lex'
1649     implementation.  Note that this does not mean *full*
1650     compatibility.  Use of this option costs a considerable amount of
1651     performance, and it cannot be used with the `-+, -f, -F, -Cf', or
1652     `-CF' options.  For details on the compatibilities it provides, see
1653     the section "Incompatibilities With Lex And POSIX" below.  This
1654     option also results in the name `YY_FLEX_LEX_COMPAT' being
1655     #define'd in the generated scanner.
1656
1657`-n'
1658     is another do-nothing, deprecated option included only for POSIX
1659     compliance.
1660
1661`-p'
1662     generates a performance report to stderr.  The report consists of
1663     comments regarding features of the `flex' input file which will
1664     cause a serious loss of performance in the resulting scanner.  If
1665     you give the flag twice, you will also get comments regarding
1666     features that lead to minor performance losses.
1667
1668     Note that the use of `REJECT', `%option yylineno' and variable
1669     trailing context (see the Deficiencies / Bugs section below)
1670     entails a substantial performance penalty; use of `yymore()', the
1671     `^' operator, and the `-I' flag entail minor performance penalties.
1672
1673`-s'
1674     causes the "default rule" (that unmatched scanner input is echoed
1675     to `stdout') to be suppressed.  If the scanner encounters input
1676     that does not match any of its rules, it aborts with an error.
1677     This option is useful for finding holes in a scanner's rule set.
1678
1679`-t'
1680     instructs `flex' to write the scanner it generates to standard
1681     output instead of `lex.yy.c'.
1682
1683`-v'
1684     specifies that `flex' should write to `stderr' a summary of
1685     statistics regarding the scanner it generates.  Most of the
1686     statistics are meaningless to the casual `flex' user, but the
1687     first line identifies the version of `flex' (same as reported by
1688     `-V'), and the next line the flags used when generating the
1689     scanner, including those that are on by default.
1690
1691`-w'
1692     suppresses warning messages.
1693
1694`-B'
1695     instructs `flex' to generate a *batch* scanner, the opposite of
1696     *interactive* scanners generated by `-I' (see below).  In general,
1697     you use `-B' when you are *certain* that your scanner will never
1698     be used interactively, and you want to squeeze a *little* more
1699     performance out of it.  If your goal is instead to squeeze out a
1700     *lot* more performance, you should be using the `-Cf' or `-CF'
1701     options (discussed below), which turn on `-B' automatically anyway.
1702
1703`-F'
1704     specifies that the "fast" scanner table representation should be
1705     used (and stdio bypassed).  This representation is about as fast
1706     as the full table representation `(-f)', and for some sets of
1707     patterns will be considerably smaller (and for others, larger).
1708     In general, if the pattern set contains both "keywords" and a
1709     catch-all, "identifier" rule, such as in the set:
1710
1711          "case"    return TOK_CASE;
1712          "switch"  return TOK_SWITCH;
1713          ...
1714          "default" return TOK_DEFAULT;
1715          [a-z]+    return TOK_ID;
1716
1717     then you're better off using the full table representation.  If
1718     only the "identifier" rule is present and you then use a hash
1719     table or some such to detect the keywords, you're better off using
1720     `-F'.
1721
1722     This option is equivalent to `-CFr' (see below).  It cannot be
1723     used with `-+'.
1724
1725`-I'
1726     instructs `flex' to generate an *interactive* scanner.  An
1727     interactive scanner is one that only looks ahead to decide what
1728     token has been matched if it absolutely must.  It turns out that
1729     always looking one extra character ahead, even if the scanner has
1730     already seen enough text to disambiguate the current token, is a
1731     bit faster than only looking ahead when necessary.  But scanners
1732     that always look ahead give dreadful interactive performance; for
1733     example, when a user types a newline, it is not recognized as a
1734     newline token until they enter *another* token, which often means
1735     typing in another whole line.
1736
1737     `Flex' scanners default to *interactive* unless you use the `-Cf'
1738     or `-CF' table-compression options (see below).  That's because if
1739     you're looking for high-performance you should be using one of
1740     these options, so if you didn't, `flex' assumes you'd rather trade
1741     off a bit of run-time performance for intuitive interactive
1742     behavior.  Note also that you *cannot* use `-I' in conjunction
1743     with `-Cf' or `-CF'.  Thus, this option is not really needed; it
1744     is on by default for all those cases in which it is allowed.
1745
1746     You can force a scanner to *not* be interactive by using `-B' (see
1747     above).
1748
1749`-L'
1750     instructs `flex' not to generate `#line' directives.  Without this
1751     option, `flex' peppers the generated scanner with #line directives
1752     so error messages in the actions will be correctly located with
1753     respect to either the original `flex' input file (if the errors
1754     are due to code in the input file), or `lex.yy.c' (if the errors
1755     are `flex's' fault - you should report these sorts of errors to
1756     the email address given below).
1757
1758`-T'
1759     makes `flex' run in `trace' mode.  It will generate a lot of
1760     messages to `stderr' concerning the form of the input and the
1761     resultant non-deterministic and deterministic finite automata.
1762     This option is mostly for use in maintaining `flex'.
1763
1764`-V'
1765     prints the version number to `stdout' and exits.  `--version' is a
1766     synonym for `-V'.
1767
1768`-7'
1769     instructs `flex' to generate a 7-bit scanner, i.e., one which can
1770     only recognized 7-bit characters in its input.  The advantage of
1771     using `-7' is that the scanner's tables can be up to half the size
1772     of those generated using the `-8' option (see below).  The
1773     disadvantage is that such scanners often hang or crash if their
1774     input contains an 8-bit character.
1775
1776     Note, however, that unless you generate your scanner using the
1777     `-Cf' or `-CF' table compression options, use of `-7' will save
1778     only a small amount of table space, and make your scanner
1779     considerably less portable.  `Flex's' default behavior is to
1780     generate an 8-bit scanner unless you use the `-Cf' or `-CF', in
1781     which case `flex' defaults to generating 7-bit scanners unless
1782     your site was always configured to generate 8-bit scanners (as
1783     will often be the case with non-USA sites).  You can tell whether
1784     flex generated a 7-bit or an 8-bit scanner by inspecting the flag
1785     summary in the `-v' output as described above.
1786
1787     Note that if you use `-Cfe' or `-CFe' (those table compression
1788     options, but also using equivalence classes as discussed see
1789     below), flex still defaults to generating an 8-bit scanner, since
1790     usually with these compression options full 8-bit tables are not
1791     much more expensive than 7-bit tables.
1792
1793`-8'
1794     instructs `flex' to generate an 8-bit scanner, i.e., one which can
1795     recognize 8-bit characters.  This flag is only needed for scanners
1796     generated using `-Cf' or `-CF', as otherwise flex defaults to
1797     generating an 8-bit scanner anyway.
1798
1799     See the discussion of `-7' above for flex's default behavior and
1800     the tradeoffs between 7-bit and 8-bit scanners.
1801
1802`-+'
1803     specifies that you want flex to generate a C++ scanner class.  See
1804     the section on Generating C++ Scanners below for details.
1805
1806`-C[aefFmr]'
1807     controls the degree of table compression and, more generally,
1808     trade-offs between small scanners and fast scanners.
1809
1810     `-Ca' ("align") instructs flex to trade off larger tables in the
1811     generated scanner for faster performance because the elements of
1812     the tables are better aligned for memory access and computation.
1813     On some RISC architectures, fetching and manipulating long-words
1814     is more efficient than with smaller-sized units such as
1815     shortwords.  This option can double the size of the tables used by
1816     your scanner.
1817
1818     `-Ce' directs `flex' to construct "equivalence classes", i.e.,
1819     sets of characters which have identical lexical properties (for
1820     example, if the only appearance of digits in the `flex' input is
1821     in the character class "[0-9]" then the digits '0', '1', ..., '9'
1822     will all be put in the same equivalence class).  Equivalence
1823     classes usually give dramatic reductions in the final table/object
1824     file sizes (typically a factor of 2-5) and are pretty cheap
1825     performance-wise (one array look-up per character scanned).
1826
1827     `-Cf' specifies that the *full* scanner tables should be generated
1828     - `flex' should not compress the tables by taking advantages of
1829     similar transition functions for different states.
1830
1831     `-CF' specifies that the alternate fast scanner representation
1832     (described above under the `-F' flag) should be used.  This option
1833     cannot be used with `-+'.
1834
1835     `-Cm' directs `flex' to construct "meta-equivalence classes",
1836     which are sets of equivalence classes (or characters, if
1837     equivalence classes are not being used) that are commonly used
1838     together.  Meta-equivalence classes are often a big win when using
1839     compressed tables, but they have a moderate performance impact
1840     (one or two "if" tests and one array look-up per character
1841     scanned).
1842
1843     `-Cr' causes the generated scanner to *bypass* use of the standard
1844     I/O library (stdio) for input.  Instead of calling `fread()' or
1845     `getc()', the scanner will use the `read()' system call, resulting
1846     in a performance gain which varies from system to system, but in
1847     general is probably negligible unless you are also using `-Cf' or
1848     `-CF'.  Using `-Cr' can cause strange behavior if, for example,
1849     you read from `yyin' using stdio prior to calling the scanner
1850     (because the scanner will miss whatever text your previous reads
1851     left in the stdio input buffer).
1852
1853     `-Cr' has no effect if you define `YY_INPUT' (see The Generated
1854     Scanner above).
1855
1856     A lone `-C' specifies that the scanner tables should be compressed
1857     but neither equivalence classes nor meta-equivalence classes
1858     should be used.
1859
1860     The options `-Cf' or `-CF' and `-Cm' do not make sense together -
1861     there is no opportunity for meta-equivalence classes if the table
1862     is not being compressed.  Otherwise the options may be freely
1863     mixed, and are cumulative.
1864
1865     The default setting is `-Cem', which specifies that `flex' should
1866     generate equivalence classes and meta-equivalence classes.  This
1867     setting provides the highest degree of table compression.  You can
1868     trade off faster-executing scanners at the cost of larger tables
1869     with the following generally being true:
1870
1871          slowest & smallest
1872                -Cem
1873                -Cm
1874                -Ce
1875                -C
1876                -C{f,F}e
1877                -C{f,F}
1878                -C{f,F}a
1879          fastest & largest
1880
1881     Note that scanners with the smallest tables are usually generated
1882     and compiled the quickest, so during development you will usually
1883     want to use the default, maximal compression.
1884
1885     `-Cfe' is often a good compromise between speed and size for
1886     production scanners.
1887
1888`-ooutput'
1889     directs flex to write the scanner to the file `out-' `put' instead
1890     of `lex.yy.c'.  If you combine `-o' with the `-t' option, then the
1891     scanner is written to `stdout' but its `#line' directives (see the
1892     `-L' option above) refer to the file `output'.
1893
1894`-Pprefix'
1895     changes the default `yy' prefix used by `flex' for all
1896     globally-visible variable and function names to instead be PREFIX.
1897     For example, `-Pfoo' changes the name of `yytext' to `footext'.
1898     It also changes the name of the default output file from
1899     `lex.yy.c' to `lex.foo.c'.  Here are all of the names affected:
1900
1901          yy_create_buffer
1902          yy_delete_buffer
1903          yy_flex_debug
1904          yy_init_buffer
1905          yy_flush_buffer
1906          yy_load_buffer_state
1907          yy_switch_to_buffer
1908          yyin
1909          yyleng
1910          yylex
1911          yylineno
1912          yyout
1913          yyrestart
1914          yytext
1915          yywrap
1916
1917     (If you are using a C++ scanner, then only `yywrap' and
1918     `yyFlexLexer' are affected.) Within your scanner itself, you can
1919     still refer to the global variables and functions using either
1920     version of their name; but externally, they have the modified name.
1921
1922     This option lets you easily link together multiple `flex' programs
1923     into the same executable.  Note, though, that using this option
1924     also renames `yywrap()', so you now *must* either provide your own
1925     (appropriately-named) version of the routine for your scanner, or
1926     use `%option noyywrap', as linking with `-lfl' no longer provides
1927     one for you by default.
1928
1929`-Sskeleton_file'
1930     overrides the default skeleton file from which `flex' constructs
1931     its scanners.  You'll never need this option unless you are doing
1932     `flex' maintenance or development.
1933
1934   `flex' also provides a mechanism for controlling options within the
1935scanner specification itself, rather than from the flex command-line.
1936This is done by including `%option' directives in the first section of
1937the scanner specification.  You can specify multiple options with a
1938single `%option' directive, and multiple directives in the first
1939section of your flex input file.  Most options are given simply as
1940names, optionally preceded by the word "no" (with no intervening
1941whitespace) to negate their meaning.  A number are equivalent to flex
1942flags or their negation:
1943
1944     7bit            -7 option
1945     8bit            -8 option
1946     align           -Ca option
1947     backup          -b option
1948     batch           -B option
1949     c++             -+ option
1950     
1951     caseful or
1952     case-sensitive  opposite of -i (default)
1953     
1954     case-insensitive or
1955     caseless        -i option
1956     
1957     debug           -d option
1958     default         opposite of -s option
1959     ecs             -Ce option
1960     fast            -F option
1961     full            -f option
1962     interactive     -I option
1963     lex-compat      -l option
1964     meta-ecs        -Cm option
1965     perf-report     -p option
1966     read            -Cr option
1967     stdout          -t option
1968     verbose         -v option
1969     warn            opposite of -w option
1970                     (use "%option nowarn" for -w)
1971     
1972     array           equivalent to "%array"
1973     pointer         equivalent to "%pointer" (default)
1974
1975   Some `%option's' provide features otherwise not available:
1976
1977`always-interactive'
1978     instructs flex to generate a scanner which always considers its
1979     input "interactive".  Normally, on each new input file the scanner
1980     calls `isatty()' in an attempt to determine whether the scanner's
1981     input source is interactive and thus should be read a character at
1982     a time.  When this option is used, however, then no such call is
1983     made.
1984
1985`main'
1986     directs flex to provide a default `main()' program for the
1987     scanner, which simply calls `yylex()'.  This option implies
1988     `noyywrap' (see below).
1989
1990`never-interactive'
1991     instructs flex to generate a scanner which never considers its
1992     input "interactive" (again, no call made to `isatty())'.  This is
1993     the opposite of `always-' *interactive*.
1994
1995`stack'
1996     enables the use of start condition stacks (see Start Conditions
1997     above).
1998
1999`stdinit'
2000     if unset (i.e., `%option nostdinit') initializes `yyin' and
2001     `yyout' to nil `FILE' pointers, instead of `stdin' and `stdout'.
2002
2003`yylineno'
2004     directs `flex' to generate a scanner that maintains the number of
2005     the current line read from its input in the global variable
2006     `yylineno'.  This option is implied by `%option lex-compat'.
2007
2008`yywrap'
2009     if unset (i.e., `%option noyywrap'), makes the scanner not call
2010     `yywrap()' upon an end-of-file, but simply assume that there are
2011     no more files to scan (until the user points `yyin' at a new file
2012     and calls `yylex()' again).
2013
2014   `flex' scans your rule actions to determine whether you use the
2015`REJECT' or `yymore()' features.  The `reject' and `yymore' options are
2016available to override its decision as to whether you use the options,
2017either by setting them (e.g., `%option reject') to indicate the feature
2018is indeed used, or unsetting them to indicate it actually is not used
2019(e.g., `%option noyymore').
2020
2021   Three options take string-delimited values, offset with '=':
2022
2023     %option outfile="ABC"
2024
2025is equivalent to `-oABC', and
2026
2027     %option prefix="XYZ"
2028
2029is equivalent to `-PXYZ'.
2030
2031   Finally,
2032
2033     %option yyclass="foo"
2034
2035only applies when generating a C++ scanner (`-+' option).  It informs
2036`flex' that you have derived `foo' as a subclass of `yyFlexLexer' so
2037`flex' will place your actions in the member function `foo::yylex()'
2038instead of `yyFlexLexer::yylex()'.  It also generates a
2039`yyFlexLexer::yylex()' member function that emits a run-time error (by
2040invoking `yyFlexLexer::LexerError()') if called.  See Generating C++
2041Scanners, below, for additional information.
2042
2043   A number of options are available for lint purists who want to
2044suppress the appearance of unneeded routines in the generated scanner.
2045Each of the following, if unset, results in the corresponding routine
2046not appearing in the generated scanner:
2047
2048     input, unput
2049     yy_push_state, yy_pop_state, yy_top_state
2050     yy_scan_buffer, yy_scan_bytes, yy_scan_string
2051
2052(though `yy_push_state()' and friends won't appear anyway unless you
2053use `%option stack').
2054
2055
2056File: flex.info,  Node: Performance,  Next: C++,  Prev: Options,  Up: Top
2057
2058Performance considerations
2059==========================
2060
2061   The main design goal of `flex' is that it generate high-performance
2062scanners.  It has been optimized for dealing well with large sets of
2063rules.  Aside from the effects on scanner speed of the table
2064compression `-C' options outlined above, there are a number of
2065options/actions which degrade performance.  These are, from most
2066expensive to least:
2067
2068     REJECT
2069     %option yylineno
2070     arbitrary trailing context
2071     
2072     pattern sets that require backing up
2073     %array
2074     %option interactive
2075     %option always-interactive
2076     
2077     '^' beginning-of-line operator
2078     yymore()
2079
2080   with the first three all being quite expensive and the last two
2081being quite cheap.  Note also that `unput()' is implemented as a
2082routine call that potentially does quite a bit of work, while
2083`yyless()' is a quite-cheap macro; so if just putting back some excess
2084text you scanned, use `yyless()'.
2085
2086   `REJECT' should be avoided at all costs when performance is
2087important.  It is a particularly expensive option.
2088
2089   Getting rid of backing up is messy and often may be an enormous
2090amount of work for a complicated scanner.  In principal, one begins by
2091using the `-b' flag to generate a `lex.backup' file.  For example, on
2092the input
2093
2094     %%
2095     foo        return TOK_KEYWORD;
2096     foobar     return TOK_KEYWORD;
2097
2098the file looks like:
2099
2100     State #6 is non-accepting -
2101      associated rule line numbers:
2102            2       3
2103      out-transitions: [ o ]
2104      jam-transitions: EOF [ \001-n  p-\177 ]
2105     
2106     State #8 is non-accepting -
2107      associated rule line numbers:
2108            3
2109      out-transitions: [ a ]
2110      jam-transitions: EOF [ \001-`  b-\177 ]
2111     
2112     State #9 is non-accepting -
2113      associated rule line numbers:
2114            3
2115      out-transitions: [ r ]
2116      jam-transitions: EOF [ \001-q  s-\177 ]
2117     
2118     Compressed tables always back up.
2119
2120   The first few lines tell us that there's a scanner state in which it
2121can make a transition on an 'o' but not on any other character, and
2122that in that state the currently scanned text does not match any rule.
2123The state occurs when trying to match the rules found at lines 2 and 3
2124in the input file.  If the scanner is in that state and then reads
2125something other than an 'o', it will have to back up to find a rule
2126which is matched.  With a bit of head-scratching one can see that this
2127must be the state it's in when it has seen "fo".  When this has
2128happened, if anything other than another 'o' is seen, the scanner will
2129have to back up to simply match the 'f' (by the default rule).
2130
2131   The comment regarding State #8 indicates there's a problem when
2132"foob" has been scanned.  Indeed, on any character other than an 'a',
2133the scanner will have to back up to accept "foo".  Similarly, the
2134comment for State #9 concerns when "fooba" has been scanned and an 'r'
2135does not follow.
2136
2137   The final comment reminds us that there's no point going to all the
2138trouble of removing backing up from the rules unless we're using `-Cf'
2139or `-CF', since there's no performance gain doing so with compressed
2140scanners.
2141
2142   The way to remove the backing up is to add "error" rules:
2143
2144     %%
2145     foo         return TOK_KEYWORD;
2146     foobar      return TOK_KEYWORD;
2147     
2148     fooba       |
2149     foob        |
2150     fo          {
2151                 /* false alarm, not really a keyword */
2152                 return TOK_ID;
2153                 }
2154
2155   Eliminating backing up among a list of keywords can also be done
2156using a "catch-all" rule:
2157
2158     %%
2159     foo         return TOK_KEYWORD;
2160     foobar      return TOK_KEYWORD;
2161     
2162     [a-z]+      return TOK_ID;
2163
2164   This is usually the best solution when appropriate.
2165
2166   Backing up messages tend to cascade.  With a complicated set of
2167rules it's not uncommon to get hundreds of messages.  If one can
2168decipher them, though, it often only takes a dozen or so rules to
2169eliminate the backing up (though it's easy to make a mistake and have
2170an error rule accidentally match a valid token.  A possible future
2171`flex' feature will be to automatically add rules to eliminate backing
2172up).
2173
2174   It's important to keep in mind that you gain the benefits of
2175eliminating backing up only if you eliminate *every* instance of
2176backing up.  Leaving just one means you gain nothing.
2177
2178   VARIABLE trailing context (where both the leading and trailing parts
2179do not have a fixed length) entails almost the same performance loss as
2180`REJECT' (i.e., substantial).  So when possible a rule like:
2181
2182     %%
2183     mouse|rat/(cat|dog)   run();
2184
2185is better written:
2186
2187     %%
2188     mouse/cat|dog         run();
2189     rat/cat|dog           run();
2190
2191or as
2192
2193     %%
2194     mouse|rat/cat         run();
2195     mouse|rat/dog         run();
2196
2197   Note that here the special '|' action does *not* provide any
2198savings, and can even make things worse (see Deficiencies / Bugs below).
2199
2200   Another area where the user can increase a scanner's performance
2201(and one that's easier to implement) arises from the fact that the
2202longer the tokens matched, the faster the scanner will run.  This is
2203because with long tokens the processing of most input characters takes
2204place in the (short) inner scanning loop, and does not often have to go
2205through the additional work of setting up the scanning environment
2206(e.g., `yytext') for the action.  Recall the scanner for C comments:
2207
2208     %x comment
2209     %%
2210             int line_num = 1;
2211     
2212     "/*"         BEGIN(comment);
2213     
2214     <comment>[^*\n]*
2215     <comment>"*"+[^*/\n]*
2216     <comment>\n             ++line_num;
2217     <comment>"*"+"/"        BEGIN(INITIAL);
2218
2219   This could be sped up by writing it as:
2220
2221     %x comment
2222     %%
2223             int line_num = 1;
2224     
2225     "/*"         BEGIN(comment);
2226     
2227     <comment>[^*\n]*
2228     <comment>[^*\n]*\n      ++line_num;
2229     <comment>"*"+[^*/\n]*
2230     <comment>"*"+[^*/\n]*\n ++line_num;
2231     <comment>"*"+"/"        BEGIN(INITIAL);
2232
2233   Now instead of each newline requiring the processing of another
2234action, recognizing the newlines is "distributed" over the other rules
2235to keep the matched text as long as possible.  Note that *adding* rules
2236does *not* slow down the scanner!  The speed of the scanner is
2237independent of the number of rules or (modulo the considerations given
2238at the beginning of this section) how complicated the rules are with
2239regard to operators such as '*' and '|'.
2240
2241   A final example in speeding up a scanner: suppose you want to scan
2242through a file containing identifiers and keywords, one per line and
2243with no other extraneous characters, and recognize all the keywords.  A
2244natural first approach is:
2245
2246     %%
2247     asm      |
2248     auto     |
2249     break    |
2250     ... etc ...
2251     volatile |
2252     while    /* it's a keyword */
2253     
2254     .|\n     /* it's not a keyword */
2255
2256   To eliminate the back-tracking, introduce a catch-all rule:
2257
2258     %%
2259     asm      |
2260     auto     |
2261     break    |
2262     ... etc ...
2263     volatile |
2264     while    /* it's a keyword */
2265     
2266     [a-z]+   |
2267     .|\n     /* it's not a keyword */
2268
2269   Now, if it's guaranteed that there's exactly one word per line, then
2270we can reduce the total number of matches by a half by merging in the
2271recognition of newlines with that of the other tokens:
2272
2273     %%
2274     asm\n    |
2275     auto\n   |
2276     break\n  |
2277     ... etc ...
2278     volatile\n |
2279     while\n  /* it's a keyword */
2280     
2281     [a-z]+\n |
2282     .|\n     /* it's not a keyword */
2283
2284   One has to be careful here, as we have now reintroduced backing up
2285into the scanner.  In particular, while *we* know that there will never
2286be any characters in the input stream other than letters or newlines,
2287`flex' can't figure this out, and it will plan for possibly needing to
2288back up when it has scanned a token like "auto" and then the next
2289character is something other than a newline or a letter.  Previously it
2290would then just match the "auto" rule and be done, but now it has no
2291"auto" rule, only a "auto\n" rule.  To eliminate the possibility of
2292backing up, we could either duplicate all rules but without final
2293newlines, or, since we never expect to encounter such an input and
2294therefore don't how it's classified, we can introduce one more
2295catch-all rule, this one which doesn't include a newline:
2296
2297     %%
2298     asm\n    |
2299     auto\n   |
2300     break\n  |
2301     ... etc ...
2302     volatile\n |
2303     while\n  /* it's a keyword */
2304     
2305     [a-z]+\n |
2306     [a-z]+   |
2307     .|\n     /* it's not a keyword */
2308
2309   Compiled with `-Cf', this is about as fast as one can get a `flex'
2310scanner to go for this particular problem.
2311
2312   A final note: `flex' is slow when matching NUL's, particularly when
2313a token contains multiple NUL's.  It's best to write rules which match
2314*short* amounts of text if it's anticipated that the text will often
2315include NUL's.
2316
2317   Another final note regarding performance: as mentioned above in the
2318section How the Input is Matched, dynamically resizing `yytext' to
2319accommodate huge tokens is a slow process because it presently requires
2320that the (huge) token be rescanned from the beginning.  Thus if
2321performance is vital, you should attempt to match "large" quantities of
2322text but not "huge" quantities, where the cutoff between the two is at
2323about 8K characters/token.
2324
2325
2326File: flex.info,  Node: C++,  Next: Incompatibilities,  Prev: Performance,  Up: Top
2327
2328Generating C++ scanners
2329=======================
2330
2331   `flex' provides two different ways to generate scanners for use with
2332C++.  The first way is to simply compile a scanner generated by `flex'
2333using a C++ compiler instead of a C compiler.  You should not encounter
2334any compilations errors (please report any you find to the email address
2335given in the Author section below).  You can then use C++ code in your
2336rule actions instead of C code.  Note that the default input source for
2337your scanner remains `yyin', and default echoing is still done to
2338`yyout'.  Both of these remain `FILE *' variables and not C++ `streams'.
2339
2340   You can also use `flex' to generate a C++ scanner class, using the
2341`-+' option, (or, equivalently, `%option c++'), which is automatically
2342specified if the name of the flex executable ends in a `+', such as
2343`flex++'.  When using this option, flex defaults to generating the
2344scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
2345scanner includes the header file `FlexLexer.h', which defines the
2346interface to two C++ classes.
2347
2348   The first class, `FlexLexer', provides an abstract base class
2349defining the general scanner class interface.  It provides the
2350following member functions:
2351
2352`const char* YYText()'
2353     returns the text of the most recently matched token, the
2354     equivalent of `yytext'.
2355
2356`int YYLeng()'
2357     returns the length of the most recently matched token, the
2358     equivalent of `yyleng'.
2359
2360`int lineno() const'
2361     returns the current input line number (see `%option yylineno'), or
2362     1 if `%option yylineno' was not used.
2363
2364`void set_debug( int flag )'
2365     sets the debugging flag for the scanner, equivalent to assigning to
2366     `yy_flex_debug' (see the Options section above).  Note that you
2367     must build the scanner using `%option debug' to include debugging
2368     information in it.
2369
2370`int debug() const'
2371     returns the current setting of the debugging flag.
2372
2373   Also provided are member functions equivalent to
2374`yy_switch_to_buffer(), yy_create_buffer()' (though the first argument
2375is an `istream*' object pointer and not a `FILE*', `yy_flush_buffer()',
2376`yy_delete_buffer()', and `yyrestart()' (again, the first argument is a
2377`istream*' object pointer).
2378
2379   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
2380derived from `FlexLexer'.  It defines the following additional member
2381functions:
2382
2383`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
2384     constructs a `yyFlexLexer' object using the given streams for
2385     input and output.  If not specified, the streams default to `cin'
2386     and `cout', respectively.
2387
2388`virtual int yylex()'
2389     performs the same role is `yylex()' does for ordinary flex
2390     scanners: it scans the input stream, consuming tokens, until a
2391     rule's action returns a value.  If you derive a subclass S from
2392     `yyFlexLexer' and want to access the member functions and
2393     variables of S inside `yylex()', then you need to use `%option
2394     yyclass="S"' to inform `flex' that you will be using that subclass
2395     instead of `yyFlexLexer'.  In this case, rather than generating
2396     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
2397     generates a dummy `yyFlexLexer::yylex()' that calls
2398     `yyFlexLexer::LexerError()' if called).
2399
2400`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
2401     reassigns `yyin' to `new_in' (if non-nil) and `yyout' to `new_out'
2402     (ditto), deleting the previous input buffer if `yyin' is
2403     reassigned.
2404
2405`int yylex( istream* new_in = 0, ostream* new_out = 0 )'
2406     first switches the input streams via `switch_streams( new_in,
2407     new_out )' and then returns the value of `yylex()'.
2408
2409   In addition, `yyFlexLexer' defines the following protected virtual
2410functions which you can redefine in derived classes to tailor the
2411scanner:
2412
2413`virtual int LexerInput( char* buf, int max_size )'
2414     reads up to `max_size' characters into BUF and returns the number
2415     of characters read.  To indicate end-of-input, return 0
2416     characters.  Note that "interactive" scanners (see the `-B' and
2417     `-I' flags) define the macro `YY_INTERACTIVE'.  If you redefine
2418     `LexerInput()' and need to take different actions depending on
2419     whether or not the scanner might be scanning an interactive input
2420     source, you can test for the presence of this name via `#ifdef'.
2421
2422`virtual void LexerOutput( const char* buf, int size )'
2423     writes out SIZE characters from the buffer BUF, which, while
2424     NUL-terminated, may also contain "internal" NUL's if the scanner's
2425     rules can match text with NUL's in them.
2426
2427`virtual void LexerError( const char* msg )'
2428     reports a fatal error message.  The default version of this
2429     function writes the message to the stream `cerr' and exits.
2430
2431   Note that a `yyFlexLexer' object contains its *entire* scanning
2432state.  Thus you can use such objects to create reentrant scanners.
2433You can instantiate multiple instances of the same `yyFlexLexer' class,
2434and you can also combine multiple C++ scanner classes together in the
2435same program using the `-P' option discussed above.  Finally, note that
2436the `%array' feature is not available to C++ scanner classes; you must
2437use `%pointer' (the default).
2438
2439   Here is an example of a simple C++ scanner:
2440
2441         // An example of using the flex C++ scanner class.
2442     
2443     %{
2444     int mylineno = 0;
2445     %}
2446     
2447     string  \"[^\n"]+\"
2448     
2449     ws      [ \t]+
2450     
2451     alpha   [A-Za-z]
2452     dig     [0-9]
2453     name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
2454     num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
2455     num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
2456     number  {num1}|{num2}
2457     
2458     %%
2459     
2460     {ws}    /* skip blanks and tabs */
2461     
2462     "/*"    {
2463             int c;
2464     
2465             while((c = yyinput()) != 0)
2466                 {
2467                 if(c == '\n')
2468                     ++mylineno;
2469     
2470                 else if(c == '*')
2471                     {
2472                     if((c = yyinput()) == '/')
2473                         break;
2474                     else
2475                         unput(c);
2476                     }
2477                 }
2478             }
2479     
2480     {number}  cout << "number " << YYText() << '\n';
2481     
2482     \n        mylineno++;
2483     
2484     {name}    cout << "name " << YYText() << '\n';
2485     
2486     {string}  cout << "string " << YYText() << '\n';
2487     
2488     %%
2489     
2490     Version 2.5               December 1994                        44
2491     
2492     int main( int /* argc */, char** /* argv */ )
2493         {
2494         FlexLexer* lexer = new yyFlexLexer;
2495         while(lexer->yylex() != 0)
2496             ;
2497         return 0;
2498         }
2499
2500   If you want to create multiple (different) lexer classes, you use
2501the `-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
2502some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
2503other sources once per lexer class, first renaming `yyFlexLexer' as
2504follows:
2505
2506     #undef yyFlexLexer
2507     #define yyFlexLexer xxFlexLexer
2508     #include <FlexLexer.h>
2509     
2510     #undef yyFlexLexer
2511     #define yyFlexLexer zzFlexLexer
2512     #include <FlexLexer.h>
2513
2514   if, for example, you used `%option prefix="xx"' for one of your
2515scanners and `%option prefix="zz"' for the other.
2516
2517   IMPORTANT: the present form of the scanning class is *experimental*
2518and may change considerably between major releases.
2519
2520
2521File: flex.info,  Node: Incompatibilities,  Next: Diagnostics,  Prev: C++,  Up: Top
2522
2523Incompatibilities with `lex' and POSIX
2524======================================
2525
2526   `flex' is a rewrite of the AT&T Unix `lex' tool (the two
2527implementations do not share any code, though), with some extensions
2528and incompatibilities, both of which are of concern to those who wish
2529to write scanners acceptable to either implementation.  Flex is fully
2530compliant with the POSIX `lex' specification, except that when using
2531`%pointer' (the default), a call to `unput()' destroys the contents of
2532`yytext', which is counter to the POSIX specification.
2533
2534   In this section we discuss all of the known areas of incompatibility
2535between flex, AT&T lex, and the POSIX specification.
2536
2537   `flex's' `-l' option turns on maximum compatibility with the
2538original AT&T `lex' implementation, at the cost of a major loss in the
2539generated scanner's performance.  We note below which incompatibilities
2540can be overcome using the `-l' option.
2541
2542   `flex' is fully compatible with `lex' with the following exceptions:
2543
2544   - The undocumented `lex' scanner internal variable `yylineno' is not
2545     supported unless `-l' or `%option yylineno' is used.  `yylineno'
2546     should be maintained on a per-buffer basis, rather than a
2547     per-scanner (single global variable) basis.  `yylineno' is not
2548     part of the POSIX specification.
2549
2550   - The `input()' routine is not redefinable, though it may be called
2551     to read characters following whatever has been matched by a rule.
2552     If `input()' encounters an end-of-file the normal `yywrap()'
2553     processing is done.  A "real" end-of-file is returned by `input()'
2554     as `EOF'.
2555
2556     Input is instead controlled by defining the `YY_INPUT' macro.
2557
2558     The `flex' restriction that `input()' cannot be redefined is in
2559     accordance with the POSIX specification, which simply does not
2560     specify any way of controlling the scanner's input other than by
2561     making an initial assignment to `yyin'.
2562
2563   - The `unput()' routine is not redefinable.  This restriction is in
2564     accordance with POSIX.
2565
2566   - `flex' scanners are not as reentrant as `lex' scanners.  In
2567     particular, if you have an interactive scanner and an interrupt
2568     handler which long-jumps out of the scanner, and the scanner is
2569     subsequently called again, you may get the following message:
2570
2571          fatal flex scanner internal error--end of buffer missed
2572
2573     To reenter the scanner, first use
2574
2575          yyrestart( yyin );
2576
2577     Note that this call will throw away any buffered input; usually
2578     this isn't a problem with an interactive scanner.
2579
2580     Also note that flex C++ scanner classes *are* reentrant, so if
2581     using C++ is an option for you, you should use them instead.  See
2582     "Generating C++ Scanners" above for details.
2583
2584   - `output()' is not supported.  Output from the `ECHO' macro is done
2585     to the file-pointer `yyout' (default `stdout').
2586
2587     `output()' is not part of the POSIX specification.
2588
2589   - `lex' does not support exclusive start conditions (%x), though
2590     they are in the POSIX specification.
2591
2592   - When definitions are expanded, `flex' encloses them in
2593     parentheses.  With lex, the following:
2594
2595          NAME    [A-Z][A-Z0-9]*
2596          %%
2597          foo{NAME}?      printf( "Found it\n" );
2598          %%
2599
2600     will not match the string "foo" because when the macro is expanded
2601     the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence
2602     is such that the '?' is associated with "[A-Z0-9]*".  With `flex',
2603     the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
2604     string "foo" will match.
2605
2606     Note that if the definition begins with `^' or ends with `$' then
2607     it is *not* expanded with parentheses, to allow these operators to
2608     appear in definitions without losing their special meanings.  But
2609     the `<s>, /', and `<<EOF>>' operators cannot be used in a `flex'
2610     definition.
2611
2612     Using `-l' results in the `lex' behavior of no parentheses around
2613     the definition.
2614
2615     The POSIX specification is that the definition be enclosed in
2616     parentheses.
2617
2618   - Some implementations of `lex' allow a rule's action to begin on a
2619     separate line, if the rule's pattern has trailing whitespace:
2620
2621          %%
2622          foo|bar<space here>
2623            { foobar_action(); }
2624
2625     `flex' does not support this feature.
2626
2627   - The `lex' `%r' (generate a Ratfor scanner) option is not
2628     supported.  It is not part of the POSIX specification.
2629
2630   - After a call to `unput()', `yytext' is undefined until the next
2631     token is matched, unless the scanner was built using `%array'.
2632     This is not the case with `lex' or the POSIX specification.  The
2633     `-l' option does away with this incompatibility.
2634
2635   - The precedence of the `{}' (numeric range) operator is different.
2636     `lex' interprets "abc{1,3}" as "match one, two, or three
2637     occurrences of 'abc'", whereas `flex' interprets it as "match 'ab'
2638     followed by one, two, or three occurrences of 'c'".  The latter is
2639     in agreement with the POSIX specification.
2640
2641   - The precedence of the `^' operator is different.  `lex' interprets
2642     "^foo|bar" as "match either 'foo' at the beginning of a line, or
2643     'bar' anywhere", whereas `flex' interprets it as "match either
2644     'foo' or 'bar' if they come at the beginning of a line".  The
2645     latter is in agreement with the POSIX specification.
2646
2647   - The special table-size declarations such as `%a' supported by
2648     `lex' are not required by `flex' scanners; `flex' ignores them.
2649
2650   - The name FLEX_SCANNER is #define'd so scanners may be written for
2651     use with either `flex' or `lex'.  Scanners also include
2652     `YY_FLEX_MAJOR_VERSION' and `YY_FLEX_MINOR_VERSION' indicating
2653     which version of `flex' generated the scanner (for example, for the
2654     2.5 release, these defines would be 2 and 5 respectively).
2655
2656   The following `flex' features are not included in `lex' or the POSIX
2657specification:
2658
2659     C++ scanners
2660     %option
2661     start condition scopes
2662     start condition stacks
2663     interactive/non-interactive scanners
2664     yy_scan_string() and friends
2665     yyterminate()
2666     yy_set_interactive()
2667     yy_set_bol()
2668     YY_AT_BOL()
2669     <<EOF>>
2670     <*>
2671     YY_DECL
2672     YY_START
2673     YY_USER_ACTION
2674     YY_USER_INIT
2675     #line directives
2676     %{}'s around actions
2677     multiple actions on a line
2678
2679plus almost all of the flex flags.  The last feature in the list refers
2680to the fact that with `flex' you can put multiple actions on the same
2681line, separated with semicolons, while with `lex', the following
2682
2683     foo    handle_foo(); ++num_foos_seen;
2684
2685is (rather surprisingly) truncated to
2686
2687     foo    handle_foo();
2688
2689   `flex' does not truncate the action.  Actions that are not enclosed
2690in braces are simply terminated at the end of the line.
2691
2692
2693File: flex.info,  Node: Diagnostics,  Next: Files,  Prev: Incompatibilities,  Up: Top
2694
2695Diagnostics
2696===========
2697
2698`warning, rule cannot be matched'
2699     indicates that the given rule cannot be matched because it follows
2700     other rules that will always match the same text as it.  For
2701     example, in the following "foo" cannot be matched because it comes
2702     after an identifier "catch-all" rule:
2703
2704          [a-z]+    got_identifier();
2705          foo       got_foo();
2706
2707     Using `REJECT' in a scanner suppresses this warning.
2708
2709`warning, -s option given but default rule can be matched'
2710     means that it is possible (perhaps only in a particular start
2711     condition) that the default rule (match any single character) is
2712     the only one that will match a particular input.  Since `-s' was
2713     given, presumably this is not intended.
2714
2715`reject_used_but_not_detected undefined'
2716`yymore_used_but_not_detected undefined'
2717     These errors can occur at compile time.  They indicate that the
2718     scanner uses `REJECT' or `yymore()' but that `flex' failed to
2719     notice the fact, meaning that `flex' scanned the first two sections
2720     looking for occurrences of these actions and failed to find any,
2721     but somehow you snuck some in (via a #include file, for example).
2722     Use `%option reject' or `%option yymore' to indicate to flex that
2723     you really do use these features.
2724
2725`flex scanner jammed'
2726     a scanner compiled with `-s' has encountered an input string which
2727     wasn't matched by any of its rules.  This error can also occur due
2728     to internal problems.
2729
2730`token too large, exceeds YYLMAX'
2731     your scanner uses `%array' and one of its rules matched a string
2732     longer than the `YYL-' `MAX' constant (8K bytes by default).  You
2733     can increase the value by #define'ing `YYLMAX' in the definitions
2734     section of your `flex' input.
2735
2736`scanner requires -8 flag to use the character 'X''
2737     Your scanner specification includes recognizing the 8-bit
2738     character X and you did not specify the -8 flag, and your scanner
2739     defaulted to 7-bit because you used the `-Cf' or `-CF' table
2740     compression options.  See the discussion of the `-7' flag for
2741     details.
2742
2743`flex scanner push-back overflow'
2744     you used `unput()' to push back so much text that the scanner's
2745     buffer could not hold both the pushed-back text and the current
2746     token in `yytext'.  Ideally the scanner should dynamically resize
2747     the buffer in this case, but at present it does not.
2748
2749`input buffer overflow, can't enlarge buffer because scanner uses REJECT'
2750     the scanner was working on matching an extremely large token and
2751     needed to expand the input buffer.  This doesn't work with
2752     scanners that use `REJECT'.
2753
2754`fatal flex scanner internal error--end of buffer missed'
2755     This can occur in an scanner which is reentered after a long-jump
2756     has jumped out (or over) the scanner's activation frame.  Before
2757     reentering the scanner, use:
2758
2759          yyrestart( yyin );
2760
2761     or, as noted above, switch to using the C++ scanner class.
2762
2763`too many start conditions in <> construct!'
2764     you listed more start conditions in a <> construct than exist (so
2765     you must have listed at least one of them twice).
2766
2767
2768File: flex.info,  Node: Files,  Next: Deficiencies,  Prev: Diagnostics,  Up: Top
2769
2770Files
2771=====
2772
2773`-lfl'
2774     library with which scanners must be linked.
2775
2776`lex.yy.c'
2777     generated scanner (called `lexyy.c' on some systems).
2778
2779`lex.yy.cc'
2780     generated C++ scanner class, when using `-+'.
2781
2782`<FlexLexer.h>'
2783     header file defining the C++ scanner base class, `FlexLexer', and
2784     its derived class, `yyFlexLexer'.
2785
2786`flex.skl'
2787     skeleton scanner.  This file is only used when building flex, not
2788     when flex executes.
2789
2790`lex.backup'
2791     backing-up information for `-b' flag (called `lex.bck' on some
2792     systems).
2793
2794
2795File: flex.info,  Node: Deficiencies,  Next: See also,  Prev: Files,  Up: Top
2796
2797Deficiencies / Bugs
2798===================
2799
2800   Some trailing context patterns cannot be properly matched and
2801generate warning messages ("dangerous trailing context").  These are
2802patterns where the ending of the first part of the rule matches the
2803beginning of the second part, such as "zx*/xy*", where the 'x*' matches
2804the 'x' at the beginning of the trailing context.  (Note that the POSIX
2805draft states that the text matched by such patterns is undefined.)
2806
2807   For some trailing context rules, parts which are actually
2808fixed-length are not recognized as such, leading to the abovementioned
2809performance loss.  In particular, parts using '|' or {n} (such as
2810"foo{3}") are always considered variable-length.
2811
2812   Combining trailing context with the special '|' action can result in
2813*fixed* trailing context being turned into the more expensive VARIABLE
2814trailing context.  For example, in the following:
2815
2816     %%
2817     abc      |
2818     xyz/def
2819
2820   Use of `unput()' invalidates yytext and yyleng, unless the `%array'
2821directive or the `-l' option has been used.
2822
2823   Pattern-matching of NUL's is substantially slower than matching
2824other characters.
2825
2826   Dynamic resizing of the input buffer is slow, as it entails
2827rescanning all the text matched so far by the current (generally huge)
2828token.
2829
2830   Due to both buffering of input and read-ahead, you cannot intermix
2831calls to <stdio.h> routines, such as, for example, `getchar()', with
2832`flex' rules and expect it to work.  Call `input()' instead.
2833
2834   The total table entries listed by the `-v' flag excludes the number
2835of table entries needed to determine what rule has been matched.  The
2836number of entries is equal to the number of DFA states if the scanner
2837does not use `REJECT', and somewhat greater than the number of states
2838if it does.
2839
2840   `REJECT' cannot be used with the `-f' or `-F' options.
2841
2842   The `flex' internal algorithms need documentation.
2843
2844
2845File: flex.info,  Node: See also,  Next: Author,  Prev: Deficiencies,  Up: Top
2846
2847See also
2848========
2849
2850   `lex'(1), `yacc'(1), `sed'(1), `awk'(1).
2851
2852   John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and
2853Associates.  Be sure to get the 2nd edition.
2854
2855   M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
2856
2857   Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles,
2858Techniques and Tools; Addison-Wesley (1986).  Describes the
2859pattern-matching techniques used by `flex' (deterministic finite
2860automata).
2861
2862
2863File: flex.info,  Node: Author,  Prev: See also,  Up: Top
2864
2865Author
2866======
2867
2868   Vern Paxson, with the help of many ideas and much inspiration from
2869Van Jacobson.  Original version by Jef Poskanzer.  The fast table
2870representation is a partial implementation of a design done by Van
2871Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
2872
2873   Thanks to the many `flex' beta-testers, feedbackers, and
2874contributors, especially Francois Pinard, Casey Leedom, Stan Adermann,
2875Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe,
2876`benson@odi.com', Karl Berry, Peter A. Bigot, Simon Blanchard, Keith
2877Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian
2878Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave
2879Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike
2880Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris
2881Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman,
2882Christopher M.  Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles
2883Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig,
2884Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal
2885Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus
2886Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz,
2887`ken@ken.hilco.com', Kevin B. Kenny, Steve Kirsch, Winfried Koenig,
2888Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John
2889Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte,
2890Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim
2891Meyering, R.  Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll,
2892James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne,
2893Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef
2894Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin,
2895Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto
2896Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug
2897Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard
2898Stolz, Jan-Erik Strvmquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
2899Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul
2900Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
2901Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have
2902slipped my marginal mail-archiving skills but whose contributions are
2903appreciated all the same.
2904
2905   Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
2906Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol, Francois Pinard,
2907Rich Salz, and Richard Stallman for help with various distribution
2908headaches.
2909
2910   Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
2911to Benson Margulies and Fred Burke for C++ support; to Kent Williams
2912and Tom Epperly for C++ class support; to Ove Ewerlid for support of
2913NUL's; and to Eric Hughes for support of multiple buffers.
2914
2915   This work was primarily done when I was with the Real Time Systems
2916Group at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks
2917to all there for the support I received.
2918
2919   Send comments to `vern@ee.lbl.gov'.
2920
2921
2922
2923Tag Table:
2924Node: Top1430
2925Node: Name2808
2926Node: Synopsis2933
2927Node: Overview3145
2928Node: Description4986
2929Node: Examples5748
2930Node: Format8896
2931Node: Patterns11637
2932Node: Matching18138
2933Node: Actions21438
2934Node: Generated scanner30560
2935Node: Start conditions34988
2936Node: Multiple buffers45069
2937Node: End-of-file rules50975
2938Node: Miscellaneous52508
2939Node: User variables55279
2940Node: YACC interface57651
2941Node: Options58542
2942Node: Performance78234
2943Node: C++87532
2944Node: Incompatibilities94993
2945Node: Diagnostics101853
2946Node: Files105094
2947Node: Deficiencies105715
2948Node: See also107684
2949Node: Author108216
2950
2951End Tag Table
2952