This is flex.info, produced by makeinfo version 6.1 from flex.texi.

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)

flex
****

This manual describes 'flex', a tool for generating programs that
perform pattern-matching on text. The manual includes both tutorial and
reference sections.

   This edition of 'The flex Manual' documents 'flex' version 2.6.4.
It was last updated on 6 May 2017.

   This manual was written by Vern Paxson, Will Estes and John Millaway.

* Menu:

* Copyright::
* Reporting Bugs::
* Introduction::
* Simple Examples::
* Format::
* Patterns::
* Matching::
* Actions::
* Generated Scanner::
* Start Conditions::
* Multiple Input Buffers::
* EOF::
* Misc Macros::
* User Values::
* Yacc::
* Scanner Options::
* Performance::
* Cxx::
* Reentrant::
* Lex and Posix::
* Memory Management::
* Serialized Tables::
* Diagnostics::
* Limitations::
* Bibliography::
* FAQ::
* Appendices::
* Indices::

 -- The Detailed Node Listing --

Format of the Input File

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::

Scanner Options

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

Reentrant C Scanners

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::

The Reentrant API in Detail

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::

Memory Management

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::

Serialized Tables

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::

FAQ

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of
megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own
I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from
within a Bison action?::

Appendices

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::

Indices

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::



File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top

1 Copyright
***********

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top

2 Reporting Bugs
****************

If you find a bug in 'flex', please report it using GitHub's issue
tracking facility at <https://github.com/westes/flex/issues/>


File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top

3 Introduction
**************

'flex' is a tool for generating "scanners". A scanner is a program
which recognizes lexical patterns in text. The 'flex' program reads the
given input files, or its standard input if no file names are given, for
a description of a scanner to generate. The description is in the form
of pairs of regular expressions and C code, called "rules". 'flex'
generates as output a C source file, 'lex.yy.c' by default, which
defines a routine 'yylex()'. This file can be compiled and linked with
the flex runtime library to produce an executable. When the executable
is run, it analyzes its input for occurrences of the regular
expressions. Whenever it finds one, it executes the corresponding C
code.


File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top

4 Some Simple Examples
**********************

First some simple examples to get the flavor of how one uses 'flex'.

   The following 'flex' input specifies a scanner which, when it
encounters the string 'username' will replace it with the user's login
name:

         %%
         username    printf( "%s", getlogin() );

   By default, any text not matched by a 'flex' scanner is copied to the
output, so the net effect of this scanner is to copy its input file to
its output with each occurrence of 'username' expanded. In this input,
there is just one rule. 'username' is the "pattern" and the 'printf' is
the "action". The '%%' symbol marks the beginning of the rules.
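
   As a hedged sketch, the same rule can be placed in a complete input
file that builds without the flex runtime library. The file name
'username.l', the '%option noyywrap' line, the 'main()' routine and the
check for a NULL return from 'getlogin()' are additions made for
illustration; they are not part of the example above:

```lex
%{
/* getlogin() is declared in <unistd.h> on POSIX systems */
#include <stdio.h>
#include <unistd.h>
%}

%option noyywrap

%%

username    {
            /* getlogin() may return NULL, for instance when there is
               no controlling terminal; the fallback text is an
               assumption made for this sketch */
            const char *name = getlogin();
            printf( "%s", name ? name : "(unknown)" );
            }

%%

int main( void )
    {
    yylex();
    return 0;
    }
```

   Assuming the file is saved as 'username.l', running 'flex username.l'
followed by 'cc lex.yy.c -o username' should produce a filter that reads
its standard input and writes the expanded text to its standard output.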

   Here's another simple example:

                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%

         int main()
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 }

   This scanner counts the number of characters and the number of lines
in its input. It produces no output other than the final report on the
character and line counts. The first line declares two globals,
'num_lines' and 'num_chars', which are accessible both inside 'yylex()'
and in the 'main()' routine declared after the second '%%'. There are
two rules, one which matches a newline ('\n') and increments both the
line count and the character count, and one which matches any character
other than a newline (indicated by the '.' regular expression).

   A somewhat more complicated example:

         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^{}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             }

   This is the beginning of a simple scanner for a language like
Pascal.
It identifies different types of "tokens" and reports on what
it has seen.

   The details of this example will be explained in the following
sections.


File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top

5 Format of the Input File
**************************

The 'flex' input file consists of three sections, separated by a line
containing only '%%'.

         definitions
         %%
         rules
         %%
         user code

* Menu:

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::


File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format

5.1 Format of the Definitions Section
=====================================

The "definitions section" contains declarations of simple "name"
definitions to simplify the scanner specification, and declarations of
"start conditions", which are explained in a later section.

   Name definitions have the form:

         name definition

   The 'name' is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash). The
definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line. The
definition can subsequently be referred to using '{name}', which will
expand to '(definition)'. For example,

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

   Defines 'DIGIT' to be a regular expression which matches a single
digit, and 'ID' to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits. A subsequent reference to

         {DIGIT}+"."{DIGIT}*

   is identical to

         ([0-9])+"."([0-9])*

   and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.
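
   The parentheses added by the expansion matter most when a definition
contains alternation. The following sketch uses a hypothetical
definition 'ALT' that does not appear in the text above, together with
an illustrative '%option noyywrap' line and 'main()' routine. Because
'{ALT}+' expands to '(foo|bar)+', the '+' applies to the whole
alternation rather than only to the final 'r':

```lex
%option noyywrap

ALT     foo|bar

%%

{ALT}+  printf( "run: %s\n", yytext );   /* expands to (foo|bar)+ */
.|\n    /* discard everything else */

%%

int main( void )
    {
    yylex();
    return 0;
    }
```

   With input such as 'foobarfoo', the first rule is expected to match
the whole string as a single token.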

   An unindented comment (i.e., a line beginning with '/*') is copied
verbatim to the output up to the next '*/'.

   Any _indented_ text or text enclosed in '%{' and '%}' is also copied
verbatim to the output (with the %{ and %} symbols removed). The %{ and
%} symbols must appear unindented on lines by themselves.

   A '%top' block is similar to a '%{' ... '%}' block, except that the
code in a '%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1). The '%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code. The single characters, '{' and '}'
are used to delimit the '%top' block, as shown in the example below:

         %top{
             /* This code goes at the "top" of the generated file. */
             #include <stdint.h>
             #include <inttypes.h>
         }

   Multiple '%top' blocks are allowed, and their order is preserved.

   ---------- Footnotes ----------

   (1) Actually, 'yyIN_HEADER' is defined before the '%top' block.


File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format

5.2 Format of the Rules Section
===============================

The "rules" section of the 'flex' input contains a series of rules of
the form:

         pattern   action

   where the pattern must be unindented and the action must begin on the
same line. *Note Patterns::, for a further description of patterns and
actions.

   In the rules section, any indented or %{ %} enclosed text appearing
before the first rule may be used to declare variables which are local
to the scanning routine and (after the declarations) code which is to be
executed whenever the scanning routine is entered.
Other indented or %{ %} text in the rule section is still copied to
the output, but its meaning is not well-defined and it may well cause
compile-time errors (this feature is present for POSIX compliance.
*Note Lex and Posix::, for other such features).

   Any _indented_ text or text enclosed in '%{' and '%}' is copied
verbatim to the output (with the %{ and %} symbols removed). The %{ and
%} symbols must appear unindented on lines by themselves.


File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format

5.3 Format of the User Code Section
===================================

The user code section is simply copied to 'lex.yy.c' verbatim. It is
used for companion routines which call or are called by the scanner.
The presence of this section is optional; if it is missing, the second
'%%' in the input file may be skipped, too.


File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format

5.4 Comments in the Input
=========================

Flex supports C-style comments, that is, anything between '/*' and '*/'
is considered a comment. Whenever flex encounters a comment, it copies
the entire comment verbatim to the generated source code. Comments may
appear just about anywhere, but with the following exceptions:

   * Comments may not appear in the Rules Section wherever flex is
     expecting a regular expression. This means comments may not appear
     at the beginning of a line, or immediately following a list of
     scanner states.
   * Comments may not appear on an '%option' line in the Definitions
     Section.

   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
'/*'. This rule will work anywhere in the input file.

   All the comments in the following example are valid:

     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */



File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top

6 Patterns
**********

The patterns in the input (see *note Rules Section::) are written using
an extended set of regular expressions. These are:

'x'
     match the character 'x'

'.'
     any character (byte) except newline

'[xyz]'
     a "character class"; in this case, the pattern matches either an
     'x', a 'y', or a 'z'

'[abj-oZ]'
     a "character class" with a range in it; matches an 'a', a 'b', any
     letter from 'j' through 'o', or a 'Z'

'[^A-Z]'
     a "negated character class", i.e., any character but those in the
     class. In this case, any character EXCEPT an uppercase letter.

'[^A-Z\n]'
     any character EXCEPT an uppercase letter or a newline

'[a-z]{-}[aeiou]'
     the lowercase consonants

'r*'
     zero or more r's, where r is any regular expression

'r+'
     one or more r's

'r?'
     zero or one r's (that is, "an optional r")

'r{2,5}'
     anywhere from two to five r's

'r{2,}'
     two or more r's

'r{4}'
     exactly 4 r's

'{name}'
     the expansion of the 'name' definition (*note Format::).

'"[xyz]\"foo"'
     the literal string: '[xyz]"foo'

'\X'
     if X is 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C
     interpretation of '\x'.
     Otherwise, a literal 'X' (used to escape operators such as '*')

'\0'
     a NUL character (ASCII code 0)

'\123'
     the character with octal value 123

'\x2a'
     the character with hexadecimal value 2a

'(r)'
     match an 'r'; parentheses are used to override precedence (see
     below)

'(?r-s:pattern)'
     apply option 'r' and omit option 's' while interpreting pattern.
     Options may be zero or more of the characters 'i', 's', or 'x'.

     'i' means case-insensitive. '-i' means case-sensitive.

     's' alters the meaning of the '.' syntax to match any single byte
     whatsoever. '-s' alters the meaning of '.' to match any byte
     except '\n'.

     'x' ignores comments and whitespace in patterns. Whitespace is
     ignored unless it is backslash-escaped, contained within '""'s, or
     appears inside a character class.

     The following are all valid:

          (?:foo)         same as  (foo)
          (?i:ab7)        same as  ([aA][bB]7)
          (?-i:ab)        same as  (ab)
          (?s:.)          same as  [\x00-\xFF]
          (?-s:.)         same as  [^\n]
          (?ix-s: a . b)  same as  ([Aa][^\n][bB])
          (?x:a  b)       same as  ("ab")
          (?x:a\ b)       same as  ("a b")
          (?x:a" "b)      same as  ("a b")
          (?x:a[ ]b)      same as  ("a b")
          (?x:a
              /* comment */
              b
              c)          same as  (abc)

'(?# comment )'
     omit everything within '()'. The first ')' character encountered
     ends the pattern. It is not possible for the comment to contain
     a ')' character. The comment may span lines.

'rs'
     the regular expression 'r' followed by the regular expression 's';
     called "concatenation"

'r|s'
     either an 'r' or an 's'

'r/s'
     an 'r' but only if it is followed by an 's'. The text matched by
     's' is included when determining whether this rule is the longest
     match, but is then returned to the input before the action is
     executed. So the action only sees the text matched by 'r'. This
     type of pattern is called "trailing context".
     (There are some combinations of 'r/s' that flex cannot match
     correctly. *Note Limitations::, regarding dangerous trailing
     context.)

'^r'
     an 'r', but only at the beginning of a line (i.e., when just
     starting to scan, or right after a newline has been scanned).

'r$'
     an 'r', but only at the end of a line (i.e., just before a
     newline). Equivalent to 'r/\n'.

     Note that 'flex''s notion of "newline" is exactly whatever the C
     compiler used to compile 'flex' interprets '\n' as; in particular,
     on some DOS systems you must either filter out '\r's in the input
     yourself, or explicitly use 'r/\r\n' for 'r$'.

'<s>r'
     an 'r', but only in start condition 's' (see *note Start
     Conditions:: for discussion of start conditions).

'<s1,s2,s3>r'
     same, but in any of start conditions 's1', 's2', or 's3'.

'<*>r'
     an 'r' in any start condition, even an exclusive one.

'<<EOF>>'
     an end-of-file.

'<s1,s2><<EOF>>'
     an end-of-file when in start condition 's1' or 's2'

   Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and the
character class operators, '-', ']', and, at the beginning of the
class, '^'.

   The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence (see special note on the
precedence of the repeat operator, '{}', under the documentation for the
'--posix' POSIX compliance option). For example,

         foo|bar*

   is the same as

         (foo)|(ba(r*))

   since the '*' operator has higher precedence than concatenation, and
concatenation higher than alternation ('|'). This pattern therefore
matches _either_ the string 'foo' _or_ the string 'ba' followed by
zero-or-more 'r''s.
To match 'foo' or zero-or-more repetitions of the
string 'bar', use:

         foo|(bar)*

   And to match a sequence of zero or more repetitions of 'foo' and
'bar':

         (foo|bar)*

   In addition to characters and ranges of characters, character classes
can also contain "character class expressions". These are expressions
enclosed inside '[:' and ':]' delimiters (which themselves must appear
between the '[' and ']' of the character class. Other elements may
occur inside the character class, too). The valid expressions are:

         [:alnum:] [:alpha:] [:blank:]
         [:cntrl:] [:digit:] [:graph:]
         [:lower:] [:print:] [:punct:]
         [:space:] [:upper:] [:xdigit:]

   These expressions all designate a set of characters equivalent to the
corresponding standard C 'isXXX' function. For example, '[:alnum:]'
designates those characters for which 'isalnum()' returns true - i.e.,
any alphabetic or numeric character. Some systems don't provide
'isblank()', so flex defines '[:blank:]' as a blank or a tab.

   For example, the following character classes are all equivalent:

         [[:alnum:]]
         [[:alpha:][:digit:]]
         [[:alpha:][0-9]]
         [a-zA-Z0-9]

   A word of caution. Character classes are expanded immediately when
seen in the 'flex' input. This means the character classes are
sensitive to the locale in which 'flex' is executed, and the resulting
scanner will not be sensitive to the runtime locale. This may or may
not be desirable.

   * If your scanner is case-insensitive (the '-i' flag), then
     '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.

   * Character classes with ranges, such as '[a-Z]', should be used with
     caution in a case-insensitive scanner if the range spans upper or
     lowercase characters. Flex does not know if you want to fold all
     upper and lowercase characters together, or if you want the literal
     numeric range specified (with no case folding).
     When in doubt, flex will assume that you meant the literal numeric
     range, and will issue a warning. The exception to this rule is a
     character range such as '[a-z]' or '[S-W]' where it is obvious
     that you want case-folding to occur. Here are some examples with
     the '-i' flag enabled:

          Range     Result      Literal Range         Alternate Range
          '[a-t]'   ok          '[a-tA-T]'
          '[A-T]'   ok          '[a-tA-T]'
          '[A-t]'   ambiguous   '[A-Z\[\\\]_`a-t]'    '[a-tA-T]'
          '[_-{]'   ambiguous   '[_`a-z{]'            '[_`a-zA-Z{]'
          '[@-C]'   ambiguous   '[@ABC]'              '[@A-Z\[\\\]_`abc]'

   * A negated character class such as the example '[^A-Z]' above _will_
     match a newline unless '\n' (or an equivalent escape sequence) is
     one of the characters explicitly present in the negated character
     class (e.g., '[^A-Z\n]'). This is unlike how many other regular
     expression tools treat negated character classes, but unfortunately
     the inconsistency is historically entrenched. Matching newlines
     means that a pattern like '[^"]*' can match the entire input unless
     there's another quote in the input.

     Flex allows negation of character class expressions by prepending
     '^' to the POSIX character class name.

              [:^alnum:] [:^alpha:] [:^blank:]
              [:^cntrl:] [:^digit:] [:^graph:]
              [:^lower:] [:^print:] [:^punct:]
              [:^space:] [:^upper:] [:^xdigit:]

     Flex will issue a warning if the expressions '[:^upper:]' and
     '[:^lower:]' appear in a case-insensitive scanner, since their
     meaning is unclear. The current behavior is to skip them entirely,
     but this may change without notice in future revisions of flex.

   * The '{-}' operator computes the difference of two character
     classes. For example, '[a-c]{-}[b-z]' represents all the
     characters in the class '[a-c]' that are not in the class '[b-z]'
     (which in this case, is just the single character 'a'). The '{-}'
     operator is left associative, so '[abc]{-}[b]{-}[c]' is the same as
     '[a]'.
     Be careful not to accidentally create an empty set, which
     will never match.

   * The '{+}' operator computes the union of two character classes.
     For example, '[a-z]{+}[0-9]' is the same as '[a-z0-9]'. This
     operator is useful when preceded by the result of a difference
     operation, as in, '[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
     equivalent to '[A-Zq]' in the "C" locale.

   * A rule can have at most one instance of trailing context (the '/'
     operator or the '$' operator). The start condition, '^', and
     '<<EOF>>' patterns can only occur at the beginning of a pattern,
     and, as well as with '/' and '$', cannot be grouped inside
     parentheses. A '^' which does not occur at the beginning of a rule
     or a '$' which does not occur at the end of a rule loses its
     special properties and is treated as a normal character.

   * The following are invalid:

          foo/bar$
          <sc1>foo<sc2>bar

     Note that the first of these can be written 'foo/bar\n'.

   * The following will result in '$' or '^' being treated as a normal
     character:

          foo|(bar$)
          foo|^bar

     If the desired meaning is a 'foo' or a 'bar'-followed-by-a-newline,
     the following could be used (the special '|' action is explained
     below, *note Actions::):

          foo      |
          bar$     /* action goes here */

     A similar trick will work for matching a 'foo' or a
     'bar'-at-the-beginning-of-a-line.


File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top

7 How the Input Is Matched
**************************

When the generated scanner is run, it analyzes its input looking for
strings which match any of its patterns. If it finds more than one
match, it takes the one matching the most text (for trailing context
rules, this includes the length of the trailing part, even though it
will then be returned to the input).
If it finds two or more matches of
the same length, the rule listed first in the 'flex' input file is
chosen.

   Once the match is determined, the text corresponding to the match
(called the "token") is made available in the global character pointer
'yytext', and its length in the global integer 'yyleng'. The "action"
corresponding to the matched pattern is then executed (*note Actions::),
and then the remaining input is scanned for another match.

   If no match is found, then the "default rule" is executed: the next
character in the input is considered matched and copied to the standard
output. Thus, the simplest valid 'flex' input is:

         %%

   which generates a scanner that simply copies its input (one character
at a time) to its output.

   Note that 'yytext' can be defined in two different ways: either as a
character _pointer_ or as a character _array_. You can control which
definition 'flex' uses by including one of the special directives
'%pointer' or '%array' in the first (definitions) section of your flex
input. The default is '%pointer', unless you use the '-l' lex
compatibility option, in which case 'yytext' will be an array. The
advantage of using '%pointer' is substantially faster scanning and no
buffer overflow when matching very large tokens (unless you run out of
dynamic memory). The disadvantage is that you are restricted in how
your actions can modify 'yytext' (*note Actions::), and calls to the
'unput()' function destroy the present contents of 'yytext', which can
be a considerable porting headache when moving between different 'lex'
versions.

   The advantage of '%array' is that you can then modify 'yytext' to
your heart's content, and calls to 'unput()' do not destroy 'yytext'
(*note Actions::).
Furthermore, existing 'lex' programs sometimes 955access 'yytext' externally using declarations of the form: 956 957 extern char yytext[]; 958 959 This definition is erroneous when used with '%pointer', but correct 960for '%array'. 961 962 The '%array' declaration defines 'yytext' to be an array of 'YYLMAX' 963characters, which defaults to a fairly large value. You can change the 964size by simply #define'ing 'YYLMAX' to a different value in the first 965section of your 'flex' input. As mentioned above, with '%pointer' 966yytext grows dynamically to accommodate large tokens. While this means 967your '%pointer' scanner can accommodate very large tokens (such as 968matching entire blocks of comments), bear in mind that each time the 969scanner must resize 'yytext' it also must rescan the entire token from 970the beginning, so matching such tokens can prove slow. 'yytext' 971presently does _not_ dynamically grow if a call to 'unput()' results in 972too much text being pushed back; instead, a run-time error results. 973 974 Also note that you cannot use '%array' with C++ scanner classes 975(*note Cxx::). 976 977 978File: flex.info, Node: Actions, Next: Generated Scanner, Prev: Matching, Up: Top 979 9808 Actions 981********* 982 983Each pattern in a rule has a corresponding "action", which can be any 984arbitrary C statement. The pattern ends at the first non-escaped 985whitespace character; the remainder of the line is its action. If the 986action is empty, then when the pattern is matched the input token is 987simply discarded. For example, here is the specification for a program 988which deletes all occurrences of 'zap me' from its input: 989 990 %% 991 "zap me" 992 993 This example will copy all other characters in the input to the 994output since they will be matched by the default rule. 
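   The effect of the 'zap me' specification above can be imitated in
plain C, which may help when reasoning about what the scanner does. The
following is only a minimal sketch, not code that 'flex' generates: the
function name 'delete_zap_me' is hypothetical, and the loop models just
the two cases at work here, namely the explicit rule with its empty
action and the default rule that copies one character.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of flex): copy `in` to `out`, dropping
 * every occurrence of "zap me".  `out` must have room for
 * strlen(in) + 1 bytes.  Returns the number of characters written.
 */
size_t delete_zap_me(const char *in, char *out)
{
    static const char pattern[] = "zap me";
    const size_t pattern_len = sizeof pattern - 1;
    size_t written = 0;

    while (*in != '\0') {
        if (strncmp(in, pattern, pattern_len) == 0) {
            /* The explicit rule matched; its action is empty,
             * so the token is simply discarded. */
            in += pattern_len;
        } else {
            /* No rule matched: the default rule copies one
             * character to the output. */
            out[written++] = *in++;
        }
    }
    out[written] = '\0';
    return written;
}
```

   The explicit rule always wins where it applies because it matches
six characters, while the default rule matches only one.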
   Here is a program which compresses multiple blanks and tabs down to a
single blank, and throws away whitespace found at the end of a line:

     %%
     [ \t]+        putchar( ' ' );
     [ \t]+$       /* ignore this token */

   If the action contains a '{', then the action extends until the
balancing '}' is found, and the action may cross multiple lines. 'flex'
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with '%{' and will
consider the action to be all the text up to the next '%}' (regardless
of ordinary braces inside the action).

   An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule". See below for an illustration.

   Actions can include arbitrary C code, including 'return' statements
to return a value to whatever routine called 'yylex()'. Each time
'yylex()' is called it continues processing tokens from where it last
left off until it either reaches the end of the file or executes a
return.

   Actions are free to modify 'yytext' except for lengthening it
(characters added to its end will overwrite later characters in the
input stream). This restriction does not apply when using '%array'
(*note Matching::). In that case, 'yytext' may be freely modified in
any way.

   Actions are free to modify 'yyleng' except they should not do so if
the action also includes use of 'yymore()' (see below).

   There are a number of special directives which can be included within
an action:

'ECHO'
     copies 'yytext' to the scanner's output.

'BEGIN'
     followed by the name of a start condition places the scanner in the
     corresponding start condition (see below).

'REJECT'
     directs the scanner to proceed on to the "second best" rule which
     matched the input (or a prefix of the input).
The rule is chosen 1040 as described above in *note Matching::, and 'yytext' and 'yyleng' 1041 set up appropriately. It may either be one which matched as much 1042 text as the originally chosen rule but came later in the 'flex' 1043 input file, or one which matched less text. For example, the 1044 following will both count the words in the input and call the 1045 routine 'special()' whenever 'frob' is seen: 1046 1047 int word_count = 0; 1048 %% 1049 1050 frob special(); REJECT; 1051 [^ \t\n]+ ++word_count; 1052 1053 Without the 'REJECT', any occurrences of 'frob' in the input would 1054 not be counted as words, since the scanner normally executes only 1055 one action per token. Multiple uses of 'REJECT' are allowed, each 1056 one finding the next best choice to the currently active rule. For 1057 example, when the following scanner scans the token 'abcd', it will 1058 write 'abcdabcaba' to the output: 1059 1060 %% 1061 a | 1062 ab | 1063 abc | 1064 abcd ECHO; REJECT; 1065 .|\n /* eat up any unmatched character */ 1066 1067 The first three rules share the fourth's action since they use the 1068 special '|' action. 1069 1070 'REJECT' is a particularly expensive feature in terms of scanner 1071 performance; if it is used in _any_ of the scanner's actions it 1072 will slow down _all_ of the scanner's matching. Furthermore, 1073 'REJECT' cannot be used with the '-Cf' or '-CF' options (*note 1074 Scanner Options::). 1075 1076 Note also that unlike the other special actions, 'REJECT' is a 1077 _branch_. Code immediately following it in the action will _not_ 1078 be executed. 1079 1080'yymore()' 1081 tells the scanner that the next time it matches a rule, the 1082 corresponding token should be _appended_ onto the current value of 1083 'yytext' rather than replacing it. 
For example, given the input 1084 'mega-kludge' the following will write 'mega-mega-kludge' to the 1085 output: 1086 1087 %% 1088 mega- ECHO; yymore(); 1089 kludge ECHO; 1090 1091 First 'mega-' is matched and echoed to the output. Then 'kludge' 1092 is matched, but the previous 'mega-' is still hanging around at the 1093 beginning of 'yytext' so the 'ECHO' for the 'kludge' rule will 1094 actually write 'mega-kludge'. 1095 1096 Two notes regarding use of 'yymore()'. First, 'yymore()' depends on 1097the value of 'yyleng' correctly reflecting the size of the current 1098token, so you must not modify 'yyleng' if you are using 'yymore()'. 1099Second, the presence of 'yymore()' in the scanner's action entails a 1100minor performance penalty in the scanner's matching speed. 1101 1102 'yyless(n)' returns all but the first 'n' characters of the current 1103token back to the input stream, where they will be rescanned when the 1104scanner looks for the next match. 'yytext' and 'yyleng' are adjusted 1105appropriately (e.g., 'yyleng' will now be equal to 'n'). For example, 1106on the input 'foobar' the following will write out 'foobarbar': 1107 1108 %% 1109 foobar ECHO; yyless(3); 1110 [a-z]+ ECHO; 1111 1112 An argument of 0 to 'yyless()' will cause the entire current input 1113string to be scanned again. Unless you've changed how the scanner will 1114subsequently process its input (using 'BEGIN', for example), this will 1115result in an endless loop. 1116 1117 Note that 'yyless()' is a macro and can only be used in the flex 1118input file, not from other source files. 1119 1120 'unput(c)' puts the character 'c' back onto the input stream. It 1121will be the next character scanned. The following action will take the 1122current token and cause it to be rescanned enclosed in parentheses. 
1123 1124 { 1125 int i; 1126 /* Copy yytext because unput() trashes yytext */ 1127 char *yycopy = strdup( yytext ); 1128 unput( ')' ); 1129 for ( i = yyleng - 1; i >= 0; --i ) 1130 unput( yycopy[i] ); 1131 unput( '(' ); 1132 free( yycopy ); 1133 } 1134 1135 Note that since each 'unput()' puts the given character back at the 1136_beginning_ of the input stream, pushing back strings must be done 1137back-to-front. 1138 1139 An important potential problem when using 'unput()' is that if you 1140are using '%pointer' (the default), a call to 'unput()' _destroys_ the 1141contents of 'yytext', starting with its rightmost character and 1142devouring one character to the left with each call. If you need the 1143value of 'yytext' preserved after a call to 'unput()' (as in the above 1144example), you must either first copy it elsewhere, or build your scanner 1145using '%array' instead (*note Matching::). 1146 1147 Finally, note that you cannot put back 'EOF' to attempt to mark the 1148input stream with an end-of-file. 1149 1150 'input()' reads the next character from the input stream. For 1151example, the following is one way to eat up C comments: 1152 1153 %% 1154 "/*" { 1155 int c; 1156 1157 for ( ; ; ) 1158 { 1159 while ( (c = input()) != '*' && 1160 c != EOF ) 1161 ; /* eat up text of comment */ 1162 1163 if ( c == '*' ) 1164 { 1165 while ( (c = input()) == '*' ) 1166 ; 1167 if ( c == '/' ) 1168 break; /* found the end */ 1169 } 1170 1171 if ( c == EOF ) 1172 { 1173 error( "EOF in comment" ); 1174 break; 1175 } 1176 } 1177 } 1178 1179 (Note that if the scanner is compiled using 'C++', then 'input()' is 1180instead referred to as yyinput(), in order to avoid a name clash with 1181the 'C++' stream by the name of 'input'.) 1182 1183 'YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the 1184next time the scanner attempts to match a token, it will first refill 1185the buffer using 'YY_INPUT()' (*note Generated Scanner::). 
This action is a special case of the more general 'yy_flush_buffer()'
function, described below (*note Multiple Input Buffers::).

   'yyterminate()' can be used in lieu of a return statement in an
action. It terminates the scanner and returns a 0 to the scanner's
caller, indicating "all done". By default, 'yyterminate()' is also
called when an end-of-file is encountered. It is a macro and may be
redefined.


File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top

9 The Generated Scanner
***********************

The output of 'flex' is the file 'lex.yy.c', which contains the scanning
routine 'yylex()', a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros. By default, 'yylex()' is
declared as follows:

     int yylex()
         {
         ... various definitions and the actions in here ...
         }

   (If your environment supports function prototypes, then it will be
'int yylex( void )'.) This definition may be changed by defining the
'YY_DECL' macro. For example, you could use:

     #define YY_DECL float lexscan( a, b ) float a, b;

   to give the scanning routine the name 'lexscan', returning a float,
and taking two floats as arguments. Note that if you give arguments to
the scanning routine using a K&R-style/non-prototyped function
declaration, you must terminate the definition with a semi-colon (;).

   'flex' generates 'C99' function definitions by default. Flex used to
have the ability to generate obsolete, er, 'traditional', function
definitions. This was to support bootstrapping gcc on old systems.
Unfortunately, traditional definitions prevent us from using any
standard data types smaller than int (such as short, char, or bool) as
function arguments. Furthermore, support for traditional definitions
added extra complexity in the skeleton file.
For this reason, current 1229versions of 'flex' generate standard C99 code only, leaving K&R-style 1230functions to the historians. 1231 1232 Whenever 'yylex()' is called, it scans tokens from the global input 1233file 'yyin' (which defaults to stdin). It continues until it either 1234reaches an end-of-file (at which point it returns the value 0) or one of 1235its actions executes a 'return' statement. 1236 1237 If the scanner reaches an end-of-file, subsequent calls are undefined 1238unless either 'yyin' is pointed at a new input file (in which case 1239scanning continues from that file), or 'yyrestart()' is called. 1240'yyrestart()' takes one argument, a 'FILE *' pointer (which can be NULL, 1241if you've set up 'YY_INPUT' to scan from a source other than 'yyin'), 1242and initializes 'yyin' for scanning from that file. Essentially there 1243is no difference between just assigning 'yyin' to a new input file or 1244using 'yyrestart()' to do so; the latter is available for compatibility 1245with previous versions of 'flex', and because it can be used to switch 1246input files in the middle of scanning. It can also be used to throw 1247away the current input buffer, by calling it with an argument of 'yyin'; 1248but it would be better to use 'YY_FLUSH_BUFFER' (*note Actions::). Note 1249that 'yyrestart()' does _not_ reset the start condition to 'INITIAL' 1250(*note Start Conditions::). 1251 1252 If 'yylex()' stops scanning due to executing a 'return' statement in 1253one of the actions, the scanner may then be called again and it will 1254resume scanning where it left off. 1255 1256 By default (and for purposes of efficiency), the scanner uses 1257block-reads rather than simple 'getc()' calls to read characters from 1258'yyin'. The nature of how it gets its input can be controlled by 1259defining the 'YY_INPUT' macro. The calling sequence for 'YY_INPUT()' is 1260'YY_INPUT(buf,result,max_size)'. 
Its action is to place up to 1261'max_size' characters in the character array 'buf' and return in the 1262integer variable 'result' either the number of characters read or the 1263constant 'YY_NULL' (0 on Unix systems) to indicate 'EOF'. The default 1264'YY_INPUT' reads from the global file-pointer 'yyin'. 1265 1266 Here is a sample definition of 'YY_INPUT' (in the definitions section 1267of the input file): 1268 1269 %{ 1270 #define YY_INPUT(buf,result,max_size) \ 1271 { \ 1272 int c = getchar(); \ 1273 result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \ 1274 } 1275 %} 1276 1277 This definition will change the input processing to occur one 1278character at a time. 1279 1280 When the scanner receives an end-of-file indication from YY_INPUT, it 1281then checks the 'yywrap()' function. If 'yywrap()' returns false 1282(zero), then it is assumed that the function has gone ahead and set up 1283'yyin' to point to another input file, and scanning continues. If it 1284returns true (non-zero), then the scanner terminates, returning 0 to its 1285caller. Note that in either case, the start condition remains 1286unchanged; it does _not_ revert to 'INITIAL'. 1287 1288 If you do not supply your own version of 'yywrap()', then you must 1289either use '%option noyywrap' (in which case the scanner behaves as 1290though 'yywrap()' returned 1), or you must link with '-lfl' to obtain 1291the default version of the routine, which always returns 1. 1292 1293 For scanning from in-memory buffers (e.g., scanning strings), see 1294*note Scanning Strings::. *Note Multiple Input Buffers::. 1295 1296 The scanner writes its 'ECHO' output to the 'yyout' global (default, 1297'stdout'), which may be redefined by the user simply by assigning it to 1298some other 'FILE' pointer. 
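   The 'YY_INPUT' contract described above (fill 'buf' with at most
'max_size' characters and report how many were stored, or 'YY_NULL' at
end of input) can also be met by a helper that reads from memory. The
following is a minimal sketch under stated assumptions: the names
'set_source' and 'fill_from_source' are hypothetical and are not part of
'flex', and the return value 0 plays the role of 'YY_NULL'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source (not part of flex). */
static const char *source_text = "";
static size_t source_pos = 0;

/* Point the helper at a new NUL-terminated string. */
void set_source(const char *text)
{
    source_text = text;
    source_pos = 0;
}

/*
 * Copy up to max_size characters into buf and return how many were
 * copied.  A return value of 0 means end of input, which is the role
 * YY_NULL plays for YY_INPUT.
 */
size_t fill_from_source(char *buf, size_t max_size)
{
    size_t remaining = strlen(source_text) - source_pos;
    size_t n = remaining < max_size ? remaining : max_size;

    memcpy(buf, source_text + source_pos, n);
    source_pos += n;
    return n;
}

/*
 * In the definitions section of a flex input file, the helper could
 * then be hooked in along these lines (an assumption, shown only as
 * a comment because it needs a generated scanner to compile):
 *
 *   #define YY_INPUT(buf,result,max_size) \
 *       { result = fill_from_source( buf, max_size ); }
 */
```

   Unlike the one-character-at-a-time definition shown earlier, this
helper hands the scanner as much text as it asks for in one call.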
1299 1300 1301File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top 1302 130310 Start Conditions 1304******************* 1305 1306'flex' provides a mechanism for conditionally activating rules. Any 1307rule whose pattern is prefixed with '<sc>' will only be active when the 1308scanner is in the "start condition" named 'sc'. For example, 1309 1310 <STRING>[^"]* { /* eat up the string body ... */ 1311 ... 1312 } 1313 1314 will be active only when the scanner is in the 'STRING' start 1315condition, and 1316 1317 <INITIAL,STRING,QUOTE>\. { /* handle an escape ... */ 1318 ... 1319 } 1320 1321 will be active only when the current start condition is either 1322'INITIAL', 'STRING', or 'QUOTE'. 1323 1324 Start conditions are declared in the definitions (first) section of 1325the input using unindented lines beginning with either '%s' or '%x' 1326followed by a list of names. The former declares "inclusive" start 1327conditions, the latter "exclusive" start conditions. A start condition 1328is activated using the 'BEGIN' action. Until the next 'BEGIN' action is 1329executed, rules with the given start condition will be active and rules 1330with other start conditions will be inactive. If the start condition is 1331inclusive, then rules with no start conditions at all will also be 1332active. If it is exclusive, then _only_ rules qualified with the start 1333condition will be active. A set of rules contingent on the same 1334exclusive start condition describe a scanner which is independent of any 1335of the other rules in the 'flex' input. Because of this, exclusive 1336start conditions make it easy to specify "mini-scanners" which scan 1337portions of the input that are syntactically different from the rest 1338(e.g., comments). 1339 1340 If the distinction between inclusive and exclusive start conditions 1341is still a little vague, here's a simple example illustrating the 1342connection between the two. 
The set of rules: 1343 1344 %s example 1345 %% 1346 1347 <example>foo do_something(); 1348 1349 bar something_else(); 1350 1351 is equivalent to 1352 1353 %x example 1354 %% 1355 1356 <example>foo do_something(); 1357 1358 <INITIAL,example>bar something_else(); 1359 1360 Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the 1361second example wouldn't be active (i.e., couldn't match) when in start 1362condition 'example'. If we just used '<example>' to qualify 'bar', 1363though, then it would only be active in 'example' and not in 'INITIAL', 1364while in the first example it's active in both, because in the first 1365example the 'example' start condition is an inclusive '(%s)' start 1366condition. 1367 1368 Also note that the special start-condition specifier '<*>' matches 1369every start condition. Thus, the above example could also have been 1370written: 1371 1372 %x example 1373 %% 1374 1375 <example>foo do_something(); 1376 1377 <*>bar something_else(); 1378 1379 The default rule (to 'ECHO' any unmatched character) remains active 1380in start conditions. It is equivalent to: 1381 1382 <*>.|\n ECHO; 1383 1384 'BEGIN(0)' returns to the original state where only the rules with no 1385start conditions are active. This state can also be referred to as the 1386start-condition 'INITIAL', so 'BEGIN(INITIAL)' is equivalent to 1387'BEGIN(0)'. (The parentheses around the start condition name are not 1388required but are considered good style.) 1389 1390 'BEGIN' actions can also be given as indented code at the beginning 1391of the rules section. For example, the following will cause the scanner 1392to enter the 'SPECIAL' start condition whenever 'yylex()' is called and 1393the global variable 'enter_special' is true: 1394 1395 int enter_special; 1396 1397 %x SPECIAL 1398 %% 1399 if ( enter_special ) 1400 BEGIN(SPECIAL); 1401 1402 <SPECIAL>blahblahblah 1403 ...more rules follow... 
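   The activation rules for inclusive and exclusive start conditions
described above can be summarized as a small decision function. This is
a sketch of the semantics only, not of how 'flex' implements them (the
generated scanner encodes start conditions in its tables); the type
'struct start_cond' and the function 'rule_is_active' are hypothetical
names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the scanner's current start condition. */
struct start_cond {
    int  id;         /* 0 plays the role of INITIAL */
    bool exclusive;  /* true if declared with %x, false for %s */
};

/*
 * Decide whether a rule is active.  rule_scs lists the start conditions
 * that prefix the rule; count == 0 means the rule is unqualified.
 * matches_all models the '<*>' specifier.
 */
bool rule_is_active(struct start_cond current,
                    const int *rule_scs, size_t count,
                    bool matches_all)
{
    if (matches_all)
        return true;

    if (count == 0) {
        /* Unqualified rules are active in INITIAL and in every
         * inclusive start condition, but never in an exclusive one. */
        return current.id == 0 || !current.exclusive;
    }

    for (size_t i = 0; i < count; ++i)
        if (rule_scs[i] == current.id)
            return true;

    return false;
}
```

   With this model, an unqualified 'bar' rule is active in an inclusive
'example' condition but not in an exclusive one, which is why the
exclusive version of the earlier example needs '<INITIAL,example>'.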
   To illustrate the uses of start conditions, here is a scanner which
provides two different interpretations of a string like '123.456'. By
default it will treat it as three tokens, the integer '123', a dot
('.'), and the integer '456'. But if the string is preceded earlier in
the line by the string 'expect-floats' it will treat it as a single
token, the floating-point number '123.456':

     %{
     #include <stdlib.h>
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * floats
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );

   Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This scanner goes to a bit of trouble to match as much text as
possible with each rule. In general, when attempting to write a
high-speed scanner try to match as much as possible in each rule, as
it's a big win.

   Note that start condition names are really integer values and can be
stored as such. Thus, the above could be extended in the following
fashion:

     %x comment foo
     %%
             int line_num = 1;
             int comment_caller;

     "/*"         {
                  comment_caller = INITIAL;
                  BEGIN(comment);
                  }

     ...

     <foo>"/*"    {
                  comment_caller = foo;
                  BEGIN(comment);
                  }

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(comment_caller);

   Furthermore, you can access the current start condition using the
integer-valued 'YY_START' macro. For example, the above assignments to
'comment_caller' could instead be written

     comment_caller = YY_START;

   Flex provides 'YYSTATE' as an alias for 'YY_START' (since that is
what's used by AT&T 'lex').

   For historical reasons, start conditions do not have their own
name-space within the generated scanner. The start condition names are
unmodified in the generated scanner and generated header. *Note
option-header::. *Note option-prefix::.

   Finally, here's an example of how to match C-style quoted strings
using exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):

     %x str

     %%
             char string_buf[MAX_STR_CONST];
             char *string_buf_ptr;


     \"      string_buf_ptr = string_buf; BEGIN(str);

     <str>\"        { /* saw closing quote - all done */
             BEGIN(INITIAL);
             *string_buf_ptr = '\0';
             /* return string constant token type and
              * value to parser
              */
             }

     <str>\n        {
             /* error - unterminated string constant */
             /* generate error message */
             }

     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                     {
                     /* error, constant is out-of-bounds */
                     }

             *string_buf_ptr++ = result;
             }

     <str>\\[0-9]+ {
             /* generate error - bad escape sequence; something
              * like '\48' or '\0777777'
              */
             }

     <str>\\n  *string_buf_ptr++ = '\n';
     <str>\\t  *string_buf_ptr++ = '\t';
     <str>\\r  *string_buf_ptr++ = '\r';
     <str>\\b  *string_buf_ptr++ = '\b';
     <str>\\f  *string_buf_ptr++ = '\f';

     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

     <str>[^\\\n\"]+        {
             char *yptr = yytext;

             while ( *yptr )
                     *string_buf_ptr++ = *yptr++;
             }

   Often, such as in some of the examples above, you wind up writing a
whole bunch of rules all preceded by the same start condition(s). Flex
makes this a little easier and cleaner by introducing a notion of start
condition "scope". A start condition scope is begun with:

     <SCs>{

   where '<SCs>' is a list of one or more start conditions. Inside the
start condition scope, every rule automatically has the prefix '<SCs>'
applied to it, until a '}' which matches the initial '{'. So, for
example,

     <ESC>{
         "\\n"   return '\n';
         "\\r"   return '\r';
         "\\f"   return '\f';
         "\\0"   return '\0';
     }

   is equivalent to:

     <ESC>"\\n"  return '\n';
     <ESC>"\\r"  return '\r';
     <ESC>"\\f"  return '\f';
     <ESC>"\\0"  return '\0';

   Start condition scopes may be nested.

   The following routines are available for manipulating stacks of start
conditions:

 -- Function: void yy_push_state ( int 'new_state' )
     pushes the current start condition onto the top of the start
     condition stack and switches to 'new_state' as though you had used
     'BEGIN new_state' (recall that start condition names are also
     integers).

 -- Function: void yy_pop_state ()
     pops the top of the stack and switches to it via 'BEGIN'.

 -- Function: int yy_top_state ()
     returns the top of the stack without altering the stack's contents.

   The start condition stack grows dynamically and so has no built-in
size limitation. If memory is exhausted, program execution aborts.
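   The behavior of the three stack routines can be modeled in a few
lines of C. The sketch below is an illustration of the semantics
described above, not the code 'flex' generates: the names 'push_state',
'pop_state', 'top_state' and 'get_state' are hypothetical stand-ins, the
variable 'current_state' stands for the scanner's current start
condition, and 0 stands for 'INITIAL'.

```c
#include <stdlib.h>

/* Hypothetical model of the start condition stack (not flex's code). */
static int current_state = 0;     /* 0 stands for INITIAL */
static int *state_stack = NULL;
static size_t stack_depth = 0;
static size_t stack_capacity = 0;

/* Save the current condition on the stack, then switch to new_state. */
void push_state(int new_state)
{
    if (stack_depth == stack_capacity) {
        size_t new_capacity = stack_capacity ? stack_capacity * 2 : 8;
        int *grown = realloc(state_stack, new_capacity * sizeof *grown);

        if (grown == NULL)
            abort();              /* memory exhausted: execution aborts */
        state_stack = grown;
        stack_capacity = new_capacity;
    }
    state_stack[stack_depth++] = current_state;
    current_state = new_state;
}

/* Pop the most recently saved condition and switch back to it. */
void pop_state(void)
{
    if (stack_depth == 0)
        abort();                  /* popping an empty stack is an error */
    current_state = state_stack[--stack_depth];
}

/* Return the saved condition on top of the stack without popping it. */
int top_state(void)
{
    if (stack_depth == 0)
        abort();
    return state_stack[stack_depth - 1];
}

/* Return the condition the model is currently in. */
int get_state(void)
{
    return current_state;
}
```

   Note that the top of the stack is the condition that a pop would
return to, not the condition the scanner is currently in.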
1603 1604 To use start condition stacks, your scanner must include a '%option 1605stack' directive (*note Scanner Options::). 1606 1607 1608File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top 1609 161011 Multiple Input Buffers 1611************************* 1612 1613Some scanners (such as those which support "include" files) require 1614reading from several input streams. As 'flex' scanners do a large 1615amount of buffering, one cannot control where the next input will be 1616read from by simply writing a 'YY_INPUT()' which is sensitive to the 1617scanning context. 'YY_INPUT()' is only called when the scanner reaches 1618the end of its buffer, which may be a long time after scanning a 1619statement such as an 'include' statement which requires switching the 1620input source. 1621 1622 To negotiate these sorts of problems, 'flex' provides a mechanism for 1623creating and switching between multiple input buffers. An input buffer 1624is created by using: 1625 1626 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size ) 1627 1628 which takes a 'FILE' pointer and a size and creates a buffer 1629associated with the given file and large enough to hold 'size' 1630characters (when in doubt, use 'YY_BUF_SIZE' for the size). It returns 1631a 'YY_BUFFER_STATE' handle, which may then be passed to other routines 1632(see below). The 'YY_BUFFER_STATE' type is a pointer to an opaque 1633'struct yy_buffer_state' structure, so you may safely initialize 1634'YY_BUFFER_STATE' variables to '((YY_BUFFER_STATE) 0)' if you wish, and 1635also refer to the opaque structure in order to correctly declare input 1636buffers in source files other than that of your scanner. Note that the 1637'FILE' pointer in the call to 'yy_create_buffer' is only used as the 1638value of 'yyin' seen by 'YY_INPUT'. If you redefine 'YY_INPUT()' so it 1639no longer uses 'yyin', then you can safely pass a NULL 'FILE' pointer to 1640'yy_create_buffer'. 
You select a particular buffer to scan from using:

 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )

   The above function switches the scanner's input buffer so subsequent
tokens will come from 'new_buffer'. Note that 'yy_switch_to_buffer()'
may be used by 'yywrap()' to set things up for continued scanning,
instead of opening a new file and pointing 'yyin' at it. If you are
looking for a stack of input buffers, then you want to use
'yypush_buffer_state()' instead of this function. Note also that
switching input sources via either 'yy_switch_to_buffer()' or 'yywrap()'
does _not_ change the start condition.

 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )

   is used to reclaim the storage associated with a buffer. ('buffer'
can be NULL, in which case the routine does nothing.) To clear the
current contents of a buffer without deleting it, use
'yy_flush_buffer()', described below.

 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )

   This function pushes the new buffer state onto an internal stack.
The pushed state becomes the new current state. The stack is maintained
by flex and will grow as required. This function is intended to be used
instead of 'yy_switch_to_buffer', when you want to change states, but
preserve the current state for later use.

 -- Function: void yypop_buffer_state ( )

   This function removes the current state from the top of the stack,
and deletes it by calling 'yy_delete_buffer'. The next state on the
stack, if any, becomes the new current state.

 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )

   This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using 'YY_INPUT()'.
 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )

   is an alias for 'yy_create_buffer()', provided for compatibility with
the C++ use of 'new' and 'delete' for creating and destroying dynamic
objects.

   The 'YY_CURRENT_BUFFER' macro returns a 'YY_BUFFER_STATE' handle to
the current buffer. It should not be used as an lvalue.

   Here are two examples of using these features for writing a scanner
which expands include files (the '<<EOF>>' feature is discussed below).

   This first example uses 'yypush_buffer_state()' and
'yypop_buffer_state()'. Flex maintains the stack internally.

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl
     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

             BEGIN(INITIAL);
         }

     <<EOF>> {
             yypop_buffer_state();

             if ( !YY_CURRENT_BUFFER )
                 {
                 yyterminate();
                 }
         }

   The second example, below, does the same thing as the previous
example did, but manages its own input buffer stack manually (instead of
letting flex do it).

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl

     %{
     #define MAX_INCLUDE_DEPTH 10
     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
     int include_stack_ptr = 0;
     %}

     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                 {
                 fprintf( stderr, "Includes nested too deeply\n" );
                 exit( 1 );
                 }

             include_stack[include_stack_ptr++] =
                 YY_CURRENT_BUFFER;

             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yy_switch_to_buffer(
                 yy_create_buffer( yyin, YY_BUF_SIZE ) );

             BEGIN(INITIAL);
             }

     <<EOF>> {
             /* the stack is empty once the pointer drops below 0 */
             if ( --include_stack_ptr < 0 )
                 {
                 yyterminate();
                 }

             else
                 {
                 yy_delete_buffer( YY_CURRENT_BUFFER );
                 yy_switch_to_buffer(
                      include_stack[include_stack_ptr] );
                 }
             }

   The following routines are available for setting up input buffers for
scanning in-memory strings instead of files. All of them create a new
input buffer for scanning the string, and return a corresponding
'YY_BUFFER_STATE' handle (which you should delete with
'yy_delete_buffer()' when done with it). They also switch to the new
buffer using 'yy_switch_to_buffer()', so the next call to 'yylex()' will
start scanning the string.

 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
     scans a NUL-terminated string.

 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len
          )
     scans 'len' bytes (including possibly 'NUL's) starting at location
     'bytes'.

   Note that both of these functions create and scan a _copy_ of the
string or bytes. (This may be desirable, since 'yylex()' modifies the
contents of the buffer it is scanning.) You can avoid the copy by
using:

 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
          size)
     which scans in place the buffer starting at 'base', consisting of
     'size' bytes, the last two bytes of which _must_ be
     'YY_END_OF_BUFFER_CHAR' (ASCII NUL).
     These last two bytes are not
     scanned; thus, scanning consists of 'base[0]' through
     'base[size-3]', inclusive.

   If you fail to set up 'base' in this manner (i.e., forget the final
two 'YY_END_OF_BUFFER_CHAR' bytes), then 'yy_scan_buffer()' returns a
NULL pointer instead of creating a new input buffer.

 -- Data type: yy_size_t
     is an integral type to which you can cast an integer expression
     reflecting the size of the buffer.


File: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top

12 End-of-File Rules
********************

The special rule '<<EOF>>' indicates actions which are to be taken when
an end-of-file is encountered and 'yywrap()' returns non-zero (i.e.,
indicates no further files to process).  The action must finish by doing
one of the following things:

   * assigning 'yyin' to a new input file (in previous versions of
     'flex', after doing the assignment you had to call the special
     action 'YY_NEW_FILE'.  This is no longer necessary.)

   * executing a 'return' statement;

   * executing the special 'yyterminate()' action.

   * or, switching to a new buffer using 'yy_switch_to_buffer()' as
     shown in the example above.

   <<EOF>> rules may not be used with other patterns; they may only be
qualified with a list of start conditions.  If an unqualified <<EOF>>
rule is given, it applies to _all_ start conditions which do not already
have <<EOF>> actions.  To specify an <<EOF>> rule for only the initial
start condition, use:

     <INITIAL><<EOF>>

   These rules are useful for catching things like unclosed comments.
An example:

     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              error( "unterminated quote" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                 yyterminate();
              }


File: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top

13 Miscellaneous Macros
***********************

The macro 'YY_USER_ACTION' can be defined to provide an action which is
always executed prior to the matched rule's action.  For example, it
could be #define'd to call a routine to convert yytext to lower-case.
When 'YY_USER_ACTION' is invoked, the variable 'yy_act' gives the number
of the matched rule (rules are numbered starting with 1).  Suppose you
want to profile how often each of your rules is matched.  The following
would do the trick:

     #define YY_USER_ACTION ++ctr[yy_act]

   where 'ctr' is an array to hold the counts for the different rules.
Note that the macro 'YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use '-s'.  Because rule
numbers start at 1, 'yy_act' can be as large as 'YY_NUM_RULES', so a
declaration for 'ctr' that is safe to index by 'yy_act' (element 0 goes
unused) is:

     int ctr[YY_NUM_RULES + 1];

   The macro 'YY_USER_INIT' may be defined to provide an action which is
always executed before the first scan (and before the scanner's internal
initializations are done).  For example, it could be used to call a
routine to read in a data table or open a logging file.

   The macro 'yy_set_interactive(is_interactive)' can be used to control
whether the current buffer is considered "interactive".  An interactive
buffer is processed more slowly, but must be used when the scanner's
input source is indeed interactive to avoid problems due to waiting to
fill buffers (see the discussion of the '-I' flag in *note Scanner
Options::).  A non-zero value in the macro invocation marks the buffer
as interactive, a zero value as non-interactive.
Note that use of this
macro overrides '%option always-interactive' or '%option
never-interactive' (*note Scanner Options::).  'yy_set_interactive()'
must be invoked prior to beginning to scan the buffer that is (or is
not) to be considered interactive.

   The macro 'yy_set_bol(at_bol)' can be used to control whether the
current buffer's scanning context for the next token match is done as
though at the beginning of a line.  A non-zero macro argument makes
rules anchored with '^' active, while a zero argument makes '^' rules
inactive.

   The macro 'YY_AT_BOL()' returns true if the next token scanned from
the current buffer will have '^' rules active, false otherwise.

   In the generated scanner, the actions are all gathered in one large
switch statement and separated using 'YY_BREAK', which may be redefined.
By default, it is simply a 'break', to separate each rule's action from
the following rule's.  Redefining 'YY_BREAK' allows, for example, C++
users to #define YY_BREAK to do nothing (while being very careful that
every rule ends with a 'break' or a 'return'!).  This avoids
unreachable-statement warnings in the cases where a rule's action ends
with 'return' and the 'YY_BREAK' after it can never be reached.


File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top

14 Values Available To the User
*******************************

This chapter summarizes the various values available to the user in the
rule actions.

'char *yytext'
     holds the text of the current token.  It may be modified but not
     lengthened (you cannot append characters to the end).

     If the special directive '%array' appears in the first section of
     the scanner description, then 'yytext' is instead declared 'char
     yytext[YYLMAX]', where 'YYLMAX' is a macro definition that you can
     redefine in the first section if you don't like the default value
     (generally 8KB).  Using '%array' results in somewhat slower
     scanners, but the value of 'yytext' becomes immune to calls to
     'unput()', which potentially destroy its value when 'yytext' is a
     character pointer.  The opposite of '%array' is '%pointer', which
     is the default.

     You cannot use '%array' when generating C++ scanner classes (the
     '-+' flag).

'int yyleng'
     holds the length of the current token.

'FILE *yyin'
     is the file which by default 'flex' reads from.  It may be
     redefined but doing so only makes sense before scanning begins or
     after an EOF has been encountered.  Changing it in the midst of
     scanning will have unexpected results since 'flex' buffers its
     input; use 'yyrestart()' instead.  Once scanning terminates because
     an end-of-file has been seen, you can point 'yyin' at the new
     input file and then call the scanner again to continue scanning.

'void yyrestart( FILE *new_file )'
     may be called to point 'yyin' at the new input file.  The
     switch-over to the new file is immediate (any previously
     buffered-up input is lost).  Note that calling 'yyrestart()' with
     'yyin' as an argument thus throws away the current input buffer and
     continues scanning the same input file.

'FILE *yyout'
     is the file to which 'ECHO' actions are done.  It can be reassigned
     by the user.

'YY_CURRENT_BUFFER'
     returns a 'YY_BUFFER_STATE' handle to the current buffer.

'YY_START'
     returns an integer value corresponding to the current start
     condition.  You can subsequently use this value with 'BEGIN' to
     return to that start condition.
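
   As an illustration of modifying 'yytext' without lengthening it, the
following is a minimal sketch of a helper that lower-cases a token in
place.  The name 'lowercase_token' is hypothetical and not part of
'flex'; its two parameters stand in for 'yytext' and 'yyleng' so that
the helper can be compiled on its own.

```c
#include <ctype.h>

/* Hypothetical helper (not provided by flex): lower-case a token in
 * place.  'text' and 'leng' play the roles of yytext and yyleng.
 * Only characters that already exist are overwritten, so the token
 * is never lengthened. */
static void lowercase_token(char *text, int leng)
{
    for (int i = 0; i < leng; ++i)
        text[i] = (char) tolower((unsigned char) text[i]);
}
```

   Inside a rule action, or in a 'YY_USER_ACTION' definition, the call
would presumably look like 'lowercase_token( yytext, yyleng );'.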


File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top

15 Interfacing with Yacc
************************

One of the main uses of 'flex' is as a companion to the 'yacc'
parser-generator.  'yacc' parsers expect to call a routine named
'yylex()' to find the next input token.  The routine is supposed to
return the type of the next token as well as putting any associated
value in the global 'yylval'.  To use 'flex' with 'yacc', one specifies
the '-d' option to 'yacc' to instruct it to generate the file 'y.tab.h'
containing definitions of all the '%token's appearing in the 'yacc'
input.  This file is then included in the 'flex' scanner.  For example,
if one of the tokens is 'TOK_NUMBER', part of the scanner might look
like:

     %{
     #include <stdlib.h>   /* atoi() */
     #include "y.tab.h"
     %}

     %%

     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;


File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top

16 Scanner Options
******************

The various 'flex' options are categorized by function in the following
menu.  If you want to look up a particular option by name, *Note Index
of Scanner Options::.

* Menu:

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

   Even though there are many scanner options, a typical scanner might
only specify the following options:

     %option   8bit reentrant bison-bridge
     %option   warn nodefault
     %option   yylineno
     %option   outfile="scanner.c" header-file="scanner.h"

   The first line specifies the general type of scanner we want.  The
second line specifies that we are being careful.  The third line asks
flex to track line numbers.
The last line tells flex what to name the
files.  (The options can be specified in any order.  We just divided
them.)

   'flex' also provides a mechanism for controlling options within the
scanner specification itself, rather than from the flex command-line.
This is done by including '%option' directives in the first section of
the scanner specification.  You can specify multiple options with a
single '%option' directive, and multiple directives in the first section
of your flex input file.

   Most options are given simply as names, optionally preceded by the
word 'no' (with no intervening whitespace) to negate their meaning.  The
names are the same as their long-option equivalents (but without the
leading '--').

   'flex' scans your rule actions to determine whether you use the
'REJECT' or 'yymore()' features.  The 'reject' and 'yymore' options are
available to override its decision as to whether you use the features,
either by setting them (e.g., '%option reject') to indicate the feature
is indeed used, or unsetting them to indicate it actually is not used
(e.g., '%option noyymore').

   A number of options are available for lint purists who want to
suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset (e.g., '%option nounput'), results in
the corresponding routine not appearing in the generated scanner:

     input, unput
     yy_push_state, yy_pop_state, yy_top_state
     yy_scan_buffer, yy_scan_bytes, yy_scan_string

     yyget_extra, yyset_extra, yyget_leng, yyget_text,
     yyget_lineno, yyset_lineno, yyget_in, yyset_in,
     yyget_out, yyset_out, yyget_lval, yyset_lval,
     yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

   (though 'yy_push_state()' and friends won't appear anyway unless you
use '%option stack').


File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options

16.1 Options for Specifying Filenames
=====================================

'--header-file=FILE, '%option header-file="FILE"''
     instructs flex to write a C header to 'FILE'.  This file contains
     function prototypes, extern variables, and types used by the
     scanner.  Only the external API is exported by the header file.
     Many macros that are usable from within scanner actions are not
     exported to the header file.  This is due to namespace problems and
     the goal of a clean external API.

     While in the header, the macro 'yyIN_HEADER' is defined, where 'yy'
     is substituted with the appropriate prefix.

     The '--header-file' option is not compatible with the '--c++'
     option, since the C++ scanner provides its own header in
     'yyFlexLexer.h'.

'-oFILE, --outfile=FILE, '%option outfile="FILE"''
     directs flex to write the scanner to the file 'FILE' instead of
     'lex.yy.c'.  If you combine '--outfile' with the '--stdout' option,
     then the scanner is written to 'stdout' but its '#line' directives
     (see the '-L' option) refer to the file 'FILE'.

'-t, --stdout, '%option stdout''
     instructs 'flex' to write the scanner it generates to standard
     output instead of 'lex.yy.c'.

'-SFILE, --skel=FILE'
     overrides the default skeleton file from which 'flex' constructs
     its scanners.  You'll never need this option unless you are doing
     'flex' maintenance or development.

'--tables-file=FILE'
     writes serialized scanner DFA tables to 'FILE'.  The generated
     scanner will not contain the tables, and requires them to be loaded
     at runtime.  *Note serialization::.

'--tables-verify'
     This option is for flex development.
     We document it here in case
     you stumble upon it by accident or in case you suspect some
     inconsistency in the serialized tables.  Flex will serialize the
     scanner dfa tables but will also generate the in-code tables as it
     normally does.  At runtime, the scanner will verify that the
     serialized tables match the in-code tables, instead of loading
     them.


File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options

16.2 Options Affecting Scanner Behavior
=======================================

'-i, --case-insensitive, '%option case-insensitive''
     instructs 'flex' to generate a "case-insensitive" scanner.  The
     case of letters given in the 'flex' input patterns will be ignored,
     and tokens in the input will be matched regardless of case.  The
     matched text given in 'yytext' will have the preserved case (i.e.,
     it will not be folded).  For tricky behavior, see *note case and
     character ranges::.

'-l, --lex-compat, '%option lex-compat''
     turns on maximum compatibility with the original AT&T 'lex'
     implementation.  Note that this does not mean _full_ compatibility.
     Use of this option costs a considerable amount of performance, and
     it cannot be used with the '--c++', '--full', '--fast', '-Cf', or
     '-CF' options.  For details on the compatibilities it provides, see
     *note Lex and Posix::.  This option also results in the name
     'YY_FLEX_LEX_COMPAT' being '#define''d in the generated scanner.

'-B, --batch, '%option batch''
     instructs 'flex' to generate a "batch" scanner, the opposite of
     _interactive_ scanners generated by '--interactive' (see below).
     In general, you use '-B' when you are _certain_ that your scanner
     will never be used interactively, and you want to squeeze a
     _little_ more performance out of it.
     If your goal is instead to
     squeeze out a _lot_ more performance, you should be using the '-Cf'
     or '-CF' options, which turn on '--batch' automatically anyway.

'-I, --interactive, '%option interactive''
     instructs 'flex' to generate an interactive scanner.  An
     interactive scanner is one that only looks ahead to decide what
     token has been matched if it absolutely must.  It turns out that
     always looking one extra character ahead, even if the scanner has
     already seen enough text to disambiguate the current token, is a
     bit faster than only looking ahead when necessary.  But scanners
     that always look ahead give dreadful interactive performance; for
     example, when a user types a newline, it is not recognized as a
     newline token until they enter _another_ token, which often means
     typing in another whole line.

     'flex' scanners default to 'interactive' unless you use the '-Cf'
     or '-CF' table-compression options (*note Performance::).  That's
     because if you're looking for high-performance you should be using
     one of these options, so if you didn't, 'flex' assumes you'd rather
     trade off a bit of run-time performance for intuitive interactive
     behavior.  Note also that you _cannot_ use '--interactive' in
     conjunction with '-Cf' or '-CF'.  Thus, this option is not really
     needed; it is on by default for all those cases in which it is
     allowed.

     You can force a scanner to _not_ be interactive by using '--batch'.

'-7, --7bit, '%option 7bit''
     instructs 'flex' to generate a 7-bit scanner, i.e., one which can
     only recognize 7-bit characters in its input.  The advantage of
     using '--7bit' is that the scanner's tables can be up to half the
     size of those generated using '--8bit'.  The disadvantage is
     that such scanners often hang or crash if their input contains an
     8-bit character.

     Note, however, that unless you generate your scanner using the
     '-Cf' or '-CF' table compression options, use of '--7bit' will save
     only a small amount of table space, and make your scanner
     considerably less portable.  'Flex''s default behavior is to
     generate an 8-bit scanner unless you use '-Cf' or '-CF', in
     which case 'flex' defaults to generating 7-bit scanners unless your
     site was always configured to generate 8-bit scanners (as will
     often be the case with non-USA sites).  You can tell whether flex
     generated a 7-bit or an 8-bit scanner by inspecting the flag
     summary in the '--verbose' output (*note Debugging Options::).

     Note that if you use '-Cfe' or '-CFe' 'flex' still defaults to
     generating an 8-bit scanner, since usually with these compression
     options full 8-bit tables are not much more expensive than 7-bit
     tables.

'-8, --8bit, '%option 8bit''
     instructs 'flex' to generate an 8-bit scanner, i.e., one which can
     recognize 8-bit characters.  This flag is only needed for scanners
     generated using '-Cf' or '-CF', as otherwise flex defaults to
     generating an 8-bit scanner anyway.

     See the discussion of '--7bit' above for 'flex''s default behavior
     and the tradeoffs between 7-bit and 8-bit scanners.

'--default, '%option default''
     generates the default rule.

'--always-interactive, '%option always-interactive''
     instructs flex to generate a scanner which always considers its
     input _interactive_.  Normally, on each new input file the scanner
     calls 'isatty()' in an attempt to determine whether the scanner's
     input source is interactive and thus should be read a character at
     a time.  When this option is used, however, then no such call is
     made.

'--never-interactive, '%option never-interactive''
     instructs flex to generate a scanner which never considers its
     input interactive.
     This is the opposite of 'always-interactive'.

'-X, --posix, '%option posix''
     turns on maximum compatibility with the POSIX 1003.2-1992
     definition of 'lex'.  Since 'flex' was originally designed to
     implement the POSIX definition of 'lex' this generally involves
     very few changes in behavior.  At the current writing the known
     differences between 'flex' and the POSIX standard are:

        * In POSIX and AT&T 'lex', the repeat operator, '{}', has lower
          precedence than concatenation (thus 'ab{3}' yields 'ababab').
          Most POSIX utilities use an Extended Regular Expression (ERE)
          precedence that has the precedence of the repeat operator
          higher than concatenation (which causes 'ab{3}' to yield
          'abbb').  By default, 'flex' places the precedence of the
          repeat operator higher than concatenation which matches the
          ERE processing of other POSIX utilities.  When either
          '--posix' or '-l' is specified, 'flex' will use the
          traditional AT&T and POSIX-compliant precedence for the repeat
          operator where concatenation has higher precedence than the
          repeat operator.

'--stack, '%option stack''
     enables the use of start condition stacks (*note Start
     Conditions::).

'--stdinit, '%option stdinit''
     if set (i.e., '%option stdinit') initializes 'yyin' and 'yyout' to
     'stdin' and 'stdout', instead of the default of 'NULL'.  Some
     existing 'lex' programs depend on this behavior, even though it is
     not compliant with ANSI C, which does not require 'stdin' and
     'stdout' to be compile-time constant.  In a reentrant scanner,
     however, this is not a problem since initialization is performed in
     'yylex_init' at runtime.

'--yylineno, '%option yylineno''
     directs 'flex' to generate a scanner that maintains the number of
     the current line read from its input in the global variable
     'yylineno'.  This option is implied by '%option lex-compat'.
     In a
     reentrant C scanner, the macro 'yylineno' is accessible regardless
     of the value of '%option yylineno'; however, its value is not
     modified by 'flex' unless '%option yylineno' is enabled.

'--yywrap, '%option yywrap''
     if unset (i.e., '--noyywrap'), makes the scanner not call
     'yywrap()' upon an end-of-file, but simply assume that there are no
     more files to scan (until the user points 'yyin' at a new file and
     calls 'yylex()' again).


File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options

16.3 Code-Level And API Options
===============================

'--ansi-definitions, '%option ansi-definitions''
     Deprecated, ignored

'--ansi-prototypes, '%option ansi-prototypes''
     Deprecated, ignored

'--bison-bridge, '%option bison-bridge''
     instructs flex to generate a C scanner that is meant to be called
     by a 'GNU bison' parser.  The scanner has minor API changes for
     'bison' compatibility.  In particular, the declaration of 'yylex'
     is modified to take an additional parameter, 'yylval'.  *Note Bison
     Bridge::.

'--bison-locations, '%option bison-locations''
     instructs flex that 'GNU bison' '%locations' are being used.  This
     means 'yylex' will be passed an additional parameter, 'yylloc'.
     This option implies '%option bison-bridge'.  *Note Bison Bridge::.

'-L, --noline, '%option noline''
     instructs 'flex' not to generate '#line' directives.
     Without this
     option, 'flex' peppers the generated scanner with '#line'
     directives so error messages in the actions will be correctly
     located with respect to either the original 'flex' input file (if
     the errors are due to code in the input file), or 'lex.yy.c' (if
     the errors are 'flex''s fault - you should report these sorts of
     errors to the email address given in *note Reporting Bugs::).

'-R, --reentrant, '%option reentrant''
     instructs flex to generate a reentrant C scanner.  The generated
     scanner may safely be used in a multi-threaded environment.  The
     API for a reentrant scanner is different than for a non-reentrant
     scanner (*note Reentrant::).  Because of the API difference between
     reentrant and non-reentrant 'flex' scanners, non-reentrant flex
     code must be modified before it is suitable for use with this
     option.  This option is not compatible with the '--c++' option.

     The option '--reentrant' does not affect the performance of the
     scanner.

'-+, --c++, '%option c++''
     specifies that you want flex to generate a C++ scanner class.
     *Note Cxx::, for details.

'--array, '%option array''
     specifies that you want 'yytext' to be an array instead of a
     'char *'.

'--pointer, '%option pointer''
     specifies that 'yytext' should be a 'char *', not an array.  This
     is the default.

'-PPREFIX, --prefix=PREFIX, '%option prefix="PREFIX"''
     changes the default 'yy' prefix used by 'flex' for all
     globally-visible variable and function names to instead be
     'PREFIX'.  For example, '--prefix=foo' changes the name of 'yytext'
     to 'footext'.  It also changes the name of the default output file
     from 'lex.yy.c' to 'lex.foo.c'.
     Here is a partial list of the
     names affected:

              yy_create_buffer
              yy_delete_buffer
              yy_flex_debug
              yy_init_buffer
              yy_flush_buffer
              yy_load_buffer_state
              yy_switch_to_buffer
              yyin
              yyleng
              yylex
              yylineno
              yyout
              yyrestart
              yytext
              yywrap
              yyalloc
              yyrealloc
              yyfree

     (If you are using a C++ scanner, then only 'yywrap' and
     'yyFlexLexer' are affected.)  Within your scanner itself, you can
     still refer to the global variables and functions using either
     version of their name; but externally, they have the modified name.

     This option lets you easily link together multiple 'flex' programs
     into the same executable.  Note, though, that using this option
     also renames 'yywrap()', so you now _must_ either provide your own
     (appropriately-named) version of the routine for your scanner, or
     use '%option noyywrap', as linking with '-lfl' no longer provides
     one for you by default.

'--main, '%option main''
     directs flex to provide a default 'main()' program for the scanner,
     which simply calls 'yylex()'.  This option implies 'noyywrap' (see
     the '--yywrap' option above).

'--nounistd, '%option nounistd''
     suppresses inclusion of the non-ANSI header file 'unistd.h'.  This
     option is meant to target environments in which 'unistd.h' does not
     exist.  Be aware that certain options may cause flex to generate
     code that relies on functions normally found in 'unistd.h' (e.g.,
     'isatty()', 'read()').  If you wish to use these functions, you
     will have to inform your compiler where to find them.  *Note
     option-always-interactive::.  *Note option-read::.

'--yyclass=NAME, '%option yyclass="NAME"''
     only applies when generating a C++ scanner (the '--c++' option).
     It informs 'flex' that you have derived 'NAME' as a subclass of
     'yyFlexLexer', so 'flex' will place your actions in the member
     function 'NAME::yylex()' instead of 'yyFlexLexer::yylex()'.  It also
     generates a 'yyFlexLexer::yylex()' member function that emits a
     run-time error (by invoking 'yyFlexLexer::LexerError()') if called.
     *Note Cxx::.


File: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options

16.4 Options for Scanner Speed and Size
=======================================

'-C[aefFmr]'
     controls the degree of table compression and, more generally,
     trade-offs between small scanners and fast scanners.

     '-C'
          A lone '-C' specifies that the scanner tables should be
          compressed but neither equivalence classes nor
          meta-equivalence classes should be used.

     '-Ca, --align, '%option align''
          ("align") instructs flex to trade off larger tables in the
          generated scanner for faster performance because the elements
          of the tables are better aligned for memory access and
          computation.  On some RISC architectures, fetching and
          manipulating longwords is more efficient than with
          smaller-sized units such as shortwords.  This option can
          quadruple the size of the tables used by your scanner.

     '-Ce, --ecs, '%option ecs''
          directs 'flex' to construct "equivalence classes", i.e., sets
          of characters which have identical lexical properties (for
          example, if the only appearance of digits in the 'flex' input
          is in the character class "[0-9]" then the digits '0', '1',
          ..., '9' will all be put in the same equivalence class).
          Equivalence classes usually give dramatic reductions in the
          final table/object file sizes (typically a factor of 2-5) and
          are pretty cheap performance-wise (one array look-up per
          character scanned).

     '-Cf'
          specifies that the "full" scanner tables should be generated -
          'flex' should not compress the tables by taking advantage of
          similar transition functions for different states.

     '-CF'
          specifies that the alternate fast scanner representation
          (described below under the '--fast' flag) should be used.
          This option cannot be used with '--c++'.

     '-Cm, --meta-ecs, '%option meta-ecs''
          directs 'flex' to construct "meta-equivalence classes", which
          are sets of equivalence classes (or characters, if equivalence
          classes are not being used) that are commonly used together.
          Meta-equivalence classes are often a big win when using
          compressed tables, but they have a moderate performance impact
          (one or two 'if' tests and one array look-up per character
          scanned).

     '-Cr, --read, '%option read''
          causes the generated scanner to _bypass_ use of the standard
          I/O library ('stdio') for input.  Instead of calling 'fread()'
          or 'getc()', the scanner will use the 'read()' system call,
          resulting in a performance gain which varies from system to
          system, but in general is probably negligible unless you are
          also using '-Cf' or '-CF'.  Using '-Cr' can cause strange
          behavior if, for example, you read from 'yyin' using 'stdio'
          prior to calling the scanner (because the scanner will miss
          whatever text your previous reads left in the 'stdio' input
          buffer).  '-Cr' has no effect if you define 'YY_INPUT()'
          (*note Generated Scanner::).

     The options '-Cf' or '-CF' and '-Cm' do not make sense together -
     there is no opportunity for meta-equivalence classes if the table
     is not being compressed.  Otherwise the options may be freely
     mixed, and are cumulative.

     The default setting is '-Cem', which specifies that 'flex' should
     generate equivalence classes and meta-equivalence classes.
     This
     setting provides the highest degree of table compression.  You can
     trade off faster-executing scanners at the cost of larger tables
     with the following generally being true:

          slowest & smallest
                -Cem
                -Cm
                -Ce
                -C
                -C{f,F}e
                -C{f,F}
                -C{f,F}a
          fastest & largest

     Note that scanners with the smallest tables are usually generated
     and compiled the quickest, so during development you will usually
     want to use the default, maximal compression.

     '-Cfe' is often a good compromise between speed and size for
     production scanners.

'-f, --full, '%option full''
     specifies "fast scanner".  No table compression is done and 'stdio'
     is bypassed.  The result is large but fast.  This option is
     equivalent to '-Cfr'.

'-F, --fast, '%option fast''
     specifies that the _fast_ scanner table representation should be
     used (and 'stdio' bypassed).  This representation is about as fast
     as the full table representation '--full', and for some sets of
     patterns will be considerably smaller (and for others, larger).  In
     general, if the pattern set contains both _keywords_ and a
     catch-all, _identifier_ rule, such as in the set:

          "case"    return TOK_CASE;
          "switch"  return TOK_SWITCH;
          ...
          "default" return TOK_DEFAULT;
          [a-z]+    return TOK_ID;

     then you're better off using the full table representation.  If
     only the _identifier_ rule is present and you then use a hash table
     or some such to detect the keywords, you're better off using
     '--fast'.

     This option is equivalent to '-CFr'.  It cannot be used with
     '--c++'.
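
   The "identifier rule plus keyword lookup" arrangement mentioned under
'--fast' can be sketched as follows.  This is only an illustration under
stated assumptions: it uses a sorted table searched with the standard
'bsearch()' routine rather than a hash table, the helper name
'classify_identifier' is hypothetical, and the numeric token values are
placeholders for whatever the parser's header would define.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder token codes (assumed values); a real scanner would
 * normally get these from the parser's header, e.g. y.tab.h. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int         token;
};

/* The table must stay sorted by name, because bsearch() requires it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* Comparison callback: 'key' is the identifier text, 'elem' a table entry. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) elem)->name);
}

/* Hypothetical helper: return the keyword's token if 'text' is a
 * keyword, and TOK_ID otherwise. */
static int classify_identifier(const char *text)
{
    const struct keyword *kw =
        bsearch(text, keywords,
                sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);

    return kw ? kw->token : TOK_ID;
}
```

   With such a helper, the keyword rules in the pattern set above could
be dropped, leaving a single rule along the lines of
'[a-z]+  return classify_identifier( yytext );'.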


File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options

16.5 Debugging Options
======================

'-b, --backup, '%option backup''
     Generate backing-up information to 'lex.backup'. This is a list of
     scanner states which require backing up and the input characters on
     which they do so. By adding rules one can remove backing-up
     states. If _all_ backing-up states are eliminated and '-Cf' or
     '-CF' is used, the generated scanner will run faster (see the
     '--perf-report' flag). Only users who wish to squeeze every last
     cycle out of their scanners need worry about this option (*note
     Performance::).

'-d, --debug, '%option debug''
     makes the generated scanner run in "debug" mode. Whenever a
     pattern is recognized and the global variable 'yy_flex_debug' is
     non-zero (which is the default), the scanner will write to 'stderr'
     a line of the form:

          --accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to 'flex').
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner's
     concerned), or reaches an end-of-file.

'-p, --perf-report, '%option perf-report''
     generates a performance report to 'stderr'. The report consists of
     comments regarding features of the 'flex' input file which will
     cause a serious loss of performance in the resulting scanner. If
     you give the flag twice, you will also get comments regarding
     features that lead to minor performance losses.

     Note that the use of 'REJECT' and variable trailing context (*note
     Limitations::) entails a substantial performance penalty; use of
     'yymore()', the '^' operator, and the '--interactive' flag entail
     minor performance penalties.

'-s, --nodefault, '%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to 'stdout') to be suppressed. If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.

'-T, --trace, '%option trace''
     makes 'flex' run in "trace" mode. It will generate a lot of
     messages to 'stderr' concerning the form of the input and the
     resultant non-deterministic and deterministic finite automata.
     This option is mostly for use in maintaining 'flex'.

'-w, --nowarn, '%option nowarn''
     suppresses warning messages.

'-v, --verbose, '%option verbose''
     specifies that 'flex' should write to 'stderr' a summary of
     statistics regarding the scanner it generates. Most of the
     statistics are meaningless to the casual 'flex' user, but the first
     line identifies the version of 'flex' (same as reported by
     '--version'), and the next line the flags used when generating the
     scanner, including those that are on by default.

'--warn, '%option warn''
     warns about certain things. In particular, if the default rule can
     be matched but no default rule has been given, 'flex' will warn
     you. We recommend using this option always.


File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options

16.6 Miscellaneous Options
==========================

'-c'
     A do-nothing option included for POSIX compliance.

'-h, -?, --help'
     generates a "help" summary of 'flex''s options to 'stdout' and then
     exits.

'-n'
     Another do-nothing option included for POSIX compliance.

'-V, --version'
     prints the version number to 'stdout' and exits.


File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top

17 Performance Considerations
*****************************

The main design goal of 'flex' is that it generate high-performance
scanners. It has been optimized for dealing well with large sets of
rules. Aside from the effects on scanner speed of the table compression
'-C' options outlined above, there are a number of options/actions which
degrade performance. These are, from most expensive to least:

     REJECT
     arbitrary trailing context

     pattern sets that require backing up
     %option yylineno
     %array

     %option interactive
     %option always-interactive

     ^ beginning-of-line operator
     yymore()

   with the first two being quite expensive and the last two being
quite cheap. Note also that 'unput()' is implemented as a routine call
that potentially does quite a bit of work, while 'yyless()' is a
quite-cheap macro. So if you are just putting back some excess text you
scanned, use 'yyless()'.

   'REJECT' should be avoided at all costs when performance is
important. It is a particularly expensive option.

   There is one case when '%option yylineno' can be expensive. That is
when your patterns match long tokens that could _possibly_ contain a
newline character. There is no performance penalty for rules that
cannot possibly match newlines, since 'flex' does not need to check them
for newlines. In general, you should avoid rules such as '[^f]+', which
match very long tokens, including newlines, and may possibly match your
entire file!
A better approach is to separate '[^f]+' into two rules:

     %option yylineno
     %%
     [^f\n]+
     \n+

   The above scanner does not incur a performance penalty.

   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner. In principle, one begins by
using the '-b' flag to generate a 'lex.backup' file. For example, on
the input:

     %%
     foo        return TOK_KEYWORD;
     foobar     return TOK_KEYWORD;

   the file looks like:

     State #6 is non-accepting -
      associated rule line numbers:
            2       3
      out-transitions: [ o ]
      jam-transitions: EOF [ \001-n  p-\177 ]

     State #8 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ a ]
      jam-transitions: EOF [ \001-`  b-\177 ]

     State #9 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ r ]
      jam-transitions: EOF [ \001-q  s-\177 ]

     Compressed tables always back up.

   The first few lines tell us that there's a scanner state in which it
can make a transition on an 'o' but not on any other character, and that
in that state the currently scanned text does not match any rule. The
state occurs when trying to match the rules found at lines 2 and 3 in
the input file. If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find a rule
which is matched. With a bit of headscratching one can see that this
must be the state it's in when it has seen 'fo'. When this has
happened, if anything other than another 'o' is seen, the scanner will
have to back up to simply match the 'f' (by the default rule).

   The comment regarding State #8 indicates there's a problem when
'foob' has been scanned. Indeed, on any character other than an 'a',
the scanner will have to back up to accept "foo".
Similarly, the
comment for State #9 concerns when 'fooba' has been scanned and an 'r'
does not follow.

   The final comment reminds us that there's no point going to all the
trouble of removing backing up from the rules unless we're using '-Cf'
or '-CF', since there's no performance gain doing so with compressed
scanners.

   The way to remove the backing up is to add "error" rules:

     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     fooba       |
     foob        |
     fo          {
                 /* false alarm, not really a keyword */
                 return TOK_ID;
                 }

   Eliminating backing up among a list of keywords can also be done
using a "catch-all" rule:

     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     [a-z]+      return TOK_ID;

   This is usually the best solution when appropriate.

   Backing up messages tend to cascade. With a complicated set of rules
it's not uncommon to get hundreds of messages. If one can decipher
them, though, it often only takes a dozen or so rules to eliminate the
backing up (though it's easy to make a mistake and have an error rule
accidentally match a valid token). A possible future 'flex' feature will
be to automatically add rules to eliminate backing up.

   It's important to keep in mind that you gain the benefits of
eliminating backing up only if you eliminate _every_ instance of backing
up. Leaving just one means you gain nothing.

   _Variable_ trailing context (where both the leading and trailing
parts do not have a fixed length) entails almost the same performance
loss as 'REJECT' (i.e., substantial).
So when possible a rule like:

     %%
     mouse|rat/(cat|dog)   run();

   is better written:

     %%
     mouse/cat|dog         run();
     rat/cat|dog           run();

   or as

     %%
     mouse|rat/cat         run();
     mouse|rat/dog         run();

   Note that here the special '|' action does _not_ provide any savings,
and can even make things worse (*note Limitations::).

   Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run. This is because with
long tokens the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go through the
additional work of setting up the scanning environment (e.g., 'yytext')
for the action. Recall the scanner for C comments:

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>"*"+[^*/\n]*
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>[^*\n]*\n      ++line_num;
     <comment>"*"+[^*/\n]*
     <comment>"*"+[^*/\n]*\n ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   Now instead of each newline requiring the processing of another
action, recognizing the newlines is distributed over the other rules to
keep the matched text as long as possible. Note that _adding_ rules
does _not_ slow down the scanner! The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.
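
   For experimentation, the sped-up comment rules can be placed in a
complete input file. The following is only a sketch and is not taken
from the 'flex' distribution: the file-scope placement of 'line_num',
the two extra rules for text outside comments, and the 'main()' driver
are assumptions added so that the file stands alone. It should build
with something like 'flex comments.l && cc lex.yy.c', where the file
name 'comments.l' is likewise hypothetical.

```lex
%option noyywrap
%x comment
%{
#include <stdio.h>
/* Assumption: file scope (rather than local to yylex) so main() can read it. */
static int line_num = 1;
%}
%%
"/*"                      BEGIN(comment);
<comment>[^*\n]*          /* text without '*' or newline */
<comment>[^*\n]*\n        ++line_num;
<comment>"*"+[^*/\n]*     /* '*'s not followed by '/' */
<comment>"*"+[^*/\n]*\n   ++line_num;
<comment>"*"+"/"          BEGIN(INITIAL);
\n                        ++line_num;   /* added: newline outside a comment */
.                         /* added: ignore other text outside comments */
%%
int main(void)
{
    yylex();                    /* scans stdin until end-of-file */
    printf("%d\n", line_num);   /* line number reached at end of input */
    return 0;
}
```

   Every newline inside a comment is consumed together with the text
that precedes it, so the scanner performs one action per comment line
rather than two.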

   A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and
with no other extraneous characters, and recognize all the keywords. A
natural first approach is:

     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     .|\n     /* it's not a keyword */

   To eliminate the back-tracking, introduce a catch-all rule:

     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     [a-z]+   |
     .|\n     /* it's not a keyword */

   Now, if it's guaranteed that there's exactly one word per line, then
we can reduce the total number of matches by half by merging in the
recognition of newlines with that of the other tokens:

     %%
     asm\n    |
     auto\n   |
     break\n  |
     ... etc ...
     volatile\n |
     while\n  /* it's a keyword */

     [a-z]+\n |
     .|\n     /* it's not a keyword */

   One has to be careful here, as we have now reintroduced backing up
into the scanner. In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
'flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like 'auto' and then the next
character is something other than a newline or a letter. Previously it
would then just match the 'auto' rule and be done, but now it has no
'auto' rule, only an 'auto\n' rule. To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:

     %%
     asm\n    |
     auto\n   |
     break\n  |
     ... etc ...
     volatile\n |
     while\n  /* it's a keyword */

     [a-z]+\n |
     [a-z]+   |
     .|\n     /* it's not a keyword */

   Compiled with '-Cf', this is about as fast as one can get a 'flex'
scanner to go for this particular problem.

   A final note: 'flex' is slow when matching 'NUL's, particularly when
a token contains multiple 'NUL's. It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include 'NUL's.

   Another final note regarding performance: as mentioned in *note
Matching::, dynamically resizing 'yytext' to accommodate huge tokens is
a slow process because it presently requires that the (huge) token be
rescanned from the beginning. Thus if performance is vital, you should
attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters per token.


File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top

18 Generating C++ Scanners
**************************

*IMPORTANT*: the present form of the scanning class is _experimental_
and may change considerably between major releases.

   'flex' provides two different ways to generate scanners for use with
C++. The first way is to simply compile a scanner generated by 'flex'
using a C++ compiler instead of a C compiler. You should not encounter
any compilation errors (*note Reporting Bugs::). You can then use C++
code in your rule actions instead of C code. Note that the default
input source for your scanner remains 'yyin', and default echoing is
still done to 'yyout'. Both of these remain 'FILE *' variables and not
C++ _streams_.

   You can also use 'flex' to generate a C++ scanner class, using the
'-+' option (or, equivalently, '%option c++'), which is automatically
specified if the name of the 'flex' executable ends in a '+', such as
'flex++'. When using this option, 'flex' defaults to generating the
scanner to the file 'lex.yy.cc' instead of 'lex.yy.c'. The generated
scanner includes the header file 'FlexLexer.h', which defines the
interface to two C++ classes.

   The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
base class defining the general scanner class interface. It provides
the following member functions:

'const char* YYText()'
     returns the text of the most recently matched token, the equivalent
     of 'yytext'.

'int YYLeng()'
     returns the length of the most recently matched token, the
     equivalent of 'yyleng'.

'int lineno() const'
     returns the current input line number (see '%option yylineno'), or
     '1' if '%option yylineno' was not used.

'void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     'yy_flex_debug' (*note Scanner Options::). Note that you must
     build the scanner using '%option debug' to include debugging
     information in it.

'int debug() const'
     returns the current setting of the debugging flag.

   Also provided are member functions equivalent to
'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
is an 'istream&' object reference and not a 'FILE*'),
'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
first argument is an 'istream&' object reference).

   The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
derived from 'FlexLexer'.
It defines the following additional member
functions:

'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
     constructs a 'yyFlexLexer' object using the given streams for input
     and output. If not specified, the streams default to 'cin' and
     'cout', respectively. 'yyFlexLexer' does not take ownership of its
     stream arguments. It's up to the user to ensure the streams
     pointed to remain alive at least as long as the 'yyFlexLexer'
     instance.

'virtual int yylex()'
     performs the same role as 'yylex()' does for ordinary 'flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value. If you derive a subclass 'S' from
     'yyFlexLexer' and want to access the member functions and variables
     of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
     to inform 'flex' that you will be using that subclass instead of
     'yyFlexLexer'. In this case, rather than generating
     'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
     generates a dummy 'yyFlexLexer::yylex()' that calls
     'yyFlexLexer::LexerError()' if called).

'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
'virtual void switch_streams(istream& new_in, ostream& new_out)'
     reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
     (if non-null), deleting the previous input buffer if 'yyin' is
     reassigned.

'int yylex( istream* new_in, ostream* new_out = 0 )'
'int yylex( istream& new_in, ostream& new_out )'
     first switches the input streams via 'switch_streams( new_in,
     new_out )' and then returns the value of 'yylex()'.

   In addition, 'yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:

'virtual int LexerInput( char* buf, int max_size )'
     reads up to 'max_size' characters into 'buf' and returns the number
     of characters read. To indicate end-of-input, return 0 characters.
     Note that 'interactive' scanners (see the '-B' and '-I' flags in
     *note Scanner Options::) define the macro 'YY_INTERACTIVE'. If you
     redefine 'LexerInput()' and need to take different actions
     depending on whether or not the scanner might be scanning an
     interactive input source, you can test for the presence of this
     name via '#ifdef' statements.

'virtual void LexerOutput( const char* buf, int size )'
     writes out 'size' characters from the buffer 'buf', which, while
     'NUL'-terminated, may also contain internal 'NUL's if the scanner's
     rules can match text with 'NUL's in them.

'virtual void LexerError( const char* msg )'
     reports a fatal error message. The default version of this
     function writes the message to the stream 'cerr' and exits.

   Note that a 'yyFlexLexer' object contains its _entire_ scanning
state. Thus you can use such objects to create reentrant scanners, but
see also *note Reentrant::. You can instantiate multiple instances of
the same 'yyFlexLexer' class, and you can also combine multiple C++
scanner classes together in the same program using the '-P' option
discussed above.

   Finally, note that the '%array' feature is not available to C++
scanner classes; you must use '%pointer' (the default).

   Here is an example of a simple C++ scanner:

     // An example of using the flex C++ scanner class.

     %{
     #include <iostream>
     using namespace std;
     int mylineno = 0;
     %}

     %option noyywrap c++

     string  \"[^\n"]+\"

     ws      [ \t]+

     alpha   [A-Za-z]
     dig     [0-9]
     name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
     num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
     num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
     number  {num1}|{num2}

     %%

     {ws}    /* skip blanks and tabs */

     "/*"    {
             int c;

             while((c = yyinput()) != 0)
                 {
                 if(c == '\n')
                     ++mylineno;

                 else if(c == '*')
                     {
                     if((c = yyinput()) == '/')
                         break;
                     else
                         unput(c);
                     }
                 }
             }

     {number}  cout << "number " << YYText() << '\n';

     \n        mylineno++;

     {name}    cout << "name " << YYText() << '\n';

     {string}  cout << "string " << YYText() << '\n';

     %%

     // This include is required if main() is in another source file.
     //#include <FlexLexer.h>

     int main( int /* argc */, char** /* argv */ )
     {
         FlexLexer* lexer = new yyFlexLexer;
         while(lexer->yylex() != 0)
             ;
         delete lexer;
         return 0;
     }

   If you want to create multiple (different) lexer classes, you use the
'-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
other 'xxFlexLexer'. You then can include '<FlexLexer.h>' in your other
sources once per lexer class, first renaming 'yyFlexLexer' as follows:

     #undef yyFlexLexer
     #define yyFlexLexer xxFlexLexer
     #include <FlexLexer.h>

     #undef yyFlexLexer
     #define yyFlexLexer zzFlexLexer
     #include <FlexLexer.h>

   if, for example, you used '%option prefix="xx"' for one of your
scanners and '%option prefix="zz"' for the other.


File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top

19 Reentrant C Scanners
***********************

'flex' has the ability to generate a reentrant C scanner.
This is
accomplished by specifying '%option reentrant' ('-R'). The generated
scanner is both portable and safe to use in one or more separate
threads of control. The most common use for reentrant scanners is from
within multi-threaded applications. Any thread may create and execute a
reentrant 'flex' scanner without the need for synchronization with other
threads.

* Menu:

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::


File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant

19.1 Uses for Reentrant Scanners
================================

However, there are other uses for a reentrant scanner. For example, you
could scan two or more files simultaneously to implement a 'diff' at the
token level (i.e., instead of at the character level):

     /* Example of maintaining more than one active scanner. */

     int tok1, tok2;   /* declared outside the loop so the condition can see them */

     do {
         tok1 = yylex( scanner_1 );
         tok2 = yylex( scanner_2 );

         if( tok1 != tok2 ) {
             printf("Files are different.\n");
             break;
         }

     } while ( tok1 && tok2 );

   Another use for a reentrant scanner is recursion. (Note that a
recursive scanner can also be created using a non-reentrant scanner and
buffer states. *Note Multiple Input Buffers::.)

   The following crude scanner supports the 'eval' command by invoking
another instance of itself.

     /* Example of recursive invocation. */

     %option reentrant

     %%
     "eval(".+")"  {
                         yyscan_t scanner;
                         YY_BUFFER_STATE buf;

                         yylex_init( &scanner );
                         yytext[yyleng-1] = ' ';

                         buf = yy_scan_string( yytext + 5, scanner );
                         yylex( scanner );

                         yy_delete_buffer(buf,scanner);
                         yylex_destroy( scanner );
                    }
     ...
     %%


File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant

19.2 An Overview of the Reentrant API
=====================================

The API for reentrant scanners is different from that for non-reentrant
scanners. Here is a quick overview of the API:

   * '%option reentrant' must be specified.

   * All functions take one additional argument: 'yyscanner'.

   * All global variables are replaced by their macro equivalents. (We
     tell you this because it may be important to you during debugging.)

   * 'yylex_init' and 'yylex_destroy' must be called before and after
     'yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     'flex' variables.

   * User-specific data can be stored in 'yyextra'.


File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant

19.3 Reentrant Example
======================

First, an example of a reentrant scanner:

     /* This scanner prints "//" comments. */

     %option reentrant stack noyywrap
     %x COMMENT

     %%

     "//"                 yy_push_state( COMMENT, yyscanner);
     .|\n

     <COMMENT>\n          yy_pop_state( yyscanner );
     <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

     %%

     int main ( int argc, char * argv[] )
     {
         yyscan_t scanner;

         yylex_init ( &scanner );
         yylex ( scanner );
         yylex_destroy ( scanner );
         return 0;
     }


File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant

19.4 The Reentrant API in Detail
================================

Here are the things you need to do or know to use the reentrant C API of
'flex'.

* Menu:

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::


File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail

19.4.1 Declaring a Scanner As Reentrant
---------------------------------------

'%option reentrant' ('--reentrant') must be specified.

   Notice that '%option reentrant' is specified in the above example
(*note Reentrant Example::). Had this option not been specified, 'flex'
would have happily generated a non-reentrant scanner without
complaining. You may explicitly specify '%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary. The
default is to generate a non-reentrant scanner.


File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail

19.4.2 The Extra Argument
-------------------------

All functions take one additional argument: 'yyscanner'.

   Notice that the calls to 'yy_push_state' and 'yy_pop_state' both have
an argument, 'yyscanner', that is not present in a non-reentrant
scanner. Here are the declarations of 'yy_push_state' and
'yy_pop_state' in the reentrant scanner:

     static void yy_push_state ( int new_state , yyscan_t yyscanner ) ;
     static void yy_pop_state  ( yyscan_t yyscanner ) ;

   Notice that the argument 'yyscanner' appears in the declaration of
both functions. In fact, all 'flex' functions in a reentrant scanner
have this additional argument. It is always the last argument in the
argument list, it is always of type 'yyscan_t' (which is typedef'd to
'void *') and it is always named 'yyscanner'.
As you may have guessed,
'yyscanner' is a pointer to an opaque data structure encapsulating the
current state of the scanner. For a list of function declarations, see
*note Reentrant Functions::. Note that preprocessor macros, such as
'BEGIN', 'ECHO', and 'REJECT', do not take this additional argument.


File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail

19.4.3 Global Variables Replaced By Macros
------------------------------------------

All global variables in traditional 'flex' have been replaced by macro
equivalents.

   Note that in the above example, 'yyout' and 'yytext' are not plain
variables. These are macros that will expand to their equivalent
lvalue. All of the familiar 'flex' globals have been replaced by their
macro equivalents. In particular, 'yytext', 'yyleng', 'yylineno',
'yyin', 'yyout', 'yyextra', 'yylval', and 'yylloc' are macros. You may
safely use these macros in actions as if they were plain variables. We
only tell you this so you don't expect to link to these variables
externally. Currently, each macro expands to a member of an internal
struct, e.g.,

     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about 'yytext' and friends is that,
because 'yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions. You must use an accessor method, e.g., 'yyget_text', to
accomplish this. (See below.)


File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail

19.4.4 Init and Destroy Functions
---------------------------------

'yylex_init' and 'yylex_destroy' must be called before and after
'yylex', respectively.

     int yylex_init ( yyscan_t * ptr_yy_globals ) ;
     int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
     int yylex ( yyscan_t yyscanner ) ;
     int yylex_destroy ( yyscan_t yyscanner ) ;

   The function 'yylex_init' must be called before calling any other
function. The argument to 'yylex_init' is the address of an
uninitialized pointer to be filled in by 'yylex_init', overwriting any
previous contents. The function 'yylex_init_extra' may be used instead,
taking as its first argument a variable of type 'YY_EXTRA_TYPE'. See
the section on 'yyextra', below, for more details.

   The value stored in 'ptr_yy_globals' should thereafter be passed to
'yylex' and 'yylex_destroy'. 'flex' does not save the argument passed
to 'yylex_init', so it is safe to pass the address of a local pointer to
'yylex_init' so long as it remains in scope for the duration of all
calls to the scanner, up to and including the call to 'yylex_destroy'.

   The function 'yylex' should be familiar to you by now. The reentrant
version takes one argument, which is the value returned (via an
argument) by 'yylex_init'. Otherwise, it behaves the same as the
non-reentrant version of 'yylex'.

   Both 'yylex_init' and 'yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case 'errno' is set to one of the
following values:

   * ENOMEM Memory allocation error. *Note memory-management::.
   * EINVAL Invalid argument.

   The function 'yylex_destroy' should be called to free resources used
by the scanner. After 'yylex_destroy' is called, the contents of
'yyscanner' should not be used. Of course, there is no need to destroy
a scanner if you plan to reuse it. A 'flex' scanner (both reentrant and
non-reentrant) may be restarted by calling 'yyrestart'.

   Below is an example of a program that creates a scanner, uses it,
then destroys it when done:

     int main ()
     {
         yyscan_t scanner;
         int tok;

         yylex_init(&scanner);

         while ((tok=yylex(scanner)) > 0)
             printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));

         yylex_destroy(scanner);
         return 0;
     }


File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail

19.4.5 Accessing Variables with Reentrant Scanners
--------------------------------------------------

Accessor methods (get/set functions) provide access to common 'flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to 'flex' values, such as
'yytext'. In a non-reentrant scanner, these values are global, so there
is no problem accessing them. However, in a reentrant scanner, there
are no global 'flex' values. You cannot access them directly.
Instead, you must access 'flex' values using accessor methods (get/set
functions). Each accessor method is named 'yyget_NAME' or 'yyset_NAME',
where 'NAME' is the name of the 'flex' variable you want. For example:

     /* Set the last character of yytext to NUL. */
     void chop ( yyscan_t scanner )
     {
         int len = yyget_leng( scanner );
         yyget_text( scanner )[len - 1] = '\0';
     }

   The above code may be called from within an action like this:

     %%
     .+\n    { chop( yyscanner );}

   You may find that '%option header-file' is particularly useful for
generating prototypes of all the accessor functions. *Note
option-header::.


File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail

19.4.6 Extra Data
-----------------

User-specific data can be stored in 'yyextra'.
3418 3419 In a reentrant scanner, it is unwise to use global variables to 3420communicate with or maintain state between different pieces of your 3421program. However, you may need access to external data or invoke 3422external functions from within the scanner actions. Likewise, you may 3423need to pass information to your scanner (e.g., open file descriptors, 3424or database connections). In a non-reentrant scanner, the only way to 3425do this would be through the use of global variables. 'Flex' allows you 3426to store arbitrary, "extra" data in a scanner. This data is accessible 3427through the accessor methods 'yyget_extra' and 'yyset_extra' from 3428outside the scanner, and through the shortcut macro 'yyextra' from 3429within the scanner itself. They are defined as follows: 3430 3431 #define YY_EXTRA_TYPE void* 3432 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner ); 3433 void yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner); 3434 3435 In addition, an extra form of 'yylex_init' is provided, 3436'yylex_init_extra'. This function is provided so that the yyextra value 3437can be accessed from within the very first yyalloc, used to allocate the 3438scanner itself. 3439 3440 By default, 'YY_EXTRA_TYPE' is defined as type 'void *'. You may 3441redefine this type using '%option extra-type="your_type"' in the 3442scanner: 3443 3444 /* An example of overriding YY_EXTRA_TYPE. 
*/
     %{
     #include <stdio.h>
     #include <sys/stat.h>
     #include <unistd.h>
     %}
     %option reentrant
     %option extra-type="struct stat *"
     %%

     __filesize__     printf( "%ld", (long) yyextra->st_size  );
     __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
     %%
     void scan_file( char* filename )
     {
         yyscan_t scanner;
         struct stat buf;
         FILE *in;

         /* Bail out if the file cannot be opened or examined. */
         in = fopen( filename, "r" );
         if ( in == NULL )
             return;
         if ( stat( filename, &buf ) != 0 ) {
             fclose( in );
             return;
         }

         /* yyextra has type 'struct stat *', so pass the address of buf. */
         if ( yylex_init_extra( &buf, &scanner ) == 0 ) {
             yyset_in( in, scanner );
             yylex( scanner );
             yylex_destroy( scanner );
         }

         fclose( in );
     }


File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail

19.4.7 About yyscan_t
---------------------

'yyscan_t' is defined as:

     typedef void* yyscan_t;

   It is initialized by 'yylex_init()' to point to an internal
structure.  You should never access this value directly.  In particular,
you should never attempt to free it (use 'yylex_destroy()' instead).
3486 3487 3488File: flex.info, Node: Reentrant Functions, Prev: Reentrant Detail, Up: Reentrant 3489 349019.5 Functions and Macros Available in Reentrant C Scanners 3491=========================================================== 3492 3493The following Functions are available in a reentrant scanner: 3494 3495 char *yyget_text ( yyscan_t scanner ); 3496 int yyget_leng ( yyscan_t scanner ); 3497 FILE *yyget_in ( yyscan_t scanner ); 3498 FILE *yyget_out ( yyscan_t scanner ); 3499 int yyget_lineno ( yyscan_t scanner ); 3500 YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner ); 3501 int yyget_debug ( yyscan_t scanner ); 3502 3503 void yyset_debug ( int flag, yyscan_t scanner ); 3504 void yyset_in ( FILE * in_str , yyscan_t scanner ); 3505 void yyset_out ( FILE * out_str , yyscan_t scanner ); 3506 void yyset_lineno ( int line_number , yyscan_t scanner ); 3507 void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner ); 3508 3509 There are no "set" functions for yytext and yyleng. This is 3510intentional. 3511 3512 The following Macro shortcuts are available in actions in a reentrant 3513scanner: 3514 3515 yytext 3516 yyleng 3517 yyin 3518 yyout 3519 yylineno 3520 yyextra 3521 yy_flex_debug 3522 3523 In a reentrant C scanner, support for yylineno is always present 3524(i.e., you may access yylineno), but the value is never modified by 3525'flex' unless '%option yylineno' is enabled. This is to allow the user 3526to maintain the line count independently of 'flex'. 
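
   When '%option yylineno' is not enabled, the line count is yours to
maintain.  The following is a minimal sketch of one way to do that, not
something 'flex' provides: the helper name 'count_newlines' is
hypothetical, and it simply counts the newlines in a piece of matched
text.

```c
/* Hypothetical helper (not part of flex): count the newline characters
 * in the first len bytes of text.  The length is passed explicitly
 * because matched text may contain NUL bytes. */
int count_newlines(const char *text, int len)
{
    int n = 0;
    int i;

    for (i = 0; i < len; i++) {
        if (text[i] == '\n')
            n++;
    }
    return n;
}
```

   Inside an action of a reentrant scanner, the result could then be
applied with the accessors listed above, along the lines of
'yyset_lineno( yyget_lineno( yyscanner ) + count_newlines( yytext,
yyleng ), yyscanner );'.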

   The following functions and macros are made available when '%option
bison-bridge' ('--bison-bridge') is specified:

     YYSTYPE * yyget_lval ( yyscan_t scanner );
     void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
     yylval

   The following functions and macros are made available when '%option
bison-locations' ('--bison-locations') is specified:

     YYLTYPE *yyget_lloc ( yyscan_t scanner );
     void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
     yylloc

   Support for yylval assumes that 'YYSTYPE' is a valid type.  Support
for yylloc assumes that 'YYLTYPE' is a valid type.  Typically, these
types are generated by 'bison', and are included in section 1 of the
'flex' input.


File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top

20 Incompatibilities with Lex and Posix
***************************************

'flex' is a rewrite of the AT&T Unix _lex_ tool (the two implementations
do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations.  'flex' is fully
compliant with the POSIX 'lex' specification, except that when using
'%pointer' (the default), a call to 'unput()' destroys the contents of
'yytext', which is counter to the POSIX specification.  In this section
we discuss all of the known areas of incompatibility between 'flex',
AT&T 'lex', and the POSIX specification.  'flex''s '-l' option turns on
maximum compatibility with the original AT&T 'lex' implementation, at
the cost of a major loss in the generated scanner's performance.  We
note below which incompatibilities can be overcome using the '-l'
option.
'flex' is fully compatible with 'lex' with the following 3566exceptions: 3567 3568 * The undocumented 'lex' scanner internal variable 'yylineno' is not 3569 supported unless '-l' or '%option yylineno' is used. 3570 3571 * 'yylineno' should be maintained on a per-buffer basis, rather than 3572 a per-scanner (single global variable) basis. 3573 3574 * 'yylineno' is not part of the POSIX specification. 3575 3576 * The 'input()' routine is not redefinable, though it may be called 3577 to read characters following whatever has been matched by a rule. 3578 If 'input()' encounters an end-of-file the normal 'yywrap()' 3579 processing is done. A "real" end-of-file is returned by 'input()' 3580 as 'EOF'. 3581 3582 * Input is instead controlled by defining the 'YY_INPUT()' macro. 3583 3584 * The 'flex' restriction that 'input()' cannot be redefined is in 3585 accordance with the POSIX specification, which simply does not 3586 specify any way of controlling the scanner's input other than by 3587 making an initial assignment to 'yyin'. 3588 3589 * The 'unput()' routine is not redefinable. This restriction is in 3590 accordance with POSIX. 3591 3592 * 'flex' scanners are not as reentrant as 'lex' scanners. In 3593 particular, if you have an interactive scanner and an interrupt 3594 handler which long-jumps out of the scanner, and the scanner is 3595 subsequently called again, you may get the following message: 3596 3597 fatal flex scanner internal error--end of buffer missed 3598 3599 To reenter the scanner, first use: 3600 3601 yyrestart( yyin ); 3602 3603 Note that this call will throw away any buffered input; usually 3604 this isn't a problem with an interactive scanner. *Note 3605 Reentrant::, for 'flex''s reentrant API. 3606 3607 * Also note that 'flex' C++ scanner classes _are_ reentrant, so if 3608 using C++ is an option for you, you should use them instead. *Note 3609 Cxx::, and *note Reentrant:: for details. 3610 3611 * 'output()' is not supported. 
     Output from the ECHO macro is done to
     the file-pointer 'yyout' (default 'stdout').

   * 'output()' is not part of the POSIX specification.

   * 'lex' does not support exclusive start conditions (%x), though they
     are in the POSIX specification.

   * When definitions are expanded, 'flex' encloses them in parentheses.
     With 'lex', the following:

          NAME    [A-Z][A-Z0-9]*
          %%
          foo{NAME}?      printf( "Found it\n" );
          %%

     will not match the string 'foo' because when the macro is expanded
     the rule is equivalent to 'foo[A-Z][A-Z0-9]*?' and the precedence
     is such that the '?' is associated with '[A-Z0-9]*'.  With 'flex',
     the rule will be expanded to 'foo([A-Z][A-Z0-9]*)?' and so the
     string 'foo' will match.

   * Note that if the definition begins with '^' or ends with '$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings.  But
     the '<s>', '/', and '<<EOF>>' operators cannot be used in a 'flex'
     definition.

   * Using '-l' results in the 'lex' behavior of no parentheses around
     the definition.

   * The POSIX specification is that the definition be enclosed in
     parentheses.

   * Some implementations of 'lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:

          %%
          foo|bar<space here>
            { foobar_action();}

     'flex' does not support this feature.

   * The 'lex' '%r' (generate a Ratfor scanner) option is not supported.
     It is not part of the POSIX specification.

   * After a call to 'unput()', _yytext_ is undefined until the next
     token is matched, unless the scanner was built using '%array'.
     This is not the case with 'lex' or the POSIX specification.  The
     '-l' option does away with this incompatibility.

   * The precedence of the '{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of 'lex' interpret 'abc{1,3}' as
     "match one, two, or three occurrences of 'abc'", whereas 'flex'
     interprets it as "match 'ab' followed by one, two, or three
     occurrences of 'c'".  The '-l' and '--posix' options do away with
     this incompatibility.

   * The precedence of the '^' operator is different.  'lex' interprets
     '^foo|bar' as "match either 'foo' at the beginning of a line, or
     'bar' anywhere", whereas 'flex' interprets it as "match either
     'foo' or 'bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.

   * The special table-size declarations such as '%a' supported by 'lex'
     are not required by 'flex' scanners.  'flex' ignores them.

   * The name 'FLEX_SCANNER' is '#define''d so scanners may be written
     for use with either 'flex' or 'lex'.  Scanners also include
     'YY_FLEX_MAJOR_VERSION', 'YY_FLEX_MINOR_VERSION' and
     'YY_FLEX_SUBMINOR_VERSION' indicating which version of 'flex'
     generated the scanner.  For example, for the 2.5.22 release, these
     defines would be 2, 5 and 22 respectively.  If the version of
     'flex' being used is a beta version, then the symbol 'FLEX_BETA' is
     defined.

   * The symbols '[[' and ']]' in the code sections of the input may
     conflict with the m4 delimiters.  *Note M4 Dependency::.
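
   The 'FLEX_SCANNER' and version macros mentioned in the list above
lend themselves to compile-time checks in code shared between 'flex'
and 'lex' scanners.  The following is a minimal sketch, assuming it is
compiled as part of the scanner's user code; the names 'pack_version'
and 'generator_version' are hypothetical and not part of 'flex'.

```c
/* Hypothetical helper: pack a three-part version into one comparable
 * integer, so that 2.5.22 becomes 20522. */
int pack_version(int major, int minor, int subminor)
{
    return major * 10000 + minor * 100 + subminor;
}

/* Return the packed version of the flex that generated this file, or 0
 * when the macros are absent (for example, when another lex
 * implementation generated the scanner). */
int generator_version(void)
{
#if defined(FLEX_SCANNER) && defined(YY_FLEX_MAJOR_VERSION) \
    && defined(YY_FLEX_MINOR_VERSION) && defined(YY_FLEX_SUBMINOR_VERSION)
    return pack_version(YY_FLEX_MAJOR_VERSION,
                        YY_FLEX_MINOR_VERSION,
                        YY_FLEX_SUBMINOR_VERSION);
#else
    return 0;
#endif
}
```

   Compiled on its own, outside a generated scanner, none of the macros
are defined and 'generator_version' reports 0.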
3688 3689 The following 'flex' features are not included in 'lex' or the POSIX 3690specification: 3691 3692 * C++ scanners 3693 * %option 3694 * start condition scopes 3695 * start condition stacks 3696 * interactive/non-interactive scanners 3697 * yy_scan_string() and friends 3698 * yyterminate() 3699 * yy_set_interactive() 3700 * yy_set_bol() 3701 * YY_AT_BOL() <<EOF>> 3702 * <*> 3703 * YY_DECL 3704 * YY_START 3705 * YY_USER_ACTION 3706 * YY_USER_INIT 3707 * #line directives 3708 * %{}'s around actions 3709 * reentrant C API 3710 * multiple actions on a line 3711 * almost all of the 'flex' command-line options 3712 3713 The feature "multiple actions on a line" refers to the fact that with 3714'flex' you can put multiple actions on the same line, separated with 3715semi-colons, while with 'lex', the following: 3716 3717 foo handle_foo(); ++num_foos_seen; 3718 3719 is (rather surprisingly) truncated to 3720 3721 foo handle_foo(); 3722 3723 'flex' does not truncate the action. Actions that are not enclosed 3724in braces are simply terminated at the end of the line. 3725 3726 3727File: flex.info, Node: Memory Management, Next: Serialized Tables, Prev: Lex and Posix, Up: Top 3728 372921 Memory Management 3730******************** 3731 3732This chapter describes how flex handles dynamic memory, and how you can 3733override the default behavior. 3734 3735* Menu: 3736 3737* The Default Memory Management:: 3738* Overriding The Default Memory Management:: 3739* A Note About yytext And Memory:: 3740 3741 3742File: flex.info, Node: The Default Memory Management, Next: Overriding The Default Memory Management, Prev: Memory Management, Up: Memory Management 3743 374421.1 The Default Memory Management 3745================================== 3746 3747Flex allocates dynamic memory during initialization, and once in a while 3748from within a call to yylex(). Initialization takes place during the 3749first call to yylex(). 
Thereafter, flex may reallocate more memory if 3750it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up 3751all memory when you call 'yylex_destroy' *Note faq-memory-leak::. 3752 3753 Flex allocates dynamic memory for four purposes, listed below (1) 3754 375516kB for the input buffer. 3756 Flex allocates memory for the character buffer used to perform 3757 pattern matching. Flex must read ahead from the input stream and 3758 store it in a large character buffer. This buffer is typically the 3759 largest chunk of dynamic memory flex consumes. This buffer will 3760 grow if necessary, doubling the size each time. Flex frees this 3761 memory when you call yylex_destroy(). The default size of this 3762 buffer (16384 bytes) is almost always too large. The ideal size 3763 for this buffer is the length of the longest token expected, in 3764 bytes, plus a little more. Flex will allocate a few extra bytes 3765 for housekeeping. Currently, to override the size of the input 3766 buffer you must '#define YY_BUF_SIZE' to whatever number of bytes 3767 you want. We don't plan to change this in the near future, but we 3768 reserve the right to do so if we ever add a more robust memory 3769 management API. 3770 377164kb for the REJECT state. This will only be allocated if you use REJECT. 3772 The size is large enough to hold the same number of states as 3773 characters in the input buffer. If you override the size of the 3774 input buffer (via 'YY_BUF_SIZE'), then you automatically override 3775 the size of this buffer as well. 3776 3777100 bytes for the start condition stack. 3778 Flex allocates memory for the start condition stack. This is the 3779 stack used for pushing start states, i.e., with yy_push_state(). 3780 It will grow if necessary. Since the states are simply integers, 3781 this stack doesn't consume much memory. This stack is not present 3782 if '%option stack' is not specified. You will rarely need to tune 3783 this buffer. 
The ideal size for this stack is the maximum depth 3784 expected. The memory for this stack is automatically destroyed 3785 when you call yylex_destroy(). *Note option-stack::. 3786 378740 bytes for each YY_BUFFER_STATE. 3788 Flex allocates memory for each YY_BUFFER_STATE. The buffer state 3789 itself is about 40 bytes, plus an additional large character buffer 3790 (described above.) The initial buffer state is created during 3791 initialization, and with each call to yy_create_buffer(). You 3792 can't tune the size of this, but you can tune the character buffer 3793 as described above. Any buffer state that you explicitly create by 3794 calling yy_create_buffer() is _NOT_ destroyed automatically. You 3795 must call yy_delete_buffer() to free the memory. The exception to 3796 this rule is that flex will delete the current buffer automatically 3797 when you call yylex_destroy(). If you delete the current buffer, 3798 be sure to set it to NULL. That way, flex will not try to delete 3799 the buffer a second time (possibly crashing your program!) At the 3800 time of this writing, flex does not provide a growable stack for 3801 the buffer states. You have to manage that yourself. *Note 3802 Multiple Input Buffers::. 3803 380484 bytes for the reentrant scanner guts 3805 Flex allocates about 84 bytes for the reentrant scanner structure 3806 when you call yylex_init(). It is destroyed when the user calls 3807 yylex_destroy(). 3808 3809 ---------- Footnotes ---------- 3810 3811 (1) The quantities given here are approximate, and may vary due to 3812host architecture, compiler configuration, or due to future enhancements 3813to flex. 
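
   To get a feel for the doubling of the input buffer described above,
the following sketch computes the size the buffer would reach in order
to hold a token of a given length.  It models only the description given
in this section (start at the initial size and double until the token
fits); it is not 'flex''s actual code, the name 'grown_buffer_size' is
hypothetical, and the few housekeeping bytes are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the growth policy: starting from initial bytes,
 * double until at least needed bytes fit.  Returns 0 if initial is 0 or
 * if doubling would overflow size_t. */
size_t grown_buffer_size(size_t initial, size_t needed)
{
    size_t size = initial;

    if (size == 0)
        return 0;
    while (size < needed) {
        if (size > SIZE_MAX / 2)
            return 0;   /* doubling would overflow */
        size *= 2;
    }
    return size;
}
```

   With the default of 16384 bytes, a 100000-byte token would need three
doublings under this model, which is one reason to choose 'YY_BUF_SIZE'
with the longest expected token in mind.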
3814 3815 3816File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management 3817 381821.2 Overriding The Default Memory Management 3819============================================= 3820 3821Flex calls the functions 'yyalloc', 'yyrealloc', and 'yyfree' when it 3822needs to allocate or free memory. By default, these functions are 3823wrappers around the standard C functions, 'malloc', 'realloc', and 3824'free', respectively. You can override the default implementations by 3825telling flex that you will provide your own implementations. 3826 3827 To override the default implementations, you must do two things: 3828 3829 1. Suppress the default implementations by specifying one or more of 3830 the following options: 3831 3832 * '%option noyyalloc' 3833 * '%option noyyrealloc' 3834 * '%option noyyfree'. 3835 3836 2. Provide your own implementation of the following functions: (1) 3837 3838 // For a non-reentrant scanner 3839 void * yyalloc (size_t bytes); 3840 void * yyrealloc (void * ptr, size_t bytes); 3841 void yyfree (void * ptr); 3842 3843 // For a reentrant scanner 3844 void * yyalloc (size_t bytes, void * yyscanner); 3845 void * yyrealloc (void * ptr, size_t bytes, void * yyscanner); 3846 void yyfree (void * ptr, void * yyscanner); 3847 3848 In the following example, we will override all three memory routines. 3849We assume that there is a custom allocator with garbage collection. In 3850order to make this example interesting, we will use a reentrant scanner, 3851passing a pointer to the custom allocator through 'yyextra'. 3852 3853 %{ 3854 #include "some_allocator.h" 3855 %} 3856 3857 /* Suppress the default implementations. */ 3858 %option noyyalloc noyyrealloc noyyfree 3859 %option reentrant 3860 3861 /* Initialize the allocator. 
*/
     %{
     #define YY_EXTRA_TYPE  struct allocator*
     #define YY_USER_INIT   yyextra = allocator_create();
     %}

     %%
     .|\n   ;
     %%

     /* Provide our own implementations.  The 'yyextra' shortcut is meant
        for actions, so these functions use the accessor instead. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyget_extra (yyscanner), bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* Pass the old block along so that it can be resized. */
         return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }


   ---------- Footnotes ----------

   (1) It is not necessary to override all (or any) of the memory
management routines.  You may, for example, override 'yyrealloc', but
not 'yyfree' or 'yyalloc'.


File: flex.info,  Node: A Note About yytext And Memory,  Prev: Overriding The Default Memory Management,  Up: Memory Management

21.3 A Note About yytext And Memory
===================================

When flex finds a match, 'yytext' points to the first character of the
match in the input buffer.  The string itself is part of the input
buffer, and is _NOT_ allocated separately.  The value of yytext will be
overwritten the next time yylex() is called.  In short, the value of
yytext is only valid from within the matched rule's action.

   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead.  In order to preserve yytext,
you will have to copy it with strdup() or a similar function.  But this
introduces some headache because your parser is now responsible for
freeing the copy of yytext.  If you use a yacc or bison parser (commonly
used with flex), you will discover that the error recovery mechanisms
can cause memory to be leaked.

   To prevent memory leaks from strdup'd yytext, you will have to track
the memory somehow.
Our experience has shown that a garbage collection 3913mechanism or a pooled memory mechanism will save you a lot of grief when 3914writing parsers. 3915 3916 3917File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top 3918 391922 Serialized Tables 3920******************** 3921 3922A 'flex' scanner has the ability to save the DFA tables to a file, and 3923load them at runtime when needed. The motivation for this feature is to 3924reduce the runtime memory footprint. Traditionally, these tables have 3925been compiled into the scanner as C arrays, and are sometimes quite 3926large. Since the tables are compiled into the scanner, the memory used 3927by the tables can never be freed. This is a waste of memory, especially 3928if an application uses several scanners, but none of them at the same 3929time. 3930 3931 The serialization feature allows the tables to be loaded at runtime, 3932before scanning begins. The tables may be discarded when scanning is 3933finished. 3934 3935* Menu: 3936 3937* Creating Serialized Tables:: 3938* Loading and Unloading Serialized Tables:: 3939* Tables File Format:: 3940 3941 3942File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables 3943 394422.1 Creating Serialized Tables 3945=============================== 3946 3947You may create a scanner with serialized tables by specifying: 3948 3949 %option tables-file=FILE 3950 or 3951 --tables-file=FILE 3952 3953 These options instruct flex to save the DFA tables to the file FILE. 3954The tables will _not_ be embedded in the generated scanner. The scanner 3955will not function on its own. The scanner will be dependent upon the 3956serialized tables. You must load the tables from this file at runtime 3957before you can scan anything. 
3958 3959 If you do not specify a filename to '--tables-file', the tables will 3960be saved to 'lex.yy.tables', where 'yy' is the appropriate prefix. 3961 3962 If your project uses several different scanners, you can concatenate 3963the serialized tables into one file, and flex will find the correct set 3964of tables, using the scanner prefix as part of the lookup key. An 3965example follows: 3966 3967 $ flex --tables-file --prefix=cpp cpp.l 3968 $ flex --tables-file --prefix=c c.l 3969 $ cat lex.cpp.tables lex.c.tables > all.tables 3970 3971 The above example created two scanners, 'cpp', and 'c'. Since we did 3972not specify a filename, the tables were serialized to 'lex.c.tables' and 3973'lex.cpp.tables', respectively. Then, we concatenated the two files 3974together into 'all.tables', which we will distribute with our project. 3975At runtime, we will open the file and tell flex to load the tables from 3976it. Flex will find the correct tables automatically. (See next 3977section). 3978 3979 3980File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables 3981 398222.2 Loading and Unloading Serialized Tables 3983============================================ 3984 3985If you've built your scanner with '%option tables-file', then you must 3986load the scanner tables at runtime. This can be accomplished with the 3987following function: 3988 3989 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER]) 3990 Locates scanner tables in the stream pointed to by FP and loads 3991 them. Memory for the tables is allocated via 'yyalloc'. You must 3992 call this function before the first call to 'yylex'. The argument 3993 SCANNER only appears in the reentrant scanner. This function 3994 returns '0' (zero) on success, or non-zero on error. 3995 3996 The loaded tables are *not* automatically destroyed (unloaded) when 3997you call 'yylex_destroy'. 
The reason is that you may create several 3998scanners of the same type (in a reentrant scanner), each of which needs 3999access to these tables. To avoid a nasty memory leak, you must call the 4000following function: 4001 4002 -- Function: int yytables_destroy ([yyscan_t SCANNER]) 4003 Unloads the scanner tables. The tables must be loaded again before 4004 you can scan any more data. The argument SCANNER only appears in 4005 the reentrant scanner. This function returns '0' (zero) on 4006 success, or non-zero on error. 4007 4008 *The functions 'yytables_fload' and 'yytables_destroy' are not 4009thread-safe.* You must ensure that these functions are called exactly 4010once (for each scanner type) in a threaded program, before any thread 4011calls 'yylex'. After the tables are loaded, they are never written to, 4012and no thread protection is required thereafter - until you destroy 4013them. 4014 4015 4016File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables 4017 401822.3 Tables File Format 4019======================= 4020 4021This section defines the file format of serialized 'flex' tables. 4022 4023 The tables format allows for one or more sets of tables to be 4024specified, where each set corresponds to a given scanner. Scanners are 4025indexed by name, as described below. The file format is as follows: 4026 4027 TABLE SET 1 4028 +-------------------------------+ 4029 Header | uint32 th_magic; | 4030 | uint32 th_hsize; | 4031 | uint32 th_ssize; | 4032 | uint16 th_flags; | 4033 | char th_version[]; | 4034 | char th_name[]; | 4035 | uint8 th_pad64[]; | 4036 +-------------------------------+ 4037 Table 1 | uint16 td_id; | 4038 | uint16 td_flags; | 4039 | uint32 td_hilen; | 4040 | uint32 td_lolen; | 4041 | void td_data[]; | 4042 | uint8 td_pad64[]; | 4043 +-------------------------------+ 4044 Table 2 | | 4045 . . . 4046 . . . 4047 . . . 4048 . . . 
4049 Table n | | 4050 +-------------------------------+ 4051 TABLE SET 2 4052 . 4053 . 4054 . 4055 TABLE SET N 4056 4057 The above diagram shows that a complete set of tables consists of a 4058header followed by multiple individual tables. Furthermore, multiple 4059complete sets may be present in the same file, each set with its own 4060header and tables. The sets are contiguous in the file. The only way 4061to know if another set follows is to check the next four bytes for the 4062magic number (or check for EOF). The header and tables sections are 4063padded to 64-bit boundaries. Below we describe each field in detail. 4064This format does not specify how the scanner will expand the given data, 4065i.e., data may be serialized as int8, but expanded to an int32 array at 4066runtime. This is to reduce the size of the serialized data where 4067possible. Remember, _all integer values are in network byte order_. 4068 4069Fields of a table header: 4070 4071'th_magic' 4072 Magic number, always 0xF13C57B1. 4073 4074'th_hsize' 4075 Size of this entire header, in bytes, including all fields plus any 4076 padding. 4077 4078'th_ssize' 4079 Size of this entire set, in bytes, including the header, all 4080 tables, plus any padding. 4081 4082'th_flags' 4083 Bit flags for this table set. Currently unused. 4084 4085'th_version[]' 4086 Flex version in NULL-terminated string format. e.g., '2.5.13a'. 4087 This is the version of flex that was used to create the serialized 4088 tables. 4089 4090'th_name[]' 4091 Contains the name of this table set. The default is 'yytables', 4092 and is prefixed accordingly, e.g., 'footables'. Must be 4093 NULL-terminated. 4094 4095'th_pad64[]' 4096 Zero or more NULL bytes, padding the entire header to the next 4097 64-bit boundary as calculated from the beginning of the header. 4098 4099Fields of a table: 4100 4101'td_id' 4102 Specifies the table identifier. 
Possible values are: 4103 'YYTD_ID_ACCEPT (0x01)' 4104 'yy_accept' 4105 'YYTD_ID_BASE (0x02)' 4106 'yy_base' 4107 'YYTD_ID_CHK (0x03)' 4108 'yy_chk' 4109 'YYTD_ID_DEF (0x04)' 4110 'yy_def' 4111 'YYTD_ID_EC (0x05)' 4112 'yy_ec ' 4113 'YYTD_ID_META (0x06)' 4114 'yy_meta' 4115 'YYTD_ID_NUL_TRANS (0x07)' 4116 'yy_NUL_trans' 4117 'YYTD_ID_NXT (0x08)' 4118 'yy_nxt'. This array may be two dimensional. See the 4119 'td_hilen' field below. 4120 'YYTD_ID_RULE_CAN_MATCH_EOL (0x09)' 4121 'yy_rule_can_match_eol' 4122 'YYTD_ID_START_STATE_LIST (0x0A)' 4123 'yy_start_state_list'. This array is handled specially 4124 because it is an array of pointers to structs. See the 4125 'td_flags' field below. 4126 'YYTD_ID_TRANSITION (0x0B)' 4127 'yy_transition'. This array is handled specially because it 4128 is an array of structs. See the 'td_lolen' field below. 4129 'YYTD_ID_ACCLIST (0x0C)' 4130 'yy_acclist' 4131 4132'td_flags' 4133 Bit flags describing how to interpret the data in 'td_data'. The 4134 data arrays are one-dimensional by default, but may be two 4135 dimensional as specified in the 'td_hilen' field. 4136 4137 'YYTD_DATA8 (0x01)' 4138 The data is serialized as an array of type int8. 4139 'YYTD_DATA16 (0x02)' 4140 The data is serialized as an array of type int16. 4141 'YYTD_DATA32 (0x04)' 4142 The data is serialized as an array of type int32. 4143 'YYTD_PTRANS (0x08)' 4144 The data is a list of indexes of entries in the expanded 4145 'yy_transition' array. Each index should be expanded to a 4146 pointer to the corresponding entry in the 'yy_transition' 4147 array. We count on the fact that the 'yy_transition' array 4148 has already been seen. 4149 'YYTD_STRUCT (0x10)' 4150 The data is a list of yy_trans_info structs, each of which 4151 consists of two integers. There is no padding between struct 4152 elements or between structs. The type of each member is 4153 determined by the 'YYTD_DATA*' bits. 
4154 4155'td_hilen' 4156 If 'td_hilen' is non-zero, then the data is a two-dimensional 4157 array. Otherwise, the data is a one-dimensional array. 'td_hilen' 4158 contains the number of elements in the higher dimensional array, 4159 and 'td_lolen' contains the number of elements in the lowest 4160 dimension. 4161 4162 Conceptually, 'td_data' is either 'sometype td_data[td_lolen]', or 4163 'sometype td_data[td_hilen][td_lolen]', where 'sometype' is 4164 specified by the 'td_flags' field. It is possible for both 4165 'td_lolen' and 'td_hilen' to be zero, in which case 'td_data' is a 4166 zero length array, and no data is loaded, i.e., this table is 4167 simply skipped. Flex does not currently generate tables of zero 4168 length. 4169 4170'td_lolen' 4171 Specifies the number of elements in the lowest dimension array. If 4172 this is a one-dimensional array, then it is simply the number of 4173 elements in this array. The element size is determined by the 4174 'td_flags' field. 4175 4176'td_data[]' 4177 The table data. This array may be a one- or two-dimensional array, 4178 of type 'int8', 'int16', 'int32', 'struct yy_trans_info', or 4179 'struct yy_trans_info*', depending upon the values in the 4180 'td_flags', 'td_hilen', and 'td_lolen' fields. 4181 4182'td_pad64[]' 4183 Zero or more NULL bytes, padding the entire table to the next 4184 64-bit boundary as calculated from the beginning of this table. 4185 4186 4187File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top 4188 418923 Diagnostics 4190************** 4191 4192The following is a list of 'flex' diagnostic messages: 4193 4194 * 'warning, rule cannot be matched' indicates that the given rule 4195 cannot be matched because it follows other rules that will always 4196 match the same text as it. 
For example, in the following 'foo' 4197 cannot be matched because it comes after an identifier "catch-all" 4198 rule: 4199 4200 [a-z]+ got_identifier(); 4201 foo got_foo(); 4202 4203 Using 'REJECT' in a scanner suppresses this warning. 4204 4205 * 'warning, -s option given but default rule can be matched' means 4206 that it is possible (perhaps only in a particular start condition) 4207 that the default rule (match any single character) is the only one 4208 that will match a particular input. Since '-s' was given, 4209 presumably this is not intended. 4210 4211 * 'reject_used_but_not_detected undefined' or 4212 'yymore_used_but_not_detected undefined'. These errors can occur 4213 at compile time. They indicate that the scanner uses 'REJECT' or 4214 'yymore()' but that 'flex' failed to notice the fact, meaning that 4215 'flex' scanned the first two sections looking for occurrences of 4216 these actions and failed to find any, but somehow you snuck some in 4217 (via a #include file, for example). Use '%option reject' or 4218 '%option yymore' to indicate to 'flex' that you really do use these 4219 features. 4220 4221 * 'flex scanner jammed'. a scanner compiled with '-s' has 4222 encountered an input string which wasn't matched by any of its 4223 rules. This error can also occur due to internal problems. 4224 4225 * 'token too large, exceeds YYLMAX'. your scanner uses '%array' and 4226 one of its rules matched a string longer than the 'YYLMAX' constant 4227 (8K bytes by default). You can increase the value by #define'ing 4228 'YYLMAX' in the definitions section of your 'flex' input. 4229 4230 * 'scanner requires -8 flag to use the character 'x''. Your scanner 4231 specification includes recognizing the 8-bit character ''x'' and 4232 you did not specify the -8 flag, and your scanner defaulted to 4233 7-bit because you used the '-Cf' or '-CF' table compression 4234 options. See the discussion of the '-7' flag, *note Scanner 4235 Options::, for details. 

   * 'flex scanner push-back overflow'.  You used 'unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in 'yytext'.  Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * 'input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'.  The scanner was working on matching an extremely large
     token and needed to expand the input buffer.  This doesn't work
     with scanners that use 'REJECT'.

   * 'fatal flex scanner internal error--end of buffer missed'.  This
     can occur in a scanner which is reentered after a long-jump has
     jumped out (or over) the scanner's activation frame.  Before
     reentering the scanner, use:
              yyrestart( yyin );
     or, as noted above, switch to using the C++ scanner class.

   * 'too many start conditions in <> construct!'  You listed more start
     conditions in a <> construct than exist (so you must have listed at
     least one of them twice).


File: flex.info,  Node: Limitations,  Next: Bibliography,  Prev: Diagnostics,  Up: Top

24 Limitations
**************

Some trailing context patterns cannot be properly matched and generate
warning messages ('dangerous trailing context').  These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as 'zx*/xy*', where the 'x*' matches the 'x' at
the beginning of the trailing context.  (Note that the POSIX draft
states that the text matched by such patterns is undefined.)  For some
trailing context rules, parts which are actually fixed-length are not
recognized as such, leading to the abovementioned performance loss.  In
particular, parts using '|' or '{n}' (such as 'foo{3}') are always
considered variable-length.
Combining trailing context with the special
'|' action can result in _fixed_ trailing context being turned into the
more expensive _variable_ trailing context.  For example, in the
following:

     %%
     abc      |
     xyz/def

   Use of 'unput()' invalidates 'yytext' and 'yyleng', unless the
'%array' directive or the '-l' option has been used.

   Pattern-matching of 'NUL's is substantially slower than matching
other characters.

   Dynamic resizing of the input buffer is slow, as it entails
rescanning all the text matched so far by the current (generally huge)
token.

   Due to both buffering of input and read-ahead, you cannot intermix
calls to '<stdio.h>' routines, such as 'getchar()', with 'flex' rules
and expect it to work.  Call 'input()' instead.

   The total number of table entries listed by the '-v' flag excludes
the table entries needed to determine what rule has been matched.  The
number of these excluded entries is equal to the number of DFA states
if the scanner does not use 'REJECT', and somewhat greater than the
number of states if it does.

   'REJECT' cannot be used with the '-f' or '-F' options.

   The 'flex' internal algorithms need documentation.


File: flex.info,  Node: Bibliography,  Next: FAQ,  Prev: Limitations,  Up: Top

25 Additional Reading
*********************

You may wish to read more about the following programs:
   * lex
   * yacc
   * sed
   * awk

   The following books may contain material of interest:

   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
Associates.  Be sure to get the 2nd edition.

   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_

   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
Techniques and Tools_, Addison-Wesley (1986).  Describes the
pattern-matching techniques used by 'flex' (deterministic finite
automata).


File: flex.info,  Node: FAQ,  Next: Appendices,  Prev: Bibliography,  Up: Top

FAQ
***

From time to time, the 'flex' maintainer receives certain questions.
Rather than repeat answers to well-understood problems, we publish them
here.

* Menu:

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::


File: flex.info,  Node: When was flex born?,  Next: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ

When was flex born?
===================

Vern Paxson took over the 'Software Tools' lex project from Jef
Poskanzer in 1982.  At that point it was written in Ratfor.  Around 1987
or so, Paxson translated it into C, and a legend was born :-).


File: flex.info,  Node: How do I expand backslash-escape sequences in C-style quoted strings?,  Next: Why do flex scanners call fileno if it is not ANSI compatible?,  Prev: When was flex born?,  Up: FAQ

How do I expand backslash-escape sequences in C-style quoted strings?
=====================================================================

A key point when scanning quoted strings is that you cannot (easily)
write a single rule that will precisely match the string if you allow
things like embedded escape sequences and newlines.  If you try to match
strings with a single rule then you'll wind up having to rescan the
string anyway to find any escape sequences.

   Instead you can use exclusive start conditions and a set of rules,
one for matching non-escaped text, one for matching a single escape, one
for matching an embedded newline, and one for recognizing the end of the
string.  Each of these rules is then faced with the question of where to
put its intermediary results.  The best solution is for the rules to
append their local value of 'yytext' to the end of a "string literal"
buffer.  A rule like the escape-matcher will append to the buffer the
meaning of the escape sequence rather than the literal text in 'yytext'.
In this way, 'yytext' does not need to be modified at all.


File: flex.info,  Node: Why do flex scanners call fileno if it is not ANSI compatible?,  Next: Does flex support recursive pattern definitions?,  Prev: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ

Why do flex scanners call fileno if it is not ANSI compatible?
==============================================================

Flex scanners call 'fileno()' in order to get the file descriptor
corresponding to 'yyin'.  The file descriptor may be passed to
'isatty()' or 'read()', depending upon which '%options' you specified.
If your system does not have 'fileno()' support, to get rid of the
'read()' call, do not specify '%option read'.  To get rid of the
'isatty()' call, you must specify one of '%option always-interactive' or
'%option never-interactive'.


File: flex.info,  Node: Does flex support recursive pattern definitions?,  Next: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Prev: Why do flex scanners call fileno if it is not ANSI compatible?,  Up: FAQ

Does flex support recursive pattern definitions?
================================================

e.g.,

     %%
     block   "{"({block}|{statement})*"}"

   No.  You cannot have recursive definitions.
The pattern-matching power of regular expressions in general (and
therefore flex scanners, too) is limited.  In particular, regular
expressions cannot "balance" parentheses to an arbitrary degree.  For
example, it's impossible to write a regular expression that matches
exactly those strings containing the same number of '{'s as '}'s.  For
more powerful pattern matching, you need a parser, such as 'GNU bison'.


File: flex.info,  Node: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Next: Flex is not matching my patterns in the same order that I defined them.,  Prev: Does flex support recursive pattern definitions?,  Up: FAQ

How do I skip huge chunks of input (tens of megabytes) while using flex?
========================================================================

Use 'fseek()' (or 'lseek()') to position 'yyin', then call 'yyrestart()'.


File: flex.info,  Node: Flex is not matching my patterns in the same order that I defined them.,  Next: My actions are executing out of order or sometimes not at all.,  Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Up: FAQ

Flex is not matching my patterns in the same order that I defined them.
=======================================================================

'flex' picks the rule that matches the most text (i.e., the longest
possible input string), not the first rule listed.  This is because
'flex' does not try the rules one at a time: it uses a matching
technique ("deterministic finite automata") that actually does all of
the matching simultaneously, in parallel.  (Seems impossible, but it's
actually a fairly simple technique once you understand the principles.)

   A side-effect of this parallel matching is that when the input
matches more than one rule, 'flex' scanners pick the rule that matched
the _most_ text.  This is explained further in the manual, in the
section *Note Matching::.
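
   As a rough illustration of that selection rule, here is a small C
sketch.  It is not how 'flex' is implemented ('flex' builds one DFA and
never tries rules one at a time); it only reproduces the outcome: the
longest match wins, and a tie goes to the rule listed first.  The two
matcher functions and the name 'pick_rule' are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* A "rule" here is just a function that reports how many leading
 * characters of the input it can match (0 means no match). */
typedef size_t (*rule_fn)(const char *input);

/* Hypothetical rule 0: matches the literal prefix "data_". */
size_t match_data_prefix(const char *s)
{
    const char *lit = "data_";
    size_t i = 0;

    while (lit[i] != '\0' && s[i] == lit[i])
        i++;
    return lit[i] == '\0' ? i : 0;
}

/* Hypothetical rule 1: matches a run of lower-case letters, digits
 * and underscores (an "identifier"). */
size_t match_identifier(const char *s)
{
    size_t i = 0;

    while ((s[i] >= 'a' && s[i] <= 'z') ||
           (s[i] >= '0' && s[i] <= '9') || s[i] == '_')
        i++;
    return i;
}

/* Longest match wins; on a tie the rule listed first wins.
 * Returns the index of the chosen rule, or -1 if nothing matches,
 * and stores the length of the match in *match_len. */
int pick_rule(const rule_fn *rules, size_t nrules,
              const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;

    assert(rules != NULL && input != NULL && match_len != NULL);
    for (size_t i = 0; i < nrules; i++) {
        size_t len = rules[i](input);
        if (len > best_len) {   /* strictly longer, so ties keep the earlier rule */
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

   With the rules listed in the order 'match_data_prefix',
'match_identifier', the input 'data_block7' selects the identifier rule
(it matches 11 characters against 5), while 'data_' followed by a space
is a tie at 5 characters and selects the first rule.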

   If you want 'flex' to choose a shorter match, then you can work
around this behavior by expanding your short rule to match more text,
then put back the extra:

     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;

   Another fix would be to make the second rule active only during the
'<BLOCKIDSTATE>' start condition, and make that start condition
exclusive by declaring it with '%x' instead of '%s'.

   A final fix is to change the input language so that the ambiguity for
'data_' is removed, by adding characters to it that don't match the
identifier rule, or by removing characters (such as '_') from the
identifier rule so it no longer matches 'data_'.  (Of course, you might
also not have the option of changing the input language.)


File: flex.info,  Node: My actions are executing out of order or sometimes not at all.,  Next: How can I have multiple input sources feed into the same scanner at the same time?,  Prev: Flex is not matching my patterns in the same order that I defined them.,  Up: FAQ

My actions are executing out of order or sometimes not at all.
==============================================================

Most likely, you have (in error) placed the opening '{' of the action
block on a different line than the rule, e.g.,

     ^(foo|bar)
     {  <<<--- WRONG!

     }

   'flex' requires that the opening '{' of an action associated with a
rule begin on the same line as does the rule.  You need instead to write
your rules as follows:

     ^(foo|bar)  {  // CORRECT!

     }


File: flex.info,  Node: How can I have multiple input sources feed into the same scanner at the same time?,  Next: Can I build nested parsers that work with the same input file?,  Prev: My actions are executing out of order or sometimes not at all.,  Up: FAQ

How can I have multiple input sources feed into the same scanner at the same time?
==================================================================================

If ...
   * your scanner is free of backtracking (verified using 'flex''s '-b'
     flag),
   * AND you run your scanner interactively ('-I' option; default unless
     using special table compression options),
   * AND you feed it one character at a time by redefining 'YY_INPUT' to
     do so,

   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking).  This means you
can safely use 'select()' at that point and only call 'yylex()' for
another token if 'select()' indicates there's data available.

   That is, move the 'select()' out from the input function to a point
where it determines whether 'yylex()' gets called for the next token.

   With this approach, you will still have problems if your input can
arrive piecemeal; 'select()' could inform you that the beginning of a
token is available, you call 'yylex()' to get it, but it winds up
blocking waiting for the later characters in the token.

   Here's another way: Move your input multiplexing inside of
'YY_INPUT'.  That is, whenever 'YY_INPUT' is called, it calls 'select()'
to see where input is available.  If input is available for the scanner,
it reads and returns the next byte.  If input is available from another
source, it calls whatever function is responsible for reading from that
source.  (If no input is available, it blocks until some input is
available.)  I've used this technique in an interpreter I wrote that
both reads keyboard input using a 'flex' scanner and IPC traffic from
sockets, and it works fine.
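
   The second approach (multiplexing inside 'YY_INPUT') might look
roughly like the following C sketch for a POSIX system.  The function
'read_scanner_byte', the callback type, and the sample handler
'discard_byte' are all hypothetical names invented for this
illustration; only 'select()', 'read()' and the 'FD_*' macros are real
interfaces.  Error handling is deliberately minimal (an interrupted
'select()' is simply reported as an error).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical callback: whatever reads from the non-scanner source. */
typedef void (*other_source_fn)(int fd);

/* Counts bytes consumed by the sample handler below. */
int other_bytes_seen = 0;

/* Sample handler (an assumption for this sketch): read and drop one byte. */
void discard_byte(int fd)
{
    char c;

    if (read(fd, &c, 1) == 1)
        other_bytes_seen++;
}

/*
 * Block until the scanner's descriptor is readable, servicing the
 * other descriptor whenever it is readable too.  Returns 1 and stores
 * one byte in *out when the scanner has input, 0 at end of file on
 * the scanner's descriptor, and -1 on error.
 */
int read_scanner_byte(int scanner_fd, int other_fd,
                      other_source_fn handle_other, char *out)
{
    assert(out != NULL && handle_other != NULL);
    for (;;) {
        fd_set readable;
        int maxfd = scanner_fd > other_fd ? scanner_fd : other_fd;

        FD_ZERO(&readable);
        FD_SET(scanner_fd, &readable);
        FD_SET(other_fd, &readable);

        /* NULL timeout: block until some input is available. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        if (FD_ISSET(other_fd, &readable))
            handle_other(other_fd);      /* let the other reader consume it */

        if (FD_ISSET(scanner_fd, &readable)) {
            ssize_t n = read(scanner_fd, out, 1);
            return n < 0 ? -1 : (int)n;  /* 1 = got a byte, 0 = end of file */
        }
    }
}
```

   A 'YY_INPUT' redefinition would then call such a function, copy the
byte into 'buf', and set 'result' to 1, or to 'YY_NULL' when the
function reports end of file.  Note that a real handler must also cope
with end of file on the other descriptor, since a closed descriptor is
always reported as readable and this loop would otherwise spin.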


File: flex.info,  Node: Can I build nested parsers that work with the same input file?,  Next: How can I match text only at the end of a file?,  Prev: How can I have multiple input sources feed into the same scanner at the same time?,  Up: FAQ

Can I build nested parsers that work with the same input file?
==============================================================

This is not going to work without some additional effort.  The reason is
that 'flex' block-buffers the input it reads from 'yyin'.  This means
that the "outermost" 'yylex()', when called, will automatically slurp up
the first 8K of input available on 'yyin', and subsequent calls to other
'yylex()''s won't see that input.  You might be tempted to work around
this problem by redefining 'YY_INPUT' to only return a small amount of
text, but it turns out that that approach is quite difficult.  Instead,
the best solution is to combine all of your scanners into one large
scanner, using a different exclusive start condition for each.


File: flex.info,  Node: How can I match text only at the end of a file?,  Next: How can I make REJECT cascade across start condition boundaries?,  Prev: Can I build nested parsers that work with the same input file?,  Up: FAQ

How can I match text only at the end of a file?
===============================================

There is no way to write a rule which is "match this text, but only if
it comes at the end of the file".  You can fake it, though, if you
happen to have a character lying around that you don't allow in your
input.  Then you redefine 'YY_INPUT' to call your own routine which, if
it sees an 'EOF', returns the magic character first (and remembers to
return a real 'EOF' next time it's called).
Then you could write:

     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */


File: flex.info,  Node: How can I make REJECT cascade across start condition boundaries?,  Next: Why cant I use fast or full tables with interactive mode?,  Prev: How can I match text only at the end of a file?,  Up: FAQ

How can I make REJECT cascade across start condition boundaries?
================================================================

You can do this as follows.  Suppose you have a start condition 'A', and
after exhausting all of the possible matches in '<A>', you want to try
matches in '<INITIAL>'.  Then you could use the following:

     %x A
     %%
     <A>rule_that_is_long    ...; REJECT;
     <A>rule                 ...; REJECT; /* shorter rule */
     <A>etc.
     ...
     <A>.|\n  {
             /* Shortest and last rule in <A>, so
              * cascaded REJECTs will eventually
              * wind up matching this rule.  We want
              * to now switch to the initial state
              * and try matching from there instead.
              */
             yyless(0);    /* put back matched text */
             BEGIN(INITIAL);
             }


File: flex.info,  Node: Why cant I use fast or full tables with interactive mode?,  Next: How much faster is -F or -f than -C?,  Prev: How can I make REJECT cascade across start condition boundaries?,  Up: FAQ

Why can't I use fast or full tables with interactive mode?
==========================================================

One of the assumptions flex makes is that interactive applications are
inherently slow (they're waiting on a human after all).  It has to do
with how the scanner detects that it must be finished scanning a token.
For interactive scanners, after scanning each character the current
state is looked up in a table (essentially) to see whether there's a
chance of another input character possibly extending the length of the
match.  If not, the scanner halts.
For non-interactive scanners, the
end-of-token test is much simpler, basically a compare with 0, so no
memory bus cycles.  Since the test occurs in the innermost scanning
loop, one would like to make it go as fast as possible.

   Still, it seems reasonable to allow the user to choose to trade off a
bit of performance in this area to gain the corresponding flexibility.
There might be another reason, though, why fast scanners don't support
the interactive option.


File: flex.info,  Node: How much faster is -F or -f than -C?,  Next: If I have a simple grammar cant I just parse it with flex?,  Prev: Why cant I use fast or full tables with interactive mode?,  Up: FAQ

How much faster is -F or -f than -C?
====================================

Much faster (factor of 2-3).


File: flex.info,  Node: If I have a simple grammar cant I just parse it with flex?,  Next: Why doesn't yyrestart() set the start state back to INITIAL?,  Prev: How much faster is -F or -f than -C?,  Up: FAQ

If I have a simple grammar can't I just parse it with flex?
===========================================================

Is your grammar recursive?  That's almost always a sign that you're
better off using a parser/scanner rather than just trying to use a
scanner alone.


File: flex.info,  Node: Why doesn't yyrestart() set the start state back to INITIAL?,  Next: How can I match C-style comments?,  Prev: If I have a simple grammar cant I just parse it with flex?,  Up: FAQ

Why doesn't yyrestart() set the start state back to INITIAL?
============================================================

There are two reasons.  The first is that there might be programs that
rely on the start state not changing across file changes.
The second is
that beginning with 'flex' version 2.4, use of 'yyrestart()' is no
longer required, so fixing the problem there doesn't solve the more
general problem.


File: flex.info,  Node: How can I match C-style comments?,  Next: The period isn't working the way I expected.,  Prev: Why doesn't yyrestart() set the start state back to INITIAL?,  Up: FAQ

How can I match C-style comments?
=================================

You might be tempted to try something like this:

     "/*".*"*/"       // WRONG!

   or, worse, this:

     "/*"(.|\n)*"*/"  // WRONG!

   The above rules will eat too much input, and blow up on things like:

     /* a comment */ do_my_thing( "oops */" );

   Here is one way which allows you to track line information:

     %x IN_COMMENT
     %%
     <INITIAL>{
     "/*"      BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }


File: flex.info,  Node: The period isn't working the way I expected.,  Next: Can I get the flex manual in another format?,  Prev: How can I match C-style comments?,  Up: FAQ

The '.' isn't working the way I expected.
=========================================

Here are some tips for using '.':

   * A common mistake is to place the grouping parenthesis AFTER an
     operator, when you really meant to place the parenthesis BEFORE the
     operator, e.g., you probably want this '(foo|bar)+' and NOT this
     '(foo|bar+)'.

     The first pattern matches the words 'foo' or 'bar' any number of
     times, e.g., it matches the text 'barfoofoobarfoo'.  The second
     pattern matches a single instance of 'foo' or a single instance of
     'bar' followed by one or more 'r's, e.g., it matches the text
     'barrrr'.
   * A '.' inside '[]''s just means a literal '.' (period), and NOT "any
     character except newline".
   * Remember that '.'
     matches any character EXCEPT '\n' (and 'EOF').
     If you really want to match ANY character, including newlines, then
     use '(.|\n)'.  Beware that the regex '(.|\n)+' will match your
     entire input!
   * Finally, if you want to match a literal '.' (a period), then use
     '[.]' or '"."'.


File: flex.info,  Node: Can I get the flex manual in another format?,  Next: Does there exist a "faster" NDFA->DFA algorithm?,  Prev: The period isn't working the way I expected.,  Up: FAQ

Can I get the flex manual in another format?
============================================

The 'flex' source distribution includes a texinfo manual.  You are free
to convert that texinfo into whatever format you desire.  The 'texinfo'
package includes tools for conversion to a number of formats.


File: flex.info,  Node: Does there exist a "faster" NDFA->DFA algorithm?,  Next: How does flex compile the DFA so quickly?,  Prev: Can I get the flex manual in another format?,  Up: FAQ

Does there exist a "faster" NDFA->DFA algorithm?
================================================

There's no way around the potential exponential running time - it can
take you exponential time just to enumerate all of the DFA states.  In
practice, though, the running time is closer to linear, or sometimes
quadratic.


File: flex.info,  Node: How does flex compile the DFA so quickly?,  Next: How can I use more than 8192 rules?,  Prev: Does there exist a "faster" NDFA->DFA algorithm?,  Up: FAQ

How does flex compile the DFA so quickly?
=========================================

There are two big speed wins that 'flex' uses:

  1. It analyzes the input rules to construct equivalence classes for
     those characters that always make the same transitions.  It then
     rewrites the NFA using equivalence classes for transitions instead
     of characters.
     This cuts down the NFA->DFA computation time
     dramatically, to the point where, for uncompressed DFA tables, the
     DFA generation is often I/O bound in writing out the tables.
  2. It maintains hash values for previously computed DFA states, so
     testing whether a newly constructed DFA state is equivalent to a
     previously constructed state can be done very quickly, by first
     comparing hash values.


File: flex.info,  Node: How can I use more than 8192 rules?,  Next: How do I abandon a file in the middle of a scan and switch to a new file?,  Prev: How does flex compile the DFA so quickly?,  Up: FAQ

How can I use more than 8192 rules?
===================================

'Flex' is compiled with an upper limit of 8192 rules per scanner.  If
you need more than 8192 rules in your scanner, you'll have to recompile
'flex' with the following changes in 'flexdef.h':

     <    #define YY_TRAILING_MASK 0x2000
     <    #define YY_TRAILING_HEAD_MASK 0x4000
     --
     >    #define YY_TRAILING_MASK 0x20000000
     >    #define YY_TRAILING_HEAD_MASK 0x40000000

   This should work okay as long as your C compiler uses 32 bit
integers.  But you might want to think about whether using such a huge
number of rules is the best way to solve your problem.

   The following may also be relevant:

   With luck, you should be able to increase the definitions in
flexdef.h for:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

   recompile everything, and it'll all work.  Flex only has these
16-bit-like values built into it because a long time ago it was
developed on a machine with 16-bit ints.  I've given this advice to
others in the past but haven't heard back from them whether it worked
okay or not...
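
   To see why the two mask values bound the number of rules, consider
this small C sketch.  It rests on an assumption that the answer above
implies but does not spell out: that a rule number and the two flag
bits are stored together in one integer, so a rule number must stay
below the lower flag bit.  The helper names are invented for the
example and are not taken from the 'flex' sources.

```c
#include <assert.h>

/* Values quoted in the text above. */
#define OLD_TRAILING_MASK       0x2000
#define OLD_TRAILING_HEAD_MASK  0x4000
#define NEW_TRAILING_MASK       0x20000000
#define NEW_TRAILING_HEAD_MASK  0x40000000

/* Assumption: rule numbers share an integer with the two flag bits,
 * so the largest usable rule number is one below the lower flag bit. */
long max_rule_number(long trailing_mask)
{
    assert(trailing_mask > 0);
    return trailing_mask - 1;
}

/* Mark a rule number with a flag bit. */
long tag_rule(long rule, long mask)
{
    return rule | mask;
}

/* Strip both flag bits to recover the rule number. */
long untag_rule(long tagged, long trailing_mask, long head_mask)
{
    return tagged & ~(trailing_mask | head_mask);
}
```

   Under that assumption, the old masks leave room for rule numbers up
to 8191 and the new ones for rule numbers up to 536870911.  A rule
number such as 9000 already has the 0x2000 bit set, so with the old
masks it cannot be told apart from a flagged rule 808; this is the kind
of collision the larger masks avoid.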


File: flex.info,  Node: How do I abandon a file in the middle of a scan and switch to a new file?,  Next: How do I execute code only during initialization (only before the first scan)?,  Prev: How can I use more than 8192 rules?,  Up: FAQ

How do I abandon a file in the middle of a scan and switch to a new file?
=========================================================================

Just call 'yyrestart(newfile)'.  Be sure to reset the start state if you
want a "fresh start", since 'yyrestart' does NOT reset the start state
back to 'INITIAL'.


File: flex.info,  Node: How do I execute code only during initialization (only before the first scan)?,  Next: How do I execute code at termination?,  Prev: How do I abandon a file in the middle of a scan and switch to a new file?,  Up: FAQ

How do I execute code only during initialization (only before the first scan)?
==============================================================================

You can specify an initial action by defining the macro 'YY_USER_INIT'
(though note that 'yyout' may not be available at the time this macro is
executed).  Or you can add to the beginning of your rules section:

     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }


File: flex.info,  Node: How do I execute code at termination?,  Next: Where else can I find help?,  Prev: How do I execute code only during initialization (only before the first scan)?,  Up: FAQ

How do I execute code at termination?
=====================================

You can specify an action for the '<<EOF>>' rule.


File: flex.info,  Node: Where else can I find help?,  Next: Can I include comments in the "rules" section of the file?,  Prev: How do I execute code at termination?,  Up: FAQ

Where else can I find help?
===========================

You can find the flex homepage on the web at
<http://flex.sourceforge.net/>.  See that page for details about flex
mailing lists as well.


File: flex.info,  Node: Can I include comments in the "rules" section of the file?,  Next: I get an error about undefined yywrap().,  Prev: Where else can I find help?,  Up: FAQ

Can I include comments in the "rules" section of the file?
==========================================================

Yes, just about anywhere you want to.  See the manual for the specific
syntax.


File: flex.info,  Node: I get an error about undefined yywrap().,  Next: How can I change the matching pattern at run time?,  Prev: Can I include comments in the "rules" section of the file?,  Up: FAQ

I get an error about undefined yywrap().
========================================

You must supply a 'yywrap()' function of your own, or link to 'libfl.a'
(which provides one), or use

     %option noyywrap

   in your source to say you don't want a 'yywrap()' function.


File: flex.info,  Node: How can I change the matching pattern at run time?,  Next: How can I expand macros in the input?,  Prev: I get an error about undefined yywrap().,  Up: FAQ

How can I change the matching pattern at run time?
==================================================

You can't; the patterns are compiled into static tables when 'flex'
builds the scanner.


File: flex.info,  Node: How can I expand macros in the input?,  Next: How can I build a two-pass scanner?,  Prev: How can I change the matching pattern at run time?,  Up: FAQ

How can I expand macros in the input?
=====================================

The best way to approach this problem is at a higher level, e.g., in the
parser.

   However, you can do this using multiple input buffers.

     %%
     macro/[a-z]+    {
         /* Saw the macro "macro" followed by extra stuff. */
         main_buffer = YY_CURRENT_BUFFER;
         expansion_buffer = yy_scan_string(expand(yytext));
         yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>         {
         if ( expansion_buffer )
         {
             // We were doing an expansion, return to where
             // we were.
             yy_switch_to_buffer(main_buffer);
             yy_delete_buffer(expansion_buffer);
             expansion_buffer = 0;
         }
         else
             yyterminate();
     }

   You will probably want a stack of expansion buffers to allow nested
macros, but the example above should make the basic idea clear.


File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ

How can I build a two-pass scanner?
===================================

One way to do it is to filter the first pass to a temporary file, then
process the temporary file on the second pass. You will probably see a
performance hit, due to all the disk I/O.

   When you need to look far ahead like this, it almost always
means that the right solution is to build a parse tree of the entire
input, then walk it after the parse in order to generate the output. In
a sense, this is a two-pass approach, once through the text and once
through the parse tree, but the performance hit for the latter is
usually an order of magnitude smaller, since everything is already
classified, in binary format, and residing in memory.


File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ

How do I match any string not matched in the preceding rules?
=============================================================

One way to assign precedence is to place the more specific rules first.
If two rules would match the same input (same sequence of characters)
then the first rule listed in the 'flex' input wins, e.g.,

     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;

   Note that the rule '[a-zA-Z_]+' must come *after* the others. It
will match the same amount of text as the more specific rules, and in
that case the 'flex' scanner will pick the first rule listed in your
scanner as the one to match.


File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ

I am trying to port code from AT&T lex that uses yysptr and yysbuf.
===================================================================

Those are internal variables pointing into the AT&T scanner's input
buffer. I imagine they're being manipulated in user versions of the
'input()' and 'unput()' functions. If so, what you need to do is
analyze those functions to figure out what they're doing, and then
replace 'input()' with an appropriate definition of 'YY_INPUT'. You
shouldn't need to (and must not) replace 'flex''s 'unput()' function.


File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ

Is there a way to make flex treat NULL like a regular character?
================================================================

Yes, '\0' and '\x00' should both do the trick.
Perhaps you have an 5017ancient version of 'flex'. The latest release is version 2.6.4. 5018 5019 5020File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesn't flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ 5021 5022Whenever flex can not match the input it says "flex scanner jammed". 5023==================================================================== 5024 5025You need to add a rule that matches the otherwise-unmatched text, e.g., 5026 5027 %option yylineno 5028 %% 5029 [[a bunch of rules here]] 5030 5031 . printf("bad input character '%s' at line %d\n", yytext, yylineno); 5032 5033 See '%option default' for more information. 5034 5035 5036File: flex.info, Node: Why doesn't flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ 5037 5038Why doesn't flex have non-greedy operators like perl does? 5039========================================================== 5040 5041A DFA can do a non-greedy match by stopping the first time it enters an 5042accepting state, instead of consuming input until it determines that no 5043further matching is possible (a "jam" state). This is actually easier 5044to implement than longest leftmost match (which flex does). 5045 5046 But it's also much less useful than longest leftmost match. In 5047general, when you find yourself wishing for non-greedy matching, that's 5048usually a sign that you're trying to make the scanner do some parsing. 5049That's generally the wrong approach, since it lacks the power to do a 5050decent job. Better is to either introduce a separate parser, or to 5051split the scanner into multiple scanners using (exclusive) start 5052conditions. 5053 5054 You might have a separate start state once you've seen the 'BEGIN'. 
5055In that state, you might then have a regex that will match 'END' (to 5056kick you out of the state), and perhaps '(.|\n)' to get a single 5057character within the chunk ... 5058 5059 This approach also has much better error-reporting properties. 5060 5061 5062File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesn't flex have non-greedy operators like perl does?, Up: FAQ 5063 5064Memory leak - 16386 bytes allocated by malloc. 5065============================================== 5066 5067UPDATED 2002-07-10: As of 'flex' version 2.5.9, this leak means that you 5068did not call 'yylex_destroy()'. If you are using an earlier version of 5069'flex', then read on. 5070 5071 The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the 5072read-buffer, and about 40 for 'struct yy_buffer_state' (depending upon 5073alignment). The leak is in the non-reentrant C scanner only (NOT in the 5074reentrant scanner, NOT in the C++ scanner). Since 'flex' doesn't know 5075when you are done, the buffer is never freed. 5076 5077 However, the leak won't multiply since the buffer is reused no matter 5078how many times you call 'yylex()'. 5079 5080 If you want to reclaim the memory when you are completely done 5081scanning, then you might try this: 5082 5083 /* For non-reentrant C scanner only. */ 5084 yy_delete_buffer(YY_CURRENT_BUFFER); 5085 yy_init = 1; 5086 5087 Note: 'yy_init' is an "internal variable", and hasn't been tested in 5088this situation. It is possible that some other globals may need 5089resetting as well. 5090 5091 5092File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ 5093 5094How do I track the byte offset for lseek()? 
===========================================

     > We thought that it would be possible to have this number through the
     > evaluation of the following expression:
     >
     > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf

   While this is the right idea, it has two problems. The first is that
it's possible that 'flex' will request less than 'YY_READ_BUF_SIZE'
during an invocation of 'YY_INPUT' (or that your input source will
return less even though 'YY_READ_BUF_SIZE' bytes were requested). The
second problem is that when refilling its internal buffer, 'flex' keeps
some characters from the previous buffer (because usually it's in the
middle of a match, and needs those characters to construct 'yytext' for
the match once it's done). Because of this, 'yy_c_buf_p -
YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
already read from the current buffer.

   An alternative solution is to count the number of characters you've
matched since starting to scan. This can be done by using
'YY_USER_ACTION'. For example,

     #define YY_USER_ACTION num_chars += yyleng;

   (You need to be careful to update your bookkeeping if you use
'yymore()', 'yyless()', 'unput()', or 'input()'.)


File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ

How do I use my own I/O classes in a C++ scanner?
=================================================

When the flex C++ scanning class rewrite finally happens, then this sort
of thing should become much easier.

   You can do this by passing the various functions (such as
'LexerInput()' and 'LexerOutput()') NULL 'iostream*'s, and then dealing
with your own I/O classes surreptitiously (i.e., stashing them in
special member variables). This works because the only assumption the
lexer makes about the iostreams is that they're ultimately passed to
'LexerInput()' and 'LexerOutput()', which then do whatever is necessary
with them.


File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ

How do I skip as many chars as possible?
========================================

How do I skip as many chars as possible - without interfering with the
other patterns?

   In the example below, we want to skip over characters until we see
the phrase "endskip". The following will _NOT_ work correctly (do you
see why not?)

     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip   BEGIN(SKIP);
     ...
     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.*             ;

   The problem is that the pattern .* will eat up the word "endskip",
because 'flex' always prefers the longest match. The simplest (but
slow) fix is:

     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.              ;

   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else. So for example:

     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>[^e]+          ;
     <SKIP>.              ;/* so you eat up e's, too */


File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ

deleteme00
==========

     QUESTION:
     When was flex born?

     Vern Paxson took over
     the Software Tools lex project from Jef Poskanzer in 1982. At that point it
     was written in Ratfor.
     Around 1987 or so, Paxson translated it into C, and
     a legend was born :-).


File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ

Are certain equivalent patterns faster than others?
===================================================

     To: Adoram Rogel <adoram@orna.hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
     Date: Wed, 18 Sep 96 10:51:02 PDT
     From: Vern Paxson <vern>

     [Note, the most recent flex release is 2.5.4, which you can get from
     ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]

     > 1. Using the pattern
     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
     >    instead of
     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
     >    (in a very complicated flex program) caused the program to slow from
     >    300K+/min to 100K/min (no other changes were done).

     These two are not equivalent. For example, the first can match "footnote."
     but the second can only match "footnote". This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.

     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?

     From a performance point of view, they're equivalent (modulo presumably
     minor effects such as memory cache hit rates; and the presence of trailing
     context, see below). From a space point of view, the first is slightly
     preferable.

     > 3. I have a pattern that look like this:
     >    pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
     >
     >    running yet another complicated program that includes the following rule:
     >    <snext>{and}/{no4}{bb}{pats}
     >
     >    gets me to "too complicated - over 32,000 states"...
5228 5229 I can't tell from this example whether the trailing context is variable-length 5230 or fixed-length (it could be the latter if {and} is fixed-length). If it's 5231 variable length, which flex -p will tell you, then this reflects a basic 5232 performance problem, and if you can eliminate it by restructuring your 5233 scanner, you will see significant improvement. 5234 5235 > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about 5236 > 10 patterns and changed the rule to be 5 rules. 5237 > This did compile, but what is the rule of thumb here ? 5238 5239 The rule is to avoid trailing context other than fixed-length, in which for 5240 a/b, either the 'a' pattern or the 'b' pattern have a fixed length. Use 5241 of the '|' operator automatically makes the pattern variable length, so in 5242 this case '[Ff]oot' is preferred to '(F|f)oot'. 5243 5244 > 4. I changed a rule that looked like this: 5245 > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN... 5246 > 5247 > to the next 2 rules: 5248 > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;} 5249 > <snext8>{and}{bb}/{ROMAN} { BEGIN... 5250 > 5251 > Again, I understand the using [^...] will cause a great performance loss 5252 5253 Actually, it doesn't cause any sort of performance loss. It's a surprising 5254 fact about regular expressions that they always match in linear time 5255 regardless of how complex they are. 5256 5257 > but are there any specific rules about it ? 5258 5259 See the "Performance Considerations" section of the man page, and also 5260 the example in MISC/fastwc/. 5261 5262 Vern 5263 5264 5265File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ 5266 5267Is backing up a big deal? 5268========================= 5269 5270 To: Adoram Rogel <adoram@hybridge.com> 5271 Subject: Re: Flex 2.5.2 performance questions 5272 In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT. 
5273 Date: Thu, 19 Sep 96 09:58:00 PDT 5274 From: Vern Paxson <vern> 5275 5276 > a lot about the backing up problem. 5277 > I believe that there lies my biggest problem, and I'll try to improve 5278 > it. 5279 5280 Since you have variable trailing context, this is a bigger performance 5281 problem. Fixing it is usually easier than fixing backing up, which in a 5282 complicated scanner (yours seems to fit the bill) can be extremely 5283 difficult to do correctly. 5284 5285 You also don't mention what flags you are using for your scanner. 5286 -f makes a large speed difference, and -Cfe buys you nearly as much 5287 speed but the resulting scanner is considerably smaller. 5288 5289 > I have an | operator in {and} and in {pats} so both of them are variable 5290 > length. 5291 5292 -p should have reported this. 5293 5294 > Is changing one of them to fixed-length is enough ? 5295 5296 Yes. 5297 5298 > Is it possible to change the 32,000 states limit ? 5299 5300 Yes. I've appended instructions on how. Before you make this change, 5301 though, you should think about whether there are ways to fundamentally 5302 simplify your scanner - those are certainly preferable! 5303 5304 Vern 5305 5306 To increase the 32K limit (on a machine with 32 bit integers), you increase 5307 the magnitude of the following in flexdef.h: 5308 5309 #define JAMSTATE -32766 /* marks a reference to the state that always jams */ 5310 #define MAXIMUM_MNS 31999 5311 #define BAD_SUBSCRIPT -32767 5312 #define MAX_SHORT 32700 5313 5314 Adding a 0 or two after each should do the trick. 5315 5316 5317File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ 5318 5319Can I fake multi-byte character support? 5320======================================== 5321 5322 To: Heeman_Lee@hp.com 5323 Subject: Re: flex - multi-byte support? 5324 In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT. 
5325 Date: Fri, 04 Oct 1996 11:42:18 PDT 5326 From: Vern Paxson <vern> 5327 5328 > I assume as long as my *.l file defines the 5329 > range of expected character code values (in octal format), flex will 5330 > scan the file and read multi-byte characters correctly. But I have no 5331 > confidence in this assumption. 5332 5333 Your lack of confidence is justified - this won't work. 5334 5335 Flex has in it a widespread assumption that the input is processed 5336 one byte at a time. Fixing this is on the to-do list, but is involved, 5337 so it won't happen any time soon. In the interim, the best I can suggest 5338 (unless you want to try fixing it yourself) is to write your rules in 5339 terms of pairs of bytes, using definitions in the first section: 5340 5341 X \xfe\xc2 5342 ... 5343 %% 5344 foo{X}bar found_foo_fe_c2_bar(); 5345 5346 etc. Definitely a pain - sorry about that. 5347 5348 By the way, the email address you used for me is ancient, indicating you 5349 have a very old version of flex. You can get the most recent, 2.5.4, from 5350 ftp.ee.lbl.gov. 5351 5352 Vern 5353 5354 5355File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ 5356 5357deleteme01 5358========== 5359 5360 To: moleary@primus.com 5361 Subject: Re: Flex / Unicode compatibility question 5362 In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT. 5363 Date: Tue, 22 Oct 1996 11:06:13 PDT 5364 From: Vern Paxson <vern> 5365 5366 Unfortunately flex at the moment has a widespread assumption within it 5367 that characters are processed 8 bits at a time. I don't see any easy 5368 fix for this (other than writing your rules in terms of double characters - 5369 a pain). I also don't know of a wider lex, though you might try surfing 5370 the Plan 9 stuff because I know it's a Unicode system, and also the PCCT 5371 toolkit (try searching say Alta Vista for "Purdue Compiler Construction 5372 Toolkit"). 
5373 5374 Fixing flex to handle wider characters is on the long-term to-do list. 5375 But since flex is a strictly spare-time project these days, this probably 5376 won't happen for quite a while, unless someone else does it first. 5377 5378 Vern 5379 5380 5381File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ 5382 5383Can you discuss some flex internals? 5384==================================== 5385 5386 To: Johan Linde <jl@theophys.kth.se> 5387 Subject: Re: translation of flex 5388 In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST. 5389 Date: Mon, 11 Nov 1996 10:33:50 PST 5390 From: Vern Paxson <vern> 5391 5392 > I'm working for the Swedish team translating GNU program, and I'm currently 5393 > working with flex. I have a few questions about some of the messages which 5394 > I hope you can answer. 5395 5396 All of the things you're wondering about, by the way, concerning flex 5397 internals - probably the only person who understands what they mean in 5398 English is me! So I wouldn't worry too much about getting them right. 5399 That said ... 5400 5401 > #: main.c:545 5402 > msgid " %d protos created\n" 5403 > 5404 > Does proto mean prototype? 5405 5406 Yes - prototypes of state compression tables. 5407 5408 > #: main.c:539 5409 > msgid " %d/%d (peak %d) template nxt-chk entries created\n" 5410 > 5411 > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?) 5412 > However, 'template next-check entries' doesn't make much sense to me. To be 5413 > able to find a good translation I need to know a little bit more about it. 5414 5415 There is a scheme in the Aho/Sethi/Ullman compiler book for compressing 5416 scanner tables. It involves creating two pairs of tables. The first has 5417 "base" and "default" entries, the second has "next" and "check" entries. 5418 The "base" entry is indexed by the current state and yields an index into 5419 the next/check table. 
The "default" entry gives what to do if the state 5420 transition isn't found in next/check. The "next" entry gives the next 5421 state to enter, but only if the "check" entry verifies that this entry is 5422 correct for the current state. Flex creates templates of series of 5423 next/check entries and then encodes differences from these templates as a 5424 way to compress the tables. 5425 5426 > #: main.c:533 5427 > msgid " %d/%d base-def entries created\n" 5428 > 5429 > The same problem here for 'base-def'. 5430 5431 See above. 5432 5433 Vern 5434 5435 5436File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ 5437 5438unput() messes up yy_at_bol 5439=========================== 5440 5441 To: Xinying Li <xli@npac.syr.edu> 5442 Subject: Re: FLEX ? 5443 In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST. 5444 Date: Wed, 13 Nov 1996 19:51:54 PST 5445 From: Vern Paxson <vern> 5446 5447 > "unput()" them to input flow, question occurs. If I do this after I scan 5448 > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That 5449 > means the carriage flag has gone. 5450 5451 You can control this by calling yy_set_bol(). It's described in the manual. 5452 5453 > And if in pre-reading it goes to the end of file, is anything done 5454 > to control the end of curren buffer and end of file? 5455 5456 No, there's no way to put back an end-of-file. 5457 5458 > By the way I am using flex 2.5.2 and using the "-l". 5459 5460 The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and 5461 2.5.3. You can get it from ftp.ee.lbl.gov. 

          Vern


File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ

The | operator is not doing what I want
=======================================

     To: Alain.ISSARD@st.com
     Subject: Re: Start condition with FLEX
     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
     Date: Mon, 18 Nov 1996 10:41:34 PST
     From: Vern Paxson <vern>

     > I am not able to use the start condition scope and to use the | (OR) with
     > rules having start conditions.

     The problem is that if you use '|' as a regular expression operator, for
     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
     any blanks around it. If you instead want the special '|' *action* (which
     from your scanner appears to be the case), which is a way of giving two
     different rules the same action:

          foo |
          bar matched_foo_or_bar();

     then '|' *must* be separated from the first rule by whitespace and *must*
     be followed by a new line. You *cannot* write it as:

          foo | bar matched_foo_or_bar();

     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.

     Your problems with start condition scope are simply due to syntax errors
     from your use of '|' later confusing flex.

     Let me know if you still have problems.

          Vern


File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ

Why can't flex understand this variable trailing context pattern?
5509================================================================= 5510 5511 To: Gregory Margo <gmargo@newton.vip.best.com> 5512 Subject: Re: flex-2.5.3 bug report 5513 In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST. 5514 Date: Sat, 23 Nov 1996 17:07:32 PST 5515 From: Vern Paxson <vern> 5516 5517 > Enclosed is a lex file that "real" lex will process, but I cannot get 5518 > flex to process it. Could you try it and maybe point me in the right direction? 5519 5520 Your problem is that some of the definitions in the scanner use the '/' 5521 trailing context operator, and have it enclosed in ()'s. Flex does not 5522 allow this operator to be enclosed in ()'s because doing so allows undefined 5523 regular expressions such as "(a/b)+". So the solution is to remove the 5524 parentheses. Note that you must also be building the scanner with the -l 5525 option for AT&T lex compatibility. Without this option, flex automatically 5526 encloses the definitions in parentheses. 5527 5528 Vern 5529 5530 5531File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ 5532 5533The ^ operator isn't working 5534============================ 5535 5536 To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de> 5537 Subject: Re: Flex Bug ? 5538 In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST. 5539 Date: Tue, 26 Nov 1996 11:15:05 PST 5540 From: Vern Paxson <vern> 5541 5542 > In my lexer code, i have the line : 5543 > ^\*.* { } 5544 > 5545 > Thus all lines starting with an astrix (*) are comment lines. 5546 > This does not work ! 5547 5548 I can't get this problem to reproduce - it works fine for me. 
Note 5549 though that if what you have is slightly different: 5550 5551 COMMENT ^\*.* 5552 %% 5553 {COMMENT} { } 5554 5555 then it won't work, because flex pushes back macro definitions enclosed 5556 in ()'s, so the rule becomes 5557 5558 (^\*.*) { } 5559 5560 and now that the '^' operator is not at the immediate beginning of the 5561 line, it's interpreted as just a regular character. You can avoid this 5562 behavior by using the "-l" lex-compatibility flag, or "%option lex-compat". 5563 5564 Vern 5565 5566 5567File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ 5568 5569Trailing context is getting confused with trailing optional patterns 5570==================================================================== 5571 5572 To: Adoram Rogel <adoram@hybridge.com> 5573 Subject: Re: Flex 2.5.4 BOF ??? 5574 In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST. 5575 Date: Wed, 27 Nov 1996 10:56:25 PST 5576 From: Vern Paxson <vern> 5577 5578 > Organization(s)?/[a-z] 5579 > 5580 > This matched "Organizations" (looking in debug mode, the trailing s 5581 > was matched with trailing context instead of the optional (s) in the 5582 > end of the word. 5583 5584 That should only happen with lex. Flex can properly match this pattern. 5585 (That might be what you're saying, I'm just not sure.) 5586 5587 > Is there a way to avoid this dangerous trailing context problem ? 5588 5589 Unfortunately, there's no easy way. On the other hand, I don't see why 5590 it should be a problem. Lex's matching is clearly wrong, and I'd hope 5591 that usually the intent remains the same as expressed with the pattern, 5592 so flex's matching will be correct. 5593 5594 Vern 5595 5596 5597File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ 5598 5599Is flex GNU or not? 
===================

     To: Cameron MacKinnon <mackin@interlog.com>
     Subject: Re: Flex documentation bug
     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
     Date: Sun, 01 Dec 1996 22:29:39 PST
     From: Vern Paxson <vern>

     > I'm not sure how or where to submit bug reports (documentation or
     > otherwise) for the GNU project stuff ...

     Well, strictly speaking flex isn't part of the GNU project. They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me. Those sent to the GNU folks
     sometimes find their way to me, but some may drop between the cracks.

     > In GNU Info, under the section 'Start Conditions', and also in the man
     > page (mine's dated April '95) is a nice little snippet showing how to
     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
     > size. Unfortunately, no overflow checking is ever done ...

     This is already mentioned in the manual:

          Finally, here's an example of how to match C-style quoted
          strings using exclusive start conditions, including expanded
          escape sequences (but not including checking for a string
          that's too long):

     The reason for not doing the overflow checking is that it will needlessly
     clutter up an example whose main purpose is just to demonstrate how to
     use flex.

     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.

          Vern


File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ

ERASEME53
=========

     To: tsv@cs.UManitoba.CA
     Subject: Re: Flex (reg)..
     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
5645 Date: Thu, 06 Mar 1997 15:54:19 PST 5646 From: Vern Paxson <vern> 5647 5648 > [:alpha:] ([:alnum:] | \\_)* 5649 5650 If your rule really has embedded blanks as shown above, then it won't 5651 work, as the first blank delimits the rule from the action. (It wouldn't 5652 even compile ...) You need instead: 5653 5654 [:alpha:]([:alnum:]|\\_)* 5655 5656 and that should work fine - there's no restriction on what can go inside 5657 of ()'s except for the trailing context operator, '/'. 5658 5659 Vern 5660 5661 5662File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ 5663 5664I need to scan if-then-else blocks and while loops 5665================================================== 5666 5667 To: "Mike Stolnicki" <mstolnic@ford.com> 5668 Subject: Re: FLEX help 5669 In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT. 5670 Date: Fri, 30 May 1997 10:46:35 PDT 5671 From: Vern Paxson <vern> 5672 5673 > We'd like to add "if-then-else", "while", and "for" statements to our 5674 > language ... 5675 > We've investigated many possible solutions. The one solution that seems 5676 > the most reasonable involves knowing the position of a TOKEN in yyin. 5677 5678 I strongly advise you to instead build a parse tree (abstract syntax tree) 5679 and loop over that instead. You'll find this has major benefits in keeping 5680 your interpreter simple and extensible. 5681 5682 That said, the functionality you mention for get_position and set_position 5683 have been on the to-do list for a while. As flex is a purely spare-time 5684 project for me, no guarantees when this will be added (in particular, it 5685 for sure won't be for many months to come). 

     Vern


File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ

ERASEME55
=========

     To: Colin Paul Adams <colin@colina.demon.co.uk>
     Subject: Re: Flex C++ classes and Bison
     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
     Date: Fri, 15 Aug 1997 10:48:19 PDT
     From: Vern Paxson <vern>

     > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
     > *parm)
     >
     > I have been trying to get this to work as a C++ scanner, but it does
     > not appear to be possible (warning that it matches no declarations in
     > yyFlexLexer, or something like that).
     >
     > Is this supposed to be possible, or is it being worked on (I DID
     > notice the comment that scanner classes are still experimental, so I'm
     > not too hopeful)?

     What you need to do is derive a subclass from yyFlexLexer that provides
     the above yylex() method, squirrels away lvalp and parm into member
     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.

     Vern


File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ

ERASEME56
=========

     To: Mikael.Latvala@lmf.ericsson.se
     Subject: Re: Possible mistake in Flex v2.5 document
     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
     Date: Fri, 05 Sep 1997 10:01:54 PDT
     From: Vern Paxson <vern>

     > In that example you show how to count comment lines when using
     > C style /* ... */ comments. My question is, shouldn't you take into
     > account a scenario where end of a comment marker occurs inside
     > character or string literals?

     The scanner certainly needs to also scan character and string literals.
     However it does that (there's an example in the man page for strings), the
     lexer will recognize the beginning of the literal before it runs across the
     embedded "/*". Consequently, it will finish scanning the literal before it
     even considers the possibility of matching "/*".

     Example:

         '([^']*|{ESCAPE_SEQUENCE})'

     will match all the text between the ''s (inclusive). So the lexer
     considers this as a token beginning at the first ', and doesn't even
     attempt to match other tokens inside it.

     I think this subtlety is not worth putting in the manual, as I suspect
     it would confuse more people than it would enlighten.

     Vern


File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ

ERASEME57
=========

     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
     Subject: Re: flex limitations
     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
     Date: Mon, 08 Sep 1997 11:38:08 PDT
     From: Vern Paxson <vern>

     > %%
     > [a-zA-Z]+       /* skip a line */
     >                 { printf("got %s\n", yytext); }
     > %%

     What version of flex are you using? If I feed this to 2.5.4, it complains:

         "bug.l", line 5: EOF encountered inside an action
         "bug.l", line 5: unrecognized rule
         "bug.l", line 5: fatal parse error

     Not the world's greatest error message, but it manages to flag the problem.

     (With the introduction of start condition scopes, flex can't accommodate
     an action on a separate line, since it's ambiguous with an indented rule.)

     You can get 2.5.4 from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ

Is there a repository for flex scanners?
========================================

Not that we know of. You might try asking on comp.compilers.


File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ

How can I conditionally compile or preprocess my flex input file?
=================================================================

Flex doesn't have a preprocessor like C does. You might try using m4,
or the C preprocessor plus a sed script to clean up the result.


File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ

Where can I find grammars for lex and yacc?
===========================================

In the sources for flex and bison.


File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ

I get an end-of-buffer message for each character scanned.
==========================================================

This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always returns
1 for yyin->gcount().

   Solution: override LexerInput() with a version that returns whole
buffers.


File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ

unnamed-faq-62
==============

     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: Flex maximums
     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
     Date: Mon, 17 Nov 1997 17:16:15 PST
     From: Vern Paxson <vern>

     > I took a quick look into the flex-sources and altered some #defines in
     > flexdefs.h:
     >
     >     #define INITIAL_MNS 64000
     >     #define MNS_INCREMENT 1024000
     >     #define MAXIMUM_MNS 64000

     The things to fix are to add a couple of zeroes to:

         #define JAMSTATE -32766 /* marks a reference to the state that always jams */
         #define MAXIMUM_MNS 31999
         #define BAD_SUBSCRIPT -32767
         #define MAX_SHORT 32700

     and, if you get complaints about too many rules, make the following change too:

         #define YY_TRAILING_MASK 0x200000
         #define YY_TRAILING_HEAD_MASK 0x400000

     - Vern


File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ

unnamed-faq-63
==============

     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: FLEX question regarding istream vs ifstream
     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
     Date: Mon, 15 Dec 1997 13:21:35 PST
     From: Vern Paxson <vern>

     > stdin_handle = YY_CURRENT_BUFFER;
     > ifstream fin( "aFile" );
     > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
     >
     > What I'm wanting to do, is pass the contents of a file thru one set
     > of rules and then pass stdin thru another set... It works great if, I
     > don't use the C++ classes. But since everything else that I'm doing is
     > in C++, I thought I'd be consistent.
     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as it's
     > first argument (as stated in the man page). However, fin is a ifstream
     > object. Any ideas on what I might be doing wrong? Any help would be
     > appreciated. Thanks!!

     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
     Then its type will be compatible with the expected istream*, because ifstream
     is derived from istream.

     Vern


File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ

unnamed-faq-64
==============

     To: Enda Fadian <fadiane@piercom.ie>
     Subject: Re: Question related to Flex man page?
     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
     Date: Tue, 16 Dec 1997 14:17:09 PST
     From: Vern Paxson <vern>

     > Can you explain to me what is ment by a long-jump in relation to flex?

     Using the longjmp() function while inside yylex() or a routine called by it.

     > what is the flex activation frame.

     Just yylex()'s stack frame.

     > As far as I can see yyrestart will bring me back to the sart of the input
     > file and using flex++ isnot really an option!

     No, yyrestart() doesn't imply a rewind, even though its name might sound
     like it does. It tells the scanner to flush its internal buffers and
     start reading from the given file at its present location.

     Vern


File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ

unnamed-faq-65
==============

     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Need urgent Help
     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
     Date: Sun, 21 Dec 1997 21:30:46 PST
     From: Vern Paxson <vern>

     > /usr/lib/yaccpar: In function `int yyparse()':
     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
     >
     > ld: Undefined symbol
     > _yylex
     > _yyparse
     > _yyin

     This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
     the fix is to explicitly insert some 'extern "C"' statements for the
     corresponding routines/symbols.
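
     As a rough sketch of the kind of declarations this reply refers to,
     assuming the three symbols named in the linker output: the block
     below compiles as C, and as C++ it requests unmangled C linkage.
     The stub definitions at the end are hypothetical stand-ins so that
     the sketch links by itself; they are not what flex or yacc generate,
     and whether this resolves the Solaris problem above is not
     confirmed by the original message.

```c
#include <stdio.h>

/* With a C++ compiler, ask for C linkage so the names are not mangled;
   a C compiler never sees the extern "C" wrapper. */
#ifdef __cplusplus
extern "C" {
#endif

int yylex(void);     /* normally defined by the generated scanner */
int yyparse(void);   /* normally defined by the yacc-generated parser */
extern FILE *yyin;   /* normally defined by the generated scanner */

#ifdef __cplusplus
}
#endif

/* Hypothetical stand-ins so the sketch links on its own; a real build
   would link the generated scanner and parser instead of these. */
FILE *yyin = NULL;
int yylex(void) { return 0; }            /* 0 signals end of input */
int yyparse(void) { return yylex(); }    /* 0 signals a successful parse */
```

     In practice only the declarations inside the guarded region would
     go into the C++ source or a shared header.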

     Vern


File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ

unnamed-faq-66
==============

     To: mc0307@mclink.it
     Cc: gnu@prep.ai.mit.edu
     Subject: Re: [mc0307@mclink.it: Help request]
     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
     Date: Sun, 21 Dec 1997 22:33:37 PST
     From: Vern Paxson <vern>

     > This is my definition for float and integer types:
     > . . .
     > NZD [1-9]
     > ...
     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
     > UNIX) and it work well, so I think that my definitions are correct.
     > There are any differences between Lex and Flex?

     There are indeed differences, as discussed in the man page. The one
     you are probably running into is that when flex expands a name definition,
     it puts parentheses around the expansion, while lex does not. There's
     an example in the man page of how this can lead to different matching.
     Flex's behavior complies with the POSIX standard (or at least with the
     last POSIX draft I saw).

     Vern


File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ

unnamed-faq-67
==============

     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Thanks
     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
     Date: Mon, 22 Dec 1997 14:35:05 PST
     From: Vern Paxson <vern>

     > Thank you very much for your help. I compile and link well with C++ while
     > declaring 'yylex ...' extern, But a little problem remains. I get a
     > segmentation default when executing ( I linked with lfl library) while it
     > works well when using LEX instead of flex. Do you have some ideas about the
     > reason for this ?

     The one possible reason for this that comes to mind is if you've defined
     yytext as "extern char yytext[]" (which is what lex uses) instead of
     "extern char *yytext" (which is what flex uses). If it's not that, then
     I'm afraid I don't know what the problem might be.

     Vern


File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ

unnamed-faq-68
==============

     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
     Subject: Re: flex 2.5: c++ scanners & start conditions
     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
     Date: Tue, 06 Jan 1998 19:19:30 PST
     From: Vern Paxson <vern>

     > The problem is that when I do this (using %option c++) start
     > conditions seem to not apply.

     The BEGIN macro modifies the yy_start variable. For C scanners, this
     is a static with scope visible through the whole file. For C++ scanners,
     it's a member variable, so it only has visible scope within a member
     function. Your lexbegin() routine is not a member function when you
     build a C++ scanner, so it's not modifying the correct yy_start. The
     diagnostic that indicates this is that you found you needed to add
     a declaration of yy_start in order to get your scanner to compile when
     using C++; instead, the correct fix is to make lexbegin() a member
     function (by deriving from yyFlexLexer).

     Vern


File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ

unnamed-faq-69
==============

     To: "Boris Zinin" <boris@ippe.rssi.ru>
     Subject: Re: current position in flex buffer
     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
     Date: Mon, 12 Jan 1998 12:03:15 PST
     From: Vern Paxson <vern>

     > The problem is how to determine the current position in flex active
     > buffer when a rule is matched....

     You will need to keep track of this explicitly, such as by redefining
     YY_USER_ACTION to count the number of characters matched.

     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ

unnamed-faq-70
==============

     To: Bik.Dhaliwal@bis.org
     Subject: Re: Flex question
     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
     Date: Tue, 27 Jan 1998 22:41:52 PST
     From: Vern Paxson <vern>

     > That requirement involves knowing
     > the character position at which a particular token was matched
     > in the lexer.

     The way you have to do this is by explicitly keeping track of where
     you are in the file, by counting the number of characters scanned
     for each token (available in yyleng). It may prove convenient to
     do this by redefining YY_USER_ACTION, as described in the manual.

     Vern


File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ

unnamed-faq-71
==============

     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
     Subject: Re: flex: how to control start condition from parser?
     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
     Date: Tue, 27 Jan 1998 22:45:37 PST
     From: Vern Paxson <vern>

     > It seems useful for the parser to be able to tell the lexer about such
     > context dependencies, because then they don't have to be limited to
     > local or sequential context.

     One way to do this is to have the parser call a stub routine that's
     included in the scanner's .l file, and consequently that has access to
     BEGIN. The only ugliness is that the parser can't pass in the state
     it wants, because those aren't visible - but if you don't have many
     such states, then using a different set of names doesn't seem like
     too much of a burden.

     While generating a .h file like you suggest is certainly cleaner,
     flex development has come to a virtual stand-still :-(, so a workaround
     like the above is much more pragmatic than waiting for a new feature.

     Vern


File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ

unnamed-faq-72
==============

     To: Barbara Denny <denny@3com.com>
     Subject: Re: freebsd flex bug?
     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
     Date: Fri, 30 Jan 1998 12:42:32 PST
     From: Vern Paxson <vern>

     > lex.yy.c:1996: parse error before `='

     This is the key, identifying this error. (It may help to pinpoint
     it by using flex -L, so it doesn't generate #line directives in its
     output.) I will bet you heavy money that you have a start condition
     name that is also a variable name, or something like that; flex spits
     out #define's for each start condition name, mapping them to a number,
     so you can wind up with:

         %x foo
         %%
         ...
         %%
         void bar()
         {
             int foo = 3;
         }

     and the penultimate will turn into "int 1 = 3" after C preprocessing,
     since flex will put "#define foo 1" in the generated scanner.

     Vern


File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ

unnamed-faq-73
==============

     To: Maurice Petrie <mpetrie@infoscigroup.com>
     Subject: Re: Lost flex .l file
     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
     Date: Mon, 02 Feb 1998 11:15:12 PST
     From: Vern Paxson <vern>

     > I am curious as to
     > whether there is a simple way to backtrack from the generated source to
     > reproduce the lost list of tokens we are searching on.

     In theory, it's straight-forward to go from the DFA representation
     back to a regular-expression representation - the two are isomorphic.
     In practice, a huge headache, because you have to unpack all the tables
     back into a single DFA representation, and then write a program to munch
     on that and translate it into an RE.

     Sorry for the less-than-happy news ...

     Vern


File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ

unnamed-faq-74
==============

     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 08:48:51 PST
     From: Vern Paxson <vern>

     > What I have found, is that the smaller the data chunk, the faster the
     > program executes. This is the opposite of what I expected. Should this be
     > happening this way?

     This is exactly what will happen if your input file has embedded NULs.
     From the man page:

         A final note: flex is slow when matching NUL's, particularly
         when a token contains multiple NUL's. It's best to write
         rules which match short amounts of text if it's anticipated
         that the text will often include NUL's.

     So that's the first thing to look for.

     Vern


File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ

unnamed-faq-75
==============

     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 15:42:25 PST
     From: Vern Paxson <vern>

     So there are several problems.

     First, to go fast, you want to match as much text as possible, which
     your scanners don't in the case that what they're scanning is *not*
     a <RN> tag. So you want a rule like:

         [^<]+

     Second, C++ scanners are particularly slow if they're interactive,
     which they are by default. Using -B speeds it up by a factor of 3-4
     on my workstation.

     Third, C++ scanners that use the istream interface are slow, because
     of how poorly implemented istream's are. I built two versions of
     the following scanner:

         %%
         .*\n
         .*
         %%

     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
     The C++ istream version, using -B, takes 3.8 seconds.

     Vern


File: flex.info, Node: unnamed-faq-76, Next: unnamed-faq-77, Prev: unnamed-faq-75, Up: FAQ

unnamed-faq-76
==============

     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
     Subject: Re: FLEX 2.5 & THE YEAR 2000
     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
     Date: Wed, 03 Jun 1998 10:22:26 PDT
     From: Vern Paxson <vern>

     > I am researching the Y2K problem with General Electric R&D
     > and need to know if there are any known issues concerning
     > the above mentioned software and Y2K regardless of version.

     There shouldn't be, all it ever does with the date is ask the system
     for it and then print it out.

     Vern


File: flex.info, Node: unnamed-faq-77, Next: unnamed-faq-78, Prev: unnamed-faq-76, Up: FAQ

unnamed-faq-77
==============

     To: "Hans Dermot Doran" <htd@ibhdoran.com>
     Subject: Re: flex problem
     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
     Date: Tue, 21 Jul 1998 14:23:34 PDT
     From: Vern Paxson <vern>

     > To overcome this, I gets() the stdin into a string and lex the string. The
     > string is lexed OK except that the end of string isn't lexed properly
     > (yy_scan_string()), that is the lexer dosn't recognise the end of string.

     Flex doesn't contain mechanisms for recognizing buffer endpoints. But if
     you use fgets instead (which you should anyway, to protect against buffer
     overflows), then the final \n will be preserved in the string, and you can
     scan that in order to find the end of the string.

     Vern


File: flex.info, Node: unnamed-faq-78, Next: unnamed-faq-79, Prev: unnamed-faq-77, Up: FAQ

unnamed-faq-78
==============

     To: soumen@almaden.ibm.com
     Subject: Re: Flex++ 2.5.3 instance member vs. static member
     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
     Date: Tue, 28 Jul 1998 01:10:34 PDT
     From: Vern Paxson <vern>

     > %{
     > int mylineno = 0;
     > %}
     > ws [ \t]+
     > alpha [A-Za-z]
     > dig [0-9]
     > %%
     >
     > Now you'd expect mylineno to be a member of each instance of class
     > yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to
     > indicate otherwise; unless I am missing something the declaration of
     > mylineno seems to be outside any class scope.
     >
     > How will this work if I want to run a multi-threaded application with each
     > thread creating a FlexLexer instance?

     Derive your own subclass and make mylineno a member variable of it.

     Vern


File: flex.info, Node: unnamed-faq-79, Next: unnamed-faq-80, Prev: unnamed-faq-78, Up: FAQ

unnamed-faq-79
==============

     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: More than 32K states change hangs
     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
     Date: Tue, 04 Aug 1998 22:28:45 PDT
     From: Vern Paxson <vern>

     > Vern Paxson,
     >
     > I followed your advice, posted on Usenet bu you, and emailed to me
     > personally by you, on how to overcome the 32K states limit. I'm running
     > on Linux machines.
     > I took the full source of version 2.5.4 and did the following changes in
     > flexdef.h:
     > #define JAMSTATE -327660
     > #define MAXIMUM_MNS 319990
     > #define BAD_SUBSCRIPT -327670
     > #define MAX_SHORT 327000
     >
     > and compiled.
     > All looked fine, including check and bigcheck, so I installed.

     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
     archives I see that I did indeed recommend doing so. Try setting it back
     to 32700; that should suffice that you no longer need -Ca. If it still
     hangs, then the interesting question is - where?

     > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
     > distribution of Linux)
     > flex 2.5.4 binary works.

     Since Linux comes with source code, you should diff it against what
     you have to see what problems they missed.

     > Should I always compile with the -Ca option now ? even short and simple
     > filters ?

     No, definitely not. It's meant to be for those situations where you
     absolutely must squeeze every last cycle out of your scanner.

     Vern


File: flex.info, Node: unnamed-faq-80, Next: unnamed-faq-81, Prev: unnamed-faq-79, Up: FAQ

unnamed-faq-80
==============

     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
     Subject: Re: flex output for static code portion
     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
     Date: Mon, 17 Aug 1998 23:57:42 PDT
     From: Vern Paxson <vern>

     > I would like to use flex under the hood to generate a binary file
     > containing the data structures that control the parse.

     This has been on the wish-list for a long time. In principle it's
     straight-forward - you redirect mkdata() et al's I/O to another file,
     and modify the skeleton to have a start-up function that slurps these
     into dynamic arrays. The concerns are (1) the scanner generation code
     is hairy and full of corner cases, so it's easy to get surprised when
     going down this path :-( ; and (2) being careful about buffering so
     that when the tables change you make sure the scanner starts in the
     correct state and reading at the right point in the input file.

     > I was wondering if you know of anyone who has used flex in this way.

     I don't - but it seems like a reasonable project to undertake (unlike
     numerous other flex tweaks :-).

     Vern


File: flex.info, Node: unnamed-faq-81, Next: unnamed-faq-82, Prev: unnamed-faq-80, Up: FAQ

unnamed-faq-81
==============

     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
         by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
         for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
         by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
         for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
     Subject: "flex scanner push-back overflow"
     To: vern@ee.lbl.gov
     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
     MIME-Version: 1.0
     Content-Type: text/plain; charset=US-ASCII
     Content-Transfer-Encoding: 7bit

     Hi Vern,

     Yesterday, I encountered a strange problem: I use the macro processor m4
     to include some lengthy lists into a .l file. Following is a flex macro
     definition that causes some serious pain in my neck:

     AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])

     The complete list contains about 10kB. When I try to "flex" this file
     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
     some of the predefined values in flexdefs.h) I get the error:

     myflex/flex -8 sentag.tmp.l
     flex scanner push-back overflow

     When I remove the slashes in the macro definition everything works fine.
     As I understand it, the double quotes escape the slash-character so it
     really means "/" and not "trailing context". Furthermore, I tried to
     escape the slashes with backslashes, but with no use, the same error message
     appeared when flexing the code.

     Do you have an idea what's going on here?

     Greetings from Germany,
     Georg
     --
     Georg Rehm georg@cl-ki.uni-osnabrueck.de
     Institute for Semantic Information Processing, University of Osnabrueck, FRG


File: flex.info, Node: unnamed-faq-82, Next: unnamed-faq-83, Prev: unnamed-faq-81, Up: FAQ

unnamed-faq-82
==============

     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: "flex scanner push-back overflow"
     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
     Date: Thu, 20 Aug 1998 07:05:35 PDT
     From: Vern Paxson <vern>

     > myflex/flex -8 sentag.tmp.l
     > flex scanner push-back overflow

     Flex itself uses a flex scanner. That scanner is running out of buffer
     space when it tries to unput() the humongous macro you've defined. When
     you remove the '/'s, you make it small enough so that it fits in the buffer;
     removing spaces would do the same thing.

     The fix is to either rethink how come you're using such a big macro and
     perhaps there's another/better way to do it; or to rebuild flex's own
     scan.c with a larger value for

         #define YY_BUF_SIZE 16384

     - Vern


File: flex.info, Node: unnamed-faq-83, Next: unnamed-faq-84, Prev: unnamed-faq-82, Up: FAQ

unnamed-faq-83
==============

     To: Jan Kort <jan@research.techforce.nl>
     Subject: Re: Flex
     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
     Date: Sat, 05 Sep 1998 00:59:49 PDT
     From: Vern Paxson <vern>

     > %%
     >
     > "TEST1\n" { fprintf(stderr, "TEST1\n"); yyless(5); }
     > ^\n { fprintf(stderr, "empty line\n"); }
     > . { }
     > \n { fprintf(stderr, "new line\n"); }
     >
     > %%
     > -- input ---------------------------------------
     > TEST1
     > -- output --------------------------------------
     > TEST1
     > empty line
     > ------------------------------------------------

     IMHO, it's not clear whether or not this is in fact a bug. It depends
     on whether you view yyless() as backing up in the input stream, or as
     pushing new characters onto the beginning of the input stream. Flex
     interprets it as the latter (for implementation convenience, I'll admit),
     and so considers the newline as in fact matching at the beginning of a
     line, as after all the last token scanned an entire line and so the
     scanner is now at the beginning of a new line.

     I agree that this is counter-intuitive for yyless(), given its
     functional description (it's less so for unput(), depending on whether
     you're unput()'ing new text or scanned text). But I don't plan to
     change it any time soon, as it's a pain to do so. Consequently,
     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
     your scanner into the behavior you desire.

     Sorry for the less-than-completely-satisfactory answer.

     Vern


File: flex.info, Node: unnamed-faq-84, Next: unnamed-faq-85, Prev: unnamed-faq-83, Up: FAQ

unnamed-faq-84
==============

     To: Patrick Krusenotto <krusenot@mac-info-link.de>
     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
     Date: Thu, 24 Sep 1998 23:28:43 PDT
     From: Vern Paxson <vern>

     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
     > trying to make my scanner restart with a new file after my parser stops
     > with a parse error. When my compiler restarts, the parser always
     > receives the token after the token (in the old file!) that caused the
     > parser error.

     I suspect the problem is that your parser has read ahead in order
     to attempt to resolve an ambiguity, and when it's restarted it picks
     up with that token rather than reading a fresh one. If you're using
     yacc, then the special "error" production can sometimes be used to
     consume tokens in an attempt to get the parser into a consistent state.

     Vern


File: flex.info, Node: unnamed-faq-85, Next: unnamed-faq-86, Prev: unnamed-faq-84, Up: FAQ

unnamed-faq-85
==============

     To: Henric Jungheim <junghelh@pe-nelson.com>
     Subject: Re: flex 2.5.4a
     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
     Date: Tue, 27 Oct 1998 16:50:14 PST
     From: Vern Paxson <vern>

     > This brings up a feature request: How about a command line
     > option to specify the filename when reading from stdin? That way one
     > doesn't need to create a temporary file in order to get the "#line"
     > directives to make sense.

     Use -o combined with -t (per the man page description of -o).

     > P.S., Is there any simple way to use non-blocking IO to parse multiple
     > streams?

     Simple, no.

     One approach might be to return a magic character on EWOULDBLOCK and
     have a rule

         .*<magic-character> // put back .*, eat magic character

     This is off the top of my head, not sure it'll work.

     Vern


File: flex.info, Node: unnamed-faq-86, Next: unnamed-faq-87, Prev: unnamed-faq-85, Up: FAQ

unnamed-faq-86
==============

     To: "Repko, Billy D" <billy.d.repko@intel.com>
     Subject: Re: Compiling scanners
     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
     Date: Thu, 14 Jan 1999 00:25:30 PST
     From: Vern Paxson <vern>

     > It appears that maybe it cannot find the lfl library.

     The Makefile in the distribution builds it, so you should have it.
     It's exceedingly trivial, just a main() that calls yylex() and
     a yywrap() that always returns 1.

     > %%
     >       \n      ++num_lines; ++num_chars;
     >       .       ++num_chars;

     You can't indent your rules like this - that's where the errors are coming
     from.  Flex copies indented text to the output file; it's how you do things
     like

             int num_lines_seen = 0;

     to declare local variables.

                     Vern


File: flex.info,  Node: unnamed-faq-87,  Next: unnamed-faq-88,  Prev: unnamed-faq-86,  Up: FAQ

unnamed-faq-87
==============

     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
     Subject: Re: flex input buffer
     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
     Date: Tue, 09 Feb 1999 21:03:37 PST
     From: Vern Paxson <vern>

     > In the flex.skl file the size of the default input buffers is set.  Can you
     > explain why this size is set and why it is such a high number.

     It's large to optimize performance when scanning large files.  You can
     safely make it a lot lower if needed.

                     Vern


File: flex.info,  Node: unnamed-faq-88,  Next: unnamed-faq-90,  Prev: unnamed-faq-87,  Up: FAQ

unnamed-faq-88
==============

     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
     Subject: Re: Flex error message
     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
     Date: Thu, 25 Feb 1999 00:11:31 PST
     From: Vern Paxson <vern>

     > I'm extending a larger scanner written in Flex and I keep running into
     > problems.
     > More specifically, I get the error message:
     > "flex: input rules are too complicated (>= 32000 NFA states)"

     Increase the definitions in flexdef.h for:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

     then recompile everything, and it should all work.

                     Vern


File: flex.info,  Node: unnamed-faq-90,  Next: unnamed-faq-91,  Prev: unnamed-faq-88,  Up: FAQ

unnamed-faq-90
==============

     To: "Dmitriy Goldobin" <gold@ems.chel.su>
     Subject: Re: FLEX trouble
     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
     Date: Tue, 01 Jun 1999 00:15:07 PDT
     From: Vern Paxson <vern>

     > I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
     > but rule "/*"(.|\n)*"*/" don't work ?

     The second of these will have to scan the entire input stream (because
     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
     it ends with "*/", terminating the comment.  That potentially will overflow
     the input buffer.

     > More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
     > 'unrecognized rule'.

     You can't use the '/' operator inside parentheses.  It's not clear
     what "(a/b)*" actually means.

     > I now use workaround with state <comment>, but single-rule is
     > better, i think.

     Single-rule is nice but will always have the problem of either setting
     restrictions on comments (like not allowing multi-line comments) and/or
     running the risk of consuming the entire input stream, as noted above.
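
As an illustration only (this sketch is not part of the original exchange):
a <comment> start condition avoids the overflow because the scanner carries a
small piece of state from one match to the next, instead of holding the whole
comment in its buffer.  The C fragment below mimics that idea outside flex.
The names 'cscan' and 'cscan_feed' are hypothetical, and the sketch assumes
that only '/* */' comments matter; it ignores string literals and '//'
comments.

```c
#include <assert.h>
#include <stddef.h>

/* The four states play the role of a start condition plus one
 * character of lookbehind. */
enum cstate { CODE, SLASH, COMMENT, STAR };

struct cscan {
    enum cstate state;       /* carried across calls, like a start condition */
    unsigned long comments;  /* number of complete comments seen so far */
};

/* Consume one chunk of input.  Memory use is constant no matter how
 * long a comment is, because nothing but the state is remembered. */
static void cscan_feed(struct cscan *s, const char *buf, size_t n)
{
    assert(s != NULL && (buf != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        switch (s->state) {
        case CODE:
            s->state = (c == '/') ? SLASH : CODE;
            break;
        case SLASH:              /* saw '/', waiting to see if '*' follows */
            s->state = (c == '*') ? COMMENT : (c == '/') ? SLASH : CODE;
            break;
        case COMMENT:
            s->state = (c == '*') ? STAR : COMMENT;
            break;
        case STAR:               /* saw '*' inside a comment */
            if (c == '/') {
                s->comments++;
                s->state = CODE;
            } else if (c != '*') {
                s->state = COMMENT;
            }
            break;
        }
    }
}
```

Because the state lives in the struct, the input can be fed in chunks of any
size, so a comment never has to fit in a single buffer.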
6681 6682 Vern 6683 6684 6685File: flex.info, Node: unnamed-faq-91, Next: unnamed-faq-92, Prev: unnamed-faq-90, Up: FAQ 6686 6687unnamed-faq-91 6688============== 6689 6690 Received: from mc-qout4.whowhere.com (mc-qout4.whowhere.com [209.185.123.18]) 6691 by ee.lbl.gov (8.9.3/8.9.3) with SMTP id IAA05100 6692 for <vern@ee.lbl.gov>; Tue, 15 Jun 1999 08:56:06 -0700 (PDT) 6693 Received: from Unknown/Local ([?.?.?.?]) by my-deja.com; Tue Jun 15 08:55:43 1999 6694 To: vern@ee.lbl.gov 6695 Date: Tue, 15 Jun 1999 08:55:43 -0700 6696 From: "Aki Niimura" <neko@my-deja.com> 6697 Message-ID: <KNONDOHDOBGAEAAA@my-deja.com> 6698 Mime-Version: 1.0 6699 Cc: 6700 X-Sent-Mail: on 6701 Reply-To: 6702 X-Mailer: MailCity Service 6703 Subject: A question on flex C++ scanner 6704 X-Sender-Ip: 12.72.207.61 6705 Organization: My Deja Email (http://www.my-deja.com:80) 6706 Content-Type: text/plain; charset=us-ascii 6707 Content-Transfer-Encoding: 7bit 6708 6709 Dear Dr. Paxon, 6710 6711 I have been using flex for years. 6712 It works very well on many projects. 6713 Most case, I used it to generate a scanner on C language. 6714 However, one project I needed to generate a scanner 6715 on C++ lanuage. Thanks to your enhancement, flex did 6716 the job. 6717 6718 Currently, I'm working on enhancing my previous project. 6719 I need to deal with multiple input streams (recursive 6720 inclusion) in this scanner (C++). 6721 I did similar thing for another scanner (C) as you 6722 explained in your documentation. 6723 6724 The generated scanner (C++) has necessary methods: 6725 - switch_to_buffer(struct yy_buffer_state *b) 6726 - yy_create_buffer(istream *is, int sz) 6727 - yy_delete_buffer(struct yy_buffer_state *b) 6728 6729 However, I couldn't figure out how to access current 6730 buffer (yy_current_buffer). 6731 6732 yy_current_buffer is a protected member of yyFlexLexer. 6733 I can't access it directly. 6734 Then, I thought yy_create_buffer() with is = 0 might 6735 return current stream buffer. 
     But it seems not as far
     as I checked the source. (flex 2.5.4)

     I went through the Web in addition to Flex documentation.
     However, it hasn't been successful, so far.

     It is not my intention to bother you, but, can you
     comment about how to obtain the current stream buffer?

     Your response would be highly appreciated.

     Best regards,
     Aki Niimura

     --== Sent via Deja.com http://www.deja.com/ ==--
     Share what you know. Learn what you don't.


File: flex.info,  Node: unnamed-faq-92,  Next: unnamed-faq-93,  Prev: unnamed-faq-91,  Up: FAQ

unnamed-faq-92
==============

     To: neko@my-deja.com
     Subject: Re: A question on flex C++ scanner
     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
     Date: Tue, 15 Jun 1999 09:04:24 PDT
     From: Vern Paxson <vern>

     > However, I couldn't figure out how to access current
     > buffer (yy_current_buffer).

     Derive your own subclass from yyFlexLexer.

                     Vern


File: flex.info,  Node: unnamed-faq-93,  Next: unnamed-faq-94,  Prev: unnamed-faq-92,  Up: FAQ

unnamed-faq-93
==============

     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
     Subject: Re: You're the man to see?
     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
     Date: Wed, 23 Jun 1999 09:01:40 PDT
     From: Vern Paxson <vern>

     > I hope you can help me.  I am using Flex and Bison to produce an interpreted
     > language.  However all goes well until I try to implement an IF statement or
     > a WHILE.  I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditons to check for a rule match.  So I cannot
     > make a decision!!

     You need to use the parser to build a parse tree (= abstract syntax tree),
     and when that's all done you recursively evaluate the tree, binding variables
     to values at that time.
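
As an illustration only (this sketch is not part of the original reply), the
"build a tree, then evaluate it" approach might look like the following in C.
All names here ('struct node', 'mk', 'eval', 'vars') are hypothetical.  In a
real project the parser actions would call something like 'mk' to build the
tree, and the language would need many more node kinds than this one has.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { K_NUM, K_VAR, K_LESS, K_ADD, K_ASSIGN, K_IF, K_WHILE, K_SEQ };

struct node {
    enum kind kind;
    int value;               /* K_NUM: literal; K_VAR, K_ASSIGN: variable slot */
    struct node *a, *b, *c;  /* children; their meaning depends on kind */
};

static int vars[26];         /* variable storage, bound at evaluation time */

/* Allocate one tree node.  A parser action would call this. */
static struct node *mk(enum kind k, int value,
                       struct node *a, struct node *b, struct node *c)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = k;
    n->value = value;
    n->a = a;
    n->b = b;
    n->c = c;
    return n;
}

/* Walk the finished tree.  Only the branch that is actually taken
 * gets evaluated, which is what makes IF and WHILE work. */
static int eval(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case K_NUM:    return n->value;
    case K_VAR:    return vars[n->value];
    case K_LESS:   return eval(n->a) < eval(n->b);
    case K_ADD:    return eval(n->a) + eval(n->b);
    case K_ASSIGN: return vars[n->value] = eval(n->a);
    case K_IF:
        if (eval(n->a))
            return eval(n->b);
        return n->c ? eval(n->c) : 0;
    case K_WHILE:
        while (eval(n->a))
            eval(n->b);
        return 0;
    case K_SEQ:
        eval(n->a);
        return eval(n->b);
    }
    return 0;
}
```

The parser sees both branches of an IF while it builds the tree, but nothing
runs until 'eval' is called, so the decision is made with the variable values
in effect at that moment.  The sketch never frees its nodes; a real
interpreter would.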

                     Vern


File: flex.info,  Node: unnamed-faq-94,  Next: unnamed-faq-95,  Prev: unnamed-faq-93,  Up: FAQ

unnamed-faq-94
==============

     To: Petr Danecek <petr@ics.cas.cz>
     Subject: Re: flex - question
     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
     Date: Fri, 02 Jul 1999 16:52:13 PDT
     From: Vern Paxson <vern>

     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponencial
     > growth.

     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours has because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.

     For your rules, there is an easy fix.  Change the ".*" that comes after
     the directory name to "[^ ]*".  With that in place, the rules are no
     longer nearly so ambiguous, because then once one of the directories
     has been matched, no other can be matched (since they all require a
     leading blank).

     If that's not an acceptable solution, then you can enter a start state
     to pick up the .*\n after each directory is matched.

     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise input that doesn't match any of the patterns will be matched
     very slowly, a character at a time.

                     Vern


File: flex.info,  Node: unnamed-faq-95,  Next: unnamed-faq-96,  Prev: unnamed-faq-94,  Up: FAQ

unnamed-faq-95
==============

     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
     Date: Thu, 08 Jul 1999 08:20:39 PDT
     From: Vern Paxson <vern>

     > I was hoping you could help me with my problem.
6844 > 6845 > I tried compiling (gnu)flex on a Solaris 2.4 machine 6846 > but when I ran make (after configure) I got an error. 6847 > 6848 > -------------------------------------------------------------- 6849 > gcc -c -I. -I. -g -O parse.c 6850 > ./flex -t -p ./scan.l >scan.c 6851 > sh: ./flex: not found 6852 > *** Error code 1 6853 > make: Fatal error: Command failed for target `scan.c' 6854 > ------------------------------------------------------------- 6855 > 6856 > What's strange to me is that I'm only 6857 > trying to install flex now. I then edited the Makefile to 6858 > and changed where it says "FLEX = flex" to "FLEX = lex" 6859 > ( lex: the native Solaris one ) but then it complains about 6860 > the "-p" option. Is there any way I can compile flex without 6861 > using flex or lex? 6862 > 6863 > Thanks so much for your time. 6864 6865 You managed to step on the bootstrap sequence, which first copies 6866 initscan.c to scan.c in order to build flex. Try fetching a fresh 6867 distribution from ftp.ee.lbl.gov. (Or you can first try removing 6868 ".bootstrap" and doing a make again.) 6869 6870 Vern 6871 6872 6873File: flex.info, Node: unnamed-faq-96, Next: unnamed-faq-97, Prev: unnamed-faq-95, Up: FAQ 6874 6875unnamed-faq-96 6876============== 6877 6878 To: Tielman Koekemoer <tielman@spi.co.za> 6879 Subject: Re: Please help. 6880 In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT. 6881 Date: Fri, 09 Jul 1999 00:27:20 PDT 6882 From: Vern Paxson <vern> 6883 6884 > First I removed .bootstrap (and ran make) - no luck. I downloaded the 6885 > software but I still have the same problem. Is there anything else I 6886 > could try. 6887 6888 Try: 6889 6890 cp initscan.c scan.c 6891 touch scan.c 6892 make scan.o 6893 6894 If this last tries to first build scan.c from scan.l using ./flex, then 6895 your "make" is broken, in which case compile scan.c to scan.o by hand. 

                     Vern


File: flex.info,  Node: unnamed-faq-97,  Next: unnamed-faq-98,  Prev: unnamed-faq-96,  Up: FAQ

unnamed-faq-97
==============

     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
     Subject: Re: Error
     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
     Date: Tue, 20 Jul 1999 00:18:26 PDT
     From: Vern Paxson <vern>

     > I am getting a compilation error. The error is given as "unknown symbol- yylex".

     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls an instance
     of the scanner (e.g., "scanner->yylex()").

                     Vern


File: flex.info,  Node: unnamed-faq-98,  Next: unnamed-faq-99,  Prev: unnamed-faq-97,  Up: FAQ

unnamed-faq-98
==============

     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
     Subject: Re: lex
     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
     Date: Tue, 23 Nov 1999 15:54:30 PST
     From: Vern Paxson <vern>

     Well, your problem is the

     switch (yybgin-yysvec-1) {      /* witchcraft */

     at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
     assuming knowledge of the AT&T lex's internal variables.

     For flex, you can probably do the equivalent using a switch on YYSTATE.

                     Vern


File: flex.info,  Node: unnamed-faq-99,  Next: unnamed-faq-100,  Prev: unnamed-faq-98,  Up: FAQ

unnamed-faq-99
==============

     To: archow@hss.hns.com
     Subject: Re: Regarding distribution of flex and yacc based grammars
     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
6951 Date: Wed, 22 Dec 1999 01:56:24 PST 6952 From: Vern Paxson <vern> 6953 6954 > When we provide the customer with an object code distribution, is it 6955 > necessary for us to provide source 6956 > for the generated C files from flex and bison since they are generated by 6957 > flex and bison ? 6958 6959 For flex, no. I don't know what the current state of this is for bison. 6960 6961 > Also, is there any requrirement for us to neccessarily provide source for 6962 > the grammar files which are fed into flex and bison ? 6963 6964 Again, for flex, no. 6965 6966 See the file "COPYING" in the flex distribution for the legalese. 6967 6968 Vern 6969 6970 6971File: flex.info, Node: unnamed-faq-100, Next: unnamed-faq-101, Prev: unnamed-faq-99, Up: FAQ 6972 6973unnamed-faq-100 6974=============== 6975 6976 To: Martin Gallwey <gallweym@hyperion.moe.ul.ie> 6977 Subject: Re: Flex, and self referencing rules 6978 In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST. 6979 Date: Sat, 19 Feb 2000 18:33:16 PST 6980 From: Vern Paxson <vern> 6981 6982 > However, I do not use unput anywhere. I do use self-referencing 6983 > rules like this: 6984 > 6985 > UnaryExpr ({UnionExpr})|("-"{UnaryExpr}) 6986 6987 You can't do this - flex is *not* a parser like yacc (which does indeed 6988 allow recursion), it is a scanner that's confined to regular expressions. 6989 6990 Vern 6991 6992 6993File: flex.info, Node: unnamed-faq-101, Next: What is the difference between YYLEX_PARAM and YY_DECL?, Prev: unnamed-faq-100, Up: FAQ 6994 6995unnamed-faq-101 6996=============== 6997 6998 To: slg3@lehigh.edu (SAMUEL L. GULDEN) 6999 Subject: Re: Flex problem 7000 In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST. 
     Date: Thu, 02 Mar 2000 23:00:46 PST
     From: Vern Paxson <vern>

     If this is exactly your program:

     > digit [0-9]
     > digits {digit}+
     > whitespace [ \t\n]+
     >
     > %%
     > "[" { printf("open_brac\n");}
     > "]" { printf("close_brac\n");}
     > "+" { printf("addop\n");}
     > "*" { printf("multop\n");}
     > {digits} { printf("NUMBER = %s\n", yytext);}
     > whitespace ;

     then the problem is that the last rule needs to be "{whitespace}" !

                     Vern


File: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ

What is the difference between YYLEX_PARAM and YY_DECL?
=======================================================

YYLEX_PARAM is not a flex symbol.  It is for Bison.  It tells Bison to
pass extra params when it calls yylex() from the parser.

   YY_DECL is the Flex declaration of yylex.  The default is similar to
this:

     #define YY_DECL int yylex (void)


File: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ

Why do I get "conflicting types for yylex" error?
=================================================

This is a compiler error regarding a generated Bison parser, not a Flex
scanner.  It means you need a prototype of yylex() at the top of the
Bison file.  Be sure the prototype matches YY_DECL.


File: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ

How do I access the values set in a Flex action from within a Bison action?
7050=========================================================================== 7051 7052With $1, $2, $3, etc. These are called "Semantic Values" in the Bison 7053manual. See *note (bison)Top::. 7054 7055 7056File: flex.info, Node: Appendices, Next: Indices, Prev: FAQ, Up: Top 7057 7058Appendix A Appendices 7059********************* 7060 7061* Menu: 7062 7063* Makefiles and Flex:: 7064* Bison Bridge:: 7065* M4 Dependency:: 7066* Common Patterns:: 7067 7068 7069File: flex.info, Node: Makefiles and Flex, Next: Bison Bridge, Prev: Appendices, Up: Appendices 7070 7071A.1 Makefiles and Flex 7072====================== 7073 7074In this appendix, we provide tips for writing Makefiles to build your 7075scanners. 7076 7077 In a traditional build environment, we say that the '.c' files are 7078the sources, and the '.o' files are the intermediate files. When using 7079'flex', however, the '.l' files are the sources, and the generated '.c' 7080files (along with the '.o' files) are the intermediate files. This 7081requires you to carefully plan your Makefile. 7082 7083 Modern 'make' programs understand that 'foo.l' is intended to 7084generate 'lex.yy.c' or 'foo.c', and will behave accordingly(1)(2). The 7085following Makefile does not explicitly instruct 'make' how to build 7086'foo.c' from 'foo.l'. Instead, it relies on the implicit rules of the 7087'make' program to build the intermediate file, 'scan.c': 7088 7089 # Basic Makefile -- relies on implicit rules 7090 # Creates "myprogram" from "scan.l" and "myprogram.c" 7091 # 7092 LEX=flex 7093 myprogram: scan.o myprogram.o 7094 scan.o: scan.l 7095 7096 7097 For simple cases, the above may be sufficient. For other cases, you 7098may have to explicitly instruct 'make' how to build your scanner. 
The 7099following is an example of a Makefile containing explicit rules: 7100 7101 # Basic Makefile -- provides explicit rules 7102 # Creates "myprogram" from "scan.l" and "myprogram.c" 7103 # 7104 LEX=flex 7105 myprogram: scan.o myprogram.o 7106 $(CC) -o $@ $(LDFLAGS) $^ 7107 7108 myprogram.o: myprogram.c 7109 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^ 7110 7111 scan.o: scan.c 7112 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^ 7113 7114 scan.c: scan.l 7115 $(LEX) $(LFLAGS) -o $@ $^ 7116 7117 clean: 7118 $(RM) *.o scan.c 7119 7120 7121 Notice in the above example that 'scan.c' is in the 'clean' target. 7122This is because we consider the file 'scan.c' to be an intermediate 7123file. 7124 7125 Finally, we provide a realistic example of a 'flex' scanner used with 7126a 'bison' parser(3). There is a tricky problem we have to deal with. 7127Since a 'flex' scanner will typically include a header file (e.g., 7128'y.tab.h') generated by the parser, we need to be sure that the header 7129file is generated BEFORE the scanner is compiled. We handle this case 7130in the following example: 7131 7132 # Makefile example -- scanner and parser. 7133 # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c" 7134 # 7135 LEX = flex 7136 YACC = bison -y 7137 YFLAGS = -d 7138 objects = scan.o parse.o myprogram.o 7139 7140 myprogram: $(objects) 7141 scan.o: scan.l parse.c 7142 parse.o: parse.y 7143 myprogram.o: myprogram.c 7144 7145 7146 In the above example, notice the line, 7147 7148 scan.o: scan.l parse.c 7149 7150 , which lists the file 'parse.c' (the generated parser) as a 7151dependency of 'scan.o'. We want to ensure that the parser is created 7152before the scanner is compiled, and the above line seems to do the 7153trick. Feel free to experiment with your specific implementation of 7154'make'. 7155 7156 For more details on writing Makefiles, see *note (make)Top::. 
7157 7158 ---------- Footnotes ---------- 7159 7160 (1) GNU 'make' and GNU 'automake' are two such programs that provide 7161implicit rules for flex-generated scanners. 7162 7163 (2) GNU 'automake' may generate code to execute flex in 7164lex-compatible mode, or to stdout. If this is not what you want, then 7165you should provide an explicit rule in your Makefile.am 7166 7167 (3) This example also applies to yacc parsers. 7168 7169 7170File: flex.info, Node: Bison Bridge, Next: M4 Dependency, Prev: Makefiles and Flex, Up: Appendices 7171 7172A.2 C Scanners with Bison Parsers 7173================================= 7174 7175This section describes the 'flex' features useful when integrating 7176'flex' with 'GNU bison'(1). Skip this section if you are not using 7177'bison' with your scanner. Here we discuss only the 'flex' half of the 7178'flex' and 'bison' pair. We do not discuss 'bison' in any detail. For 7179more information about generating 'bison' parsers, see *note 7180(bison)Top::. 7181 7182 A compatible 'bison' scanner is generated by declaring '%option 7183bison-bridge' or by supplying '--bison-bridge' when invoking 'flex' from 7184the command line. This instructs 'flex' that the macro 'yylval' may be 7185used. The data type for 'yylval', 'YYSTYPE', is typically defined in a 7186header file, included in section 1 of the 'flex' input file. For a list 7187of functions and macros available, *Note bison-functions::. 7188 7189 The declaration of yylex becomes, 7190 7191 int yylex ( YYSTYPE * lvalp, yyscan_t scanner ); 7192 7193 If '%option bison-locations' is specified, then the declaration 7194becomes, 7195 7196 int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner ); 7197 7198 Note that the macros 'yylval' and 'yylloc' evaluate to pointers. 7199Support for 'yylloc' is optional in 'bison', so it is optional in 'flex' 7200as well. The following is an example of a 'flex' scanner that is 7201compatible with 'bison'. 

          /* Scanner for "C" assignment statements... sort of. */
          %{
          #include "y.tab.h"  /* Generated by bison. */
          %}

          %option bison-bridge bison-locations
          %%

          [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
          [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
          "="|";"       { return yytext[0];}
          .  {}
          %%

   As you can see, there really is no magic here.  We just use 'yylval'
as we would any other variable.  The data type of 'yylval' is generated
by 'bison', and included in the file 'y.tab.h'.  Here is the
corresponding 'bison' parser:

          /* Parser to convert "C" assignments to lisp. */
          %{
          /* Pass the argument to yyparse through to yylex. */
          #define YYPARSE_PARAM scanner
          #define YYLEX_PARAM   scanner
          %}
          %locations
          %pure_parser
          %union {
              int num;
              char* str;
          }
          %token <str> STRING
          %token <num> NUMBER
          %%
          assignment:
              STRING '=' NUMBER ';' {
                  printf( "(setf %s %d)", $1, $3 );
              }
          ;

   ---------- Footnotes ----------

   (1) The features described here are purely optional, and are by no
means the only way to use flex with bison.  We merely provide some glue
to ease development of your parser-scanner pair.


File: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices

A.3 M4 Dependency
=================

The macro processor 'm4'(1) must be installed wherever flex is
installed.  'flex' invokes 'm4', found by searching the directories in
the 'PATH' environment variable.  Any code you place in section 1 or in
the actions will be sent through m4.  Please follow these rules to
protect your code from unwanted 'm4' processing.

   * Do not use symbols that begin with 'm4_', such as 'm4_define' or
     'm4_include', since those are reserved for 'm4' macro names.
If 7263 for some reason you need m4_ as a prefix, use a preprocessor 7264 #define to get your symbol past m4 unmangled. 7265 7266 * Do not use the strings '[[' or ']]' anywhere in your code. The 7267 former is not valid in C, except within comments and strings, but 7268 the latter is valid in code such as 'x[y[z]]'. The solution is 7269 simple. To get the literal string '"]]"', use '"]""]"'. To get 7270 the array notation 'x[y[z]]', use 'x[y[z] ]'. Flex will attempt to 7271 detect these sequences in user code, and escape them. However, 7272 it's best to avoid this complexity where possible, by removing such 7273 sequences from your code. 7274 7275 'm4' is only required at the time you run 'flex'. The generated 7276scanner is ordinary C or C++, and does _not_ require 'm4'. 7277 7278 ---------- Footnotes ---------- 7279 7280 (1) The use of m4 is subject to change in future revisions of flex. 7281It is not part of the public API of flex. Do not depend on it. 7282 7283 7284File: flex.info, Node: Common Patterns, Prev: M4 Dependency, Up: Appendices 7285 7286A.4 Common Patterns 7287=================== 7288 7289This appendix provides examples of common regular expressions you might 7290use in your scanner. 7291 7292* Menu: 7293 7294* Numbers:: 7295* Identifiers:: 7296* Quoted Constructs:: 7297* Addresses:: 7298 7299 7300File: flex.info, Node: Numbers, Next: Identifiers, Up: Common Patterns 7301 7302A.4.1 Numbers 7303------------- 7304 7305C99 decimal constant 7306 '([[:digit:]]{-}[0])[[:digit:]]*' 7307 7308C99 hexadecimal constant 7309 '0[xX][[:xdigit:]]+' 7310 7311C99 octal constant 7312 '0[01234567]*' 7313 7314C99 floating point constant 7315 {dseq} ([[:digit:]]+) 7316 {dseq_opt} ([[:digit:]]*) 7317 {frac} (({dseq_opt}"."{dseq})|{dseq}".") 7318 {exp} ([eE][+-]?{dseq}) 7319 {exp_opt} ({exp}?) 7320 {fsuff} [flFL] 7321 {fsuff_opt} ({fsuff}?) 
          {hpref}     (0[xX])
          {hdseq}     ([[:xdigit:]]+)
          {hdseq_opt} ([[:xdigit:]]*)
          {hfrac}     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
          {bexp}      ([pP][+-]?{dseq})
          {dfc}       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
          {hfc}       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

          {c99_floating_point_constant}  ({dfc}|{hfc})

     See C99 section 6.4.4.2 for the gory details.


File: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns

A.4.2 Identifiers
-----------------

C99 Identifier
          ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
          nondigit   [_[:alpha:]]
          c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*

     Technically, the above pattern does not encompass all possible C99
     identifiers, since C99 allows for "implementation-defined"
     characters.  In practice, C compilers follow the above pattern,
     with the addition of the '$' character.

UTF-8 Encoded Unicode Code Point
          [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})


File: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns

A.4.3 Quoted Constructs
-----------------------

C99 String Literal
     'L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([0-7]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'

C99 Comment
     '("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'

     Note that in C99, a '//'-style comment may be split across lines,
     and, contrary to popular belief, does not include the trailing '\n'
     character.

     A better way to scan '/* */' comments is by line, rather than
     matching possibly huge comments all at once.
     This will allow you
     to scan comments of unlimited length, as long as line breaks appear
     at sane intervals.  This is also more efficient when used with
     automatic line number processing.  *Note option-yylineno::.

          <INITIAL>{
              "/*"      BEGIN(COMMENT);
          }
          <COMMENT>{
              "*/"      BEGIN(0);
              [^*\n]+   ;
              "*"       ;
              \n        ;
          }


File: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns

A.4.4 Addresses
---------------

IPv4 Address
          dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
          IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}

IPv6 Address
          h16           [0-9A-Fa-f]{1,4}
          ls32          {h16}:{h16}|{IPv4address}
          IPv6address   ({h16}:){6}{ls32}|
                        ::({h16}:){5}{ls32}|
                        ({h16})?::({h16}:){4}{ls32}|
                        (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
                        (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
                        (({h16}:){0,3}{h16})?::{h16}:{ls32}|
                        (({h16}:){0,4}{h16})?::{ls32}|
                        (({h16}:){0,5}{h16})?::{h16}|
                        (({h16}:){0,6}{h16})?::

     See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
     Note that you have to fold the definition of 'IPv6address' into one
     line and that it also matches the "unspecified address" "::".

URI
     '(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

     This pattern is nearly useless, since it allows just about any
     character to appear in a URI, including spaces and control
     characters.  See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) for
     details.
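
   The 'dec-octet' and 'IPv4address' definitions above accept four decimal
fields in the range 0 to 255, separated by dots, with no leading zeros.  When
checking such a pattern it can help to compare it against the same rule
written out by hand.  The following is a minimal C sketch of that rule; the
function name 'is_ipv4' is hypothetical, and unlike a scanner rule it tests a
whole NUL-terminated string rather than a prefix of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s is exactly four dot-separated fields, each matching
 * dec-octet (0..255, no leading zeros); return 0 otherwise. */
static int is_ipv4(const char *s)
{
    assert(s != NULL);
    for (int part = 0; part < 4; part++) {
        int value = 0;
        int digits = 0;

        /* dec-octet allows "0" but not "00" or "01". */
        if (s[0] == '0' && s[1] >= '0' && s[1] <= '9')
            return 0;

        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (*s - '0');
            if (++digits > 3 || value > 255)
                return 0;
            s++;
        }
        if (digits == 0)
            return 0;

        /* Fields one to three must be followed by a dot. */
        if (part < 3) {
            if (*s != '.')
                return 0;
            s++;
        }
    }
    return *s == '\0';
}
```

   Strings such as "192.168.0.255" pass, while "256.1.1.1", "01.2.3.4" and
"1.2.3" are rejected, which is the behavior the 'IPv4address' pattern is meant
to have.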
7419 7420 7421File: flex.info, Node: Indices, Prev: Appendices, Up: Top 7422 7423Indices 7424******* 7425 7426* Menu: 7427 7428* Concept Index:: 7429* Index of Functions and Macros:: 7430* Index of Variables:: 7431* Index of Data Types:: 7432* Index of Hooks:: 7433* Index of Scanner Options:: 7434 7435 7436File: flex.info, Node: Concept Index, Next: Index of Functions and Macros, Prev: Indices, Up: Indices 7437 7438Concept Index 7439============= 7440 7441[index] 7442* Menu: 7443 7444* $ as normal character in patterns: Patterns. (line 275) 7445* %array, advantages of: Matching. (line 43) 7446* %array, use of: Matching. (line 29) 7447* %array, with C++: Matching. (line 65) 7448* %option noyywrapp: Generated Scanner. (line 93) 7449* %pointer, and unput(): Actions. (line 162) 7450* %pointer, use of: Matching. (line 29) 7451* %top: Definitions Section. (line 44) 7452* %{ and %}, in Definitions Section: Definitions Section. (line 40) 7453* %{ and %}, in Rules Section: Actions. (line 26) 7454* <<EOF>>, use of: EOF. (line 33) 7455* [] in patterns: Patterns. (line 15) 7456* ^ as non-special character in patterns: Patterns. (line 275) 7457* |, in actions: Actions. (line 33) 7458* |, use of: Actions. (line 83) 7459* accessor functions, use of: Accessor Methods. (line 18) 7460* actions: Actions. (line 6) 7461* actions, embedded C strings: Actions. (line 26) 7462* actions, redefining YY_BREAK: Misc Macros. (line 49) 7463* actions, use of { and }: Actions. (line 26) 7464* aliases, how to define: Definitions Section. (line 10) 7465* arguments, command-line: Scanner Options. (line 6) 7466* array, default size for yytext: User Values. (line 13) 7467* backing up, eliminating: Performance. (line 54) 7468* backing up, eliminating by adding error rules: Performance. (line 104) 7469* backing up, eliminating with catch-all rule: Performance. (line 118) 7470* backing up, example of eliminating: Performance. (line 49) 7471* BEGIN: Actions. 
          (line 57)
* BEGIN, explanation:  Start Conditions.  (line 84)
* beginning of line, in patterns:  Patterns.  (line 127)
* bison, bridging with flex:  Bison Bridge.  (line 6)
* bison, parser:  Bison Bridge.  (line 53)
* bison, scanner to be called from bison:  Bison Bridge.  (line 34)
* BOL, checking the BOL flag:  Misc Macros.  (line 46)
* BOL, in patterns:  Patterns.  (line 127)
* BOL, setting it:  Misc Macros.  (line 40)
* braces in patterns:  Patterns.  (line 42)
* bugs, reporting:  Reporting Bugs.  (line 6)
* C code in flex input:  Definitions Section.  (line 40)
* C++:  Cxx.  (line 9)
* C++ and %array:  User Values.  (line 23)
* C++ I/O, customizing:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* C++ scanners, including multiple scanners:  Cxx.  (line 197)
* C++ scanners, use of:  Cxx.  (line 128)
* c++, experimental form of scanner class:  Cxx.  (line 6)
* C++, multiple different scanners:  Cxx.  (line 192)
* C-strings, in actions:  Actions.  (line 26)
* case-insensitive, effect on character classes:  Patterns.  (line 216)
* character classes in patterns:  Patterns.  (line 186)
* character classes in patterns, syntax of:  Patterns.  (line 15)
* character classes, equivalence of:  Patterns.  (line 205)
* clearing an input buffer:  Multiple Input Buffers.
          (line 66)
* command-line options:  Scanner Options.  (line 6)
* comments in flex input:  Definitions Section.  (line 37)
* comments in the input:  Comments in the Input.
          (line 24)
* comments, discarding:  Actions.  (line 176)
* comments, example of scanning C comments:  Start Conditions.  (line 140)
* comments, in actions:  Actions.  (line 26)
* comments, in rules section:  Comments in the Input.
          (line 11)
* comments, syntax of:  Comments in the Input.
          (line 6)
* comments, valid uses of:  Comments in the Input.
          (line 24)
* compressing whitespace:  Actions.
          (line 22)
* concatenation, in patterns:  Patterns.  (line 111)
* copyright of flex:  Copyright.  (line 6)
* counting characters and lines:  Simple Examples.  (line 23)
* customizing I/O in C++ scanners:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* default rule:  Simple Examples.  (line 15)
* default rule <1>:  Matching.  (line 20)
* defining pattern aliases:  Definitions Section.  (line 21)
* Definitions, in flex input:  Definitions Section.  (line 6)
* deleting lines from input:  Actions.  (line 13)
* discarding C comments:  Actions.  (line 176)
* distributing flex:  Copyright.  (line 6)
* ECHO:  Actions.  (line 54)
* ECHO, and yyout:  Generated Scanner.  (line 101)
* embedding C code in flex input:  Definitions Section.  (line 40)
* end of file, in patterns:  Patterns.  (line 150)
* end of line, in negated character classes:  Patterns.  (line 237)
* end of line, in patterns:  Patterns.  (line 131)
* end-of-file, and yyrestart():  Generated Scanner.  (line 42)
* EOF and yyrestart():  Generated Scanner.  (line 42)
* EOF in patterns, syntax of:  Patterns.  (line 150)
* EOF, example using multiple input buffers:  Multiple Input Buffers.
          (line 81)
* EOF, explanation:  EOF.  (line 6)
* EOF, pushing back:  Actions.  (line 170)
* EOL, in negated character classes:  Patterns.  (line 237)
* EOL, in patterns:  Patterns.  (line 131)
* error messages, end of buffer missed:  Lex and Posix.  (line 50)
* error reporting, diagnostic messages:  Diagnostics.  (line 6)
* error reporting, in C++:  Cxx.  (line 112)
* error rules, to eliminate backing up:  Performance.  (line 102)
* escape sequences in patterns, syntax of:  Patterns.  (line 57)
* exiting with yyterminate():  Actions.  (line 212)
* experimental form of c++ scanner class:  Cxx.  (line 6)
* extended scope of start conditions:  Start Conditions.  (line 270)
* file format:  Format.
          (line 6)
* file format, serialized tables:  Tables File Format.  (line 6)
* flushing an input buffer:  Multiple Input Buffers.
          (line 66)
* flushing the internal buffer:  Actions.  (line 206)
* format of flex input:  Format.  (line 6)
* format of input file:  Format.  (line 9)
* freeing tables:  Loading and Unloading Serialized Tables.
          (line 6)
* getting current start state with YY_START:  Start Conditions.
          (line 189)
* halting with yyterminate():  Actions.  (line 212)
* handling include files with multiple input buffers:  Multiple Input Buffers.
          (line 87)
* handling include files with multiple input buffers <1>:  Multiple Input Buffers.
          (line 122)
* header files, with C++:  Cxx.  (line 197)
* include files, with C++:  Cxx.  (line 197)
* input file, Definitions section:  Definitions Section.  (line 6)
* input file, Rules Section:  Rules Section.  (line 6)
* input file, user code Section:  User Code Section.  (line 6)
* input():  Actions.  (line 173)
* input(), and C++:  Actions.  (line 202)
* input, format of:  Format.  (line 6)
* input, matching:  Matching.  (line 6)
* keywords, for performance:  Performance.  (line 200)
* lex (traditional) and POSIX:  Lex and Posix.  (line 6)
* LexerInput, overriding:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* LexerOutput, overriding:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* limitations of flex:  Limitations.  (line 6)
* literal text in patterns, syntax of:  Patterns.  (line 54)
* loading tables at runtime:  Loading and Unloading Serialized Tables.
          (line 6)
* m4:  M4 Dependency.  (line 6)
* Makefile, example of implicit rules:  Makefiles and Flex.  (line 21)
* Makefile, explicit example:  Makefiles and Flex.  (line 33)
* Makefile, syntax:  Makefiles and Flex.  (line 6)
* matching C-style double-quoted strings:  Start Conditions.
          (line 203)
* matching, and trailing context:  Matching.  (line 6)
* matching, length of:  Matching.  (line 6)
* matching, multiple matches:  Matching.  (line 6)
* member functions, C++:  Cxx.  (line 9)
* memory management:  Memory Management.  (line 6)
* memory, allocating input buffers:  Multiple Input Buffers.
          (line 19)
* memory, considerations for reentrant scanners:  Init and Destroy Functions.
          (line 6)
* memory, deleting input buffers:  Multiple Input Buffers.
          (line 46)
* memory, for start condition stacks:  Start Conditions.  (line 301)
* memory, serialized tables:  Serialized Tables.  (line 6)
* memory, serialized tables <1>:  Loading and Unloading Serialized Tables.
          (line 6)
* methods, c++:  Cxx.  (line 9)
* minimal scanner:  Matching.  (line 24)
* multiple input streams:  Multiple Input Buffers.
          (line 6)
* name definitions, not POSIX:  Lex and Posix.  (line 75)
* negating ranges in patterns:  Patterns.  (line 23)
* newline, matching in patterns:  Patterns.  (line 135)
* non-POSIX features of flex:  Lex and Posix.  (line 142)
* noyywrap, %option:  Generated Scanner.  (line 93)
* NULL character in patterns, syntax of:  Patterns.  (line 62)
* octal characters in patterns:  Patterns.  (line 65)
* options, command-line:  Scanner Options.  (line 6)
* overriding LexerInput:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* overriding LexerOutput:  How do I use my own I/O classes in a C++ scanner?.
          (line 9)
* overriding the memory routines:  Overriding The Default Memory Management.
          (line 38)
* Pascal-like language:  Simple Examples.  (line 49)
* pattern aliases, defining:  Definitions Section.  (line 21)
* pattern aliases, expansion of:  Patterns.  (line 51)
* pattern aliases, how to define:  Definitions Section.  (line 10)
* pattern aliases, use of:  Definitions Section.  (line 28)
* patterns and actions on different lines:  Lex and Posix.
          (line 101)
* patterns, character class equivalence:  Patterns.  (line 205)
* patterns, common:  Common Patterns.  (line 6)
* patterns, end of line:  Patterns.  (line 300)
* patterns, grouping and precedence:  Patterns.  (line 167)
* patterns, in rules section:  Patterns.  (line 6)
* patterns, invalid trailing context:  Patterns.  (line 285)
* patterns, matching:  Matching.  (line 6)
* patterns, precedence of operators:  Patterns.  (line 161)
* patterns, repetitions with grouping:  Patterns.  (line 184)
* patterns, special characters treated as non-special:  Patterns.
          (line 293)
* patterns, syntax:  Patterns.  (line 9)
* patterns, syntax <1>:  Patterns.  (line 9)
* patterns, tuning for performance:  Performance.  (line 49)
* patterns, valid character classes:  Patterns.  (line 192)
* performance optimization, matching longer tokens:  Performance.
          (line 167)
* performance optimization, recognizing keywords:  Performance.
          (line 205)
* performance, backing up:  Performance.  (line 49)
* performance, considerations:  Performance.  (line 6)
* performance, using keywords:  Performance.  (line 200)
* popping an input buffer:  Multiple Input Buffers.
          (line 60)
* POSIX and lex:  Lex and Posix.  (line 6)
* POSIX compliance:  Lex and Posix.  (line 142)
* POSIX, character classes in patterns, syntax of:  Patterns.  (line 15)
* preprocessor macros, for use in actions:  Actions.  (line 50)
* pushing an input buffer:  Multiple Input Buffers.
          (line 52)
* pushing back characters with unput:  Actions.  (line 143)
* pushing back characters with unput():  Actions.  (line 147)
* pushing back characters with yyless:  Actions.  (line 131)
* pushing back EOF:  Actions.  (line 170)
* ranges in patterns:  Patterns.  (line 19)
* ranges in patterns, negating:  Patterns.  (line 23)
* recognizing C comments:  Start Conditions.
          (line 143)
* reentrant scanners, multiple interleaved scanners:  Reentrant Uses.
          (line 10)
* reentrant scanners, recursive invocation:  Reentrant Uses.  (line 30)
* reentrant, accessing flex variables:  Global Replacement.  (line 6)
* reentrant, accessor functions:  Accessor Methods.  (line 6)
* reentrant, API explanation:  Reentrant Overview.  (line 6)
* reentrant, calling functions:  Extra Reentrant Argument.
          (line 6)
* reentrant, example of:  Reentrant Example.  (line 6)
* reentrant, explanation:  Reentrant.  (line 6)
* reentrant, extra data:  Extra Data.  (line 6)
* reentrant, initialization:  Init and Destroy Functions.
          (line 6)
* regular expressions, in patterns:  Patterns.  (line 6)
* REJECT:  Actions.  (line 61)
* REJECT, calling multiple times:  Actions.  (line 83)
* REJECT, performance costs:  Performance.  (line 12)
* reporting bugs:  Reporting Bugs.  (line 6)
* restarting the scanner:  Lex and Posix.  (line 54)
* RETURN, within actions:  Generated Scanner.  (line 57)
* rules, default:  Simple Examples.  (line 15)
* rules, in flex input:  Rules Section.  (line 6)
* scanner, definition of:  Introduction.  (line 6)
* sections of flex input:  Format.  (line 6)
* serialization:  Serialized Tables.  (line 6)
* serialization of tables:  Creating Serialized Tables.
          (line 6)
* serialized tables, multiple scanners:  Creating Serialized Tables.
          (line 26)
* stack, input buffer pop:  Multiple Input Buffers.
          (line 60)
* stack, input buffer push:  Multiple Input Buffers.
          (line 52)
* stacks, routines for manipulating:  Start Conditions.  (line 286)
* start condition, applying to multiple patterns:  Start Conditions.
          (line 258)
* start conditions:  Start Conditions.  (line 6)
* start conditions, behavior of default rule:  Start Conditions.
          (line 82)
* start conditions, exclusive:  Start Conditions.
          (line 53)
* start conditions, for different interpretations of same input:  Start Conditions.
          (line 112)
* start conditions, in patterns:  Patterns.  (line 140)
* start conditions, inclusive:  Start Conditions.  (line 44)
* start conditions, inclusive vs. exclusive:  Start Conditions.
          (line 24)
* start conditions, integer values:  Start Conditions.  (line 163)
* start conditions, multiple:  Start Conditions.  (line 17)
* start conditions, special wildcard condition:  Start Conditions.
          (line 68)
* start conditions, use of a stack:  Start Conditions.  (line 286)
* start conditions, use of wildcard condition (<*>):  Start Conditions.
          (line 72)
* start conditions, using BEGIN:  Start Conditions.  (line 95)
* stdin, default for yyin:  Generated Scanner.  (line 37)
* stdout, as default for yyout:  Generated Scanner.  (line 101)
* strings, scanning strings instead of files:  Multiple Input Buffers.
          (line 175)
* tables, creating serialized:  Creating Serialized Tables.
          (line 6)
* tables, file format:  Tables File Format.  (line 6)
* tables, freeing:  Loading and Unloading Serialized Tables.
          (line 6)
* tables, loading and unloading:  Loading and Unloading Serialized Tables.
          (line 6)
* terminating with yyterminate():  Actions.  (line 212)
* token:  Matching.  (line 14)
* trailing context, in patterns:  Patterns.  (line 118)
* trailing context, limits of:  Patterns.  (line 275)
* trailing context, matching:  Matching.  (line 6)
* trailing context, performance costs:  Performance.  (line 12)
* trailing context, variable length:  Performance.  (line 141)
* unput():  Actions.  (line 143)
* unput(), and %pointer:  Actions.  (line 162)
* unput(), pushing back characters:  Actions.  (line 147)
* user code, in flex input:  User Code Section.  (line 6)
* username expansion:  Simple Examples.  (line 8)
* using integer values of start condition names:  Start Conditions.
          (line 163)
* verbatim text in patterns, syntax of:  Patterns.  (line 54)
* warning, dangerous trailing context:  Limitations.  (line 20)
* warning, rule cannot be matched:  Diagnostics.  (line 14)
* warnings, diagnostic messages:  Diagnostics.  (line 6)
* whitespace, compressing:  Actions.  (line 22)
* yacc interface:  Yacc.  (line 17)
* yacc, interface:  Yacc.  (line 6)
* yyalloc, overriding:  Overriding The Default Memory Management.
          (line 6)
* yyfree, overriding:  Overriding The Default Memory Management.
          (line 6)
* yyin:  Generated Scanner.  (line 37)
* yyinput():  Actions.  (line 202)
* yyleng:  Matching.  (line 14)
* yyleng, modification of:  Actions.  (line 47)
* yyless():  Actions.  (line 125)
* yyless(), pushing back characters:  Actions.  (line 131)
* yylex(), in generated scanner:  Generated Scanner.  (line 6)
* yylex(), overriding:  Generated Scanner.  (line 16)
* yylex, overriding the prototype of:  Generated Scanner.  (line 20)
* yylineno, in a reentrant scanner:  Reentrant Functions.  (line 36)
* yylineno, performance costs:  Performance.  (line 12)
* yymore():  Actions.  (line 104)
* yymore() to append token to previous token:  Actions.  (line 110)
* yymore(), mega-kludge:  Actions.  (line 110)
* yymore, and yyleng:  Actions.  (line 47)
* yymore, performance penalty of:  Actions.  (line 119)
* yyout:  Generated Scanner.  (line 101)
* yyrealloc, overriding:  Overriding The Default Memory Management.
          (line 6)
* yyrestart():  Generated Scanner.  (line 42)
* yyterminate():  Actions.  (line 212)
* yytext:  Matching.  (line 14)
* yytext, default array size:  User Values.  (line 13)
* yytext, memory considerations:  A Note About yytext And Memory.
          (line 6)
* yytext, modification of:  Actions.  (line 42)
* yytext, two types of:  Matching.  (line 29)
* yywrap():  Generated Scanner.  (line 85)
* yywrap, default for:  Generated Scanner.
          (line 93)
* YY_CURRENT_BUFFER, and multiple buffers:  Multiple Input Buffers.
          (line 78)
* YY_EXTRA_TYPE, defining your own type:  Extra Data.  (line 33)
* YY_FLUSH_BUFFER:  Actions.  (line 206)
* YY_INPUT:  Generated Scanner.  (line 61)
* YY_INPUT, overriding:  Generated Scanner.  (line 71)
* YY_START, example:  Start Conditions.  (line 185)
* YY_USER_ACTION to track each time a rule is matched:  Misc Macros.
          (line 14)
