#
# Module Parse::Yapp.pm.
#
# Copyright (c) 1998-2001, Francois Desarmenien, all right reserved.
#
# See the Copyright section at the end of the Parse/Yapp.pm pod section
# for usage and distribution rights.
#
#
package Parse::Yapp;

use strict;
use vars qw($VERSION @ISA);
@ISA = qw(Parse::Yapp::Output);

use Parse::Yapp::Output;

# $VERSION is in Parse/Yapp/Driver.pm


1;

__END__

=head1 NAME

Parse::Yapp - Perl extension for generating and using LALR parsers.

=head1 SYNOPSIS

    yapp -m MyParser grammar_file.yp

    ...

    use MyParser;

    $parser=new MyParser();
    $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

    $nberr=$parser->YYNberr();

    $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];

    $data=$parser->YYData->{DATA}[0];

=head1 DESCRIPTION

Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules
that let you generate and use yacc-like, thread-safe (reentrant) parsers
with a Perl object-oriented interface.

The script yapp is a front-end to the Parse::Yapp module and lets you
easily create a Perl OO parser from an input grammar file.

=head2 The Grammar file

=over 4

=item C<Comments>

Throughout your grammar files, comments are either Perl style, introduced
by I<#> up to the end of line, or C style, enclosed between I</*> and I<*/>.


=item C<Tokens and string literals>


Throughout the grammar files, two kinds of symbols may appear:
I<Non-terminal> symbols, also called I<left-hand-side> symbols,
which are the names of your rules, and I<Terminal> symbols, also
called I<Tokens>.

Tokens are the symbols your lexer function will feed your parser with
(see below). They come in two flavours: symbolic tokens and string
literals.


Non-terminals and symbolic tokens share the same identifier syntax:

    [A-Za-z][A-Za-z0-9_]*

String literals are enclosed in single quotes and can contain almost
anything. They are written to your parser file double-quoted, so
special characters are interpreted as such. '"', '$' and '@' are
automatically escaped with '\', which makes them more natural to write.
On the other hand, if you need a single quote inside your literal, just
escape it with '\'.

You cannot have a literal I<'error'> in your grammar as it would
confuse the driver with the I<error> token. Use a symbolic token instead.
If you inadvertently use it, yapp will produce a warning telling you
that you should have written it I<error>, and will treat it as if it were
the I<error> token, which is certainly NOT what you meant.


=item C<Grammar file syntax>

It is very close to yacc syntax (in fact, I<Parse::Yapp> should compile
a clean I<yacc> grammar without any modification, whereas the opposite
is not true).

This file is divided into three sections, separated by C<%%>:

    header section
    %%
    rules section
    %%
    footer section

=over 4

=item B<The Header Section> may optionally contain:

=item *

One or more code blocks enclosed inside C<%{> and C<%}> just like in
yacc. They may contain any valid Perl code and will be copied verbatim
at the very beginning of the parser module. They are not as useful as
they are in yacc, but you can use them, for example, for global variable
declarations, though you will notice later that such global variables can
be avoided to make a reentrant parser module.

=item *

Precedence declarations, introduced by C<%left>, C<%right> and C<%nonassoc>
specifying associativity, followed by the list of tokens or literals
having the same precedence and associativity.
The later a declaration appears, the higher its precedence level.
(See the yacc or bison manuals for a full explanation of how they work,
as they are implemented exactly the same way in Parse::Yapp.)

=item *

C<%start> followed by a rule's left hand side, declaring this rule to
be the starting rule of your grammar. The default, when C<%start> is not
used, is the first rule in your grammar section.

=item *

C<%token> followed by a list of symbols, forcing them to be recognized
as tokens, generating a syntax error if used in the left hand side of
a rule declaration.
Note that in Parse::Yapp, you I<don't> need to declare tokens as in yacc: any
symbol not appearing as a left hand side of a rule is considered to be
a token.
Other yacc declarations or constructs such as C<%type> and C<%union> are
parsed but (almost) ignored.

=item *

C<%expect> followed by a number, which suppresses warnings about the number
of shift/reduce conflicts when both numbers match, a la bison.


=item B<The Rule Section> contains your grammar rules:

A rule is made of a left-hand-side symbol, followed by a C<':'> and one
or more right-hand-sides separated by C<'|'> and terminated by a C<';'>:

    exp:    exp '+' exp
        |   exp '-' exp
        ;

A right hand side may be empty:

    input:  #empty
        |   input line
        ;

(If you have more than one empty rhs, Parse::Yapp will issue a warning,
as this is usually a mistake, and you will certainly have a reduce/reduce
conflict.)


A rhs may be followed by an optional C<%prec> directive, followed
by a token, giving the rule an explicit precedence (see yacc manuals
for its precise meaning), and by an optional semantic action code block
(see below).

    exp:    '-' exp %prec NEG { -$_[2] }
        |   exp '+' exp       { $_[1] + $_[3] }
        |   NUM
        ;

Note that in Parse::Yapp, a lhs I<cannot> appear more than once as
a rule name (this differs from yacc).

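
As a sketch of how the header and rule sections described above fit
together, here is a small calculator grammar. The token names NUM and NEG
and the choice of operators are illustrative assumptions, not something
Parse::Yapp defines:

    # Header section: precedence levels, lowest first
    %left   '+' '-'
    %left   '*' '/'
    %left   NEG

    %%

    # Rule section: one lhs, several right-hand sides
    exp:    NUM
        |   exp '+' exp         { $_[1] + $_[3] }
        |   exp '-' exp         { $_[1] - $_[3] }
        |   exp '*' exp         { $_[1] * $_[3] }
        |   exp '/' exp         { $_[1] / $_[3] }
        |   '-' exp %prec NEG   { -$_[2] }   # $_[1] is the '-' literal
        |   '(' exp ')'         { $_[2] }
        ;

    %%

    # Footer section: the lexer and error subs would go here

In this sketch NEG is never returned by the lexer. It only exists so that
C<%prec> can give unary minus a higher precedence than the binary
operators.
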


=item B<The Footer Section>

may contain any valid Perl code and will be appended at the very end
of your parser module. Here you can write your lexer, error report
subs and anything relevant to your parser.

=back

=item C<Semantic actions>

Semantic actions are run every time a I<reduction> occurs in the
parsing flow and they must return a semantic value.

They are (usually, but see below C<In rule actions>) written at
the very end of the rhs, enclosed with C<{ }>, and are copied verbatim
to your parser file, inside of the rules table.

Be aware that matching braces in Perl is much more difficult than
in C: inside strings they don't need to match. While in C it is
very easy to detect the beginning of a string construct, or a
single character, it is much more difficult in Perl, as there
are so many ways of writing such literals. So there is no check
for that today. If you need a brace in a double-quoted string, just
quote it (C<\{> or C<\}>). For single-quoted strings, you will need
to add a comment matching it I<in the right order>.
Sorry for the inconvenience.

    {
        "{ My string block }".
        "\{ My other string block \}".
        qq/ My unmatched brace \} /.
        # Force the match: {
        q/ for my closing brace } /.
        q/ My opening brace { /
        # must be closed: }
    }

All of these constructs should work.


In Parse::Yapp, semantic actions are called like normal Perl sub calls,
with their arguments passed in C<@_>, and their semantic values are
their return values.

$_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
$_[0] is the parser object itself.

Having $_[0] be the parser object itself allows you to call
parser methods.
That's how the yacc macros are implemented:

    yyerrok is done by calling $_[0]->YYErrok
    YYERROR is done by calling $_[0]->YYError
    YYACCEPT is done by calling $_[0]->YYAccept
    YYABORT is done by calling $_[0]->YYAbort

All those methods explicitly return I<undef>, for convenience.

    YYRECOVERING is done by calling $_[0]->YYRecovering

Four useful methods in an error recovery sub

    $_[0]->YYCurtok
    $_[0]->YYCurval
    $_[0]->YYExpect
    $_[0]->YYLexer

return respectively the current input token that made the parse fail,
its semantic value (both can be used to modify their values too, but
I<know what you are doing>! See the I<Error reporting routine> section for
an example), a list which contains the tokens the parser expected when
the failure occurred, and a reference to the lexer routine.

Note that if C<$_[0]-E<gt>YYCurtok> is declared as a C<%nonassoc> token,
it can be included in the C<$_[0]-E<gt>YYExpect> list whenever the input
tries to use it in an associative way. This is not a bug: the token
IS expected to report an error if encountered.

To detect such a thing in your error reporting sub, the following
example should do the trick:

    grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
    and do {
        #Non-associative token used in an associative expression
    };

Semantic values on the left of the rule being reduced are accessed
through the method

    $_[0]->YYSemval( index )

where index is an integer. Values I<1 .. n> return the same values
as I<$_[1] .. $_[n]>, but I<-n .. 0> return values on the left of the rule
being reduced (It is related to I<$-n .. $0 ..
$n> in yacc, but you
cannot use I<$_[0]> or I<$_[-n]> constructs in Parse::Yapp for obvious
reasons).


There is also a provision for a user data area in the parser object,
accessed by the method:

    $_[0]->YYData

which returns a reference to an anonymous hash. This lets you keep
all of your parsing data inside the object (see the Calc.yp
or ParseYapp.yp files in the distribution for some examples).
That's how you can make your parser module reentrant: all of your
module states and variables are held inside the parser object.

Note: unfortunately, method calls in Perl have a lot of overhead,
and when YYData is used, it may be called a huge number of times.
If you are not a *real* purist and efficiency is your concern, you may
directly access the user space in the object: $parser->{USER}, which is
a reference to an anonymous hash, and then benchmark.

If no action is specified for a rule, the equivalent of a default
action is run, which returns the first parameter:

    { $_[1] }

=item C<In rule actions>

It is also possible to embed semantic actions inside of a rule:

    typedef:    TYPE { $type = $_[1] } identlist { ... } ;

When the Parse::Yapp parser encounters such an embedded action, it modifies
the grammar as if you wrote (although @x-1 is not a legal lhs value):

    @x-1:   /* empty */ { $type = $_[1] };
    typedef:    TYPE @x-1 identlist { ... } ;

where I<x> is a sequential number incremented for each "in rule" action,
and I<-1> represents the "dot position" in the rule where the action arises.

In such actions, you can use I<$_[1]..$_[n]> variables, which are the
semantic values on the left of your action.

Be aware that the way Parse::Yapp modifies your grammar because of
I<in rule actions> can produce, in some cases, spurious conflicts
that wouldn't happen otherwise.

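
The in-rule action shown above stores the type in a global variable.
A reentrant variant, sketched here, keeps it in the user data area
instead. The hash key C<type>, the IDENT token and the identlist rule
are hypothetical names chosen for illustration:

    typedef:    TYPE { $_[0]->YYData->{type} = $_[1] } identlist
                    { $_[3] }   # the in-rule action counts as $_[2]
        ;

    identlist:  IDENT
                    { [ [ $_[0]->YYData->{type}, $_[1] ] ] }
        |       identlist ',' IDENT
                    { push @{$_[1]}, [ $_[0]->YYData->{type}, $_[3] ]; $_[1] }
        ;

Each identifier is paired with the type recorded by the in-rule action,
and no state lives outside the parser object.
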

=item C<Generating the Parser Module>

Now that your grammar file is written, you can use yapp on it
to generate your parser module:

    yapp -v Calc.yp

will create two files: F<Calc.pm>, your parser module, and F<Calc.output>,
a verbose output of your parser rules, conflicts, warnings, states
and summary.

What you are missing now is a lexer routine.

=item C<The Lexer sub>

is called each time the parser needs to read the next token.

It is called with only one argument, the parser object itself,
so you can access its methods, especially the

    $_[0]->YYData

data area.

It is its duty to return the next token and value to the parser.
They C<must> be returned as a list of two values: the first one
is the token known by the parser (symbolic or literal), the second
one can be anything you want (usually the content of the token, or the
literal value), from a simple scalar value to any complex reference,
as the parsing driver never uses it except to pass it to semantic actions:

    ( 'NUMBER', $num )
or
    ( '>=', '>=' )
or
    ( 'ARRAY', [ @values ] )

When the lexer reaches the end of input, it must return the C<''>
empty token with an undef value:

    ( '', undef )

Note that your lexer should I<never> return C<'error'> as token
value: for the driver, this is the error token used for error
recovery and would lead to odd reactions.

Now that you have your lexer written, maybe you will need to output
meaningful error messages, instead of the default which is to print
'Parse error.' on STDERR.

So you will need an Error reporting sub.

=item C<Error reporting routine>

If you want one, write it knowing that it is passed the parser object
as its parameter. So you can share information with the lexer
routine quite easily.

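
As a minimal sketch of the two subs described above, the following lexer
and error reporting routine share their state through the parser object.
The C<INPUT> key, the NUM token and the sub names are assumptions made
for this example, not names required by Parse::Yapp:

    use strict;
    use warnings;

    # Lexer: called with the parser object, returns (token, value).
    sub Lexer {
        my ($parser) = @_;
        my $data = $parser->YYData;

        # Skip blanks; nothing left means end of input.
        $data->{INPUT} =~ s/^[ \t]+//;
        return ('', undef) if $data->{INPUT} eq '';

        # A number is a symbolic token carrying its text as value.
        return ('NUM', $1) if $data->{INPUT} =~ s/^(\d+(?:\.\d+)?)//;

        # Any other character is returned as a string literal token.
        $data->{INPUT} =~ s/^(.)//s;
        return ($1, $1);
    }

    # Error sub: also called with the parser object.
    sub ErrorReport {
        my ($parser) = @_;
        my $tok = $parser->YYCurtok;
        my $got = (defined $tok && $tok ne '') ? "'$tok'" : 'end of input';
        print STDERR "Syntax error near $got; expected one of: ",
            join(', ', map { "'$_'" } $parser->YYExpect), "\n";
    }

With a parser generated from a matching grammar, the text to parse would
be stored in the C<INPUT> entry of the YYData hash before YYParse is
called with these two subs.
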

You can also use the C<$_[0]-E<gt>YYErrok> method in it, which will
resume parsing as if no error occurred. Of course, since the invalid
token is still invalid, you're supposed to fix the problem by
yourself.

The method C<$_[0]-E<gt>YYLexer> may help you, as it returns a reference
to the lexer routine, and can be called as

    ($tok,$val)=$_[0]->YYLexer->($_[0])

to get the next token and semantic value from the input stream. To
make them current for the parser, use:

    ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

and know what you're doing...

=item C<Parsing>

Now you've got everything to do the parsing.

First, use the parser module:

    use Calc;

Then create the parser object:

    $parser=new Calc;

Now, call the YYParse method, telling it where to find the lexer
and error report subs:

    $result=$parser->YYParse(yylex => \&Lexer,
                             yyerror => \&ErrorReport);

(assuming Lexer and ErrorReport subs have been written in your current
package)

The order in which parameters appear is unimportant.

Et voila.

The YYParse method will do the parse, then return the last semantic
value returned, or undef if error recovery cannot recover.

If you need to be sure the parse has been successful (in case your
last returned semantic value I<is> undef) make a call to:

    $parser->YYNberr()

which returns the total number of times the error reporting sub has been
called.

=item C<Error Recovery>

in Parse::Yapp is implemented the same way it is in yacc.

=item C<Debugging Parser>

To debug your parser, you can call the YYParse method with a debug parameter:

    $parser->YYParse( ... , yydebug => value, ...
    )

where value is a bitfield, each bit representing a specific debug output:

    Bit Value    Outputs
    0x01         Token reading (useful for Lexer debugging)
    0x02         States information
    0x04         Driver actions (shifts, reduces, accept...)
    0x08         Parse Stack dump
    0x10         Error Recovery tracing

To get full debugging output, use

    yydebug => 0x1F

Debugging output is sent to STDERR; be aware that it can produce
C<huge> outputs.

=item C<Standalone Parsers>

By default, the generated parser modules need the Parse::Yapp
module installed on the system to run. They use the Parse::Yapp::Driver
which can be safely shared between parsers in the same script.

If you'd prefer to have a standalone module generated, use
the C<-s> switch with yapp: this will automagically copy the driver
code into your module so you can use/distribute it without the need
of the Parse::Yapp module, making it really a C<Standalone Parser>.

If you do so, please remember to include Parse::Yapp's copyright notice
in your main module copyright, so others can know about the Parse::Yapp
module.

=item C<Source file line numbers>

by default will be included in the generated parser module, which will help
to find the guilty line in your source file in case of a syntax error.
You can disable this feature by compiling your grammar with yapp using
the C<-n> switch.

=back

=head1 BUGS AND SUGGESTIONS

If you find bugs, think of anything that could improve Parse::Yapp
or have any questions related to it, feel free to contact the author.

=head1 AUTHOR

Francois Desarmenien <francois@fdesar.net>

=head1 SEE ALSO

yapp(1) perl(1) yacc(1) bison(1).

=head1 COPYRIGHT

The Parse::Yapp module and its related modules and shell scripts are copyright
(c) 1998-2001 Francois Desarmenien, France. All rights reserved.


You may use and distribute them under the terms of either
the GNU General Public License or the Artistic License,
as specified in the Perl README file.

If you use the "standalone parser" option so people don't need to install
Parse::Yapp on their systems in order to run your software, this copyright
notice should be included in your software copyright too, and the copyright
notice in the embedded driver should be left untouched.

=cut
