#
# Module Parse::Yapp.pm.
#
# Copyright (c) 1998-2001, Francois Desarmenien, all rights reserved.
#
# See the Copyright section at the end of the Parse/Yapp.pm pod section
# for usage and distribution rights.
#
#
package Parse::Yapp;

use strict;
use vars qw($VERSION @ISA);
@ISA = qw(Parse::Yapp::Output);

use Parse::Yapp::Output;

# $VERSION is in Parse/Yapp/Driver.pm


1;

__END__

=head1 NAME

Parse::Yapp - Perl extension for generating and using LALR parsers.

=head1 SYNOPSIS

  yapp -m MyParser grammar_file.yp

  ...

  use MyParser;

  $parser=MyParser->new();
  $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

  $nberr=$parser->YYNberr();

  $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];

  $data=$parser->YYData->{DATA}[0];

=head1 DESCRIPTION

Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules
that lets you generate and use yacc-like, thread-safe (reentrant) parsers
with a Perl object-oriented interface.

The script yapp is a front-end to the Parse::Yapp module and lets you
easily create a Perl OO parser from an input grammar file.

=head2 The Grammar file

=over 4

=item C<Comments>

Throughout your grammar files, comments are either Perl style, introduced
by I<#> and running to the end of the line, or C style, enclosed between
I</*> and I<*/>.

=item C<Tokens and string literals>

Two kinds of symbols may appear in a grammar file:
I<Non-terminal> symbols, also called I<left-hand-side> symbols,
which are the names of your rules, and I<Terminal> symbols, also
called I<Tokens>.

Tokens are the symbols your lexer function will feed your parser with
(see below). They come in two flavours: symbolic tokens and string
literals.

Non-terminals and symbolic tokens share the same identifier syntax:

    [A-Za-z][A-Za-z0-9_]*
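
As a quick check of that syntax, the following Perl sketch tests a few
candidate names against the same pattern. The helper name
C<is_symbol_name> is made up for this illustration and is not part of
Parse::Yapp:

```perl
use strict;
use warnings;

# Returns 1 when $name is a valid non-terminal or symbolic token name,
# 0 otherwise. The pattern is the identifier syntax given above.
sub is_symbol_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z][A-Za-z0-9_]*\z/ ? 1 : 0;
}

for my $candidate (qw(NUM exp_list 9lives _private)) {
    printf "%-10s %s\n", $candidate,
        is_symbol_name($candidate) ? 'valid' : 'invalid';
}
```

Names must start with a letter, so a leading digit or underscore is
rejected.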

String literals are enclosed in single quotes and can contain almost
anything. They are written to your parser file double-quoted, so any
special character in them is interpreted as such. '"', '$' and '@' are
automatically escaped with '\', which makes literals more natural to write.
On the other hand, if you need a single quote inside your literal, escape
it with '\'.

You cannot have a literal I<'error'> in your grammar, as it would
confuse the driver with the I<error> token. Use a symbolic token instead.
If you inadvertently use it, Parse::Yapp issues a warning telling you
that you should have written I<error>, and treats the literal as if it
were the I<error> token, which is certainly NOT what you meant.
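
The quoting rule for string literals can be pictured with a few lines
of Perl. This is only a sketch of the behaviour described above, not the
code Parse::Yapp actually uses, and C<double_quoted_source> is a
hypothetical helper:

```perl
use strict;
use warnings;

# Illustration only: turn the body of a single-quoted grammar literal
# into the double-quoted Perl source text written to the parser file.
# Only '"', '$' and '@' are escaped, as the text above states.
sub double_quoted_source {
    my ($literal) = @_;
    (my $escaped = $literal) =~ s/(["\$\@])/\\$1/g;
    return qq{"$escaped"};
}

print double_quoted_source('$'),        "\n";
print double_quoted_source('a@b'),      "\n";
print double_quoted_source('say "hi"'), "\n";
```

Anything else, such as C<\n>, is left alone and therefore keeps its
double-quoted meaning in the generated module.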

=item C<Grammar file syntax>

It is very close to yacc syntax (in fact, I<Parse::Yapp> should compile
a clean I<yacc> grammar without any modification, whereas the opposite
is not true).

This file is divided into three sections, separated by C<%%>:

    header section
    %%
    rules section
    %%
    footer section

=over 4

=item B<The Header Section> may optionally contain:

=item *

One or more code blocks enclosed inside C<%{> and C<%}> just like in
yacc. They may contain any valid Perl code and will be copied verbatim
at the very beginning of the parser module. They are not as useful as
they are in yacc, but you can use them, for example, for global variable
declarations, though you will notice later that such global variables can
be avoided to make a reentrant parser module.

=item *

Precedence declarations, introduced by C<%left>, C<%right> and C<%nonassoc>
specifying associativity, followed by the list of tokens or literals
having the same precedence and associativity.
The later a precedence declaration appears, the higher its level
(see the yacc or bison manuals for a full explanation of how they work,
as they are implemented exactly the same way in Parse::Yapp).

=item *

C<%start> followed by a rule's left hand side, declaring this rule to
be the starting rule of your grammar. The default, when C<%start> is not
used, is the first rule in your grammar section.

=item *

C<%token> followed by a list of symbols, forcing them to be recognized
as tokens, generating a syntax error if used in the left hand side of
a rule declaration.
Note that in Parse::Yapp, you I<don't> need to declare tokens as in yacc: any
symbol not appearing as a left hand side of a rule is considered to be
a token.
Other yacc declarations or constructs such as C<%type> and C<%union> are
parsed but (almost) ignored.

=item *

C<%expect> followed by a number suppresses warnings about the number of
shift/reduce conflicts when both numbers match, a la bison.

=item B<The Rule Section> contains your grammar rules:

A rule is made of a left-hand-side symbol, followed by a C<':'> and one
or more right-hand-sides separated by C<'|'> and terminated by a C<';'>:

    exp:    exp '+' exp
        |   exp '-' exp
        ;

A right hand side may be empty:

    input:  #empty
        |   input line
        ;

(If you have more than one empty rhs, Parse::Yapp will issue a warning,
as this is usually a mistake, and you will certainly have a reduce/reduce
conflict.)

A rhs may be followed by an optional C<%prec> directive, followed
by a token, giving the rule an explicit precedence (see yacc manuals
for its precise meaning), and by an optional semantic action code block
(see below).

    exp:   '-' exp %prec NEG { -$_[2] }
        |  exp '+' exp       { $_[1] + $_[3] }
        |  NUM
        ;

Note that in Parse::Yapp, a lhs I<cannot> appear more than once as
a rule name (this differs from yacc).

=item B<The Footer Section>

may contain any valid Perl code and will be appended at the very end
of your parser module. Here you can write your lexer, error report
subs and anything relevant to your parser.

=back

=item C<Semantic actions>

Semantic actions are run every time a I<reduction> occurs in the
parsing flow and they must return a semantic value.

They are (usually, but see below C<In rule actions>) written at
the very end of the rhs, enclosed with C<{ }>, and are copied verbatim
to your parser file, inside of the rules table.

Be aware that matching braces in Perl is much more difficult than
in C: inside strings they don't need to match. While in C it is
very easy to detect the beginning of a string construct, or a
single character, it is much more difficult in Perl, as there
are so many ways of writing such literals. So there is no check
for that today. If you need an unmatched brace in a double-quoted
string, just escape it (C<\{> or C<\}>). For single-quoted strings, you
will need to add a comment containing the matching brace
I<in the right order>.
Sorry for the inconvenience.

    {
        "{ My string block }".
        "\{ My other string block \}".
        qq/ My unmatched brace \} /.
        # Force the match: {
        q/ for my closing brace } /.
        q/ My opening brace { /
        # must be closed: }
    }

All of these constructs should work.

In Parse::Yapp, semantic actions are called like normal Perl sub calls,
with their arguments passed in C<@_>, and their semantic values are
their return values.

$_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
$_[0] is the parser object itself.

Having $_[0] being the parser object itself allows you to call
parser methods. That's how the yacc macros are implemented:

    yyerrok is done by calling $_[0]->YYErrok
    YYERROR is done by calling $_[0]->YYError
    YYACCEPT is done by calling $_[0]->YYAccept
    YYABORT is done by calling $_[0]->YYAbort

All those methods explicitly return I<undef>, for convenience.

    YYRECOVERING is done by calling $_[0]->YYRecovering

Four methods are useful in the error recovery sub:

    $_[0]->YYCurtok
    $_[0]->YYCurval
    $_[0]->YYExpect
    $_[0]->YYLexer

They return respectively the current input token that made the parse fail,
its semantic value (both can be used to modify their values too, but
I<know what you are doing>! See the I<Error reporting routine> section for
an example), a list which contains the tokens the parser expected when
the failure occurred, and a reference to the lexer routine.

Note that if C<$_[0]-E<gt>YYCurtok> is declared as a C<%nonassoc> token,
it can be included in the C<$_[0]-E<gt>YYExpect> list whenever the input
tries to use it in an associative way. This is not a bug: the token
IS expected to report an error if encountered.

To detect such a thing in your error reporting sub, the following
example should do the trick:

        grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
    and do {
        #Non-associative token used in an associative expression
    };

Accessing semantic values on the left of your reducing rule is done
through the method

    $_[0]->YYSemval( index )

where index is an integer. Values I<1 .. n> return the same values
as I<$_[1] .. $_[n]>, but I<-n .. 0> return values on the left of the rule
being reduced (it is related to I<$-n .. $0 .. $n> in yacc, but you
cannot use I<$_[0]> or I<$_[-n]> constructs in Parse::Yapp for obvious
reasons).

There is also a provision for a user data area in the parser object,
accessed by the method:

    $_[0]->YYData

which returns a reference to an anonymous hash, which lets you keep
all of your parsing data inside the object (see the Calc.yp
or ParseYapp.yp files in the distribution for some examples).
That's how you can make your parser module reentrant: all of your
module states and variables are held inside the parser object.

Note: unfortunately, method calls in Perl have a lot of overhead,
and when YYData is used, it may be called a huge number
of times. If you are not a *real* purist and efficiency
is your concern, you may directly access the user space
in the object: $parser->{USER}, which is a reference to an
anonymous hash, and then benchmark.
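
To make the reentrancy point concrete, here is a minimal sketch in which
two parser objects keep separate state. C<MockParser> is a hypothetical
stand-in that mimics only the C<YYData> accessor and the C<USER> slot
described above; a real parser object comes from a yapp-generated module:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser class. It models only
# the user data area: YYData returns the hash stored under USER.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

my $first  = MockParser->new;
my $second = MockParser->new;

# Through the accessor method.
$first->YYData->{VARS}{x} = 1;

# Directly through the object, skipping the method call.
$second->{USER}{VARS}{x} = 2;

# Each object holds its own data, so the two never interfere.
print $first->YYData->{VARS}{x}, ' ', $second->YYData->{VARS}{x}, "\n";
```

Both access paths reach the same hash for a given object. Which one is
faster in practice is something only a benchmark on your own parser can
tell.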

If no action is specified for a rule, the equivalent of a default
action is run, which returns the first parameter:

   { $_[1] }
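
The calling convention can be tried outside of any parser by writing
actions as plain anonymous subs. In this sketch the empty hash is only a
placeholder for the parser object, and the calls imitate what the driver
would do on a reduction:

```perl
use strict;
use warnings;

# Placeholder for the parser object that arrives as $_[0].
my $parser = {};

# Action for a rule such as  exp: exp '+' exp
# $_[1] and $_[3] are the two operands, $_[2] is the '+' token value.
my $add = sub { $_[1] + $_[3] };

# Equivalent of the default action: return the first parameter.
my $default = sub { $_[1] };

# Imitate a reduction: parser first, then one value per rhs symbol.
my $sum   = $add->($parser, 2, '+', 3);
my $first = $default->($parser, 42);

print "$sum $first\n";
```

The value each sub returns is what becomes the semantic value of the
rule's left-hand side.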

=item C<In rule actions>

It is also possible to embed semantic actions inside of a rule:

    typedef:    TYPE { $type = $_[1] } identlist { ... } ;

When Parse::Yapp's parser encounters such an embedded action, it modifies
the grammar as if you wrote (although @x-1 is not a legal lhs value):

    @x-1:   /* empty */ { $type = $_[1] };
    typedef:    TYPE @x-1 identlist { ... } ;

where I<x> is a sequential number incremented for each "in rule" action,
and I<-1> represents the "dot position" in the rule where the action arises.

In such actions, you can use I<$_[1]..$_[n]> variables, which are the
semantic values on the left of your action.

Be aware that the way Parse::Yapp modifies your grammar because of
I<in rule actions> can produce, in some cases, spurious conflicts
that wouldn't happen otherwise.

=item C<Generating the Parser Module>

Now that your grammar file is written, you can use yapp on it
to generate your parser module:

    yapp -v Calc.yp

will create two files: F<Calc.pm>, your parser module, and F<Calc.output>,
a verbose output of your parser rules, conflicts, warnings, states
and summary.

What you are missing now is a lexer routine.

=item C<The Lexer sub>

is called each time the parser needs to read the next token.

It is called with only one argument, the parser object itself,
so you can access its methods, especially the

    $_[0]->YYData

data area.

It is its duty to return the next token and value to the parser.
They C<must> be returned as a list of two values: the first one
is the token known by the parser (symbolic or literal), the second
one being anything you want (usually the content of the token, or the
literal value), from a simple scalar value to any complex reference,
as the parsing driver never uses it except to pass it to semantic actions:

    ( 'NUMBER', $num )
or
    ( '>=', '>=' )
or
    ( 'ARRAY', [ @values ] )

When the lexer reaches the end of input, it must return the C<''>
empty token with an undef value:

     ( '', undef )

Note that your lexer should I<never> return C<'error'> as token
value: for the driver, this is the error token used for error
recovery and would lead to odd reactions.
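
The following sketch puts those rules together in a small lexer that
reads from a string kept in the user data area. The token names C<NUM>
and C<VAR>, the C<INPUT> key and the C<MockParser> class are assumptions
made for this example; the mock provides only C<YYData> so that the
lexer can run without a generated parser:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser: only YYData is modelled.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

# Lexer: receives the parser object, returns a (token, value) pair.
sub lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;

    # Skip leading whitespace, then signal end of input with ('', undef).
    $data->{INPUT} =~ s/^\s+//;
    return ('', undef) if $data->{INPUT} eq '';

    for ($data->{INPUT}) {
        s/^([0-9]+(?:\.[0-9]+)?)//   and return ('NUM', $1);
        s/^([A-Za-z][A-Za-z0-9_]*)// and return ('VAR', $1);
        s/^(.)//s                    and return ($1, $1);   # literal token
    }
}

my $parser = MockParser->new;
$parser->YYData->{INPUT} = '12 + x';

# Drain the input the way the driver would, one token per call.
my @tokens;
while (1) {
    my ($tok, $val) = lexer($parser);
    last if $tok eq '';
    push @tokens, "$tok=$val";
}
print join(' ', @tokens), "\n";
```

With a real parser the sub would not be called by hand but passed to
YYParse as C<yylex =E<gt> \&lexer>.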

Now that you have your lexer written, maybe you will need to output
meaningful error messages, instead of the default which is to print
'Parse error.' on STDERR.

So you will need an Error reporting sub.

=item C<Error reporting routine>

If you want one, write it knowing that it is passed the parser object
as its parameter, so you can share information with the lexer
routine quite easily.

You can also use the C<$_[0]-E<gt>YYErrok> method in it, which will
resume parsing as if no error occurred. Of course, since the invalid
token is still invalid, you're supposed to fix the problem by
yourself.

The method C<$_[0]-E<gt>YYLexer> may help you, as it returns a reference
to the lexer routine, and can be called as

    ($tok,$val)=&{$_[0]->YYLexer}

to get the next token and semantic value from the input stream. To
make them current for the parser, use:

    ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

and know what you're doing...
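
As an illustration, an error reporting sub might build its message from
the current token and the expected token list. The sketch below runs
against a hypothetical C<MockParser> that returns canned values from the
accessors named earlier, and the message wording is an arbitrary choice:

```perl
use strict;
use warnings;

# Hypothetical stand-in exposing three driver accessors with canned values.
package MockParser;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub YYCurtok { return $_[0]{tok} }
sub YYCurval { return $_[0]{val} }
sub YYExpect { return @{ $_[0]{expect} } }

package main;

# Builds the text an error reporting sub could print on STDERR.
sub error_message {
    my ($parser) = @_;
    my $tok = $parser->YYCurtok;
    my $val = $parser->YYCurval;

    # The empty token means the input ended too early.
    my $got      = $tok eq '' ? 'end of input' : "'$val'";
    my $expected = join ', ',
        map { "'$_'" } sort { $a cmp $b } $parser->YYExpect;

    return "Syntax error: found $got, expected one of: $expected";
}

my $parser = MockParser->new(
    tok    => 'NUM',
    val    => 42,
    expect => [ ')', '+', '-' ],
);
my $message = error_message($parser);
print "$message\n";
```

A real routine would take the same parser argument, print the message
itself, and be passed to YYParse as C<yyerror>.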

=item C<Parsing>

Now you've got everything to do the parsing.

First, use the parser module:

    use Calc;

Then create the parser object:

    $parser=Calc->new;

Now, call the YYParse method, telling it where to find the lexer
and error report subs:

    $result=$parser->YYParse(yylex => \&Lexer,
                           yyerror => \&ErrorReport);

(assuming Lexer and ErrorReport subs have been written in your current
package)

The order in which parameters appear is unimportant.

Et voila.

The YYParse method will do the parse, then return the last semantic
value returned, or undef if error recovery cannot recover.

If you need to be sure the parse has been successful (in case your
last returned semantic value I<is> undef) make a call to:

    $parser->YYNberr()

which returns the total number of times the error reporting sub has been called.
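
One way to package that check is a small wrapper that returns both a
success flag and the value. C<parse_checked> and C<MockParser> are
hypothetical names; the mock returns canned results so the pattern can be
run without a generated parser:

```perl
use strict;
use warnings;

# Hypothetical stand-in: YYParse and YYNberr return canned results.
package MockParser;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub YYParse { return $_[0]{value} }
sub YYNberr { return $_[0]{errors} }

package main;

# Runs the parse and returns (success flag, semantic value).
# Success here means the error reporting sub was never called.
sub parse_checked {
    my ($parser, %args) = @_;
    my $value = $parser->YYParse(%args);
    return ($parser->YYNberr == 0, $value);
}

# A parse whose final value is undef but which reported no error.
my ($ok, $value) = parse_checked(
    MockParser->new(value => undef, errors => 0),
    yylex   => sub { ('', undef) },
    yyerror => sub { },
);

# A parse that also returned undef, this time after two reported errors.
my ($failed_ok) = parse_checked(MockParser->new(value => undef, errors => 2));

print $ok        ? "first parse succeeded\n"  : "first parse failed\n";
print $failed_ok ? "second parse succeeded\n" : "second parse failed\n";
```

Both parses return undef, and only the error count tells them apart.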

=item C<Error Recovery>

in Parse::Yapp is implemented the same way it is in yacc.

=item C<Debugging Parser>

To debug your parser, you can call the YYParse method with a debug parameter:

    $parser->YYParse( ... , yydebug => value, ... )

where value is a bitfield, each bit representing a specific debug output:

    Bit Value    Outputs
    0x01         Token reading (useful for Lexer debugging)
    0x02         States information
    0x04         Driver actions (shifts, reduces, accept...)
    0x08         Parse Stack dump
    0x10         Error Recovery tracing

To have full debugging output, use

    yydebug => 0x1F

Debugging output is sent to STDERR, and be aware that it can produce
C<huge> outputs.
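
Since the value is a bitfield, individual outputs can be combined with
a bitwise or. The constant names in this sketch are illustrative only;
Parse::Yapp itself just takes the resulting number:

```perl
use strict;
use warnings;

# Illustrative names for the bit values listed in the table above.
use constant {
    DEBUG_TOKENS   => 0x01,
    DEBUG_STATES   => 0x02,
    DEBUG_ACTIONS  => 0x04,
    DEBUG_STACK    => 0x08,
    DEBUG_RECOVERY => 0x10,
};

# Trace only what the lexer feeds in and what the driver does with it.
my $lexer_and_actions = DEBUG_TOKENS | DEBUG_ACTIONS;

# Everything at once.
my $everything = DEBUG_TOKENS | DEBUG_STATES | DEBUG_ACTIONS
               | DEBUG_STACK  | DEBUG_RECOVERY;

printf "0x%02X 0x%02X\n", $lexer_and_actions, $everything;
```

Either number can then be given as the C<yydebug> value in the YYParse
call.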

=item C<Standalone Parsers>

By default, the parser modules generated will need the Parse::Yapp
module installed on the system to run. They use the Parse::Yapp::Driver
which can be safely shared between parsers in the same script.

In the case you'd prefer to have a standalone module generated, use
the C<-s> switch with yapp: this will automagically copy the driver
code into your module so you can use/distribute it without the need
of the Parse::Yapp module, making it really a C<Standalone Parser>.

If you do so, please remember to include Parse::Yapp's copyright notice
in your main module copyright, so others can know about Parse::Yapp module.

=item C<Source file line numbers>

by default will be included in the generated parser module, which will help
to find the guilty line in your source file in case of a syntax error.
You can disable this feature by compiling your grammar with yapp using
the C<-n> switch.

=back

=head1 BUGS AND SUGGESTIONS

If you find bugs, think of anything that could improve Parse::Yapp
or have any questions related to it, feel free to contact the author.

=head1 AUTHOR

Francois Desarmenien  <francois@fdesar.net>

=head1 SEE ALSO

yapp(1) perl(1) yacc(1) bison(1).

=head1 COPYRIGHT

The Parse::Yapp module and its related modules and shell scripts are copyright
(c) 1998-2001 Francois Desarmenien, France. All rights reserved.

You may use and distribute them under the terms of either
the GNU General Public License or the Artistic License,
as specified in the Perl README file.

If you use the "standalone parser" option so people don't need to install
Parse::Yapp on their systems in order to run your software, this copyright
notice should be included in your software copyright too, and the copyright
notice in the embedded driver should be left untouched.

=cut
