1[comment {-*- tcl -*- doctools manpage}]
2[manpage_begin page_util_norm_peg n 1.0]
3[copyright {2007 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
4[moddesc   {Parser generator tools}]
5[titledesc {page AST normalization, PEG}]
6[category  {Page Parser Generator}]
7[require page::util::norm_peg [opt 0.1]]
8[require snit]
9[keywords page PEG normalization {tree walking} {graph walking}]
10[keywords {parser generator} {text processing}]
11[description]
12[para]
13
14This package provides a single utility command which takes an AST for a
15parsing expression grammar and normalizes it in various ways. The result
16is called a [term {Normalized PE Grammar Tree}].
17
18[para]
19
20[emph Note] that this package can only be used from within a plugin
21managed by the package [package page::pluginmgr].
22
23[comment {
24    TODO: Document the structure of a PEG AST,
25          and then of a Normalized PEG Tree. Which
26          is not a true AST any longer.
27}]
28
29[section API]
30
31[list_begin definitions]
32[call [cmd ::page::util::norm::peg] [arg tree]]
33
34This command assumes the [arg tree] object contains for a
35parsing expression grammar. It normalizes this tree in place.
36The result is called a  [term {Normalized PE Grammar Tree}].
37
38[para]
39
40The following operations are performd
41
42[list_begin enum]
43[enum] 
44The data for all terminals is stored in their grandparental
45nodes. The terminal nodes and their parents are removed. Type
46information is dropped.
47
48[enum]
49All nodes which have exactly one child are irrelevant and are
50removed, with the exception of the root node. The immediate
51child of the root is irrelevant as well, and removed as well.
52
53[enum]
54The name of the grammar is moved from the tree node it is stored
55in to an attribute of the root node, and the tree node removed.
56[para]
57The node keeping the start expression separate is removed as
58irrelevant and the root node of the start expression tagged with
59a marker attribute, and its handle saved in an attribute of the
60root node for quick access.
61
62[enum]
63Nonterminal hint information is moved from nodes into attributes,
64and the now irrelevant nodes are deleted.
65[para]
66[emph Note:] This transformation is dependent on the removal of all
67nodes with exactly one child, as it removes the all 'Attribute'
68nodes already. Otherwise this transformation would have to put
69the information into the grandparental node.
70[para]
71The default mode given to the nonterminals is [const value].
72[para]
73Like with the global metadata definition specific information
74is moved out out of nodes into attributes, the now irrelevant
75nodes are deleted, and the root nodes of all definitions are
76tagged with marker attributes. This provides us with a mapping
77from nonterminal names to their defining nodes as well, which
78is saved in an attribute of the root node for quick reference.
79[para]
80At last the range in the input covered by a definition is
81computed. The left extent comes from the terminal for the
82nonterminal symbol it defines. The right extent comes from
83the rightmost child under the definition. While this not an
84expression tree yet the location data is sound already.
85
86[enum]
87The remaining nodes under all definitions are transformed
88into proper expression trees. First character ranges, followed
89by unary operations, characters, and nonterminals. At last the
90tree is flattened by the removal of superfluous inner nodes.
91[para]
92The order matters, to shed as much nodes as possible early, and
93to avoid unnecessary work later.
94
95[list_end]
96[list_end]
97
98[section {BUGS, IDEAS, FEEDBACK}]
99
100This document, will undoubtedly contain bugs and other problems.
101
102Please report such in the category [emph page] of the
103[uri {http://sourceforge.net/tracker/?group_id=12883} {Tcllib SF Trackers}].
104
105Please also report any ideas for enhancements you may have.
106
107[manpage_end]
108