1[comment {-*- tcl -*- doctools manpage}] 2[manpage_begin page_util_norm_peg n 1.0] 3[copyright {2007 Andreas Kupries <andreas_kupries@users.sourceforge.net>}] 4[moddesc {Parser generator tools}] 5[titledesc {page AST normalization, PEG}] 6[category {Page Parser Generator}] 7[require page::util::norm_peg [opt 0.1]] 8[require snit] 9[keywords page PEG normalization {tree walking} {graph walking}] 10[keywords {parser generator} {text processing}] 11[description] 12[para] 13 14This package provides a single utility command which takes an AST for a 15parsing expression grammar and normalizes it in various ways. The result 16is called a [term {Normalized PE Grammar Tree}]. 17 18[para] 19 20[emph Note] that this package can only be used from within a plugin 21managed by the package [package page::pluginmgr]. 22 23[comment { 24 TODO: Document the structure of a PEG AST, 25 and then of a Normalized PEG Tree. Which 26 is not a true AST any longer. 27}] 28 29[section API] 30 31[list_begin definitions] 32[call [cmd ::page::util::norm::peg] [arg tree]] 33 34This command assumes the [arg tree] object contains for a 35parsing expression grammar. It normalizes this tree in place. 36The result is called a [term {Normalized PE Grammar Tree}]. 37 38[para] 39 40The following operations are performd 41 42[list_begin enum] 43[enum] 44The data for all terminals is stored in their grandparental 45nodes. The terminal nodes and their parents are removed. Type 46information is dropped. 47 48[enum] 49All nodes which have exactly one child are irrelevant and are 50removed, with the exception of the root node. The immediate 51child of the root is irrelevant as well, and removed as well. 52 53[enum] 54The name of the grammar is moved from the tree node it is stored 55in to an attribute of the root node, and the tree node removed. 56[para] 57The node keeping the start expression separate is removed as 58irrelevant and the root node of the start expression tagged with 59a marker attribute, and its handle saved in an attribute of the 60root node for quick access. 61 62[enum] 63Nonterminal hint information is moved from nodes into attributes, 64and the now irrelevant nodes are deleted. 65[para] 66[emph Note:] This transformation is dependent on the removal of all 67nodes with exactly one child, as it removes the all 'Attribute' 68nodes already. Otherwise this transformation would have to put 69the information into the grandparental node. 70[para] 71The default mode given to the nonterminals is [const value]. 72[para] 73Like with the global metadata definition specific information 74is moved out out of nodes into attributes, the now irrelevant 75nodes are deleted, and the root nodes of all definitions are 76tagged with marker attributes. This provides us with a mapping 77from nonterminal names to their defining nodes as well, which 78is saved in an attribute of the root node for quick reference. 79[para] 80At last the range in the input covered by a definition is 81computed. The left extent comes from the terminal for the 82nonterminal symbol it defines. The right extent comes from 83the rightmost child under the definition. While this not an 84expression tree yet the location data is sound already. 85 86[enum] 87The remaining nodes under all definitions are transformed 88into proper expression trees. First character ranges, followed 89by unary operations, characters, and nonterminals. At last the 90tree is flattened by the removal of superfluous inner nodes. 91[para] 92The order matters, to shed as much nodes as possible early, and 93to avoid unnecessary work later. 94 95[list_end] 96[list_end] 97 98[section {BUGS, IDEAS, FEEDBACK}] 99 100This document, will undoubtedly contain bugs and other problems. 101 102Please report such in the category [emph page] of the 103[uri {http://sourceforge.net/tracker/?group_id=12883} {Tcllib SF Trackers}]. 104 105Please also report any ideas for enhancements you may have. 106 107[manpage_end] 108