help/Docfiles/Parse.add_rule.doc

\DOC add_rule

\BLTYPE
add_rule :
  {term_name : string, fixity : fixity,
   pp_elements: term_grammar.pp_element list,
   paren_style : term_grammar.ParenStyle,
   block_style : term_grammar.PhraseBlockStyle *
                 term_grammar.block_info}  -> unit
\ELTYPE

\SYNOPSIS
Adds a parsing/printing rule to the global grammar.

\KEYWORDS
parsing

\LIBRARY
Parse

\DESCRIBE
The function {add_rule} is a fundamental method for adding parsing
(and thus printing) rules to the global term grammar that sits behind
the term-parsing function {Parse.Term}, and the pretty-printer
installed for terms. It is used for everything except the addition of
list-forms, for which refer to the entry for {add_listform}.

There are five components in the record argument to {add_rule}.  The
{term_name} component is the name of the term (whether a constant or a
variable) that will be generated at the head of the function
application.  Thus, the {term_name} component when specifying parsing
for conditional expressions is {COND}.

The following values (all in structure {Parse}) are useful for
constructing fixity values:
{
   val LEFT       : HOLgrammars.associativity
   val RIGHT      : HOLgrammars.associativity
   val NONASSOC   : HOLgrammars.associativity

   val Binder     : fixity
   val Closefix   : fixity
   val Infixl     : int -> fixity
   val Infixr     : int -> fixity
   val Infix      : HOLgrammars.associativity * int -> fixity
   val Prefix     : int -> fixity
   val Suffix     : int -> fixity
}
The {Binder} fixity is for binders such as universal and existential
quantifiers ({!} and {?}).  Binders can actually be seen as (true)
prefixes (should {`!x. p /\ q`} be parsed as {`(!x. p) /\ q`} or as
{`!x. (p /\ q)`}?), but the {add_rule} interface only allows binders to
be added at the one level (the weakest in the grammar).  Further, when
binders are added using this interface, all elements of the record
apart from the {term_name} are ignored, so the name of the binder must
be the same as the string that is parsed and printed  (but see also
restricted quantifiers: {associate_restriction}).

The remaining fixities all cause {add_rule} to pay due heed to the
{pp_elements} (``parsing/printing elements'') component of the
record.  As far as parsing is concerned, the only important elements
are {TOK} and {TM} values, of the following types:
{
   val TM  : term_grammar.pp_element
   val TOK : string -> term_grammar.pp_element
}
The {TM} value corresponds to a ``hole'' where a sub-term is
possible.  The {TOK} value corresponds to a piece of concrete syntax,
a string that is required when parsing, and which will appear when
printing.   The sequence of {pp_elements} specified in the record
passed to {add_rule} specifies the ``kernel'' syntax of an operator in the
grammar.  The ``kernel'' of a rule is extended (or not) by additional
sub-terms depending on the fixity type, thus:
{
   Closefix    :      [Kernel]     (* no external arguments *)
   Prefix      :      [Kernel] _   (* an argument to the right *)
   Suffix      :    _ [Kernel]     (* an argument to the left *)
   Infix       :    _ [Kernel] _   (* arguments on both sides *)
}
Thus simple infixes, suffixes and prefixes would have singleton
{pp_element} lists, consisting of just the symbol desired.  More
complicated mix-fix syntax can be constructed by identifying whether
or not sub-term arguments exist beyond the kernel of concrete syntax.
For example, syntax for the evaluation relation of an operational
semantics ({ _ |- _ --> _ }) is an infix with a kernel delimited by
{|-} and {-->} tokens.  Syntax for denotation brackets {[| _ |]} is a
closefix with one internal argument in the kernel.

The remaining sorts of possible {pp_element} values are concerned with
pretty-printing.  (The basic scheme is implemented on top of a
standard Oppen-style pretty-printing package.) They are
{
   (* where
        type term_grammar.block_info = PP.break_style * int
   *)
   val BreakSpace : (int * int) -> term_grammar.pp_element
   val HardSpace : int -> term_grammar.pp_element

   val BeginFinalBlock : term_grammar.block_info -> term_grammar.pp_element
   val EndInitialBlock : term_grammar.block_info -> term_grammar.pp_element
   val PPBlock : term_grammar.pp_element list * term_grammar.block_info
                 -> term_grammar.pp_element

   val OnlyIfNecessary : term_grammar.ParenStyle
   val ParoundName : term_grammar.ParenStyle
   val ParoundPrec : term_grammar.ParenStyle
   val Always : term_grammar.ParenStyle

   val AroundEachPhrase : term_grammar.PhraseBlockStyle
   val AroundSamePrec   : term_grammar.PhraseBlockStyle
   val AroundSameName   : term_grammar.PhraseBlockStyle
   val NoPhrasing       : term_grammar.PhraseBlockStyle
}
The two spacing values provide ways of specifying
white-space should be added when terms are printed.  Use of
{HardSpace n} results in {n} spaces being added to the term whatever
the context.  On the other hand, {BreakSpace(m,n)} results in a break
of width {m} spaces unless this makes the current line too wide, in
which case a line-break will occur, and the next line will be indented
an extra {n} spaces.

For example, the {add_infix} function (q.v.) is implemented in terms
of {add_rule} in such a way that a single token infix {s}, has a
{pp_element} list of
{
   [HardSpace 1, TOK s, BreakSpace(1,0)]
}

This results in chains of infixes (such as those that occur
with conjunctions) that break so as to leave the infix on the right
hand side of the line.  Under this constraint, printing can't break so
as to put the infix symbol on the start of a line, because that would
imply that the {HardSpace} had in fact been broken.  (Consequently, if
a change to this behaviour is desired, there is no global way of
effecting it, but one can do it on an infix-by-infix basis by deleting
the given rule (see, for example, {remove_termtok}) and then ``putting
it back'' with different pretty-printing constraints.)

The {PPBlock} function allows the specification of nested blocks
(blocks in the Oppen pretty-printing sense) within the list of
{pp_element}s.  Because there are sub-terms in all but the {Closefix}
fixities that occur beyond the scope of the {pp_element} list, the
{BeginFinalBlock} and {EndInitialBlock} functions can also be used to
indicate the boundary of blocks whose outer extent is the term beyond
the kernel represented by the {pp_element} list.  There is an example
of this below.

The possible {ParenStyle} values describe when parentheses should be
added to terms.  The {OnlyIfNecessary} value will cause parentheses to
be added only when required to disambiguate syntax.  The {ParoundName}
will cause parentheses to be added if necessary, or where the head
symbol has the given {term_name} and where this term is not the
argument of a function with the same head name.  This style of
parenthesisation is used with tuples, for example.  The {ParoundPrec}
value is similar, but causes parentheses to be added when the term is
the argument to a function with a different precedence level.
Finally, the {Always} value causes parentheses always to be added.

The {PhraseBlockStyle} values describe when pretty-printing blocks
involving this term should be entered.  The {AroundEachPhrase} style
causes a pretty-printing block to be created around each term.  This
is not appropriate for operators such as conjunction however, where
all of the arguments to the conjunctions in a list are more pleasingly
thought of as being at the same level.  This effect is gained by
specifying either {AroundSamePrec} or {AroundSameName}.  The former
will cause the creation of a new block for the phrase if it is at a
different precedence level from its parent, while the latter creates
the block if the parent name is not the same.  The former is
appropriate for {+} and {-} which are at the same precedence level,
while the latter is appropriate for {/\}.  Finally, the {NoPhrasing}
style causes there to be no block at all around terms controlled by
this rule.  The intention in using such a style is to have block
structure controlled by the level above.

\FAILURE
This function will fail if the {pp_element} list does not have {TOK}
values at the beginning and the end of the list, or if there are two
adjacent {TM} values in the list.  It will fail if the rule specifies
a fixity with a precedence, and if that precedence level in the
grammar is already taken by rules with a different sort of fixity.

\EXAMPLE
There are two conditional expression syntaxes defined in the theory
{bool}.  The first is the traditional HOL88/90 syntax.  Because the
syntax involves ``dangling'' terms to the left and right, it is
an infix (and one of very weak precedence at that).
{
   val _ = add_rule{term_name = "COND",
                    fixity = Infix (HOLgrammars.RIGHT, 3),
                    pp_elements = [HardSpace 1, TOK "=>",
                                   BreakSpace(1,0), TM,
                                   BreakSpace(1,0), TOK "|",
                                   HardSpace 1],
                    paren_style = OnlyIfNecessary,
                    block_style = (AroundEachPhrase,
                                   (PP.INCONSISTENT, 0))};
}
The second rule added uses the more familiar {if-then-else}
syntax.  Here there is only a ``dangling'' term to the right of the
construction, so this rule's fixity is of type {Prefix}. (If the
rule was made a {Closefix}, strings such as {`if P then Q else R`}
would still parse, but so too would {`if P then Q else`}.)  This
example also illustrates the use of blocks within rules to improve
pretty-printing.
{
   val _ = add_rule{term_name = "COND", fixity = Prefix 70,
                    pp_elements = [PPBlock([TOK "if", BreakSpace(1,2),
                                            TM, BreakSpace(1,0),
                                            TOK "then"], (PP.CONSISTENT, 0)),
                                   BreakSpace(1,2), TM, BreakSpace(1,0),
                                   BeginFinalBlock(PP.CONSISTENT, 2),
                                   TOK "else", BreakSpace(1,0)],
                    paren_style = OnlyIfNecessary,
                    block_style = (AroundEachPhrase,
                                   (PP.INCONSISTENT, 0))};
}
Note that the above form is not that actually used in the
system.  As written, it allows for pretty-printing some expressions
as:
{
   if P then
      <very long term> else Q
}
because the {block_style} is {INCONSISTENT}.

The pretty-printer prefers later rules over earlier rules by
default (though this choice can be changed with {prefer_form_with_tok}
(q.v.)), so conditional expressions print using the {if-then-else}
syntax rather than the {_ => _ | _} syntax.

\USES
For making pretty concrete syntax possible.

\COMMENTS
Because adding new rules to the grammar may result in precedence
conflicts in the operator-precedence matrix, it is as well with
interactive use to test the {Term} parser immediately after adding a
new rule, as it is only with this call that the precedence matrix is
built.

As with other functions in the {Parse} structure, there is a companion
{temp_add_rule} function, which has the same effect on the global
grammar, but which does not cause this effect to persist when the
current theory is exported.

An Isabelle-style concrete syntax for specifying rules would probably
be desirable as it would conceal the complexity of the above from most
users.

\SEEALSO
Parse.add_listform, Parse.add_infix, Parse.prefer_form_with_tok, Parse.remove_rules_for_term.
\ENDDOC