1(*:maxLineLen=78:*)
2
3theory Syntax
4imports Base
5begin
6
7chapter \<open>Concrete syntax and type-checking\<close>
8
9text \<open>
10  Pure \<open>\<lambda>\<close>-calculus as introduced in \chref{ch:logic} is an adequate
11  foundation for logical languages --- in the tradition of \<^emph>\<open>higher-order
12  abstract syntax\<close> --- but end-users require additional means for reading and
13  printing of terms and types. This important add-on outside the logical core
14  is called \<^emph>\<open>inner syntax\<close> in Isabelle jargon, as opposed to the \<^emph>\<open>outer
15  syntax\<close> of the theory and proof language @{cite "isabelle-isar-ref"}.
16
17  For example, according to @{cite church40} quantifiers are represented as
18  higher-order constants \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close> such that \<open>All (\<lambda>x::'a. B
19  x)\<close> faithfully represents the idea that is displayed in Isabelle as \<open>\<forall>x::'a.
20  B x\<close> via @{keyword "binder"} notation. Moreover, type-inference in the style
21  of Hindley-Milner @{cite hindleymilner} (and extensions) enables users to
22  write \<open>\<forall>x. B x\<close> concisely, when the type \<open>'a\<close> is already clear from the
23  context.\<^footnote>\<open>Type-inference taken to the extreme can easily confuse users.
24  Beginners often stumble over unexpectedly general types inferred by the
25  system.\<close>
26
27  \<^medskip>
28  The main inner syntax operations are \<^emph>\<open>read\<close> for parsing together with
29  type-checking, and \<^emph>\<open>pretty\<close> for formatted output. See also
30  \secref{sec:read-print}.
31
32  Furthermore, the input and output syntax layers are sub-divided into
33  separate phases for \<^emph>\<open>concrete syntax\<close> versus \<^emph>\<open>abstract syntax\<close>, see also
34  \secref{sec:parse-unparse} and \secref{sec:term-check}, respectively. This
35  results in the following decomposition of the main operations:
36
37    \<^item> \<open>read = parse; check\<close>
38
39    \<^item> \<open>pretty = uncheck; unparse\<close>
40
41  For example, some specification package might thus intercept syntax
42  processing at a well-defined stage after \<open>parse\<close>, to a augment the resulting
43  pre-term before full type-reconstruction is performed by \<open>check\<close>. Note that
44  the formal status of bound variables, versus free variables, versus
45  constants must not be changed between these phases.
46
47  \<^medskip>
48  In general, \<open>check\<close> and \<open>uncheck\<close> operate simultaneously on a list of terms.
49  This is particular important for type-checking, to reconstruct types for
50  several terms of the same context and scope. In contrast, \<open>parse\<close> and
51  \<open>unparse\<close> operate separately on single terms.
52
53  There are analogous operations to read and print types, with the same
54  sub-division into phases.
55\<close>
56
57
58section \<open>Reading and pretty printing \label{sec:read-print}\<close>
59
60text \<open>
61  Read and print operations are roughly dual to each other, such that for the
62  user \<open>s' = pretty (read s)\<close> looks similar to the original source text \<open>s\<close>,
63  but the details depend on many side-conditions. There are also explicit
64  options to control the removal of type information in the output. The
65  default configuration routinely looses information, so \<open>t' = read (pretty
66  t)\<close> might fail, or produce a differently typed term, or a completely
67  different term in the face of syntactic overloading.
68\<close>
69
70text %mlref \<open>
71  \begin{mldecls}
72  @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
73  @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
74  @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
75  @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
76  @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
77  @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
78  @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
79  @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
80  @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
81  @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
82  \end{mldecls}
83
84  \<^descr> @{ML Syntax.read_typs}~\<open>ctxt strs\<close> parses and checks a simultaneous list
85  of source strings as types of the logic.
86
87  \<^descr> @{ML Syntax.read_terms}~\<open>ctxt strs\<close> parses and checks a simultaneous list
88  of source strings as terms of the logic. Type-reconstruction puts all parsed
89  terms into the same scope: types of free variables ultimately need to
90  coincide.
91
92  If particular type-constraints are required for some of the arguments, the
93  read operations needs to be split into its parse and check phases. Then it
94  is possible to use @{ML Type.constraint} on the intermediate pre-terms
95  (\secref{sec:term-check}).
96
97  \<^descr> @{ML Syntax.read_props}~\<open>ctxt strs\<close> parses and checks a simultaneous list
98  of source strings as terms of the logic, with an implicit type-constraint
99  for each argument to enforce type @{typ prop}; this also affects the inner
100  syntax for parsing. The remaining type-reconstruction works as for @{ML
101  Syntax.read_terms}.
102
103  \<^descr> @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop} are
104  like the simultaneous versions, but operate on a single argument only. This
105  convenient shorthand is adequate in situations where a single item in its
106  own scope is processed. Do not use @{ML "map o Syntax.read_term"} where @{ML
107  Syntax.read_terms} is actually intended!
108
109  \<^descr> @{ML Syntax.pretty_typ}~\<open>ctxt T\<close> and @{ML Syntax.pretty_term}~\<open>ctxt t\<close>
110  uncheck and pretty-print the given type or term, respectively. Although the
111  uncheck phase acts on a simultaneous list as well, this is rarely used in
112  practice, so only the singleton case is provided as combined pretty
113  operation. There is no distinction of term vs.\ proposition.
114
115  \<^descr> @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are convenient
116  compositions of @{ML Syntax.pretty_typ} and @{ML Syntax.pretty_term} with
117  @{ML Pretty.string_of} for output. The result may be concatenated with other
118  strings, as long as there is no further formatting and line-breaking
119  involved.
120
121
122  @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
123  Syntax.string_of_term} are the most important operations in practice.
124
125  \<^medskip>
126  Note that the string values that are passed in and out are annotated by the
127  system, to carry further markup that is relevant for the Prover IDE @{cite
128  "isabelle-jedit"}. User code should neither compose its own input strings,
129  nor try to analyze the output strings. Conceptually this is an abstract
130  datatype, encoded as concrete string for historical reasons.
131
132  The standard way to provide the required position markup for input works via
133  the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
134  part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
135  obtained from one of the latter may be directly passed to the corresponding
136  read operation: this yields PIDE markup of the input and precise positions
137  for warning and error messages.
138\<close>
139
140
141section \<open>Parsing and unparsing \label{sec:parse-unparse}\<close>
142
143text \<open>
144  Parsing and unparsing converts between actual source text and a certain
145  \<^emph>\<open>pre-term\<close> format, where all bindings and scopes are already resolved
146  faithfully. Thus the names of free variables or constants are determined in
147  the sense of the logical context, but type information might be still
148  missing. Pre-terms support an explicit language of \<^emph>\<open>type constraints\<close> that
149  may be augmented by user code to guide the later \<^emph>\<open>check\<close> phase.
150
151  Actual parsing is based on traditional lexical analysis and Earley parsing
152  for arbitrary context-free grammars. The user can specify the grammar
153  declaratively via mixfix annotations. Moreover, there are \<^emph>\<open>syntax
154  translations\<close> that can be augmented by the user, either declaratively via
155  @{command translations} or programmatically via @{command
156  parse_translation}, @{command print_translation} @{cite
157  "isabelle-isar-ref"}. The final scope-resolution is performed by the system,
158  according to name spaces for types, term variables and constants determined
159  by the context.
160\<close>
161
162text %mlref \<open>
163  \begin{mldecls}
164  @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
165  @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
166  @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
167  @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
168  @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
169  \end{mldecls}
170
171  \<^descr> @{ML Syntax.parse_typ}~\<open>ctxt str\<close> parses a source string as pre-type that
172  is ready to be used with subsequent check operations.
173
174  \<^descr> @{ML Syntax.parse_term}~\<open>ctxt str\<close> parses a source string as pre-term that
175  is ready to be used with subsequent check operations.
176
177  \<^descr> @{ML Syntax.parse_prop}~\<open>ctxt str\<close> parses a source string as pre-term that
178  is ready to be used with subsequent check operations. The inner syntax
179  category is @{typ prop} and a suitable type-constraint is included to ensure
180  that this information is observed in subsequent type reconstruction.
181
182  \<^descr> @{ML Syntax.unparse_typ}~\<open>ctxt T\<close> unparses a type after uncheck
183  operations, to turn it into a pretty tree.
184
185  \<^descr> @{ML Syntax.unparse_term}~\<open>ctxt T\<close> unparses a term after uncheck
186  operations, to turn it into a pretty tree. There is no distinction for
187  propositions here.
188
189
190  These operations always operate on a single item; use the combinator @{ML
191  map} to apply them to a list.
192\<close>
193
194
195section \<open>Checking and unchecking \label{sec:term-check}\<close>
196
197text \<open>
198  These operations define the transition from pre-terms and fully-annotated
199  terms in the sense of the logical core (\chref{ch:logic}).
200
201  The \<^emph>\<open>check\<close> phase is meant to subsume a variety of mechanisms in the manner
202  of ``type-inference'' or ``type-reconstruction'' or ``type-improvement'',
203  not just type-checking in the narrow sense. The \<^emph>\<open>uncheck\<close> phase is roughly
204  dual, it prunes type-information before pretty printing.
205
206  A typical add-on for the check/uncheck syntax layer is the @{command
207  abbreviation} mechanism @{cite "isabelle-isar-ref"}. Here the user specifies
208  syntactic definitions that are managed by the system as polymorphic \<open>let\<close>
209  bindings. These are expanded during the \<open>check\<close> phase, and contracted during
210  the \<open>uncheck\<close> phase, without affecting the type-assignment of the given
211  terms.
212
213  \<^medskip>
214  The precise meaning of type checking depends on the context --- additional
215  check/uncheck modules might be defined in user space.
216
217  For example, the @{command class} command defines a context where \<open>check\<close>
218  treats certain type instances of overloaded constants according to the
219  ``dictionary construction'' of its logical foundation. This involves ``type
220  improvement'' (specialization of slightly too general types) and replacement
221  by certain locale parameters. See also @{cite "Haftmann-Wenzel:2009"}.
222\<close>
223
224text %mlref \<open>
225  \begin{mldecls}
226  @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
227  @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
228  @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
229  @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
230  @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
231  \end{mldecls}
232
233  \<^descr> @{ML Syntax.check_typs}~\<open>ctxt Ts\<close> checks a simultaneous list of pre-types
234  as types of the logic. Typically, this involves normalization of type
235  synonyms.
236
237  \<^descr> @{ML Syntax.check_terms}~\<open>ctxt ts\<close> checks a simultaneous list of pre-terms
238  as terms of the logic. Typically, this involves type-inference and
239  normalization term abbreviations. The types within the given terms are
240  treated in the same way as for @{ML Syntax.check_typs}.
241
242  Applications sometimes need to check several types and terms together. The
243  standard approach uses @{ML Logic.mk_type} to embed the language of types
244  into that of terms; all arguments are appended into one list of terms that
245  is checked; afterwards the type arguments are recovered with @{ML
246  Logic.dest_type}.
247
248  \<^descr> @{ML Syntax.check_props}~\<open>ctxt ts\<close> checks a simultaneous list of pre-terms
249  as terms of the logic, such that all terms are constrained by type @{typ
250  prop}. The remaining check operation works as @{ML Syntax.check_terms}
251  above.
252
253  \<^descr> @{ML Syntax.uncheck_typs}~\<open>ctxt Ts\<close> unchecks a simultaneous list of types
254  of the logic, in preparation of pretty printing.
255
256  \<^descr> @{ML Syntax.uncheck_terms}~\<open>ctxt ts\<close> unchecks a simultaneous list of terms
257  of the logic, in preparation of pretty printing. There is no distinction for
258  propositions here.
259
260
261  These operations always operate simultaneously on a list; use the combinator
262  @{ML singleton} to apply them to a single item.
263\<close>
264
265end
266