1(*:maxLineLen=78:*)
2
3theory "ML"
4imports Base
5begin
6
7chapter \<open>Isabelle/ML\<close>
8
9text \<open>
10  Isabelle/ML is best understood as a certain culture based on Standard ML.
11  Thus it is not a new programming language, but a certain way to use SML at
12  an advanced level within the Isabelle environment. This covers a variety of
13  aspects that are geared towards an efficient and robust platform for
14  applications of formal logic with fully foundational proof construction ---
15  according to the well-known \<^emph>\<open>LCF principle\<close>. There is specific
16  infrastructure with library modules to address the needs of this difficult
17  task. For example, the raw parallel programming model of Poly/ML is
18  presented as considerably more abstract concept of \<^emph>\<open>futures\<close>, which is then
19  used to augment the inference kernel, Isar theory and proof interpreter, and
20  PIDE document management.
21
22  The main aspects of Isabelle/ML are introduced below. These first-hand
23  explanations should help to understand how proper Isabelle/ML is to be read
24  and written, and to get access to the wealth of experience that is expressed
25  in the source text and its history of changes.\<^footnote>\<open>See
26  \<^url>\<open>https://isabelle.in.tum.de/repos/isabelle\<close> for the full Mercurial history.
27  There are symbolic tags to refer to official Isabelle releases, as opposed
28  to arbitrary \<^emph>\<open>tip\<close> versions that merely reflect snapshots that are never
29  really up-to-date.\<close>
30\<close>
31
32
33section \<open>Style and orthography\<close>
34
35text \<open>
36  The sources of Isabelle/Isar are optimized for \<^emph>\<open>readability\<close> and
37  \<^emph>\<open>maintainability\<close>. The main purpose is to tell an informed reader what is
38  really going on and how things really work. This is a non-trivial aim, but
39  it is supported by a certain style of writing Isabelle/ML that has emerged
40  from long years of system development.\<^footnote>\<open>See also the interesting style guide
41  for OCaml \<^url>\<open>https://caml.inria.fr/resources/doc/guides/guidelines.en.html\<close>
42  which shares many of our means and ends.\<close>
43
44  The main principle behind any coding style is \<^emph>\<open>consistency\<close>. For a single
45  author of a small program this merely means ``choose your style and stick to
46  it''. A complex project like Isabelle, with long years of development and
47  different contributors, requires more standardization. A coding style that
48  is changed every few years or with every new contributor is no style at all,
49  because consistency is quickly lost. Global consistency is hard to achieve,
50  though. Nonetheless, one should always strive at least for local consistency
51  of modules and sub-systems, without deviating from some general principles
52  how to write Isabelle/ML.
53
54  In a sense, good coding style is like an \<^emph>\<open>orthography\<close> for the sources: it
55  helps to read quickly over the text and see through the main points, without
56  getting distracted by accidental presentation of free-style code.
57\<close>
58
59
60subsection \<open>Header and sectioning\<close>
61
62text \<open>
63  Isabelle source files have a certain standardized header format (with
64  precise spacing) that follows ancient traditions reaching back to the
65  earliest versions of the system by Larry Paulson. See
66  \<^file>\<open>~~/src/Pure/thm.ML\<close>, for example.
67
68  The header includes at least \<^verbatim>\<open>Title\<close> and \<^verbatim>\<open>Author\<close> entries, followed by a
69  prose description of the purpose of the module. The latter can range from a
70  single line to several paragraphs of explanations.
71
72  The rest of the file is divided into chapters, sections, subsections,
73  subsubsections, paragraphs etc.\ using a simple layout via ML comments as
74  follows.
75
76  @{verbatim [display]
77\<open>  (**** chapter ****)
78 
79  (*** section ***)
80
81  (** subsection **)
82
83  (* subsubsection *)
84
85  (*short paragraph*)
86
87  (*
88    long paragraph,
89    with more text
90  *)\<close>}
91
92  As in regular typography, there is some extra space \<^emph>\<open>before\<close> section
93  headings that are adjacent to plain text, but not other headings as in the
94  example above.
95
96  \<^medskip>
97  The precise wording of the prose text given in these headings is chosen
98  carefully to introduce the main theme of the subsequent formal ML text.
99\<close>
100
101
102subsection \<open>Naming conventions\<close>
103
104text \<open>
105  Since ML is the primary medium to express the meaning of the source text,
106  naming of ML entities requires special care.
107\<close>
108
109paragraph \<open>Notation.\<close>
110text \<open>
111  A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are separated
112  by underscore. There are three variants concerning upper or lower case
113  letters, which are used for certain ML categories as follows:
114
115  \<^medskip>
116  \begin{tabular}{lll}
117  variant & example & ML categories \\\hline
118  lower-case & @{ML_text foo_bar} & values, types, record fields \\
119  capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
120  upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
121  \end{tabular}
122  \<^medskip>
123
124  For historical reasons, many capitalized names omit underscores, e.g.\
125  old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}. Genuine
126  mixed-case names are \<^emph>\<open>not\<close> used, because clear division of words is
127  essential for readability.\<^footnote>\<open>Camel-case was invented to workaround the lack
128  of underscore in some early non-ASCII character sets. Later it became
129  habitual in some language communities that are now strong in numbers.\<close>
130
131  A single (capital) character does not count as ``word'' in this respect:
132  some Isabelle/ML names are suffixed by extra markers like this: @{ML_text
133  foo_barT}.
134
135  Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text foo'},
136  @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text foo''''} or more.
137  Decimal digits scale better to larger numbers, e.g.\ @{ML_text foo0},
138  @{ML_text foo1}, @{ML_text foo42}.
139\<close>
140
141paragraph \<open>Scopes.\<close>
142text \<open>
143  Apart from very basic library modules, ML structures are not ``opened'', but
144  names are referenced with explicit qualification, as in @{ML
145  Syntax.string_of_term} for example. When devising names for structures and
146  their components it is important to aim at eye-catching compositions of both
147  parts, because this is how they are seen in the sources and documentation.
148  For the same reasons, aliases of well-known library functions should be
149  avoided.
150
151  Local names of function abstraction or case/let bindings are typically
152  shorter, sometimes using only rudiments of ``words'', while still avoiding
153  cryptic shorthands. An auxiliary function called @{ML_text helper},
154  @{ML_text aux}, or @{ML_text f} is considered bad style.
155
156  Example:
157
158  @{verbatim [display]
159\<open>  (* RIGHT *)
160
161  fun print_foo ctxt foo =
162    let
163      fun print t = ... Syntax.string_of_term ctxt t ...
164    in ... end;
165
166
167  (* RIGHT *)
168
169  fun print_foo ctxt foo =
170    let
171      val string_of_term = Syntax.string_of_term ctxt;
172      fun print t = ... string_of_term t ...
173    in ... end;
174
175
176  (* WRONG *)
177
178  val string_of_term = Syntax.string_of_term;
179
180  fun print_foo ctxt foo =
181    let
182      fun aux t = ... string_of_term ctxt t ...
183    in ... end;\<close>}
184\<close>
185
186paragraph \<open>Specific conventions.\<close>
187text \<open>
188  Here are some specific name forms that occur frequently in the sources.
189
190  \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is called @{ML_text
191  foo_to_bar} or @{ML_text bar_of_foo} (never @{ML_text foo2bar}, nor
192  @{ML_text bar_from_foo}, nor @{ML_text bar_for_foo}, nor @{ML_text
193  bar4foo}).
194
195  \<^item> The name component @{ML_text legacy} means that the operation is about to
196  be discontinued soon.
197
198  \<^item> The name component @{ML_text global} means that this works with the
199  background theory instead of the regular local context
200  (\secref{sec:context}), sometimes for historical reasons, sometimes due a
201  genuine lack of locality of the concept involved, sometimes as a fall-back
202  for the lack of a proper context in the application code. Whenever there is
203  a non-global variant available, the application should be migrated to use it
204  with a proper local context.
205
206  \<^item> Variables of the main context types of the Isabelle/Isar framework
207  (\secref{sec:context} and \chref{ch:local-theory}) have firm naming
208  conventions as follows:
209
210    \<^item> theories are called @{ML_text thy}, rarely @{ML_text theory}
211    (never @{ML_text thry})
212
213    \<^item> proof contexts are called @{ML_text ctxt}, rarely @{ML_text
214    context} (never @{ML_text ctx})
215
216    \<^item> generic contexts are called @{ML_text context}
217
218    \<^item> local theories are called @{ML_text lthy}, except for local
219    theories that are treated as proof context (which is a semantic
220    super-type)
221
222  Variations with primed or decimal numbers are always possible, as well as
223  semantic prefixes like @{ML_text foo_thy} or @{ML_text bar_ctxt}, but the
224  base conventions above need to be preserved. This allows to emphasize their
225  data flow via plain regular expressions in the text editor.
226
227  \<^item> The main logical entities (\secref{ch:logic}) have established naming
228  convention as follows:
229
230    \<^item> sorts are called @{ML_text S}
231
232    \<^item> types are called @{ML_text T}, @{ML_text U}, or @{ML_text ty} (never
233    @{ML_text t})
234
235    \<^item> terms are called @{ML_text t}, @{ML_text u}, or @{ML_text tm} (never
236    @{ML_text trm})
237
238    \<^item> certified types are called @{ML_text cT}, rarely @{ML_text T}, with
239    variants as for types
240
241    \<^item> certified terms are called @{ML_text ct}, rarely @{ML_text t}, with
242    variants as for terms (never @{ML_text ctrm})
243
244    \<^item> theorems are called @{ML_text th}, or @{ML_text thm}
245
246  Proper semantic names override these conventions completely. For example,
247  the left-hand side of an equation (as a term) can be called @{ML_text lhs}
248  (not @{ML_text lhs_tm}). Or a term that is known to be a variable can be
249  called @{ML_text v} or @{ML_text x}.
250
251  \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific
252  naming conventions. The name of a basic tactic definition always has a
253  @{ML_text "_tac"} suffix, the subgoal index (if applicable) is always called
254  @{ML_text i}, and the goal state (if made explicit) is usually called
255  @{ML_text st} instead of the somewhat misleading @{ML_text thm}. Any other
256  arguments are given before the latter two, and the general context is given
257  first. Example:
258
259  @{verbatim [display] \<open>  fun my_tac ctxt arg1 arg2 i st = ...\<close>}
260
261  Note that the goal state @{ML_text st} above is rarely made explicit, if
262  tactic combinators (tacticals) are used as usual.
263
264  A tactic that requires a proof context needs to make that explicit as seen
265  in the \<^verbatim>\<open>ctxt\<close> argument above. Do not refer to the background theory of
266  \<^verbatim>\<open>st\<close> -- it is not a proper context, but merely a formal certificate.
267\<close>
268
269
270subsection \<open>General source layout\<close>
271
272text \<open>
273  The general Isabelle/ML source layout imitates regular type-setting
274  conventions, augmented by the requirements for deeply nested expressions
275  that are commonplace in functional programming.
276\<close>
277
278paragraph \<open>Line length\<close>
279text \<open>
280  is limited to 80 characters according to ancient standards, but we allow as
281  much as 100 characters (not more).\<^footnote>\<open>Readability requires to keep the
282  beginning of a line in view while watching its end. Modern wide-screen
283  displays do not change the way how the human brain works. Sources also need
284  to be printable on plain paper with reasonable font-size.\<close> The extra 20
285  characters acknowledge the space requirements due to qualified library
286  references in Isabelle/ML.
287\<close>
288
289paragraph \<open>White-space\<close>
290text \<open>
291  is used to emphasize the structure of expressions, following mostly standard
292  conventions for mathematical typesetting, as can be seen in plain {\TeX} or
293  {\LaTeX}. This defines positioning of spaces for parentheses, punctuation,
294  and infixes as illustrated here:
295
296  @{verbatim [display]
297\<open>  val x = y + z * (a + b);
298  val pair = (a, b);
299  val record = {foo = 1, bar = 2};\<close>}
300
301  Lines are normally broken \<^emph>\<open>after\<close> an infix operator or punctuation
302  character. For example:
303
304  @{verbatim [display]
305\<open>
306  val x =
307    a +
308    b +
309    c;
310
311  val tuple =
312   (a,
313    b,
314    c);
315\<close>}
316
317  Some special infixes (e.g.\ @{ML_text "|>"}) work better at the start of the
318  line, but punctuation is always at the end.
319
320  Function application follows the tradition of \<open>\<lambda>\<close>-calculus, not informal
321  mathematics. For example: @{ML_text "f a b"} for a curried function, or
322  @{ML_text "g (a, b)"} for a tupled function. Note that the space between
323  @{ML_text g} and the pair @{ML_text "(a, b)"} follows the important
324  principle of \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
325  change when @{ML_text "p"} is refined to the concrete pair @{ML_text "(a,
326  b)"}.
327\<close>
328
329paragraph \<open>Indentation\<close>
330text \<open>
331  uses plain spaces, never hard tabulators.\<^footnote>\<open>Tabulators were invented to move
332  the carriage of a type-writer to certain predefined positions. In software
333  they could be used as a primitive run-length compression of consecutive
334  spaces, but the precise result would depend on non-standardized text editor
335  configuration.\<close>
336
337  Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4,
338  never 8 or any other odd number.
339
340  Indentation follows a simple logical format that only depends on the nesting
341  depth, not the accidental length of the text that initiates a level of
342  nesting. Example:
343
344  @{verbatim [display]
345\<open>  (* RIGHT *)
346
347  if b then
348    expr1_part1
349    expr1_part2
350  else
351    expr2_part1
352    expr2_part2
353
354
355  (* WRONG *)
356
357  if b then expr1_part1
358            expr1_part2
359  else expr2_part1
360       expr2_part2\<close>}
361
362  The second form has many problems: it assumes a fixed-width font when
363  viewing the sources, it uses more space on the line and thus makes it hard
364  to observe its strict length limit (working against \<^emph>\<open>readability\<close>), it
365  requires extra editing to adapt the layout to changes of the initial text
366  (working against \<^emph>\<open>maintainability\<close>) etc.
367
368  \<^medskip>
369  For similar reasons, any kind of two-dimensional or tabular layouts,
370  ASCII-art with lines or boxes of asterisks etc.\ should be avoided.
371\<close>
372
373paragraph \<open>Complex expressions\<close>
374text \<open>
375  that consist of multi-clausal function definitions, @{ML_text handle},
376  @{ML_text case}, @{ML_text let} (and combinations) require special
377  attention. The syntax of Standard ML is quite ambitious and admits a lot of
378  variance that can distort the meaning of the text.
379
380  Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
381  @{ML_text case} get extra indentation to indicate the nesting clearly.
382  Example:
383
384  @{verbatim [display]
385\<open>  (* RIGHT *)
386
387  fun foo p1 =
388        expr1
389    | foo p2 =
390        expr2
391
392
393  (* WRONG *)
394
395  fun foo p1 =
396    expr1
397    | foo p2 =
398    expr2\<close>}
399
400  Body expressions consisting of @{ML_text case} or @{ML_text let} require
401  care to maintain compositionality, to prevent loss of logical indentation
402  where it is especially important to see the structure of the text. Example:
403
404  @{verbatim [display]
405\<open>  (* RIGHT *)
406
407  fun foo p1 =
408        (case e of
409          q1 => ...
410        | q2 => ...)
411    | foo p2 =
412        let
413          ...
414        in
415          ...
416        end
417
418
419  (* WRONG *)
420
421  fun foo p1 = case e of
422      q1 => ...
423    | q2 => ...
424    | foo p2 =
425    let
426      ...
427    in
428      ...
429    end\<close>}
430
431  Extra parentheses around @{ML_text case} expressions are optional, but help
432  to analyse the nesting based on character matching in the text editor.
433
434  \<^medskip>
435  There are two main exceptions to the overall principle of compositionality
436  in the layout of complex expressions.
437
438  \<^enum> @{ML_text "if"} expressions are iterated as if ML had multi-branch
439  conditionals, e.g.
440
441  @{verbatim [display]
442\<open>  (* RIGHT *)
443
444  if b1 then e1
445  else if b2 then e2
446  else e3\<close>}
447
448  \<^enum> @{ML_text fn} abstractions are often layed-out as if they would lack any
449  structure by themselves. This traditional form is motivated by the
450  possibility to shift function arguments back and forth wrt.\ additional
451  combinators. Example:
452
453  @{verbatim [display]
454\<open>  (* RIGHT *)
455
456  fun foo x y = fold (fn z =>
457    expr)\<close>}
458
459  Here the visual appearance is that of three arguments @{ML_text x},
460  @{ML_text y}, @{ML_text z} in a row.
461
462
463  Such weakly structured layout should be use with great care. Here are some
464  counter-examples involving @{ML_text let} expressions:
465
466  @{verbatim [display]
467\<open>  (* WRONG *)
468
469  fun foo x = let
470      val y = ...
471    in ... end
472
473
474  (* WRONG *)
475
476  fun foo x = let
477    val y = ...
478  in ... end
479
480
481  (* WRONG *)
482
483  fun foo x =
484  let
485    val y = ...
486  in ... end
487
488
489  (* WRONG *)
490
491  fun foo x =
492    let
493      val y = ...
494    in
495      ... end\<close>}
496
497  \<^medskip>
498  In general the source layout is meant to emphasize the structure of complex
499  language expressions, not to pretend that SML had a completely different
500  syntax (say that of Haskell, Scala, Java).
501\<close>
502
503
504section \<open>ML embedded into Isabelle/Isar\<close>
505
506text \<open>
507  ML and Isar are intertwined via an open-ended bootstrap process that
508  provides more and more programming facilities and logical content in an
509  alternating manner. Bootstrapping starts from the raw environment of
510  existing implementations of Standard ML (mainly Poly/ML).
511
512  Isabelle/Pure marks the point where the raw ML toplevel is superseded by
513  Isabelle/ML within the Isar theory and proof language, with a uniform
514  context for arbitrary ML values (see also \secref{sec:context}). This formal
515  environment holds ML compiler bindings, logical entities, and many other
516  things.
517
518  Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar
519  environment by introducing suitable theories with associated ML modules,
520  either inlined within \<^verbatim>\<open>.thy\<close> files, or as separate \<^verbatim>\<open>.ML\<close> files that are
521  loading from some theory. Thus Isabelle/HOL is defined as a regular
522  user-space application within the Isabelle framework. Further add-on tools
523  can be implemented in ML within the Isar context in the same manner: ML is
524  part of the standard repertoire of Isabelle, and there is no distinction
525  between ``users'' and ``developers'' in this respect.
526\<close>
527
528
529subsection \<open>Isar ML commands\<close>
530
531text \<open>
532  The primary Isar source language provides facilities to ``open a window'' to
533  the underlying ML compiler. Especially see the Isar commands @{command_ref
534  "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
535  text is provided differently, via a file vs.\ inlined, respectively. Apart
536  from embedding ML into the main theory definition like that, there are many
537  more commands that refer to ML source, such as @{command_ref setup} or
538  @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
539  is encountered in the proof method @{method_ref tactic}, which refines the
540  pending goal state via a given expression of type @{ML_type tactic}.
541\<close>
542
543text %mlex \<open>
544  The following artificial example demonstrates some ML toplevel declarations
545  within the implicit Isar theory context. This is regular functional
546  programming without referring to logical entities yet.
547\<close>
548
549ML \<open>
550  fun factorial 0 = 1
551    | factorial n = n * factorial (n - 1)
552\<close>
553
554text \<open>
555  Here the ML environment is already managed by Isabelle, i.e.\ the @{ML
556  factorial} function is not yet accessible in the preceding paragraph, nor in
557  a different theory that is independent from the current one in the import
558  hierarchy.
559
560  Removing the above ML declaration from the source text will remove any trace
561  of this definition, as expected. The Isabelle/ML toplevel environment is
562  managed in a \<^emph>\<open>stateless\<close> way: in contrast to the raw ML toplevel, there are
563  no global side-effects involved here.\<^footnote>\<open>Such a stateless compilation
564  environment is also a prerequisite for robust parallel compilation within
565  independent nodes of the implicit theory development graph.\<close>
566
567  \<^medskip>
568  The next example shows how to embed ML into Isar proofs, using @{command_ref
569  "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect
570  on the ML environment is local to the whole proof body, but ignoring the
571  block structure.
572\<close>
573
574notepad
575begin
576  ML_prf %"ML" \<open>val a = 1\<close>
577  {
578    ML_prf %"ML" \<open>val b = a + 1\<close>
579  } \<comment> \<open>Isar block structure ignored by ML environment\<close>
580  ML_prf %"ML" \<open>val c = b + 1\<close>
581end
582
583text \<open>
584  By side-stepping the normal scoping rules for Isar proof blocks, embedded ML
585  code can refer to the different contexts and manipulate corresponding
586  entities, e.g.\ export a fact from a block context.
587
588  \<^medskip>
589  Two further ML commands are useful in certain situations: @{command_ref
590  ML_val} and @{command_ref ML_command} are \<^emph>\<open>diagnostic\<close> in the sense that
591  there is no effect on the underlying environment, and can thus be used
592  anywhere. The examples below produce long strings of digits by invoking @{ML
593  factorial}: @{command ML_val} takes care of printing the ML toplevel result,
594  but @{command ML_command} is silent so we produce an explicit output
595  message.
596\<close>
597
598ML_val \<open>factorial 100\<close>
599ML_command \<open>writeln (string_of_int (factorial 100))\<close>
600
601notepad
602begin
603  ML_val \<open>factorial 100\<close>
604  ML_command \<open>writeln (string_of_int (factorial 100))\<close>
605end
606
607
608subsection \<open>Compile-time context\<close>
609
610text \<open>
611  Whenever the ML compiler is invoked within Isabelle/Isar, the formal context
612  is passed as a thread-local reference variable. Thus ML code may access the
613  theory context during compilation, by reading or writing the (local) theory
614  under construction. Note that such direct access to the compile-time context
615  is rare. In practice it is typically done via some derived ML functions
616  instead.
617\<close>
618
619text %mlref \<open>
620  \begin{mldecls}
621  @{index_ML Context.the_generic_context: "unit -> Context.generic"} \\
622  @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
623  @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
624  @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
625  \end{mldecls}
626
627    \<^descr> @{ML "Context.the_generic_context ()"} refers to the theory context of
628    the ML toplevel --- at compile time. ML code needs to take care to refer to
629    @{ML "Context.the_generic_context ()"} correctly. Recall that evaluation
630    of a function body is delayed until actual run-time.
631
632    \<^descr> @{ML "Context.>>"}~\<open>f\<close> applies context transformation \<open>f\<close> to the implicit
633    context of the ML toplevel.
634
635    \<^descr> @{ML ML_Thms.bind_thms}~\<open>(name, thms)\<close> stores a list of theorems produced
636    in ML both in the (global) theory context and the ML toplevel, associating
637    it with the provided name.
638
639    \<^descr> @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to
640    a singleton fact.
641
642  It is important to note that the above functions are really restricted to
643  the compile time, even though the ML compiler is invoked at run-time. The
644  majority of ML code either uses static antiquotations
645  (\secref{sec:ML-antiq}) or refers to the theory or proof context at
646  run-time, by explicit functional abstraction.
647\<close>
648
649
650subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>
651
652text \<open>
653  A very important consequence of embedding ML into Isar is the concept of
654  \<^emph>\<open>ML antiquotation\<close>. The standard token language of ML is augmented by
655  special syntactic entities of the following form:
656
657  @{rail \<open>
658  @{syntax_def antiquote}: '@{' name args '}'
659  \<close>}
660
661  Here @{syntax name} and @{syntax args} are outer syntax categories, as
662  defined in @{cite "isabelle-isar-ref"}.
663
664  \<^medskip>
665  A regular antiquotation \<open>@{name args}\<close> processes its arguments by the usual
666  means of the Isar source language, and produces corresponding ML source
667  text, either as literal \<^emph>\<open>inline\<close> text (e.g.\ \<open>@{term t}\<close>) or abstract
668  \<^emph>\<open>value\<close> (e.g. \<open>@{thm th}\<close>). This pre-compilation scheme allows to refer to
669  formal entities in a robust manner, with proper static scoping and with some
670  degree of logical checking of small portions of the code.
671\<close>
672
673
674subsection \<open>Printing ML values\<close>
675
676text \<open>
677  The ML compiler knows about the structure of values according to their
678  static type, and can print them in the manner of its toplevel, although the
679  details are non-portable. The antiquotations @{ML_antiquotation_def
680  "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable
681  way to refer to this potential capability of the underlying ML system in
682  generic Isabelle/ML sources.
683
684  This is occasionally useful for diagnostic or demonstration purposes. Note
685  that production-quality tools require proper user-level error messages,
686  avoiding raw ML values in the output.
687\<close>
688
689text %mlantiq \<open>
690  \begin{matharray}{rcl}
691  @{ML_antiquotation_def "make_string"} & : & \<open>ML_antiquotation\<close> \\
692  @{ML_antiquotation_def "print"} & : & \<open>ML_antiquotation\<close> \\
693  \end{matharray}
694
695  @{rail \<open>
696  @@{ML_antiquotation make_string}
697  ;
698  @@{ML_antiquotation print} embedded?
699  \<close>}
700
701  \<^descr> \<open>@{make_string}\<close> inlines a function to print arbitrary values similar to
702  the ML toplevel. The result is compiler dependent and may fall back on "?"
703  in certain situations. The value of configuration option @{attribute_ref
704  ML_print_depth} determines further details of output.
705
706  \<^descr> \<open>@{print f}\<close> uses the ML function \<open>f: string -> unit\<close> to output the result
707  of \<open>@{make_string}\<close> above, together with the source position of the
708  antiquotation. The default output function is @{ML writeln}.
709\<close>
710
711text %mlex \<open>
712  The following artificial examples show how to produce adhoc output of ML
713  values for debugging purposes.
714\<close>
715
716ML_val \<open>
717  val x = 42;
718  val y = true;
719
720  writeln (@{make_string} {x = x, y = y});
721
722  @{print} {x = x, y = y};
723  @{print tracing} {x = x, y = y};
724\<close>
725
726
727section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>
728
729text \<open>
730  Standard ML is a language in the tradition of \<open>\<lambda>\<close>-calculus and
731  \<^emph>\<open>higher-order functional programming\<close>, similar to OCaml, Haskell, or
732  Isabelle/Pure and HOL as logical languages. Getting acquainted with the
733  native style of representing functions in that setting can save a lot of
734  extra boiler-plate of redundant shuffling of arguments, auxiliary
735  abstractions etc.
736
737  Functions are usually \<^emph>\<open>curried\<close>: the idea of turning arguments of type
738  \<open>\<tau>\<^sub>i\<close> (for \<open>i \<in> {1, \<dots> n}\<close>) into a result of type \<open>\<tau>\<close> is represented by the
739  iterated function space \<open>\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>. This is isomorphic to the
740  well-known encoding via tuples \<open>\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>, but the curried version
741  fits more smoothly into the basic calculus.\<^footnote>\<open>The difference is even more
742  significant in HOL, because the redundant tuple structure needs to be
743  accommodated extraneous proof steps.\<close>
744
745  Currying gives some flexibility due to \<^emph>\<open>partial application\<close>. A function
746  \<open>f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>\<close> can be applied to \<open>x: \<tau>\<^sub>1\<close> and the remaining \<open>(f x): \<tau>\<^sub>2
747  \<rightarrow> \<tau>\<close> passed to another function etc. How well this works in practice depends
748  on the order of arguments. In the worst case, arguments are arranged
749  erratically, and using a function in a certain situation always requires
750  some glue code. Thus we would get exponentially many opportunities to
751  decorate the code with meaningless permutations of arguments.
752
753  This can be avoided by \<^emph>\<open>canonical argument order\<close>, which observes certain
754  standard patterns and minimizes adhoc permutations in their application. In
755  Isabelle/ML, large portions of text can be written without auxiliary
756  operations like \<open>swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>\<close> or \<open>C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)\<close> (the
757  latter is not present in the Isabelle/ML library).
758
759  \<^medskip>
760  The main idea is that arguments that vary less are moved further to the left
761  than those that vary more. Two particularly important categories of
762  functions are \<^emph>\<open>selectors\<close> and \<^emph>\<open>updates\<close>.
763
764  The subsequent scheme is based on a hypothetical set-like container of type
765  \<open>\<beta>\<close> that manages elements of type \<open>\<alpha>\<close>. Both the names and types of the
766  associated operations are canonical for Isabelle/ML.
767
768  \begin{center}
769  \begin{tabular}{ll}
770  kind & canonical name and type \\\hline
771  selector & \<open>member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool\<close> \\
772  update & \<open>insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> \\
773  \end{tabular}
774  \end{center}
775
776  Given a container \<open>B: \<beta>\<close>, the partially applied \<open>member B\<close> is a predicate
777  over elements \<open>\<alpha> \<rightarrow> bool\<close>, and thus represents the intended denotation
778  directly. It is customary to pass the abstract predicate to further
779  operations, not the concrete container. The argument order makes it easy to
780  use other combinators: \<open>forall (member B) list\<close> will check a list of
781  elements for membership in \<open>B\<close> etc. Often the explicit \<open>list\<close> is pointless
782  and can be contracted to \<open>forall (member B)\<close> to get directly a predicate
783  again.
784
785  In contrast, an update operation varies the container, so it moves to the
786  right: \<open>insert a\<close> is a function \<open>\<beta> \<rightarrow> \<beta>\<close> to insert a value \<open>a\<close>. These can be
787  composed naturally as \<open>insert c \<circ> insert b \<circ> insert a\<close>. The slightly awkward
788  inversion of the composition order is due to conventional mathematical
789  notation, which can be easily amended as explained below.
790\<close>
791
792
793subsection \<open>Forward application and composition\<close>
794
795text \<open>
796  Regular function application and infix notation works best for relatively
797  deeply structured expressions, e.g.\ \<open>h (f x y + g z)\<close>. The important
798  special case of \<^emph>\<open>linear transformation\<close> applies a cascade of functions \<open>f\<^sub>n
799  (\<dots> (f\<^sub>1 x))\<close>. This becomes hard to read and maintain if the functions are
800  themselves given as complex expressions. The notation can be significantly
801  improved by introducing \<^emph>\<open>forward\<close> versions of application and composition
802  as follows:
803
804  \<^medskip>
805  \begin{tabular}{lll}
806  \<open>x |> f\<close> & \<open>\<equiv>\<close> & \<open>f x\<close> \\
807  \<open>(f #> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |> g\<close> \\
808  \end{tabular}
809  \<^medskip>
810
811  This enables to write conveniently \<open>x |> f\<^sub>1 |> \<dots> |> f\<^sub>n\<close> or \<open>f\<^sub>1 #> \<dots> #>
812  f\<^sub>n\<close> for its functional abstraction over \<open>x\<close>.
813
814  \<^medskip>
815  There is an additional set of combinators to accommodate multiple results
816  (via pairs) that are passed on as multiple arguments (via currying).
817
818  \<^medskip>
819  \begin{tabular}{lll}
820  \<open>(x, y) |-> f\<close> & \<open>\<equiv>\<close> & \<open>f x y\<close> \\
821  \<open>(f #-> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |-> g\<close> \\
822  \end{tabular}
823  \<^medskip>
824\<close>
825
826text %mlref \<open>
827  \begin{mldecls}
828  @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
829  @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
830  @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
831  @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
832  \end{mldecls}
833\<close>
834
835
836subsection \<open>Canonical iteration\<close>
837
838text \<open>
839  As explained above, a function \<open>f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> can be understood as update on
840  a configuration of type \<open>\<beta>\<close>, parameterized by an argument of type \<open>\<alpha>\<close>. Given
841  \<open>a: \<alpha>\<close> the partial application \<open>(f a): \<beta> \<rightarrow> \<beta>\<close> operates homogeneously on \<open>\<beta>\<close>.
842  This can be iterated naturally over a list of parameters \<open>[a\<^sub>1, \<dots>, a\<^sub>n]\<close> as
843  \<open>f a\<^sub>1 #> \<dots> #> f a\<^sub>n\<close>. The latter expression is again a function \<open>\<beta> \<rightarrow> \<beta>\<close>. It
844  can be applied to an initial configuration \<open>b: \<beta>\<close> to start the iteration
845  over the given list of arguments: each \<open>a\<close> in \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> is applied
846  consecutively by updating a cumulative configuration.
847
848  The \<open>fold\<close> combinator in Isabelle/ML lifts a function \<open>f\<close> as above to its
849  iterated version over a list of arguments. Lifting can be repeated, e.g.\
850  \<open>(fold \<circ> fold) f\<close> iterates over a list of lists as expected.
851
852  The variant \<open>fold_rev\<close> works inside-out over the list of arguments, such
853  that \<open>fold_rev f \<equiv> fold f \<circ> rev\<close> holds.
854
855  The \<open>fold_map\<close> combinator essentially performs \<open>fold\<close> and \<open>map\<close>
856  simultaneously: each application of \<open>f\<close> produces an updated configuration
857  together with a side-result; the iteration collects all such side-results as
858  a separate list.
859\<close>
860
861text %mlref \<open>
862  \begin{mldecls}
863  @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
864  @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
865  @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
866  \end{mldecls}
867
868  \<^descr> @{ML fold}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> to a list of
869  parameters.
870
871  \<^descr> @{ML fold_rev}~\<open>f\<close> is similar to @{ML fold}~\<open>f\<close>, but works inside-out, as
872  if the list would be reversed.
873
874  \<^descr> @{ML fold_map}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> (with
875  side-result) to a list of parameters and cumulative side-results.
876
877
878  \begin{warn}
879  The literature on functional programming provides a confusing multitude of
880  combinators called \<open>foldl\<close>, \<open>foldr\<close> etc. SML97 provides its own variations
881  as @{ML List.foldl} and @{ML List.foldr}, while the classic Isabelle library
882  also has the historic @{ML Library.foldl} and @{ML Library.foldr}. To avoid
883  unnecessary complication, all these historical versions should be ignored,
884  and the canonical @{ML fold} (or @{ML fold_rev}) used exclusively.
885  \end{warn}
886\<close>
887
888text %mlex \<open>
889  The following example shows how to fill a text buffer incrementally by
890  adding strings, either individually or from a given list.
891\<close>
892
893ML_val \<open>
894  val s =
895    Buffer.empty
896    |> Buffer.add "digits: "
897    |> fold (Buffer.add o string_of_int) (0 upto 9)
898    |> Buffer.content;
899
900  @{assert} (s = "digits: 0123456789");
901\<close>
902
903text \<open>
904  Note how @{ML "fold (Buffer.add o string_of_int)"} above saves an extra @{ML
905  "map"} over the given list. This kind of peephole optimization reduces both
906  the code size and the tree structures in memory (``deforestation''), but it
907  requires some practice to read and write fluently.
908
909  \<^medskip>
910  The next example elaborates the idea of canonical iteration, demonstrating
911  fast accumulation of tree content using a text buffer.
912\<close>
913
914ML \<open>
915  datatype tree = Text of string | Elem of string * tree list;
916
917  fun slow_content (Text txt) = txt
918    | slow_content (Elem (name, ts)) =
919        "<" ^ name ^ ">" ^
920        implode (map slow_content ts) ^
921        "</" ^ name ^ ">"
922
923  fun add_content (Text txt) = Buffer.add txt
924    | add_content (Elem (name, ts)) =
925        Buffer.add ("<" ^ name ^ ">") #>
926        fold add_content ts #>
927        Buffer.add ("</" ^ name ^ ">");
928
929  fun fast_content tree =
930    Buffer.empty |> add_content tree |> Buffer.content;
931\<close>
932
933text \<open>
934  The slowness of @{ML slow_content} is due to the @{ML implode} of the
935  recursive results, because it copies previously produced strings again and
936  again.
937
938  The incremental @{ML add_content} avoids this by operating on a buffer that
939  is passed through in a linear fashion. Using @{ML_text "#>"} and contraction
940  over the actual buffer argument saves some additional boiler-plate. Of
941  course, the two @{ML "Buffer.add"} invocations with concatenated strings
942  could have been split into smaller parts, but this would have obfuscated the
943  source without making a big difference in performance. Here we have done
944  some peephole-optimization for the sake of readability.
945
946  Another benefit of @{ML add_content} is its ``open'' form as a function on
947  buffers that can be continued in further linear transformations, folding
948  etc. Thus it is more compositional than the naive @{ML slow_content}. As
949  realistic example, compare the old-style
950  @{ML "Term.maxidx_of_term: term -> int"} with the newer @{ML
951  "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
952
953  Note that @{ML fast_content} above is only defined as example. In many
954  practical situations, it is customary to provide the incremental @{ML
955  add_content} only and leave the initialization and termination to the
956  concrete application to the user.
957\<close>
958
959
960section \<open>Message output channels \label{sec:message-channels}\<close>
961
962text \<open>
963  Isabelle provides output channels for different kinds of messages: regular
964  output, high-volume tracing information, warnings, and errors.
965
966  Depending on the user interface involved, these messages may appear in
967  different text styles or colours. The standard output for batch sessions
968  prefixes each line of warnings by \<^verbatim>\<open>###\<close> and errors by \<^verbatim>\<open>***\<close>, but leaves
969  anything else unchanged. The message body may contain further markup and
970  formatting, which is routinely used in the Prover IDE @{cite
971  "isabelle-jedit"}.
972
973  Messages are associated with the transaction context of the running Isar
974  command. This enables the front-end to manage commands and resulting
975  messages together. For example, after deleting a command from a given theory
976  document version, the corresponding message output can be retracted from the
977  display.
978\<close>
979
980text %mlref \<open>
981  \begin{mldecls}
982  @{index_ML writeln: "string -> unit"} \\
983  @{index_ML tracing: "string -> unit"} \\
984  @{index_ML warning: "string -> unit"} \\
985  @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
986  \end{mldecls}
987
988  \<^descr> @{ML writeln}~\<open>text\<close> outputs \<open>text\<close> as regular message. This is the
989  primary message output operation of Isabelle and should be used by default.
990
991  \<^descr> @{ML tracing}~\<open>text\<close> outputs \<open>text\<close> as special tracing message, indicating
992  potential high-volume output to the front-end (hundreds or thousands of
993  messages issued by a single command). The idea is to allow the
994  user-interface to downgrade the quality of message display to achieve higher
995  throughput.
996
997  Note that the user might have to take special actions to see tracing output,
998  e.g.\ switch to a different output window. So this channel should not be
999  used for regular output.
1000
1001  \<^descr> @{ML warning}~\<open>text\<close> outputs \<open>text\<close> as warning, which typically means some
1002  extra emphasis on the front-end side (color highlighting, icons, etc.).
1003
1004  \<^descr> @{ML error}~\<open>text\<close> raises exception @{ML ERROR}~\<open>text\<close> and thus lets the
1005  Isar toplevel print \<open>text\<close> on the error channel, which typically means some
1006  extra emphasis on the front-end side (color highlighting, icons, etc.).
1007
1008  This assumes that the exception is not handled before the command
1009  terminates. Handling exception @{ML ERROR}~\<open>text\<close> is a perfectly legal
1010  alternative: it means that the error is absorbed without any message output.
1011
1012  \begin{warn}
1013  The actual error channel is accessed via @{ML Output.error_message}, but
1014  this is normally not used directly in user code.
1015  \end{warn}
1016
1017
1018  \begin{warn}
1019  Regular Isabelle/ML code should output messages exclusively by the official
1020  channels. Using raw I/O on \<^emph>\<open>stdout\<close> or \<^emph>\<open>stderr\<close> instead (e.g.\ via @{ML
1021  TextIO.output}) is apt to cause problems in the presence of parallel and
1022  asynchronous processing of Isabelle theories. Such raw output might be
1023  displayed by the front-end in some system console log, with a low chance
1024  that the user will ever see it. Moreover, as a genuine side-effect on global
1025  process channels, there is no proper way to retract output when Isar command
1026  transactions are reset by the system.
1027  \end{warn}
1028
1029  \begin{warn}
1030  The message channels should be used in a message-oriented manner. This means
1031  that multi-line output that logically belongs together is issued by a single
1032  invocation of @{ML writeln} etc.\ with the functional concatenation of all
1033  message constituents.
1034  \end{warn}
1035\<close>
1036
1037text %mlex \<open>
1038  The following example demonstrates a multi-line warning. Note that in some
1039  situations the user sees only the first line, so the most important point
1040  should be made first.
1041\<close>
1042
1043ML_command \<open>
1044  warning (cat_lines
1045   ["Beware the Jabberwock, my son!",
1046    "The jaws that bite, the claws that catch!",
1047    "Beware the Jubjub Bird, and shun",
1048    "The frumious Bandersnatch!"]);
1049\<close>
1050
1051text \<open>
1052  \<^medskip>
1053  An alternative is to make a paragraph of freely-floating words as follows.
1054\<close>
1055
1056ML_command \<open>
1057  warning (Pretty.string_of (Pretty.para
1058    "Beware the Jabberwock, my son! \
1059    \The jaws that bite, the claws that catch! \
1060    \Beware the Jubjub Bird, and shun \
1061    \The frumious Bandersnatch!"))
1062\<close>
1063
1064text \<open>
1065  This has advantages with variable window / popup sizes, but might make it
1066  harder to search for message content systematically, e.g.\ by other tools or
1067  by humans expecting the ``verse'' of a formal message in a fixed layout.
1068\<close>
1069
1070
1071section \<open>Exceptions \label{sec:exceptions}\<close>
1072
1073text \<open>
1074  The Standard ML semantics of strict functional evaluation together with
1075  exceptions is rather well defined, but some delicate points need to be
1076  observed to avoid that ML programs go wrong despite static type-checking.
1077  Exceptions in Isabelle/ML are subsequently categorized as follows.
1078\<close>
1079
1080paragraph \<open>Regular user errors.\<close>
1081text \<open>
1082  These are meant to provide informative feedback about malformed input etc.
1083
1084  The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR} exception, with a
1085  plain text message as argument. @{ML ERROR} exceptions can be handled
1086  internally, in order to be ignored, turned into other exceptions, or
1087  cascaded by appending messages. If the corresponding Isabelle/Isar command
1088  terminates with an @{ML ERROR} exception state, the system will print the
1089  result on the error channel (see \secref{sec:message-channels}).
1090
1091  It is considered bad style to refer to internal function names or values in
1092  ML source notation in user error messages. Do not use \<open>@{make_string}\<close> nor
1093  \<open>@{here}\<close>!
1094
1095  Grammatical correctness of error messages can be improved by \<^emph>\<open>omitting\<close>
1096  final punctuation: messages are often concatenated or put into a larger
1097  context (e.g.\ augmented with source position). Note that punctuation after
1098  formal entities (types, terms, theorems) is particularly prone to user
1099  confusion.
1100\<close>
1101
1102paragraph \<open>Program failures.\<close>
1103text \<open>
1104  There is a handful of standard exceptions that indicate general failure
1105  situations, or failures of core operations on logical entities (types,
1106  terms, theorems, theories, see \chref{ch:logic}).
1107
1108  These exceptions indicate a genuine breakdown of the program, so the main
1109  purpose is to determine quickly what has happened where. Traditionally, the
1110  (short) exception message would include the name of an ML function, although
1111  this is no longer necessary, because the ML runtime system attaches detailed
1112  source position stemming from the corresponding @{ML_text raise} keyword.
1113
1114  \<^medskip>
1115  User modules can always introduce their own custom exceptions locally, e.g.\
1116  to organize internal failures robustly without overlapping with existing
1117  exceptions. Exceptions that are exposed in module signatures require extra
1118  care, though, and should \<^emph>\<open>not\<close> be introduced by default. Surprise by users
1119  of a module can be often minimized by using plain user errors instead.
1120\<close>
1121
1122paragraph \<open>Interrupts.\<close>
1123text \<open>
1124  These indicate arbitrary system events: both the ML runtime system and the
1125  Isabelle/ML infrastructure signal various exceptional situations by raising
1126  the special @{ML Exn.Interrupt} exception in user code.
1127
1128  This is the one and only way that physical events can intrude an Isabelle/ML
1129  program. Such an interrupt can mean out-of-memory, stack overflow, timeout,
1130  internal signaling of threads, or a POSIX process signal. An Isabelle/ML
1131  program that intercepts interrupts becomes dependent on physical effects of
1132  the environment. Even worse, exception handling patterns that are too
1133  general by accident, e.g.\ by misspelled exception constructors, will cover
1134  interrupts unintentionally and thus render the program semantics
1135  ill-defined.
1136
1137  Note that the Interrupt exception dates back to the original SML90 language
1138  definition. It was excluded from the SML97 version to avoid its malign
1139  impact on ML program semantics, but without providing a viable alternative.
1140  Isabelle/ML recovers physical interruptibility (which is an indispensable
1141  tool to implement managed evaluation of command transactions), but requires
1142  user code to be strictly transparent wrt.\ interrupts.
1143
1144  \begin{warn}
1145  Isabelle/ML user code needs to terminate promptly on interruption, without
1146  guessing at its meaning to the system infrastructure. Temporary handling of
1147  interrupts for cleanup of global resources etc.\ needs to be followed
1148  immediately by re-raising of the original exception.
1149  \end{warn}
1150\<close>
1151
1152text %mlref \<open>
1153  \begin{mldecls}
1154  @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
1155  @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
1156  @{index_ML_exception ERROR: string} \\
1157  @{index_ML_exception Fail: string} \\
1158  @{index_ML Exn.is_interrupt: "exn -> bool"} \\
1159  @{index_ML Exn.reraise: "exn -> 'a"} \\
1160  @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
1161  \end{mldecls}
1162
1163  \<^descr> @{ML try}~\<open>f x\<close> makes the partiality of evaluating \<open>f x\<close> explicit via the
1164  option datatype. Interrupts are \<^emph>\<open>not\<close> handled here, i.e.\ this form serves
1165  as safe replacement for the \<^emph>\<open>unsafe\<close> version @{ML_text "(SOME"}~\<open>f
1166  x\<close>~@{ML_text "handle _ => NONE)"} that is occasionally seen in books about
1167  SML97, but not in Isabelle/ML.
1168
1169  \<^descr> @{ML can} is similar to @{ML try} with more abstract result.
1170
1171  \<^descr> @{ML ERROR}~\<open>msg\<close> represents user errors; this exception is normally
1172  raised indirectly via the @{ML error} function (see
1173  \secref{sec:message-channels}).
1174
1175  \<^descr> @{ML Fail}~\<open>msg\<close> represents general program failures.
1176
1177  \<^descr> @{ML Exn.is_interrupt} identifies interrupts robustly, without mentioning
1178  concrete exception constructors in user code. Handled interrupts need to be
1179  re-raised promptly!
1180
1181  \<^descr> @{ML Exn.reraise}~\<open>exn\<close> raises exception \<open>exn\<close> while preserving its implicit
1182  position information (if possible, depending on the ML platform).
1183
1184  \<^descr> @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~\<open>e\<close>@{ML_text ")"} evaluates
1185  expression \<open>e\<close> while printing a full trace of its stack of nested exceptions
1186  (if possible, depending on the ML platform).
1187
1188  Inserting @{ML Runtime.exn_trace} into ML code temporarily is useful for
1189  debugging, but not suitable for production code.
1190\<close>
1191
1192text %mlantiq \<open>
1193  \begin{matharray}{rcl}
1194  @{ML_antiquotation_def "assert"} & : & \<open>ML_antiquotation\<close> \\
1195  @{ML_antiquotation_def "undefined"} & : & \<open>ML_antiquotation\<close> \\
1196  \end{matharray}
1197
1198  \<^descr> \<open>@{assert}\<close> inlines a function @{ML_type "bool -> unit"} that raises @{ML
1199  Fail} if the argument is @{ML false}. Due to inlining the source position of
1200  failed assertions is included in the error output.
1201
1202  \<^descr> \<open>@{undefined}\<close> inlines @{verbatim raise}~@{ML Match}, i.e.\ the ML program
1203  behaves as in some function application of an undefined case.
1204\<close>
1205
1206text %mlex \<open>
1207  The ML function @{ML undefined} is defined in \<^file>\<open>~~/src/Pure/library.ML\<close>
1208  as follows:
1209\<close>
1210
1211ML \<open>fun undefined _ = raise Match\<close>
1212
1213text \<open>
1214  \<^medskip>
1215  The following variant uses the antiquotation @{ML_antiquotation undefined}
1216  instead:
1217\<close>
1218
1219ML \<open>fun undefined _ = @{undefined}\<close>
1220
1221text \<open>
1222  \<^medskip>
1223  Here is the same, using control-symbol notation for the antiquotation, with
1224  special rendering of \<^verbatim>\<open>\<^undefined>\<close>:
1225\<close>
1226
1227ML \<open>fun undefined _ = \<^undefined>\<close>
1228
1229text \<open>
1230  \medskip Semantically, all forms are equivalent to a function definition
1231  without any clauses, but that is syntactically not allowed in ML.
1232\<close>
1233
1234
1235section \<open>Strings of symbols \label{sec:symbols}\<close>
1236
1237text \<open>
1238  A \<^emph>\<open>symbol\<close> constitutes the smallest textual unit in Isabelle/ML --- raw ML
1239  characters are normally not encountered at all. Isabelle strings consist of
1240  a sequence of symbols, represented as a packed string or an exploded list of
1241  strings. Each symbol is in itself a small string, which has either one of
1242  the following forms:
1243
1244    \<^enum> a single ASCII character ``\<open>c\<close>'', for example ``\<^verbatim>\<open>a\<close>'',
1245
1246    \<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence),
1247
1248    \<^enum> a regular symbol ``\<^verbatim>\<open>\<ident>\<close>'', for example ``\<^verbatim>\<open>\<alpha>\<close>'',
1249
1250    \<^enum> a control symbol ``\<^verbatim>\<open>\<^ident>\<close>'', for example ``\<^verbatim>\<open>\<^bold>\<close>'',
1251
1252  The \<open>ident\<close> syntax for symbol names is \<open>letter (letter | digit)\<^sup>*\<close>, where
1253  \<open>letter = A..Za..z\<close> and \<open>digit = 0..9\<close>. There are infinitely many regular
1254  symbols and control symbols, but a fixed collection of standard symbols is
1255  treated specifically. For example, ``\<^verbatim>\<open>\<alpha>\<close>'' is classified as a letter, which
1256  means it may occur within regular Isabelle identifiers.
1257
1258  The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit
1259  character sequences are passed-through unchanged. Unicode/UCS data in UTF-8
1260  encoding is processed in a non-strict fashion, such that well-formed code
1261  sequences are recognized accordingly. Unicode provides its own collection of
1262  mathematical symbols, but within the core Isabelle/ML world there is no link
1263  to the standard collection of Isabelle regular symbols.
1264
1265  \<^medskip>
1266  Output of Isabelle symbols depends on the print mode. For example, the
1267  standard {\LaTeX} setup of the Isabelle document preparation system would
1268  present ``\<^verbatim>\<open>\<alpha>\<close>'' as \<open>\<alpha>\<close>, and ``\<^verbatim>\<open>\<^bold>\<alpha>\<close>'' as \<open>\<^bold>\<alpha>\<close>. On-screen rendering usually
1269  works by mapping a finite subset of Isabelle symbols to suitable Unicode
1270  characters.
1271\<close>
1272
1273text %mlref \<open>
1274  \begin{mldecls}
1275  @{index_ML_type "Symbol.symbol": string} \\
1276  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
1277  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
1278  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
1279  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
1280  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
1281  \end{mldecls}
1282  \begin{mldecls}
1283  @{index_ML_type "Symbol.sym"} \\
1284  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
1285  \end{mldecls}
1286
1287  \<^descr> Type @{ML_type "Symbol.symbol"} represents individual Isabelle symbols.
1288
1289  \<^descr> @{ML "Symbol.explode"}~\<open>str\<close> produces a symbol list from the packed form.
1290  This function supersedes @{ML "String.explode"} for virtually all purposes
1291  of manipulating text in Isabelle!\<^footnote>\<open>The runtime overhead for exploded strings
1292  is mainly that of the list structure: individual symbols that happen to be a
1293  singleton string do not require extra memory in Poly/ML.\<close>
1294
1295  \<^descr> @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
1296  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard symbols
1297  according to fixed syntactic conventions of Isabelle, cf.\ @{cite
1298  "isabelle-isar-ref"}.
1299
1300  \<^descr> Type @{ML_type "Symbol.sym"} is a concrete datatype that represents the
1301  different kinds of symbols explicitly, with constructors @{ML
1302  "Symbol.Char"}, @{ML "Symbol.UTF8"}, @{ML "Symbol.Sym"}, @{ML
1303  "Symbol.Control"}, @{ML "Symbol.Malformed"}.
1304
1305  \<^descr> @{ML "Symbol.decode"} converts the string representation of a symbol into
1306  the datatype version.
1307\<close>
1308
1309paragraph \<open>Historical note.\<close>
1310text \<open>
1311  In the original SML90 standard the primitive ML type @{ML_type char} did not
1312  exists, and @{ML_text "explode: string -> string list"} produced a list of
1313  singleton strings like @{ML "raw_explode: string -> string list"} in
1314  Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat
1315  anachronistic 8-bit or 16-bit characters, but the idea of exploding a string
1316  into a list of small strings was extended to ``symbols'' as explained above.
1317  Thus Isabelle sources can refer to an infinite store of user-defined
1318  symbols, without having to worry about the multitude of Unicode encodings
1319  that have emerged over the years.
1320\<close>
1321
1322
1323section \<open>Basic data types\<close>
1324
1325text \<open>
1326  The basis library proposal of SML97 needs to be treated with caution. Many
1327  of its operations simply do not fit with important Isabelle/ML conventions
1328  (like ``canonical argument order'', see
1329  \secref{sec:canonical-argument-order}), others cause problems with the
1330  parallel evaluation model of Isabelle/ML (such as @{ML TextIO.print} or @{ML
1331  OS.Process.system}).
1332
1333  Subsequently we give a brief overview of important operations on basic ML
1334  data types.
1335\<close>
1336
1337
1338subsection \<open>Characters\<close>
1339
1340text %mlref \<open>
1341  \begin{mldecls}
1342  @{index_ML_type char} \\
1343  \end{mldecls}
1344
1345  \<^descr> Type @{ML_type char} is \<^emph>\<open>not\<close> used. The smallest textual unit in Isabelle
1346  is represented as a ``symbol'' (see \secref{sec:symbols}).
1347\<close>
1348
1349
1350subsection \<open>Strings\<close>
1351
1352text %mlref \<open>
1353  \begin{mldecls}
1354  @{index_ML_type string} \\
1355  \end{mldecls}
1356
1357  \<^descr> Type @{ML_type string} represents immutable vectors of 8-bit characters.
1358  There are operations in SML to convert back and forth to actual byte
1359  vectors, which are seldom used.
1360
1361  This historically important raw text representation is used for
1362  Isabelle-specific purposes with the following implicit substructures packed
1363  into the string content:
1364
1365    \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with @{ML
1366    Symbol.explode} as key operation;
1367  
1368    \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with
1369    @{ML YXML.parse_body} as key operation.
1370
1371  Note that Isabelle/ML string literals may refer Isabelle symbols like
1372  ``\<^verbatim>\<open>\<alpha>\<close>'' natively, \<^emph>\<open>without\<close> escaping the backslash. This is a consequence
1373  of Isabelle treating all source text as strings of symbols, instead of raw
1374  characters.
1375\<close>
1376
1377text %mlex \<open>
1378  The subsequent example illustrates the difference of physical addressing of
1379  bytes versus logical addressing of symbols in Isabelle strings.
1380\<close>
1381
1382ML_val \<open>
1383  val s = "\<A>";
1384
1385  @{assert} (length (Symbol.explode s) = 1);
1386  @{assert} (size s = 4);
1387\<close>
1388
1389text \<open>
1390  Note that in Unicode renderings of the symbol \<open>\<A>\<close>, variations of encodings
1391  like UTF-8 or UTF-16 pose delicate questions about the multi-byte
1392  representations of its codepoint, which is outside of the 16-bit address
1393  space of the original Unicode standard from the 1990-ies. In Isabelle/ML it
1394  is just ``\<^verbatim>\<open>\<A>\<close>'' literally, using plain ASCII characters beyond any
1395  doubts.
1396\<close>
1397
1398
1399subsection \<open>Integers\<close>
1400
1401text %mlref \<open>
1402  \begin{mldecls}
1403  @{index_ML_type int} \\
1404  \end{mldecls}
1405
1406  \<^descr> Type @{ML_type int} represents regular mathematical integers, which are
1407  \<^emph>\<open>unbounded\<close>. Overflow is treated properly, but should never happen in
1408  practice.\<^footnote>\<open>The size limit for integer bit patterns in memory is 64\,MB for
1409  32-bit Poly/ML, and much higher for 64-bit systems.\<close>
1410
1411  Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
1412  @{ML_structure Int}. Structure @{ML_structure Integer} in
1413  \<^file>\<open>~~/src/Pure/General/integer.ML\<close> provides some additional operations.
1414\<close>
1415
1416
1417subsection \<open>Rational numbers\<close>
1418
1419text %mlref \<open>
1420  \begin{mldecls}
1421  @{index_ML_type Rat.rat} \\
1422  \end{mldecls}
1423
1424  \<^descr> Type @{ML_type Rat.rat} represents rational numbers, based on the
1425  unbounded integers of Poly/ML.
1426
1427  Literal rationals may be written with special antiquotation syntax
1428  \<^verbatim>\<open>@\<close>\<open>int\<close>\<^verbatim>\<open>/\<close>\<open>nat\<close> or \<^verbatim>\<open>@\<close>\<open>int\<close> (without any white space). For example
1429  \<^verbatim>\<open>@~1/4\<close> or \<^verbatim>\<open>@10\<close>. The ML toplevel pretty printer uses the same format.
1430
1431  Standard operations are provided via ad-hoc overloading of \<^verbatim>\<open>+\<close>, \<^verbatim>\<open>-\<close>, \<^verbatim>\<open>*\<close>,
1432  \<^verbatim>\<open>/\<close>, etc.
1433\<close>
1434
1435
1436subsection \<open>Time\<close>
1437
1438text %mlref \<open>
1439  \begin{mldecls}
1440  @{index_ML_type Time.time} \\
1441  @{index_ML seconds: "real -> Time.time"} \\
1442  \end{mldecls}
1443
1444  \<^descr> Type @{ML_type Time.time} represents time abstractly according to the
1445  SML97 basis library definition. This is adequate for internal ML operations,
1446  but awkward in concrete time specifications.
1447
1448  \<^descr> @{ML seconds}~\<open>s\<close> turns the concrete scalar \<open>s\<close> (measured in seconds) into
1449  an abstract time value. Floating point numbers are easy to use as
1450  configuration options in the context (see \secref{sec:config-options}) or
1451  system options that are maintained externally.
1452\<close>
1453
1454
1455subsection \<open>Options\<close>
1456
1457text %mlref \<open>
1458  \begin{mldecls}
1459  @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
1460  @{index_ML is_some: "'a option -> bool"} \\
1461  @{index_ML is_none: "'a option -> bool"} \\
1462  @{index_ML the: "'a option -> 'a"} \\
1463  @{index_ML these: "'a list option -> 'a list"} \\
1464  @{index_ML the_list: "'a option -> 'a list"} \\
1465  @{index_ML the_default: "'a -> 'a option -> 'a"} \\
1466  \end{mldecls}
1467\<close>
1468
1469text \<open>
1470  Apart from @{ML Option.map} most other operations defined in structure
1471  @{ML_structure Option} are alien to Isabelle/ML and never used. The
1472  operations shown above are defined in \<^file>\<open>~~/src/Pure/General/basics.ML\<close>.
1473\<close>
1474
1475
1476subsection \<open>Lists\<close>
1477
1478text \<open>
1479  Lists are ubiquitous in ML as simple and light-weight ``collections'' for
1480  many everyday programming tasks. Isabelle/ML provides important additions
1481  and improvements over operations that are predefined in the SML97 library.
1482\<close>
1483
1484text %mlref \<open>
1485  \begin{mldecls}
1486  @{index_ML cons: "'a -> 'a list -> 'a list"} \\
1487  @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
1488  @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
1489  @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
1490  @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
1491  \end{mldecls}
1492
1493  \<^descr> @{ML cons}~\<open>x xs\<close> evaluates to \<open>x :: xs\<close>.
1494
1495  Tupled infix operators are a historical accident in Standard ML. The curried
1496  @{ML cons} amends this, but it should be only used when partial application
1497  is required.
1498
1499  \<^descr> @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat lists as a
1500  set-like container that maintains the order of elements. See
1501  \<^file>\<open>~~/src/Pure/library.ML\<close> for the full specifications (written in ML).
1502  There are some further derived operations like @{ML union} or @{ML inter}.
1503
1504  Note that @{ML insert} is conservative about elements that are already a
1505  @{ML member} of the list, while @{ML update} ensures that the latest entry
1506  is always put in front. The latter discipline is often more appropriate in
1507  declarations of context data (\secref{sec:context-data}) that are issued by
1508  the user in Isar source: later declarations take precedence over earlier
1509  ones. \<close>
1510
1511text %mlex \<open>
1512  Using canonical @{ML fold} together with @{ML cons} (or similar standard
1513  operations) alternates the orientation of data. The is quite natural and
1514  should not be altered forcible by inserting extra applications of @{ML rev}.
1515  The alternative @{ML fold_rev} can be used in the few situations, where
1516  alternation should be prevented.
1517\<close>
1518
1519ML_val \<open>
1520  val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
1521
1522  val list1 = fold cons items [];
1523  @{assert} (list1 = rev items);
1524
1525  val list2 = fold_rev cons items [];
1526  @{assert} (list2 = items);
1527\<close>
1528
1529text \<open>
1530  The subsequent example demonstrates how to \<^emph>\<open>merge\<close> two lists in a natural
1531  way.
1532\<close>
1533
1534ML_val \<open>
1535  fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
1536\<close>
1537
1538text \<open>
1539  Here the first list is treated conservatively: only the new elements from
1540  the second list are inserted. The inside-out order of insertion via @{ML
1541  fold_rev} attempts to preserve the order of elements in the result.
1542
1543  This way of merging lists is typical for context data
1544  (\secref{sec:context-data}). See also @{ML merge} as defined in
1545  \<^file>\<open>~~/src/Pure/library.ML\<close>.
1546\<close>
1547
1548
1549subsection \<open>Association lists\<close>
1550
1551text \<open>
1552  The operations for association lists interpret a concrete list of pairs as a
1553  finite function from keys to values. Redundant representations with multiple
1554  occurrences of the same key are implicitly normalized: lookup and update
1555  only take the first occurrence into account.
1556\<close>
1557
1558text \<open>
1559  \begin{mldecls}
1560  @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
1561  @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
1562  @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
1563  \end{mldecls}
1564
1565  \<^descr> @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update} implement the
1566  main ``framework operations'' for mappings in Isabelle/ML, following
1567  standard conventions for their names and types.
1568
1569  Note that a function called \<^verbatim>\<open>lookup\<close> is obliged to express its partiality
1570  via an explicit option element. There is no choice to raise an exception,
1571  without changing the name to something like \<open>the_element\<close> or \<open>get\<close>.
1572
1573  The \<open>defined\<close> operation is essentially a contraction of @{ML is_some} and
1574  \<^verbatim>\<open>lookup\<close>, but this is sufficiently frequent to justify its independent
1575  existence. This also gives the implementation some opportunity for peep-hole
1576  optimization.
1577
1578
1579  Association lists are adequate as simple implementation of finite mappings
1580  in many practical situations. A more advanced table structure is defined in
1581  \<^file>\<open>~~/src/Pure/General/table.ML\<close>; that version scales easily to thousands or
1582  millions of elements.
1583\<close>
1584
1585
1586subsection \<open>Unsynchronized references\<close>
1587
1588text %mlref \<open>
1589  \begin{mldecls}
1590  @{index_ML_type "'a Unsynchronized.ref"} \\
1591  @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
1592  @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
1593  @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
1594  \end{mldecls}
1595\<close>
1596
1597text \<open>
1598  Due to ubiquitous parallelism in Isabelle/ML (see also
1599  \secref{sec:multi-threading}), the mutable reference cells of Standard ML
1600  are notorious for causing problems. In a highly parallel system, both
1601  correctness \<^emph>\<open>and\<close> performance are easily degraded when using mutable data.
1602
1603  The unwieldy name of @{ML Unsynchronized.ref} for the constructor for
1604  references in Isabelle/ML emphasizes the inconveniences caused by
1605  mutability. Existing operations @{ML "!"} and @{ML_op ":="} are unchanged,
1606  but should be used with special precautions, say in a strictly local
1607  situation that is guaranteed to be restricted to sequential evaluation ---
1608  now and in the future.
1609
1610  \begin{warn}
1611  Never @{ML_text "open Unsynchronized"}, not even in a local scope!
1612  Pretending that mutable state is no problem is a very bad idea.
1613  \end{warn}
1614\<close>
1615
1616
1617section \<open>Thread-safe programming \label{sec:multi-threading}\<close>
1618
1619text \<open>
1620  Multi-threaded execution has become an everyday reality in Isabelle since
1621  Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit
1622  parallelism by default, and there is no way for user-space tools to ``opt
1623  out''. ML programs that are purely functional, output messages only via the
1624  official channels (\secref{sec:message-channels}), and do not intercept
1625  interrupts (\secref{sec:exceptions}) can participate in the multi-threaded
1626  environment immediately without further ado.
1627
1628  More ambitious tools with more fine-grained interaction with the environment
1629  need to observe the principles explained below.
1630\<close>
1631
1632
1633subsection \<open>Multi-threading with shared memory\<close>
1634
1635text \<open>
1636  Multiple threads help to organize advanced operations of the system, such as
1637  real-time conditions on command transactions, sub-components with explicit
1638  communication, general asynchronous interaction etc. Moreover, parallel
1639  evaluation is a prerequisite to make adequate use of the CPU resources that
1640  are available on multi-core systems.\<^footnote>\<open>Multi-core computing does not mean
1641  that there are ``spare cycles'' to be wasted. It means that the continued
1642  exponential speedup of CPU performance due to ``Moore's Law'' follows
1643  different rules: clock frequency has reached its peak around 2005, and
1644  applications need to be parallelized in order to avoid a perceived loss of
1645  performance. See also @{cite "Sutter:2005"}.\<close>
1646
1647  Isabelle/Isar exploits the inherent structure of theories and proofs to
1648  support \<^emph>\<open>implicit parallelism\<close> to a large extent. LCF-style theorem proving
1649  provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}.
1650  This means, significant parts of theory and proof checking is parallelized
1651  by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and
1652  6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}.
1653
1654  \<^medskip>
1655  ML threads lack the memory protection of separate processes, and operate
1656  concurrently on shared heap memory. This has the advantage that results of
1657  independent computations are directly available to other threads: abstract
1658  values can be passed without copying or awkward serialization that is
1659  typically required for separate processes.
1660
1661  To make shared-memory multi-threading work robustly and efficiently, some
1662  programming guidelines need to be observed. While the ML system is
1663  responsible to maintain basic integrity of the representation of ML values
1664  in memory, the application programmer needs to ensure that multi-threaded
1665  execution does not break the intended semantics.
1666
1667  \begin{warn}
1668  To participate in implicit parallelism, tools need to be thread-safe. A
1669  single ill-behaved tool can affect the stability and performance of the
1670  whole system.
1671  \end{warn}
1672
1673  Apart from observing the principles of thread-safeness passively, advanced
1674  tools may also exploit parallelism actively, e.g.\ by using library
1675  functions for parallel list operations (\secref{sec:parlist}).
1676
1677  \begin{warn}
1678  Parallel computing resources are managed centrally by the Isabelle/ML
1679  infrastructure. User programs should not fork their own ML threads to
1680  perform heavy computations.
1681  \end{warn}
1682\<close>
1683
1684
1685subsection \<open>Critical shared resources\<close>
1686
1687text \<open>
1688  Thread-safeness is mainly concerned about concurrent read/write access to
1689  shared resources, which are outside the purely functional world of ML. This
1690  covers the following in particular.
1691
1692    \<^item> Global references (or arrays), i.e.\ mutable memory cells that persist
1693    over several invocations of associated operations.\<^footnote>\<open>This is independent of
1694    the visibility of such mutable values in the toplevel scope.\<close>
1695
1696    \<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels,
1697    environment variables, current working directory.
1698
1699    \<^item> Writable resources in the file-system that are shared among different
1700    threads or external processes.
1701
1702  Isabelle/ML provides various mechanisms to avoid critical shared resources
1703  in most situations. As last resort there are some mechanisms for explicit
1704  synchronization. The following guidelines help to make Isabelle/ML programs
1705  work smoothly in a concurrent environment.
1706
1707  \<^item> Avoid global references altogether. Isabelle/Isar maintains a uniform
1708  context that incorporates arbitrary data declared by user programs
1709  (\secref{sec:context-data}). This context is passed as plain value and user
1710  tools can get/map their own data in a purely functional manner.
1711  Configuration options within the context (\secref{sec:config-options})
1712  provide simple drop-in replacements for historic reference variables.
1713
1714  \<^item> Keep components with local state information re-entrant. Instead of poking
1715  initial values into (private) global references, a new state record can be
1716  created on each invocation, and passed through any auxiliary functions of
1717  the component. The state record contain mutable references in special
1718  situations, without requiring any synchronization, as long as each
1719  invocation gets its own copy and the tool itself is single-threaded.
1720
1721  \<^item> Avoid raw output on \<open>stdout\<close> or \<open>stderr\<close>. The Poly/ML library is
1722  thread-safe for each individual output operation, but the ordering of
1723  parallel invocations is arbitrary. This means raw output will appear on some
1724  system console with unpredictable interleaving of atomic chunks.
1725
1726  Note that this does not affect regular message output channels
1727  (\secref{sec:message-channels}). An official message id is associated with
1728  the command transaction from where it originates, independently of other
1729  transactions. This means each running Isar command has effectively its own
1730  set of message channels, and interleaving can only happen when commands use
1731  parallelism internally (and only at message boundaries).
1732
1733  \<^item> Treat environment variables and the current working directory of the
1734  running process as read-only.
1735
1736  \<^item> Restrict writing to the file-system to unique temporary files. Isabelle
1737  already provides a temporary directory that is unique for the running
1738  process, and there is a centralized source of unique serial numbers in
1739  Isabelle/ML. Thus temporary files that are passed to to some external
1740  process will be always disjoint, and thus thread-safe.
1741\<close>
1742
1743text %mlref \<open>
1744  \begin{mldecls}
1745  @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
1746  @{index_ML serial_string: "unit -> string"} \\
1747  \end{mldecls}
1748
1749  \<^descr> @{ML File.tmp_path}~\<open>path\<close> relocates the base component of \<open>path\<close> into the
1750  unique temporary directory of the running Isabelle/ML process.
1751
1752  \<^descr> @{ML serial_string}~\<open>()\<close> creates a new serial number that is unique over
1753  the runtime of the Isabelle/ML process.
1754\<close>
1755
1756text %mlex \<open>
1757  The following example shows how to create unique temporary file names.
1758\<close>
1759
1760ML_val \<open>
1761  val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
1762  val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
1763  @{assert} (tmp1 <> tmp2);
1764\<close>
1765
1766
1767subsection \<open>Explicit synchronization\<close>
1768
1769text \<open>
1770  Isabelle/ML provides explicit synchronization for mutable variables over
1771  immutable data, which may be updated atomically and exclusively. This
1772  addresses the rare situations where mutable shared resources are really
1773  required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,
1774  which have been adapted to the specific assumptions of the concurrent
1775  Isabelle environment. User code should not break this abstraction, but stay
1776  within the confines of concurrent Isabelle/ML.
1777
1778  A \<^emph>\<open>synchronized variable\<close> is an explicit state component associated with
1779  mechanisms for locking and signaling. There are operations to await a
1780  condition, change the state, and signal the change to all other waiting
1781  threads. Synchronized access to the state variable is \<^emph>\<open>not\<close> re-entrant:
1782  direct or indirect nesting within the same thread will cause a deadlock!
1783\<close>
1784
1785text %mlref \<open>
1786  \begin{mldecls}
1787  @{index_ML_type "'a Synchronized.var"} \\
1788  @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
1789  @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
1790  ('a -> ('b * 'a) option) -> 'b"} \\
1791  \end{mldecls}
1792
1793    \<^descr> Type @{ML_type "'a Synchronized.var"} represents synchronized variables
1794    with state of type @{ML_type 'a}.
1795
1796    \<^descr> @{ML Synchronized.var}~\<open>name x\<close> creates a synchronized variable that is
1797    initialized with value \<open>x\<close>. The \<open>name\<close> is used for tracing.
1798
1799    \<^descr> @{ML Synchronized.guarded_access}~\<open>var f\<close> lets the function \<open>f\<close> operate
1800    within a critical section on the state \<open>x\<close> as follows: if \<open>f x\<close> produces
1801    @{ML NONE}, it continues to wait on the internal condition variable,
1802    expecting that some other thread will eventually change the content in a
1803    suitable manner; if \<open>f x\<close> produces @{ML SOME}~\<open>(y, x')\<close> it is satisfied and
1804    assigns the new state value \<open>x'\<close>, broadcasts a signal to all waiting threads
1805    on the associated condition variable, and returns the result \<open>y\<close>.
1806
1807  There are some further variants of the @{ML Synchronized.guarded_access}
1808  combinator, see \<^file>\<open>~~/src/Pure/Concurrent/synchronized.ML\<close> for details.
1809\<close>
1810
1811text %mlex \<open>
1812  The following example implements a counter that produces positive integers
1813  that are unique over the runtime of the Isabelle process:
1814\<close>
1815
1816ML_val \<open>
1817  local
1818    val counter = Synchronized.var "counter" 0;
1819  in
1820    fun next () =
1821      Synchronized.guarded_access counter
1822        (fn i =>
1823          let val j = i + 1
1824          in SOME (j, j) end);
1825  end;
1826
1827  val a = next ();
1828  val b = next ();
1829  @{assert} (a <> b);
1830\<close>
1831
1832text \<open>
1833  \<^medskip>
1834  See \<^file>\<open>~~/src/Pure/Concurrent/mailbox.ML\<close> how to implement a mailbox as
1835  synchronized variable over a purely functional list.
1836\<close>
1837
1838
1839section \<open>Managed evaluation\<close>
1840
1841text \<open>
1842  Execution of Standard ML follows the model of strict functional evaluation
1843  with optional exceptions. Evaluation happens whenever some function is
1844  applied to (sufficiently many) arguments. The result is either an explicit
1845  value or an implicit exception.
1846
1847  \<^emph>\<open>Managed evaluation\<close> in Isabelle/ML organizes expressions and results to
1848  control certain physical side-conditions, to say more specifically when and
1849  how evaluation happens. For example, the Isabelle/ML library supports lazy
1850  evaluation with memoing, parallel evaluation via futures, asynchronous
1851  evaluation via promises, evaluation with time limit etc.
1852
1853  \<^medskip>
1854  An \<^emph>\<open>unevaluated expression\<close> is represented either as unit abstraction
1855  \<^verbatim>\<open>fn () => a\<close> of type \<^verbatim>\<open>unit -> 'a\<close> or as regular function \<^verbatim>\<open>fn a => b\<close> of
1856  type \<^verbatim>\<open>'a -> 'b\<close>. Both forms occur routinely, and special care is required
1857  to tell them apart --- the static type-system of SML is only of limited help
1858  here.
1859
1860  The first form is more intuitive: some combinator \<^verbatim>\<open>(unit -> 'a) -> 'a\<close>
1861  applies the given function to \<^verbatim>\<open>()\<close> to initiate the postponed evaluation
1862  process. The second form is more flexible: some combinator
1863  \<^verbatim>\<open>('a -> 'b) -> 'a -> 'b\<close> acts like a modified form of function application;
1864  several such combinators may be cascaded to modify a given function, before
1865  it is ultimately applied to some argument.
1866
1867  \<^medskip>
1868  \<^emph>\<open>Reified results\<close> make the disjoint sum of regular values versions
1869  exceptional situations explicit as ML datatype: \<open>'a result = Res of 'a | Exn
1870  of exn\<close>. This is typically used for administrative purposes, to store the
1871  overall outcome of an evaluation process.
1872
1873  \<^emph>\<open>Parallel exceptions\<close> aggregate reified results, such that multiple
1874  exceptions are digested as a collection in canonical form that identifies
1875  exceptions according to their original occurrence. This is particular
1876  important for parallel evaluation via futures \secref{sec:futures}, which
1877  are organized as acyclic graph of evaluations that depend on other
1878  evaluations: exceptions stemming from shared sub-graphs are exposed exactly
1879  once and in the order of their original occurrence (e.g.\ when printed at
1880  the toplevel). Interrupt counts as neutral element here: it is treated as
1881  minimal information about some canceled evaluation process, and is absorbed
1882  by the presence of regular program exceptions.
1883\<close>
1884
1885text %mlref \<open>
1886  \begin{mldecls}
1887  @{index_ML_type "'a Exn.result"} \\
1888  @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
1889  @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
1890  @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
1891  @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
1892  @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
1893  \end{mldecls}
1894
1895  \<^descr> Type @{ML_type "'a Exn.result"} represents the disjoint sum of ML results
1896  explicitly, with constructor @{ML Exn.Res} for regular values and @{ML
1897  "Exn.Exn"} for exceptions.
1898
1899  \<^descr> @{ML Exn.capture}~\<open>f x\<close> manages the evaluation of \<open>f x\<close> such that
1900  exceptions are made explicit as @{ML "Exn.Exn"}. Note that this includes
1901  physical interrupts (see also \secref{sec:exceptions}), so the same
1902  precautions apply to user code: interrupts must not be absorbed
1903  accidentally!
1904
1905  \<^descr> @{ML Exn.interruptible_capture} is similar to @{ML Exn.capture}, but
1906  interrupts are immediately re-raised as required for user code.
1907
1908  \<^descr> @{ML Exn.release}~\<open>result\<close> releases the original runtime result, exposing
1909  its regular value or raising the reified exception.
1910
1911  \<^descr> @{ML Par_Exn.release_all}~\<open>results\<close> combines results that were produced
1912  independently (e.g.\ by parallel evaluation). If all results are regular
1913  values, that list is returned. Otherwise, the collection of all exceptions
1914  is raised, wrapped-up as collective parallel exception. Note that the latter
1915  prevents access to individual exceptions by conventional \<^verbatim>\<open>handle\<close> of ML.
1916
1917  \<^descr> @{ML Par_Exn.release_first} is similar to @{ML Par_Exn.release_all}, but
1918  only the first (meaningful) exception that has occurred in the original
1919  evaluation process is raised again, the others are ignored. That single
1920  exception may get handled by conventional means in ML.
1921\<close>
1922
1923
1924subsection \<open>Parallel skeletons \label{sec:parlist}\<close>
1925
1926text \<open>
1927  Algorithmic skeletons are combinators that operate on lists in parallel, in
1928  the manner of well-known \<open>map\<close>, \<open>exists\<close>, \<open>forall\<close> etc. Management of
1929  futures (\secref{sec:futures}) and their results as reified exceptions is
1930  wrapped up into simple programming interfaces that resemble the sequential
1931  versions.
1932
1933  What remains is the application-specific problem to present expressions with
1934  suitable \<^emph>\<open>granularity\<close>: each list element corresponds to one evaluation
1935  task. If the granularity is too coarse, the available CPUs are not
1936  saturated. If it is too fine-grained, CPU cycles are wasted due to the
1937  overhead of organizing parallel processing. In the worst case, parallel
1938  performance will be less than the sequential counterpart!
1939\<close>
1940
1941text %mlref \<open>
1942  \begin{mldecls}
1943  @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
1944  @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
1945  \end{mldecls}
1946
1947  \<^descr> @{ML Par_List.map}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> is like @{ML "map"}~\<open>f [x\<^sub>1, \<dots>,
1948  x\<^sub>n]\<close>, but the evaluation of \<open>f x\<^sub>i\<close> for \<open>i = 1, \<dots>, n\<close> is performed in
1949  parallel.
1950
1951  An exception in any \<open>f x\<^sub>i\<close> cancels the overall evaluation process. The
1952  final result is produced via @{ML Par_Exn.release_first} as explained above,
1953  which means the first program exception that happened to occur in the
1954  parallel evaluation is propagated, and all other failures are ignored.
1955
1956  \<^descr> @{ML Par_List.get_some}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> produces some \<open>f x\<^sub>i\<close> that is of
1957  the form \<open>SOME y\<^sub>i\<close>, if that exists, otherwise \<open>NONE\<close>. Thus it is similar to
1958  @{ML Library.get_first}, but subject to a non-deterministic parallel choice
1959  process. The first successful result cancels the overall evaluation process;
1960  other exceptions are propagated as for @{ML Par_List.map}.
1961
1962  This generic parallel choice combinator is the basis for derived forms, such
1963  as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML Par_List.forall}.
1964\<close>
1965
1966text %mlex \<open>
1967  Subsequently, the Ackermann function is evaluated in parallel for some
1968  ranges of arguments.
1969\<close>
1970
1971ML_val \<open>
1972  fun ackermann 0 n = n + 1
1973    | ackermann m 0 = ackermann (m - 1) 1
1974    | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));
1975
1976  Par_List.map (ackermann 2) (500 upto 1000);
1977  Par_List.map (ackermann 3) (5 upto 10);
1978\<close>
1979
1980
1981subsection \<open>Lazy evaluation\<close>
1982
1983text \<open>
1984  Classic lazy evaluation works via the \<open>lazy\<close>~/ \<open>force\<close> pair of operations:
1985  \<open>lazy\<close> to wrap an unevaluated expression, and \<open>force\<close> to evaluate it once
1986  and store its result persistently. Later invocations of \<open>force\<close> retrieve the
1987  stored result without another evaluation. Isabelle/ML refines this idea to
1988  accommodate the aspects of multi-threading, synchronous program exceptions
1989  and asynchronous interrupts.
1990
1991  The first thread that invokes \<open>force\<close> on an unfinished lazy value changes
1992  its state into a \<^emph>\<open>promise\<close> of the eventual result and starts evaluating it.
1993  Any other threads that \<open>force\<close> the same lazy value in the meantime need to
1994  wait for it to finish, by producing a regular result or program exception.
1995  If the evaluation attempt is interrupted, this event is propagated to all
1996  waiting threads and the lazy value is reset to its original state.
1997
1998  This means a lazy value is completely evaluated at most once, in a
1999  thread-safe manner. There might be multiple interrupted evaluation attempts,
2000  and multiple receivers of intermediate interrupt events. Interrupts are
2001  \<^emph>\<open>not\<close> made persistent: later evaluation attempts start again from the
2002  original expression.
2003\<close>
2004
2005text %mlref \<open>
2006  \begin{mldecls}
2007  @{index_ML_type "'a lazy"} \\
2008  @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
2009  @{index_ML Lazy.value: "'a -> 'a lazy"} \\
2010  @{index_ML Lazy.force: "'a lazy -> 'a"} \\
2011  \end{mldecls}
2012
2013  \<^descr> Type @{ML_type "'a lazy"} represents lazy values over type \<^verbatim>\<open>'a\<close>.
2014
2015  \<^descr> @{ML Lazy.lazy}~\<open>(fn () => e)\<close> wraps the unevaluated expression \<open>e\<close> as
2016  unfinished lazy value.
2017
2018  \<^descr> @{ML Lazy.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished lazy value. When
2019  forced, it returns \<open>a\<close> without any further evaluation.
2020
2021  There is very low overhead for this proforma wrapping of strict values as
2022  lazy values.
2023
2024  \<^descr> @{ML Lazy.force}~\<open>x\<close> produces the result of the lazy value in a
2025  thread-safe manner as explained above. Thus it may cause the current thread
2026  to wait on a pending evaluation attempt by another thread.
2027\<close>
2028
2029
2030subsection \<open>Futures \label{sec:futures}\<close>
2031
2032text \<open>
2033  Futures help to organize parallel execution in a value-oriented manner, with
2034  \<open>fork\<close>~/ \<open>join\<close> as the main pair of operations, and some further variants;
2035  see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values,
2036  futures are evaluated strictly and spontaneously on separate worker threads.
2037  Futures may be canceled, which leads to interrupts on running evaluation
2038  attempts, and forces structurally related futures to fail for all time;
2039  already finished futures remain unchanged. Exceptions between related
2040  futures are propagated as well, and turned into parallel exceptions (see
2041  above).
2042
2043  Technically, a future is a single-assignment variable together with a
2044  \<^emph>\<open>task\<close> that serves administrative purposes, notably within the \<^emph>\<open>task
2045  queue\<close> where new futures are registered for eventual evaluation and the
2046  worker threads retrieve their work.
2047
2048  The pool of worker threads is limited, in correlation with the number of
2049  physical cores on the machine. Note that allocation of runtime resources may
2050  be distorted either if workers yield CPU time (e.g.\ via system sleep or
2051  wait operations), or if non-worker threads contend for significant runtime
2052  resources independently. There is a limited number of replacement worker
2053  threads that get activated in certain explicit wait conditions, after a
2054  timeout.
2055
2056  \<^medskip>
2057  Each future task belongs to some \<^emph>\<open>task group\<close>, which represents the
2058  hierarchic structure of related tasks, together with the exception status a
2059  that point. By default, the task group of a newly created future is a new
2060  sub-group of the presently running one, but it is also possible to indicate
2061  different group layouts under program control.
2062
2063  Cancellation of futures actually refers to the corresponding task group and
2064  all its sub-groups. Thus interrupts are propagated down the group hierarchy.
2065  Regular program exceptions are treated likewise: failure of the evaluation
2066  of some future task affects its own group and all sub-groups. Given a
2067  particular task group, its \<^emph>\<open>group status\<close> cumulates all relevant exceptions
2068  according to its position within the group hierarchy. Interrupted tasks that
2069  lack regular result information, will pick up parallel exceptions from the
2070  cumulative group status.
2071
2072  \<^medskip>
2073  A \<^emph>\<open>passive future\<close> or \<^emph>\<open>promise\<close> is a future with slightly different
2074  evaluation policies: there is only a single-assignment variable and some
2075  expression to evaluate for the \<^emph>\<open>failed\<close> case (e.g.\ to clean up resources
2076  when canceled). A regular result is produced by external means, using a
2077  separate \<^emph>\<open>fulfill\<close> operation.
2078
2079  Promises are managed in the same task queue, so regular futures may depend
2080  on them. This allows a form of reactive programming, where some promises are
2081  used as minimal elements (or guards) within the future dependency graph:
2082  when these promises are fulfilled the evaluation of subsequent futures
2083  starts spontaneously, according to their own inter-dependencies.
2084\<close>
2085
2086text %mlref \<open>
2087  \begin{mldecls}
2088  @{index_ML_type "'a future"} \\
2089  @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
2090  @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
2091  @{index_ML Future.join: "'a future -> 'a"} \\
2092  @{index_ML Future.joins: "'a future list -> 'a list"} \\
2093  @{index_ML Future.value: "'a -> 'a future"} \\
2094  @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
2095  @{index_ML Future.cancel: "'a future -> unit"} \\
2096  @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
2097  @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
2098  @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
2099  \end{mldecls}
2100
2101  \<^descr> Type @{ML_type "'a future"} represents future values over type \<^verbatim>\<open>'a\<close>.
2102
2103  \<^descr> @{ML Future.fork}~\<open>(fn () => e)\<close> registers the unevaluated expression \<open>e\<close>
2104  as unfinished future value, to be evaluated eventually on the parallel
2105  worker-thread farm. This is a shorthand for @{ML Future.forks} below, with
2106  default parameters and a single expression.
2107
2108  \<^descr> @{ML Future.forks}~\<open>params exprs\<close> is the general interface to fork several
2109  futures simultaneously. The \<open>params\<close> consist of the following fields:
2110
2111    \<^item> \<open>name : string\<close> (default @{ML "\"\""}) specifies a common name for the
2112    tasks of the forked futures, which serves diagnostic purposes.
2113
2114    \<^item> \<open>group : Future.group option\<close> (default @{ML NONE}) specifies an optional
2115    task group for the forked futures. @{ML NONE} means that a new sub-group
2116    of the current worker-thread task context is created. If this is not a
2117    worker thread, the group will be a new root in the group hierarchy.
2118
2119    \<^item> \<open>deps : Future.task list\<close> (default @{ML "[]"}) specifies dependencies on
2120    other future tasks, i.e.\ the adjacency relation in the global task queue.
2121    Dependencies on already finished tasks are ignored.
2122
2123    \<^item> \<open>pri : int\<close> (default @{ML 0}) specifies a priority within the task
2124    queue.
2125
2126    Typically there is only little deviation from the default priority @{ML
2127    0}. As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means
2128    ``high priority''.
2129
2130    Note that the task priority only affects the position in the queue, not
2131    the thread priority. When a worker thread picks up a task for processing,
2132    it runs with the normal thread priority to the end (or until canceled).
2133    Higher priority tasks that are queued later need to wait until this (or
2134    another) worker thread becomes free again.
2135
2136    \<^item> \<open>interrupts : bool\<close> (default @{ML true}) tells whether the worker thread
2137    that processes the corresponding task is initially put into interruptible
2138    state. This state may change again while running, by modifying the thread
2139    attributes.
2140
2141    With interrupts disabled, a running future task cannot be canceled. It is
2142    the responsibility of the programmer that this special state is retained
2143    only briefly.
2144
2145  \<^descr> @{ML Future.join}~\<open>x\<close> retrieves the value of an already finished future,
2146  which may lead to an exception, according to the result of its previous
2147  evaluation.
2148
2149  For an unfinished future there are several cases depending on the role of
2150  the current thread and the status of the future. A non-worker thread waits
2151  passively until the future is eventually evaluated. A worker thread
2152  temporarily changes its task context and takes over the responsibility to
2153  evaluate the future expression on the spot. The latter is done in a
2154  thread-safe manner: other threads that intend to join the same future need
2155  to wait until the ongoing evaluation is finished.
2156
2157  Note that excessive use of dynamic dependencies of futures by adhoc joining
2158  may lead to bad utilization of CPU cores, due to threads waiting on other
2159  threads to finish required futures. The future task farm has a limited
2160  amount of replacement threads that continue working on unrelated tasks after
2161  some timeout.
2162
2163  Whenever possible, static dependencies of futures should be specified
2164  explicitly when forked (see \<open>deps\<close> above). Thus the evaluation can work from
2165  the bottom up, without join conflicts and wait states.
2166
2167  \<^descr> @{ML Future.joins}~\<open>xs\<close> joins the given list of futures simultaneously,
2168  which is more efficient than @{ML "map Future.join"}~\<open>xs\<close>.
2169
2170  Based on the dependency graph of tasks, the current thread takes over the
2171  responsibility to evaluate future expressions that are required for the main
2172  result, working from the bottom up. Waiting on future results that are
2173  presently evaluated on other threads only happens as last resort, when no
2174  other unfinished futures are left over.
2175
2176  \<^descr> @{ML Future.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished future value,
2177  bypassing the worker-thread farm. When joined, it returns \<open>a\<close> without any
2178  further evaluation.
2179
2180  There is very low overhead for this proforma wrapping of strict values as
2181  futures.
2182
2183  \<^descr> @{ML Future.map}~\<open>f x\<close> is a fast-path implementation of @{ML
2184  Future.fork}~\<open>(fn () => f (\<close>@{ML Future.join}~\<open>x))\<close>, which avoids the full
2185  overhead of the task queue and worker-thread farm as far as possible. The
2186  function \<open>f\<close> is supposed to be some trivial post-processing or projection of
2187  the future result.
2188
2189  \<^descr> @{ML Future.cancel}~\<open>x\<close> cancels the task group of the given future, using
2190  @{ML Future.cancel_group} below.
2191
2192  \<^descr> @{ML Future.cancel_group}~\<open>group\<close> cancels all tasks of the given task
2193  group for all time. Threads that are presently processing a task of the
2194  given group are interrupted: it may take some time until they are actually
2195  terminated. Tasks that are queued but not yet processed are dequeued and
2196  forced into interrupted state. Since the task group is itself invalidated,
2197  any further attempt to fork a future that belongs to it will yield a
2198  canceled result as well.
2199
2200  \<^descr> @{ML Future.promise}~\<open>abort\<close> registers a passive future with the given
2201  \<open>abort\<close> operation: it is invoked when the future task group is canceled.
2202
2203  \<^descr> @{ML Future.fulfill}~\<open>x a\<close> finishes the passive future \<open>x\<close> by the given
2204  value \<open>a\<close>. If the promise has already been canceled, the attempt to fulfill
2205  it causes an exception.
2206\<close>
2207
2208end
2209