1(*:maxLineLen=78:*) 2 3theory "ML" 4imports Base 5begin 6 7chapter \<open>Isabelle/ML\<close> 8 9text \<open> 10 Isabelle/ML is best understood as a certain culture based on Standard ML. 11 Thus it is not a new programming language, but a certain way to use SML at 12 an advanced level within the Isabelle environment. This covers a variety of 13 aspects that are geared towards an efficient and robust platform for 14 applications of formal logic with fully foundational proof construction --- 15 according to the well-known \<^emph>\<open>LCF principle\<close>. There is specific 16 infrastructure with library modules to address the needs of this difficult 17 task. For example, the raw parallel programming model of Poly/ML is 18 presented as considerably more abstract concept of \<^emph>\<open>futures\<close>, which is then 19 used to augment the inference kernel, Isar theory and proof interpreter, and 20 PIDE document management. 21 22 The main aspects of Isabelle/ML are introduced below. These first-hand 23 explanations should help to understand how proper Isabelle/ML is to be read 24 and written, and to get access to the wealth of experience that is expressed 25 in the source text and its history of changes.\<^footnote>\<open>See 26 \<^url>\<open>https://isabelle.in.tum.de/repos/isabelle\<close> for the full Mercurial history. 27 There are symbolic tags to refer to official Isabelle releases, as opposed 28 to arbitrary \<^emph>\<open>tip\<close> versions that merely reflect snapshots that are never 29 really up-to-date.\<close> 30\<close> 31 32 33section \<open>Style and orthography\<close> 34 35text \<open> 36 The sources of Isabelle/Isar are optimized for \<^emph>\<open>readability\<close> and 37 \<^emph>\<open>maintainability\<close>. The main purpose is to tell an informed reader what is 38 really going on and how things really work. This is a non-trivial aim, but 39 it is supported by a certain style of writing Isabelle/ML that has emerged 40 from long years of system development.\<^footnote>\<open>See also the interesting style guide 41 for OCaml \<^url>\<open>https://caml.inria.fr/resources/doc/guides/guidelines.en.html\<close> 42 which shares many of our means and ends.\<close> 43 44 The main principle behind any coding style is \<^emph>\<open>consistency\<close>. For a single 45 author of a small program this merely means ``choose your style and stick to 46 it''. A complex project like Isabelle, with long years of development and 47 different contributors, requires more standardization. A coding style that 48 is changed every few years or with every new contributor is no style at all, 49 because consistency is quickly lost. Global consistency is hard to achieve, 50 though. Nonetheless, one should always strive at least for local consistency 51 of modules and sub-systems, without deviating from some general principles 52 how to write Isabelle/ML. 53 54 In a sense, good coding style is like an \<^emph>\<open>orthography\<close> for the sources: it 55 helps to read quickly over the text and see through the main points, without 56 getting distracted by accidental presentation of free-style code. 57\<close> 58 59 60subsection \<open>Header and sectioning\<close> 61 62text \<open> 63 Isabelle source files have a certain standardized header format (with 64 precise spacing) that follows ancient traditions reaching back to the 65 earliest versions of the system by Larry Paulson. See 66 \<^file>\<open>~~/src/Pure/thm.ML\<close>, for example. 67 68 The header includes at least \<^verbatim>\<open>Title\<close> and \<^verbatim>\<open>Author\<close> entries, followed by a 69 prose description of the purpose of the module. The latter can range from a 70 single line to several paragraphs of explanations. 71 72 The rest of the file is divided into chapters, sections, subsections, 73 subsubsections, paragraphs etc.\ using a simple layout via ML comments as 74 follows. 75 76 @{verbatim [display] 77\<open> (**** chapter ****) 78 79 (*** section ***) 80 81 (** subsection **) 82 83 (* subsubsection *) 84 85 (*short paragraph*) 86 87 (* 88 long paragraph, 89 with more text 90 *)\<close>} 91 92 As in regular typography, there is some extra space \<^emph>\<open>before\<close> section 93 headings that are adjacent to plain text, but not other headings as in the 94 example above. 95 96 \<^medskip> 97 The precise wording of the prose text given in these headings is chosen 98 carefully to introduce the main theme of the subsequent formal ML text. 99\<close> 100 101 102subsection \<open>Naming conventions\<close> 103 104text \<open> 105 Since ML is the primary medium to express the meaning of the source text, 106 naming of ML entities requires special care. 107\<close> 108 109paragraph \<open>Notation.\<close> 110text \<open> 111 A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are separated 112 by underscore. There are three variants concerning upper or lower case 113 letters, which are used for certain ML categories as follows: 114 115 \<^medskip> 116 \begin{tabular}{lll} 117 variant & example & ML categories \\\hline 118 lower-case & @{ML_text foo_bar} & values, types, record fields \\ 119 capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\ 120 upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\ 121 \end{tabular} 122 \<^medskip> 123 124 For historical reasons, many capitalized names omit underscores, e.g.\ 125 old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}. Genuine 126 mixed-case names are \<^emph>\<open>not\<close> used, because clear division of words is 127 essential for readability.\<^footnote>\<open>Camel-case was invented to workaround the lack 128 of underscore in some early non-ASCII character sets. Later it became 129 habitual in some language communities that are now strong in numbers.\<close> 130 131 A single (capital) character does not count as ``word'' in this respect: 132 some Isabelle/ML names are suffixed by extra markers like this: @{ML_text 133 foo_barT}. 134 135 Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text foo'}, 136 @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text foo''''} or more. 137 Decimal digits scale better to larger numbers, e.g.\ @{ML_text foo0}, 138 @{ML_text foo1}, @{ML_text foo42}. 139\<close> 140 141paragraph \<open>Scopes.\<close> 142text \<open> 143 Apart from very basic library modules, ML structures are not ``opened'', but 144 names are referenced with explicit qualification, as in @{ML 145 Syntax.string_of_term} for example. When devising names for structures and 146 their components it is important to aim at eye-catching compositions of both 147 parts, because this is how they are seen in the sources and documentation. 148 For the same reasons, aliases of well-known library functions should be 149 avoided. 150 151 Local names of function abstraction or case/let bindings are typically 152 shorter, sometimes using only rudiments of ``words'', while still avoiding 153 cryptic shorthands. An auxiliary function called @{ML_text helper}, 154 @{ML_text aux}, or @{ML_text f} is considered bad style. 155 156 Example: 157 158 @{verbatim [display] 159\<open> (* RIGHT *) 160 161 fun print_foo ctxt foo = 162 let 163 fun print t = ... Syntax.string_of_term ctxt t ... 164 in ... end; 165 166 167 (* RIGHT *) 168 169 fun print_foo ctxt foo = 170 let 171 val string_of_term = Syntax.string_of_term ctxt; 172 fun print t = ... string_of_term t ... 173 in ... end; 174 175 176 (* WRONG *) 177 178 val string_of_term = Syntax.string_of_term; 179 180 fun print_foo ctxt foo = 181 let 182 fun aux t = ... string_of_term ctxt t ... 183 in ... end;\<close>} 184\<close> 185 186paragraph \<open>Specific conventions.\<close> 187text \<open> 188 Here are some specific name forms that occur frequently in the sources. 189 190 \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is called @{ML_text 191 foo_to_bar} or @{ML_text bar_of_foo} (never @{ML_text foo2bar}, nor 192 @{ML_text bar_from_foo}, nor @{ML_text bar_for_foo}, nor @{ML_text 193 bar4foo}). 194 195 \<^item> The name component @{ML_text legacy} means that the operation is about to 196 be discontinued soon. 197 198 \<^item> The name component @{ML_text global} means that this works with the 199 background theory instead of the regular local context 200 (\secref{sec:context}), sometimes for historical reasons, sometimes due a 201 genuine lack of locality of the concept involved, sometimes as a fall-back 202 for the lack of a proper context in the application code. Whenever there is 203 a non-global variant available, the application should be migrated to use it 204 with a proper local context. 205 206 \<^item> Variables of the main context types of the Isabelle/Isar framework 207 (\secref{sec:context} and \chref{ch:local-theory}) have firm naming 208 conventions as follows: 209 210 \<^item> theories are called @{ML_text thy}, rarely @{ML_text theory} 211 (never @{ML_text thry}) 212 213 \<^item> proof contexts are called @{ML_text ctxt}, rarely @{ML_text 214 context} (never @{ML_text ctx}) 215 216 \<^item> generic contexts are called @{ML_text context} 217 218 \<^item> local theories are called @{ML_text lthy}, except for local 219 theories that are treated as proof context (which is a semantic 220 super-type) 221 222 Variations with primed or decimal numbers are always possible, as well as 223 semantic prefixes like @{ML_text foo_thy} or @{ML_text bar_ctxt}, but the 224 base conventions above need to be preserved. This allows to emphasize their 225 data flow via plain regular expressions in the text editor. 226 227 \<^item> The main logical entities (\secref{ch:logic}) have established naming 228 convention as follows: 229 230 \<^item> sorts are called @{ML_text S} 231 232 \<^item> types are called @{ML_text T}, @{ML_text U}, or @{ML_text ty} (never 233 @{ML_text t}) 234 235 \<^item> terms are called @{ML_text t}, @{ML_text u}, or @{ML_text tm} (never 236 @{ML_text trm}) 237 238 \<^item> certified types are called @{ML_text cT}, rarely @{ML_text T}, with 239 variants as for types 240 241 \<^item> certified terms are called @{ML_text ct}, rarely @{ML_text t}, with 242 variants as for terms (never @{ML_text ctrm}) 243 244 \<^item> theorems are called @{ML_text th}, or @{ML_text thm} 245 246 Proper semantic names override these conventions completely. For example, 247 the left-hand side of an equation (as a term) can be called @{ML_text lhs} 248 (not @{ML_text lhs_tm}). Or a term that is known to be a variable can be 249 called @{ML_text v} or @{ML_text x}. 250 251 \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific 252 naming conventions. The name of a basic tactic definition always has a 253 @{ML_text "_tac"} suffix, the subgoal index (if applicable) is always called 254 @{ML_text i}, and the goal state (if made explicit) is usually called 255 @{ML_text st} instead of the somewhat misleading @{ML_text thm}. Any other 256 arguments are given before the latter two, and the general context is given 257 first. Example: 258 259 @{verbatim [display] \<open> fun my_tac ctxt arg1 arg2 i st = ...\<close>} 260 261 Note that the goal state @{ML_text st} above is rarely made explicit, if 262 tactic combinators (tacticals) are used as usual. 263 264 A tactic that requires a proof context needs to make that explicit as seen 265 in the \<^verbatim>\<open>ctxt\<close> argument above. Do not refer to the background theory of 266 \<^verbatim>\<open>st\<close> -- it is not a proper context, but merely a formal certificate. 267\<close> 268 269 270subsection \<open>General source layout\<close> 271 272text \<open> 273 The general Isabelle/ML source layout imitates regular type-setting 274 conventions, augmented by the requirements for deeply nested expressions 275 that are commonplace in functional programming. 276\<close> 277 278paragraph \<open>Line length\<close> 279text \<open> 280 is limited to 80 characters according to ancient standards, but we allow as 281 much as 100 characters (not more).\<^footnote>\<open>Readability requires to keep the 282 beginning of a line in view while watching its end. Modern wide-screen 283 displays do not change the way how the human brain works. Sources also need 284 to be printable on plain paper with reasonable font-size.\<close> The extra 20 285 characters acknowledge the space requirements due to qualified library 286 references in Isabelle/ML. 287\<close> 288 289paragraph \<open>White-space\<close> 290text \<open> 291 is used to emphasize the structure of expressions, following mostly standard 292 conventions for mathematical typesetting, as can be seen in plain {\TeX} or 293 {\LaTeX}. This defines positioning of spaces for parentheses, punctuation, 294 and infixes as illustrated here: 295 296 @{verbatim [display] 297\<open> val x = y + z * (a + b); 298 val pair = (a, b); 299 val record = {foo = 1, bar = 2};\<close>} 300 301 Lines are normally broken \<^emph>\<open>after\<close> an infix operator or punctuation 302 character. For example: 303 304 @{verbatim [display] 305\<open> 306 val x = 307 a + 308 b + 309 c; 310 311 val tuple = 312 (a, 313 b, 314 c); 315\<close>} 316 317 Some special infixes (e.g.\ @{ML_text "|>"}) work better at the start of the 318 line, but punctuation is always at the end. 319 320 Function application follows the tradition of \<open>\<lambda>\<close>-calculus, not informal 321 mathematics. For example: @{ML_text "f a b"} for a curried function, or 322 @{ML_text "g (a, b)"} for a tupled function. Note that the space between 323 @{ML_text g} and the pair @{ML_text "(a, b)"} follows the important 324 principle of \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not 325 change when @{ML_text "p"} is refined to the concrete pair @{ML_text "(a, 326 b)"}. 327\<close> 328 329paragraph \<open>Indentation\<close> 330text \<open> 331 uses plain spaces, never hard tabulators.\<^footnote>\<open>Tabulators were invented to move 332 the carriage of a type-writer to certain predefined positions. In software 333 they could be used as a primitive run-length compression of consecutive 334 spaces, but the precise result would depend on non-standardized text editor 335 configuration.\<close> 336 337 Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4, 338 never 8 or any other odd number. 339 340 Indentation follows a simple logical format that only depends on the nesting 341 depth, not the accidental length of the text that initiates a level of 342 nesting. Example: 343 344 @{verbatim [display] 345\<open> (* RIGHT *) 346 347 if b then 348 expr1_part1 349 expr1_part2 350 else 351 expr2_part1 352 expr2_part2 353 354 355 (* WRONG *) 356 357 if b then expr1_part1 358 expr1_part2 359 else expr2_part1 360 expr2_part2\<close>} 361 362 The second form has many problems: it assumes a fixed-width font when 363 viewing the sources, it uses more space on the line and thus makes it hard 364 to observe its strict length limit (working against \<^emph>\<open>readability\<close>), it 365 requires extra editing to adapt the layout to changes of the initial text 366 (working against \<^emph>\<open>maintainability\<close>) etc. 367 368 \<^medskip> 369 For similar reasons, any kind of two-dimensional or tabular layouts, 370 ASCII-art with lines or boxes of asterisks etc.\ should be avoided. 371\<close> 372 373paragraph \<open>Complex expressions\<close> 374text \<open> 375 that consist of multi-clausal function definitions, @{ML_text handle}, 376 @{ML_text case}, @{ML_text let} (and combinations) require special 377 attention. The syntax of Standard ML is quite ambitious and admits a lot of 378 variance that can distort the meaning of the text. 379 380 Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle}, 381 @{ML_text case} get extra indentation to indicate the nesting clearly. 382 Example: 383 384 @{verbatim [display] 385\<open> (* RIGHT *) 386 387 fun foo p1 = 388 expr1 389 | foo p2 = 390 expr2 391 392 393 (* WRONG *) 394 395 fun foo p1 = 396 expr1 397 | foo p2 = 398 expr2\<close>} 399 400 Body expressions consisting of @{ML_text case} or @{ML_text let} require 401 care to maintain compositionality, to prevent loss of logical indentation 402 where it is especially important to see the structure of the text. Example: 403 404 @{verbatim [display] 405\<open> (* RIGHT *) 406 407 fun foo p1 = 408 (case e of 409 q1 => ... 410 | q2 => ...) 411 | foo p2 = 412 let 413 ... 414 in 415 ... 416 end 417 418 419 (* WRONG *) 420 421 fun foo p1 = case e of 422 q1 => ... 423 | q2 => ... 424 | foo p2 = 425 let 426 ... 427 in 428 ... 429 end\<close>} 430 431 Extra parentheses around @{ML_text case} expressions are optional, but help 432 to analyse the nesting based on character matching in the text editor. 433 434 \<^medskip> 435 There are two main exceptions to the overall principle of compositionality 436 in the layout of complex expressions. 437 438 \<^enum> @{ML_text "if"} expressions are iterated as if ML had multi-branch 439 conditionals, e.g. 440 441 @{verbatim [display] 442\<open> (* RIGHT *) 443 444 if b1 then e1 445 else if b2 then e2 446 else e3\<close>} 447 448 \<^enum> @{ML_text fn} abstractions are often layed-out as if they would lack any 449 structure by themselves. This traditional form is motivated by the 450 possibility to shift function arguments back and forth wrt.\ additional 451 combinators. Example: 452 453 @{verbatim [display] 454\<open> (* RIGHT *) 455 456 fun foo x y = fold (fn z => 457 expr)\<close>} 458 459 Here the visual appearance is that of three arguments @{ML_text x}, 460 @{ML_text y}, @{ML_text z} in a row. 461 462 463 Such weakly structured layout should be use with great care. Here are some 464 counter-examples involving @{ML_text let} expressions: 465 466 @{verbatim [display] 467\<open> (* WRONG *) 468 469 fun foo x = let 470 val y = ... 471 in ... end 472 473 474 (* WRONG *) 475 476 fun foo x = let 477 val y = ... 478 in ... end 479 480 481 (* WRONG *) 482 483 fun foo x = 484 let 485 val y = ... 486 in ... end 487 488 489 (* WRONG *) 490 491 fun foo x = 492 let 493 val y = ... 494 in 495 ... end\<close>} 496 497 \<^medskip> 498 In general the source layout is meant to emphasize the structure of complex 499 language expressions, not to pretend that SML had a completely different 500 syntax (say that of Haskell, Scala, Java). 501\<close> 502 503 504section \<open>ML embedded into Isabelle/Isar\<close> 505 506text \<open> 507 ML and Isar are intertwined via an open-ended bootstrap process that 508 provides more and more programming facilities and logical content in an 509 alternating manner. Bootstrapping starts from the raw environment of 510 existing implementations of Standard ML (mainly Poly/ML). 511 512 Isabelle/Pure marks the point where the raw ML toplevel is superseded by 513 Isabelle/ML within the Isar theory and proof language, with a uniform 514 context for arbitrary ML values (see also \secref{sec:context}). This formal 515 environment holds ML compiler bindings, logical entities, and many other 516 things. 517 518 Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar 519 environment by introducing suitable theories with associated ML modules, 520 either inlined within \<^verbatim>\<open>.thy\<close> files, or as separate \<^verbatim>\<open>.ML\<close> files that are 521 loading from some theory. Thus Isabelle/HOL is defined as a regular 522 user-space application within the Isabelle framework. Further add-on tools 523 can be implemented in ML within the Isar context in the same manner: ML is 524 part of the standard repertoire of Isabelle, and there is no distinction 525 between ``users'' and ``developers'' in this respect. 526\<close> 527 528 529subsection \<open>Isar ML commands\<close> 530 531text \<open> 532 The primary Isar source language provides facilities to ``open a window'' to 533 the underlying ML compiler. Especially see the Isar commands @{command_ref 534 "ML_file"} and @{command_ref "ML"}: both work the same way, but the source 535 text is provided differently, via a file vs.\ inlined, respectively. Apart 536 from embedding ML into the main theory definition like that, there are many 537 more commands that refer to ML source, such as @{command_ref setup} or 538 @{command_ref declaration}. Even more fine-grained embedding of ML into Isar 539 is encountered in the proof method @{method_ref tactic}, which refines the 540 pending goal state via a given expression of type @{ML_type tactic}. 541\<close> 542 543text %mlex \<open> 544 The following artificial example demonstrates some ML toplevel declarations 545 within the implicit Isar theory context. This is regular functional 546 programming without referring to logical entities yet. 547\<close> 548 549ML \<open> 550 fun factorial 0 = 1 551 | factorial n = n * factorial (n - 1) 552\<close> 553 554text \<open> 555 Here the ML environment is already managed by Isabelle, i.e.\ the @{ML 556 factorial} function is not yet accessible in the preceding paragraph, nor in 557 a different theory that is independent from the current one in the import 558 hierarchy. 559 560 Removing the above ML declaration from the source text will remove any trace 561 of this definition, as expected. The Isabelle/ML toplevel environment is 562 managed in a \<^emph>\<open>stateless\<close> way: in contrast to the raw ML toplevel, there are 563 no global side-effects involved here.\<^footnote>\<open>Such a stateless compilation 564 environment is also a prerequisite for robust parallel compilation within 565 independent nodes of the implicit theory development graph.\<close> 566 567 \<^medskip> 568 The next example shows how to embed ML into Isar proofs, using @{command_ref 569 "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect 570 on the ML environment is local to the whole proof body, but ignoring the 571 block structure. 572\<close> 573 574notepad 575begin 576 ML_prf %"ML" \<open>val a = 1\<close> 577 { 578 ML_prf %"ML" \<open>val b = a + 1\<close> 579 } \<comment> \<open>Isar block structure ignored by ML environment\<close> 580 ML_prf %"ML" \<open>val c = b + 1\<close> 581end 582 583text \<open> 584 By side-stepping the normal scoping rules for Isar proof blocks, embedded ML 585 code can refer to the different contexts and manipulate corresponding 586 entities, e.g.\ export a fact from a block context. 587 588 \<^medskip> 589 Two further ML commands are useful in certain situations: @{command_ref 590 ML_val} and @{command_ref ML_command} are \<^emph>\<open>diagnostic\<close> in the sense that 591 there is no effect on the underlying environment, and can thus be used 592 anywhere. The examples below produce long strings of digits by invoking @{ML 593 factorial}: @{command ML_val} takes care of printing the ML toplevel result, 594 but @{command ML_command} is silent so we produce an explicit output 595 message. 596\<close> 597 598ML_val \<open>factorial 100\<close> 599ML_command \<open>writeln (string_of_int (factorial 100))\<close> 600 601notepad 602begin 603 ML_val \<open>factorial 100\<close> 604 ML_command \<open>writeln (string_of_int (factorial 100))\<close> 605end 606 607 608subsection \<open>Compile-time context\<close> 609 610text \<open> 611 Whenever the ML compiler is invoked within Isabelle/Isar, the formal context 612 is passed as a thread-local reference variable. Thus ML code may access the 613 theory context during compilation, by reading or writing the (local) theory 614 under construction. Note that such direct access to the compile-time context 615 is rare. In practice it is typically done via some derived ML functions 616 instead. 617\<close> 618 619text %mlref \<open> 620 \begin{mldecls} 621 @{index_ML Context.the_generic_context: "unit -> Context.generic"} \\ 622 @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\ 623 @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\ 624 @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\ 625 \end{mldecls} 626 627 \<^descr> @{ML "Context.the_generic_context ()"} refers to the theory context of 628 the ML toplevel --- at compile time. ML code needs to take care to refer to 629 @{ML "Context.the_generic_context ()"} correctly. Recall that evaluation 630 of a function body is delayed until actual run-time. 631 632 \<^descr> @{ML "Context.>>"}~\<open>f\<close> applies context transformation \<open>f\<close> to the implicit 633 context of the ML toplevel. 634 635 \<^descr> @{ML ML_Thms.bind_thms}~\<open>(name, thms)\<close> stores a list of theorems produced 636 in ML both in the (global) theory context and the ML toplevel, associating 637 it with the provided name. 638 639 \<^descr> @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to 640 a singleton fact. 641 642 It is important to note that the above functions are really restricted to 643 the compile time, even though the ML compiler is invoked at run-time. The 644 majority of ML code either uses static antiquotations 645 (\secref{sec:ML-antiq}) or refers to the theory or proof context at 646 run-time, by explicit functional abstraction. 647\<close> 648 649 650subsection \<open>Antiquotations \label{sec:ML-antiq}\<close> 651 652text \<open> 653 A very important consequence of embedding ML into Isar is the concept of 654 \<^emph>\<open>ML antiquotation\<close>. The standard token language of ML is augmented by 655 special syntactic entities of the following form: 656 657 @{rail \<open> 658 @{syntax_def antiquote}: '@{' name args '}' 659 \<close>} 660 661 Here @{syntax name} and @{syntax args} are outer syntax categories, as 662 defined in @{cite "isabelle-isar-ref"}. 663 664 \<^medskip> 665 A regular antiquotation \<open>@{name args}\<close> processes its arguments by the usual 666 means of the Isar source language, and produces corresponding ML source 667 text, either as literal \<^emph>\<open>inline\<close> text (e.g.\ \<open>@{term t}\<close>) or abstract 668 \<^emph>\<open>value\<close> (e.g. \<open>@{thm th}\<close>). This pre-compilation scheme allows to refer to 669 formal entities in a robust manner, with proper static scoping and with some 670 degree of logical checking of small portions of the code. 671\<close> 672 673 674subsection \<open>Printing ML values\<close> 675 676text \<open> 677 The ML compiler knows about the structure of values according to their 678 static type, and can print them in the manner of its toplevel, although the 679 details are non-portable. The antiquotations @{ML_antiquotation_def 680 "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable 681 way to refer to this potential capability of the underlying ML system in 682 generic Isabelle/ML sources. 683 684 This is occasionally useful for diagnostic or demonstration purposes. Note 685 that production-quality tools require proper user-level error messages, 686 avoiding raw ML values in the output. 687\<close> 688 689text %mlantiq \<open> 690 \begin{matharray}{rcl} 691 @{ML_antiquotation_def "make_string"} & : & \<open>ML_antiquotation\<close> \\ 692 @{ML_antiquotation_def "print"} & : & \<open>ML_antiquotation\<close> \\ 693 \end{matharray} 694 695 @{rail \<open> 696 @@{ML_antiquotation make_string} 697 ; 698 @@{ML_antiquotation print} embedded? 699 \<close>} 700 701 \<^descr> \<open>@{make_string}\<close> inlines a function to print arbitrary values similar to 702 the ML toplevel. The result is compiler dependent and may fall back on "?" 703 in certain situations. The value of configuration option @{attribute_ref 704 ML_print_depth} determines further details of output. 705 706 \<^descr> \<open>@{print f}\<close> uses the ML function \<open>f: string -> unit\<close> to output the result 707 of \<open>@{make_string}\<close> above, together with the source position of the 708 antiquotation. The default output function is @{ML writeln}. 709\<close> 710 711text %mlex \<open> 712 The following artificial examples show how to produce adhoc output of ML 713 values for debugging purposes. 714\<close> 715 716ML_val \<open> 717 val x = 42; 718 val y = true; 719 720 writeln (@{make_string} {x = x, y = y}); 721 722 @{print} {x = x, y = y}; 723 @{print tracing} {x = x, y = y}; 724\<close> 725 726 727section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close> 728 729text \<open> 730 Standard ML is a language in the tradition of \<open>\<lambda>\<close>-calculus and 731 \<^emph>\<open>higher-order functional programming\<close>, similar to OCaml, Haskell, or 732 Isabelle/Pure and HOL as logical languages. Getting acquainted with the 733 native style of representing functions in that setting can save a lot of 734 extra boiler-plate of redundant shuffling of arguments, auxiliary 735 abstractions etc. 736 737 Functions are usually \<^emph>\<open>curried\<close>: the idea of turning arguments of type 738 \<open>\<tau>\<^sub>i\<close> (for \<open>i \<in> {1, \<dots> n}\<close>) into a result of type \<open>\<tau>\<close> is represented by the 739 iterated function space \<open>\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>. This is isomorphic to the 740 well-known encoding via tuples \<open>\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>, but the curried version 741 fits more smoothly into the basic calculus.\<^footnote>\<open>The difference is even more 742 significant in HOL, because the redundant tuple structure needs to be 743 accommodated extraneous proof steps.\<close> 744 745 Currying gives some flexibility due to \<^emph>\<open>partial application\<close>. A function 746 \<open>f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>\<close> can be applied to \<open>x: \<tau>\<^sub>1\<close> and the remaining \<open>(f x): \<tau>\<^sub>2 747 \<rightarrow> \<tau>\<close> passed to another function etc. How well this works in practice depends 748 on the order of arguments. In the worst case, arguments are arranged 749 erratically, and using a function in a certain situation always requires 750 some glue code. Thus we would get exponentially many opportunities to 751 decorate the code with meaningless permutations of arguments. 752 753 This can be avoided by \<^emph>\<open>canonical argument order\<close>, which observes certain 754 standard patterns and minimizes adhoc permutations in their application. In 755 Isabelle/ML, large portions of text can be written without auxiliary 756 operations like \<open>swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>\<close> or \<open>C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)\<close> (the 757 latter is not present in the Isabelle/ML library). 758 759 \<^medskip> 760 The main idea is that arguments that vary less are moved further to the left 761 than those that vary more. Two particularly important categories of 762 functions are \<^emph>\<open>selectors\<close> and \<^emph>\<open>updates\<close>. 763 764 The subsequent scheme is based on a hypothetical set-like container of type 765 \<open>\<beta>\<close> that manages elements of type \<open>\<alpha>\<close>. Both the names and types of the 766 associated operations are canonical for Isabelle/ML. 767 768 \begin{center} 769 \begin{tabular}{ll} 770 kind & canonical name and type \\\hline 771 selector & \<open>member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool\<close> \\ 772 update & \<open>insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> \\ 773 \end{tabular} 774 \end{center} 775 776 Given a container \<open>B: \<beta>\<close>, the partially applied \<open>member B\<close> is a predicate 777 over elements \<open>\<alpha> \<rightarrow> bool\<close>, and thus represents the intended denotation 778 directly. It is customary to pass the abstract predicate to further 779 operations, not the concrete container. The argument order makes it easy to 780 use other combinators: \<open>forall (member B) list\<close> will check a list of 781 elements for membership in \<open>B\<close> etc. Often the explicit \<open>list\<close> is pointless 782 and can be contracted to \<open>forall (member B)\<close> to get directly a predicate 783 again. 784 785 In contrast, an update operation varies the container, so it moves to the 786 right: \<open>insert a\<close> is a function \<open>\<beta> \<rightarrow> \<beta>\<close> to insert a value \<open>a\<close>. These can be 787 composed naturally as \<open>insert c \<circ> insert b \<circ> insert a\<close>. The slightly awkward 788 inversion of the composition order is due to conventional mathematical 789 notation, which can be easily amended as explained below. 790\<close> 791 792 793subsection \<open>Forward application and composition\<close> 794 795text \<open> 796 Regular function application and infix notation works best for relatively 797 deeply structured expressions, e.g.\ \<open>h (f x y + g z)\<close>. The important 798 special case of \<^emph>\<open>linear transformation\<close> applies a cascade of functions \<open>f\<^sub>n 799 (\<dots> (f\<^sub>1 x))\<close>. This becomes hard to read and maintain if the functions are 800 themselves given as complex expressions. The notation can be significantly 801 improved by introducing \<^emph>\<open>forward\<close> versions of application and composition 802 as follows: 803 804 \<^medskip> 805 \begin{tabular}{lll} 806 \<open>x |> f\<close> & \<open>\<equiv>\<close> & \<open>f x\<close> \\ 807 \<open>(f #> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |> g\<close> \\ 808 \end{tabular} 809 \<^medskip> 810 811 This enables to write conveniently \<open>x |> f\<^sub>1 |> \<dots> |> f\<^sub>n\<close> or \<open>f\<^sub>1 #> \<dots> #> 812 f\<^sub>n\<close> for its functional abstraction over \<open>x\<close>. 813 814 \<^medskip> 815 There is an additional set of combinators to accommodate multiple results 816 (via pairs) that are passed on as multiple arguments (via currying). 817 818 \<^medskip> 819 \begin{tabular}{lll} 820 \<open>(x, y) |-> f\<close> & \<open>\<equiv>\<close> & \<open>f x y\<close> \\ 821 \<open>(f #-> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |-> g\<close> \\ 822 \end{tabular} 823 \<^medskip> 824\<close> 825 826text %mlref \<open> 827 \begin{mldecls} 828 @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\ 829 @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\ 830 @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\ 831 @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\ 832 \end{mldecls} 833\<close> 834 835 836subsection \<open>Canonical iteration\<close> 837 838text \<open> 839 As explained above, a function \<open>f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> can be understood as update on 840 a configuration of type \<open>\<beta>\<close>, parameterized by an argument of type \<open>\<alpha>\<close>. Given 841 \<open>a: \<alpha>\<close> the partial application \<open>(f a): \<beta> \<rightarrow> \<beta>\<close> operates homogeneously on \<open>\<beta>\<close>. 842 This can be iterated naturally over a list of parameters \<open>[a\<^sub>1, \<dots>, a\<^sub>n]\<close> as 843 \<open>f a\<^sub>1 #> \<dots> #> f a\<^sub>n\<close>. The latter expression is again a function \<open>\<beta> \<rightarrow> \<beta>\<close>. It 844 can be applied to an initial configuration \<open>b: \<beta>\<close> to start the iteration 845 over the given list of arguments: each \<open>a\<close> in \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> is applied 846 consecutively by updating a cumulative configuration. 847 848 The \<open>fold\<close> combinator in Isabelle/ML lifts a function \<open>f\<close> as above to its 849 iterated version over a list of arguments. Lifting can be repeated, e.g.\ 850 \<open>(fold \<circ> fold) f\<close> iterates over a list of lists as expected. 851 852 The variant \<open>fold_rev\<close> works inside-out over the list of arguments, such 853 that \<open>fold_rev f \<equiv> fold f \<circ> rev\<close> holds. 854 855 The \<open>fold_map\<close> combinator essentially performs \<open>fold\<close> and \<open>map\<close> 856 simultaneously: each application of \<open>f\<close> produces an updated configuration 857 together with a side-result; the iteration collects all such side-results as 858 a separate list. 859\<close> 860 861text %mlref \<open> 862 \begin{mldecls} 863 @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\ 864 @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\ 865 @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\ 866 \end{mldecls} 867 868 \<^descr> @{ML fold}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> to a list of 869 parameters. 870 871 \<^descr> @{ML fold_rev}~\<open>f\<close> is similar to @{ML fold}~\<open>f\<close>, but works inside-out, as 872 if the list would be reversed. 873 874 \<^descr> @{ML fold_map}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> (with 875 side-result) to a list of parameters and cumulative side-results. 876 877 878 \begin{warn} 879 The literature on functional programming provides a confusing multitude of 880 combinators called \<open>foldl\<close>, \<open>foldr\<close> etc. SML97 provides its own variations 881 as @{ML List.foldl} and @{ML List.foldr}, while the classic Isabelle library 882 also has the historic @{ML Library.foldl} and @{ML Library.foldr}. To avoid 883 unnecessary complication, all these historical versions should be ignored, 884 and the canonical @{ML fold} (or @{ML fold_rev}) used exclusively. 885 \end{warn} 886\<close> 887 888text %mlex \<open> 889 The following example shows how to fill a text buffer incrementally by 890 adding strings, either individually or from a given list. 891\<close> 892 893ML_val \<open> 894 val s = 895 Buffer.empty 896 |> Buffer.add "digits: " 897 |> fold (Buffer.add o string_of_int) (0 upto 9) 898 |> Buffer.content; 899 900 @{assert} (s = "digits: 0123456789"); 901\<close> 902 903text \<open> 904 Note how @{ML "fold (Buffer.add o string_of_int)"} above saves an extra @{ML 905 "map"} over the given list. This kind of peephole optimization reduces both 906 the code size and the tree structures in memory (``deforestation''), but it 907 requires some practice to read and write fluently. 908 909 \<^medskip> 910 The next example elaborates the idea of canonical iteration, demonstrating 911 fast accumulation of tree content using a text buffer. 912\<close> 913 914ML \<open> 915 datatype tree = Text of string | Elem of string * tree list; 916 917 fun slow_content (Text txt) = txt 918 | slow_content (Elem (name, ts)) = 919 "<" ^ name ^ ">" ^ 920 implode (map slow_content ts) ^ 921 "</" ^ name ^ ">" 922 923 fun add_content (Text txt) = Buffer.add txt 924 | add_content (Elem (name, ts)) = 925 Buffer.add ("<" ^ name ^ ">") #> 926 fold add_content ts #> 927 Buffer.add ("</" ^ name ^ ">"); 928 929 fun fast_content tree = 930 Buffer.empty |> add_content tree |> Buffer.content; 931\<close> 932 933text \<open> 934 The slowness of @{ML slow_content} is due to the @{ML implode} of the 935 recursive results, because it copies previously produced strings again and 936 again. 937 938 The incremental @{ML add_content} avoids this by operating on a buffer that 939 is passed through in a linear fashion. Using @{ML_text "#>"} and contraction 940 over the actual buffer argument saves some additional boiler-plate. Of 941 course, the two @{ML "Buffer.add"} invocations with concatenated strings 942 could have been split into smaller parts, but this would have obfuscated the 943 source without making a big difference in performance. Here we have done 944 some peephole-optimization for the sake of readability. 945 946 Another benefit of @{ML add_content} is its ``open'' form as a function on 947 buffers that can be continued in further linear transformations, folding 948 etc. Thus it is more compositional than the naive @{ML slow_content}. As 949 realistic example, compare the old-style 950 @{ML "Term.maxidx_of_term: term -> int"} with the newer @{ML 951 "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure. 952 953 Note that @{ML fast_content} above is only defined as example. In many 954 practical situations, it is customary to provide the incremental @{ML 955 add_content} only and leave the initialization and termination to the 956 concrete application to the user. 957\<close> 958 959 960section \<open>Message output channels \label{sec:message-channels}\<close> 961 962text \<open> 963 Isabelle provides output channels for different kinds of messages: regular 964 output, high-volume tracing information, warnings, and errors. 965 966 Depending on the user interface involved, these messages may appear in 967 different text styles or colours. The standard output for batch sessions 968 prefixes each line of warnings by \<^verbatim>\<open>###\<close> and errors by \<^verbatim>\<open>***\<close>, but leaves 969 anything else unchanged. The message body may contain further markup and 970 formatting, which is routinely used in the Prover IDE @{cite 971 "isabelle-jedit"}. 972 973 Messages are associated with the transaction context of the running Isar 974 command. This enables the front-end to manage commands and resulting 975 messages together. For example, after deleting a command from a given theory 976 document version, the corresponding message output can be retracted from the 977 display. 978\<close> 979 980text %mlref \<open> 981 \begin{mldecls} 982 @{index_ML writeln: "string -> unit"} \\ 983 @{index_ML tracing: "string -> unit"} \\ 984 @{index_ML warning: "string -> unit"} \\ 985 @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\ 986 \end{mldecls} 987 988 \<^descr> @{ML writeln}~\<open>text\<close> outputs \<open>text\<close> as regular message. This is the 989 primary message output operation of Isabelle and should be used by default. 990 991 \<^descr> @{ML tracing}~\<open>text\<close> outputs \<open>text\<close> as special tracing message, indicating 992 potential high-volume output to the front-end (hundreds or thousands of 993 messages issued by a single command). The idea is to allow the 994 user-interface to downgrade the quality of message display to achieve higher 995 throughput. 996 997 Note that the user might have to take special actions to see tracing output, 998 e.g.\ switch to a different output window. So this channel should not be 999 used for regular output. 1000 1001 \<^descr> @{ML warning}~\<open>text\<close> outputs \<open>text\<close> as warning, which typically means some 1002 extra emphasis on the front-end side (color highlighting, icons, etc.). 1003 1004 \<^descr> @{ML error}~\<open>text\<close> raises exception @{ML ERROR}~\<open>text\<close> and thus lets the 1005 Isar toplevel print \<open>text\<close> on the error channel, which typically means some 1006 extra emphasis on the front-end side (color highlighting, icons, etc.). 1007 1008 This assumes that the exception is not handled before the command 1009 terminates. Handling exception @{ML ERROR}~\<open>text\<close> is a perfectly legal 1010 alternative: it means that the error is absorbed without any message output. 1011 1012 \begin{warn} 1013 The actual error channel is accessed via @{ML Output.error_message}, but 1014 this is normally not used directly in user code. 1015 \end{warn} 1016 1017 1018 \begin{warn} 1019 Regular Isabelle/ML code should output messages exclusively by the official 1020 channels. Using raw I/O on \<^emph>\<open>stdout\<close> or \<^emph>\<open>stderr\<close> instead (e.g.\ via @{ML 1021 TextIO.output}) is apt to cause problems in the presence of parallel and 1022 asynchronous processing of Isabelle theories. Such raw output might be 1023 displayed by the front-end in some system console log, with a low chance 1024 that the user will ever see it. Moreover, as a genuine side-effect on global 1025 process channels, there is no proper way to retract output when Isar command 1026 transactions are reset by the system. 1027 \end{warn} 1028 1029 \begin{warn} 1030 The message channels should be used in a message-oriented manner. This means 1031 that multi-line output that logically belongs together is issued by a single 1032 invocation of @{ML writeln} etc.\ with the functional concatenation of all 1033 message constituents. 1034 \end{warn} 1035\<close> 1036 1037text %mlex \<open> 1038 The following example demonstrates a multi-line warning. Note that in some 1039 situations the user sees only the first line, so the most important point 1040 should be made first. 1041\<close> 1042 1043ML_command \<open> 1044 warning (cat_lines 1045 ["Beware the Jabberwock, my son!", 1046 "The jaws that bite, the claws that catch!", 1047 "Beware the Jubjub Bird, and shun", 1048 "The frumious Bandersnatch!"]); 1049\<close> 1050 1051text \<open> 1052 \<^medskip> 1053 An alternative is to make a paragraph of freely-floating words as follows. 1054\<close> 1055 1056ML_command \<open> 1057 warning (Pretty.string_of (Pretty.para 1058 "Beware the Jabberwock, my son! \ 1059 \The jaws that bite, the claws that catch! \ 1060 \Beware the Jubjub Bird, and shun \ 1061 \The frumious Bandersnatch!")) 1062\<close> 1063 1064text \<open> 1065 This has advantages with variable window / popup sizes, but might make it 1066 harder to search for message content systematically, e.g.\ by other tools or 1067 by humans expecting the ``verse'' of a formal message in a fixed layout. 1068\<close> 1069 1070 1071section \<open>Exceptions \label{sec:exceptions}\<close> 1072 1073text \<open> 1074 The Standard ML semantics of strict functional evaluation together with 1075 exceptions is rather well defined, but some delicate points need to be 1076 observed to avoid that ML programs go wrong despite static type-checking. 1077 Exceptions in Isabelle/ML are subsequently categorized as follows. 1078\<close> 1079 1080paragraph \<open>Regular user errors.\<close> 1081text \<open> 1082 These are meant to provide informative feedback about malformed input etc. 1083 1084 The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR} exception, with a 1085 plain text message as argument. @{ML ERROR} exceptions can be handled 1086 internally, in order to be ignored, turned into other exceptions, or 1087 cascaded by appending messages. If the corresponding Isabelle/Isar command 1088 terminates with an @{ML ERROR} exception state, the system will print the 1089 result on the error channel (see \secref{sec:message-channels}). 1090 1091 It is considered bad style to refer to internal function names or values in 1092 ML source notation in user error messages. Do not use \<open>@{make_string}\<close> nor 1093 \<open>@{here}\<close>! 1094 1095 Grammatical correctness of error messages can be improved by \<^emph>\<open>omitting\<close> 1096 final punctuation: messages are often concatenated or put into a larger 1097 context (e.g.\ augmented with source position). Note that punctuation after 1098 formal entities (types, terms, theorems) is particularly prone to user 1099 confusion. 1100\<close> 1101 1102paragraph \<open>Program failures.\<close> 1103text \<open> 1104 There is a handful of standard exceptions that indicate general failure 1105 situations, or failures of core operations on logical entities (types, 1106 terms, theorems, theories, see \chref{ch:logic}). 1107 1108 These exceptions indicate a genuine breakdown of the program, so the main 1109 purpose is to determine quickly what has happened where. Traditionally, the 1110 (short) exception message would include the name of an ML function, although 1111 this is no longer necessary, because the ML runtime system attaches detailed 1112 source position stemming from the corresponding @{ML_text raise} keyword. 1113 1114 \<^medskip> 1115 User modules can always introduce their own custom exceptions locally, e.g.\ 1116 to organize internal failures robustly without overlapping with existing 1117 exceptions. Exceptions that are exposed in module signatures require extra 1118 care, though, and should \<^emph>\<open>not\<close> be introduced by default. Surprise by users 1119 of a module can be often minimized by using plain user errors instead. 1120\<close> 1121 1122paragraph \<open>Interrupts.\<close> 1123text \<open> 1124 These indicate arbitrary system events: both the ML runtime system and the 1125 Isabelle/ML infrastructure signal various exceptional situations by raising 1126 the special @{ML Exn.Interrupt} exception in user code. 1127 1128 This is the one and only way that physical events can intrude an Isabelle/ML 1129 program. Such an interrupt can mean out-of-memory, stack overflow, timeout, 1130 internal signaling of threads, or a POSIX process signal. An Isabelle/ML 1131 program that intercepts interrupts becomes dependent on physical effects of 1132 the environment. Even worse, exception handling patterns that are too 1133 general by accident, e.g.\ by misspelled exception constructors, will cover 1134 interrupts unintentionally and thus render the program semantics 1135 ill-defined. 1136 1137 Note that the Interrupt exception dates back to the original SML90 language 1138 definition. It was excluded from the SML97 version to avoid its malign 1139 impact on ML program semantics, but without providing a viable alternative. 1140 Isabelle/ML recovers physical interruptibility (which is an indispensable 1141 tool to implement managed evaluation of command transactions), but requires 1142 user code to be strictly transparent wrt.\ interrupts. 1143 1144 \begin{warn} 1145 Isabelle/ML user code needs to terminate promptly on interruption, without 1146 guessing at its meaning to the system infrastructure. Temporary handling of 1147 interrupts for cleanup of global resources etc.\ needs to be followed 1148 immediately by re-raising of the original exception. 1149 \end{warn} 1150\<close> 1151 1152text %mlref \<open> 1153 \begin{mldecls} 1154 @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\ 1155 @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\ 1156 @{index_ML_exception ERROR: string} \\ 1157 @{index_ML_exception Fail: string} \\ 1158 @{index_ML Exn.is_interrupt: "exn -> bool"} \\ 1159 @{index_ML Exn.reraise: "exn -> 'a"} \\ 1160 @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\ 1161 \end{mldecls} 1162 1163 \<^descr> @{ML try}~\<open>f x\<close> makes the partiality of evaluating \<open>f x\<close> explicit via the 1164 option datatype. Interrupts are \<^emph>\<open>not\<close> handled here, i.e.\ this form serves 1165 as safe replacement for the \<^emph>\<open>unsafe\<close> version @{ML_text "(SOME"}~\<open>f 1166 x\<close>~@{ML_text "handle _ => NONE)"} that is occasionally seen in books about 1167 SML97, but not in Isabelle/ML. 1168 1169 \<^descr> @{ML can} is similar to @{ML try} with more abstract result. 1170 1171 \<^descr> @{ML ERROR}~\<open>msg\<close> represents user errors; this exception is normally 1172 raised indirectly via the @{ML error} function (see 1173 \secref{sec:message-channels}). 1174 1175 \<^descr> @{ML Fail}~\<open>msg\<close> represents general program failures. 1176 1177 \<^descr> @{ML Exn.is_interrupt} identifies interrupts robustly, without mentioning 1178 concrete exception constructors in user code. Handled interrupts need to be 1179 re-raised promptly! 1180 1181 \<^descr> @{ML Exn.reraise}~\<open>exn\<close> raises exception \<open>exn\<close> while preserving its implicit 1182 position information (if possible, depending on the ML platform). 1183 1184 \<^descr> @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~\<open>e\<close>@{ML_text ")"} evaluates 1185 expression \<open>e\<close> while printing a full trace of its stack of nested exceptions 1186 (if possible, depending on the ML platform). 1187 1188 Inserting @{ML Runtime.exn_trace} into ML code temporarily is useful for 1189 debugging, but not suitable for production code. 1190\<close> 1191 1192text %mlantiq \<open> 1193 \begin{matharray}{rcl} 1194 @{ML_antiquotation_def "assert"} & : & \<open>ML_antiquotation\<close> \\ 1195 @{ML_antiquotation_def "undefined"} & : & \<open>ML_antiquotation\<close> \\ 1196 \end{matharray} 1197 1198 \<^descr> \<open>@{assert}\<close> inlines a function @{ML_type "bool -> unit"} that raises @{ML 1199 Fail} if the argument is @{ML false}. Due to inlining the source position of 1200 failed assertions is included in the error output. 1201 1202 \<^descr> \<open>@{undefined}\<close> inlines @{verbatim raise}~@{ML Match}, i.e.\ the ML program 1203 behaves as in some function application of an undefined case. 1204\<close> 1205 1206text %mlex \<open> 1207 The ML function @{ML undefined} is defined in \<^file>\<open>~~/src/Pure/library.ML\<close> 1208 as follows: 1209\<close> 1210 1211ML \<open>fun undefined _ = raise Match\<close> 1212 1213text \<open> 1214 \<^medskip> 1215 The following variant uses the antiquotation @{ML_antiquotation undefined} 1216 instead: 1217\<close> 1218 1219ML \<open>fun undefined _ = @{undefined}\<close> 1220 1221text \<open> 1222 \<^medskip> 1223 Here is the same, using control-symbol notation for the antiquotation, with 1224 special rendering of \<^verbatim>\<open>\<^undefined>\<close>: 1225\<close> 1226 1227ML \<open>fun undefined _ = \<^undefined>\<close> 1228 1229text \<open> 1230 \medskip Semantically, all forms are equivalent to a function definition 1231 without any clauses, but that is syntactically not allowed in ML. 1232\<close> 1233 1234 1235section \<open>Strings of symbols \label{sec:symbols}\<close> 1236 1237text \<open> 1238 A \<^emph>\<open>symbol\<close> constitutes the smallest textual unit in Isabelle/ML --- raw ML 1239 characters are normally not encountered at all. Isabelle strings consist of 1240 a sequence of symbols, represented as a packed string or an exploded list of 1241 strings. Each symbol is in itself a small string, which has either one of 1242 the following forms: 1243 1244 \<^enum> a single ASCII character ``\<open>c\<close>'', for example ``\<^verbatim>\<open>a\<close>'', 1245 1246 \<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence), 1247 1248 \<^enum> a regular symbol ``\<^verbatim>\<open>\<ident>\<close>'', for example ``\<^verbatim>\<open>\<alpha>\<close>'', 1249 1250 \<^enum> a control symbol ``\<^verbatim>\<open>\<^ident>\<close>'', for example ``\<^verbatim>\<open>\<^bold>\<close>'', 1251 1252 The \<open>ident\<close> syntax for symbol names is \<open>letter (letter | digit)\<^sup>*\<close>, where 1253 \<open>letter = A..Za..z\<close> and \<open>digit = 0..9\<close>. There are infinitely many regular 1254 symbols and control symbols, but a fixed collection of standard symbols is 1255 treated specifically. For example, ``\<^verbatim>\<open>\<alpha>\<close>'' is classified as a letter, which 1256 means it may occur within regular Isabelle identifiers. 1257 1258 The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit 1259 character sequences are passed-through unchanged. Unicode/UCS data in UTF-8 1260 encoding is processed in a non-strict fashion, such that well-formed code 1261 sequences are recognized accordingly. Unicode provides its own collection of 1262 mathematical symbols, but within the core Isabelle/ML world there is no link 1263 to the standard collection of Isabelle regular symbols. 1264 1265 \<^medskip> 1266 Output of Isabelle symbols depends on the print mode. For example, the 1267 standard {\LaTeX} setup of the Isabelle document preparation system would 1268 present ``\<^verbatim>\<open>\<alpha>\<close>'' as \<open>\<alpha>\<close>, and ``\<^verbatim>\<open>\<^bold>\<alpha>\<close>'' as \<open>\<^bold>\<alpha>\<close>. On-screen rendering usually 1269 works by mapping a finite subset of Isabelle symbols to suitable Unicode 1270 characters. 1271\<close> 1272 1273text %mlref \<open> 1274 \begin{mldecls} 1275 @{index_ML_type "Symbol.symbol": string} \\ 1276 @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\ 1277 @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\ 1278 @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\ 1279 @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\ 1280 @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\ 1281 \end{mldecls} 1282 \begin{mldecls} 1283 @{index_ML_type "Symbol.sym"} \\ 1284 @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\ 1285 \end{mldecls} 1286 1287 \<^descr> Type @{ML_type "Symbol.symbol"} represents individual Isabelle symbols. 1288 1289 \<^descr> @{ML "Symbol.explode"}~\<open>str\<close> produces a symbol list from the packed form. 1290 This function supersedes @{ML "String.explode"} for virtually all purposes 1291 of manipulating text in Isabelle!\<^footnote>\<open>The runtime overhead for exploded strings 1292 is mainly that of the list structure: individual symbols that happen to be a 1293 singleton string do not require extra memory in Poly/ML.\<close> 1294 1295 \<^descr> @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML 1296 "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard symbols 1297 according to fixed syntactic conventions of Isabelle, cf.\ @{cite 1298 "isabelle-isar-ref"}. 1299 1300 \<^descr> Type @{ML_type "Symbol.sym"} is a concrete datatype that represents the 1301 different kinds of symbols explicitly, with constructors @{ML 1302 "Symbol.Char"}, @{ML "Symbol.UTF8"}, @{ML "Symbol.Sym"}, @{ML 1303 "Symbol.Control"}, @{ML "Symbol.Malformed"}. 1304 1305 \<^descr> @{ML "Symbol.decode"} converts the string representation of a symbol into 1306 the datatype version. 1307\<close> 1308 1309paragraph \<open>Historical note.\<close> 1310text \<open> 1311 In the original SML90 standard the primitive ML type @{ML_type char} did not 1312 exists, and @{ML_text "explode: string -> string list"} produced a list of 1313 singleton strings like @{ML "raw_explode: string -> string list"} in 1314 Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat 1315 anachronistic 8-bit or 16-bit characters, but the idea of exploding a string 1316 into a list of small strings was extended to ``symbols'' as explained above. 1317 Thus Isabelle sources can refer to an infinite store of user-defined 1318 symbols, without having to worry about the multitude of Unicode encodings 1319 that have emerged over the years. 1320\<close> 1321 1322 1323section \<open>Basic data types\<close> 1324 1325text \<open> 1326 The basis library proposal of SML97 needs to be treated with caution. Many 1327 of its operations simply do not fit with important Isabelle/ML conventions 1328 (like ``canonical argument order'', see 1329 \secref{sec:canonical-argument-order}), others cause problems with the 1330 parallel evaluation model of Isabelle/ML (such as @{ML TextIO.print} or @{ML 1331 OS.Process.system}). 1332 1333 Subsequently we give a brief overview of important operations on basic ML 1334 data types. 1335\<close> 1336 1337 1338subsection \<open>Characters\<close> 1339 1340text %mlref \<open> 1341 \begin{mldecls} 1342 @{index_ML_type char} \\ 1343 \end{mldecls} 1344 1345 \<^descr> Type @{ML_type char} is \<^emph>\<open>not\<close> used. The smallest textual unit in Isabelle 1346 is represented as a ``symbol'' (see \secref{sec:symbols}). 1347\<close> 1348 1349 1350subsection \<open>Strings\<close> 1351 1352text %mlref \<open> 1353 \begin{mldecls} 1354 @{index_ML_type string} \\ 1355 \end{mldecls} 1356 1357 \<^descr> Type @{ML_type string} represents immutable vectors of 8-bit characters. 1358 There are operations in SML to convert back and forth to actual byte 1359 vectors, which are seldom used. 1360 1361 This historically important raw text representation is used for 1362 Isabelle-specific purposes with the following implicit substructures packed 1363 into the string content: 1364 1365 \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with @{ML 1366 Symbol.explode} as key operation; 1367 1368 \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with 1369 @{ML YXML.parse_body} as key operation. 1370 1371 Note that Isabelle/ML string literals may refer Isabelle symbols like 1372 ``\<^verbatim>\<open>\<alpha>\<close>'' natively, \<^emph>\<open>without\<close> escaping the backslash. This is a consequence 1373 of Isabelle treating all source text as strings of symbols, instead of raw 1374 characters. 1375\<close> 1376 1377text %mlex \<open> 1378 The subsequent example illustrates the difference of physical addressing of 1379 bytes versus logical addressing of symbols in Isabelle strings. 1380\<close> 1381 1382ML_val \<open> 1383 val s = "\<A>"; 1384 1385 @{assert} (length (Symbol.explode s) = 1); 1386 @{assert} (size s = 4); 1387\<close> 1388 1389text \<open> 1390 Note that in Unicode renderings of the symbol \<open>\<A>\<close>, variations of encodings 1391 like UTF-8 or UTF-16 pose delicate questions about the multi-byte 1392 representations of its codepoint, which is outside of the 16-bit address 1393 space of the original Unicode standard from the 1990-ies. In Isabelle/ML it 1394 is just ``\<^verbatim>\<open>\<A>\<close>'' literally, using plain ASCII characters beyond any 1395 doubts. 1396\<close> 1397 1398 1399subsection \<open>Integers\<close> 1400 1401text %mlref \<open> 1402 \begin{mldecls} 1403 @{index_ML_type int} \\ 1404 \end{mldecls} 1405 1406 \<^descr> Type @{ML_type int} represents regular mathematical integers, which are 1407 \<^emph>\<open>unbounded\<close>. Overflow is treated properly, but should never happen in 1408 practice.\<^footnote>\<open>The size limit for integer bit patterns in memory is 64\,MB for 1409 32-bit Poly/ML, and much higher for 64-bit systems.\<close> 1410 1411 Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by 1412 @{ML_structure Int}. Structure @{ML_structure Integer} in 1413 \<^file>\<open>~~/src/Pure/General/integer.ML\<close> provides some additional operations. 1414\<close> 1415 1416 1417subsection \<open>Rational numbers\<close> 1418 1419text %mlref \<open> 1420 \begin{mldecls} 1421 @{index_ML_type Rat.rat} \\ 1422 \end{mldecls} 1423 1424 \<^descr> Type @{ML_type Rat.rat} represents rational numbers, based on the 1425 unbounded integers of Poly/ML. 1426 1427 Literal rationals may be written with special antiquotation syntax 1428 \<^verbatim>\<open>@\<close>\<open>int\<close>\<^verbatim>\<open>/\<close>\<open>nat\<close> or \<^verbatim>\<open>@\<close>\<open>int\<close> (without any white space). For example 1429 \<^verbatim>\<open>@~1/4\<close> or \<^verbatim>\<open>@10\<close>. The ML toplevel pretty printer uses the same format. 1430 1431 Standard operations are provided via ad-hoc overloading of \<^verbatim>\<open>+\<close>, \<^verbatim>\<open>-\<close>, \<^verbatim>\<open>*\<close>, 1432 \<^verbatim>\<open>/\<close>, etc. 1433\<close> 1434 1435 1436subsection \<open>Time\<close> 1437 1438text %mlref \<open> 1439 \begin{mldecls} 1440 @{index_ML_type Time.time} \\ 1441 @{index_ML seconds: "real -> Time.time"} \\ 1442 \end{mldecls} 1443 1444 \<^descr> Type @{ML_type Time.time} represents time abstractly according to the 1445 SML97 basis library definition. This is adequate for internal ML operations, 1446 but awkward in concrete time specifications. 1447 1448 \<^descr> @{ML seconds}~\<open>s\<close> turns the concrete scalar \<open>s\<close> (measured in seconds) into 1449 an abstract time value. Floating point numbers are easy to use as 1450 configuration options in the context (see \secref{sec:config-options}) or 1451 system options that are maintained externally. 1452\<close> 1453 1454 1455subsection \<open>Options\<close> 1456 1457text %mlref \<open> 1458 \begin{mldecls} 1459 @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\ 1460 @{index_ML is_some: "'a option -> bool"} \\ 1461 @{index_ML is_none: "'a option -> bool"} \\ 1462 @{index_ML the: "'a option -> 'a"} \\ 1463 @{index_ML these: "'a list option -> 'a list"} \\ 1464 @{index_ML the_list: "'a option -> 'a list"} \\ 1465 @{index_ML the_default: "'a -> 'a option -> 'a"} \\ 1466 \end{mldecls} 1467\<close> 1468 1469text \<open> 1470 Apart from @{ML Option.map} most other operations defined in structure 1471 @{ML_structure Option} are alien to Isabelle/ML and never used. The 1472 operations shown above are defined in \<^file>\<open>~~/src/Pure/General/basics.ML\<close>. 1473\<close> 1474 1475 1476subsection \<open>Lists\<close> 1477 1478text \<open> 1479 Lists are ubiquitous in ML as simple and light-weight ``collections'' for 1480 many everyday programming tasks. Isabelle/ML provides important additions 1481 and improvements over operations that are predefined in the SML97 library. 1482\<close> 1483 1484text %mlref \<open> 1485 \begin{mldecls} 1486 @{index_ML cons: "'a -> 'a list -> 'a list"} \\ 1487 @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\ 1488 @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\ 1489 @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\ 1490 @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\ 1491 \end{mldecls} 1492 1493 \<^descr> @{ML cons}~\<open>x xs\<close> evaluates to \<open>x :: xs\<close>. 1494 1495 Tupled infix operators are a historical accident in Standard ML. The curried 1496 @{ML cons} amends this, but it should be only used when partial application 1497 is required. 1498 1499 \<^descr> @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat lists as a 1500 set-like container that maintains the order of elements. See 1501 \<^file>\<open>~~/src/Pure/library.ML\<close> for the full specifications (written in ML). 1502 There are some further derived operations like @{ML union} or @{ML inter}. 1503 1504 Note that @{ML insert} is conservative about elements that are already a 1505 @{ML member} of the list, while @{ML update} ensures that the latest entry 1506 is always put in front. The latter discipline is often more appropriate in 1507 declarations of context data (\secref{sec:context-data}) that are issued by 1508 the user in Isar source: later declarations take precedence over earlier 1509 ones. \<close> 1510 1511text %mlex \<open> 1512 Using canonical @{ML fold} together with @{ML cons} (or similar standard 1513 operations) alternates the orientation of data. The is quite natural and 1514 should not be altered forcible by inserting extra applications of @{ML rev}. 1515 The alternative @{ML fold_rev} can be used in the few situations, where 1516 alternation should be prevented. 1517\<close> 1518 1519ML_val \<open> 1520 val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; 1521 1522 val list1 = fold cons items []; 1523 @{assert} (list1 = rev items); 1524 1525 val list2 = fold_rev cons items []; 1526 @{assert} (list2 = items); 1527\<close> 1528 1529text \<open> 1530 The subsequent example demonstrates how to \<^emph>\<open>merge\<close> two lists in a natural 1531 way. 1532\<close> 1533 1534ML_val \<open> 1535 fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; 1536\<close> 1537 1538text \<open> 1539 Here the first list is treated conservatively: only the new elements from 1540 the second list are inserted. The inside-out order of insertion via @{ML 1541 fold_rev} attempts to preserve the order of elements in the result. 1542 1543 This way of merging lists is typical for context data 1544 (\secref{sec:context-data}). See also @{ML merge} as defined in 1545 \<^file>\<open>~~/src/Pure/library.ML\<close>. 1546\<close> 1547 1548 1549subsection \<open>Association lists\<close> 1550 1551text \<open> 1552 The operations for association lists interpret a concrete list of pairs as a 1553 finite function from keys to values. Redundant representations with multiple 1554 occurrences of the same key are implicitly normalized: lookup and update 1555 only take the first occurrence into account. 1556\<close> 1557 1558text \<open> 1559 \begin{mldecls} 1560 @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\ 1561 @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\ 1562 @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\ 1563 \end{mldecls} 1564 1565 \<^descr> @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update} implement the 1566 main ``framework operations'' for mappings in Isabelle/ML, following 1567 standard conventions for their names and types. 1568 1569 Note that a function called \<^verbatim>\<open>lookup\<close> is obliged to express its partiality 1570 via an explicit option element. There is no choice to raise an exception, 1571 without changing the name to something like \<open>the_element\<close> or \<open>get\<close>. 1572 1573 The \<open>defined\<close> operation is essentially a contraction of @{ML is_some} and 1574 \<^verbatim>\<open>lookup\<close>, but this is sufficiently frequent to justify its independent 1575 existence. This also gives the implementation some opportunity for peep-hole 1576 optimization. 1577 1578 1579 Association lists are adequate as simple implementation of finite mappings 1580 in many practical situations. A more advanced table structure is defined in 1581 \<^file>\<open>~~/src/Pure/General/table.ML\<close>; that version scales easily to thousands or 1582 millions of elements. 1583\<close> 1584 1585 1586subsection \<open>Unsynchronized references\<close> 1587 1588text %mlref \<open> 1589 \begin{mldecls} 1590 @{index_ML_type "'a Unsynchronized.ref"} \\ 1591 @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\ 1592 @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\ 1593 @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\ 1594 \end{mldecls} 1595\<close> 1596 1597text \<open> 1598 Due to ubiquitous parallelism in Isabelle/ML (see also 1599 \secref{sec:multi-threading}), the mutable reference cells of Standard ML 1600 are notorious for causing problems. In a highly parallel system, both 1601 correctness \<^emph>\<open>and\<close> performance are easily degraded when using mutable data. 1602 1603 The unwieldy name of @{ML Unsynchronized.ref} for the constructor for 1604 references in Isabelle/ML emphasizes the inconveniences caused by 1605 mutability. Existing operations @{ML "!"} and @{ML_op ":="} are unchanged, 1606 but should be used with special precautions, say in a strictly local 1607 situation that is guaranteed to be restricted to sequential evaluation --- 1608 now and in the future. 1609 1610 \begin{warn} 1611 Never @{ML_text "open Unsynchronized"}, not even in a local scope! 1612 Pretending that mutable state is no problem is a very bad idea. 1613 \end{warn} 1614\<close> 1615 1616 1617section \<open>Thread-safe programming \label{sec:multi-threading}\<close> 1618 1619text \<open> 1620 Multi-threaded execution has become an everyday reality in Isabelle since 1621 Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit 1622 parallelism by default, and there is no way for user-space tools to ``opt 1623 out''. ML programs that are purely functional, output messages only via the 1624 official channels (\secref{sec:message-channels}), and do not intercept 1625 interrupts (\secref{sec:exceptions}) can participate in the multi-threaded 1626 environment immediately without further ado. 1627 1628 More ambitious tools with more fine-grained interaction with the environment 1629 need to observe the principles explained below. 1630\<close> 1631 1632 1633subsection \<open>Multi-threading with shared memory\<close> 1634 1635text \<open> 1636 Multiple threads help to organize advanced operations of the system, such as 1637 real-time conditions on command transactions, sub-components with explicit 1638 communication, general asynchronous interaction etc. Moreover, parallel 1639 evaluation is a prerequisite to make adequate use of the CPU resources that 1640 are available on multi-core systems.\<^footnote>\<open>Multi-core computing does not mean 1641 that there are ``spare cycles'' to be wasted. It means that the continued 1642 exponential speedup of CPU performance due to ``Moore's Law'' follows 1643 different rules: clock frequency has reached its peak around 2005, and 1644 applications need to be parallelized in order to avoid a perceived loss of 1645 performance. See also @{cite "Sutter:2005"}.\<close> 1646 1647 Isabelle/Isar exploits the inherent structure of theories and proofs to 1648 support \<^emph>\<open>implicit parallelism\<close> to a large extent. LCF-style theorem proving 1649 provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}. 1650 This means, significant parts of theory and proof checking is parallelized 1651 by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and 1652 6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}. 1653 1654 \<^medskip> 1655 ML threads lack the memory protection of separate processes, and operate 1656 concurrently on shared heap memory. This has the advantage that results of 1657 independent computations are directly available to other threads: abstract 1658 values can be passed without copying or awkward serialization that is 1659 typically required for separate processes. 1660 1661 To make shared-memory multi-threading work robustly and efficiently, some 1662 programming guidelines need to be observed. While the ML system is 1663 responsible to maintain basic integrity of the representation of ML values 1664 in memory, the application programmer needs to ensure that multi-threaded 1665 execution does not break the intended semantics. 1666 1667 \begin{warn} 1668 To participate in implicit parallelism, tools need to be thread-safe. A 1669 single ill-behaved tool can affect the stability and performance of the 1670 whole system. 1671 \end{warn} 1672 1673 Apart from observing the principles of thread-safeness passively, advanced 1674 tools may also exploit parallelism actively, e.g.\ by using library 1675 functions for parallel list operations (\secref{sec:parlist}). 1676 1677 \begin{warn} 1678 Parallel computing resources are managed centrally by the Isabelle/ML 1679 infrastructure. User programs should not fork their own ML threads to 1680 perform heavy computations. 1681 \end{warn} 1682\<close> 1683 1684 1685subsection \<open>Critical shared resources\<close> 1686 1687text \<open> 1688 Thread-safeness is mainly concerned about concurrent read/write access to 1689 shared resources, which are outside the purely functional world of ML. This 1690 covers the following in particular. 1691 1692 \<^item> Global references (or arrays), i.e.\ mutable memory cells that persist 1693 over several invocations of associated operations.\<^footnote>\<open>This is independent of 1694 the visibility of such mutable values in the toplevel scope.\<close> 1695 1696 \<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels, 1697 environment variables, current working directory. 1698 1699 \<^item> Writable resources in the file-system that are shared among different 1700 threads or external processes. 1701 1702 Isabelle/ML provides various mechanisms to avoid critical shared resources 1703 in most situations. As last resort there are some mechanisms for explicit 1704 synchronization. The following guidelines help to make Isabelle/ML programs 1705 work smoothly in a concurrent environment. 1706 1707 \<^item> Avoid global references altogether. Isabelle/Isar maintains a uniform 1708 context that incorporates arbitrary data declared by user programs 1709 (\secref{sec:context-data}). This context is passed as plain value and user 1710 tools can get/map their own data in a purely functional manner. 1711 Configuration options within the context (\secref{sec:config-options}) 1712 provide simple drop-in replacements for historic reference variables. 1713 1714 \<^item> Keep components with local state information re-entrant. Instead of poking 1715 initial values into (private) global references, a new state record can be 1716 created on each invocation, and passed through any auxiliary functions of 1717 the component. The state record contain mutable references in special 1718 situations, without requiring any synchronization, as long as each 1719 invocation gets its own copy and the tool itself is single-threaded. 1720 1721 \<^item> Avoid raw output on \<open>stdout\<close> or \<open>stderr\<close>. The Poly/ML library is 1722 thread-safe for each individual output operation, but the ordering of 1723 parallel invocations is arbitrary. This means raw output will appear on some 1724 system console with unpredictable interleaving of atomic chunks. 1725 1726 Note that this does not affect regular message output channels 1727 (\secref{sec:message-channels}). An official message id is associated with 1728 the command transaction from where it originates, independently of other 1729 transactions. This means each running Isar command has effectively its own 1730 set of message channels, and interleaving can only happen when commands use 1731 parallelism internally (and only at message boundaries). 1732 1733 \<^item> Treat environment variables and the current working directory of the 1734 running process as read-only. 1735 1736 \<^item> Restrict writing to the file-system to unique temporary files. Isabelle 1737 already provides a temporary directory that is unique for the running 1738 process, and there is a centralized source of unique serial numbers in 1739 Isabelle/ML. Thus temporary files that are passed to to some external 1740 process will be always disjoint, and thus thread-safe. 1741\<close> 1742 1743text %mlref \<open> 1744 \begin{mldecls} 1745 @{index_ML File.tmp_path: "Path.T -> Path.T"} \\ 1746 @{index_ML serial_string: "unit -> string"} \\ 1747 \end{mldecls} 1748 1749 \<^descr> @{ML File.tmp_path}~\<open>path\<close> relocates the base component of \<open>path\<close> into the 1750 unique temporary directory of the running Isabelle/ML process. 1751 1752 \<^descr> @{ML serial_string}~\<open>()\<close> creates a new serial number that is unique over 1753 the runtime of the Isabelle/ML process. 1754\<close> 1755 1756text %mlex \<open> 1757 The following example shows how to create unique temporary file names. 1758\<close> 1759 1760ML_val \<open> 1761 val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ())); 1762 val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ())); 1763 @{assert} (tmp1 <> tmp2); 1764\<close> 1765 1766 1767subsection \<open>Explicit synchronization\<close> 1768 1769text \<open> 1770 Isabelle/ML provides explicit synchronization for mutable variables over 1771 immutable data, which may be updated atomically and exclusively. This 1772 addresses the rare situations where mutable shared resources are really 1773 required. Synchronization in Isabelle/ML is based on primitives of Poly/ML, 1774 which have been adapted to the specific assumptions of the concurrent 1775 Isabelle environment. User code should not break this abstraction, but stay 1776 within the confines of concurrent Isabelle/ML. 1777 1778 A \<^emph>\<open>synchronized variable\<close> is an explicit state component associated with 1779 mechanisms for locking and signaling. There are operations to await a 1780 condition, change the state, and signal the change to all other waiting 1781 threads. Synchronized access to the state variable is \<^emph>\<open>not\<close> re-entrant: 1782 direct or indirect nesting within the same thread will cause a deadlock! 1783\<close> 1784 1785text %mlref \<open> 1786 \begin{mldecls} 1787 @{index_ML_type "'a Synchronized.var"} \\ 1788 @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\ 1789 @{index_ML Synchronized.guarded_access: "'a Synchronized.var -> 1790 ('a -> ('b * 'a) option) -> 'b"} \\ 1791 \end{mldecls} 1792 1793 \<^descr> Type @{ML_type "'a Synchronized.var"} represents synchronized variables 1794 with state of type @{ML_type 'a}. 1795 1796 \<^descr> @{ML Synchronized.var}~\<open>name x\<close> creates a synchronized variable that is 1797 initialized with value \<open>x\<close>. The \<open>name\<close> is used for tracing. 1798 1799 \<^descr> @{ML Synchronized.guarded_access}~\<open>var f\<close> lets the function \<open>f\<close> operate 1800 within a critical section on the state \<open>x\<close> as follows: if \<open>f x\<close> produces 1801 @{ML NONE}, it continues to wait on the internal condition variable, 1802 expecting that some other thread will eventually change the content in a 1803 suitable manner; if \<open>f x\<close> produces @{ML SOME}~\<open>(y, x')\<close> it is satisfied and 1804 assigns the new state value \<open>x'\<close>, broadcasts a signal to all waiting threads 1805 on the associated condition variable, and returns the result \<open>y\<close>. 1806 1807 There are some further variants of the @{ML Synchronized.guarded_access} 1808 combinator, see \<^file>\<open>~~/src/Pure/Concurrent/synchronized.ML\<close> for details. 1809\<close> 1810 1811text %mlex \<open> 1812 The following example implements a counter that produces positive integers 1813 that are unique over the runtime of the Isabelle process: 1814\<close> 1815 1816ML_val \<open> 1817 local 1818 val counter = Synchronized.var "counter" 0; 1819 in 1820 fun next () = 1821 Synchronized.guarded_access counter 1822 (fn i => 1823 let val j = i + 1 1824 in SOME (j, j) end); 1825 end; 1826 1827 val a = next (); 1828 val b = next (); 1829 @{assert} (a <> b); 1830\<close> 1831 1832text \<open> 1833 \<^medskip> 1834 See \<^file>\<open>~~/src/Pure/Concurrent/mailbox.ML\<close> how to implement a mailbox as 1835 synchronized variable over a purely functional list. 1836\<close> 1837 1838 1839section \<open>Managed evaluation\<close> 1840 1841text \<open> 1842 Execution of Standard ML follows the model of strict functional evaluation 1843 with optional exceptions. Evaluation happens whenever some function is 1844 applied to (sufficiently many) arguments. The result is either an explicit 1845 value or an implicit exception. 1846 1847 \<^emph>\<open>Managed evaluation\<close> in Isabelle/ML organizes expressions and results to 1848 control certain physical side-conditions, to say more specifically when and 1849 how evaluation happens. For example, the Isabelle/ML library supports lazy 1850 evaluation with memoing, parallel evaluation via futures, asynchronous 1851 evaluation via promises, evaluation with time limit etc. 1852 1853 \<^medskip> 1854 An \<^emph>\<open>unevaluated expression\<close> is represented either as unit abstraction 1855 \<^verbatim>\<open>fn () => a\<close> of type \<^verbatim>\<open>unit -> 'a\<close> or as regular function \<^verbatim>\<open>fn a => b\<close> of 1856 type \<^verbatim>\<open>'a -> 'b\<close>. Both forms occur routinely, and special care is required 1857 to tell them apart --- the static type-system of SML is only of limited help 1858 here. 1859 1860 The first form is more intuitive: some combinator \<^verbatim>\<open>(unit -> 'a) -> 'a\<close> 1861 applies the given function to \<^verbatim>\<open>()\<close> to initiate the postponed evaluation 1862 process. The second form is more flexible: some combinator 1863 \<^verbatim>\<open>('a -> 'b) -> 'a -> 'b\<close> acts like a modified form of function application; 1864 several such combinators may be cascaded to modify a given function, before 1865 it is ultimately applied to some argument. 1866 1867 \<^medskip> 1868 \<^emph>\<open>Reified results\<close> make the disjoint sum of regular values versions 1869 exceptional situations explicit as ML datatype: \<open>'a result = Res of 'a | Exn 1870 of exn\<close>. This is typically used for administrative purposes, to store the 1871 overall outcome of an evaluation process. 1872 1873 \<^emph>\<open>Parallel exceptions\<close> aggregate reified results, such that multiple 1874 exceptions are digested as a collection in canonical form that identifies 1875 exceptions according to their original occurrence. This is particular 1876 important for parallel evaluation via futures \secref{sec:futures}, which 1877 are organized as acyclic graph of evaluations that depend on other 1878 evaluations: exceptions stemming from shared sub-graphs are exposed exactly 1879 once and in the order of their original occurrence (e.g.\ when printed at 1880 the toplevel). Interrupt counts as neutral element here: it is treated as 1881 minimal information about some canceled evaluation process, and is absorbed 1882 by the presence of regular program exceptions. 1883\<close> 1884 1885text %mlref \<open> 1886 \begin{mldecls} 1887 @{index_ML_type "'a Exn.result"} \\ 1888 @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\ 1889 @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\ 1890 @{index_ML Exn.release: "'a Exn.result -> 'a"} \\ 1891 @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\ 1892 @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\ 1893 \end{mldecls} 1894 1895 \<^descr> Type @{ML_type "'a Exn.result"} represents the disjoint sum of ML results 1896 explicitly, with constructor @{ML Exn.Res} for regular values and @{ML 1897 "Exn.Exn"} for exceptions. 1898 1899 \<^descr> @{ML Exn.capture}~\<open>f x\<close> manages the evaluation of \<open>f x\<close> such that 1900 exceptions are made explicit as @{ML "Exn.Exn"}. Note that this includes 1901 physical interrupts (see also \secref{sec:exceptions}), so the same 1902 precautions apply to user code: interrupts must not be absorbed 1903 accidentally! 1904 1905 \<^descr> @{ML Exn.interruptible_capture} is similar to @{ML Exn.capture}, but 1906 interrupts are immediately re-raised as required for user code. 1907 1908 \<^descr> @{ML Exn.release}~\<open>result\<close> releases the original runtime result, exposing 1909 its regular value or raising the reified exception. 1910 1911 \<^descr> @{ML Par_Exn.release_all}~\<open>results\<close> combines results that were produced 1912 independently (e.g.\ by parallel evaluation). If all results are regular 1913 values, that list is returned. Otherwise, the collection of all exceptions 1914 is raised, wrapped-up as collective parallel exception. Note that the latter 1915 prevents access to individual exceptions by conventional \<^verbatim>\<open>handle\<close> of ML. 1916 1917 \<^descr> @{ML Par_Exn.release_first} is similar to @{ML Par_Exn.release_all}, but 1918 only the first (meaningful) exception that has occurred in the original 1919 evaluation process is raised again, the others are ignored. That single 1920 exception may get handled by conventional means in ML. 1921\<close> 1922 1923 1924subsection \<open>Parallel skeletons \label{sec:parlist}\<close> 1925 1926text \<open> 1927 Algorithmic skeletons are combinators that operate on lists in parallel, in 1928 the manner of well-known \<open>map\<close>, \<open>exists\<close>, \<open>forall\<close> etc. Management of 1929 futures (\secref{sec:futures}) and their results as reified exceptions is 1930 wrapped up into simple programming interfaces that resemble the sequential 1931 versions. 1932 1933 What remains is the application-specific problem to present expressions with 1934 suitable \<^emph>\<open>granularity\<close>: each list element corresponds to one evaluation 1935 task. If the granularity is too coarse, the available CPUs are not 1936 saturated. If it is too fine-grained, CPU cycles are wasted due to the 1937 overhead of organizing parallel processing. In the worst case, parallel 1938 performance will be less than the sequential counterpart! 1939\<close> 1940 1941text %mlref \<open> 1942 \begin{mldecls} 1943 @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\ 1944 @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\ 1945 \end{mldecls} 1946 1947 \<^descr> @{ML Par_List.map}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> is like @{ML "map"}~\<open>f [x\<^sub>1, \<dots>, 1948 x\<^sub>n]\<close>, but the evaluation of \<open>f x\<^sub>i\<close> for \<open>i = 1, \<dots>, n\<close> is performed in 1949 parallel. 1950 1951 An exception in any \<open>f x\<^sub>i\<close> cancels the overall evaluation process. The 1952 final result is produced via @{ML Par_Exn.release_first} as explained above, 1953 which means the first program exception that happened to occur in the 1954 parallel evaluation is propagated, and all other failures are ignored. 1955 1956 \<^descr> @{ML Par_List.get_some}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> produces some \<open>f x\<^sub>i\<close> that is of 1957 the form \<open>SOME y\<^sub>i\<close>, if that exists, otherwise \<open>NONE\<close>. Thus it is similar to 1958 @{ML Library.get_first}, but subject to a non-deterministic parallel choice 1959 process. The first successful result cancels the overall evaluation process; 1960 other exceptions are propagated as for @{ML Par_List.map}. 1961 1962 This generic parallel choice combinator is the basis for derived forms, such 1963 as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML Par_List.forall}. 1964\<close> 1965 1966text %mlex \<open> 1967 Subsequently, the Ackermann function is evaluated in parallel for some 1968 ranges of arguments. 1969\<close> 1970 1971ML_val \<open> 1972 fun ackermann 0 n = n + 1 1973 | ackermann m 0 = ackermann (m - 1) 1 1974 | ackermann m n = ackermann (m - 1) (ackermann m (n - 1)); 1975 1976 Par_List.map (ackermann 2) (500 upto 1000); 1977 Par_List.map (ackermann 3) (5 upto 10); 1978\<close> 1979 1980 1981subsection \<open>Lazy evaluation\<close> 1982 1983text \<open> 1984 Classic lazy evaluation works via the \<open>lazy\<close>~/ \<open>force\<close> pair of operations: 1985 \<open>lazy\<close> to wrap an unevaluated expression, and \<open>force\<close> to evaluate it once 1986 and store its result persistently. Later invocations of \<open>force\<close> retrieve the 1987 stored result without another evaluation. Isabelle/ML refines this idea to 1988 accommodate the aspects of multi-threading, synchronous program exceptions 1989 and asynchronous interrupts. 1990 1991 The first thread that invokes \<open>force\<close> on an unfinished lazy value changes 1992 its state into a \<^emph>\<open>promise\<close> of the eventual result and starts evaluating it. 1993 Any other threads that \<open>force\<close> the same lazy value in the meantime need to 1994 wait for it to finish, by producing a regular result or program exception. 1995 If the evaluation attempt is interrupted, this event is propagated to all 1996 waiting threads and the lazy value is reset to its original state. 1997 1998 This means a lazy value is completely evaluated at most once, in a 1999 thread-safe manner. There might be multiple interrupted evaluation attempts, 2000 and multiple receivers of intermediate interrupt events. Interrupts are 2001 \<^emph>\<open>not\<close> made persistent: later evaluation attempts start again from the 2002 original expression. 2003\<close> 2004 2005text %mlref \<open> 2006 \begin{mldecls} 2007 @{index_ML_type "'a lazy"} \\ 2008 @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\ 2009 @{index_ML Lazy.value: "'a -> 'a lazy"} \\ 2010 @{index_ML Lazy.force: "'a lazy -> 'a"} \\ 2011 \end{mldecls} 2012 2013 \<^descr> Type @{ML_type "'a lazy"} represents lazy values over type \<^verbatim>\<open>'a\<close>. 2014 2015 \<^descr> @{ML Lazy.lazy}~\<open>(fn () => e)\<close> wraps the unevaluated expression \<open>e\<close> as 2016 unfinished lazy value. 2017 2018 \<^descr> @{ML Lazy.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished lazy value. When 2019 forced, it returns \<open>a\<close> without any further evaluation. 2020 2021 There is very low overhead for this proforma wrapping of strict values as 2022 lazy values. 2023 2024 \<^descr> @{ML Lazy.force}~\<open>x\<close> produces the result of the lazy value in a 2025 thread-safe manner as explained above. Thus it may cause the current thread 2026 to wait on a pending evaluation attempt by another thread. 2027\<close> 2028 2029 2030subsection \<open>Futures \label{sec:futures}\<close> 2031 2032text \<open> 2033 Futures help to organize parallel execution in a value-oriented manner, with 2034 \<open>fork\<close>~/ \<open>join\<close> as the main pair of operations, and some further variants; 2035 see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values, 2036 futures are evaluated strictly and spontaneously on separate worker threads. 2037 Futures may be canceled, which leads to interrupts on running evaluation 2038 attempts, and forces structurally related futures to fail for all time; 2039 already finished futures remain unchanged. Exceptions between related 2040 futures are propagated as well, and turned into parallel exceptions (see 2041 above). 2042 2043 Technically, a future is a single-assignment variable together with a 2044 \<^emph>\<open>task\<close> that serves administrative purposes, notably within the \<^emph>\<open>task 2045 queue\<close> where new futures are registered for eventual evaluation and the 2046 worker threads retrieve their work. 2047 2048 The pool of worker threads is limited, in correlation with the number of 2049 physical cores on the machine. Note that allocation of runtime resources may 2050 be distorted either if workers yield CPU time (e.g.\ via system sleep or 2051 wait operations), or if non-worker threads contend for significant runtime 2052 resources independently. There is a limited number of replacement worker 2053 threads that get activated in certain explicit wait conditions, after a 2054 timeout. 2055 2056 \<^medskip> 2057 Each future task belongs to some \<^emph>\<open>task group\<close>, which represents the 2058 hierarchic structure of related tasks, together with the exception status a 2059 that point. By default, the task group of a newly created future is a new 2060 sub-group of the presently running one, but it is also possible to indicate 2061 different group layouts under program control. 2062 2063 Cancellation of futures actually refers to the corresponding task group and 2064 all its sub-groups. Thus interrupts are propagated down the group hierarchy. 2065 Regular program exceptions are treated likewise: failure of the evaluation 2066 of some future task affects its own group and all sub-groups. Given a 2067 particular task group, its \<^emph>\<open>group status\<close> cumulates all relevant exceptions 2068 according to its position within the group hierarchy. Interrupted tasks that 2069 lack regular result information, will pick up parallel exceptions from the 2070 cumulative group status. 2071 2072 \<^medskip> 2073 A \<^emph>\<open>passive future\<close> or \<^emph>\<open>promise\<close> is a future with slightly different 2074 evaluation policies: there is only a single-assignment variable and some 2075 expression to evaluate for the \<^emph>\<open>failed\<close> case (e.g.\ to clean up resources 2076 when canceled). A regular result is produced by external means, using a 2077 separate \<^emph>\<open>fulfill\<close> operation. 2078 2079 Promises are managed in the same task queue, so regular futures may depend 2080 on them. This allows a form of reactive programming, where some promises are 2081 used as minimal elements (or guards) within the future dependency graph: 2082 when these promises are fulfilled the evaluation of subsequent futures 2083 starts spontaneously, according to their own inter-dependencies. 2084\<close> 2085 2086text %mlref \<open> 2087 \begin{mldecls} 2088 @{index_ML_type "'a future"} \\ 2089 @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\ 2090 @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\ 2091 @{index_ML Future.join: "'a future -> 'a"} \\ 2092 @{index_ML Future.joins: "'a future list -> 'a list"} \\ 2093 @{index_ML Future.value: "'a -> 'a future"} \\ 2094 @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\ 2095 @{index_ML Future.cancel: "'a future -> unit"} \\ 2096 @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex] 2097 @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\ 2098 @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\ 2099 \end{mldecls} 2100 2101 \<^descr> Type @{ML_type "'a future"} represents future values over type \<^verbatim>\<open>'a\<close>. 2102 2103 \<^descr> @{ML Future.fork}~\<open>(fn () => e)\<close> registers the unevaluated expression \<open>e\<close> 2104 as unfinished future value, to be evaluated eventually on the parallel 2105 worker-thread farm. This is a shorthand for @{ML Future.forks} below, with 2106 default parameters and a single expression. 2107 2108 \<^descr> @{ML Future.forks}~\<open>params exprs\<close> is the general interface to fork several 2109 futures simultaneously. The \<open>params\<close> consist of the following fields: 2110 2111 \<^item> \<open>name : string\<close> (default @{ML "\"\""}) specifies a common name for the 2112 tasks of the forked futures, which serves diagnostic purposes. 2113 2114 \<^item> \<open>group : Future.group option\<close> (default @{ML NONE}) specifies an optional 2115 task group for the forked futures. @{ML NONE} means that a new sub-group 2116 of the current worker-thread task context is created. If this is not a 2117 worker thread, the group will be a new root in the group hierarchy. 2118 2119 \<^item> \<open>deps : Future.task list\<close> (default @{ML "[]"}) specifies dependencies on 2120 other future tasks, i.e.\ the adjacency relation in the global task queue. 2121 Dependencies on already finished tasks are ignored. 2122 2123 \<^item> \<open>pri : int\<close> (default @{ML 0}) specifies a priority within the task 2124 queue. 2125 2126 Typically there is only little deviation from the default priority @{ML 2127 0}. As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means 2128 ``high priority''. 2129 2130 Note that the task priority only affects the position in the queue, not 2131 the thread priority. When a worker thread picks up a task for processing, 2132 it runs with the normal thread priority to the end (or until canceled). 2133 Higher priority tasks that are queued later need to wait until this (or 2134 another) worker thread becomes free again. 2135 2136 \<^item> \<open>interrupts : bool\<close> (default @{ML true}) tells whether the worker thread 2137 that processes the corresponding task is initially put into interruptible 2138 state. This state may change again while running, by modifying the thread 2139 attributes. 2140 2141 With interrupts disabled, a running future task cannot be canceled. It is 2142 the responsibility of the programmer that this special state is retained 2143 only briefly. 2144 2145 \<^descr> @{ML Future.join}~\<open>x\<close> retrieves the value of an already finished future, 2146 which may lead to an exception, according to the result of its previous 2147 evaluation. 2148 2149 For an unfinished future there are several cases depending on the role of 2150 the current thread and the status of the future. A non-worker thread waits 2151 passively until the future is eventually evaluated. A worker thread 2152 temporarily changes its task context and takes over the responsibility to 2153 evaluate the future expression on the spot. The latter is done in a 2154 thread-safe manner: other threads that intend to join the same future need 2155 to wait until the ongoing evaluation is finished. 2156 2157 Note that excessive use of dynamic dependencies of futures by adhoc joining 2158 may lead to bad utilization of CPU cores, due to threads waiting on other 2159 threads to finish required futures. The future task farm has a limited 2160 amount of replacement threads that continue working on unrelated tasks after 2161 some timeout. 2162 2163 Whenever possible, static dependencies of futures should be specified 2164 explicitly when forked (see \<open>deps\<close> above). Thus the evaluation can work from 2165 the bottom up, without join conflicts and wait states. 2166 2167 \<^descr> @{ML Future.joins}~\<open>xs\<close> joins the given list of futures simultaneously, 2168 which is more efficient than @{ML "map Future.join"}~\<open>xs\<close>. 2169 2170 Based on the dependency graph of tasks, the current thread takes over the 2171 responsibility to evaluate future expressions that are required for the main 2172 result, working from the bottom up. Waiting on future results that are 2173 presently evaluated on other threads only happens as last resort, when no 2174 other unfinished futures are left over. 2175 2176 \<^descr> @{ML Future.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished future value, 2177 bypassing the worker-thread farm. When joined, it returns \<open>a\<close> without any 2178 further evaluation. 2179 2180 There is very low overhead for this proforma wrapping of strict values as 2181 futures. 2182 2183 \<^descr> @{ML Future.map}~\<open>f x\<close> is a fast-path implementation of @{ML 2184 Future.fork}~\<open>(fn () => f (\<close>@{ML Future.join}~\<open>x))\<close>, which avoids the full 2185 overhead of the task queue and worker-thread farm as far as possible. The 2186 function \<open>f\<close> is supposed to be some trivial post-processing or projection of 2187 the future result. 2188 2189 \<^descr> @{ML Future.cancel}~\<open>x\<close> cancels the task group of the given future, using 2190 @{ML Future.cancel_group} below. 2191 2192 \<^descr> @{ML Future.cancel_group}~\<open>group\<close> cancels all tasks of the given task 2193 group for all time. Threads that are presently processing a task of the 2194 given group are interrupted: it may take some time until they are actually 2195 terminated. Tasks that are queued but not yet processed are dequeued and 2196 forced into interrupted state. Since the task group is itself invalidated, 2197 any further attempt to fork a future that belongs to it will yield a 2198 canceled result as well. 2199 2200 \<^descr> @{ML Future.promise}~\<open>abort\<close> registers a passive future with the given 2201 \<open>abort\<close> operation: it is invoked when the future task group is canceled. 2202 2203 \<^descr> @{ML Future.fulfill}~\<open>x a\<close> finishes the passive future \<open>x\<close> by the given 2204 value \<open>a\<close>. If the promise has already been canceled, the attempt to fulfill 2205 it causes an exception. 2206\<close> 2207 2208end 2209