Notes on HOL 4, Kananaskis-5 release

We are pleased to announce the release of the latest HOL4 system. We believe there are a number of exciting new features (runs on Poly/ML, looks prettier thanks to Unicode support), as well as the usual bug fixes.

A new version of the HOL4 theorem prover is also now available, called HOL-Omega (or HOL_ω). While backwards compatible, the HOL-Omega theorem prover implements a more powerful logic, capable of modelling concepts not expressible in normal higher order logic, such as monads and concepts from category theory. See the New versions section below for more.

New features
Bugs fixed
New theories
New tools
New examples
New versions
Incompatibilities

New features:

There is now a Poly/ML implementation of HOL4 available for users on platforms other than Windows. Users should build from the same source-tree, and have version 5.2 of Poly/ML installed. The build process is similar. Start with
```
         poly < tools/smart-configure.sml
```
Continue with
```
         bin/build
```
Most of the features of the Moscow ML implementation should work in the Poly/ML version, but as it is still rather young, users may encounter some rough edges. The advantage of the Poly/ML implementation is that it is considerably quicker than the Moscow ML implementation. Many thanks to Scott Owens for doing most of the work on this port.
Overloading can now target “syntactic patterns”, which are terms other than just constants. Typically a pattern will be a lambda-abstraction. For example, one can overload the symbol <> (meant to represent “not equal to”) to the term (\x y. ~(x = y)). The call would be
```
         overload_on ("<>", ``\x y. ~(x = y)``)
```
It is then possible to write terms such as ``<> a 3``. This will actually translate to the term ``~(a = 3)`` internally. Pretty-printing will reverse the transformation so that what was typed will also be printed. (One can check that the term really has been translated as desired with calls to term destructors like dest_term and strip_comb.)

Of course, in the case of <>, one would want this as an infix. To do this one would use set_fixity as usual. In this case,
```
         set_fixity "<>" (Infix(NONASSOC, 450))
```
would be appropriate. (See the Incompatibilities section below on the change to the associativity of grammar level 450.)

Finally, overloading can be type-specific. In this way, the default system now overloads the ASCII <=> to equality over the booleans. Various operators over strings now use the same technology as well (see below).

The system has syntactic patterns built-in for not-equal (as above), iff (the <=> symbol), not-iff (<=/=>), and not-an-element-of (NOTIN). The Unicode variants (see later item) for these are ≠, ⇔, ⇎, and ∉.
The :string type is now an alias for :char list, meaning that strings inherit all of the list operations “for free”. Existing string operators that are effectively duplicates of operations on lists (like STRLEN and STRCAT) are now type-specific overloads of the list constants.

Note that this means that the IMPLODE and EXPLODE functions are now identities, and that theorems that used to print as
```
         ⊢ STRLEN s = LENGTH (EXPLODE s)
```
are still true, but are now looping rewrites. (The STRLEN in the theorem is the same term as the LENGTH, and in fact the theorem (now called STRLEN_EXPLODE_THM) prints with STRLEN in both positions.)

The EXPLODE and IMPLODE constants are still useful if theorems are to be exported to ML, where strings are not the same type as lists of characters.
Types can now be numerals, as long as fcpTheory (the theory of finite cartesian products) is loaded. A numeral type has no interesting properties except that the cardinality of its universe has exactly the size given by the numeral. This implies that negative number types and the type :0 do not exist. The pretty-printing of numeral types can be turned off with the trace flag pp_num_types.

This removes the need for type abbreviations i32 and others, and also the function fcpLib.mk_index_type, which has been removed.
The finite cartesian product “array” type can now be written with square brackets rather than with the infix ** operator. This combines well with the numeric types above. For example, :bool[32] is a type of 32 booleans, and indeed is now the type used for what had been previously called word32. The pretty-printing of array types can be turned off with the trace flag pp_array_types.

Unfortunately, because of lists, at the term level one can not index arrays with square brackets. Instead, we recommend the infix ' operator (that’s an apostrophe, ASCII character #39). For example, array ' 3 is the value of the fourth element of array (indexing starts at zero!)
Errors in inductive relation definitions (made with Hol_reln) are now accompanied by better location information.
Errors in quotient type definitions (made with entry-points in quotientLib) are now accompanied by better diagnostics about problems like missing respectfulness results.
If HOL is started in a directory with INCLUDES specified in a Holmakefile, then those same includes are put onto the loadPath that is used interactively. This should help interactive debugging of multiple-directory developments.

If Holmake is run in a directory with INCLUDES specified in a Holmakefile, then it will call itself recursively in those directories before working in the current directory. This behaviour can be prevented by using the --no_prereqs option on the command-line, or by including NO_PREREQS as an OPTION in the Holmakefile.
The tautLib decision procedure for propositional logic now uses external SAT solvers (through the HolSatLib code) for all but the smallest goals, translating proofs back through the HOL kernel.
In the Emacs mode: provide a new option to the work-horse M-h M-r command: if you precede it by hitting C-u twice, it will toggle quietdec around what is to be pasted into the HOL session. This can be useful if you’re opening a whole slew of big theories and don’t want to have to watch grammar guff scroll by.
The termination prover under Define is now smarter. It now does a better job of guessing termination relations, using a relevance criterion to cut down the number of lexicographic combinations that are tried. It can handle DIV and MOD now, through the termination_simps variable, which can be added to in order to support other destructor functions.

Termination proofs for functions defined recursively over words are also supported. The system knows about subtracting 1 and also right-shifting.

Post processing for word parsing is supported with the function:

         wordsLib.inst_word_lengths : term -> term

When possible, word lengths are guessed for the extract and concatenate functions. That is, ``(a >< b) w`` is given type ``:(a+1-b) word`` and ``(a: a word) @@ (b: b word)`` is given type ``:(a + b) word``.

For example:

         - val _ = Parse.post_process_term := wordsLib.inst_word_lengths;
         - ``(3 >< 0) a @@ (7 >< 4) a @@ (16 >< 8) a``;
         <<HOL message: inventing new type variable names: 'a, 'b, 'c, 'd, 'e, 'f>>
         <<HOL message: assigning word length(s): 'a <- 4, 'b <- 13, 'c <- 17, 'e <- 4 and 'f <- 9>>
         > val it = ``(3 >< 0) a @@ (7 >< 4) a @@ (16 >< 8) a`` : term

The assignment message is controlled by the trace variable "notify word length guesses".

wordsLib now supports evaluation over non-standard word-sizes:
```
         - load "wordsLib";
         > val it = () : unit
         - EVAL ``123w + 321w:bool[56]``;
         > val it = |- 123w + 321w = 444w : thm
```
Non-standard word sizes will evaluate more slowly when first used. However, size theorems are then added to the compset, so subsequent evaluations will be quicker.
HOL now supports Unicode symbols as part of its parsing and printing. In addition, there are now appropriate Unicode symbols, such as ∀ and ∈ built in to the system. To enable their printing and parsing, you must set the trace variable Unicode. If this is done, you will see, for example
```
         - FORALL_DEF;
         > val it = |- $∀ = (λP. P = (λx. T)) : thm
```
If you turn Unicode on, and see gibberish when you print terms and theorems, your fonts and system are not Unicode and UTF-8 compatible. Emacs versions past 22 can be made to work, as can normal shells on MacOS and Linux. The latest versions of Emacs on Windows support UTF-8, and so running HOL inside Emacs on Windows gives a pleasant Unicode experience there too.

The Unicode trace controls the use of Unicode symbols that are set up as explicit alternatives for ASCII equivalents. To set up such an aliasing, use the function
```
         Unicode.unicode_version : {u:string,tmnm:string} -> unit
```
where the string u is the Unicode, and where tmnm is the name of the term being aliased. Note that you can also alias the names of syntactic patterns (see above).

You are also free to use the standard grammar manipulation functions to define pretty Unicode syntax that has no ASCII equivalent if you wish, and this can be done whether or not the Unicode trace is set.
A new function limit:
```
         limit : int -> simpset -> simpset
```
can be used to limit the number of rewrites a simpset applies. By default a simpset will allow the simplifier to apply as many rewrites as match, possibly going into an infinite loop thereby. The limit function puts a numeric limit on this, usually guaranteeing the simplifier’s termination (it will still loop if a user-provided conversion or decision procedure loops).
Operators (e.g. & and -) can now be both prefix and infix operators simultaneously. In particular, the subtraction operator (-) is now also a prefix in all theories descended from arithmetic. In order to use this as a negation, you must overload the string "numeric_negate", which is the abstract syntax name of the unary minus. Thus, in integerTheory, the line
```
         val _ = overload_on("numeric_negate", ``int_neg``)
```
sets up unary minus as a symbol for negation over the integers.

This change allows one to input terms such as ``-x + y`` even in the theory of natural numbers. In this context, the unary negation maps to a variable called numeric_negate, which is unlikely to be helpful. Luckily, the variable will likely be polymorphic and the warning about invention of type variables will serve as a clue that something dubious is being attempted.

In the existing numeric theories, we have kept ~ as a unary negation symbol as well (following SML’s example), but the unary minus is now the preferred printing form.

If you wish to add an existing binary operator as a prefix, it is important that you not use the set_fixity command to do it. This command will clear the grammar of your binary operator before adding the unary one. Better to use add_rule (for which, see the Reference Manual). This will also allow you to map the unary form to a different name. With subtraction, for example, the binary operator maps to the name "-", and the unary operator maps to "numeric_negate".

Finally, note that in the HOL language, unary minus and binary minus are inherently ambiguous. Is x-y the application of binary minus to arguments x and y, or is it the application of function x to argument -y? In this situation, the implementation of this feature treats these dual-purpose operators as infixes by default. In order to obtain the second reading above, one has to write x(-y).

(See below for discussion of the backwards incompatibility this causes to users of the words theory.)
It is now possible to use the simplifier to rewrite terms with respect to arbitrary pre-orders. The entry-point is the function
```
         simpLib.add_relsimp :
            {trans: thm, refl: thm, subsets: thm list,
             rewrs: thm list, weakenings: thm list} ->
            simpset -> simpset
```
where the trans and refl fields are the theorems stating that some relation is a pre-order. The weakenings field is a list of congruences justifying the switch from equality reasoning to reasoning over the pre-order. For example, all equivalences (≡, say) will have the following be true
```
         |- (M1 ≡ M2) ==> (N1 ≡ N2) ==> ((M1 ≡ N1) <=> (M2 ≡ N2))
```
With the above installed, when the simplifier sees a term of the form M1 ≡ N1 it will rewrite both sides using the relation ≡.

See the Description manual’s section on the simplifier for more.

Bugs fixed:

EVAL ``BIGUNION {}`` now works.
Fixed a subtle bug in termination condition extraction, whereby multiple let-bindings for a variable would cause Define to be unpredictable and sometimes broken. Now variables get renamed in order to make the definition process succeed.
ASM_MESON_TAC (which lives beneath PROVE_TAC) now properly pays attention to the controlling reference variable max_depth.
Fixed an error in the build process when building mlyacc from sources on Windows.

New theories:

A theory of Patricia trees, due to Anthony Fox.

New tools:

A new library HolSmtLib provides an oracle interface to external SMT solvers.
There are a number of new facilities in wordsLib. The main additions are:
WORD_ss:
Does some basic simplification, evaluation and AC rewriting (over *, +, && and !!). For example,
```
         ``a * 3w + a``  -->  ``4w * a``
```
and
```
         ``a && 3w !! a && 2w``  -->  ``3w && a``
```
BIT_ss:

For example, ``BIT n 3`` --> ``n IN {0; 1}``

WORD_MUL_LSL_ss:

Converts multiplications to left-shifts e.g. ``2w * a`` --> ``a << 1``

WORD_BIT_EQ_ss:

Can be used to establish bit-wise (in)equality e.g. ``a && ~a = 0w`` --> ``T`` Does not work with *, + etc.

WORD_ARITH_EQ_ss:

Can be used to establish arithmetic (in)equality e.g. ``~(b + 1w = b + 4294967295w:word32)`` --> ``T``

WORD_EXTRACT_ss:

Simplification for shifts and bit extraction. Simplifies --, w2w, sw2sw, #>>, @@ etc. and expresses operations using ><, << and !!.

WORD_DECIDE:

A decision procedure. Will solve Boolean (bitwise) problems and some problems over <, <+ etc.

New versions:

The HOL-Omega or HOL_ω system presents a more powerful version of the widely used HOL theorem prover. This system implements a new logic, which is an extension of the existing higher order logic of HOL4. The logic is extended to three levels, adding kinds to the existing levels of types and terms. New types include type operator variables and universal types as in System F. Impredicativity is avoided through the stratification of types by ranks according to the depth of universal types. The HOL-Omega system is a merging of HOL4, HOL2P by Völker, and major aspects of System F_ω from chapter 30 of Types and Programming Languages by Pierce. As in HOL4, HOL-Omega was constructed according to the design principles of the LCF approach, for the highest degree of soundness.

The HOL-Omega system is described in the paper The HOL-Omega Logic, by Homeier, P.V., in Proceedings of TPHOLs 2009, LNCS vol. 5674, pp. 244-259. The paper presents the abstract syntax and semantics for the ranks, kinds, types, and terms of the logic, as well as the new fundamental axioms and rules of inference. It also presents examples of using the HOL-Omega system to model monads and concepts from category theory such as functors and natural transformations. As an example, here is the definition of functors, as a type abbreviation and a term predicate:
```
   (*---------------------------------------------------------------------------
               Functor type abbreviation
    ---------------------------------------------------------------------------*)

   val _ = type_abbrev ("functor", Type `: \'F. !'a 'b. ('a -> 'b) -> 'a 'F -> 'b 'F`);

   (*---------------------------------------------------------------------------
               Functor predicate
    ---------------------------------------------------------------------------*)

   val functor_def = new_definition("functor_def", Term
      `functor (F': 'F functor) =
         (* Identity *)
             (!:'a. F' (I:'a->'a) = I) /\
         (* Composition *)
             (!:'a 'b 'c. !(f:'a -> 'b) (g:'b -> 'c).
                    F' (g o f) = F' g o F' f)
      `);
```
The HOL-Omega implementation may be downloaded by the command
```
   svn checkout https://hol.svn.sf.net/svnroot/hol/branches/HOL-Omega
```
Installation instructions are in the top directory. HOL-Omega may be built using either the standard or the experimental kernel, and either Moscow ML or Poly/ML, by the same process as for HOL4. This implementation is still in development but is currently useful. This provides a practical workbench for developments in the HOL-Omega logic, integrated in a natural and consistent manner with the existing HOL4 tools and libraries that have been refined and extended over many years. Examples using the new features of HOL-Omega may be found in the directory examples/HolOmega.

This implementation was designed with particular concern for backward compatibility with HOL4. This was almost entirely achieved, which was possible only because the fundamental data types representing types and terms were originally encapsulated. This meant that the underlying representation could be changed without affecting the abstract view of types and terms by the rest of the system. Virtually all existing HOL4 code will build correctly, including the extensive libraries. The simplifiers have been upgraded, including higher-order matching of the new types and terms and automatic type beta-reduction. Algebraic types with higher kinds and ranks may be constructed using the familiar Hol_datatype tool. Not all of the tools will work as expected on the new terms and types, as the revision process is ongoing, but they will function identically on the classic terms and types. So nothing of HOL4’s existing power has been lost.

More information may be found at http://www.trustworthytools.com.

New examples:

examples/machine-code contains a development of verified Lisp implementations on top of formal models of hardware instruction sets for the ARM, PowerPC and x86 chips. This is work done by Magnus Myreen. Note: This example must be built with Poly/ML.
examples/hol_dpllScript.sml contains a very simplistic HOL implementation of DPLL with unit propagation, with proofs of termination, completeness and soundness.
examples/misc/PropLogicScript.sml contains soundness and completeness results for a classical propositional logic.
A formalisation of Symbolic Trajectory Evaluation, contributed by Ashish Darbari.

Incompatibilities:

The emacs mode is now found in the file tools/hol-mode.el (rather than hol98-mode.el) and all of the occurrences of the string hol98 there have been replaced by hol.
The muddy code, and the packages that depend on it (HolBdd and HolCheck) have moved from the src directory into examples. If you wish to use any of these packages, you will now have to both build them (see below), and then explicitly include the directories in your Holmakefiles. For example, the following line at the top of a Holmakefile will give access to HolBdd (and its dependency, muddy):
```
         INCLUDES = $(protect $(HOLDIR)/examples/muddy) $(protect $(HOLDIR)/examples/HolBdd)
```
To build these libraries:
1. On Windows, copy tools\win-binaries\muddy.so to examples\muddy\muddyC. On other platforms, type make in examples/muddy/muddyC.
2. Do Holmake in examples/muddy.
3. Do Holmake in examples/HolBdd.
4. Do Holmake in examples/HolCheck.
On 64-bit Unices, users report having to add the -fPIC option to the CFLAGS variable at
- src/muddy/muddyC/Makefile
- src/muddy/muddyC/buddy/config
The FIRSTN and BUTFIRSTN constants from the theory rich_list are now defined in the theory list, with names TAKE and DROP. If you load rich_listTheory, then the old names are overloaded so that parsing existing theories should continue to work. The change may cause pain if you have code that expects the constants to have their old names (for a call to mk_const say), or if you have variables or constants of your own using the names TAKE and DROP.

Similarly, the IS_PREFIX constant from rich_listTheory is now actually an overload to a pattern referring to the constant isPREFIX in listTheory (this was the name of the corresponding constant in stringTheory (see above). The new constant takes its arguments in the reverse order (smaller list first), and prints with the infix symbol <<=. The quotation `IS_PREFIX l1 l2` will produce a term that prints as ``l2 <<= l1``, preserving the appropriate semantics. (The Unicode symbol for this infix is ≼.)
The SET_TO_LIST constant is now defined in listTheory (moved from containerTheory), which now depends on the theory of sets (pred_setTheory). The LIST_TO_SET constant is now also overloaded to the easier-to-type name set. A number of theorems about both functions are now in listTheory, and because the theory of lists depends on that of sets, so too does the standard HOL environment loaded by bossLib. All of the theorems from containerTheory are still accessible in the same place, even though they are now just copies of theorems proved in listTheory.

If the presence of a constant called set causes pain because you have used the same string as a variable name and don’t wish to change it, the constant can be hidden, and the interference halted, by using hide:
```
         hide "set";
```
The parse_in_context function (used heavily in the Q structure) is now pickier about its parses. Now any use of a free variable in a quotation that is parsed must be consistent with its type in the provided context.
The user-specified pretty-printing functions that can be added to customise printing of terms (using add_user_printer), now key off term patterns rather than types. Previously, if a term was of a the type specified by the user, the user’s function would be called. Now, if the term matches the provided pattern, the function is called. The old behaviour can be emulated by simply giving as the pattern a variable of the desired type.

Additionally, the user’s function now gets access to a richer context of arguments when the system printer calls it.
Precedence level 450 in the standard term grammar is now non-associative, and suggested as an appropriate level for binary relations, such as =, < and ⊆. The list cons operators :: has moved out of this level to level 490, and the ++ operator used as a shorthand for list concatenation (APPEND) has moved to 480. (This preserves their relative positioning.)
The new feature allowing unary minus causes the way unary minus was handled in wordsTheory to fail. There the unary minus (“twos complement” in fact) could be accessed by writing things like ``$- 3w + x``. One should now write ``-3w + x``. If converting one’s script files is going to be too arduous, the old behaviour can be achieved by including the line
```
         val _ = temp_overload_on ("-", ``word_2comp``)
```
at the top of the given script file.

HOL 4, Kananaskis-5