\chapter{Functional Programming in HOL} This chapter describes how to write functional programs in HOL and how to verify them. However, most of the constructs and proof procedures introduced are general and recur in any specification or verification task. We really should speak of functional \emph{modelling} rather than functional \emph{programming}: our primary aim is not to write programs but to design abstract models of systems. HOL is a specification language that goes well beyond what can be expressed as a program. However, for the time being we concentrate on the computable. If you are a purist functional programmer, please note that all functions in HOL must be total: they must terminate for all inputs. Lazy data structures are not directly available. \section{An Introductory Theory} \label{sec:intro-theory} Functional programming needs datatypes and functions. Both of them can be defined in a theory with a syntax reminiscent of languages like ML or Haskell. As an example consider the theory in figure~\ref{fig:ToyList}. We will now examine it line by line. \begin{figure}[htbp] \begin{ttbox}\makeatother \input{ToyList1.txt}\end{ttbox} \caption{A Theory of Lists} \label{fig:ToyList} \end{figure} \index{*ToyList example|(} {\makeatother\medskip\input{ToyList.tex}} The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs} constitutes the complete theory \texttt{ToyList} and should reside in file \texttt{ToyList.thy}. % It is good practice to present all declarations and %definitions at the beginning of a theory to facilitate browsing.% \index{*ToyList example|)} \begin{figure}[htbp] \begin{ttbox}\makeatother \input{ToyList2.txt}\end{ttbox} \caption{Proofs about Lists} \label{fig:ToyList-proofs} \end{figure} \subsubsection*{Review} This is the end of our toy proof. It should have familiarized you with \begin{itemize} \item the standard theorem proving procedure: state a goal (lemma or theorem); proceed with proof until a separate lemma is required; prove that lemma; come back to the original goal. \item a specific procedure that works well for functional programs: induction followed by all-out simplification via \isa{auto}. \item a basic repertoire of proof commands. \end{itemize} \begin{warn} It is tempting to think that all lemmas should have the \isa{simp} attribute just because this was the case in the example above. However, in that example all lemmas were equations, and the right-hand side was simpler than the left-hand side --- an ideal situation for simplification purposes. Unless this is clearly the case, novices should refrain from awarding a lemma the \isa{simp} attribute, which has a global effect. Instead, lemmas can be applied locally where they are needed, which is discussed in the following chapter. \end{warn} \section{Some Helpful Commands} \label{sec:commands-and-hints} This section discusses a few basic commands for manipulating the proof state and can be skipped by casual readers. There are two kinds of commands used during a proof: the actual proof commands and auxiliary commands for examining the proof state and controlling the display. Simple proof commands are of the form \commdx{apply}(\textit{method}), where \textit{method} is typically \isa{induct_tac} or \isa{auto}. All such theorem proving operations are referred to as \bfindex{methods}, and further ones are introduced throughout the tutorial. Unless stated otherwise, you may assume that a method attacks merely the first subgoal. An exception is \isa{auto}, which tries to solve all subgoals. The most useful auxiliary commands are as follows: \begin{description} \item[Modifying the order of subgoals:] \commdx{defer} moves the first subgoal to the end and \commdx{prefer}~$n$ moves subgoal $n$ to the front. \item[Printing theorems:] \commdx{thm}~\textit{name}$@1$~\dots~\textit{name}$@n$ prints the named theorems. \item[Reading terms and types:] \commdx{term} \textit{string} reads, type-checks and prints the given string as a term in the current context; the inferred type is output as well. \commdx{typ} \textit{string} reads and prints the given string as a type in the current context. \end{description} Further commands are found in the Isabelle/Isar Reference Manual~\cite{isabelle-isar-ref}. \begin{pgnote} Clicking on the \pgmenu{State} button redisplays the current proof state. This is helpful in case commands like \isacommand{thm} have overwritten it. \end{pgnote} We now examine Isabelle's functional programming constructs systematically, starting with inductive datatypes. \section{Datatypes} \label{sec:datatype} \index{datatypes|(}% Inductive datatypes are part of almost every non-trivial application of HOL. First we take another look at an important example, the datatype of lists, before we turn to datatypes in general. The section closes with a case study. \subsection{Lists} \index{*list (type)}% Lists are one of the essential datatypes in computing. We expect that you are already familiar with their basic operations. Theory \isa{ToyList} is only a small fragment of HOL's predefined theory \thydx{List}\footnote{\url{https://isabelle.in.tum.de/library/HOL/List.html}}. The latter contains many further operations. For example, the functions \cdx{hd} (``head'') and \cdx{tl} (``tail'') return the first element and the remainder of a list. (However, pattern matching is usually preferable to \isa{hd} and \isa{tl}.) Also available are higher-order functions like \isa{map} and \isa{filter}. Theory \isa{List} also contains more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}. In the rest of the tutorial we always use HOL's predefined lists by building on theory \isa{Main}. \begin{warn} Looking ahead to sets and quanifiers in Part II: The best way to express that some element \isa{x} is in a list \isa{xs} is \isa{x $\in$ set xs}, where \isa{set} is a function that turns a list into the set of its elements. By the same device you can also write bounded quantifiers like \isa{$\forall$x $\in$ set xs} or embed lists in other set expressions. \end{warn} \subsection{The General Format} \label{sec:general-datatype} The general HOL \isacommand{datatype} definition is of the form \[ \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~ C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~ C@m~\tau@{m1}~\dots~\tau@{mk@m} \] where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct constructor names and $\tau@{ij}$ are types; it is customary to capitalize the first letter in constructor names. There are a number of restrictions (such as that the type should not be empty) detailed elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them. Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically during proofs by simplification. The same is true for the equations in primitive recursive function definitions. Every\footnote{Except for advanced datatypes where the recursion involves ``\isasymRightarrow'' as in {\S}\ref{sec:nested-fun-datatype}.} datatype $t$ comes equipped with a \isa{size} function from $t$ into the natural numbers (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs + 1}. In general, \cdx{size} returns \begin{itemize} \item zero for all constructors that do not have an argument of type $t$, \item one plus the sum of the sizes of all arguments of type~$t$, for all other constructors. \end{itemize} Note that because \isa{size} is defined on every datatype, it is overloaded; on lists \isa{size} is also called \sdx{length}, which is not overloaded. Isabelle will always show \isa{size} on lists as \isa{length}. \subsection{Primitive Recursion} \index{recursion!primitive}% Functions on datatypes are usually defined by recursion. In fact, most of the time they are defined by what is called \textbf{primitive recursion} over some datatype $t$. This means that the recursion equations must be of the form \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \] such that $C$ is a constructor of $t$ and all recursive calls of $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus Isabelle immediately sees that $f$ terminates because one (fixed!) argument becomes smaller with every recursive call. There must be at most one equation for each constructor. Their order is immaterial. A more general method for defining total recursive functions is introduced in {\S}\ref{sec:fun}. \begin{exercise}\label{ex:Tree} \input{Tree.tex}% \end{exercise} \input{case_exprs.tex} \input{Ifexpr.tex} \index{datatypes|)} \section{Some Basic Types} This section introduces the types of natural numbers and ordered pairs. Also described is type \isa{option}, which is useful for modelling exceptional cases. \subsection{Natural Numbers} \label{sec:nat}\index{natural numbers}% \index{linear arithmetic|(} \input{fakenat.tex}\medskip \input{natsum.tex} \index{linear arithmetic|)} \subsection{Pairs} \input{pairs2.tex} \subsection{Datatype {\tt\slshape option}} \label{sec:option} \input{Option2.tex} \section{Definitions} \label{sec:Definitions} A definition is simply an abbreviation, i.e.\ a new name for an existing construction. In particular, definitions cannot be recursive. Isabelle offers definitions on the level of types and terms. Those on the type level are called \textbf{type synonyms}; those on the term level are simply called definitions. \subsection{Type Synonyms} \index{type synonyms}% Type synonyms are similar to those found in ML\@. They are created by a \commdx{type\protect\_synonym} command: \medskip \input{types.tex} \input{prime_def.tex} \section{The Definitional Approach} \label{sec:definitional} \index{Definitional Approach}% As we pointed out at the beginning of the chapter, asserting arbitrary axioms such as $f(n) = f(n) + 1$ can easily lead to contradictions. In order to avoid this danger, we advocate the definitional rather than the axiomatic approach: introduce new concepts by definitions. However, Isabelle/HOL seems to support many richer definitional constructs, such as \isacommand{primrec}. The point is that Isabelle reduces such constructs to first principles. For example, each \isacommand{primrec} function definition is turned into a proper (nonrecursive!) definition from which the user-supplied recursion equations are automatically proved. This process is hidden from the user, who does not have to understand the details. Other commands described later, like \isacommand{fun} and \isacommand{inductive}, work similarly. This strict adherence to the definitional approach reduces the risk of soundness errors. \chapter{More Functional Programming} The purpose of this chapter is to deepen your understanding of the concepts encountered so far and to introduce advanced forms of datatypes and recursive functions. The first two sections give a structured presentation of theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss important heuristics for induction ({\S}\ref{sec:InductionHeuristics}). You can skip them if you are not planning to perform proofs yourself. We then present a case study: a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced datatypes, including those involving function spaces, are covered in {\S}\ref{sec:advanced-datatypes}; it closes with another case study, search trees (``tries''). Finally we introduce \isacommand{fun}, a general form of recursive function definition that goes well beyond \isacommand{primrec} ({\S}\ref{sec:fun}). \section{Simplification} \label{sec:Simplification} \index{simplification|(} So far we have proved our theorems by \isa{auto}, which simplifies all subgoals. In fact, \isa{auto} can do much more than that. To go beyond toy examples, you need to understand the ingredients of \isa{auto}. This section covers the method that \isa{auto} always applies first, simplification. Simplification is one of the central theorem proving tools in Isabelle and many other systems. The tool itself is called the \textbf{simplifier}. This section introduces the many features of the simplifier and is required reading if you intend to perform proofs. Later on, {\S}\ref{sec:simplification-II} explains some more advanced features and a little bit of how the simplifier works. The serious student should read that section as well, in particular to understand why the simplifier did something unexpected. \subsection{What is Simplification?} In its most basic form, simplification means repeated application of equations from left to right. For example, taking the rules for \isa{\at} and applying them to the term \isa{[0,1] \at\ []} results in a sequence of simplification steps: \begin{ttbox}\makeatother (0#1#[]) @ [] \(\leadsto\) 0#((1#[]) @ []) \(\leadsto\) 0#(1#([] @ [])) \(\leadsto\) 0#1#[] \end{ttbox} This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the equations are referred to as \bfindex{rewrite rules}. ``Rewriting'' is more honest than ``simplification'' because the terms do not necessarily become simpler in the process. The simplifier proves arithmetic goals as described in {\S}\ref{sec:nat} above. Arithmetic expressions are simplified using built-in procedures that go beyond mere rewrite rules. New simplification procedures can be coded and installed, but they are definitely not a matter for this tutorial. \input{simp.tex} \index{simplification|)} \input{Itrev.tex} \begin{exercise} \input{Plus.tex}% \end{exercise} \begin{exercise} \input{Tree2.tex}% \end{exercise} \input{CodeGen.tex} \section{Advanced Datatypes} \label{sec:advanced-datatypes} \index{datatype@\isacommand {datatype} (command)|(} \index{primrec@\isacommand {primrec} (command)|(} %|) This section presents advanced forms of datatypes: mutual and nested recursion. A series of examples will culminate in a treatment of the trie data structure. \subsection{Mutual Recursion} \label{sec:datatype-mut-rec} \input{ABexpr.tex} \subsection{Nested Recursion} \label{sec:nested-datatype} {\makeatother\input{Nested.tex}} \subsection{The Limits of Nested Recursion} \label{sec:nested-fun-datatype} How far can we push nested recursion? By the unfolding argument above, we can reduce nested to mutual recursion provided the nested recursion only involves previously defined datatypes. This does not include functions: \begin{isabelle} \isacommand{datatype} t = C "t \isasymRightarrow\ bool" \end{isabelle} This declaration is a real can of worms. In HOL it must be ruled out because it requires a type \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the same cardinality --- an impossibility. For the same reason it is not possible to allow recursion involving the type \isa{t set}, which is isomorphic to \isa{t \isasymFun\ bool}. Fortunately, a limited form of recursion involving function spaces is permitted: the recursive type may occur on the right of a function arrow, but never on the left. Hence the above can of worms is ruled out but the following example of a potentially \index{infinitely branching trees}% infinitely branching tree is accepted: \smallskip \input{Fundata.tex} If you need nested recursion on the left of a function arrow, there are alternatives to pure HOL\@. In the Logic for Computable Functions (\rmindex{LCF}), types like \begin{isabelle} \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam" \end{isabelle} do indeed make sense~\cite{paulson87}. Note the different arrow, \isa{\isasymrightarrow} instead of \isa{\isasymRightarrow}, expressing the type of \emph{continuous} functions. There is even a version of LCF on top of HOL, called \rmindex{HOLCF}~\cite{MuellerNvOS99}. \index{datatype@\isacommand {datatype} (command)|)} \index{primrec@\isacommand {primrec} (command)|)} \subsection{Case Study: Tries} \label{sec:Trie} \index{tries|(}% Tries are a classic search tree data structure~\cite{Knuth3-75} for fast indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and ``cat''. When searching a string in a trie, the letters of the string are examined sequentially. Each letter determines which subtrie to search next. In this case study we model tries as a datatype, define a lookup and an update function, and prove that they behave as expected. \begin{figure}[htbp] \begin{center} \unitlength1mm \begin{picture}(60,30) \put( 5, 0){\makebox(0,0)[b]{l}} \put(25, 0){\makebox(0,0)[b]{e}} \put(35, 0){\makebox(0,0)[b]{n}} \put(45, 0){\makebox(0,0)[b]{r}} \put(55, 0){\makebox(0,0)[b]{t}} % \put( 5, 9){\line(0,-1){5}} \put(25, 9){\line(0,-1){5}} \put(44, 9){\line(-3,-2){9}} \put(45, 9){\line(0,-1){5}} \put(46, 9){\line(3,-2){9}} % \put( 5,10){\makebox(0,0)[b]{l}} \put(15,10){\makebox(0,0)[b]{n}} \put(25,10){\makebox(0,0)[b]{p}} \put(45,10){\makebox(0,0)[b]{a}} % \put(14,19){\line(-3,-2){9}} \put(15,19){\line(0,-1){5}} \put(16,19){\line(3,-2){9}} \put(45,19){\line(0,-1){5}} % \put(15,20){\makebox(0,0)[b]{a}} \put(45,20){\makebox(0,0)[b]{c}} % \put(30,30){\line(-3,-2){13}} \put(30,30){\line(3,-2){13}} \end{picture} \end{center} \caption{A Sample Trie} \label{fig:trie} \end{figure} Proper tries associate some value with each string. Since the information is stored only in the final node associated with the string, many nodes do not carry any value. This distinction is modeled with the help of the predefined datatype \isa{option} (see {\S}\ref{sec:option}). \input{Trie.tex} \index{tries|)} \section{Total Recursive Functions: \isacommand{fun}} \label{sec:fun} \index{fun@\isacommand {fun} (command)|(}\index{functions!total|(} Although many total functions have a natural primitive recursive definition, this is not always the case. Arbitrary total recursive functions can be defined by means of \isacommand{fun}: you can use full pattern matching, recursion need not involve datatypes, and termination is proved by showing that the arguments of all recursive calls are smaller in a suitable sense. In this section we restrict ourselves to functions where Isabelle can prove termination automatically. More advanced function definitions, including user supplied termination proofs, nested recursion and partiality, are discussed in a separate tutorial~\cite{isabelle-function}. \input{fun0.tex} \index{fun@\isacommand {fun} (command)|)}\index{functions!total|)}