% binomial.tex %
% The Binomial Theorem proven in HOL %

\setlength{\unitlength}{1mm}
% \vskip10mm
% \begin{center}\LARGE\bf
% The Binomial Theorem proven in HOL.
% \end{center}
% \vskip10mm

\def\obj#1{\mbox{\tt#1}}

% ---------------------------------------------------------------------
% Parameters customising the document; set by whoever \input's it
% ---------------------------------------------------------------------
% \self{} is one word denoting this document, such as "article" or "chapter"
% \path{} is the path denoting the case-study directory
%
% Typical examples:
% \def\self{article}
% \def\path{\verb%Training/studies/binomial%}
% ---------------------------------------------------------------------
\section{Pascal's Triangle and the Binomial Theorem}

Pascal's Triangle\footnote{
According to Knuth \cite{knuth73}, the triangle gets its name from
its appearance in Blaise Pascal's {\em Trait\'e du triangle arithm\'etique}
in 1653, although the binomial coefficients were known long before then.
The earliest surviving description is from the tenth century,
in Hal\={a}yudha's commentary on the Hindu classic, the Chandah-S\^{u}tra.
The Binomial Theorem itself was first reported in 1676 by Isaac Newton.
}
is the infinite triangle of numbers which starts like this:
\[\begin{array}{ccccccc}
1  \\
1 & 1 \\
1 & 2 & 1 \\
1 & 3 & 3 & 1 \\
1 & 4 & 6 & 4 & 1 \\
1 & 5 & 10 & 10 & 5 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\end{array}\]
The construction is as follows. The numbers at the edges
are $1$s. Each number in the interior is the sum of the number immediately
above it and the number to its left in the previous row.

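The construction rule translates directly into code; the following Python sketch (purely illustrative, not part of the case-study sources) builds each row of the triangle from the previous one:

```python
def next_row(row):
    # Edges are 1s; each interior entry is the sum of the entry above it
    # and the entry above-left in the previous row.
    return [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

def triangle(n):
    # The first n rows of Pascal's Triangle, starting from the row [1].
    rows = [[1]]
    while len(rows) < n:
        rows.append(next_row(rows[-1]))
    return rows
```

For example, \verb@triangle(6)@ reproduces the six rows displayed above.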
The numbers in Pascal's Triangle are called Binomial Coefficients.
The number in the $n$th row and $k$th column (where $0 \leq k \leq n$, and the
topmost row and the leftmost column are numbered $0$) is written
$n \choose k$ and pronounced ``$n$ choose $k$''.  The coefficients are
pronounced this way because $n \choose k$ turns out to be the number of
ways to choose $k$ things out of $n$ things, but that is another story
(see for instance Knuth's book \cite{knuth73}).

A simple form of the Binomial Theorem \cite{maclane67} \cite{mostow63}
is the following equation, where $a$ and $b$ are integers and $n$ is any
natural number,
\[
(a + b)^n = \sum_{k=0}^n {n \choose k} a^k b^{n-k}
\]
The rest of this \self{} describes how the Binomial Theorem can be proven
in \HOL{}.  In fact, a more general theorem is proven, where $a$ and $b$
are elements of an arbitrary commutative ring, but the form displayed here
can be derived from the general result.

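Before turning to the formal development, the simple form of the theorem is easy to check numerically; a short Python sketch (illustrative only, independent of the \HOL{} sources):

```python
def choose(n, k):
    # Binomial coefficient via the Pascal's Triangle recurrence; gives 0
    # when k > n, matching the "spaces filled with 0s" convention.
    if k == 0:
        return 1
    if n == 0:
        return 0
    return choose(n - 1, k - 1) + choose(n - 1, k)

def binomial_sum(a, b, n):
    # Right-hand side of the theorem: sum of C(n,k) * a^k * b^(n-k).
    return sum(choose(n, k) * a**k * b**(n - k) for k in range(n + 1))
```

For instance, \verb@binomial_sum(2, 3, 4)@ equals \verb@(2 + 3) ** 4@, and the identity also holds when $a$ or $b$ is negative.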
The motivation for the proof of the Binomial Theorem in \HOL{} is
tutorial.
% ---to be a medium sized worked example whose subject matter is
% more widely known than any specific piece of hardware or software.
The proof illustrates specific ways in \HOL{} to represent mathematical
notations and manage theorems proven about general structures such as
monoids, groups and rings. The following files accompany this \self{} in
directory \path{}:
\begin{description}
\item[{\tt mk\_BINOMIAL.ml}]
    The \ML{} program containing the definitions, lemmas and theorems\\
    needed to prove the Binomial Theorem in \HOL{}.
\item[{\tt BINOMIAL.th}]
    The theory file which is generated by \verb@mk_BINOMIAL.ml@.
\item[{\tt BINOMIAL.ml}]
    An \ML{} program which loads the theory \verb@BINOMIAL@ and declares
    a few derived rules and one tactic used in the proof.
    The Binomial Theorem is bound to the \ML{} identifier \verb@BINOMIAL@.
\end{description}
% The aim of this \self{} is to introduce the small amount of algebra and
% mathematical notation needed to state and prove the Binomial Theorem, show
% how this is rendered in \HOL{}, and outline the structure of the proof
% contained in \verb@mk_BINOMIAL.ml@.
The \self{} is meant to be intelligible
without examination of any of the accompanying files.
To avoid clutter, not every detail is spelt out.
Often definitions and theorems are indicated
in the form displayed by \HOL{}, rather than as \ML{} source text. Specific
details of the \ML{} definitions or tactics can be found in the file
\verb@mk_BINOMIAL.ml@.

The way to work with this case study is first to study this \self{} to see
the structure of the proof, and then to work interactively with \HOL{}.
Start \HOL{} and say:
\begin{session}
\begin{verbatim}
#loadt `BINOMIAL`;;
...
File BINOMIAL loaded
() : void

#BINOMIAL;;
|- !plus times.
    RING(plus,times) ==>
    (!a b n.
      POWER times n(plus a b) =
      SIGMA(plus,0,SUC n)(BINTERM plus times a b n))
\end{verbatim}
\end{session}
(This is the first display of an interaction with \HOL{}.  The difference
between lines input to and lines output from \HOL{} is that only the former
are preceded by the \HOL{} prompt ``\verb@#@''.  A line of ellipsis
``\verb@...@'' stands for output from \HOL{} which has been omitted from the
display.)

The first input line in the display above loads in all the definitions,
theorems, derived rules and the tactic defined for the proof.  The second
input line asks \HOL{} to print theorem \verb@BINOMIAL@.  The constants
used in \verb@BINOMIAL@ are explained later in the \self{}.  The \HOL{}
session is now set up either to make use of \verb@BINOMIAL@ by specialising
it to a specific ring, or to study \verb@mk_BINOMIAL.ml@ by using the subgoal
package to work through proofs contained in it.
The remainder of the \self{} follows the plan:
\begin{description}
\item[Section~\ref{BinomialCoefficients}]
    shows how to define the number $n \choose k$ in \HOL{} as the term
    $\obj{CHOOSE}\,n\,k$, where $\obj{CHOOSE}$ is a function defined by
    primitive recursion.
\item[Section~\ref{MonoidsGroupsRings}]
    shows how to define elementary algebraic concepts like associativity,
    commutativity, monoid, group and ring in \HOL{}, and how to apply
    theorems proved in the general case to particular situations.
\item[Section~\ref{PowersReductionsRangesSums}]
    shows how to define the iterated operations of raising a term to a
    power and reducing (summing) a list of terms. These operations are
    part of the everyday mathematical language used to state the Binomial
    Theorem, but they are not built into \HOL{} and so need to be defined.
    This section and the previous two together describe enough definitions
    in \HOL{} to state the Binomial Theorem.
\item[Section~\ref{BinomialTheorem}]
    shows how the Binomial Theorem is proven in \HOL{}.
    The proof is by induction, and depends on three main lemmas.
\item[Section~\ref{BinomialTheoremForIntegers}]
    outlines how to apply the general theorem to a particular case:
    the ring of integers.
\item[Appendix~\ref{PrincipalLemmas}]
    states the principal lemmas used to prove the theorem.
    Some of these are {\em ad hoc} but others could be reused elsewhere.
\end{description}

% ----------------------------------------------------------------------------

\section{The Binomial Coefficients}
\label{BinomialCoefficients}

The definition of the binomial coefficients as the numbers in Pascal's Triangle
is formalised in three equations:
\begin{eqnarray*}
{n \choose 0} &=& 1 \\
{0 \choose {k+1}} &=& 0 \\
{{n+1} \choose {k+1}} &=& {{n} \choose {k+1}} + {{n} \choose {k}}
\end{eqnarray*}
(These actually define $n \choose k$ to be $0$ if $k>n$, but this is
consistent with Pascal's Triangle: think of the spaces to the right of
the triangle as filled with $0$'s.) The desire is to define a constant
\verb@CHOOSE@ in \HOL{} such that for all numbers $n$, $k$ in the type
{\tt num}, $\verb@CHOOSE@\,n\,k$ denotes the number $n \choose k$.
Unfortunately these three equations cannot be used directly as the definition
of \verb@CHOOSE@ in \HOL{} because they are not in the form of a base case
(when $n=0$) together with an inductive case (when $n>0$).

To work towards the definition, consider a term that specifies a base case
and an inductive case intended to be equivalent to the three equations above:
\begin{session}
\begin{verbatim}
let base_and_inductive: term =
    "(CHOOSE 0 k = ((0 = k) => 1 | 0)) /\
     (CHOOSE (SUC n) k =
         ((0 = k) => 1 | (CHOOSE n (k-1)) + (CHOOSE n k)))";;
\end{verbatim}
\end{session}
The theory \verb@prim_rec@ contains a primitive recursion theorem,
\verb@num_Axiom@, which says that a base case together with an inductive
case identifies a unique function, \verb@fn@:
\begin{session}
\begin{verbatim}
#num_Axiom;;
|- !e f. ?! fn.
    (fn 0 = e) /\
    (!n. fn (SUC n) = f (fn n) n)
\end{verbatim}
\end{session}
Given the theorem \verb@num_Axiom@, the builtin function
\verb@prove_rec_fn_exists@ can prove that there is a function that satisfies
the property specified by the term \verb@base_and_inductive@:
\begin{session}
\begin{verbatim}
#let RAW_CHOOSE_DEF = prove_rec_fn_exists num_Axiom base_and_inductive;;
RAW_CHOOSE_DEF =
|- ?CHOOSE.
    (!k. CHOOSE 0 k = ((0 = k) => 1 | 0)) /\
    (!n k.
      CHOOSE(SUC n)k = ((0 = k) => 1 | (CHOOSE n(k - 1)) + (CHOOSE n k)))
\end{verbatim}
\end{session}
These equations would not do as the only definition of \verb@CHOOSE@ due
to the presence of the conditional operator, which makes rewriting
difficult.  But having obtained the theorem \verb@RAW_CHOOSE_DEF@ it is
easy to prove there is a function which satisfies the original three
equations:
\begin{session}
\begin{verbatim}
CHOOSE_DEF_LEMMA =
|- ?CHOOSE.
    (!n. CHOOSE n 0 = 1) /\
    (!k. CHOOSE 0(SUC k) = 0) /\
    (!n k. CHOOSE(SUC n)(SUC k) = (CHOOSE n(SUC k)) + (CHOOSE n k))
\end{verbatim}
\end{session}
Now the constant \verb@CHOOSE@ can be specified as originally desired:
\begin{session}
\begin{verbatim}
#let CHOOSE_DEF =
#    new_specification `CHOOSE_DEF` [`constant`,`CHOOSE`] CHOOSE_DEF_LEMMA;;
CHOOSE_DEF =
|- (!n. CHOOSE n 0 = 1) /\
   (!k. CHOOSE 0(SUC k) = 0) /\
   (!n k. CHOOSE(SUC n)(SUC k) = (CHOOSE n(SUC k)) + (CHOOSE n k))
\end{verbatim}
\end{session}
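That the conditional form really does agree with the three original equations can also be seen computationally; a Python transcription of both forms (the helper names are illustrative only):

```python
def choose_cond(n, k):
    # The base/inductive (conditional) form behind RAW_CHOOSE_DEF.
    if n == 0:
        return 1 if k == 0 else 0
    return 1 if k == 0 else choose_cond(n - 1, k - 1) + choose_cond(n - 1, k)

def choose_eqns(n, k):
    # The three original defining equations, as in CHOOSE_DEF.
    if k == 0:
        return 1
    if n == 0:
        return 0
    return choose_eqns(n - 1, k) + choose_eqns(n - 1, k - 1)
```

An exhaustive check over small arguments confirms the two agree, including when $k > n$.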
Two elementary properties of the binomial coefficients can now be proven
by induction:
\begin{session}
\begin{verbatim}
CHOOSE_LESS = |- !n k. n < k ==> (CHOOSE n k = 0)
CHOOSE_SAME = |- !n. CHOOSE n n = 1
\end{verbatim}
\end{session}
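Both lemmas are easy to test on small instances before attempting the inductions; a quick Python check of the same recurrence (illustrative only):

```python
def choose(n, k):
    # The equations of CHOOSE_DEF, transcribed directly.
    if k == 0:
        return 1
    if n == 0:
        return 0
    return choose(n - 1, k) + choose(n - 1, k - 1)

# CHOOSE_LESS: n < k implies choose(n, k) = 0.
less_ok = all(choose(n, k) == 0 for n in range(8) for k in range(n + 1, 12))
# CHOOSE_SAME: choose(n, n) = 1.
same_ok = all(choose(n, n) == 1 for n in range(8))
```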

% ----------------------------------------------------------------------------

\section{Monoids, Groups and Commutative Rings}
\label{MonoidsGroupsRings}
%A simple way to define algebraic structures in \HOL{}.

\subsection{Associativity and Commutativity}

An {\em algebraic structure} is a set equipped with some operators that
obey some laws.  An elementary example of such a structure is the pair
$(A,+)$, where $A$ is a set and $+$ is a binary operator on $A$ that obeys
the laws of associativity,
\[
a + (b + c) = (a + b) + c
\]
and commutativity,
\[
a + b = b + a
\]

How might these laws be represented in \HOL{}?  The simplest scheme is
to deal with structures of the form $(\sigma, +)$, where $\sigma$ is a
type of the \HOL{} logic, and $+$ is a \HOL{} term of type
$\sigma\to\sigma\to\sigma$.  Notice that the set in the structure is
represented as a \HOL{} type: see the theory \verb@group@ presented in
Elsa Gunter's case study of Modular Arithmetic for a more sophisticated
approach where the set in a structure can be a subset of a \HOL{} type.

The two laws can be defined as two constants in \HOL{}:
\begin{session}
\begin{verbatim}
#let ASSOCIATIVE =
#    new_definition (`ASSOCIATIVE`,
#      "!plus: *->*->*. ASSOCIATIVE plus =
#          (!a b c. plus a (plus b c) = plus (plus a b) c)");;
...
#let COMMUTATIVE =
#    new_definition (`COMMUTATIVE`,
#      "!plus: *->*->*. COMMUTATIVE plus =
#          (!a b. plus a b = plus b a)");;
\end{verbatim}
\end{session}
By instantiating the type variable \verb@*@ and specialising the variable
\verb@plus@ these constants are applicable to any particular structure
of the form $(\sigma, +: \sigma\to\sigma\to\sigma)$.

A simple example to illustrate this methodology is a permutation theorem
about an arbitrary associative commutative operator (which is called
\verb@PERM@ in file \verb@mk_BINOMIAL.ml@).  It is worthwhile going
through the proof in detail because it illustrates how a difficulty in
making use of assumptions like \verb@ASSOCIATIVE plus@ and
\verb@COMMUTATIVE plus@ can be solved.
\begin{session}
\begin{verbatim}
#g "!plus: *->*->*.
#       ASSOCIATIVE plus /\ COMMUTATIVE plus ==>
#           !a b c. plus (plus a b) c = plus b (plus a c)";;
\end{verbatim}
\end{session}
The first step is to strip off the quantifiers, and break up the implication:
\begin{session}
\begin{verbatim}
#e (REPEAT STRIP_TAC);;
OK..
"plus(plus a b)c = plus b(plus a c)"
    [ "ASSOCIATIVE plus" ]
    [ "COMMUTATIVE plus" ]
\end{verbatim}
\end{session}
The next step is to rewrite \verb@plus a b@ to \verb@plus b a@ on the
left-hand side, from the assumption \verb@COMMUTATIVE plus@. If the theorem
were about a specific operator, such as \verb@$+: num->num->num@ from the
theory of numbers in \HOL{}, the rewriting could be done with
\verb@REWRITE_TAC@ and the specific commutativity theorem, such as
\verb@ADD_SYM@:
\[
\mbox{\tt |- !m n. m + n = n + m}.
\]
But in the general case all that is available is the definition of
\verb@COMMUTATIVE plus@ which cannot be used directly with
\verb@REWRITE_TAC@ to swap the terms \verb@a@ and \verb@b@.
One approach is first to prove the
following equation from the definition \verb@COMMUTATIVE plus@,
\[
\mbox{\tt COMMUTATIVE plus |- !a b. plus a b = plus b a},
\]
and then to use the equation with \verb@REWRITE_TAC@ in the same way as one
might use \verb@ADD_SYM@.  This is a valid tactic because the
assumption \verb@COMMUTATIVE plus@ is already present in the assumption
list.  A little forward proof achieves the first step, the derivation of the
equation:
\begin{session}
\begin{verbatim}
#top_print print_all_thm;;
- : (thm -> void)

#let forwards = fst(EQ_IMP_RULE (SPEC "plus: *->*->*" COMMUTATIVE));;
forwards = |- COMMUTATIVE plus ==> (!a b. plus a b = plus b a)

#let eqn = UNDISCH forwards;;
eqn = COMMUTATIVE plus |- !a b. plus a b = plus b a
\end{verbatim}
\end{session}
This forward proof is a special case of a derived
rule of the logic,
\[
\Gamma\turn \uquant{x} t_1 = t_2
\over
\Gamma, t_1[t'/x] \turn t_2[t'/x]
\]
which can be coded as the \ML{} function \verb@SPEC_EQ@:
\begin{session}
\begin{verbatim}
#let SPEC_EQ v th = UNDISCH(fst(EQ_IMP_RULE (SPEC v th)));;
SPEC_EQ = - : (term -> thm -> thm)
\end{verbatim}
\end{session}
Now the equation \verb@eqn@ can be used to swap the terms
\verb@a@ and \verb@b@:
\begin{session}
\begin{verbatim}
#e(SUBST1_TAC (SPECL ["a:*";"b:*"] eqn));;
OK..
"plus(plus b a)c = plus b(plus a c)"
    [ "ASSOCIATIVE plus" ]
    [ "COMMUTATIVE plus" ]
\end{verbatim}
\end{session}
Now that the terms appear on both sides in the same order but with different
grouping, associativity can be used to make the grouping on both sides
the same.  Again the definition of \verb@ASSOCIATIVE@ cannot be used
directly, but the derived rule \verb@SPEC_EQ@ obtains from the definition
an equation analogous to \verb@eqn@ which can be used:
\begin{session}
\begin{verbatim}
#e(REWRITE_TAC [SPEC_EQ "plus: *->*->*" ASSOCIATIVE]);;
OK..
goal proved
...
|- !plus.
    ASSOCIATIVE plus /\ COMMUTATIVE plus ==>
    (!a b c. plus(plus a b)c = plus b(plus a c))
\end{verbatim}
\end{session}

The point of this example was to illustrate how to prove theorems about
an arbitrary operator whose behaviour is specified by general predicates
like \verb@ASSOCIATIVE@ and \verb@COMMUTATIVE@. The main difficulty is
in deriving equations for use with rewriting or substitution.  The solution
used in the example and extensively in the proof of the Binomial Theorem
is to derive equations using the derived rule \verb@SPEC_EQ@. Another derived
rule used extensively, \verb@SPEC_IMP@, is a variant of \verb@SPEC_EQ@:
\[
\Gamma\turn \uquant{x} t_1 \Rightarrow t_2
\over
\Gamma, t_1[t'/x] \turn t_2[t'/x]
\]
\begin{session}
\begin{verbatim}
#let SPEC_IMP v th = UNDISCH(SPEC v th);;
SPEC_IMP = - : (term -> thm -> thm)
\end{verbatim}
\end{session}
The following two rules are versions of \verb@SPEC_EQ@ and \verb@SPEC_IMP@
generalised to take lists of variables rather than just one:
\begin{session}
\begin{verbatim}
SPECL_EQ = - : (term list -> thm -> thm)
SPECL_IMP = - : (term list -> thm -> thm)
\end{verbatim}
\end{session}
The final new derived rule used in the proof of the Binomial Theorem
is \verb@STRIP_SPEC_IMP@, a variant of \verb@SPEC_IMP@, which
splits up a conjunction in the antecedent into separate assumptions:
\[
\Gamma\turn \uquant{x} (t_1 \wedge \cdots \wedge t_n) \Rightarrow t_{n+1}
\over
\Gamma, t_1[t'/x], \ldots, t_n[t'/x] \turn t_{n+1}[t'/x]
\]
\begin{session}
\begin{verbatim}
#let STRIP_SPEC_IMP v th =
#    UNDISCH_ALL(SPEC v (CONV_RULE (ONCE_DEPTH_CONV ANTE_CONJ_CONV) th));;
STRIP_SPEC_IMP = - : (term -> thm -> thm)
\end{verbatim}
\end{session}

There are other methods for proving theorems which are conditional on
predicates such as \verb@ASSOCIATIVE@ and \verb@COMMUTATIVE@.  One is to
rewrite with the definition before assuming the predicate (see, for instance,
the proof of \verb@RIGHT_CANCELLATION@ in \verb@mk_BINOMIAL.ml@). This
method is fine for small proofs, but in larger proofs it presents the problem
of how to specify a particular assumption on the assumption list.  Another
method is to put the predicates on the assumption list, and use resolution
to extract the body of the definition (see the proof of
\verb@UNIQUE_LEFT_INV@).  When the assumption list is not short this is
rather a blunt instrument, in the sense that resolution can find many more
theorems than the one desired, and is also rather computationally costly
when compared to carefully constructing a theorem with \verb@SPEC_EQ@ and
its variants.
\subsection{Monoids}

A {\em monoid} is a structure $(M,+)$ where $M$ is a set and $+$ is an
associative binary operator on $M$ such that there is an {\em identity
element}, $u \in M$, which is a left and right identity of $+$, that is,
it satisfies $u+a = a+u = a$ for all $a \in M$. When the operator is written
as $+$, the structure $(M,+)$ is called an {\em additive monoid} and the
identity element is written as $0$; otherwise if the operator is written
as $\times$, the structure $(M,\times)$ is called a {\em multiplicative
monoid} and the identity element is written as $1$.
\begin{session}
\begin{verbatim}
#let RAW_MONOID =
#    new_definition (`RAW_MONOID`,
#      "!plus: *->*->*. MONOID plus =
#          ASSOCIATIVE plus /\
#          (?u. (!a. plus a u = a) /\ (!a. plus u a = a))");;
\end{verbatim}
\end{session}
This definition is inconvenient to manipulate as it stands, because of
the presence of the existential quantifier.  However it is possible via
Hilbert's $\epsilon$-operator to specify a function \verb@Id@ such that
\verb@Id plus@ stands for an identity element of \verb@plus@, if such
exists.  In fact it is easy to prove that the identity element in a monoid
is unique.  The first step is to prove that there exists a function \verb@Id@
with the property just described:
\begin{session}
\begin{verbatim}
ID_LEMMA =
|- ?Id.
    !plus.
     (?u. (!a. plus u a = a) /\ (!a. plus a u = a)) ==>
     (!a. plus(Id plus)a = a) /\ (!a. plus a(Id plus) = a)
\end{verbatim}
\end{session}
The proof of \verb@ID_LEMMA@ uses Hilbert's $\epsilon$-operator to
construct a witness for the outer existential quantifier,
\begin{verbatim}
    \plus.@u:*.(!a:*. (plus u a = a)) /\ (!a. (plus a u = a))
\end{verbatim}
Given \verb@ID_LEMMA@, the constant \verb@Id@ is specified as follows:
\begin{session}
\begin{verbatim}
#let ID = new_specification `Id` [`constant`, `Id`] ID_LEMMA;;
ID =
|- !plus.
    (?u. (!a. plus u a = a) /\ (!a. plus a u = a)) ==>
    (!a. plus(Id plus)a = a) /\ (!a. plus a(Id plus) = a)
\end{verbatim}
\end{session}
The condition that \verb@Id plus@ is a left and right identity is expressed
by the predicates \verb@LEFT_ID@ and \verb@RIGHT_ID@, respectively:
\begin{session}
\begin{verbatim}
LEFT_ID = |- !plus. LEFT_ID plus = (!a. plus(Id plus)a = a)
RIGHT_ID = |- !plus. RIGHT_ID plus = (!a. plus a(Id plus) = a)
\end{verbatim}
\end{session}
Given these abbreviations it is easy to prove in \HOL{} that
if a left and right identity element exists, then it is unique:
\begin{session}
\begin{verbatim}
UNIQUE_LEFT_ID =
|- !plus.
    LEFT_ID plus /\ RIGHT_ID plus ==>
    (!l. (!a. plus l a = a) ==> (Id plus = l))

UNIQUE_RIGHT_ID =
|- !plus.
    LEFT_ID plus /\ RIGHT_ID plus ==>
    (!r. (!a. plus a r = a) ==> (Id plus = r))
\end{verbatim}
\end{session}
The use of these last two theorems is to help identify the element
\verb@Id plus@ for some specific \verb@plus@, such as \verb@$+@ from the theory
of \verb@arithmetic@.  The abbreviations \verb@LEFT_ID@ and \verb@RIGHT_ID@
admit a new characterisation of the predicate \verb@MONOID@, which is easier
to manipulate in \HOL{} than \verb@RAW_MONOID@:
\begin{session}
\begin{verbatim}
MONOID =
|- !plus. MONOID plus = LEFT_ID plus /\ RIGHT_ID plus /\ ASSOCIATIVE plus
\end{verbatim}
\end{session}

\subsection{Groups}

A {\em group} is a monoid in which every element is invertible.
\begin{session}
\begin{verbatim}
RAW_GROUP =
|- !plus.
    Group plus =
    MONOID plus /\
    (!a. ?b. (plus b a = Id plus) /\ (plus a b = Id plus))
\end{verbatim}
\end{session}
An alternative characterisation of \verb@Group@\footnote{The mixed
  case identifier \texttt{Group} is used rather than \texttt{GROUP},
  because the latter is already defined in Elsa Gunter's theory
  \texttt{group}, a parent of her theory \texttt{integer}, which needs
  to coexist with the present theory so that integers can be used as
  an example ring (Section~\ref{BinomialTheoremForIntegers}).} which
makes the existential quantification implicit can be had by specifying
a function \verb@Inv@ so that, if \verb@Group plus@ holds, then the
element \verb@Inv plus a@ is an inverse of \verb@a@.  The function
\verb@Inv@ is specified in a way analogous to the specification of
\verb@Id@.  First a lemma is proven which says there does exist a
function \verb@Inv@ with the property desired:
\begin{session}
\begin{verbatim}
INV_LEMMA =
|- ?Inv.
    !plus.
     (!a. ?b. (plus b a = Id plus) /\ (plus a b = Id plus)) ==>
     (!a.
       (plus(Inv plus a)a = Id plus) /\ (plus a(Inv plus a) = Id plus))
\end{verbatim}
\end{session}
Given this lemma, which is proven by an easy tactic almost the same
as the one for \verb@ID_LEMMA@, the constant \verb@Inv@ can be specified:
\begin{session}
\begin{verbatim}
#let INV =
#    new_specification `Inv` [`constant`, `Inv`] INV_LEMMA;;
\end{verbatim}
\end{session}
The predicates which say that \verb@Inv@ produces left and right inverses
are defined as new constants:
\begin{session}
\begin{verbatim}
LEFT_INV =
|- !plus. LEFT_INV plus = (!a. plus(Inv plus a)a = Id plus)
RIGHT_INV =
|- !plus. RIGHT_INV plus = (!a. plus a(Inv plus a) = Id plus)
\end{verbatim}
\end{session}
Finally, the alternative characterisation of \verb@Group@ can be proven:
\begin{session}
\begin{verbatim}
GROUP = |- Group plus = MONOID plus /\ LEFT_INV plus /\ RIGHT_INV plus
\end{verbatim}
\end{session}
The theory of groups has been developed only as far as needed to prove
the Binomial Theorem; Appendix~\ref{PrincipalLemmas} shows the one lemma,
\verb@RIGHT_CANCELLATION@, specifically about groups.

\subsection{Rings}

A {\em ring} is a structure $(R,+,\times)$ such that structure $(R,+)$
is a group, structure $(R,\times)$ is a monoid, {\em addition}, $+$, is
commutative, and {\em multiplication}, $\times$, distributes (on both sides)
over addition.  A {\em commutative ring} is a ring in which multiplication
is commutative.  The word ring is used in the remainder of this \self{}
and in the \HOL{} code to mean a commutative ring:
\begin{session}
\begin{verbatim}
RING =
|- !plus times.
    RING(plus,times) =
    Group plus /\
    COMMUTATIVE plus /\
    MONOID times /\
    COMMUTATIVE times /\
    LEFT_DISTRIB(plus,times) /\
    RIGHT_DISTRIB(plus,times)
\end{verbatim}
\end{session}
\verb@LEFT_DISTRIB@ and \verb@RIGHT_DISTRIB@ are new constants
defined by the theorems:
\begin{session}
\begin{verbatim}
LEFT_DISTRIB =
|- !plus times.
    LEFT_DISTRIB(plus,times) =
    (!a b c. times a(plus b c) = plus(times a b)(times a c))

RIGHT_DISTRIB =
|- !plus times.
    RIGHT_DISTRIB(plus,times) =
    (!a b c. times(plus a b)c = plus(times a c)(times b c))
\end{verbatim}
\end{session}
Notice that the definition of \verb@RING@ abbreviates a conjunction of
predicates, some of which are atomic, in the sense that they are some basic
property of \verb@plus@ or \verb@times@ or both, and some are compound,
in the sense that they abbreviate a further conjunction of properties.
The definition is a tree of predicates with atomic ones like
\verb@LEFT_DISTRIB@ and \verb@ASSOCIATIVE@ at the leaves, and compound
ones like \verb@Group@ and \verb@MONOID@ at the branches.  Many theorems
conditional on \verb@RING(plus,times)@ need to make use of both atomic
and compound predicates contained in the tree.  To make access to all these
predicates easy, the following lemma is proven, which makes explicit all
the predicates in the tree:
\begin{session}
\begin{verbatim}
RING_LEMMA =
|- RING(plus,times) ==>
   RING(plus,times) /\
   Group plus /\
   MONOID plus /\
   LEFT_ID plus /\
   RIGHT_ID plus /\
   ASSOCIATIVE plus /\
   LEFT_INV plus /\
   RIGHT_INV plus /\
   COMMUTATIVE plus /\
   MONOID times /\
   LEFT_ID times /\
   RIGHT_ID times /\
   ASSOCIATIVE times /\
   COMMUTATIVE times /\
   LEFT_DISTRIB(plus,times) /\
   RIGHT_DISTRIB(plus,times)
\end{verbatim}
\end{session}
Goals of the form,
\[
\mbox{\tt !plus times. RING(plus,times) ==> ...}
\]
are consistently tackled with the following tactic, which uses \verb@RING_LEMMA@
to add all the atomic and compound predicates in the tree to the assumption
list:
\begin{session}
\begin{verbatim}
#let RING_TAC =
#    GEN_TAC THEN GEN_TAC THEN
#    DISCH_THEN (\th. STRIP_ASSUME_TAC (MP RING_LEMMA th));;
\end{verbatim}
\end{session}
The one lemma about rings needed here, \verb@RING_0@, is stated in
Appendix~\ref{PrincipalLemmas}.

% ----------------------------------------------------------------------------

\section{Powers, Reductions, Ranges and Sums}
\label{PowersReductionsRangesSums}
%Translation of mathematical definitions such as sums of series into HOL

The aim of this section is to explain how the mathematical notation
$\sum_{k=0}^n {n \choose k} a^k b^{n-k}$ can be rendered in \HOL{}.
If this is written out more explicitly than necessary for human
consumption it looks like this:
\[
\sum_{k=0}^n {n \choose k} \cdot (a^k \times b^{n-k})
\]
where $\times$ is multiplication in the ring, and where exponentiation
and $\cdot$ stand for the scalar powers of multiplication and addition,
respectively.  Since binomial coefficients were treated in
Section~\ref{BinomialCoefficients}, what remain to be defined in this section
are scalar powers and the $\Sigma$-notation.  The latter is tackled in
three steps: reduction of a list of monoid elements; generation of lists
comprising subranges of natural numbers, $[m,m+1,\ldots,m+n-1]$; and finally
the reduction of a list of ring elements indexed by a subrange of the natural
numbers.

\subsection{Scalar powers in a monoid}

If $(M,+)$ and $(M,\times)$ are additive and multiplicative
monoids respectively, the scalar powers of addition and multiplication are
defined informally by the following equations, where $n>0$:
\begin{eqnarray*}
n \cdot a &=& \overbrace{a + \cdots + a}^{n} \\
a^n &=& \overbrace{a \times \cdots \times a}^{n}
\end{eqnarray*}
When $n=0$, $n \cdot a = 0$ and $a^n = 1$.

Scalar power is represented in \HOL{} by the constant \verb@POWER@, which
is defined by primitive recursion:
\begin{session}
\begin{verbatim}
POWER =
|- (!plus a. POWER plus 0 a = Id plus) /\
   (!plus n a. POWER plus(SUC n)a = plus a(POWER plus n a))
\end{verbatim}
\end{session}
Three lemmas about \verb@POWER@ are stated in Appendix~\ref{PrincipalLemmas}.
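Operationally, \verb@POWER@ is just the $n$-fold application of the monoid operation, with the identity as the $0$th power; a Python model of the recursion (the function names are illustrative):

```python
def power(plus, identity, n, a):
    # POWER plus n a: the identity for n = 0, otherwise
    # plus a (POWER plus (n-1) a), unrolled into a loop.
    result = identity
    for _ in range(n):
        result = plus(a, result)
    return result
```

With multiplication this gives ordinary exponentiation; with addition it gives the scalar multiple $n \cdot a$.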

\subsection{Reduction in a monoid}

This section makes use of the standard \HOL{} theory of lists, in particular
the constants \verb@NIL@, \verb@CONS@, \verb@APPEND@ and \verb@MAP@,
definition by primitive recursion and proof by induction. The theory of
lists is described in \DESCRIPTION.

If $(M,+)$ is a monoid, then the reduction of a list $[a_1,\ldots,a_n]$,
where each $a_i$ is an element of $M$, is the sum,
\[
a_1 + \cdots + a_n,
\]
or the monoid's identity element if $n=0$.

Reduction is represented in \HOL{} by the constant \verb@REDUCE@,
with type,
\begin{verbatim}
REDUCE : (*->*->*) -> (*)list -> *
\end{verbatim}
which is defined by primitive recursion on its second argument:
\begin{session}
\begin{verbatim}
#let REDUCE =
#  new_list_rec_definition (`REDUCE`,
#    "(!plus. (REDUCE plus NIL) = (Id plus):*) /\
#     (!plus (a:*) as. REDUCE plus (CONS a as) = plus a (REDUCE plus as))");;
REDUCE =
|- (!plus. REDUCE plus[] = Id plus) /\
   (!plus a as. REDUCE plus(CONS a as) = plus a(REDUCE plus as))
\end{verbatim}
\end{session}
Three lemmas about \verb@REDUCE@ are stated in Appendix~\ref{PrincipalLemmas}.
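In functional-programming terms \verb@REDUCE@ is a right fold that returns the identity on the empty list; a Python model (illustrative names, not the \HOL{} constant itself):

```python
def reduce_monoid(plus, identity, elements):
    # REDUCE plus [a1; ...; an] = plus a1 (plus a2 (... identity)),
    # so fold from the right, starting from the identity.
    result = identity
    for a in reversed(elements):
        result = plus(a, result)
    return result
```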

\subsection{Subranges of the natural numbers}

The constant \verb@RANGE@, which is a curried function of two numbers,
is defined by primitive recursion so that,
\[
\verb@RANGE@\,m\,n = [m, m+1, \ldots, m+n-1]
\]
This definition means that the subrange is the first $n$ numbers starting
at $m$.  In particular, when $n=0$ the list is empty.
Appendix~\ref{PrincipalLemmas} states two simple properties of \verb@RANGE@.

\verb@RANGE@ generates a subrange given its least element and length;
an alternative method is to generate the subrange given its two endpoints,
via a constant \verb@INTERVAL@ which satisfies the equation:
\[
\verb@INTERVAL@\,m\,n =
    \left\{\begin{array}{ll}
    [m, m+1, \ldots, n] & \mbox{if $m \leq n$} \\
    {}[] & \mbox{otherwise}
    \end{array}\right.
\]
782
783For comparison, the two methods are definable in \HOL{} as follows:
784\begin{session}
785\begin{verbatim}
786#let RANGE =
787#    new_recursive_definition false num_Axiom `RANGE`
788#      "(RANGE m 0 = NIL) /\
789#       (RANGE m (SUC n) = CONS m (RANGE (SUC m) n))";;
790#
791#let INTERVAL =
792#    new_recursive_definition false num_Axiom `INTERVAL`
793#      "(INTERVAL m 0 =
794#          (m > 0) => NIL | [0]) /\
795#      (INTERVAL m (SUC n) =
796#          (m > (SUC n)) => NIL | APPEND (INTERVAL m n) ([SUC n]))";;
797\end{verbatim}
798\end{session}
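Transcribed into Python (an illustrative sketch; the function names are invented here), the two recursions read as follows, and make the comparison concrete: \verb@range_list@ needs no test in its recursive clause, whereas \verb@interval@ must test its endpoints at every step.

```python
def range_list(m, n):
    """RANGE m n: the first n numbers starting at m."""
    if n == 0:
        return []
    return [m] + range_list(m + 1, n - 1)

def interval(m, n):
    """INTERVAL m n: the list [m, ..., n] if m <= n, and [] otherwise."""
    if n == 0:
        return [] if m > 0 else [0]
    if m > n:
        return []
    return interval(m, n - 1) + [n]
```

For example, `range_list(2, 3)` and `interval(2, 4)` both yield `[2, 3, 4]`, while `interval(3, 2)` yields `[]`.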

\subsection{$\Sigma$-notation}

The standard $\Sigma$-notation is defined informally by the equation,
\[
\sum_{i=m}^n a_i =
    \left\{\begin{array}{ll}
    a_m + a_{m+1} + \cdots + a_n & \mbox{if $m \leq n$} \\
    0 & \mbox{otherwise}
    \end{array}\right.
\]
Therefore the standard $\Sigma$-notation is naturally definable via
a subrange generated by \verb@INTERVAL@.

An alternative notation, naturally definable via \verb@RANGE@, can be
defined informally as follows:
\[
\sum_{i=m}^{(n)} a_i =
    \left\{\begin{array}{ll}
    a_m + a_{m+1} + \cdots + a_{m+n-1} & \mbox{if $n > 0$} \\
    0 & \mbox{otherwise}
    \end{array}\right.
\]
The brackets around the $n$ above the $\Sigma$ distinguish this
notation from the standard one.

This second notation is used in the \HOL{} proof of the Binomial Theorem.
The two notations differ only in their definitions: the standard one is
definable via \verb@INTERVAL@, the second via \verb@RANGE@. The second
was chosen because \verb@RANGE@, having the simpler definition, is easier
to manipulate in \HOL{} than \verb@INTERVAL@. The cost of this choice is
that the proof of the theorem uses a non-standard notation, but the
difference is purely notational: if desired, the Binomial Theorem stated
in the standard notation could be deduced trivially from its non-standard
statement.
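The deduction amounts to a change of bounds: for $m \leq n$, $\sum_{i=m}^{n} a_i = \sum_{i=m}^{(n-m+1)} a_i$, since a list with endpoints $m$ and $n$ has $n-m+1$ elements. A quick Python check of this correspondence (the function names here are illustrative):

```python
def sigma_std(m, n, f):
    """Standard notation: f(m) + ... + f(n), and 0 when m > n."""
    return sum(f(i) for i in range(m, n + 1))

def sigma_len(m, n, f):
    """Non-standard notation: the sum of n terms starting at index m."""
    return sum(f(m + i) for i in range(n))

square = lambda i: i * i
assert sigma_std(2, 6, square) == sigma_len(2, 6 - 2 + 1, square)
assert sigma_std(4, 3, square) == 0 == sigma_len(4, 0, square)
```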

The non-standard $\Sigma$-notation is defined in \HOL{} as the following
abbreviation:
\begin{session}
\begin{verbatim}
#let SIGMA =
#    new_definition (`SIGMA`,
#      "SIGMA (plus,m,n) f = REDUCE plus (MAP f (RANGE m n)): *");;
\end{verbatim}
\end{session}
Note that an indexed term $a_i$ is represented in \HOL{} as a function
\verb@f@ of type \verb@num -> *@.
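The same composition of reduction, mapping and subrange generation can be sketched in Python. This is only an analogue under the earlier assumptions: the identity element is passed explicitly, and \verb@functools.reduce@ folds from the left, which agrees with \HOL{}'s right-fold for an associative operation.

```python
from functools import reduce

def sigma(plus, m, n, f, identity):
    """SIGMA (plus,m,n) f: reduce the mapped subrange [f(m), ..., f(m+n-1)].

    functools.reduce returns `identity` unchanged on the empty range,
    matching the n = 0 case.
    """
    return reduce(plus, map(f, range(m, m + n)), identity)
```

For example, `sigma(lambda a, b: a + b, 0, 4, lambda i: i * i, 0)` yields `0 + 1 + 4 + 9 = 14`.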

Appendix~\ref{PrincipalLemmas} states several properties of \verb@SIGMA@;
their proofs make use of properties of \verb@MAP@ from the theory
\verb@list@, and also the lemmas about \verb@REDUCE@ and \verb@RANGE@.
The functions \verb@REDUCE@ and \verb@RANGE@ are needed here only to define
\verb@SIGMA@, and in fact could be omitted if \verb@SIGMA@ were to be defined
directly using primitive recursion.  It is worth going to the trouble
of defining \verb@REDUCE@ and \verb@RANGE@, however, because they are likely
to be useful in other situations.

% ----------------------------------------------------------------------------

\section{The Binomial Theorem for a Commutative Ring}
\label{BinomialTheorem}
%Division of a medium sized proof into manageable lemmas.

This section outlines the proof.  For full details see the tactics in
\verb@mk_BINOMIAL.ml@.

The statement and proof of the Binomial Theorem use a new \HOL{} constant,
\verb@BINTERM@, to abbreviate indexed terms of the form $\lambda k.\,{n
\choose k} \cdot a^{n-k} \times b^k$.  \verb@BINTERM@ is defined as a
curried function as follows:
\begin{session}
\begin{verbatim}
#let BINTERM_DEF =
#    new_definition (`BINTERM_DEF`,
#      "BINTERM (plus: *->*->*) (times: *->*->*) a b n k =
#          POWER plus (CHOOSE n k)
#              (times (POWER times (n-k) a) (POWER times k b))");;
\end{verbatim}
\end{session}
This abbreviation lets the Binomial Theorem be stated and proven in \HOL{} as
follows:
\begin{session}
\begin{verbatim}
BINOMIAL =
|- !plus times.
    RING(plus,times) ==>
    (!a b n.
      POWER times n(plus a b) =
      SIGMA(plus,0,SUC n)(BINTERM plus times a b n))
\end{verbatim}
\end{session}
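For intuition, the statement can be spot-checked numerically in the ring of integers, where \verb@POWER times@ is ordinary exponentiation, \verb@POWER plus@ applied to a \verb@CHOOSE@ value is multiplication by a binomial coefficient, and \verb@SIGMA(plus,0,SUC n)@ sums the $n+1$ terms $k = 0, \ldots, n$. The sketch below uses Python's \verb@math.comb@ for \verb@CHOOSE@ and arbitrary test values:

```python
from math import comb

def binterm(a, b, n, k):
    """BINTERM instantiated to the integer ring: (n choose k) * a^(n-k) * b^k."""
    return comb(n, k) * a ** (n - k) * b ** k

a, b, n = 3, -2, 5
lhs = (a + b) ** n                                    # POWER times n (plus a b)
rhs = sum(binterm(a, b, n, k) for k in range(n + 1))  # SIGMA (plus,0,SUC n)
assert lhs == rhs
```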
In outline, the proof proceeds as follows.  The following equation is
proven by induction on $n$:
\begin{eqnarray*}
(a+b)^n &=& \sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k
\end{eqnarray*}
When $n=0$ both sides reduce to the multiplicative identity element, $1$,
of the ring. For the inductive step, the equation is assumed for $n$, and
the following goal must be proven:
\begin{eqnarray*}
(a+b)^{n+1} &=& \sum_{k=0}^{(n+2)} {{n+1} \choose k} a^{n+1-k} b^k
\end{eqnarray*}
Three lemmas are used in the inductive step:
\[\begin{array}{rcll}
\displaystyle
\sum_{k=1}^{(n)} {{n+1} \choose k} a^{n+1-k} b^k &=&
\displaystyle
    a \times \sum_{k=1}^{(n)} {n \choose k} a^{n-k} b^k +
    b \times \sum_{k=0}^{(n)} {n \choose k} a^{n-k} b^k
    & \mbox{({\tt LEMMA\_1})} \\
\displaystyle
a \times \sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k &=&
\displaystyle
    a^{n+1} +
    a \times \sum_{k=1}^{(n)} {n \choose k} a^{n-k} b^k
    & \mbox{({\tt LEMMA\_2})} \\
\displaystyle
b \times \sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k &=&
\displaystyle
    b \times \sum_{k=0}^{(n)} {n \choose k} a^{n-k} b^k
    + b^{n+1}
    & \mbox{({\tt LEMMA\_3})}
\end{array}\]
\verb@LEMMA_1@ is the key to the inductive step, whose three stages are
indicated (a), (b) and (c) in \ML{} comments.  Step (a) expands the
right-hand side of the goal with \verb@LEMMA_1@:
\begin{eqnarray*}
\mbox{rhs} &=&
    a^{n+1} +
    a \times \sum_{k=1}^{(n)} {n \choose k} a^{n-k} b^k +
    b \times \sum_{k=0}^{(n)} {n \choose k} a^{n-k} b^k +
    b^{n+1}
\end{eqnarray*}
Step (b) expands the left-hand side with the induction hypothesis:
\begin{eqnarray*}
\mbox{lhs} &=&
    (a+b)\sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k
\end{eqnarray*}
Finally, in step (c) the expansions of the two sides are shown to be the
same, using distributivity of $\times$ over $+$, associativity of $+$,
and \verb@LEMMA_2@ and \verb@LEMMA_3@:
\begin{eqnarray*}
\mbox{lhs} &=&
    a \times \sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k
    +
    b \times \sum_{k=0}^{(n+1)} {n \choose k} a^{n-k} b^k \\
  &=&
    a^{n+1} +
    a \times \sum_{k=1}^{(n)} {n \choose k} a^{n-k} b^k
    +
    b \times \sum_{k=0}^{(n)} {n \choose k} a^{n-k} b^k
    + b^{n+1} \\
  &=&
      \mbox{rhs}
\end{eqnarray*}

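The three lemmas can likewise be spot-checked over the integers. In this sketch, \verb@s(lo, n, term)@ plays the role of the non-standard $\Sigma$ ($n$ terms starting at index \verb@lo@), and the values of $a$, $b$ and $n$ are arbitrary test data:

```python
from math import comb

def s(lo, n, term):
    """Non-standard sigma: the sum of n terms starting at index lo."""
    return sum(term(k) for k in range(lo, lo + n))

a, b, n = 2, 3, 6
t = lambda k: comb(n, k) * a ** (n - k) * b ** k           # (n choose k) a^(n-k) b^k
t1 = lambda k: comb(n + 1, k) * a ** (n + 1 - k) * b ** k  # ((n+1) choose k) a^(n+1-k) b^k

assert s(1, n, t1) == a * s(1, n, t) + b * s(0, n, t)       # LEMMA_1
assert a * s(0, n + 1, t) == a ** (n + 1) + a * s(1, n, t)  # LEMMA_2
assert b * s(0, n + 1, t) == b * s(0, n, t) + b ** (n + 1)  # LEMMA_3
```

\verb@LEMMA_1@ is Pascal's rule ${n+1 \choose k} = {n \choose k} + {n \choose k-1}$ applied under the sum; \verb@LEMMA_2@ and \verb@LEMMA_3@ split off the $k=0$ and $k=n$ terms respectively.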

% ----------------------------------------------------------------------------

\section{The Binomial Theorem for Integers}
\label{BinomialTheoremForIntegers}

The previous sections have dealt with the material in the theory
\verb@BINOMIAL@, which proves the Binomial Theorem for an arbitrary
commutative ring.  A natural next step is to take some specific
commutative ring and prove a Binomial Theorem for it, as an instance
of the general case.

To see how to do so for Elsa Gunter's theory of integers, take a look
at the file {\tt mk\_BINOMIAL\_integer.ml}, contained in directory \path{}.
When loaded into \HOL{}, this \ML{} file proves that the integers are a
ring---by quoting laws contained in the theory \verb@integer@---and then
derives a Binomial Theorem for the integers---by Modus Ponens from the
general theorem \verb@BINOMIAL@.
