1theory Foundations 2imports Introduction 3begin 4 5section \<open>Code generation foundations \label{sec:foundations}\<close> 6 7subsection \<open>Code generator architecture \label{sec:architecture}\<close> 8 9text \<open> 10 The code generator is actually a framework consisting of different 11 components which can be customised individually. 12 13 Conceptually all components operate on Isabelle's logic framework 14 Pure. Practically, the object logic HOL 15 provides the necessary facilities to make use of the code generator, 16 mainly since it is an extension of Isabelle/Pure. 17 18 The constellation of the different components is visualized in the 19 following picture. 20 21 \begin{figure}[h] 22 \def\sys#1{\emph{#1}} 23 \begin{tikzpicture}[x = 4cm, y = 1cm] 24 \tikzstyle positive=[color = black, fill = white]; 25 \tikzstyle negative=[color = white, fill = black]; 26 \tikzstyle entity=[rounded corners, draw, thick]; 27 \tikzstyle process=[ellipse, draw, thick]; 28 \tikzstyle arrow=[-stealth, semithick]; 29 \node (spec) at (0, 3) [entity, positive] {specification tools}; 30 \node (user) at (1, 3) [entity, positive] {user proofs}; 31 \node (spec_user_join) at (0.5, 3) [shape=coordinate] {}; 32 \node (raw) at (0.5, 4) [entity, positive] {raw code equations}; 33 \node (pre) at (1.5, 4) [process, positive] {preprocessing}; 34 \node (eqn) at (2.5, 4) [entity, positive] {code equations}; 35 \node (iml) at (0.5, 0) [entity, positive] {intermediate program}; 36 \node (seri) at (1.5, 0) [process, positive] {serialisation}; 37 \node (SML) at (2.5, 3) [entity, positive] {\sys{SML}}; 38 \node (OCaml) at (2.5, 2) [entity, positive] {\sys{OCaml}}; 39 \node (Haskell) at (2.5, 1) [entity, positive] {\sys{Haskell}}; 40 \node (Scala) at (2.5, 0) [entity, positive] {\sys{Scala}}; 41 \draw [semithick] (spec) -- (spec_user_join); 42 \draw [semithick] (user) -- (spec_user_join); 43 \draw [-diamond, semithick] (spec_user_join) -- (raw); 44 \draw [arrow] (raw) -- (pre); 45 \draw [arrow] (pre) -- (eqn); 46 \draw [arrow] (eqn) -- node (transl) [process, positive] {translation} (iml); 47 \draw [arrow] (iml) -- (seri); 48 \draw [arrow] (seri) -- (SML); 49 \draw [arrow] (seri) -- (OCaml); 50 \draw [arrow] (seri) -- (Haskell); 51 \draw [arrow] (seri) -- (Scala); 52 \end{tikzpicture} 53 \caption{Code generator architecture} 54 \label{fig:arch} 55 \end{figure} 56 57 \noindent Central to code generation is the notion of \emph{code 58 equations}. A code equation as a first approximation is a theorem 59 of the form @{text "f t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n \<equiv> t"} (an equation headed by a 60 constant @{text f} with arguments @{text "t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n"} and right 61 hand side @{text t}). 62 63 \begin{itemize} 64 65 \item Starting point of code generation is a collection of (raw) 66 code equations in a theory. It is not relevant where they stem 67 from, but typically they were either produced by specification 68 tools or proved explicitly by the user. 69 70 \item These raw code equations can be subjected to theorem 71 transformations. This \qn{preprocessor} (see 72 \secref{sec:preproc}) can apply the full expressiveness of 73 ML-based theorem transformations to code generation. The result 74 of preprocessing is a structured collection of code equations. 75 76 \item These code equations are \qn{translated} to a program in an 77 abstract intermediate language. Think of it as a kind of 78 \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for 79 datatypes), @{text fun} (stemming from code equations), also 80 @{text class} and @{text inst} (for type classes). 81 82 \item Finally, the abstract program is \qn{serialised} into 83 concrete source code of a target language. This step only 84 produces concrete syntax but does not change the program in 85 essence; all conceptual transformations occur in the translation 86 step. 87 88 \end{itemize} 89 90 \noindent From these steps, only the last two are carried out 91 outside the logic; by keeping this layer as thin as possible, the 92 amount of code to trust is kept to a minimum. 93\<close> 94 95 96subsection \<open>The pre- and postprocessor \label{sec:preproc}\<close> 97 98text \<open> 99 Before selected function theorems are turned into abstract code, a 100 chain of definitional transformation steps is carried out: 101 \emph{preprocessing}. The preprocessor consists of two 102 components: a \emph{simpset} and \emph{function transformers}. 103 104 The preprocessor simpset has a disparate brother, the 105 \emph{postprocessor simpset}. 106 In the theory-to-code scenario depicted in the picture 107 above, it plays no role. But if generated code is used 108 to evaluate expressions (cf.~\secref{sec:evaluation}), the 109 postprocessor simpset is applied to the resulting expression before this 110 is turned back. 111 112 The pre- and postprocessor \emph{simpsets} can apply the 113 full generality of the Isabelle 114 simplifier. Due to the interpretation of theorems as code 115 equations, rewrites are applied to the right hand side and the 116 arguments of the left hand side of an equation, but never to the 117 constant heading the left hand side. 118 119 Pre- and postprocessor can be setup to transfer between 120 expressions suitable for logical reasoning and expressions 121 suitable for execution. As example, take list membership; logically 122 it is expressed as @{term "x \<in> set xs"}. But for execution 123 the intermediate set is not desirable. Hence the following 124 specification: 125\<close> 126 127definition %quote member :: "'a list \<Rightarrow> 'a \<Rightarrow> bool" 128where 129 [code_abbrev]: "member xs x \<longleftrightarrow> x \<in> set xs" 130 131text \<open> 132 \noindent The \emph{@{attribute code_abbrev} attribute} declares 133 its theorem a rewrite rule for the postprocessor and the symmetric 134 of its theorem as rewrite rule for the preprocessor. Together, 135 this has the effect that expressions @{thm (rhs) member_def [no_vars]} 136 are replaced by @{thm (lhs) member_def [no_vars]} in generated code, but 137 are turned back into @{thm (rhs) member_def [no_vars]} if generated 138 code is used for evaluation. 139 140 Rewrite rules for pre- or postprocessor may be 141 declared independently using \emph{@{attribute code_unfold}} 142 or \emph{@{attribute code_post}} respectively. 143 144 \emph{Function transformers} provide a very general 145 interface, transforming a list of function theorems to another list 146 of function theorems, provided that neither the heading constant nor 147 its type change. The @{term "0::nat"} / @{const Suc} pattern 148 used in theory @{text Code_Abstract_Nat} (see \secref{abstract_nat}) 149 uses this interface. 150 151 \noindent The current setup of the pre- and postprocessor may be inspected 152 using the @{command_def print_codeproc} command. @{command_def 153 code_thms} (see \secref{sec:equations}) provides a convenient 154 mechanism to inspect the impact of a preprocessor setup on code 155 equations. Attribute @{attribute code_preproc_trace} allows 156 for low-level tracing: 157\<close> 158 159declare %quote [[code_preproc_trace]] 160 161declare %quote [[code_preproc_trace only: distinct remdups]] 162 163declare %quote [[code_preproc_trace off]] 164 165 166subsection \<open>Understanding code equations \label{sec:equations}\<close> 167 168text \<open> 169 As told in \secref{sec:principle}, the notion of code equations is 170 vital to code generation. Indeed most problems which occur in 171 practice can be resolved by an inspection of the underlying code 172 equations. 173 174 It is possible to exchange the default code equations for constants 175 by explicitly proving alternative ones: 176\<close> 177 178lemma %quote [code]: 179 "dequeue (AQueue xs []) = 180 (if xs = [] then (None, AQueue [] []) 181 else dequeue (AQueue [] (rev xs)))" 182 "dequeue (AQueue xs (y # ys)) = 183 (Some y, AQueue xs ys)" 184 by (cases xs, simp_all) (cases "rev xs", simp_all) 185 186text \<open> 187 \noindent The annotation @{text "[code]"} is an @{text attribute} 188 which states that the given theorems should be considered as code 189 equations for a @{text fun} statement -- the corresponding constant 190 is determined syntactically. The resulting code: 191\<close> 192 193text %quotetypewriter \<open> 194 @{code_stmts dequeue (consts) dequeue (Haskell)} 195\<close> 196 197text \<open> 198 \noindent You may note that the equality test @{term "xs = []"} has 199 been replaced by the predicate @{term "List.null xs"}. This is due 200 to the default setup of the \qn{preprocessor}. 201 202 This possibility to select arbitrary code equations is the key 203 technique for program and datatype refinement (see 204 \secref{sec:refinement}). 205 206 Due to the preprocessor, there is the distinction of raw code 207 equations (before preprocessing) and code equations (after 208 preprocessing). 209 210 The first can be listed (among other data) using the @{command_def 211 print_codesetup} command. 212 213 The code equations after preprocessing are already are blueprint of 214 the generated program and can be inspected using the @{command 215 code_thms} command: 216\<close> 217 218code_thms %quote dequeue 219 220text \<open> 221 \noindent This prints a table with the code equations for @{const 222 dequeue}, including \emph{all} code equations those equations depend 223 on recursively. These dependencies themselves can be visualized using 224 the @{command_def code_deps} command. 225\<close> 226 227 228subsection \<open>Equality\<close> 229 230text \<open> 231 Implementation of equality deserves some attention. Here an example 232 function involving polymorphic equality: 233\<close> 234 235primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where 236 "collect_duplicates xs ys [] = xs" 237| "collect_duplicates xs ys (z#zs) = (if z \<in> set xs 238 then if z \<in> set ys 239 then collect_duplicates xs ys zs 240 else collect_duplicates xs (z#ys) zs 241 else collect_duplicates (z#xs) (z#ys) zs)" 242 243text \<open> 244 \noindent During preprocessing, the membership test is rewritten, 245 resulting in @{const List.member}, which itself performs an explicit 246 equality check, as can be seen in the corresponding @{text SML} code: 247\<close> 248 249text %quotetypewriter \<open> 250 @{code_stmts collect_duplicates (SML)} 251\<close> 252 253text \<open> 254 \noindent Obviously, polymorphic equality is implemented the Haskell 255 way using a type class. How is this achieved? HOL introduces an 256 explicit class @{class equal} with a corresponding operation @{const 257 HOL.equal} such that @{thm equal [no_vars]}. The preprocessing 258 framework does the rest by propagating the @{class equal} constraints 259 through all dependent code equations. For datatypes, instances of 260 @{class equal} are implicitly derived when possible. For other types, 261 you may instantiate @{text equal} manually like any other type class. 262\<close> 263 264 265subsection \<open>Explicit partiality \label{sec:partiality}\<close> 266 267text \<open> 268 Partiality usually enters the game by partial patterns, as 269 in the following example, again for amortised queues: 270\<close> 271 272definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where 273 "strict_dequeue q = (case dequeue q 274 of (Some x, q') \<Rightarrow> (x, q'))" 275 276lemma %quote strict_dequeue_AQueue [code]: 277 "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)" 278 "strict_dequeue (AQueue xs []) = 279 (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))" 280 by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split) 281 282text \<open> 283 \noindent In the corresponding code, there is no equation 284 for the pattern @{term "AQueue [] []"}: 285\<close> 286 287text %quotetypewriter \<open> 288 @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)} 289\<close> 290 291text \<open> 292 \noindent In some cases it is desirable to state this 293 pseudo-\qt{partiality} more explicitly, e.g.~as follows: 294\<close> 295 296axiomatization %quote empty_queue :: 'a 297 298definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where 299 "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') 300 | _ \<Rightarrow> empty_queue)" 301 302lemma %quote strict_dequeue'_AQueue [code]: 303 "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue 304 else strict_dequeue' (AQueue [] (rev xs)))" 305 "strict_dequeue' (AQueue xs (y # ys)) = 306 (y, AQueue xs ys)" 307 by (simp_all add: strict_dequeue'_def split: list.splits) 308 309text \<open> 310 Observe that on the right hand side of the definition of @{const 311 "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs. 312 An attempt to generate code for @{const strict_dequeue'} would 313 make the code generator complain that @{const empty_queue} has 314 no associated code equations. In most situations unimplemented 315 constants indeed indicated a broken program; however such 316 constants can also be thought of as function definitions which always fail, 317 since there is never a successful pattern match on the left hand 318 side. In order to categorise a constant into that category 319 explicitly, use the @{attribute code} attribute with 320 @{text abort}: 321\<close> 322 323declare %quote [[code abort: empty_queue]] 324 325text \<open> 326 \noindent Then the code generator will just insert an error or 327 exception at the appropriate position: 328\<close> 329 330text %quotetypewriter \<open> 331 @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)} 332\<close> 333 334text \<open> 335 \noindent This feature however is rarely needed in practice. Note 336 that the HOL default setup already includes 337\<close> 338 339declare %quote [[code abort: undefined]] 340 341text \<open> 342 \noindent -- hence @{const undefined} can always be used in such 343 situations. 344\<close> 345 346 347subsection \<open>If something goes utterly wrong \label{sec:utterly_wrong}\<close> 348 349text \<open> 350 Under certain circumstances, the code generator fails to produce 351 code entirely. To debug these, the following hints may prove 352 helpful: 353 354 \begin{description} 355 356 \ditem{\emph{Check with a different target language}.} Sometimes 357 the situation gets more clear if you switch to another target 358 language; the code generated there might give some hints what 359 prevents the code generator to produce code for the desired 360 language. 361 362 \ditem{\emph{Inspect code equations}.} Code equations are the central 363 carrier of code generation. Most problems occurring while generating 364 code can be traced to single equations which are printed as part of 365 the error message. A closer inspection of those may offer the key 366 for solving issues (cf.~\secref{sec:equations}). 367 368 \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might 369 transform code equations unexpectedly; to understand an 370 inspection of its setup is necessary (cf.~\secref{sec:preproc}). 371 372 \ditem{\emph{Generate exceptions}.} If the code generator 373 complains about missing code equations, in can be helpful to 374 implement the offending constants as exceptions 375 (cf.~\secref{sec:partiality}); this allows at least for a formal 376 generation of code, whose inspection may then give clues what is 377 wrong. 378 379 \ditem{\emph{Remove offending code equations}.} If code 380 generation is prevented by just a single equation, this can be 381 removed (cf.~\secref{sec:equations}) to allow formal code 382 generation, whose result in turn can be used to trace the 383 problem. The most prominent case here are mismatches in type 384 class signatures (\qt{wellsortedness error}). 385 386 \end{description} 387\<close> 388 389end 390