1%if false 2 Copyright (c) 2009, ETH Zurich. 3 All rights reserved. 4 5 This file is distributed under the terms in the attached LICENSE file. 6 If you do not find this file, copies can be found by writing to: 7 ETH Zurich D-INFK, Haldeneggsteig 4, CH-8092 Zurich. Attn: Systems Group. 8%endif 9 10%include polycode.fmt 11 12%if false 13 14> module PureExpressions where 15 16> import Data.Char 17 18> import {-# SOURCE #-} Constructs 19 20%endif 21 22\section{Filet-o-Fish pure expressions} 23\label{sec:fof_syntax_core} 24 25The core of Filet-o-Fish is organized around the purely functional 26core of C. It consists of C types as well as C expressions. 27 28 29%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 30\subsection{Types} 31%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 32 33%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 34\subsubsection{Data-type Definitions} 35%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 36 37The |TypeExpr| data-type encompasses the following types: 38\begin{itemize} 39 \item Void, 40 \item Integers, of various signedness and size, 41 \item Float, 42 \item Named structures and unions, 43 \item Named pointers, ie. a pointer recurring in a structure 44 or union, 45 \item Arrays, and 46 \item Pointers 47\end{itemize} 48 49Note that a value of type |TInt| or |TFloat| is a \emph{constant}, 50like $2$, $3 / 7$, or \verb!sizeof(struct foo)!. In FoF meta-language, 51a C variable is \emph{not} a value -- but a construct. So, the type of 52the variable 53$x$ defined by \verb!int32_t x = 4! is \emph{not} |TInt Signed TInt32|. 54 55> data TypeExpr = TVoid 56> | TInt Signedness Size 57> | TFloat 58> | TChar 59> | TStruct AllocStruct String TFieldList 60> | TUnion AllocUnion String TFieldList 61> | TCompPointer String 62> | TEnum String [(String, Int)] 63> | TArray AllocArray TypeExpr 64> | TPointer TypeExpr Mode 65> | TTypedef TypeExpr String 66> | TFun String Function TypeExpr [(TypeExpr, Maybe String)] 67> deriving (Eq, Show) 68 69%%%%%%%%%%%%%%%% 70\paragraph{Functions} 71%%%%%%%%%%%%%%%% 72 73A function is represented by an Haskell function, taking a list of 74arguments and computing the body of the function. In the jargon, this 75is called an \emph{higher-order abstract syntax}. So, the function 76definition is represented by the following type: 77 78> data Function = Fun ([PureExpr] -> FoFCode PureExpr) 79 80Because |TypeExpr| is showable, |Function| has to be showable 81too. While we could define a more complete |Show| instance for 82|Function|, we will not do so here and simply return an opaque name. 83 84> instance Show Function where 85> show _ = "<fun>" 86 87Concerning equality, this becomes more tricky. We would have to define 88what "equality" means and if that definition is decidable. Here, we 89consider syntactic equality and although we could decide whether two 90functions are syntactically equal or not, we will not do so for the 91moment. We simply consider functions as always distinct. 92 93> instance Eq Function where 94> _ == _ = False 95 96 97%%%%%%%%%%%%%%%% 98\paragraph{Composed data-types} 99%%%%%%%%%%%%%%%% 100 101 102Composed data-types have several allocation policies: they might be 103declared dynamically, using malloc, or statically, on the stack. This 104is reflected by the following definitions. We chose to use differents 105definitions for each kind of data-type because they are likely to 106evolve in future versions and diverge from this common scheme. 107 108> data AllocStruct = StaticStruct 109> | DynamicStruct 110> deriving (Eq, Show) 111> data AllocUnion = StaticUnion 112> | DynamicUnion 113> deriving (Eq, Show) 114> data AllocArray = StaticArray Int 115> | DynamicArray 116> deriving (Eq, Show) 117 118Both Structures and Unions rely on the |TFieldList| 119synonym. Basically, the type of a Structure corresponds to its name 120as well as the list of its field names and respective types. 121 122> type TFieldList = [(String, TypeExpr)] 123 124 125%%%%%%%%%%%%%%%% 126\paragraph{Integers} 127%%%%%%%%%%%%%%%% 128 129Signedness and size of integers is defined as usual. An integer is 130either signed or unsigned and its size may vary from 8 to 64 131bits. Interestingly, we derive |Ord| on these data-types: |Ord| 132provides us with a comparison function on the signedness and size. In 133practice, we can check that a cast is a correct \emph{downcasting} by 134enforcing that the sign and size we cast to is \emph{bigger} than the 135original sign and size. 136 137> data Signedness = Unsigned 138> | Signed 139> deriving (Eq, Ord, Show) 140> 141> data Size = TInt8 142> | TInt16 143> | TInt32 144> | TInt64 145> deriving (Eq, Ord, Show) 146 147 148%%%%%%%%%%%%%%%% 149\paragraph{Pointers} 150%%%%%%%%%%%%%%%% 151 152As we understand that the suspense is unbearable, we are going to 153reveal you the type of $x$ defined above. Actually, the type of $x$ is 154|TPointer (TInt Signed TInt32) Avail|. A pointer? Indeed, a variable 155does actually \emph{points} to a location in memory. This choice 156allows us to capture the notion of variables and pointers in a 157single abstraction, called a \emph{reference cell}. 158 159A reference cell can be in one of the following states: either 160|Avail|able or |Read|. This 161distinction makes sense during the compilation process, it can ignored 162otherwise. 163 164> data Mode = Avail 165> | Read 166> deriving (Eq, Show) 167 168%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 169\subsubsection{Smart Constructors} 170%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 171 172In some circumstances, it is necessary to explicitly write the type of 173an expression. However, explicitly combining the previously defined 174types can be quite cumbersome. For example, we can naturally define 175the base types as follow: 176 177> voidT :: TypeExpr 178> voidT = TVoid 179> 180> uint8T, uint16T, uint32T, uint64T :: TypeExpr 181> uint8T = TInt Unsigned TInt8 182> uint16T = TInt Unsigned TInt16 183> uint32T = TInt Unsigned TInt32 184> uint64T = TInt Unsigned TInt64 185> 186> int8T, int16T, int32T, int64T :: TypeExpr 187> int8T = TInt Signed TInt8 188> int16T = TInt Signed TInt16 189> int32T = TInt Signed TInt32 190> int64T = TInt Signed TInt64 191> 192> floatT :: TypeExpr 193> floatT = TFloat 194> 195> charT :: TypeExpr 196> charT = TChar 197 198> uintptrT :: TypeExpr 199> uintptrT = TCompPointer "void" 200 201 202And, similarly, we can build up composed types by applying them on 203smaller types: 204 205> arrayDT :: TypeExpr -> TypeExpr 206> arrayDT typ = TArray DynamicArray typ 207> 208> arrayST :: Int -> TypeExpr -> TypeExpr 209> arrayST size typ = TArray (StaticArray size) typ 210> 211> ptrT :: TypeExpr -> TypeExpr 212> ptrT typ = TPointer typ Avail 213> 214> structDT, unionDT, 215> structST, unionST :: String -> TFieldList -> TypeExpr 216> structDT name fields = TStruct DynamicStruct name fields 217> unionDT name fields = TUnion DynamicUnion name fields 218> structST name fields = TStruct StaticStruct name fields 219> unionST name fields = TUnion StaticUnion name fields 220> 221> enumT :: String -> [(String, Int)] -> TypeExpr 222> enumT name fields = TEnum name fields 223> 224> typedef :: TypeExpr -> String -> TypeExpr 225> typedef typ name = TTypedef typ name 226 227Finally, the named pointer -- which is actually a \emph{fix-point} -- 228takes as input the name of the structure or union it refers to. 229 230> cptrT :: String -> TypeExpr 231> cptrT id = TCompPointer id 232 233 234%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 235\subsection{Pure Expressions} 236%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 237 238In a first step, we are going to define the expressions composing FoF 239meta-language. As for types, this consists in a data-type, |PureExpr|, 240capturing the syntax of expressions. Then, we also define some smart 241constructors. 242 243%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 244\subsubsection{Data-type Definitions} 245%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 246 247An expression is one of the following object: 248\begin{itemize} 249\item |void|, the only object populating the type |Void|, 250\item an integer, of specific signedness and size, 251\item a float, 252\item a reference to an object in memory, 253\item a unary operation, applied to an object, 254\item a binary operation, applied on two objects, 255\item the |sizeof| operator, applied to a type, 256\item a conditional expression, testing an object against $0$, 257 returning one of two objects, and 258\item a |cast| operator, casting an object to a given type 259\end{itemize} 260 261 262> data PureExpr = Void 263> | CLInteger Signedness Size Integer 264> | CLFloat Float 265> | CLChar Char 266> | CLRef Origin TypeExpr VarName 267> | Unary UnaryOp PureExpr 268> | Binary BinaryOp PureExpr PureExpr 269> | Sizeof TypeExpr 270> | Test PureExpr PureExpr PureExpr 271> | Cast TypeExpr PureExpr 272> | Quote String 273> deriving (Eq, Show) 274 275%%%%%%%%%%%%%%%% 276\paragraph{Variable names} 277%%%%%%%%%%%%%%%% 278 279A reference is identified by a name. A |Generated| name has been 280forged by FoF. A |Provided| name has been defined by the compiler 281designer. An |Inherited| name results from an operation performed on 282another variable. We carefully track the origin of names for compilation purpose: for 283example, if a variable name has been |Generated|, we should try to 284eliminate it, to make the compiled code more readable. 285 286> data VarName = Generated String 287> | Provided String 288> | Inherited Int VarName 289> deriving (Show, Eq) 290 291A reference is also decorated by its \emph{origin}. This field is used by 292the compiler to identify the scope of variables. Therefore, the 293compiler can enforce some safety checks, such as verifying that the 294address of a local variable is not assigned to a global one, for 295example. Sadly, this information is not always precisely maintained 296nor correctly used in the current implementation. More care and more 297checks should be added in the future, to ensure the correctness of the 298generated code. 299 300> data Origin = Local 301> | Global 302> | Param 303> | Dynamic 304> deriving (Eq, Show) 305 306%%%%%%%%%%%%%%%% 307\paragraph{Unary operations} 308%%%%%%%%%%%%%%%% 309 310The unary operations are either the arithmetic \emph{minus} operation, 311or the logic \emph{complement} operation, or the logic \emph{negation} 312operation. 313 314> data UnaryOp = Minus | Complement | Negation 315> deriving (Eq, Show) 316 317%%%%%%%%%%%%%%%% 318\paragraph{Binary operations} 319%%%%%%%%%%%%%%%% 320 321The binary operations are either arithmetic operators ($+$, $-$, 322$\times$, $/$, and $\%$), Boolean operators ($<<$, $>>$, $\&$, 323bitwise-or, and \^{}), or comparison operators ($<$, $<=$, $>$, $>=$, 324$==$, and $!=$). 325 326> data BinaryOp = Plus | Sub | Mul | Div | Mod 327> | Shl | Shr | AndBit | OrBit | XorBit 328> | Le | Leq | Ge | Geq | Eq | Neq 329> deriving (Eq, Show) 330 331 332%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 333\subsubsection{Smart Constructors} 334%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 335 336As usual, we define some constructors for the C programmer to feel at 337home with FoF. Let us start with the constants first: 338 339> void :: PureExpr 340> void = Void 341> 342> int8, int16, int32, int64 :: Integer -> PureExpr 343> int8 x = CLInteger Signed TInt8 x 344> int16 x = CLInteger Signed TInt16 x 345> int32 x = CLInteger Signed TInt32 x 346> int64 x = CLInteger Signed TInt64 x 347> 348> uint8, uint16, uint32, uint64 :: Integer -> PureExpr 349> uint8 x = CLInteger Unsigned TInt8 x 350> uint16 x = CLInteger Unsigned TInt16 x 351> uint32 x = CLInteger Unsigned TInt32 x 352> uint64 x = CLInteger Unsigned TInt64 x 353> 354> charc :: Char -> PureExpr 355> charc x = CLInteger Unsigned TInt8 (toInteger $ ord x) 356> 357> float :: Float -> PureExpr 358> float x = CLFloat x 359> 360> cchar :: Char -> PureExpr 361> cchar x = CLChar x 362 363> opaque :: TypeExpr -> String -> PureExpr 364> opaque t s = CLRef Local t (Provided s) 365 366Then come the unary operators: 367 368> minus, comp, neg :: PureExpr -> PureExpr 369> minus = Unary Minus 370> comp = Unary Complement 371> neg = Unary Negation 372 373And the binary operators. Note that they are defined 374\emph{infix}. Therefore, it becomes possible to write the following 375code: 376 377> exampleInfix :: PureExpr 378> exampleInfix = (uint8 1) .<. ((uint8 2) .+. (uint8 4)) 379 380Although not specified yet, we could have set up the left/right 381associativity and precedence rules of these operators. This would 382reduce the parenthesizing overhead. It is just a matter of doing it. 383 384> (.+.), (.-.), (.*.), (./.), (.%.), 385> (.<<.), (.>>.), (.&.), (.|.), (.^.), 386> (.<.), (.<=.), (.>.), 387> (.>=.), (.==.), (.!=.) :: PureExpr -> PureExpr -> PureExpr 388> (.+.) = Binary Plus 389> (.-.) = Binary Sub 390> (.*.) = Binary Mul 391> (./.) = Binary Div 392> (.%.) = Binary Mod 393> (.<<.) = Binary Shl 394> (.>>.) = Binary Shr 395> (.&.) = Binary AndBit 396> (.|.) = Binary OrBit 397> (.^.) = Binary XorBit 398> (.<.) = Binary Le 399> (.<=.) = Binary Leq 400> (.>.) = Binary Ge 401> (.>=.) = Binary Geq 402> (.==.) = Binary Eq 403> (.!=.) = Binary Neq 404 405Finally, |sizeof|, conditionals, and |cast| have their straightforward 406alter-ego in FoF: 407 408> sizeof :: TypeExpr -> PureExpr 409> sizeof t = Sizeof t 410> 411> test :: PureExpr -> PureExpr -> PureExpr -> PureExpr 412> test c ift iff = Test c ift iff 413> 414> cast :: TypeExpr -> PureExpr -> PureExpr 415> cast t e = Cast t e 416 417 418When compiling foreign function calls, one might need to turn a 419(Haskell) string into a FoF \emph{quote} object. This is achieved by 420the following combinator. One must avoid using this operation as much 421as possible: this quotation has no semantic meaning, therefore one 422should use it only when we are really sure we are not interested in 423the quoted semantic anymore. 424 425> quote :: String -> PureExpr 426> quote s = Quote s