1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2<html>
3
4<head>
5<title>Poly/ML Interface to the C Programming Language</title>
6</head>
7
8<body>
9<font face="Arial, Helvetica, sans-serif">This is the old foreign function interface. 
10It has now been superseded by the <a href="../Reference/Foreign.html">Foreign</a> 
11structure. </font>
12<h1>Poly/ML Interface to the C Programming Language</h1>
13
14<h2>Nick Chapman&nbsp;&nbsp;&nbsp; June 6, 1994</h2>
15
16<ol>
17  <li><a href="CInterface.html#1 Introduction">Introduction</a></li>
18  <li><a href="CInterface.html#2 Dynamic Libraries">Dynamic Libraries</a></li>
19  <li><a href="CInterface.html#3 Creating a Dynamic Library">Creating a Dynamic Library</a></li>
20  <li><a href="CInterface.html#4 Calling Simple C-functions">Calling Simple C-functions</a></li>
21  <li><a href="CInterface.html#5 Calln functions">A family of <tt>call</tt><i>n</i> functions</a></li>
22  <li><a href="CInterface.html#6 Predefined Conversions">Predefined <tt>Conversion</tt>s</a></li>
23  <li><a href="CInterface.html#7 Volatile Types">Volatile Types: <tt>vol</tt>, <tt>sym</tt>
24    and <tt>dylib</tt>.</a></li>
25  <li><a href="CInterface.html#8 Calling C-functions with return-parameters">Calling
26    C-functions with <em>return-parameters</em></a></li>
27  <li><a href="CInterface.html#9 A family of callnretr functions">A family of <tt>call</tt><i>n</i><tt>ret</tt><i>r</i>
28    functions</a></li>
29  <li><a href="CInterface.html#10 C structures">C structures</a></li>
30  <li><a href="CInterface.html#11 A family of structn Conversionals">A family of <tt>struct</tt><i>n</i>
31    Conversionals</a></li>
32  <li><a href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Lower Level Calling
33    Mechanism: <tt>call_sym</tt></a></li>
34  <li><a href="CInterface.html#13 Creating New Conversions">Creating New <tt>Conversion</tt>s</a></li>
35  <li><a href="CInterface.html#14 Enumerated Types">Enumerated Types</a></li>
36  <li><a href="CInterface.html#15 C Programming Primitives">C Programming Primitives</a></li>
37  <li><a href="CInterface.html#16 Example: Quicksort">Example: Quicksort</a></li>
38  <li><a href="CInterface.html#17 Volatile Implementation">Volatile Implementation</a></li>
39</ol>
40
41<h2><a name="1 Introduction">1 Introduction</a></h2>
42
<p>Poly/ML can call functions which have been written in the C
programming language. These functions are accessed from a dynamic library, and so don't
have to be statically linked into the Poly/ML runtime system. The C interface is contained
in the structure <b><tt>CInterface</tt></b>, which is built into every ML database. The
facilities available allow dynamic libraries to be loaded and symbols to be extracted
from these libraries. Symbols which represent C-functions can then be called.</p>
49
50<p>The arguments to a C-function need to be in a format which the C-function can
51understand. Similarly, the return value from a C-function will be in a standard C format.
52All such C-values are represented in ML using the abstract type <b><tt>vol</tt></b>.
53Values of this type are volatile because they do not persist from one ML session to the
54next. There are facilities to convert between ML-values and <b><tt>vol</tt></b>s, together
55with a collection of 'C-programming' primitives to manipulate vols.</p>
56
57<h2><a name="2 Dynamic Libraries">2 <b>Dynamic Libraries</b></a></h2>
58
59<p><b><tt>exception Foreign of string<br>
60val load_lib : string -&gt; dylib<br>
61val load_sym : dylib -&gt; string -&gt; sym<br>
62val get_sym : string -&gt; string -&gt; sym</tt></b></p>
63
<p>The function <b><tt>load_lib</tt></b> takes an ML string containing the pathname of a
dynamic library. This should preferably be a full pathname. A relative pathname
is interpreted with respect to the directory in which the ML session was started.
The return value is a <b><tt>dylib</tt></b> representing the dynamic library. If the
dynamic library cannot be found, the exception <b><tt>Foreign</tt></b> is raised with a
string describing the problem.</p>
70
71<p><i>If the file named by the filename exists but is not in the correct format for a
72dynamic library, the underlying C-function</i> <b><tt>dlopen</tt></b> <i>prints an error
73message and then kills the ML session. So far, I have been unable to catch this error.</i></p>
74
75<p>Once a library has been opened, a symbol may be extracted from the library with the
76function <b><tt>load_sym</tt></b>. This takes a <b><tt>dylib</tt></b> representing the
77dynamic library and an ML string naming the symbol. The return value is a <b><tt>sym</tt></b>
78representing the symbol. If the symbol is not contained in the dynamic library, the
79exception <b><tt>Foreign</tt></b> is raised with a string describing the problem.</p>
80
<p>Often the return value of the function <b><tt>load_lib</tt></b> is passed directly to
the function <b><tt>load_sym</tt></b>. This combination is captured by the function <b><tt>get_sym</tt></b>,
which takes two strings naming the dynamic library and the symbol, and returns the <b><tt>sym</tt></b>
representing the symbol, or raises the exception <b><tt>Foreign</tt></b>.</p>
85
86<p><b><tt>fun get_sym lib sym = load_sym (load_lib lib) sym;</tt></b></p>
87
<p>Values of type <b><tt>dylib</tt></b> and <b><tt>sym</tt></b> share the volatile nature
of <b><tt>vol</tt></b>; they do not persist from one ML session to the next. This is
explained in more detail in <a href="CInterface.html#7 Volatile Types">Section 7</a>.</p>
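
<p>The following C sketch shows the kind of work that <tt>load_lib</tt> and <tt>load_sym</tt>
presumably delegate to the operating system. It is an illustration only: the text above names
<tt>dlopen</tt>, while the pairing with <tt>dlsym</tt> and the helper name <tt>lookup_symbol</tt>
are assumptions made for this example. On some older systems the program must be linked with
<tt>-ldl</tt>.</p>

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open the library at `path` and return the address
 * of the symbol `name`, or NULL (after printing the dlerror text) if
 * either step fails. A NULL path asks dlopen for the running program. */
void *lookup_symbol(const char *path, const char *name)
{
    assert(name != NULL);

    void *lib = dlopen(path, RTLD_LAZY);
    if (lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(lib, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(lib);
        return NULL;
    }

    /* The library is deliberately left open: the symbol is only valid
     * while the library remains loaded. */
    return sym;
}
```

<p>A failed lookup returns <tt>NULL</tt> here, which corresponds roughly to the points at which the ML
functions raise <tt>Foreign</tt>.</p>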
91
92<h2><a name="3 Creating a Dynamic Library">3 Creating a Dynamic Library</a></h2>
93
<p>Suppose we have written a C-function called <b><tt>difference</tt></b>, which computes
the absolute difference of two integers. The function is contained in a file named <b><tt>sample.c</tt></b>.</p>
96
97<p><tt><strong>int difference (int x, int y) {<br>
98&nbsp;&nbsp;&nbsp; return x &gt; y ? x - y : y - x;<br>
99}</strong></tt></p>
100
101<p>To create a dynamic library containing this function we carry out the following steps
102at the shell prompt:</p>
103
104<p><tt><b>Pinky$ gcc -c sample.c -o sample.o<br>
105Pinky$ ld -o sample.so sample.o</b></tt></p>
106
<p>These steps create a dynamic library named <b><tt>sample.so</tt></b>. Often many
symbols will be retrieved from the same dynamic library, so it is useful to partially
apply the function <b><tt>get_sym</tt></b> to the name of the common library. Most of the
examples in this document use symbols retrieved from the library <b><tt>sample.so</tt></b>.</p>
111
112<p><tt><strong>val get = get_sym &quot;sample.so&quot;;</strong></tt></p>
113
114<h2><a name="4 Calling Simple C-functions">4 Calling Simple C-functions</a></h2>
115
<p>To call the C-function <b><tt>difference</tt></b> we use the function <b><tt>call2</tt></b>
from the structure <b><tt>CInterface</tt></b>. This function allows us to call C-functions that
take two arguments:</p>
119
120<p><tt><b>val call2 : sym</b> -&gt; <b>'a Conversion * 'b Conversion</b> -&gt; <b>'c
121Conversion<br>
122&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
123-&gt; 'a</b> <b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * 'b</b>
124-&gt; <b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 'c</b></tt></p>
125
126<p>The first parameter of <b><tt>call2</tt></b> is the <b><tt>sym</tt></b> representing
127the symbol that we wish to call. This is usually obtained from a call to <b><tt>get_sym</tt></b>.
128The second parameter is a pair of <b><tt>Conversions</tt></b> describing the two arguments
129to the C-function; the third parameter is a <b><tt>Conversion</tt></b> describing the
130return value of the C-function. The fourth parameter is a pair containing the actual
131arguments to be passed to the C-function. Notice how the type of each argument matches the
132type variable contained in the corresponding <b><tt>Conversion</tt></b> parameter.</p>
133
134<p>The purpose of a <b><tt>Conversion</tt></b> is twofold. Firstly, it specifies the
135C-type required by the C-function. This needs to be known at the lowest level so that the
136correct argument passing and return conventions can be used when calling the C-function.
137Secondly, the <b><tt>Conversion</tt></b> performs the conversion between a C-value (in
138this case a C integer) and an ML-value. The conversion necessary to call the example
C-function <b><tt>difference</tt></b> is <b><tt>INT</tt></b>, which has type <b><tt>int
Conversion</tt></b>. We can now define an ML function as a wrapper around the underlying
C-function.</p>
142
143<p><tt><strong>val diff = call2 (get &quot;difference&quot;) (INT,INT) INT;</strong></tt></p>
144
<p>Because the Conversion <b><tt>INT</tt></b> has type <b><tt>int Conversion</tt></b>, the
type of <b><tt>diff</tt></b> is constrained to be <b><tt>int * int -&gt; int</tt></b>,
which is just what we require. We can now apply the ML function, for example <b><tt>(diff
(13,50))</tt></b>, which evaluates to <b><tt>37</tt></b>.</p>
149
150<h2><a name="5 Calln functions">5 A family</a> of <tt>call</tt><i>n</i> functions</h2>
151
152<p>There is a family of <tt><b>call</b></tt><i>n</i> functions from <b><tt>call0</tt></b>
153to <b><tt>call9</tt></b>.</p>
154
155<p><tt><strong>val calln :<br>
156&nbsp;&nbsp; sym -&gt; 'a<small><small>1</small></small> Conversion *&nbsp; ... * 'a<small><small>n</small></small>
157Conversion<br>
158&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 'b Conversion<br>
159&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 'a<small><small>1</small></small> * ... * 'a<small><small>n</small></small>
160-&gt; 'b </strong></tt></p>
161
162<p>We need a collection of functions because we cannot give a legal ML type to a function
163which takes a list of <b><tt>Conversion</tt></b>s without forcing them all to have the
164same type parameter. C-functions with more than nine parameters can still be called, but
165the lower level calling mechanism must be used, see <a
166href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Section 12</a>.</p>
167
168<h2><a name="6 Predefined Conversions">6 Predefined</a> <tt>Conversion</tt>s</h2>
169
170<p>In the structure <b><tt>CInterface</tt></b>, there are various predefined <b><tt>Conversion</tt></b>s.
171The name of each <b><tt>Conversion</tt></b> indicates the C-type required/returned,
172whereas the ML type of the <b><tt>Conversion</tt></b> constrains the resulting type when
the <b><tt>Conversion</tt></b> is used as an argument to a <b><tt>call</tt></b><i>n</i> function.</p>
174
175<p><tt><strong>val CHAR: char Conversion<br>
176val DOUBLE : real Conversion<br>
177val FLOAT : real Conversion<br>
178val INT : int Conversion<br>
179val LONG : int Conversion<br>
180val SHORT : int Conversion<br>
181val STRING :string Conversion<br>
182val VOID : unit Conversion<br>
183val BOOL : bool Conversion<br>
184val POINTER :vol Conversion</strong></tt></p>
185
186<p>The <b><tt>Conversions CHAR, DOUBLE, FLOAT, INT, LONG</tt> </b>and <b><tt>SHORT</tt> </b>are
187primitive in the sense that they convert between small fixed-size C types.</p>
188
189<p>The <b><tt>Conversion STRING</tt></b> converts between an ML string and a C pointer;
190the pointer points at a null terminated array of characters. This <b><tt>Conversion</tt></b>
191is built out of the <b><tt>CHAR Conversion</tt></b> and the C programming primitives, see <a
192href="CInterface.html#15 C Programming Primitives">Section 15</a>.</p>
193
194<p>The <b><tt>Conversion VOID</tt></b> is really a one way <b><tt>Conversion</tt></b>
195intended for the result of C-functions that return <b><tt>void</tt></b>. Attempts to use
this <b><tt>Conversion</tt></b> the other way around raise the exception <b><tt>Foreign</tt></b>
with an appropriate message.</p>
198
<p>The <b><tt>Conversion BOOL</tt></b> is built on top of the <b><tt>Conversion INT</tt></b>.
200It converts between an ML <b><tt>bool</tt></b> and a C integer.</p>
201
202<p>The <b><tt>Conversion POINTER</tt></b> is basically the identity <b><tt>Conversion</tt></b>.
203No conversion is performed and the underlying <b><tt>vol</tt></b> becomes accessible.</p>
204
205<h2><a name="7 Volatile Types">7 Volatile Types</a>: <tt>vol</tt>, <tt>sym</tt> and <tt>dylib</tt>.</h2>
206
207<p>There is a problem with the definition of the ML-function <b><tt>diff</tt></b> given
208above. The call to <b><tt>get_sym</tt></b> (within the partial application <b><tt>get</tt></b>)
209returns a value of type <b><tt>sym</tt></b> which like values of type <b><tt>vol</tt></b>
210does not persist from one ML session to the next. If after the definition of <b><tt>diff</tt></b>
211we were to commit the database and leave the ML session, we would find that on restarting
212the ML session, the function <b><tt>diff</tt></b> no longer operates as expected, but
213instead causes the exception <b><tt>Foreign</tt></b> to be raised:</p>
214
215<p><tt><strong>&gt; commit();<br>
216&gt; diff (13,50);<br>
val it = 37<br>
218&gt; quit();<br>
219Pinky$ ml<br>
220&gt; diff (13,50);<br>
221Exception- Foreign &quot;Invalid volatile&quot; raised</strong></tt></p>
222
223<p>One solution is to redefine the ML function <b><tt>diff</tt></b> as:</p>
224
225<p><strong><tt>fun diff args =<br>
&nbsp;&nbsp; call2 (get &quot;difference&quot;) (INT,INT) INT args;</tt></strong></p>
227
<p>The new version of <b><tt>diff</tt></b> is very similar to the old version, except that
the subexpression <b><tt>get &quot;difference&quot;</tt></b> is evaluated every time
the function is applied to the tuple of arguments, instead of just once. This causes the
library and symbol to be reloaded on every invocation of the function <b><tt>diff</tt></b>,
ensuring that the <b><tt>sym</tt></b> is valid. In terms of efficiency this is not as horrific as
it sounds. The underlying dynamic library manipulation functions appear to cache what has
already been loaded, and so do little work on subsequent calls to load the same library
or symbol.</p>
236
237<h2><a name="8 Calling C-functions with return-parameters">8 Calling C-functions with <em>return-parameters</em></a></h2>
238
239<p>Although C is strictly a <i>call-by-value</i> language, <i>call-by-reference</i> is
240often simulated with the use of parameters of a pointer type. When a function is called
241with a parameter that has a pointer type, the called function can then modify the value
pointed at by the pointer. For example, the C-function <b><tt>diff_sum</tt></b> below
computes both the difference and the sum of two integers. The function has four
parameters: two input parameters and two return-parameters.</p>
245
246<p><tt><strong>void diff_sum (int x, int y, int *diff, int *sum) {<br>
247&nbsp; *diff = x &gt; y ? x - y : y - x;<br>
248&nbsp; *sum = x+y;<br>
249}</strong></tt></p>
250
251<p>With C, this function would be invoked with something like:</p>
252
253<p><tt><strong>{<br>
254&nbsp; int diff,sum;<br>
255&nbsp; diff_sum(x,y,&amp;diff,&amp;sum);<br>
256}</strong></tt></p>
257
258<p>To call the C-function <b><tt>diff_sum</tt></b> from ML we use the function <b><tt>call4ret2</tt></b>.
259This allows us to call C-functions that have four parameters, the last two being
260return-parameters.</p>
261
262<p><tt><strong>val call4ret2 : sym<br>
263&nbsp; -&gt; 'a Conversion * 'b Conversion -&gt; 'c Conversion * 'd Conversion<br>
264&nbsp; -&gt; 'a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * 'b
265&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 'c
266&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * 'd</strong></tt></p>
267
268<p>Now we can write an ML wrapper function:</p>
269
270<p><strong><tt>fun diff_sum x y =<br>
271&nbsp;&nbsp; call4ret2 (get &quot;diff_sum&quot;) (INT,INT) (INT,INT) (x,y);</tt></strong></p>
272
<p>Evaluating <b><tt>(diff_sum 13 50)</tt></b> results in <b><tt>(37,63)</tt></b>.</p>
274
275<h2><a name="9 A family of callnretr functions">9 A family of <tt>call</tt><i>n</i><tt>ret</tt><i>r</i>
276functions</a></h2>
277
<p>There is a limited family of <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b> functions
defined to call C-functions that have <i>n - r</i> input-parameters followed by <i>r</i>
return-parameters. This family contains functions for <i>n</i> ranging from 1 to 5, with <i>r</i> as
either 1 or 2. (Exception: there is no <b><tt>call1ret2</tt></b> because this makes no
sense.)</p>
283
284<p><tt><b>val call1ret1 : sym -&gt; unit -&gt; 'a Conversion -&gt; unit -&gt; 'a<br>
285val call<em>n</em>ret<em>r</em> :<br>
286&nbsp;&nbsp; sym -&gt; 'a<small>1</small> Conversion * ... * 'a<small>n-r</small>
287Conversion<br>
288&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 'a<small>n-r+1</small> Conversion * ... * 'a<small>n</small>
289Conversion<br>
290&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 'a<small>1</small> * ... *'a<small>n-r</small>
-&gt; 'a<small>n-r+1</small> * ... * 'a<small>n</small></b></tt></p>
292
293<p>For other combinations of n and r; requiring a non-final parameter in the parameter
294list to be a return-parameter; or requiring the actual return result together with the use
295of return parameters, the lower level calling mechanism can be used (<a
296href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">Section 12</a>).</p>
297
298<h2><a name="10 C structures">10 C structures</a></h2>
299
300<p>C functions may be called which take/return C structure values. For example, the
301following piece of C defines a <b><tt>typedef</tt></b>ed structure called <b><tt>Point</tt></b>,
302and a function which manipulates these <b><tt>Points</tt></b> called <b><tt>addPoint</tt></b>.</p>
303
304<p><b><tt>typedef struct {int x; int y;} Point;</tt></b></p>
305
306<p><b><tt>Point addPoint (Point p1, Point p2) {<br>
307&nbsp; p1.x += p2.x;<br>
308&nbsp; p1.y += p2.y;<br>
309&nbsp; return p1;<br>
310}</tt></b></p>
311
312<p>To create the necessary <b><tt>Conversion</tt></b> for <b><tt>Points</tt></b> we can
313use the <b><tt>Conversional</tt></b>, <b><tt>STRUCT2</tt></b>. This function takes a pair
314of <b><tt>Conversion</tt></b>s and returns a new <b><tt>Conversion</tt></b> suitable for a
315C structure containing those types. The type of <b><tt>STRUCT2</tt></b> is:</p>
316
<p><b><tt>val STRUCT2 : 'a Conversion * 'b Conversion -&gt; ('a * 'b) Conversion</tt></b></p>
318
319<p>We now define an ML wrapper function for <b><tt>addPoint</tt></b>:</p>
320
321<p><tt><strong>val POINT = STRUCT2 (INT,INT);<br>
322fun addPoint p1 p2 =<br>
&nbsp;&nbsp; call2 (get &quot;addPoint&quot;) (POINT,POINT) POINT (p1, p2);</strong></tt></p>
324
325<p>Now, <b><tt>(addPoint (5, 6) (8,9))</tt></b> evaluates to <b><tt>(13, 15)</tt></b>.</p>
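
<p>Because <tt>Point</tt> values are passed and returned by value, the caller's own structures are
never modified, which is what makes a pair-to-pair ML wrapper a faithful model of the function. The
following C sketch checks this; the function <tt>caller_copy_unchanged</tt> is a hypothetical helper
written for this illustration.</p>

```c
#include <assert.h>

typedef struct { int x; int y; } Point;

/* The example function: p1 is a private copy, so modifying it is safe. */
Point addPoint(Point p1, Point p2)
{
    p1.x += p2.x;
    p1.y += p2.y;
    return p1;
}

/* Returns 1 if the caller's first argument is untouched after the call. */
int caller_copy_unchanged(void)
{
    Point a = {5, 6};
    Point b = {8, 9};
    Point c = addPoint(a, b);

    assert(c.x == 13 && c.y == 15);     /* the sum reported in the text */
    return a.x == 5 && a.y == 6;
}
```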
326
327<h2><a name="11 A family of structn Conversionals">11 A family of <tt>struct</tt><i>n</i>
328Conversionals</a></h2>
329
<p>There is a family of <b><tt>STRUCT</tt></b><i>n</i> functions from <b><tt>STRUCT2</tt></b> to
<b><tt>STRUCT9</tt></b>.</p>

<p><tt><strong>val STRUCTn : 'a<small>1</small> Conversion * ... * 'a<small>n</small>
Conversion<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;
('a<small>1</small> * ... * 'a<small>n</small>) Conversion</strong></tt></p>
337
338<p>Manipulation of structures with more than nine components can be achieved with the use
339of the lower level calling mechanism, <a
340href="CInterface.html#12 Lower Level Calling Mechanism: call_sym">see Section 12</a>.</p>
341
342<h2><a name="12 Lower Level Calling Mechanism: call_sym">12 Lower Level Calling Mechanism:
343<tt>call_sym</tt></a></h2>
344
345<p>Occasionally it is necessary to access the dynamic calling mechanism at a lower level.
346The collection of functions <b><tt>call</tt></b><i>n</i> and <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b>
347are all defined in terms of the function <b><tt>call_sym</tt></b>, which has the following
348type:</p>
349
350<p><b><tt>val call_sym : sym -&gt; (Ctype * vol) list -&gt; Ctype -&gt; vol</tt></b></p>
351
<p>The second argument to <b><tt>call_sym</tt></b> is a list of <b><tt>Ctype/vol</tt></b>
pairs, which allows C-functions of any number of arguments to be called. This function is
more cumbersome to use than the <b><tt>call</tt><i>n</i></b> and <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i></b>
functions because two stages have been separated: specification of the C-type, and conversion between
ML-values and C-values (<b><tt>vol</tt></b>s). The specification of the C-type
is achieved by using a constructor of the datatype <b><tt>Ctype</tt></b>:</p>
358
359<p><tt><strong>datatype Ctype =<br>
360Cchar | Cdouble | Cfloat | Cint | Clong | Cshort | Cvoid<br>
361| Cpointer of Ctype<br>
362| Cstruct of Ctype list<br>
363| Cfunction of Ctype list * Ctype</strong></tt></p>
364
365<p>The following collection of functions is used to convert from and to values of type <b><tt>vol</tt></b>.</p>
366
367<p><tt><b>val</b> <b>fromCstring : vol -&gt;string<br>
368val</b> <b>fromCchar : vol -&gt;char<br>
369val</b> <b>fromCdouble : vol -&gt;real<br>
370val</b> <b>fromCfloat : vol -&gt;real<br>
371val</b> <b>fromCint :</b> <b>vol -&gt;int<br>
372val</b> <b>fromClong : vol -&gt;int<br>
373val</b> <b>fromCshort : vol -&gt;int<br>
374val</b> <b>toCstring : string -&gt;</b> <b>vol<br>
375val</b> <b>toCchar : char -&gt; vol<br>
376val</b> <b>toCdouble : real -&gt;vol<br>
377val</b> <b>toCfloat :</b> <b>real -&gt;vol<br>
378val</b> <b>toCint : int -&gt;vol<br>
379val</b> <b>toClong :</b> <b>int -&gt;vol<br>
380val</b> <b>toCshort :</b> <b>int -&gt;vol</b></tt></p>
381
382<p>For example, this is how to define <b><tt>diff</tt></b> directly in terms of <b><tt>call_sym</tt></b>.</p>
383
384<p><tt><strong>fun diff x y =<br>
385&nbsp; fromCint (call_sym (get &quot;difference&quot;)<br>
386&nbsp;&nbsp;&nbsp; [(Cint, toCint x),(Cint, toCint y)] Cint)</strong></tt></p>
387
388<p>Manipulation of C structures is achieved with the following two functions:</p>
389
390<p><tt><b>val make_struct</b> : <b>(Ctype * vol) list</b> -&gt; <b>vol <br>
391val break_struct</b> : <b>Ctype list -&gt; vol</b> -&gt; <b>vol list</b></tt></p>
392
393<h2><a name="13 Creating New Conversions">13 Creating New <tt>Conversion</tt>s</a></h2>
394
<p>Recall that a <b><tt>Conversion</tt></b> encapsulates three things: an underlying C-type; a
function to convert from the C-value (of type <b><tt>vol</tt></b>) to an ML value of a
given type; and a function which converts from the ML value back into the C-value (of type <b><tt>vol</tt></b>).
Sometimes it is useful to be able to create new <b><tt>Conversion</tt></b>s, or to
retrieve the components from an existing <b><tt>Conversion</tt></b>.</p>
400
401<p><tt><b>val mkConversion</b> : <b>(vol -&gt; 'a) -&gt; ('a -&gt; vol) -&gt; Ctype</b>
402-&gt; <b>'a Conversion <br>
403val breakConversion</b> : <b>'a Conversion -&gt; (vol -&gt; 'a) * ('a</b> -&gt; <b>vol) *
404Ctype</b></tt></p>
405
406<p>The function <b><tt>mkConversion</tt></b> creates a new <b><tt>Conversion</tt></b> from
407its three components. The function <b><tt>breakConversion</tt></b> takes an existing <b><tt>Conversion</tt></b>
408and returns a triple containing the components. For example, the standard conversion <b><tt>INT</tt></b>
409might be defined as:</p>
410
411<p><strong><tt>val INT = mkConversion fromCint toCint Cint</tt></strong></p>
412
413<p>A good reason for creating a new <b><tt>Conversion</tt></b> is to give a different ML
414type to values of type <b><tt>vol</tt></b> which are to be used in a particular way. For
415example, we may be interfacing to a collection of C-functions that take/return pointers
416which are being used to implement a particular abstract type, for example a tree node. By
417creating a new conversion we can use the ML type system to avoid mixing values of this new
418type with other normal <b><tt>vol</tt></b>s.</p>
419
420<p><strong><tt>abstype node = Node of vol<br>
421with val NODE = mkConversion Node (fn (Node n) =&gt; n) (Cpointer Cvoid)<br>
422end</tt></strong></p>
423
424<p><strong><tt>fun lookupNode s = call1 (get &quot;lookupNode&quot;) STRING NODE s<br>
425fun printNode n = call1 (get &quot;printNode&quot;) NODE VOID n</tt></strong></p>
426
427<p>The types of these two functions are:</p>
428
429<p><tt><b>val lookupNode</b> : <b>string -&gt; node<br>
430val printNode</b> : <b>node -&gt; unit</b></tt></p>
431
432<h2><a name="14 Enumerated Types">14 Enumerated Types</a></h2>
433
<p>Another reason for creating a new <b><tt>Conversion</tt></b> is to call a
C-function that takes/returns values of an enumerated type. For example, suppose <b><tt>colour</tt></b>
is declared as:</p>
437
438<p><tt><strong>typedef enum {<br>
439&nbsp; white,<br>
440&nbsp; red = 5,<br>
441&nbsp; green,<br>
442&nbsp; blue,<br>
443&nbsp; /* leave room for extra colours in the future */<br>
444&nbsp; black = 100<br>
445} colour;</strong></tt></p>
446
<p>This example shows that C enumerations are just sugar for integers, so much so that we can
even specify which constructors correspond to which integer values. When an enumeration
specifies integer values for just some constructors (as in <b><tt>colour</tt></b>
above), the first constructor is assigned 0 if unspecified, and each subsequent unspecified
constructor is assigned one more than its predecessor, e.g. <b><tt>green</tt></b> is 6.</p>
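
<p>The numbering rule can be checked directly with a C11 compiler. The sketch below repeats the
declaration of <tt>colour</tt> and asserts the values at compile time; the function
<tt>is_colour</tt> is a hypothetical helper, added to show how an integer can be tested for
membership of the enumeration.</p>

```c
#include <assert.h>

typedef enum {
    white,
    red = 5,
    green,
    blue,
    /* leave room for extra colours in the future */
    black = 100
} colour;

/* Compile-time checks of the numbering rule (C11 static_assert). */
static_assert(white == 0, "first constructor defaults to 0");
static_assert(green == 6, "green follows red = 5");
static_assert(blue == 7, "blue follows green");

/* Return 1 if i is the value of some constructor of colour, else 0. */
int is_colour(int i)
{
    switch (i) {
    case white:
    case red:
    case green:
    case blue:
    case black:
        return 1;
    default:
        return 0;
    }
}
```

<p>Values such as 1 or 8 fall in the gaps between constructors and are rejected by
<tt>is_colour</tt>.</p>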
452
453<p>We would like to convert C-enumerations like <b><tt>colour</tt></b> into an equivalent
454ML datatype, together with functions to convert between values of the datatype and ML
455integers. This can be achieved automatically by using the script <b><tt>proc-enums</tt></b>,
456contained in the scripts subdirectory of the source tree.</p>
457
458<p><tt><strong>Usage: proc-enums &lt;struct-name&gt; {&lt;filename&gt;}+</strong></tt></p>
459
460<p>The first parameter to <b><tt>proc-enums</tt></b> is the name of the generated ML
461structure. The remaining parameters specify C-files in which to search for C <b><tt>typedef</tt></b>ed
462enumeration declarations. No formatting conventions are assumed, i.e. arbitrary white
463space and comments are allowed within the declaration. Other declarations and definitions
464are ignored. The generated file is named <b><tt>&lt;struct-name&gt;.ML</tt></b>.</p>
465
466<p>For the colour example, we would type <b><tt>'proc-enums colour colour.h'</tt></b> at
467the shell prompt. This would generate a file <b><tt>colour.ML</tt></b> containing the
468following ML definitions.</p>
469
470<p><strong><tt>structure colour = struct</tt></strong></p>
471
472<p><strong><tt>datatype colour<br>
473= white<br>
474| red<br>
475| green<br>
476| blue<br>
477| black</tt></strong></p>
478
479<p><strong><tt>exception Int2colour</tt></strong></p>
480
481<p><strong><tt>fun int2colour i = case i of <br>
482&nbsp; 0 =&gt; white<br>
483| 5 =&gt; red<br>
484| 6 =&gt; green<br>
485| 7 =&gt; blue<br>
486| 100 =&gt; black<br>
487| _ =&gt; raise Int2colour</tt></strong></p>
488
489<p><strong><tt>fun colour2int i = case i of <br>
490&nbsp; white =&gt; 0<br>
491| red =&gt; 5<br>
| green =&gt; 6<br>
493| blue =&gt; 7<br>
494| black =&gt; 100</tt></strong></p>
495
496<p><strong><tt>end (* struct *)</tt></strong></p>
497
498<p>Once these definitions have been generated we can create a new <b>Conversion:</b></p>
499
500<p><strong><tt>val COLOUR =<br>
501&nbsp; mkConversion (int2colour o fromCint) (toCint o colour2int) Cint;</tt></strong></p>
502
503<p>Now, suppose we have a C-function <b><tt>nameOfColour</tt></b>,</p>
504
505<p><tt><strong>#include &quot;colour.h&quot;<br>
506char* nameOfColour (colour c) {<br>
507&nbsp; switch (c) {<br>
&nbsp;&nbsp;&nbsp; case white: return &quot;white&quot;;<br>
&nbsp;&nbsp;&nbsp; case red:&nbsp;&nbsp; return &quot;red&quot;;<br>
&nbsp;&nbsp;&nbsp; case green: return &quot;green&quot;;<br>
&nbsp;&nbsp;&nbsp; case blue:&nbsp; return &quot;blue&quot;;<br>
&nbsp;&nbsp;&nbsp; case black: return &quot;black&quot;;<br>
&nbsp;&nbsp;&nbsp; default:&nbsp;&nbsp;&nbsp; return &quot;Error: No such colour&quot;;<br>
514&nbsp; }<br>
515}</strong></tt></p>
516
<p>we can write an ML wrapper for this function as:</p>
518
519<p><tt><strong>fun nameOfColour c =<br>
520&nbsp;&nbsp; call1 (get &quot;nameOfColour&quot;) COLOUR STRING c;</strong></tt></p>
521
<p>Now we can evaluate <b><tt>(nameOfColour blue)</tt></b>, which yields the ML
string <b><tt>&quot;blue&quot;</tt></b>.</p>
524
525<h2><a name="15 C Programming Primitives">15 C Programming Primitives</a></h2>
526
527<p>Occasionally, we need to manipulate C-values in greater detail. The following example
shows how an ML wrapper can be written for the C-function <b><tt>diff_sum</tt></b>,
529without using a <b><tt>call</tt><i>n</i><tt>ret</tt><i>r</i> </b>function.</p>
530
531<p><tt><strong>fun diff_sum x y =<br>
532&nbsp;&nbsp;&nbsp; let val diff = alloc 1 Cint<br>
533&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; val sum = alloc 1 Cint<br>
534&nbsp;&nbsp;&nbsp; in<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call4 (get &quot;diff_sum&quot;)
536(INT,INT,POINTER,POINTER) VOID<br>
537&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (x, y, address diff,
538address sum);<br>
539&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (fromCint diff, fromCint sum)<br>
540&nbsp;&nbsp;&nbsp; end</strong></tt></p>
541
542<p>This example uses two of a collection of six ML functions allowing basic C-programming.</p>
543
544<p><tt><strong>val sizeof&nbsp; : Ctype -&gt; int<br>
545val alloc&nbsp;&nbsp; : int -&gt; Ctype -&gt; vol<br>
546val address : vol -&gt; vol<br>
547val deref&nbsp;&nbsp; : vol -&gt; vol<br>
548val assign&nbsp; : Ctype -&gt; vol -&gt; vol -&gt; unit<br>
549val offset&nbsp; : int -&gt; Ctype -&gt; vol -&gt; vol</strong></tt></p>
550
<p><i>These functions are intrinsically unsafe: incorrect usage can cause the ML session to
die.</i></p>
553
554<p>The application <b><tt>(sizeof</tt></b><i> t</i><b><tt>)</tt></b> returns the size (in
555bytes) of the <b><tt>Ctype</tt></b><i> t</i>.</p>
556
<p>The application <b><tt>(alloc</tt></b> <i>n t</i><b><tt>)</tt></b> returns a <b><tt>vol</tt></b>
encapsulating some freshly allocated memory of size <b><tt>(</tt></b><i>n</i>*<b><tt>sizeof</tt></b>
<i>t</i><b><tt>)</tt></b> bytes. Unlike allocation facilities in C, which return a pointer to the
newly allocated space, the result of <b><tt>alloc</tt></b> encapsulates the space directly.</p>
561
<p><i>The underlying implementation of</i> <b><tt>alloc</tt></b> <i>does in fact use</i> <b><tt>malloc</tt></b>
<i>to obtain the newly allocated space, and its result does in fact consist of a pointer to this
space. However, all the above ML functions work at an extra level of indirection to the
corresponding C-operation. This extra indirection is removed before the C-value is passed
to a real C-function.</i></p>
567
568<p>The application <b><tt>(address</tt></b> <i>v</i><b><tt>)</tt></b> returns a new <b><tt>vol</tt>
569</b>containing the address of <i>v</i>. This function corresponds to the C operator <b><tt>&amp;</tt></b>.</p>
570
571<p>The application <b><tt>(deref</tt></b> <i>v</i><b><tt>)</tt></b> returns a <b><tt>vol</tt></b>
572which is the result of dereferencing the address contained in <i>v</i>. This function
573corresponds to the C operator <b><tt>*</tt></b>. If <i>v</i> is not a valid address, the
574ML session will die with a segmentation error.</p>
575
576<p>The application <b><tt>(assign</tt></b><i> t v w</i><b><tt>)</tt></b> copies <b><tt>(sizeof</tt></b>
577<i>t</i><b><tt>)</tt></b> bytes of data from <i>w</i> into <i>v</i>. This function
corresponds to the C operator <b><tt>=</tt></b>, or the standard C function <b><tt>memcpy</tt></b>.</p>
579
<p>The application <b><tt>(offset</tt></b><i> i t v</i><b><tt>)</tt></b> returns a new <b><tt>vol</tt>
</b>that is offset <b><tt>(</tt></b><i>i</i>*<b><tt>sizeof</tt></b><i> t</i><b><tt>)</tt></b> bytes
in memory from <i>v</i>. The closest corresponding operator in C is structure
member access <tt>(.)</tt>. Pointer arithmetic can be achieved by combining the function <b><tt>offset</tt></b>
with the functions <b><tt>address</tt></b> and <b><tt>deref</tt></b>.</p>
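
<p>The byte arithmetic behind <b><tt>offset</tt></b> can be pictured in plain C. The following
is only a sketch of the address calculation described above, not the Poly/ML implementation;
the names <b><tt>offset_bytes</tt></b> and <b><tt>int_ptr_add</tt></b> are hypothetical helpers
introduced for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Poly/ML: the address denoted by
   (offset i t v), given the address of v and the value of (sizeof t). */
static void *offset_bytes(void *v, ptrdiff_t i, size_t size_of_t)
{
    assert(v != NULL);
    /* Move i elements of size_of_t bytes each away from v. */
    return (char *)v + i * (ptrdiff_t)size_of_t;
}

/* The C pointer arithmetic p + i on an int array, written in terms of
   offset_bytes so that the element size is explicit. */
static int *int_ptr_add(int *p, ptrdiff_t i)
{
    return offset_bytes(p, i, sizeof(int));
}
```

<p>With an array <b><tt>int a[4]</tt></b>, the call <b><tt>int_ptr_add(a, 2)</tt></b> yields the
same address as <b><tt>&amp;a[2]</tt></b>.</p>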
585
586<p>The functions <b><tt>address</tt></b> and <b><tt>deref</tt></b> create the same
587aliasing as the corresponding C operators. For example, the following sequence of C
588statements causes the final value of <b><tt>i</tt> </b>to be 123:</p>
589
590<p><tt><strong>{<br>
591&nbsp; int i = 0;<br>
592&nbsp; int *p = &amp;i;<br>
593&nbsp; *p = 123;<br>
594}</strong></tt></p>
595
<p>Likewise, the following sequence of ML statements leaves the value 123 in <b><tt>i</tt></b>:</p>
597
598<p><tt><strong>&gt; val i = toCint 0;<br>
599&gt; val p = address i;<br>
600&gt; assign Cint (deref p) (toCint 123);<br>
601&gt; fromCint i;<br>
602val it = 123</strong></tt></p>
603
604<h2><a name="16 Example: Quicksort">16 Example: Quicksort</a></h2>
605
606<p>The following example shows how the C-programming primitives are intended to be used.
The example involves interfacing to the standard C-function <b><tt>qsort</tt></b>. On many Unix
608systems this function can be retrieved from a dynamic library in <b><tt>/usr/lib</tt></b>.</p>
609
610<p><strong><tt>val getC = get_sym &quot;/usr/lib/libc.so.1.7&quot;;</tt></strong></p>
611
612<p>The function <b><tt>qsort</tt></b> takes four parameters.</p>
613
614<p><strong><tt>void qsort (void *base, int nel, int width, int (*compar)());</tt></strong></p>
615
<p>The first parameter, <b><tt>base</tt></b>, is a pointer to an array of elements to be
sorted; the second parameter, <b><tt>nel</tt></b>, is the number of elements in the array;
the third parameter, <b><tt>width</tt></b>, is the size (in bytes) of each element; the
fourth parameter, <b><tt>compar</tt></b>, is a comparison function which must return an
integer less than, equal to, or greater than zero according to whether its first argument
is to be considered less than, equal to, or greater than its second. See the <b><tt>qsort</tt></b> manual
page for more details.</p>
622
623<p>In our example we wish to sort pairs of strings. The first string is the key to be
624sorted, while the second string is arbitrary data. In C we would represent this pair as a
625structure, and would write the comparison function <b><tt>compare</tt></b> using <b><tt>strcmp</tt></b>.</p>
626
627<p><strong><tt>typedef struct {<br>
628&nbsp; char *key;<br>
629&nbsp; char *data;<br>
630} pair;</tt></strong></p>
631
<p><strong><tt>int compare (pair *x, pair *y) {<br>
&nbsp;&nbsp; return strcmp(x-&gt;key, y-&gt;key);<br>
}</tt></strong></p>
635
636<p>We want to define an ML wrapper <b><tt>qsort</tt></b> which takes a list of string
637pairs and returns the sorted list. Other than the C-programming primitives, the only
638additional function needed is <b><tt>volOfSym</tt></b>. This is needed to supply the
639fourth argument to <b><tt>qsort</tt></b>, a pointer to a comparison function. The
640application <b><tt>(volOfSym</tt></b> <i>s</i><b><tt>)</tt></b> returns the <b><tt>vol</tt></b>
641encapsulated in the symbol <i>s</i>.</p>
642
643<p><strong><tt>val volOfSym : sym -&gt; vol</tt></strong></p>
644
<p>We can now define <b><tt>qsort</tt></b>, together with two auxiliary functions <b><tt>fill</tt></b>
and <b><tt>read</tt></b>.</p>
647
648<p><strong><tt>val (fromPair,toPair,pairType) = breakConversion (STRUCT2 (STRING,STRING));</tt></strong></p>
649
650<p><strong><tt>fun fill p [] = ()<br>
651&nbsp; | fill p ((key,data)::xs) =<br>
652&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (assign pairType p (toPair (key,data)); <br>
653&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fill (offset 1 pairType p) xs)</tt></strong></p>
654
655<p><strong><tt>fun read p 0 = []<br>
656&nbsp; | read p n = fromPair p :: read (offset 1 pairType p) (n-1)</tt></strong></p>
657
658<p><strong><tt>fun qsort xs =<br>
659&nbsp;&nbsp; let<br>
660&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; val len = length xs<br>
661&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; val table = alloc len pairType<br>
662&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; val compare = volOfSym (get &quot;compare&quot;)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; val sort = call4 (getC &quot;qsort&quot;)
664(POINTER,INT,INT,POINTER) VOID<br>
665&nbsp;&nbsp; in<br>
666&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fill table xs;<br>
667&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sort (address table, len, sizeof pairType, compare);<br>
668&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read table len<br>
669&nbsp;&nbsp; end</tt></strong></p>
670
671<p>The function <b><tt>fill</tt></b> takes a pointer into some allocated space (which must
672be big enough), and a string pair list. It fills the array with structures created from
673the list. The function <b><tt>offset</tt></b> is used to move along the allocated area.</p>
674
675<p>The function <b><tt>read</tt></b> is the inverse of <b><tt>fill</tt></b>. It takes an
676array of structures and an integer <i>n</i> and reconstructs a list of <i>n</i> string
677pairs.</p>
678
679<p>The ML function <b><tt>qsort</tt></b> operates by first allocating enough space for the
680array of structures, then using <b><tt>fill</tt></b> to fill this array from the argument
681list <b><tt>xs</tt></b>. A call to the C-function <b><tt>qsort</tt></b> is made to sort
682this array. Notice how the first argument to <b><tt>sort</tt></b> is <b><tt>(address
683table)</tt></b> which generates the required array pointer for the C-function <b><tt>qsort</tt></b>.
684Finally, a list is reconstructed from the sorted array using <b><tt>read</tt></b>.</p>
685
686<p>Now we can evaluate the following:</p>
687
688<p><tt><strong>&gt; qsort [(&quot;one&quot;,&quot;fred&quot;), (&quot;two&quot;,
689&quot;dave&quot;), (&quot;three&quot;, &quot;bob&quot;), (&quot;four&quot;,
690&quot;mary&quot;)];<br>
691val it =<br>
&nbsp; [(&quot;four&quot;, &quot;mary&quot;), (&quot;one&quot;, &quot;fred&quot;),
693(&quot;three&quot;, &quot;bob&quot;), (&quot;two&quot;, &quot;dave&quot;)]</strong></tt></p>
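
<p>For comparison, the same sort can be written entirely in C. This is a minimal sketch,
assuming a hosted C environment whose <b><tt>&lt;stdlib.h&gt;</tt></b> declares <b><tt>qsort</tt></b>
with a comparison function taking two <b><tt>const void *</tt></b> arguments; the name
<b><tt>sort_pairs</tt></b> is a hypothetical helper introduced here and is not part of the
interface.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *data;
} pair;

/* qsort hands the comparison function pointers to the two elements
   being compared, so the arguments are converted back to pair pointers. */
static int compare(const void *x, const void *y)
{
    const pair *px = x;
    const pair *py = y;
    return strcmp(px->key, py->key);
}

/* Hypothetical helper: sort an array of n pairs in place by key. */
static void sort_pairs(pair *table, size_t n)
{
    assert(table != NULL);
    qsort(table, n, sizeof(pair), compare);
}
```

<p>Applied to the four pairs used in the ML session above, <b><tt>sort_pairs</tt></b> leaves
the array ordered by key as four, one, three, two.</p>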
694
695<h2><a name="17 Volatile Implementation">17 Volatile Implementation</a></h2>
696
<p>The C-data contained in a volatile is managed in a separate space from normal ML data,
which is stored in the heap. There are two reasons for this. The first is that data contained in the ML heap
is liable to change its address during garbage collection, and C-functions cannot cope
with this. The second reason is safety. We do not want foreign C-functions to obtain a
pointer into the ML heap. Because the C-function is running in the same Unix process, it
is always possible for it to corrupt the ML heap; however, the most usual cause of
corruption is an <i>off-by-one</i> error. If the C-data were stored in the ML heap,
such an error would corrupt a neighbouring heap cell.</p>
705
<p>Every ML value of type <b><tt>vol</tt></b> has two components: (1) an ML heap cell; (2)
a slot in the <b><tt>vols</tt></b> array, a runtime system variable declared and managed
in the file <b><tt>Driver/foreign.c</tt></b>. The ML heap cell indexes a slot in the <b><tt>vols</tt></b>
array. This slot contains three items: (1) a back pointer, pointing at the corresponding
ML heap cell; (2) a C-pointer, pointing to the actual C-data; (3) a boolean, indicating
whether this volatile <i>owns</i> the space pointed to by the C-pointer.</p>
712
713<p>The combination of <b><tt>vols</tt></b> array index and the back pointer found there
714enables the validity of a volatile to be checked as it is dereferenced. If the volatile is
715invalid then the exception <b><tt>Foreign</tt></b> is raised.</p>
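
<p>The slot layout and the validity check described above might be modelled in C roughly as
follows. This is an illustrative sketch only, not the contents of the runtime system source:
the type names, the fixed <b><tt>TABLE_SIZE</tt></b>, and the functions <b><tt>bind_slot</tt></b>
and <b><tt>is_valid</tt></b> are all assumptions made for the example.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8  /* illustrative size; the text does not specify one */

/* Stand-in for the ML heap cell: it only records which slot it indexes. */
typedef struct heap_cell {
    size_t index;
} heap_cell;

/* One slot of the table, holding the three items listed in the text. */
typedef struct {
    heap_cell *back;   /* back pointer to the corresponding heap cell */
    void      *c_data; /* C-pointer to the actual C-data */
    bool       owns;   /* whether this volatile owns the space */
} vol_slot;

static vol_slot vols[TABLE_SIZE];

/* Hypothetical helper: tie a heap cell and a slot to each other. */
static void bind_slot(heap_cell *cell, size_t index, void *c_data, bool owns)
{
    assert(index < TABLE_SIZE);
    cell->index = index;
    vols[index].back = cell;
    vols[index].c_data = c_data;
    vols[index].owns = owns;
}

/* A cell is treated as valid only if its index is in range and the slot
   it indexes points back at that same cell. */
static bool is_valid(const heap_cell *cell)
{
    return cell->index < TABLE_SIZE && vols[cell->index].back == cell;
}
```

<p>In this model a stale cell that merely carries the index of a slot now bound to another
cell fails the check, which corresponds to the case where the exception would be raised.</p>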
716
<p>The collection of functions that convert ML values into <b><tt>vol</tt></b>s (e.g. <b><tt>toCint</tt></b>
and <b><tt>toCfloat</tt></b>), together with the functions <b><tt>alloc</tt></b> and <b><tt>address</tt></b>,
create new volatiles; that is, volatiles that <i>own</i> the space pointed to by the
C-pointer in their <b><tt>vols</tt></b> array slot. This space is obtained from a call to <tt><b>malloc</b></tt>.
There is always exactly one owner of any piece of <b><tt>malloc</tt></b>ed space. The <b><tt>deref</tt></b>
and <b><tt>offset</tt></b> functions create <b><tt>vol</tt></b>s that point to previously
allocated space and so are not regarded as the owner.</p>
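
<p>The ownership rule can be illustrated with a small C model. It is a sketch under stated
assumptions rather than the actual runtime code: <b><tt>vol_model</tt></b>, <b><tt>make_owner</tt></b>,
<b><tt>make_view</tt></b>, <b><tt>release</tt></b> and the counter <b><tt>frees_done</tt></b> are
hypothetical names used only to show that space is freed by its single owner.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of a volatile: a C-pointer plus the ownership flag. */
typedef struct {
    void *c_data;
    bool  owns;
} vol_model;

static int frees_done = 0;  /* counts calls to free, for illustration */

/* Models alloc or toCint: fresh space, so the result is its owner. */
static vol_model make_owner(size_t bytes)
{
    vol_model v = { malloc(bytes), true };
    assert(v.c_data != NULL);
    return v;
}

/* Models offset: the result points into space owned by another volatile. */
static vol_model make_view(vol_model owner, size_t byte_offset)
{
    vol_model v = { (char *)owner.c_data + byte_offset, false };
    return v;
}

/* Models reclamation: only the owner returns the space to the C heap. */
static void release(vol_model *v)
{
    if (v->owns) {
        free(v->c_data);
        frees_done++;
    }
    v->c_data = NULL;
}
```

<p>Releasing a view leaves the space untouched; only releasing the owner frees it, so each
piece of allocated space is freed exactly once.</p>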
724
725<p>Volatiles are garbage collected in such a way that <b><tt>malloc</tt></b>ed space is
726freed when there are no remaining references to the ML cell which owns that space.
However, by itself this scheme is too aggressive. For example:</p>
728
729<p><strong><tt>val a = address (toCint 999);</tt></strong></p>
730
<p>When a garbage collection occurs, although the space owned by <b><tt>a</tt></b> (containing the
pointer) will be preserved, the space allocated to hold the C-integer 999 will be
reclaimed because there are no references to its owner, the anonymous expression <b><tt>(toCint
999)</tt></b>.</p>
735
<p>If we now evaluate the expression <b><tt>(fromCint (deref a))</tt></b>, the result will be
whatever garbage happens to be pointed to by the dangling C-pointer contained in the
volatile <b><tt>a</tt></b>. What is needed is a way to ensure that the volatile <b><tt>a</tt></b>
holds an ML reference to the anonymous volatile <b><tt>(toCint 999)</tt></b> for the
duration of its lifetime. In a similar manner, any volatile that does not own its own
space, e.g. the result of the expression <b><tt>(deref (address (toCint 999)))</tt></b>,
needs to hold a reference to the owner of the space it points at. This scheme of
maintaining references is implemented in <b><tt>Volatile.ML</tt></b> in the directory <b><tt>Prelude/Foreign</tt></b>,
and is completely transparent to the user.</p>
745
746<p>In some unusual situations we might want to allocate some space which persists after
747all ML references to it have disappeared. For example, we might have to allocate space for
748a buffer, and then hand a pointer to this buffer over to a foreign C-function. This can be
749achieved in two ways. We could carefully maintain an ML reference to the <b><tt>vol</tt></b>
encapsulating the buffer. Alternatively, we could use the dynamic library manipulation
functions to call the real C-function <b><tt>malloc</tt></b>.</p>
752</body>
753</html>
754