<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE article 
SYSTEM "file:///usr/share/docbook/docbook-xml-4.3/docbookx.dtd">

<article id="libxslt">
<articleinfo>
  <author><firstname>Panos</firstname><surname>Louridas</surname></author>
  <copyright>
    <year>2004</year>
    <holder>Panagiotis Louridas</holder>
  </copyright>
  <legalnotice>
    <para>Permission is hereby granted, free of charge, to
  any person obtaining a copy of this software and associated
  documentation files (the "Software"), to deal in the Software
  without restriction, including without limitation the rights to use,
  copy, modify, merge, publish, distribute, sublicense, and/or sell
  copies of the Software, and to permit persons to whom the Software
  is furnished to do so, subject to the following conditions:
  </para>

  <para>The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.
  </para>

  <para>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
  WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</para>

  </legalnotice>
</articleinfo>

<title>libxslt: An Extended Tutorial</title>

<sect1><title>Introduction</title>

<para>The Extensible Stylesheet Language Transformations (XSLT)
specification defines an XML template language for transforming XML
documents. An XSLT engine reads an XSLT file and an XML document and
transforms the document accordingly.</para>

<para>We want to apply a series of XSLT transformations to a series
of documents. An obvious solution is to use the operating system's
pipe mechanism and start a series of transformation processes, each
one taking as input the output of the previous transformation. It
would be interesting, though, and perhaps more efficient, if we could
do our job within a single process.</para>

<para>libxslt is a library for doing XSLT transformations. It is built
on libxml, which is a library for handling XML documents. libxml and
libxslt are used by the GNOME project. Although developed in the
*NIX world, both libxml and libxslt have been
ported to the MS-Windows platform. In principle an application using
libxslt should be easily portable between the two systems. In
practice, however, various wrinkles arise. These do not have
anything to do with libxml or libxslt per se, but rather with the
different compilation and linking procedures of each system.</para>

<para>The presented solution is an extension of <ulink
url="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html">John
Fleck's libxslt tutorial</ulink>, but the present tutorial tries to be
self-contained. It develops a minimal libxslt application
(libxslt_pipes) that can apply a series of transformations to a
series of files in a pipe-like manner. An invocation might be:</para>

<para>
  <userinput>
    libxslt_pipes --out results.xml foo.xsl bar.xsl doc1.xml doc2.xml
  </userinput>
</para>

<para>The <filename>foo.xsl</filename> stylesheet will be applied to
<filename>doc1.xml</filename> and the <filename>bar.xsl</filename>
stylesheet will be applied to the resulting document; then the two
stylesheets will be applied in the same sequence to
<filename>doc2.xml</filename>. The results are sent to
<filename>results.xml</filename> (if no output is specified they are
sent to standard output).</para>

<para>The application is compiled on both *NIX
systems and MS-Windows, where by *NIX systems we
mean Linux, BSD, and other members of the
family. The gcc suite is used on the *NIX platform
and the Microsoft compiler and linker are used on the
MS-Windows platform.</para>

</sect1>

<sect1><title>Setting the Scene</title>

<para>
We need to include the necessary header files:

<programlisting>
  <![CDATA[
  #include <stdio.h>
  #include <string.h>
  #include <stdlib.h>
  
  #include <libxslt/transform.h>
  #include <libxslt/xsltutils.h>
  ]]>
</programlisting>
</para>

<para>The first group of include directives pulls in headers of the
standard C library. The headers we need to make libxslt work are in
the second group. The <filename>transform.h</filename> header file
declares the API that does the bulk of the actual processing. The
<filename>xsltutils.h</filename> header file declares the API for some
generic utility functions of the XSLT engine; among other things,
saving to a file, which is what we need it for.</para>

<para>
If our input files contain entities through external subsets, we need
to tell libxslt to load them. The global variable
<varname>xmlLoadExtDtdDefaultValue</varname>, declared in
<filename>libxml/globals.h</filename>, is responsible for that. As the
variable is defined outside our program we must specify external
linkage:
  <programlisting>
    extern int xmlLoadExtDtdDefaultValue;
  </programlisting>
</para>

<para>
The program is called from the command line. We anticipate that the
user may not call it the right way, so we define a function for
describing its usage:
<programlisting>
  static void usage(const char *name) {
      printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n",
          name);
      printf("      --out file: send output to file\n");
      printf("      --param name value: pass a (parameter,value) pair\n");
  }
</programlisting>
</para>
</sect1>

<sect1><title>Program Start</title>

<para>We need to define a few variables that are used throughout the
program:
<programlisting>
    int main(int argc, char **argv) {
        int arg_indx;
        const char *params[16 + 1];
        int params_indx = 0;
        int stylesheet_indx = 0;
        int file_indx = 0;
        int i, j, k;
        FILE *output_file = stdout;
        xsltStylesheetPtr *stylesheets =
            (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr));
        xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr));
        xmlDocPtr doc, res;
        int return_value = 0;
</programlisting>
</para>

<para>The <varname>arg_indx</varname> integer is an index used to
iterate over the program arguments. The <varname>params</varname>
string array is used to collect the XSLT parameters. In XSLT,
additional information may be passed to the processor via
parameters. The user of the program specifies these as key-value pairs
on the command line following the <userinput>--param</userinput>
command line argument. We accept up to 8 such key-value pairs, which
we track with the <varname>params_indx</varname> integer. libxslt
expects the parameters array to be null-terminated, so we have to
allocate one extra place (16 + 1) for it. The
<varname>stylesheet_indx</varname> and <varname>file_indx</varname>
integers count the stylesheets and the files collected so far. The
<varname>i</varname>, <varname>j</varname>,
<varname>k</varname> integers are additional indices for iteration
purposes, and <varname>return_value</varname> is the value the program
returns to the operating system. We expect the result of the
transformation to go to the standard output in most cases, but the
user may wish otherwise via the <option>--out</option> command line
option, so we need to keep track of the situation with the
<varname>output_file</varname> file pointer.</para>

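<para>The bookkeeping for the parameters array can be isolated in a
small helper. What follows is only a sketch and is not part of
<filename>libxslt_pipes.c</filename>; the names
<function>add_param</function> and <constant>MAX_PARAMS</constant> are
hypothetical. The helper checks for room before it stores a pair and
keeps the array null-terminated after every call:
<programlisting>
  <![CDATA[
  #include <stddef.h>

  #define MAX_PARAMS 8    /* key-value pairs, as in the text (assumed name) */

  /* Appends one (name, value) pair to an array that has room for
   * 2 * MAX_PARAMS + 1 pointers. *count is the number of slots in use.
   * Returns 0 on success and -1 if the array is already full. */
  static int add_param(const char **params, int *count,
                       const char *name, const char *value) {
      if (*count + 2 > 2 * MAX_PARAMS)
          return -1;
      params[(*count)++] = name;
      params[(*count)++] = value;
      params[*count] = NULL;    /* keep the terminator in place */
      return 0;
  }
  ]]>
</programlisting>
</para>

<para>With an array of 2 * MAX_PARAMS + 1 pointers, eight calls
succeed and fill sixteen slots, the seventeenth slot holds the
terminating null pointer, and a ninth call returns -1 without touching
the array.</para>
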
<para>In libxslt, XSLT stylesheets are internally stored in
<structname>xsltStylesheet</structname> structures; similarly, in
libxml XML documents are stored in <structname>xmlDoc</structname>
structures. <type>xsltStylesheetPtr</type> and <type>xmlDocPtr</type>
are simply typedefs of pointers to them. The user may specify any
number of stylesheets that will be applied to the documents one after
the other. To save time we parse the stylesheets and the documents as
we read them from the command line and keep the parsed representation
of them. The parsed results are kept in arrays. These are dynamically
allocated and sized to the number of arguments; this wastes some
space, but not much (the size of <type>xsltStylesheetPtr</type> and
<type>xmlDocPtr</type> is the size of a pointer) and simplifies code
later on. The array memory is allocated with
<function>calloc</function> to ensure contents are initialised to
zero.
</para>

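<para>To see why zero-initialised arrays simplify the code later on,
consider the following sketch. It is not taken from the program, and
the helper names <function>make_slots</function> and
<function>count_entries</function> are hypothetical. Because
<varname>argv[0]</varname> is the program name, at most
<varname>argc</varname> - 1 arguments can be stylesheets or files, so
an array of <varname>argc</varname> zeroed slots always keeps at least
one null entry after the last stored pointer. That entry can serve as
an end marker, and no separate element count has to be passed around.
The sketch assumes, as is the case on mainstream platforms, that
all-zero bytes represent a null pointer:
<programlisting>
  <![CDATA[
  #include <stdlib.h>

  /* Allocates `capacity` pointer slots, all initialised to zero. */
  static void **make_slots(size_t capacity) {
      return (void **) calloc(capacity, sizeof(void *));
  }

  /* Counts the entries that precede the first null slot. */
  static size_t count_entries(void **slots) {
      size_t n = 0;
      while (slots[n])
          n++;
      return n;
  }
  ]]>
</programlisting>
</para>

<para>A loop of the form <literal>for (i = 0; slots[i]; i++)</literal>
then visits exactly the stored entries, whatever their number.</para>
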
</sect1>

<sect1><title>Arguments Collection</title>

<para>If the program gets no arguments at all, we print the usage
description, set the program return value to 1 and exit. Instead of
returning directly we go (literally) to the end of the program text,
where some housekeeping takes place.</para>

<para>
<programlisting>
  <![CDATA[
    if (argc <= 1) {
        usage(argv[0]);
        return_value = 1;
        goto finish;
    }

    /* Collect arguments */
    for (arg_indx = 1; arg_indx < argc; arg_indx++) {
        if (argv[arg_indx][0] != '-')
            break;
        if ((!strcmp(argv[arg_indx], "-param"))
                || (!strcmp(argv[arg_indx], "--param"))) {
            /* A name and a value must follow the option */
            if (arg_indx + 2 >= argc) {
                usage(argv[0]);
                return_value = 1;
                goto finish;
            }
            /* Check for room before storing the pair */
            if (params_indx + 2 > 16) {
                fprintf(stderr, "too many params\n");
                return_value = 1;
                goto finish;
            }
            params[params_indx++] = argv[++arg_indx];
            params[params_indx++] = argv[++arg_indx];
        }  else if ((!strcmp(argv[arg_indx], "-o"))
                || (!strcmp(argv[arg_indx], "--out"))) {
            /* A file name must follow the option */
            if (arg_indx + 1 >= argc) {
                usage(argv[0]);
                return_value = 1;
                goto finish;
            }
            output_file = fopen(argv[++arg_indx], "w");
            if (!output_file) {
                perror(argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
        } else {
            fprintf(stderr, "Unknown option %s\n", argv[arg_indx]);
            usage(argv[0]);
            return_value = 1;
            goto finish;
        }
    }
    params[params_indx] = 0;
    ]]>
</programlisting>
</para>

<para>If the user passes arguments we have to collect them. This is a
matter of iterating over the program argument list for as long as we
encounter arguments starting with a dash. The XSLT parameters are put
into the <varname>params</varname> array and the
<varname>output_file</varname> is set to the user request, if any.
After processing all the parameter key-value pairs we terminate the
<varname>params</varname> array with a null pointer.
</para>
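
<para>One point is worth keeping in mind when passing parameters:
libxslt evaluates each parameter value as an XPath expression, so a
plain string has to reach it wrapped in quotes (on the command line,
for instance, as <userinput>--param status "'draft'"</userinput>). The
program above passes values through unchanged. The following sketch
shows one way a caller could quote a string value; the function name
<function>xpath_string_literal</function> is hypothetical and not part
of the program or of libxslt. A value containing both kinds of quote
cannot be written as a single XPath string literal, so the sketch
gives up in that case:
<programlisting>
  <![CDATA[
  #include <stdlib.h>
  #include <string.h>

  /* Returns a newly allocated XPath string literal for `value`, or
   * NULL if allocation fails or if the value contains both single and
   * double quotes. The caller frees the result. */
  static char *xpath_string_literal(const char *value) {
      size_t len = strlen(value);
      char quote;
      char *literal;

      if (!strchr(value, '\''))
          quote = '\'';
      else if (!strchr(value, '"'))
          quote = '"';
      else
          return NULL;

      literal = (char *) malloc(len + 3);   /* two quotes and the '\0' */
      if (!literal)
          return NULL;
      literal[0] = quote;
      memcpy(literal + 1, value, len);
      literal[len + 1] = quote;
      literal[len + 2] = '\0';
      return literal;
  }
  ]]>
</programlisting>
</para>

<para>The returned string would be stored in the
<varname>params</varname> array in place of the raw argument and freed
once all transformations are done.</para>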
</sect1>

<sect1><title>Parsing</title>

<para>The rest of the argument list is taken to be stylesheets and
files to be transformed. Stylesheets are identified by their suffix,
which is expected to be xsl (case sensitive). All other files are
assumed to be XML documents, regardless of suffix.</para>

<para>
<programlisting>
  <![CDATA[
    /* Collect and parse stylesheets and files to be transformed */
    for (; arg_indx < argc; arg_indx++) {
        /* Work on a copy, as strtok modifies its argument */
        char *argument =
            (char *) malloc(sizeof(char) * (strlen(argv[arg_indx]) + 1));
        strcpy(argument, argv[arg_indx]);
        if (strtok(argument, ".")) {
            char *suffix = strtok(0, ".");
            if (suffix && !strcmp(suffix, "xsl")) {
                stylesheets[stylesheet_indx++] =
                    xsltParseStylesheetFile((const xmlChar *)argv[arg_indx]);
            } else {
                files[file_indx++] = xmlParseFile(argv[arg_indx]);
            }
        } else {
            files[file_indx++] = xmlParseFile(argv[arg_indx]);
        }
        free(argument);
    }
  ]]>
</programlisting>
</para>

<para>Stylesheets are parsed using the
<function>xsltParseStylesheetFile</function>
function. <function>xsltParseStylesheetFile</function> takes as
argument a pointer to an <type>xmlChar</type>, a typedef of an
unsigned char; in effect, the filename of the stylesheet. The
resulting <type>xsltStylesheetPtr</type> is placed in the
<varname>stylesheets</varname> array. In the same vein, XML files are
parsed using the <function>xmlParseFile</function> function that takes
as argument the file's name; the resulting <type>xmlDocPtr</type> is
placed in the <varname>files</varname> array.
</para>

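<para>The suffix test in the listing looks at the text between the
first and the second dot of the argument, so a name such as
<filename>report.v2.xsl</filename> would be treated as an XML
document. A possible alternative, given here only as a sketch with the
hypothetical name <function>has_xsl_suffix</function>, is to look at
whatever follows the last dot; this needs neither a copy of the
argument nor <function>strtok</function>:
<programlisting>
  <![CDATA[
  #include <string.h>

  /* Returns 1 if the text after the last dot in `path` is exactly
   * "xsl" (case sensitive), and 0 otherwise. */
  static int has_xsl_suffix(const char *path) {
      const char *dot = strrchr(path, '.');
      return dot != NULL && strcmp(dot + 1, "xsl") == 0;
  }
  ]]>
</programlisting>
</para>

<para>Names without any dot, and names whose last suffix is anything
other than xsl, are then taken to be XML documents, as before.</para>
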
</sect1>

<sect1><title>File Processing</title>

<para>All stylesheets are applied to each file one after the
other. Stylesheets are applied with the
<function>xsltApplyStylesheet</function> function that takes as
arguments the stylesheet to be applied, the document to be transformed
and any parameters we have collected. The in-memory representation of
an XML document takes space, so each document is freed with the
<function>xmlFreeDoc</function> function as soon as it has been
transformed. The final result is then saved to the specified
output.</para>

<para>
<programlisting>
  <![CDATA[
    /* Process files */
    for (i = 0; files[i]; i++) {
        doc = files[i];
        res = doc;
        /* Each result becomes the input of the next stylesheet */
        for (j = 0; stylesheets[j]; j++) {
            res = xsltApplyStylesheet(stylesheets[j], doc, params);
            xmlFreeDoc(doc);
            doc = res;
        }

        if (stylesheets[0]) {
            /* The last stylesheet applied governs the output settings */
            xsltSaveResultToFile(output_file, res, stylesheets[j-1]);
        } else {
            xmlDocDump(output_file, res);
        }
        xmlFreeDoc(res);
    }

    fclose(output_file);

    for (k = 0; stylesheets[k]; k++) {
        xsltFreeStylesheet(stylesheets[k]);
    }

    xsltCleanupGlobals();
    xmlCleanupParser();

 finish:
    free(stylesheets);
    free(files);
    return(return_value);
    ]]>
</programlisting>
</para>

<para>To output an XML document we have in memory we use the
<function>xsltSaveResultToFile</function> function, where we specify
the destination, the document and the stylesheet that has been applied
to it. The stylesheet is required so that output-related information
contained in the stylesheet, such as the encoding to be used, is
honoured in the output. If no transformation has taken place, which
will happen when the user specifies no stylesheets at all on the
command line, we use the <function>xmlDocDump</function> libxml
function that saves the source document to the file without further
ado.</para>

<para>As parsed stylesheets take up space in memory, we take care to
free that memory after use with a call to
<function>xsltFreeStylesheet</function>. When all work is done, we
clean up all global variables used by the XSLT library using
<function>xsltCleanupGlobals</function>. Likewise, all global memory
allocated for the XML parser is reclaimed by a call to
<function>xmlCleanupParser</function>. Before returning we deallocate
the memory allocated for holding the pointers to the XML documents
and stylesheets.</para>

</sect1>

<sect1><title>*NIX Compiling and Linking</title>

<para>Compiling and linking in a *NIX environment
is easy, as the required libraries are almost certain to be already in
place (remember that libxml and libxslt are used by the GNOME project,
so they are present in most installations). The program can be
dynamically linked so that its footprint is minimized, or statically
linked, so that it stands by itself, carrying all required code.</para>

<para>For dynamic linking the following one-liner will do (the
libraries are listed after the source file, as linkers resolve symbols
in command line order):</para>

<para>
<userinput>gcc -o libxslt_pipes -Wall -I/usr/include/libxml2
libxslt_pipes.c -L/usr/lib -lxslt -lxml2</userinput>
</para>

<para>We assume that the necessary header files are in <filename
class="directory">/usr/include/libxml2</filename> and that the
required libraries (<filename>libxslt.so</filename>,
<filename>libxml2.so</filename>) are in <filename
class="directory">/usr/lib</filename>.</para>

<para>In general, a program may need to link to additional libraries,
depending on the processing it actually performs. A good way to start
is to use the <command>xslt-config</command> script. The
<option>--help</option> option displays usage
information. Running</para>

<para>
  <userinput>
    xslt-config --cflags
  </userinput>
</para>

<para>we get compile flags, while running</para>

<para>
  <userinput>
    xslt-config --libs
  </userinput>
</para>

<para>we get the library settings for the linker.</para>

<para>For static linking we must list more libraries than we did for
dynamic linking, as the libraries on which the libxml and libxslt
libraries depend are also needed. Using <command>xslt-config</command>
on a particular installation we create the following one-liner:</para>

<para>
<userinput>
gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
-static -L/usr/lib -lxslt -lxml2 -lz -lpthread -lm
</userinput>
</para>

<para>If we get warnings to the effect that some function in
statically linked applications requires at runtime the shared
libraries used from the glibc version used for linking, that means
that the binary is not completely static. Although we statically
linked against the GNU C runtime library glibc, glibc uses external
libraries to perform some of its functions. The same versions of
these libraries must be present on the system where we want the
application to run. One way to avoid this is to use an alternative C
runtime, for example <ulink
url="http://www.uclibc.org">uClibc</ulink>, which requires obtaining
and building a uClibc toolchain first (if the reason for trying to get
a statically linked version of the program is to embed it somewhere,
using uClibc might be a good idea anyway).
</para>

</sect1>

<sect1 id="windows-build"><title>MS-Windows Compiling and
Linking</title>

<para>Compiling and linking in MS-Windows requires
some attention. First, the MS-Windows ports must be
downloaded and installed on the programming workstation. The ports are
available on <ulink url="http://www.zlatkovic.com/libxml.en.html">Igor
Zlatković's site</ulink>. We need the ports for iconv, zlib, libxml,
and libxslt. In contrast to *NIX environments, we
cannot assume that the libraries needed will be present on other
computers where the program will be used. One solution is to
distribute the program along with the necessary dynamic
libraries. Another solution is to statically link the program so that
only a single executable file will have to be distributed.</para>

<para>We assume that we have decompressed the downloaded ports and
have placed the required contents of their <filename
class="directory">include</filename> directories in an <filename
class="directory">include</filename> directory in our file system. The
required contents include everything apart from the <filename
class="directory">libexslt</filename> directory of the libxslt port,
as we are not using EXSLT (an initiative to provide extensions to
XSLT) in this project. In order to compile the program we have to make
sure that all necessary header files are included. When using the
Microsoft compiler this translates to adding the required
<option>/I</option> switches in the command line. If using a Visual
Studio product the same effect is attained by specifying additional
include directories in the compilation options. In the end, if the
headers have been copied in <filename
class="directory">C:\include</filename> the command line must contain
<option>/I"C:\include" /I"C:\include\libxslt"
/I"C:\include\libxml"</option>.</para>

<para>This being a C program, it needs to be compiled against an
implementation of the C runtime library. Microsoft provides various
implementations. The ports, however, have been compiled against the
<filename>msvcrt.dll</filename> implementation, so it is wise to use
the same runtime in our project, unless we wish to run into
unexpected runtime crashes. <filename>msvcrt.dll</filename> is a
multi-threaded implementation and is specified by giving
<option>/MD</option> as a compiler option. Unfortunately, the
correspondence between the <option>/MD</option> switch and
<filename>msvcrt.dll</filename> breaks after version 6 of the
Microsoft compiler. In version 7 and later (i.e., Visual Studio .NET),
<option>/MD</option> links against a different DLL; in version 7.1
this is <filename>msvcr71.dll</filename>. The end result of this bit
of esoterica is that if you try to dynamically link your application
against the ports with a compiler whose version is greater than 6,
your program is likely to crash unexpectedly. Alternatively, you may
wish to compile iconv, zlib, libxml and libxslt yourself, using the
new runtime library. This is not a tall order, and some details are
given <link linkend="windows-ports-build">below</link>.</para>

<para>There are three kinds of libraries in MS-Windows. Dynamic-Link
Libraries (DLLs), like the <filename>msvcrt.dll</filename> we met
above, are used for dynamic linking; an application links to them at
runtime, so the application does not include the code contained in
them. Static libraries are used for static linking; an application
adds the libraries' code to its own code at link time. Import
libraries are used when building an application that uses DLLs. For
the application to be built, the linker must somehow find the
definitions of the functions that will be provided at runtime by the
DLLs, otherwise it will complain about unresolved references. Import
libraries contain function stubs that, for each DLL function we want
to call, know where to look for it in the DLL. In essence, in order to
use a DLL we must link against its corresponding import library. DLLs
have a <filename>.dll</filename> suffix; static and import libraries
both have a <filename>.lib</filename> suffix. In the MS-Windows ports
of libxml and libxslt static libraries are distinguished by their name
ending in <filename>_a.lib</filename>, while in the zlib port the
import library is <filename>zdll.lib</filename> and the static library
is <filename>zlib.lib</filename>. In what follows we assume we have a
<filename class="directory">lib</filename> directory in our filesystem
where we place the libraries we need for linking.</para>

<para>If we want to link dynamically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv.lib</filename>, <filename>libxslt.lib</filename>,
<filename>libxml2.lib</filename>, and
<filename>zdll.lib</filename>. When using the Microsoft linker this
translates to adding the required <option>/LIBPATH</option>
switch and the necessary libraries in the command line. In Visual
Studio we must specify an additional library directory for <filename
class="directory">lib</filename> and put the necessary libraries in
the additional dependencies. In the end, the command line must include
<option>/LIBPATH:"C:\lib" "lib\iconv.lib" "lib\libxslt.lib"
"lib\libxml2.lib" "lib\zdll.lib"</option>, provided the libraries'
directory is <filename class="directory">C:\lib</filename>. In order
for the resulting executable to run, the ports' DLLs must be present;
one way is to place all DLLs contained in the ports in the home
directory of our application, and make sure they are distributed
together.</para>

<para>If we want to link statically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv_a.lib</filename>, <filename>libxslt_a.lib</filename>,
<filename>libxml2_a.lib</filename>, and
<filename>zlib.lib</filename>. Adding <filename
class="directory">lib</filename> as a library directory and putting
the necessary libraries in the additional dependencies, we get a
command line that should include <option>/LIBPATH:"C:\lib"
"lib\iconv_a.lib" "lib\libxslt_a.lib" "lib\libxml2_a.lib"
"lib\zlib.lib"</option>. The resulting executable is much bigger
than if we linked dynamically; it is, however, self-contained and can
be distributed more easily, in theory at least. In practice, however,
the executable is not completely static. We saw that the ports are
compiled against <filename>msvcrt.dll</filename>, so the program does
require that DLL at runtime. Moreover, when using a version of the
Microsoft developer tools greater than 6, our own code no longer uses
<filename>msvcrt.dll</filename> but another runtime, such as
<filename>msvcr71.dll</filename>, so we need that DLL as well. In
contrast to <filename>msvcrt.dll</filename> it may not be present on
the target computer, so we may have to copy it along.</para>

<sect2 id="windows-ports-build"><title>Building the Ports in
MS-Windows</title>

<para>The source code of the ports is readily available on the web;
one has to check the ports' sites. Each port can be built without
problems in an MS-Windows environment using Microsoft development
tools.  The necessary command line tools (compiler, linker,
<command>nmake</command>) must be available. To make them available
we run a batch file called <command>vcvars32.bat</command> that comes
with Visual Studio (its exact location in the directory tree may vary
depending on the version of Visual Studio, but a file search will find
it anyway). Makefiles for the Microsoft tools are found in all
ports. They are distinguished by their suffix, e.g.,
<filename>Makefile.msvc</filename> or
<filename>Makefile.msc</filename>. To build zlib it suffices to run
<command>nmake</command> against <filename>Makefile.msc</filename>
(i.e., with the <option>/F</option> option); similarly, to build
<filename>iconv</filename> it suffices to run <command>nmake</command>
against <filename>Makefile.msvc</filename>. Building libxml and
libxslt requires an extra configuration step; we must run the
<filename>configure.js</filename> configuration script with the
<command>cscript</command> command. <filename>configure.js</filename>
is found in the <filename class="directory">win32</filename> directory
in the distributions. It is written in JScript, Microsoft's
implementation of the ECMA 262 language specification (ECMAScript
Edition 3), a JavaScript offspring. The configuration script takes a
number of parameters detailing our environment and needs;
<userinput>cscript configure.js help</userinput> documents
them.</para>

<para>It is wise to read all documentation files in the source
distributions before starting; moreover, pay attention to the
dependencies between the ports. If we configure libxml and libxslt to
use iconv and zlib we must build these two first and make sure their
headers and libraries can be found by the compiler and the
linker when building libxml and libxslt.</para>

</sect2>

</sect1>

<sect1><title>zlib, iconv and All That</title>

<para>We saw that libxml and libxslt depend on various other
libraries, for instance zlib, iconv, and so forth. Taking a look into
them gives us clues on the capabilities of libxml and libxslt.</para>

<para><ulink url="http://www.zlib.org">zlib</ulink> is a free general
purpose lossless data compression library. It is a venerable
workhorse; more than <ulink
url="http://www.gzip.org/zlib/apps.html">500 applications</ulink>
(both commercial and open source) seem to use the library. libxml uses
zlib so that it can read from or write to compressed files
directly. The <function>xmlParseFile</function> function can
transparently parse a compressed document to produce an
<structname>xmlDoc</structname>. If we want to create a compressed
document with libxml we can use an
<structname>xmlTextWriterPtr</structname> (obtained through
<function>xmlNewTextWriterDoc</function>), or another related
structure from <filename>libxml/xmlwriter.h</filename>, with
compression enabled.</para>

<para>XML allows documents to use a variety of different character
encodings. <ulink
url="http://www.gnu.org/software/libiconv">iconv</ulink> is a free
library for converting between different character encodings.  libxml
provides a set of default converters for some encodings: UTF-8, UTF-16
(little endian and big endian), ISO-8859-1, ASCII, and HTML (a
specific handler for the conversion of UTF-8 to ASCII with HTML
predefined entities like &amp;copy; for the copyright sign). However,
when compiled with iconv support, libxml and libxslt can handle the
full range of encodings provided by iconv; these should cover most
needs.</para>

<para>libxml and libxslt can be used in multi-threaded
applications. In MS-Windows they are linked against
<filename>MSVCRT.DLL</filename> (or one of its descendants, as we saw
<link linkend="windows-build">above</link>). In *NIX the pthreads
(POSIX threads) library is used.</para>

</sect1>

<sect1><title>The Complete Program</title>

<para>
The complete program listing is given below. The program is also
<ulink url="libxslt_pipes.c">available online</ulink>.
</para>

<para>
<programlisting>
<xi:include href="libxslt_pipes.c" parse="text"
	    xmlns:xi="http://www.w3.org/2003/XInclude"/>
</programlisting>
</para>

</sect1>

</article>
