<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE article
SYSTEM "file:///usr/share/docbook/docbook-xml-4.3/docbookx.dtd">

<article id="libxslt">
<articleinfo>
  <author><firstname>Panos</firstname><surname>Louridas</surname></author>
  <copyright>
    <year>2004</year>
    <holder>Panagiotis Louridas</holder>
  </copyright>
  <legalnotice>
    <para>Permission is hereby granted, free of charge, to
    any person obtaining a copy of this software and associated
    documentation files (the "Software"), to deal in the Software
    without restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software
    is furnished to do so, subject to the following conditions:
    </para>

    <para>The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.
    </para>

    <para>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
    MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
    LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
    OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
    WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</para>

  </legalnotice>
</articleinfo>

<title>libxslt: An Extended Tutorial</title>

<sect1><title>Introduction</title>

<para>The Extensible Stylesheet Language Transformations (XSLT)
specification defines an XML template language for transforming XML
documents. An XSLT engine reads an XSLT file and an XML document and
transforms the document accordingly.</para>

<para>We want to apply a series of XSLT transformations to a series
of documents.
An obvious solution is to use the operating system's
pipe mechanism and start a series of transformation processes, each
one taking as input the output of the previous transformation. It
would be interesting, though, and perhaps more efficient if we could
do our job within a single process.</para>

<para>libxslt is a library for doing XSLT transformations. It is built
on libxml, which is a library for handling XML documents. libxml and
libxslt are used by the GNOME project. Although developed in the
*NIX world, both libxml and libxslt have been
ported to the MS-Windows platform. In principle an application using
libxslt should be easily portable between the two systems. In
practice, however, various wrinkles arise. These do not have
anything to do with libxml or libxslt per se, but rather with the
different compilation and linking procedures of each system.</para>

<para>The presented solution is an extension of <ulink
url="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html">John
Fleck's libxslt tutorial</ulink>, but the present tutorial tries to be
self-contained. It develops a minimal libxslt application
(libxslt_pipes) that can apply a series of transformations to a
series of files in a pipe-like manner. An invocation might be:</para>

<para>
  <userinput>
  libxslt_pipes --out results.xml foo.xsl bar.xsl doc1.xml doc2.xml
  </userinput>
</para>

<para>The <filename>foo.xsl</filename> stylesheet will be applied to
<filename>doc1.xml</filename> and the <filename>bar.xsl</filename>
stylesheet will be applied to the resulting document; then the two
stylesheets will be applied in the same sequence to
<filename>doc2.xml</filename>.
The results are sent to
<filename>results.xml</filename> (if no output is specified they are
sent to standard output).</para>

<para>The application is compiled on both *NIX
systems and MS-Windows, where by *NIX systems we
mean Linux, BSD, and other members of the Unix
family. The gcc suite is used on the *NIX platform
and the Microsoft compiler and linker are used on the
MS-Windows platform.</para>

</sect1>

<sect1><title>Setting the Scene</title>

<para>
We need to include the necessary header files:

<programlisting>
  <![CDATA[
  #include <stdio.h>
  #include <string.h>
  #include <stdlib.h>

  #include <libxslt/transform.h>
  #include <libxslt/xsltutils.h>
  ]]>
</programlisting>
</para>

<para>The first group of include directives brings in the standard C
library headers. The headers we need to make libxslt work are in the
second group. The <filename>transform.h</filename> header file
declares the API that does the bulk of the actual processing. The
<filename>xsltutils.h</filename> header file declares the API for some
generic utility functions of the XSLT engine, among them saving to a
file, which is what we need it for.</para>

<para>
If our input files use entities declared in an external DTD subset, we
need to tell the parser to load that subset. The global variable
<varname>xmlLoadExtDtdDefaultValue</varname>, defined in
<filename>libxml/globals.h</filename>, is responsible for that; setting
it to 1 before any document is parsed enables the loading. As the
variable is defined outside our program we must specify external
linkage:
  <programlisting>
  extern int xmlLoadExtDtdDefaultValue;
  </programlisting>
</para>

<para>
The program is called from the command line. We anticipate that the
user may not call it the right way, so we define a function for
describing its usage:
<programlisting>
  static void usage(const char *name) {
      printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n",
             name);
      printf("  --out file: send output to file\n");
      printf("  --param name value: pass a (parameter,value) pair\n");
  }
</programlisting>
</para>
</sect1>

<sect1><title>Program Start</title>

<para>We need to define a few variables that are used throughout the
program:
<programlisting>
  int main(int argc, char **argv) {
      int arg_indx;
      const char *params[16 + 1];
      int params_indx = 0;
      int stylesheet_indx = 0;
      int file_indx = 0;
      int i, j, k;
      FILE *output_file = stdout;
      xsltStylesheetPtr *stylesheets =
          (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr));
      xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr));
      xmlDocPtr doc, res;
      int return_value = 0;
</programlisting>
</para>

<para>The <varname>arg_indx</varname> integer is an index used to
iterate over the program arguments. The <varname>params</varname>
string array is used to collect the XSLT parameters. In XSLT,
additional information may be passed to the processor via
parameters. The user of the program specifies these as key-value pairs
on the command line following the <userinput>--param</userinput>
command line argument. We accept up to 8 such key-value pairs, which
we track with the <varname>params_indx</varname> integer. libxslt
expects the parameters array to be NULL-terminated, so we have to
allocate one extra place (16 + 1) for the terminating null pointer. The
<varname>stylesheet_indx</varname> and <varname>file_indx</varname>
integers count the stylesheets and files collected from the command
line. The <varname>i</varname>, <varname>j</varname>,
<varname>k</varname> integers are additional indices for iteration
purposes, <varname>doc</varname> and <varname>res</varname> hold the
input and the result of each transformation step, and
<varname>return_value</varname> is the value the program returns to
the operating system.
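To make the expected layout of <varname>params</varname> concrete, the
following is a minimal sketch in plain C (no libxslt needed) of a
bounded, NULL-terminated array of name/value pairs. The
<structname>param_list</structname> type and the
<function>param_list_add</function> and
<function>quote_as_xpath_string</function> helpers are hypothetical,
not part of libxslt_pipes. The quoting helper reflects the fact that
libxslt evaluates each parameter value as an XPath expression, so a
plain string value must reach it in quotes; on the command line that
could look like <userinput>--param title "'Report'"</userinput>.
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same capacity as the tutorial: 8 name/value pairs (16 slots)
   plus one slot for the terminating NULL. */
#define MAX_PARAM_PAIRS 8

/* Hypothetical container; items has the layout libxslt expects:
   name, value, name, value, ..., NULL. */
struct param_list {
    const char *items[2 * MAX_PARAM_PAIRS + 1];
    int count;                    /* slots in use, always even */
};

/* Append one pair, keeping the array NULL-terminated.
   Returns 0 on success, -1 if there is no room left. */
static int param_list_add(struct param_list *list,
                          const char *name, const char *value) {
    if (list->count + 2 > 2 * MAX_PARAM_PAIRS)
        return -1;
    list->items[list->count++] = name;
    list->items[list->count++] = value;
    list->items[list->count] = NULL;
    return 0;
}

/* libxslt evaluates a parameter value as an XPath expression, so a
   plain string must be quoted. Returns a malloc'ed copy wrapped in
   single quotes, or NULL if the value itself contains a single quote
   (not handled in this sketch) or if allocation fails. */
static char *quote_as_xpath_string(const char *value) {
    size_t len = strlen(value);
    char *quoted;
    if (strchr(value, '\'') != NULL)
        return NULL;
    quoted = malloc(len + 3);
    if (quoted == NULL)
        return NULL;
    quoted[0] = '\'';
    memcpy(quoted + 1, value, len);
    quoted[len + 1] = '\'';
    quoted[len + 2] = '\0';
    return quoted;
}
```
]]></programlisting>
Because <varname>items</varname> has the same shape as the tutorial's
<varname>params</varname> array, it could be passed as the last
argument of <function>xsltApplyStylesheet</function>.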
We expect the result of the
transformation to be sent to standard output in most cases, but the
user may wish otherwise via the <option>--out</option> command line
option, so we need to keep track of the situation with the
<varname>output_file</varname> file pointer.</para>

<para>In libxslt, XSLT stylesheets are internally stored in
<structname>xsltStylesheet</structname> structures; similarly, in
libxml XML documents are stored in <structname>xmlDoc</structname>
structures. <type>xsltStylesheetPtr</type> and <type>xmlDocPtr</type>
are simply typedefs of pointers to them. The user may specify any
number of stylesheets that will be applied to the documents one after
the other. We parse the stylesheets and the documents as we read them
from the command line and keep their parsed representations, so that
each stylesheet is parsed only once, however many documents it is
applied to. The parsed results are kept in arrays. These are
dynamically allocated and sized to the number of arguments; this
wastes some space, but not much (the size of
<type>xsltStylesheetPtr</type> and <type>xmlDocPtr</type> is the size
of a pointer) and simplifies code later on. The array memory is
allocated with <function>calloc</function> to ensure contents are
initialised to zero; since at most <literal>argc - 1</literal> entries
are ever filled, at least one null entry always remains to mark the
end of each array for the loops that process them.
</para>

</sect1>

<sect1><title>Arguments Collection</title>

<para>If the program gets no arguments at all, we print the usage
description, set the program return value to 1 and exit.
Instead of
returning directly we go (literally) to the end of the program text,
where some housekeeping takes place.</para>

<para>
<programlisting>
  <![CDATA[
  if (argc <= 1) {
      usage(argv[0]);
      return_value = 1;
      goto finish;
  }

  /* Collect arguments */
  for (arg_indx = 1; arg_indx < argc; arg_indx++) {
      if (argv[arg_indx][0] != '-')
          break;
      if ((!strcmp(argv[arg_indx], "-param"))
          || (!strcmp(argv[arg_indx], "--param"))) {
          /* --param must be followed by a name and a value,
             and params must have room for both */
          if (arg_indx + 2 >= argc || params_indx + 2 > 16) {
              fprintf(stderr, "missing or too many params\n");
              return_value = 1;
              goto finish;
          }
          arg_indx++;
          params[params_indx++] = argv[arg_indx++];
          params[params_indx++] = argv[arg_indx];
      } else if ((!strcmp(argv[arg_indx], "-o"))
                 || (!strcmp(argv[arg_indx], "--out"))) {
          /* --out must be followed by a file we can open */
          arg_indx++;
          if (arg_indx >= argc
              || (output_file = fopen(argv[arg_indx], "w")) == NULL) {
              fprintf(stderr, "missing or unwritable output file\n");
              return_value = 1;
              goto finish;
          }
      } else {
          fprintf(stderr, "Unknown option %s\n", argv[arg_indx]);
          usage(argv[0]);
          return_value = 1;
          goto finish;
      }
  }
  params[params_indx] = 0;
  ]]>
</programlisting>
</para>

<para>If the user passes arguments we have to collect them. This is a
matter of iterating over the program argument list while we encounter
arguments starting with a dash. The XSLT parameters are put into the
<varname>params</varname> array and the <varname>output_file</varname>
is set to the user request, if any. Before consuming the arguments
that follow <option>--param</option> or <option>--out</option> we
check that they exist, that <varname>params</varname> has room for
another pair, and that the output file could be opened. After
processing all the parameter key-value pairs we terminate the
<varname>params</varname> array with a null pointer.
</para>
</sect1>

<sect1><title>Parsing</title>

<para>The rest of the argument list is taken to be stylesheets and
files to be transformed. Stylesheets are identified by their suffix,
which is expected to be xsl (case-sensitive).
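Splitting the name at dots, as the listing below does with
<function>strtok</function>, examines only the second dot-separated
piece of the name, so a stylesheet called
<filename>my.style.xsl</filename> would be taken for an XML document.
A minimal sketch of an alternative check, using a hypothetical helper
<function>has_xsl_suffix</function> that looks only at the text after
the last dot (found with <function>strrchr</function>):
<programlisting><![CDATA[
```c
#include <string.h>

/* Hypothetical helper: 1 if filename ends in ".xsl" (case-sensitive),
   0 otherwise. Only the text after the last dot is examined, so
   "my.style.xsl" qualifies while "notes.xsl.bak" does not. The
   argument is not modified, so no temporary copy is needed. */
static int has_xsl_suffix(const char *filename) {
    const char *dot = strrchr(filename, '.');
    return dot != NULL && strcmp(dot + 1, "xsl") == 0;
}
```
]]></programlisting>
In the collection loop, a call such as
<literal>has_xsl_suffix(argv[arg_indx])</literal> could replace the two
<function>strtok</function> calls, and since
<function>strrchr</function> does not modify its argument, the
temporary copy would no longer be needed.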
All other files are
assumed to be XML documents, regardless of suffix.</para>

<para>
<programlisting>
  <![CDATA[
  /* Collect and parse stylesheets and files to be transformed */
  for (; arg_indx < argc; arg_indx++) {
      char *argument =
          (char *) malloc(sizeof(char) * (strlen(argv[arg_indx]) + 1));
      strcpy(argument, argv[arg_indx]);
      if (strtok(argument, ".")) {
          char *suffix = strtok(0, ".");
          if (suffix && !strcmp(suffix, "xsl")) {
              stylesheets[stylesheet_indx++] =
                  xsltParseStylesheetFile((const xmlChar *)argv[arg_indx]);
          } else {
              files[file_indx++] = xmlParseFile(argv[arg_indx]);
          }
      } else {
          files[file_indx++] = xmlParseFile(argv[arg_indx]);
      }
      free(argument);
  }
  ]]>
</programlisting>
</para>

<para>Stylesheets are parsed using the
<function>xsltParseStylesheetFile</function> function, which takes the
stylesheet's filename as a pointer to <type>xmlChar</type> (a typedef
of <type>unsigned char</type>), hence the cast. The resulting
<type>xsltStylesheetPtr</type> is placed in the
<varname>stylesheets</varname> array. In the same vein, XML files are
parsed using the <function>xmlParseFile</function> function, which
takes the file's name as argument; the resulting
<type>xmlDocPtr</type> is placed in the <varname>files</varname>
array. Both functions return NULL if parsing fails; since the
processing loops stop at the first null entry of each array, a failed
parse silently hides every stylesheet or file that follows it, and a
more careful program would check each result.
</para>

</sect1>

<sect1><title>File Processing</title>

<para>All stylesheets are applied to each file one after the
other. Stylesheets are applied with the
<function>xsltApplyStylesheet</function> function, which takes as
arguments the stylesheet to be applied, the parsed document to be
transformed, and the parameters we have collected, and returns a new
document holding the result. As in-memory documents take up space,
each document is freed with <function>xmlFreeDoc</function> as soon as
the next stylesheet has been applied to it.
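The ownership rule at work here, where each step returns a newly
allocated result and its input is released as soon as that result
exists, can be tried out without libxslt. The sketch below is only an
analogy, under stated assumptions: heap-allocated strings stand in for
<type>xmlDocPtr</type> documents, the hypothetical functions
<function>to_upper</function> and <function>append_bang</function>
stand in for stylesheets, and <function>apply_chain</function> plays
the role of the inner loop over the stylesheets.
<programlisting><![CDATA[
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a stylesheet: reads its input and returns a newly
   allocated result (or NULL on failure), leaving the input untouched. */
typedef char *(*transform_fn)(const char *doc);

/* Hypothetical transform: upper-case copy of the input. */
static char *to_upper(const char *doc) {
    size_t len = strlen(doc), i;
    char *res = malloc(len + 1);
    if (res == NULL)
        return NULL;
    for (i = 0; i <= len; i++)   /* <= also copies the '\0' */
        res[i] = (char) toupper((unsigned char) doc[i]);
    return res;
}

/* Hypothetical transform: copy of the input with "!" appended. */
static char *append_bang(const char *doc) {
    size_t len = strlen(doc);
    char *res = malloc(len + 2);
    if (res == NULL)
        return NULL;
    memcpy(res, doc, len);
    res[len] = '!';
    res[len + 1] = '\0';
    return res;
}

/* Apply the steps in order. Takes ownership of doc: every intermediate
   value is freed once the next one exists. Returns the final result,
   which the caller must free, or NULL if a step failed. */
static char *apply_chain(char *doc, const transform_fn *steps,
                         size_t nsteps) {
    size_t j;
    for (j = 0; j < nsteps && doc != NULL; j++) {
        char *res = steps[j](doc);
        free(doc);               /* the input is no longer needed */
        doc = res;
    }
    return doc;
}
```
]]></programlisting>
With no steps, <function>apply_chain</function> hands back its input
unchanged, mirroring the case where no stylesheet is given and the
parsed document itself is dumped.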
The final result is then saved to the
specified output and freed in turn.</para>

<para>
<programlisting>
  <![CDATA[
  /* Process files */
  for (i = 0; files[i]; i++) {
      doc = files[i];
      res = doc;
      /* Feed each result to the next stylesheet; stop if one fails */
      for (j = 0; stylesheets[j] && doc; j++) {
          res = xsltApplyStylesheet(stylesheets[j], doc, params);
          xmlFreeDoc(doc);
          doc = res;
      }
      if (res == NULL) {
          fprintf(stderr, "transformation failed\n");
          return_value = 1;
          continue;
      }

      if (stylesheets[0]) {
          /* stylesheets[j-1] is the last stylesheet applied */
          xsltSaveResultToFile(output_file, res, stylesheets[j-1]);
      } else {
          xmlDocDump(output_file, res);
      }
      xmlFreeDoc(res);
  }

  /* Do not close standard output */
  if (output_file != stdout)
      fclose(output_file);

  for (k = 0; stylesheets[k]; k++) {
      xsltFreeStylesheet(stylesheets[k]);
  }

  xsltCleanupGlobals();
  xmlCleanupParser();

finish:
  free(stylesheets);
  free(files);
  return(return_value);
  ]]>
</programlisting>
</para>

<para>To output an XML document we have in memory we use the
<function>xsltSaveResultToFile</function> function, where we specify
the destination, the document and the stylesheet that has been applied
to it; since the result comes from the last stylesheet in the chain,
that is the one we pass. The stylesheet is required so that the output
settings it contains, such as the encoding to be used, are honoured in
the output. If no transformation has taken place, which will happen
when the user specifies no stylesheets at all on the command line, we
use the <function>xmlDocDump</function> libxml function that saves the
source document to the file without further ado. If a transformation
fails, <function>xsltApplyStylesheet</function> returns NULL; we then
report the error and move on to the next file. We close
<varname>output_file</varname> only if it is not standard
output.</para>

<para>As parsed stylesheets take up space in memory, we take care to
free that memory after use with a call to
<function>xsltFreeStylesheet</function>. When all work is done, we
clean up all global variables used by the XSLT library using
<function>xsltCleanupGlobals</function>. Likewise, all global memory
allocated for the XML parser is reclaimed by a call to
<function>xmlCleanupParser</function>.
Before returning we deallocate
the memory allocated for holding the pointers to the XML documents
and stylesheets.</para>

</sect1>

<sect1><title>*NIX Compiling and Linking</title>

<para>Compiling and linking in a *NIX environment
is easy, as the required libraries are almost certain to be already in
place (remember that libxml and libxslt are used by the GNOME project,
so they are present in most installations), although the header files
may come in a separate development package. The program can be
dynamically linked so that its footprint is minimized, or statically
linked, so that it stands by itself, carrying all required code.</para>

<para>For dynamic linking the following one-liner will do:</para>

<para>
<userinput>gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
-L/usr/lib -lxslt -lxml2</userinput>
</para>

<para>We assume that the necessary header files are in <filename
class="directory">/usr/include/libxml2</filename> and that the
required libraries (<filename>libxslt.so</filename>,
<filename>libxml2.so</filename>) are in <filename
class="directory">/usr/lib</filename>. The libraries are listed after
the source file, since the linker resolves references in command-line
order.</para>

<para>In general, a program may need to link to additional libraries,
depending on the processing it actually performs. A good way to start
is to use the <command>xslt-config</command> script. The
<option>--help</option> option displays usage
information. Running</para>

<para>
  <userinput>
  xslt-config --cflags
  </userinput>
</para>

<para>we get compile flags, while running</para>

<para>
  <userinput>
  xslt-config --libs
  </userinput>
</para>

<para>we get the library settings for the linker.</para>

<para>For static linking we must list more libraries than we did for
dynamic linking, as the libraries on which libxml2 and libxslt
depend are also needed.
Using <command>xslt-config</command>
on a particular installation we create the following one-liner:</para>

<para>
<userinput>
gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
-static -L/usr/lib -lxslt -lxml2 -lz -lpthread -lm
</userinput>
</para>

<para>If we get warnings to the effect that using some function in
statically linked applications requires at runtime the shared
libraries from the glibc version used for linking, the binary is not
completely static. Although we statically linked against glibc, the
GNU C library, glibc itself loads shared libraries at runtime to
perform some of its functions (for instance, name lookups through the
Name Service Switch). Libraries of the same version must then be
present on the system where the application is to run. One way to
avoid this is to use an alternative C library, for example <ulink
url="http://www.uclibc.org">uClibc</ulink>, which requires obtaining
and building a uClibc toolchain first (if the reason for trying to get
a statically linked version of the program is to embed it somewhere,
using uClibc might be a good idea anyway).
</para>

</sect1>

<sect1 id="windows-build"><title>MS-Windows Compiling and
Linking</title>

<para>Compiling and linking in MS-Windows requires
some attention. First, the MS-Windows ports must be
downloaded and installed on the development workstation. The ports are
available from <ulink url="http://www.zlatkovic.com/libxml.en.html">Igor
Zlatkovi&#263;'s site</ulink>. We need the ports for iconv, zlib, libxml,
and libxslt. In contrast to *NIX environments, we
cannot assume that the required libraries will be present on other
computers where the program will be used. One solution is to
distribute the program along with the necessary dynamic
libraries.
Another solution is to statically link the program so that
only a single executable file will have to be distributed.</para>

<para>We assume that we have decompressed the downloaded ports and
have placed the required contents of their <filename
class="directory">include</filename> directories in an <filename
class="directory">include</filename> directory in our file system. The
required contents include everything apart from the <filename
class="directory">libexslt</filename> directory of the libxslt port,
as we are not using EXSLT (an initiative to provide extensions to
XSLT) in this project. In order to compile the program we have to make
sure that all necessary header files can be found. When using the
Microsoft compiler this translates to adding the required
<option>/I</option> switches to the command line. If using a Visual
Studio product the same effect is attained by specifying additional
include directories in the compilation options. In the end, if the
headers have been copied to <filename
class="directory">C:\include</filename> the command line must contain
<option>/I"C:\include" /I"C:\include\libxslt"
/I"C:\include\libxml"</option>.</para>

<para>This being a C program, it needs to be compiled against an
implementation of the C runtime library. Microsoft provides various
implementations. The ports, however, have been compiled against the
<filename>msvcrt.dll</filename> implementation, so it is wise to use
the same runtime in our project, unless we want to run into
unexpected runtime crashes. The <filename>msvcrt.dll</filename> is a
multi-threaded implementation and is specified by giving
<option>/MD</option> as a compiler option. Unfortunately, the
correspondence between the <option>/MD</option> switch and
<filename>msvcrt.dll</filename> breaks after version 6 of the
Microsoft compiler.
In version 7 and later (i.e., Visual Studio .NET),
<option>/MD</option> links against a different DLL; in version 7.1
this is <filename>msvcr71.dll</filename>. The end result of this bit
of esoterica is that if we dynamically link our application with a
compiler whose version is greater than 6, the program is likely to
crash unexpectedly: the application and the ports then use two
different runtimes, so that, for instance, the <type>FILE</type>
pointer our program opens and passes to
<function>xsltSaveResultToFile</function> is handled by a runtime
other than the one that created it. To avoid this, we may compile
iconv, zlib, libxml and libxslt ourselves, using the new runtime
library. This is not a tall order, and some details are given
<link linkend="windows-ports-build">below</link>.</para>

<para>There are three kinds of libraries in MS-Windows. Dynamic-Link
Libraries (DLLs), like <filename>msvcrt.dll</filename> we met
above, are used for dynamic linking; an application links to them at
runtime, so the application does not include the code contained in
them. Static libraries are used for static linking; an application
adds the libraries' code to its own code at link time. Import
libraries are used when building an application that uses DLLs. For
the application to be built, the linker must somehow find the
definitions of the functions that will be provided at runtime by the
DLLs, otherwise it will complain about unresolved references. Import
libraries contain function stubs that, for each DLL function we want
to call, know where to look for it in the DLL. In essence, in order to
use a DLL we must link against its corresponding import library. DLLs
have a <filename>.dll</filename> suffix; static and import libraries
both have a <filename>.lib</filename> suffix. In the MS-Windows ports
of libxml and libxslt static libraries are distinguished by their name
ending in <filename>_a.lib</filename>, while in the zlib port the
import library is <filename>zdll.lib</filename> and the static library
is <filename>zlib.lib</filename>.
In what follows we assume we have a
<filename class="directory">lib</filename> directory in our filesystem
where we place the libraries we need for linking.</para>

<para>If we want to link dynamically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv.lib</filename>, <filename>libxslt.lib</filename>,
<filename>libxml2.lib</filename>, and
<filename>zdll.lib</filename>. When using the Microsoft linker this
translates to adding the required <option>/LIBPATH</option>
switch and the necessary libraries to the command line. In Visual
Studio we must specify an additional library directory for <filename
class="directory">lib</filename> and put the necessary libraries in
the additional dependencies. In the end, the command line must include
<option>/LIBPATH:"C:\lib" "iconv.lib" "libxslt.lib"
"libxml2.lib" "zdll.lib"</option>, provided the libraries'
directory is <filename class="directory">C:\lib</filename>. In order
for the resulting executable to run, the ports' DLLs must be present;
one way is to place all DLLs contained in the ports in the directory
of our application's executable, and make sure they are distributed
together.</para>

<para>If we want to link statically we must make sure the <filename
class="directory">lib</filename> directory contains
<filename>iconv_a.lib</filename>, <filename>libxslt_a.lib</filename>,
<filename>libxml2_a.lib</filename>, and
<filename>zlib.lib</filename>. Adding <filename
class="directory">lib</filename> as a library directory and putting
the necessary libraries in the additional dependencies, we get a
command line that should include <option>/LIBPATH:"C:\lib"
"iconv_a.lib" "libxslt_a.lib" "libxml2_a.lib"
"zlib.lib"</option>. The libxml and libxslt headers must also be told
that the libraries are not DLLs, by defining
<literal>LIBXML_STATIC</literal> and <literal>LIBXSLT_STATIC</literal>
when compiling (for instance with <option>/D LIBXML_STATIC /D
LIBXSLT_STATIC</option>); otherwise the compiler expects the functions
to be imported from DLLs and linking fails with unresolved references.
The resulting executable is much bigger
than if we linked dynamically; it is, however, self-contained and can
be distributed more easily, in theory at least. In practice, the
executable is not completely static. We saw that the ports are
compiled against <filename>msvcrt.dll</filename>, so the program does
require that DLL at runtime. Moreover, when using Microsoft developer
tools with a version number greater than 6, our own code no longer
uses <filename>msvcrt.dll</filename> but another runtime, such as
<filename>msvcr71.dll</filename>, and we then need that DLL as well. In
contrast to <filename>msvcrt.dll</filename>, it may not be present on
the target computer, so we may have to distribute it along with the
program.</para>

<sect2 id="windows-ports-build"><title>Building the Ports in
MS-Windows</title>

<para>The source code of the ports is readily available on the web;
one only has to check the port sites. Each port can be built without
problems in an MS-Windows environment using Microsoft development
tools. The necessary command line tools (compiler, linker,
<command>nmake</command>) must be available. This means running a
batch file called <command>vcvars32.bat</command> that comes with
Visual Studio (its exact location in the directory tree may vary
depending on the version of Visual Studio, but a file search will find
it anyway). Makefiles for the Microsoft tools are found in all
ports. They are distinguished by their suffix, e.g.,
<filename>Makefile.msvc</filename> or
<filename>Makefile.msc</filename>. To build zlib it suffices to run
<command>nmake</command> against <filename>Makefile.msc</filename>
(i.e., <userinput>nmake /F Makefile.msc</userinput>); similarly, to
build <filename>iconv</filename> it suffices to run
<command>nmake</command> against <filename>Makefile.msvc</filename>.
Building libxml and
libxslt requires an extra configuration step; we must run the
<filename>configure.js</filename> configuration script with the
<command>cscript</command> command. <filename>configure.js</filename>
is found in the <filename class="directory">win32</filename> directory
of the distributions. It is written in JScript, Microsoft's
implementation of ECMAScript (ECMA-262, Edition 3), the standardized
form of JavaScript. The configuration script takes a number of
parameters detailing our environment and needs;
<userinput>cscript configure.js help</userinput> documents
them.</para>

<para>It is wise to read all documentation files in the source
distributions before starting; moreover, pay attention to the
dependencies between the ports. If we configure libxml and libxslt to
use iconv and zlib we must build these two first and make sure their
headers and libraries can be found by the compiler and the linker when
building libxml and libxslt; likewise, libxml must be built before
libxslt.</para>

</sect2>

</sect1>

<sect1><title>zlib, iconv and All That</title>

<para>We saw that libxml and libxslt depend on various other
libraries, for instance zlib, iconv, and so forth. Taking a look at
them gives us clues about the capabilities of libxml and libxslt.</para>

<para><ulink url="http://www.zlib.org">zlib</ulink> is a free
general-purpose lossless data compression library. It is a venerable
workhorse; more than <ulink
url="http://www.gzip.org/zlib/apps.html">500 applications</ulink>
(both commercial and open source) are listed as using the
library. libxml uses zlib so that it can read from or write to
compressed files directly. The <function>xmlParseFile</function>
function can transparently parse a compressed document to produce an
<structname>xmlDoc</structname>.
If we want to create a compressed
document with libxml we can use an <type>xmlTextWriterPtr</type> from
<filename>libxml/xmlwriter.h</filename>, created for instance with
<function>xmlNewTextWriterFilename</function>, whose second argument
sets the compression; for a document already in memory,
<function>xmlSetDocCompressMode</function> sets the compression level
used when the document is saved to a file.</para>

<para>XML allows documents to use a variety of different character
encodings. <ulink
url="http://www.gnu.org/software/libiconv">iconv</ulink> is a free
library for converting between different character encodings. libxml
provides a set of default converters for some encodings: UTF-8, UTF-16
(little endian and big endian), ISO-8859-1, ASCII, and HTML (a
specific handler for the conversion of UTF-8 to ASCII with HTML
predefined entities like <literal>&amp;copy;</literal> for the
copyright sign). When compiled with iconv support, libxml and libxslt
can handle the full range of encodings provided by iconv; these should
cover most needs.</para>

<para>libxml and libxslt can be used in multi-threaded
applications. In MS-Windows they are linked against
<filename>MSVCRT.DLL</filename> (or one of its descendants, as we saw
<link linkend="windows-build">above</link>), which is a multi-threaded
runtime. In *NIX the pthreads (POSIX threads) library is used.</para>

</sect1>

<sect1><title>The Complete Program</title>

<para>
The complete program listing is given below. The program is also
<ulink url="libxslt_pipes.c">available online</ulink>.
</para>

<para>
<programlisting>
<xi:include href="libxslt_pipes.c" parse="text"
  xmlns:xi="http://www.w3.org/2003/XInclude"/>
</programlisting>
</para>

</sect1>

</article>