1 | <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>libxslt: An Extended Tutorial</title><meta name="generator" content="DocBook XSL Stylesheets V1.66.0"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="article" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="libxslt"></a>libxslt: An Extended Tutorial</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Panos</span> <span class="surname">Louridas</span></h3></div></div><div><p class="copyright">Copyright © 2004 Panagiotis Louridas</p></div><div><div class="legalnotice"><a name="id2839296"></a><p>Permission is hereby granted, free of charge, to
|
---|
2 | any person obtaining a copy of this software and associated
|
---|
3 | documentation files (the "Software"), to deal in the Software
|
---|
4 | without restriction, including without limitation the rights to use,
|
---|
5 | copy, modify, merge, publish, distribute, sublicense, and/or sell
|
---|
6 | copies of the Software, and to permit persons to whom the Software
|
---|
7 | is furnished to do so, subject to the following conditions:
|
---|
8 | </p><p>The above copyright notice and this permission notice shall be
|
---|
9 | included in all copies or substantial portions of the Software.
|
---|
10 | </p><p>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
---|
11 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
---|
12 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
---|
13 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
---|
14 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
---|
15 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
---|
16 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="#id2771767">Introduction</a></span></dt><dt><span class="sect1"><a href="#id2771862">Setting the Scene</a></span></dt><dt><span class="sect1"><a href="#id2799225">Program Start</a></span></dt><dt><span class="sect1"><a href="#id2799358">Arguments Collection</a></span></dt><dt><span class="sect1"><a href="#id2799396">Parsing</a></span></dt><dt><span class="sect1"><a href="#id2771038">File Processing</a></span></dt><dt><span class="sect1"><a href="#id2771153">*NIX Compiling and Linking</a></span></dt><dt><span class="sect1"><a href="#windows-build">MS-Windows Compiling and
|
---|
17 | Linking</a></span></dt><dd><dl><dt><span class="sect2"><a href="#windows-ports-build">Building the Ports in
|
---|
18 | MS-Windows</a></span></dt></dl></dd><dt><span class="sect1"><a href="#id2839739">zlib, iconv and All That</a></span></dt><dt><span class="sect1"><a href="#id2839841">The Complete Program</a></span></dt></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2771767"></a>Introduction</h2></div></div></div><p>The Extensible Stylesheet Language Transformations (XSLT)
|
---|
19 | specification defines an XML template language for transforming XML
|
---|
20 | documents. An XSLT engine reads an XSLT file and an XML document and
|
---|
21 | transforms the document accordingly.</p><p>We want to perform a series of XSLT transformations to a series
|
---|
22 | of documents. An obvious solution is to use the operating system's
|
---|
23 | pipe mechanism and start a series of transformation processes, each
|
---|
24 | one taking as input the output of the previous transformation. It
|
---|
25 | would be interesting, though, and perhaps more efficient if we could
|
---|
26 | do our job within a single process.</p><p>libxslt is a library for doing XSLT transformations. It is built
|
---|
27 | on libxml, which is a library for handling XML documents. libxml and
|
---|
28 | libxslt are used by the GNOME project. Although developed in the
|
---|
29 | *NIX world, both libxml and libxslt have been
|
---|
30 | ported to the MS-Windows platform. In principle an application using
|
---|
31 | libxslt should be easily portable between the two systems. In
|
---|
32 | practice, however, there arise various wrinkles. These do not have
|
---|
33 | anything to do with libxml or libxslt per se, but rather with the
|
---|
34 | different compilation and linking procedures of each system.</p><p>The presented solution is an extension of <a href="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html" target="_top">John
|
---|
35 | Fleck's libxslt tutorial</a>, but the present tutorial tries to be
|
---|
36 | self-contained. It develops a minimal libxslt application
|
---|
37 | (libxslt_pipes) that can perform a series of transformations to a
|
---|
38 | series of files in a pipe-like manner. An invocation might be:</p><p>
|
---|
39 | <b class="userinput"><tt>
|
---|
40 | libxslt_pipes --out results.xml foo.xsl bar.xsl doc1.xml doc2.xml
|
---|
41 | </tt></b>
|
---|
42 | </p><p>The <tt class="filename">foo.xsl</tt> stylesheet will be applied to
|
---|
43 | <tt class="filename">doc1.xml</tt> and the <tt class="filename">bar.xsl</tt>
|
---|
44 | stylesheet will be applied to the resulting document; then the two
|
---|
45 | stylesheets will be applied in the same sequence to
|
---|
46 | <tt class="filename">doc2.xml</tt>. The results are sent to
|
---|
47 | <tt class="filename">results.xml</tt> (if no output is specified they are
|
---|
48 | sent to standard output).</p><p>The application is compiled in both *NIX
|
---|
49 | systems and MS-Windows, where by *NIX systems we
|
---|
50 | mean Linux, BSD, and other members of the
|
---|
51 | family. The gcc suite is used in the *NIX platform
|
---|
52 | and the Microsoft compiler and linker are used in the
|
---|
53 | MS-Windows platform.</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2771862"></a>Setting the Scene</h2></div></div></div><p>
|
---|
54 | We need to include the necessary header files:
|
---|
55 |
|
---|
56 | </p><pre class="programlisting">
|
---|
57 |
|
---|
58 | #include <stdio.h>
|
---|
59 | #include <string.h>
|
---|
60 | #include <stdlib.h>
|
---|
61 |
|
---|
62 | #include <libxslt/transform.h>
|
---|
63 | #include <libxslt/xsltutils.h>
|
---|
64 |
|
---|
65 | </pre><p>
|
---|
66 | </p><p>The first group of include directives pulls in standard C
|
---|
67 | headers. The headers we need to make libxslt work are in the
|
---|
68 | second group. The <tt class="filename">transform.h</tt> header file
|
---|
69 | declares the API that does the bulk of the actual processing. The
|
---|
70 | <tt class="filename">xsltutils.h</tt> header file declares the API for some
|
---|
71 | generic utility functions of the XSLT engine; among other things,
|
---|
72 | saving to a file, which is what we need it for.</p><p>
|
---|
73 | If our input files contain entities through external subsets, we need
|
---|
74 | to tell libxslt to load them. The global variable
|
---|
75 | <tt class="function">xmlLoadExtDtdDefaultValue</tt>, defined in
|
---|
76 | <tt class="filename">libxml/globals.h</tt>, is responsible for that. As the
|
---|
77 | variable is defined outside our program we must specify external
|
---|
78 | linkage:
|
---|
79 | </p><pre class="programlisting">
|
---|
80 | extern int xmlLoadExtDtdDefaultValue;
|
---|
81 | </pre><p>
|
---|
82 | </p><p>
|
---|
83 | The program is called from the command line. We anticipate that the
|
---|
84 | user may not call it the right way, so we define a function for
|
---|
85 | describing its usage:
|
---|
86 | </p><pre class="programlisting">
|
---|
87 | static void usage(const char *name) {
|
---|
88 | printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n",
|
---|
89 | name);
|
---|
90 | printf(" --out file: send output to file\n");
|
---|
91 | printf(" --param name value: pass a (parameter,value) pair\n");
|
---|
92 | }
|
---|
93 | </pre><p>
|
---|
94 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2799225"></a>Program Start</h2></div></div></div><p>We need to define a few variables that are used throughout the
|
---|
95 | program:
|
---|
96 | </p><pre class="programlisting">
|
---|
97 | int main(int argc, char **argv) {
|
---|
98 | int arg_indx;
|
---|
99 | const char *params[16 + 1];
|
---|
100 | int params_indx = 0;
|
---|
101 | int stylesheet_indx = 0;
|
---|
102 | int file_indx = 0;
|
---|
103 | int i, j, k;
|
---|
104 | FILE *output_file = stdout;
|
---|
105 | xsltStylesheetPtr *stylesheets =
|
---|
106 | (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr));
|
---|
107 | xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr));
|
---|
108 | int return_value = 0;
|
---|
109 | </pre><p>
|
---|
110 | </p><p>The <tt class="varname">arg_indx</tt> integer is an index used to
|
---|
111 | iterate over the program arguments. The <tt class="varname">params</tt>
|
---|
112 | string array is used to collect the XSLT parameters. In XSLT,
|
---|
113 | additional information may be passed to the processor via
|
---|
114 | parameters. The user of the program specifies these in key-value pairs
|
---|
115 | in the command line following the <b class="userinput"><tt>--param</tt></b>
|
---|
116 | command line argument. We accept up to 8 such key-value pairs, which
|
---|
117 | we track with the <tt class="varname">params_indx</tt> integer. libxslt
|
---|
118 | expects the parameters array to be null-terminated, so we have to
|
---|
119 | allocate one extra place (16 + 1) for it. The
|
---|
120 | <tt class="varname">file_indx</tt> is an index to iterate over the files to
|
---|
121 | be processed. The <tt class="varname">i</tt>, <tt class="varname">j</tt>,
|
---|
122 | <tt class="varname">k</tt> integers are additional indices for iteration
|
---|
123 | purposes, and <tt class="varname">return_value</tt> is the value the program
|
---|
124 | returns to the operating system. We expect the result of the
|
---|
125 | transformation to be the standard output in most cases, but the user
|
---|
126 | may wish otherwise via the <tt class="option">--out</tt> command line
|
---|
127 | option, so we need to keep track of the situation with the
|
---|
128 | <tt class="varname">output_file</tt> file pointer.</p><p>In libxslt, XSLT stylesheets are internally stored in
|
---|
129 | <span class="structname">xsltStylesheet</span> structures; similarly, in
|
---|
130 | libxml XML documents are stored in <span class="structname">xmlDoc</span>
|
---|
131 | structures. <span class="type">xsltStylesheetPtr</span> and <span class="type">xmlDocPtr</span>
|
---|
132 | are simply typedefs of pointers to them. The user may specify any
|
---|
133 | number of stylesheets that will be applied to the documents one after
|
---|
134 | the other. To save time we parse the stylesheets and the documents as
|
---|
135 | we read them from the command line and keep the parsed representation
|
---|
136 | of them. The parsed results are kept in arrays. These are dynamically
|
---|
137 | allocated and sized to the number of arguments; this wastes some
|
---|
138 | space, but not much (the size of <span class="type">xsltStylesheetPtr</span> and
|
---|
139 | <span class="type">xmlDocPtr</span> is the size of a pointer) and simplifies code
|
---|
140 | later on. The array memory is allocated with
|
---|
141 | <tt class="function">calloc</tt> to ensure contents are initialised to
|
---|
142 | zero.
|
---|
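</p><p>The sizing argument above can be illustrated with a small sketch that does not depend on libxslt. It assumes, as the tutorial's code does, that zero-filled memory from <tt class="function">calloc</tt> reads back as null pointers (true on mainstream platforms, though not strictly guaranteed by the C standard). The helper names <tt class="function">make_slots</tt> and <tt class="function">count_filled</tt> are hypothetical and are not part of the program.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `argc` pointer slots, all initially NULL.
   At most argc - 1 of them can ever be filled (argv[0] is the program
   name), so at least one NULL slot always remains as a terminator. */
void **make_slots(int argc) {
    assert(argc >= 1);
    return (void **) calloc((size_t) argc, sizeof(void *));
}

/* Count the filled slots by walking until the NULL terminator. */
size_t count_filled(void **slots) {
    size_t n = 0;
    assert(slots != NULL);
    while (slots[n] != NULL)
        n++;
    return n;
}
```

<p>Because a terminator is always present, later loops can be written as <tt class="varname">for (i = 0; files[i]; i++)</tt> without carrying a separate element count.</p><p>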
143 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2799358"></a>Arguments Collection</h2></div></div></div><p>If the program gets no arguments at all, we print the usage
|
---|
144 | description, set the program return value to 1 and exit. Instead of
|
---|
145 | returning directly we go (literally) to the end of the program text
|
---|
146 | where some housekeeping takes place.</p><p>
|
---|
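</p><p>The single-exit idiom described above can be sketched in isolation. This is only an illustration under stated assumptions: the function <tt class="function">process_items</tt> and its resources are hypothetical stand-ins for the program's arrays, and the real work is elided.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker: returns 0 on success and 1 on failure.
   Every exit path goes through `finish`, so each resource is
   released exactly once. */
int process_items(int item_count) {
    int return_value = 0;
    int *items = NULL;      /* initialised so that free() is always safe */
    char *scratch = NULL;

    assert(item_count >= 0);

    if (item_count == 0) {
        fprintf(stderr, "nothing to do\n");
        return_value = 1;
        goto finish;
    }

    items = (int *) calloc((size_t) item_count, sizeof(int));
    scratch = (char *) malloc(64);
    if (items == NULL || scratch == NULL) {
        return_value = 1;
        goto finish;
    }

    /* ... real work would go here ... */

finish:
    free(scratch);          /* free(NULL) is a no-op */
    free(items);
    return return_value;
}
```

<p>The design choice is that error paths only set the return value and jump; they never need to know which resources have been acquired so far.</p><p>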
147 | </p><pre class="programlisting">
|
---|
148 |
|
---|
149 | if (argc <= 1) {
|
---|
150 | usage(argv[0]);
|
---|
151 | return_value = 1;
|
---|
152 | goto finish;
|
---|
153 | }
|
---|
154 |
|
---|
155 | /* Collect arguments */
|
---|
156 | for (arg_indx = 1; arg_indx < argc; arg_indx++) {
|
---|
157 | if (argv[arg_indx][0] != '-')
|
---|
158 | break;
|
---|
159 | if ((!strcmp(argv[arg_indx], "-param"))
|
---|
160 | || (!strcmp(argv[arg_indx], "--param"))) {
|
---|
161 | if (params_indx >= 16 || arg_indx + 2 >= argc) {
|
---|
162 | fprintf(stderr, "too many params or missing value\n");
|
---|
163 | return_value = 1;
|
---|
164 | goto finish;
|
---|
165 | }
|
---|
166 | arg_indx++;
|
---|
167 | params[params_indx++] = argv[arg_indx++];
|
---|
168 | params[params_indx++] = argv[arg_indx];
|
---|
169 | } else if ((!strcmp(argv[arg_indx], "-o"))
|
---|
170 | || (!strcmp(argv[arg_indx], "--out"))) {
|
---|
171 | arg_indx++;
|
---|
172 | output_file = fopen(argv[arg_indx], "w");
|
---|
173 | } else {
|
---|
174 | fprintf(stderr, "Unknown option %s\n", argv[arg_indx]);
|
---|
175 | usage(argv[0]);
|
---|
176 | return_value = 1;
|
---|
177 | goto finish;
|
---|
178 | }
|
---|
179 | }
|
---|
180 | params[params_indx] = 0;
|
---|
181 |
|
---|
182 | </pre><p>
|
---|
183 | </p><p>If the user passes arguments we have to collect them. This is a
|
---|
184 | matter of iterating over the program argument list while we encounter
|
---|
185 | arguments starting with a dash. The XSLT parameters are put into the
|
---|
186 | <tt class="varname">params</tt> array and the <tt class="varname">output_file</tt>
|
---|
187 | is set to the user request, if any. After processing all the parameter
|
---|
188 | key-value pairs we set the element after the last pair in the <tt class="varname">params</tt>
|
---|
189 | array to null.
|
---|
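</p><p>As a complement to the loop above, here is a minimal sketch of parameter collection that checks both limits before it writes anything: that a name and a value actually follow <tt class="option">--param</tt>, and that the array still has room for the pair plus the terminator. The helper <tt class="function">collect_params</tt> and the <tt class="varname">MAX_PAIRS</tt> constant are hypothetical; only the <tt class="option">--param</tt> spelling is handled, to keep the sketch short.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 8

/* Hypothetical helper: scans leading "--param name value" triples in argv,
   starting at index 1, and stores them in params, which must have room
   for 2 * MAX_PAIRS + 1 pointers. Returns the index of the first
   unconsumed argument, or -1 on error. */
int collect_params(int argc, char **argv, const char **params) {
    int arg_indx = 1;
    int params_indx = 0;

    assert(argv != NULL && params != NULL);

    while (arg_indx < argc && strcmp(argv[arg_indx], "--param") == 0) {
        if (arg_indx + 2 >= argc) {
            fprintf(stderr, "--param needs a name and a value\n");
            return -1;
        }
        if (params_indx >= 2 * MAX_PAIRS) {
            fprintf(stderr, "too many params\n");
            return -1;
        }
        params[params_indx++] = argv[arg_indx + 1];
        params[params_indx++] = argv[arg_indx + 2];
        arg_indx += 3;
    }
    params[params_indx] = NULL;   /* terminator expected by the consumer */
    return arg_indx;
}
```

<p>With the arguments <b class="userinput"><tt>--param lang "'en'" foo.xsl doc.xml</tt></b> the helper stores one pair and returns the index of <tt class="filename">foo.xsl</tt>. The inner quotes matter because libxslt evaluates parameter values as XPath expressions, so a string value has to be quoted.</p><p>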
190 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2799396"></a>Parsing</h2></div></div></div><p>The rest of the argument list is taken to be stylesheets and
|
---|
191 | files to be transformed. Stylesheets are identified by their suffix,
|
---|
192 | which is expected to be xsl (case sensitive). All other files are
|
---|
193 | assumed to be XML documents, regardless of suffix.</p><p>
|
---|
194 | </p><pre class="programlisting">
|
---|
195 |
|
---|
196 | /* Collect and parse stylesheets and files to be transformed */
|
---|
197 | for (; arg_indx < argc; arg_indx++) {
|
---|
198 | char *argument =
|
---|
199 | (char *) malloc(sizeof(char) * (strlen(argv[arg_indx]) + 1));
|
---|
200 | strcpy(argument, argv[arg_indx]);
|
---|
201 | if (strtok(argument, ".")) {
|
---|
202 | char *suffix = strtok(0, ".");
|
---|
203 | if (suffix && !strcmp(suffix, "xsl")) {
|
---|
204 | stylesheets[stylesheet_indx++] =
|
---|
205 | xsltParseStylesheetFile((const xmlChar *)argv[arg_indx]);
|
---|
206 | } else {
|
---|
207 | files[file_indx++] = xmlParseFile(argv[arg_indx]);
|
---|
208 | }
|
---|
209 | } else {
|
---|
210 | files[file_indx++] = xmlParseFile(argv[arg_indx]);
|
---|
211 | }
|
---|
212 | free(argument);
|
---|
213 | }
|
---|
214 |
|
---|
215 | </pre><p>
|
---|
216 | </p><p>Stylesheets are parsed using the
|
---|
217 | <tt class="function">xsltParseStylesheetFile</tt>
|
---|
218 | function. <tt class="function">xsltParseStylesheetFile</tt> takes as
|
---|
219 | argument a pointer to an <span class="type">xmlChar</span>, a typedef of an
|
---|
220 | unsigned char; in effect, the filename of the stylesheet. The
|
---|
221 | resulting <span class="type">xsltStylesheetPtr</span> is placed in the
|
---|
222 | <tt class="varname">stylesheets</tt> array. In the same vein, XML files are
|
---|
223 | parsed using the <tt class="function">xmlParseFile</tt> function that takes
|
---|
224 | as argument the file's name; the resulting <span class="type">xmlDocPtr</span> is
|
---|
225 | placed in the <tt class="varname">files</tt> array.
|
---|
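</p><p>The <tt class="function">strtok</tt>-based test in the listing above looks at the token that follows the first dot, so a name such as <tt class="filename">my.style.xsl</tt> would be treated as an XML document. A possible alternative, shown here only as a sketch, inspects the text after the last dot instead. The helper name <tt class="function">has_xsl_suffix</tt> is hypothetical.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` ends in ".xsl" (case sensitive),
   0 otherwise. Only the text after the last dot is considered. */
int has_xsl_suffix(const char *name) {
    const char *dot;
    assert(name != NULL);
    dot = strrchr(name, '.');
    return dot != NULL && strcmp(dot, ".xsl") == 0;
}
```

<p>This variant needs no temporary copy of the argument, since <tt class="function">strrchr</tt> does not modify the string it searches.</p><p>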
226 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2771038"></a>File Processing</h2></div></div></div><p>All stylesheets are applied to each file one after the
|
---|
227 | other. Stylesheets are applied with the
|
---|
228 | <tt class="function">xsltApplyStylesheet</tt> function that takes as
|
---|
229 | argument the stylesheet to be applied, the file to be transformed and
|
---|
230 | any parameters we have collected. The in-memory representation of an
|
---|
231 | XML document takes space, which we free using the
|
---|
232 | <tt class="function">xmlFreeDoc</tt> function. The file is then saved to the
|
---|
233 | specified output.</p><p>
|
---|
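</p><p>The ownership rule behind this pipe-like chaining (each intermediate document is freed as soon as the next result exists) can be sketched without libxslt. The sketch below uses heap-allocated strings as stand-ins for documents; <tt class="type">stage_fn</tt>, <tt class="function">to_upper</tt>, <tt class="function">add_bang</tt> and <tt class="function">apply_chain</tt> are hypothetical names chosen for illustration.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A stage takes a document and returns a newly allocated result
   (or NULL on failure); it never frees its input. */
typedef char *(*stage_fn)(const char *doc);

/* Hypothetical stage: upper-case copy of the input. */
char *to_upper(const char *doc) {
    size_t i, len = strlen(doc);
    char *res = (char *) malloc(len + 1);
    if (res == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        res[i] = (char) toupper((unsigned char) doc[i]);
    res[len] = '\0';
    return res;
}

/* Hypothetical stage: copy of the input with "!" appended. */
char *add_bang(const char *doc) {
    size_t len = strlen(doc);
    char *res = (char *) malloc(len + 2);
    if (res == NULL)
        return NULL;
    memcpy(res, doc, len);
    res[len] = '!';
    res[len + 1] = '\0';
    return res;
}

/* Runs `doc` through the NULL-terminated `stages` array. Takes ownership
   of `doc`: each input is freed as soon as the next result exists. The
   caller frees the returned document. Returns NULL if a stage fails. */
char *apply_chain(char *doc, const stage_fn *stages) {
    size_t j;
    assert(stages != NULL);
    for (j = 0; doc != NULL && stages[j] != NULL; j++) {
        char *res = stages[j](doc);
        free(doc);
        doc = res;
    }
    return doc;
}
```

<p>The <tt class="varname">doc != NULL</tt> guard stops the chain when a stage fails. The same precaution is worth considering in the real loop, since <tt class="function">xsltApplyStylesheet</tt> returns a null pointer when a transformation fails.</p><p>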
234 | </p><pre class="programlisting">
|
---|
235 |
|
---|
236 | /* Process files */
|
---|
237 | for (i = 0; files[i]; i++) {
|
---|
238 | xmlDocPtr doc = files[i];
|
---|
239 | xmlDocPtr res = doc;
|
---|
240 | for (j = 0; stylesheets[j]; j++) {
|
---|
241 | res = xsltApplyStylesheet(stylesheets[j], doc, params);
|
---|
242 | xmlFreeDoc(doc);
|
---|
243 | doc = res;
|
---|
244 | }
|
---|
245 |
|
---|
246 | if (stylesheets[0]) {
|
---|
247 | xsltSaveResultToFile(output_file, res, stylesheets[j-1]);
|
---|
248 | } else {
|
---|
249 | xmlDocDump(output_file, res);
|
---|
250 | }
|
---|
251 | xmlFreeDoc(res);
|
---|
252 | }
|
---|
253 |
|
---|
254 | fclose(output_file);
|
---|
255 |
|
---|
256 | for (k = 0; stylesheets[k]; k++) {
|
---|
257 | xsltFreeStylesheet(stylesheets[k]);
|
---|
258 | }
|
---|
259 |
|
---|
260 | xsltCleanupGlobals();
|
---|
261 | xmlCleanupParser();
|
---|
262 |
|
---|
263 | finish:
|
---|
264 | free(stylesheets);
|
---|
265 | free(files);
|
---|
266 | return(return_value);
|
---|
267 |
|
---|
268 | </pre><p>
|
---|
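</p><p>The listing above calls <tt class="function">fclose</tt> on <tt class="varname">output_file</tt> unconditionally, and the argument loop shown earlier does not check the result of <tt class="function">fopen</tt>. A more defensive treatment of the output stream might look like the following sketch; the helper names <tt class="function">open_output</tt> and <tt class="function">close_output</tt> are hypothetical.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns stdout when no path was requested,
   the opened file otherwise, or NULL if the file cannot be opened. */
FILE *open_output(const char *path) {
    FILE *fp;
    if (path == NULL)
        return stdout;
    fp = fopen(path, "w");
    if (fp == NULL)
        perror(path);
    return fp;
}

/* Closes the output only if we opened it ourselves; stdout is flushed
   and left open. Returns 0 on success, EOF on failure. */
int close_output(FILE *fp) {
    assert(fp != NULL);
    if (fp == stdout)
        return fflush(stdout);
    return fclose(fp);
}
```

<p>A caller would treat a null result from <tt class="function">open_output</tt> like any other argument error: set the return value to 1 and jump to the cleanup code.</p><p>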
269 | </p><p>To output an XML document we have in memory we use the
|
---|
270 | <tt class="function">xsltSaveResultToFile</tt> function, where we specify
|
---|
271 | the destination, the document and the stylesheet that has been applied
|
---|
272 | to it. The stylesheet is required so that output-related information
|
---|
273 | contained in the stylesheet, such as the encoding to be used, is used
|
---|
274 | in output. If no transformation has taken place, which will happen
|
---|
275 | when the user specifies no stylesheets at all in the command line, we
|
---|
276 | use the <tt class="function">xmlDocDump</tt> libxml function that saves the
|
---|
277 | source document to the file without further ado.</p><p>As parsed stylesheets take up space in memory, we take care to
|
---|
278 | free that memory after use with a call to
|
---|
279 | <tt class="function">xsltFreeStylesheet</tt>. When all work is done, we
|
---|
280 | clean up all global variables used by the XSLT library using
|
---|
281 | <tt class="function">xsltCleanupGlobals</tt>. Likewise, all global memory
|
---|
282 | allocated for the XML parser is reclaimed by a call to
|
---|
283 | <tt class="function">xmlCleanupParser</tt>. Before returning we deallocate
|
---|
284 | the memory allocated for holding the pointers to the XML documents
|
---|
285 | and stylesheets.</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2771153"></a>*NIX Compiling and Linking</h2></div></div></div><p>Compiling and linking in a *NIX environment
|
---|
286 | is easy, as the required libraries are almost certain to be already in
|
---|
287 | place (remember that libxml and libxslt are used by the GNOME project,
|
---|
288 | so they are present in most installations). The program can be
|
---|
289 | dynamically linked so that its footprint is minimized, or statically
|
---|
290 | linked, so that it stands by itself, carrying all required code.</p><p>For dynamic linking the following one liner will do:</p><p>
|
---|
291 | <b class="userinput"><tt>gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
|
---|
292 | -L/usr/lib -lxslt -lxml2</tt></b>
|
---|
293 | </p><p>We assume that the necessary header files are in <tt class="filename">/usr/include/libxml2</tt> and that the
|
---|
294 | required libraries (<tt class="filename">libxslt.so</tt>,
|
---|
295 | <tt class="filename">libxml2.so</tt>) are in <tt class="filename">/usr/lib</tt>.</p><p>In general, a program may need to link to additional libraries,
|
---|
296 | depending on the processing it actually performs. A good way to start
|
---|
297 | is to use the <span><b class="command">xslt-config</b></span> script. The
|
---|
298 | <tt class="option">--help</tt> option displays usage
|
---|
299 | information. Running</p><p>
|
---|
300 | <b class="userinput"><tt>
|
---|
301 | xslt-config --cflags
|
---|
302 | </tt></b>
|
---|
303 | </p><p>we get compile flags, while running</p><p>
|
---|
304 | <b class="userinput"><tt>
|
---|
305 | xslt-config --libs
|
---|
306 | </tt></b>
|
---|
307 | </p><p>we get the library settings for the linker.</p><p>For static linking we must list more libraries than we did for
|
---|
308 | dynamic linking, as the libraries on which the libxml and libxslt
|
---|
309 | libraries depend are also needed. Using <span><b class="command">xslt-config</b></span>
|
---|
310 | on a particular installation we create the following one-liner:</p><p>
|
---|
311 | <b class="userinput"><tt>
|
---|
312 | gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c
|
---|
313 | -static -L/usr/lib -lxslt -lxml2 -lz -lpthread -lm
|
---|
314 | </tt></b>
|
---|
315 | </p><p>If we get warnings to the effect that some function in
|
---|
316 | statically linked applications requires at runtime the shared
|
---|
317 | libraries used from the glibc version used for linking, that means
|
---|
318 | that the binary is not completely static. Although we statically
|
---|
319 | linked against the GNU C runtime library glibc, glibc uses external
|
---|
320 | libraries to perform some of its functions. Libraries of the same version
|
---|
321 | must be present on the system where we want the application to run. One way
|
---|
322 | to avoid this is to use an alternative C runtime, for example <a href="http://www.uclibc.org" target="_top">uClibc</a>, which requires obtaining
|
---|
323 | and building a uClibc toolchain first (if the reason for trying to get
|
---|
324 | a statically linked version of the program is to embed it somewhere,
|
---|
325 | using uClibc might be a good idea anyway).
|
---|
326 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="windows-build"></a>MS-Windows Compiling and
|
---|
327 | Linking</h2></div></div></div><p>Compiling and linking in MS-Windows requires
|
---|
328 | some attention. First, the MS-Windows ports must be
|
---|
329 | downloaded and installed in the programming workstation. The ports are
|
---|
330 | available in <a href="http://www.zlatkovic.com/libxml.en.html" target="_top">Igor
|
---|
331 | Zlatković's site</a>. We need the ports for iconv, zlib, libxml,
|
---|
332 | and libxslt. In contrast to *NIX environments, we
|
---|
333 | cannot assume that the libraries needed will be present in other
|
---|
334 | computers where the program will be used. One solution is to
|
---|
335 | distribute the program along with the necessary dynamic
|
---|
336 | libraries. Another solution is to statically link the program so that
|
---|
337 | only a single executable file will have to be distributed.</p><p>We assume that we have decompressed the downloaded ports and
|
---|
338 | have placed the required contents of their <tt class="filename">include</tt> directories in an <tt class="filename">include</tt> directory in our file system. The
|
---|
339 | required contents include everything apart from the <tt class="filename">libexslt</tt> directory of the libxslt port,
|
---|
340 | as we are not using EXSLT (an initiative to provide extensions to
|
---|
341 | XSLT) in this project. In order to compile the program we have to make
|
---|
342 | sure that all necessary header files are included. When using the
|
---|
343 | Microsoft compiler this translates to adding the required
|
---|
344 | <tt class="option">/I</tt> switches in the command line. If using a Visual
|
---|
345 | Studio product the same effect is attained by specifying additional
|
---|
346 | include directories in the compilation options. In the end, if the
|
---|
347 | headers have been copied in <tt class="filename">C:\include</tt> the command line must contain
|
---|
348 | <tt class="option">/I"C:\include" /I"C:\include\libxslt"
|
---|
349 | /I"C:\include\libxml"</tt>.</p><p>This being a C program, it needs to be compiled against an
|
---|
350 | implementation of the C libraries. Microsoft provides various
|
---|
351 | implementations. The ports, however, have been compiled against the
|
---|
352 | <tt class="filename">msvcrt.dll</tt> implementation, so it is wise to use
|
---|
353 | the same runtime in our project, unless we wish to run into
|
---|
354 | unexpected runtime crashes. The <tt class="filename">msvcrt.dll</tt> is a
|
---|
355 | multi-threaded implementation and is specified by giving
|
---|
356 | <tt class="option">/MD</tt> as a compiler option. Unfortunately, the
|
---|
357 | correspondence between the <tt class="option">/MD</tt> switch and
|
---|
358 | <tt class="filename">msvcrt.dll</tt> breaks after version 6 of the
|
---|
359 | Microsoft compiler. In version 7 and later (i.e., Visual Studio .NET),
|
---|
360 | <tt class="option">/MD</tt> links against a different DLL; in version 7.1
|
---|
361 | this is <tt class="filename">msvcr71.dll</tt>. The end result of this bit
|
---|
362 | of esoterica is that if you try to dynamically link your application
|
---|
363 | with a compiler whose version is greater than 6, your program is
|
---|
364 | likely to crash unexpectedly. Alternatively, you may wish to compile
|
---|
365 | iconv, zlib, libxml and libxslt yourself, using the new runtime
|
---|
366 | library. This is not a tall order, and some details are given
|
---|
367 | <a href="#windows-ports-build" title="Building the Ports in
|
---|
368 | MS-Windows">below</a>.</p><p>There are three kinds of libraries in MS-Windows. Dynamically
|
---|
369 | Linked Libraries (DLLs), like <tt class="filename">msvcrt.dll</tt> we met
|
---|
370 | above, are used for dynamic linking; an application links to them at
|
---|
371 | runtime, so the application does not include the code contained in
|
---|
372 | them. Static libraries are used for static linking; an application
|
---|
373 | adds the libraries' code to its own code at link time. Import
|
---|
374 | libraries are used when building an application that uses DLLs. For
|
---|
375 | the application to be built, the linker must somehow find the
|
---|
376 | definitions of the functions that will be provided in runtime by the
|
---|
377 | DLLs, otherwise it will complain about unresolved references. Import
|
---|
378 | libraries contain function stubs that, for each DLL function we want
|
---|
379 | to call, know where to look for it in the DLL. In essence, in order to
|
---|
380 | use a DLL we must link against its corresponding import library. DLLs
|
---|
381 | have a <tt class="filename">.dll</tt> suffix; static and import libraries
|
---|
382 | both have a <tt class="filename">.lib</tt> suffix. In the MS-Windows ports
|
---|
383 | of libxml and libxslt static libraries are distinguished by their name
|
---|
384 | ending in <tt class="filename">_a.lib</tt>, while in the zlib port the
|
---|
385 | import library is <tt class="filename">zdll.lib</tt> and the static library
|
---|
386 | is <tt class="filename">zlib.lib</tt>. In what follows we assume we have a
|
---|
387 | <tt class="filename">lib</tt> directory in our filesystem
|
---|
388 | where we place the libraries we need for linking.</p><p>If we want to link dynamically we must make sure the <tt class="filename">lib</tt> directory contains
|
---|
389 | <tt class="filename">iconv.lib</tt>, <tt class="filename">libxslt.lib</tt>,
|
---|
390 | <tt class="filename">libxml2.lib</tt>, and
|
---|
391 | <tt class="filename">zdll.lib</tt>. When using the Microsoft linker this
|
---|
392 | translates to adding the required <tt class="option">/LIBPATH</tt>
|
---|
393 | switch and the necessary libraries in the command line. In Visual
|
---|
394 | Studio we must specify an additional library directory for <tt class="filename">lib</tt> and put the necessary libraries in
|
---|
395 | the additional dependencies. In the end, the command line must include
|
---|
396 | <tt class="option">/LIBPATH:"C:\lib" "lib\iconv.lib" "lib\libxslt.lib"
|
---|
397 | "lib\libxml2.lib" "lib\zdll.lib"</tt>, provided the libraries'
|
---|
398 | directory is <tt class="filename">C:\lib</tt>. In order
|
---|
399 | for the resulting executable to run, the ports DLLs must be present;
|
---|
400 | one way is to place all DLLs contained in the ports in the home
|
---|
401 | directory of our application, and make sure they are distributed
|
---|
402 | together.</p><p>If we want to link statically we must make sure the <tt class="filename">lib</tt> directory contains
|
---|
403 | <tt class="filename">iconv_a.lib</tt>, <tt class="filename">libxslt_a.lib</tt>,
|
---|
404 | <tt class="filename">libxml2_a.lib</tt>, and
|
---|
405 | <tt class="filename">zlib.lib</tt>. Adding <tt class="filename">lib</tt> as a library directory and putting
|
---|
406 | the necessary libraries in the additional dependencies, we get a
|
---|
407 | command line that should include <tt class="option">/LIBPATH:"C:\lib"
|
---|
408 | "lib\iconv_a.lib" "lib\libxslt_a.lib" "lib\libxml2_a.lib"
|
---|
409 | "lib\zlib.lib"</tt>. The resulting executable is much bigger
|
---|
410 | than if we linked dynamically; it is, however, self-contained and can
|
---|
411 | be distributed more easily, in theory at least. In practice, however,
|
---|
412 | the executable is not completely static. We saw that the ports are
|
---|
413 | compiled against <tt class="filename">msvcrt.dll</tt>, so the program does
|
---|
414 | require that DLL at runtime. Moreover, when using a version of
|
---|
415 | Microsoft developer tools with a version number greater than 6, we are
|
---|
416 | no longer using <tt class="filename">msvcrt.dll</tt>, but another runtime
|
---|
417 | like <tt class="filename">msvcr71.dll</tt>, so we then need that DLL as well. In
|
---|
418 | contrast to <tt class="filename">msvcrt.dll</tt> it may not be present on
|
---|
419 | the target computer, so we may have to copy it along.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="windows-ports-build"></a>Building the Ports in
|
---|
420 | MS-Windows</h3></div></div></div><p>The source code of the ports is readily available on the web;
|
---|
421 | one only has to check the ports' sites. Each port can be built without
|
---|
422 | problems in an MS-Windows environment using Microsoft development
|
---|
423 | tools. The necessary command line tools (compiler, linker,
|
---|
424 | <span><b class="command">nmake</b></span>) must be available. This means running a
|
---|
425 | batch file called <span><b class="command">vcvars32.bat</b></span> that comes with
|
---|
426 | Visual Studio (its exact location in the directory tree may vary
|
---|
427 | depending on the version of Visual Studio, but a file search will find
|
---|
428 | it anyway). Makefiles for the Microsoft tools are found in all
|
---|
429 | ports. They are distinguished by their suffix, e.g.,
|
---|
430 | <tt class="filename">Makefile.msvc</tt> or
|
---|
431 | <tt class="filename">Makefile.msc</tt>. To build zlib it suffices to run
|
---|
432 | <span><b class="command">nmake</b></span> against <tt class="filename">Makefile.msc</tt>
|
---|
433 | (i.e., with the <tt class="option">/F</tt> option); similarly, to build
|
---|
434 | <tt class="filename">iconv</tt> it suffices to run <span><b class="command">nmake</b></span>
|
---|
435 | against <tt class="filename">Makefile.msvc</tt>. Building libxml and
|
---|
436 | libxslt requires an extra configuration step; we must run the
|
---|
437 | <tt class="filename">configure.js</tt> configuration script with the
|
---|
438 | <span><b class="command">cscript</b></span> command. <tt class="filename">configure.js</tt>
|
---|
439 | is found in the <tt class="filename">win32</tt> directory
|
---|
440 | in the distributions. It is written in JScript, Microsoft's
|
---|
441 | implementation of the ECMA 262 language specification (ECMAScript
|
---|
442 | Edition 3), a JavaScript offspring. The configuration script takes a
|
---|
443 | number of parameters detailing our environment and needs;
|
---|
444 | <b class="userinput"><tt>cscript configure.js help</tt></b> documents
|
---|
445 | them.</p><p>It is wise to read all documentation files in the source
|
---|
446 | distributions before starting; moreover, pay attention to the
|
---|
447 | dependencies between the ports. If we configure libxml and libxslt to
|
---|
448 | use iconv and zlib we must build these two first and make sure their
|
---|
449 | headers and libraries can be found by the compiler and the
|
---|
450 | linker when building libxml and libxslt.</p></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2839739"></a>zlib, iconv and All That</h2></div></div></div><p>We saw that libxml and libxslt depend on various other
|
---|
451 | libraries, for instance zlib, iconv, and so forth. Taking a look at
|
---|
452 | them gives us clues about the capabilities of libxml and libxslt.</p><p><a href="http://www.zlib.org" target="_top">zlib</a> is a free general
|
---|
453 | purpose lossless data compression library. It is a venerable
|
---|
454 | workhorse; more than <a href="http://www.gzip.org/zlib/apps.html" target="_top">500 applications</a>
|
---|
455 | (both commercial and open source) seem to use the library. libxml uses
|
---|
456 | zlib so that it can read from or write to compressed files
|
---|
457 | directly. The <tt class="function">xmlParseFile</tt> function can
|
---|
458 | transparently parse a compressed document to produce an
|
---|
459 | <span class="structname">xmlDoc</span>. If we want to create a compressed
|
---|
460 | document with libxml we can use an
|
---|
461 | <span class="structname">xmlTextWriterPtr</span> (obtained through
|
---|
462 | <tt class="function">xmlNewTextWriterDoc</tt>), or another related
|
---|
463 | structure from <tt class="filename">libxml/xmlwriter.h</tt>, with
|
---|
464 | compression enabled.</p><p>XML allows documents to use a variety of different character
|
---|
465 | encodings. <a href="http://www.gnu.org/software/libiconv" target="_top">iconv</a> is a free
|
---|
466 | library for converting between different character encodings. libxml
|
---|
467 | provides a set of default converters for some encodings: UTF-8, UTF-16
|
---|
468 | (little endian and big endian), ISO-8859-1, ASCII, and HTML (a
|
---|
469 | specific handler for the conversion of UTF-8 to ASCII with HTML
|
---|
470 | predefined entities like &copy; for the copyright sign). However,
|
---|
471 | when compiled with iconv support, libxml and libxslt can handle the
|
---|
472 | full range of encodings provided by iconv; these should cover most
|
---|
473 | needs.</p><p>libxml and libxslt can be used in multi-threaded
|
---|
474 | applications. In MS-Windows they are linked against
|
---|
475 | <tt class="filename">MSVCRT.DLL</tt> (or one of its descendants, as we saw
|
---|
476 | <a href="#windows-build" title="MS-Windows Compiling and
|
---|
477 | Linking">above</a>). In *NIX the pthreads
|
---|
478 | (POSIX threads) library is used.</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2839841"></a>The Complete Program</h2></div></div></div><p>
|
---|
479 | The complete program listing is given below. The program is also
|
---|
480 | <a href="libxslt_pipes.c" target="_top">available online</a>.
|
---|
481 | </p><p>
|
---|
482 | </p><pre class="programlisting">
/*
 * libxslt_pipes.c: a program for performing a series of XSLT
 * transformations
 *
 * Written by Panos Louridas, based on libxslt_tutorial.c by John Fleck.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#include <libxslt/transform.h>
#include <libxslt/xsltutils.h>

extern int xmlLoadExtDtdDefaultValue;

/* Print a short usage message */
static void usage(const char *name) {
    printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n",
           name);
    printf(" --out file: send output to file\n");
    printf(" --param name value: pass a (parameter,value) pair\n");
}

int main(int argc, char **argv) {
    int arg_indx;
    /* 16 entries (names and values alternating) plus a NULL terminator */
    const char *params[16 + 1];
    int params_indx = 0;
    int stylesheet_indx = 0;
    int file_indx = 0;
    int i, j, k;
    FILE *output_file = stdout;
    /* calloc zero-fills both arrays, so they are NULL-terminated */
    xsltStylesheetPtr *stylesheets =
        (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr));
    xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr));
    xmlDocPtr doc, res;
    int return_value = 0;

    if (stylesheets == NULL || files == NULL) {
        fprintf(stderr, "out of memory\n");
        return_value = 1;
        goto finish;
    }

    if (argc <= 1) {
        usage(argv[0]);
        return_value = 1;
        goto finish;
    }

    /* Collect arguments */
    for (arg_indx = 1; arg_indx < argc; arg_indx++) {
        if (argv[arg_indx][0] != '-')
            break;
        if ((!strcmp(argv[arg_indx], "-param"))
            || (!strcmp(argv[arg_indx], "--param"))) {
            /* The option must be followed by a name and a value */
            if (arg_indx + 2 >= argc) {
                fprintf(stderr, "%s needs a name and a value\n",
                        argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            /* Check for room before writing; the last slot is kept
               for the terminator */
            if (params_indx + 2 > 16) {
                fprintf(stderr, "too many params\n");
                return_value = 1;
                goto finish;
            }
            arg_indx++;
            params[params_indx++] = argv[arg_indx++];
            params[params_indx++] = argv[arg_indx];
        } else if ((!strcmp(argv[arg_indx], "-o"))
                   || (!strcmp(argv[arg_indx], "--out"))) {
            if (arg_indx + 1 >= argc) {
                fprintf(stderr, "%s needs a file name\n", argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            arg_indx++;
            /* A repeated option replaces the file opened earlier */
            if (output_file != stdout)
                fclose(output_file);
            output_file = fopen(argv[arg_indx], "w");
            if (output_file == NULL) {
                perror(argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
        } else {
            fprintf(stderr, "Unknown option %s\n", argv[arg_indx]);
            usage(argv[0]);
            return_value = 1;
            goto finish;
        }
    }
    params[params_indx] = NULL;

    /* Entity substitution and external DTD loading are switched on
       first, so that they apply to the documents parsed below */
    xmlSubstituteEntitiesDefault(1);
    xmlLoadExtDtdDefaultValue = 1;

    /* Collect and parse stylesheets and files to be transformed */
    for (; arg_indx < argc; arg_indx++) {
        /* An argument whose last suffix is ".xsl" is a stylesheet;
           anything else is a document to be transformed */
        const char *suffix = strrchr(argv[arg_indx], '.');
        if (suffix && !strcmp(suffix, ".xsl")) {
            xsltStylesheetPtr sheet =
                xsltParseStylesheetFile((const xmlChar *) argv[arg_indx]);
            if (sheet == NULL) {
                fprintf(stderr, "cannot parse stylesheet %s\n",
                        argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            stylesheets[stylesheet_indx++] = sheet;
        } else {
            xmlDocPtr parsed = xmlParseFile(argv[arg_indx]);
            if (parsed == NULL) {
                fprintf(stderr, "cannot parse file %s\n", argv[arg_indx]);
                return_value = 1;
                goto finish;
            }
            files[file_indx++] = parsed;
        }
    }

    /* Process files */
    for (i = 0; files[i]; i++) {
        doc = files[i];
        files[i] = NULL;    /* doc now owns the document */
        res = doc;
        for (j = 0; stylesheets[j]; j++) {
            res = xsltApplyStylesheet(stylesheets[j], doc, params);
            xmlFreeDoc(doc);
            doc = res;
            if (res == NULL) {
                /* A failed step ends the pipeline for this file */
                fprintf(stderr, "stylesheet %d failed on file %d\n",
                        j + 1, i + 1);
                return_value = 1;
                break;
            }
        }
        if (res == NULL)
            continue;

        if (stylesheets[0]) {
            xsltSaveResultToFile(output_file, res, stylesheets[j-1]);
        } else {
            xmlDocDump(output_file, res);
        }
        xmlFreeDoc(res);
    }

finish:
    /* Every exit path ends up here, so we release only what was acquired */
    if (output_file != NULL && output_file != stdout)
        fclose(output_file);

    if (stylesheets != NULL) {
        for (k = 0; stylesheets[k]; k++) {
            xsltFreeStylesheet(stylesheets[k]);
        }
    }
    if (files != NULL) {
        /* Only documents that were never processed are still here */
        for (k = 0; k < argc; k++) {
            if (files[k])
                xmlFreeDoc(files[k]);
        }
    }

    xsltCleanupGlobals();
    xmlCleanupParser();

    free(stylesheets);
    free(files);
    return(return_value);
}
623 | </pre><p>
|
---|
624 | </p></div></div></body></html>
|
---|