source: vbox/trunk/src/libs/libxml2-2.6.30/doc/tutorial/ar01s09.html@ 11632

Last change on this file since 11632 was 6076, checked in by vboxsync, 17 years ago

Merged dmik/s2 branch (r25959:26751) to the trunk.

  • Property svn:eol-style set to native
  • Property svn:keywords set to Date Revision Author Id
File size: 7.4 KB
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Encoding Conversion</title><meta name="generator" content="DocBook XSL Stylesheets V1.61.2"><link rel="home" href="index.html" title="Libxml Tutorial"><link rel="up" href="index.html" title="Libxml Tutorial"><link rel="previous" href="ar01s08.html" title="Retrieving Attributes"><link rel="next" href="apa.html" title="A. Compilation"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Encoding Conversion</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ar01s08.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="apa.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="xmltutorialconvert"></a>Encoding Conversion</h2></div></div><div></div></div><p><a class="indexterm" name="id2587348"></a>
Data encoding compatibility problems are among the most common
 difficulties encountered by programmers new to <span class="acronym">XML</span> in
 general and <span class="application">libxml</span> in particular. Thinking
 through the design of your application in light of this issue will help
 avoid difficulties later. Internally, <span class="application">libxml</span>
 stores and manipulates data in the UTF-8 format. Data used by your program
 in other formats, such as the commonly used ISO-8859-1 encoding, must be
 converted to UTF-8 before it is passed to <span class="application">libxml</span>
 functions. If you want your program's output in an encoding other than
 UTF-8, you must convert that as well.</p><p><span class="application">Libxml</span> uses
 <span class="application">iconv</span>, if it is available, to convert
 data. Without <span class="application">iconv</span>, only UTF-8, UTF-16 and
 ISO-8859-1 can be used as external formats. With
 <span class="application">iconv</span>, any format can be used provided
 <span class="application">iconv</span> is able to convert it to and from
 UTF-8. Currently <span class="application">iconv</span> supports about 150
 different character encodings and can convert between any pair of them. The
 exact set of supported encodings varies between implementations, but nearly
 every <span class="application">iconv</span> implementation supports the
 encodings in common use.</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td colspan="2" align="left" valign="top"><p>A common mistake is to use different formats for the internal data
 in different parts of one's code. The most common case is an application
 that assumes ISO-8859-1 to be the internal data format, combined with
 <span class="application">libxml</span>, which assumes UTF-8 to be the
 internal data format. The result is an application that treats internal
 data differently, depending on which code section is executing. One part
 of the code or the other will then, naturally, misinterpret the data.
 </p></td></tr></table></div><p>This example constructs a simple document, then adds content provided
 at the command line to the document's root element and outputs the results
 to <tt class="filename">stdout</tt> in the proper encoding. For this example, we
 use ISO-8859-1 encoding. The string supplied on the command
 line is converted from ISO-8859-1 to UTF-8. Full code: <a href="aph.html" title="H. Code for Encoding Conversion Example">Appendix H, <i>Code for Encoding Conversion Example</i></a></p><p>The conversion, encapsulated in the example code in the
 <tt class="function">convert</tt> function, uses
 <span class="application">libxml's</span>
 <tt class="function">xmlFindCharEncodingHandler</tt> function:
 </p><pre class="programlisting">
 <a name="handlerdatatype"></a><img src="images/callouts/1.png" alt="1" border="0">xmlCharEncodingHandlerPtr handler;
 <a name="calcsize"></a><img src="images/callouts/2.png" alt="2" border="0">size = (int)strlen(in)+1;
 out_size = size*2-1;
 out = malloc((size_t)out_size);

&#8230;
 <a name="findhandlerfunction"></a><img src="images/callouts/3.png" alt="3" border="0">handler = xmlFindCharEncodingHandler(encoding);
&#8230;
 <a name="callconversionfunction"></a><img src="images/callouts/4.png" alt="4" border="0">handler-&gt;input(out, &amp;out_size, in, &amp;temp);
&#8230;
 <a name="outputencoding"></a><img src="images/callouts/5.png" alt="5" border="0">xmlSaveFormatFileEnc("-", doc, encoding, 1);
 </pre><p>
 </p><div class="calloutlist"><table border="0" summary="Callout list"><tr><td width="5%" valign="top" align="left"><a href="#handlerdatatype"><img src="images/callouts/1.png" alt="1" border="0"></a> </td><td valign="top" align="left"><p><tt class="varname">handler</tt> is declared as a pointer to an
 <tt class="function">xmlCharEncodingHandler</tt> structure, which holds the
 conversion functions for one encoding.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#calcsize"><img src="images/callouts/2.png" alt="2" border="0"></a> </td><td valign="top" align="left"><p>The conversion functions of an <tt class="function">xmlCharEncodingHandler</tt> need
 to be given the sizes of the input and output buffers, which are
 calculated here for strings <tt class="varname">in</tt> and
 <tt class="varname">out</tt>.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#findhandlerfunction"><img src="images/callouts/3.png" alt="3" border="0"></a> </td><td valign="top" align="left"><p><tt class="function">xmlFindCharEncodingHandler</tt> takes as its
 argument the data's initial encoding and searches
 <span class="application">libxml's</span> built-in set of conversion
 handlers, returning a pointer to the handler, or NULL if none is
 found.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#callconversionfunction"><img src="images/callouts/4.png" alt="4" border="0"></a> </td><td valign="top" align="left"><p>The input conversion function of <tt class="varname">handler</tt>
 requires as its arguments pointers to the input and output strings,
 along with pointers to the length of each. The lengths must be determined
 separately by the application.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#outputencoding"><img src="images/callouts/5.png" alt="5" border="0"></a> </td><td valign="top" align="left"><p>To output in a specified encoding rather than UTF-8, we use
 <tt class="function">xmlSaveFormatFileEnc</tt>, specifying the
 encoding.</p></td></tr></table></div><p>
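</p><p>The buffer sizing in the listing above follows from the two encodings involved:
 every ISO-8859-1 character occupies one byte, and its UTF-8 form occupies at most
 two, so an <tt class="varname">out_size</tt> of twice the string length plus one
 terminating NUL byte is always sufficient. The following is a minimal sketch of
 that reasoning, assuming ISO-8859-1 input only. It uses no
 <span class="application">libxml</span> calls, and
 <tt class="function">latin1_to_utf8</tt> is a hypothetical helper written for
 illustration; it is not the <tt class="function">convert</tt> function from
 Appendix H, which delegates the conversion to the handler.</p><pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper, not part of libxml: converts a NUL-terminated
 * ISO-8859-1 string to a newly allocated UTF-8 string.
 * The caller must free() the result. Returns NULL if malloc fails. */
unsigned char *latin1_to_utf8(const unsigned char *in)
{
    size_t size = strlen((const char *)in) + 1; /* input bytes, including NUL */
    size_t out_size = size * 2 - 1;             /* worst case: 2 bytes per character, plus NUL */
    unsigned char *out = malloc(out_size);
    size_t i, j = 0;

    if (out == NULL)
        return NULL;

    for (i = 0; in[i] != '\0'; i++) {
        if (in[i] &lt; 0x80) {
            /* ASCII range: identical in both encodings */
            out[j++] = in[i];
        } else {
            /* 0x80..0xFF: two-byte UTF-8 sequence */
            out[j++] = (unsigned char)(0xC0 | (in[i] &gt;&gt; 6));   /* lead byte */
            out[j++] = (unsigned char)(0x80 | (in[i] &amp; 0x3F)); /* continuation byte */
        }
    }
    out[j] = '\0';
    return out;
}
</pre><p>Under these assumptions, the four-byte ISO-8859-1 string
 <tt class="varname">"caf\xE9"</tt> becomes a five-byte UTF-8 string whose last two
 bytes are 0xC3 and 0xA9. Encodings other than ISO-8859-1 have different worst-case
 expansion ratios, so this sizing rule should not be reused for them without
 checking.</p><p>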
 </p></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ar01s08.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="index.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="apa.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Retrieving Attributes </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> A. Compilation</td></tr></table></div></body></html>