1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
2<html>
3<!-- Created on March, 19 2013 by texi2html 1.78a -->
4<!--
5Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
6            Karl Berry  <karl@freefriends.org>
7            Olaf Bachmann <obachman@mathematik.uni-kl.de>
8            and many others.
9Maintained by: Many creative people.
10Send bugs and suggestions to <texi2html-bug@nongnu.org>
11
12-->
13<head>
14<title>GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;</title>
15
16<meta name="description" content="GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;">
17<meta name="keywords" content="GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;">
18<meta name="resource-type" content="document">
19<meta name="distribution" content="global">
20<meta name="Generator" content="texi2html 1.78a">
21<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
22<style type="text/css">
23<!--
24a.summary-letter {text-decoration: none}
25pre.display {font-family: serif}
26pre.format {font-family: serif}
27pre.menu-comment {font-family: serif}
28pre.menu-preformatted {font-family: serif}
29pre.smalldisplay {font-family: serif; font-size: smaller}
30pre.smallexample {font-size: smaller}
31pre.smallformat {font-family: serif; font-size: smaller}
32pre.smalllisp {font-size: smaller}
33span.roman {font-family:serif; font-weight:normal;}
34span.sansserif {font-family:sans-serif; font-weight:normal;}
35ul.toc {list-style: none}
36-->
37</style>
38
39
40</head>
41
42<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
43
44<table cellpadding="1" cellspacing="1" border="0">
45<tr><td valign="middle" align="left">[<a href="libunistring_4.html#SEC11" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
46<td valign="middle" align="left">[<a href="libunistring_6.html#SEC18" title="Next chapter"> &gt;&gt; </a>]</td>
47<td valign="middle" align="left"> &nbsp; </td>
48<td valign="middle" align="left"> &nbsp; </td>
49<td valign="middle" align="left"> &nbsp; </td>
50<td valign="middle" align="left"> &nbsp; </td>
51<td valign="middle" align="left"> &nbsp; </td>
52<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
53<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
54<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td>
55<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
56</tr></table>
57
58<hr size="2">
59<a name="uniconv_002eh"></a>
60<a name="SEC17"></a>
61<h1 class="chapter"> <a href="libunistring.html#TOC17">5. Conversions between Unicode and encodings <code>&lt;uniconv.h&gt;</code></a> </h1>
62
63<p>This include file declares functions for converting between Unicode strings
64and <code>char *</code> strings in locale encoding or in other specified encodings.
65</p>
66<a name="IDX154"></a>
67<p>The following function returns the locale encoding.
68</p>
69<dl>
70<dt><u>Function:</u> const char * <b>locale_charset</b><i> ()</i>
71<a name="IDX155"></a>
72</dt>
73<dd><p>Determines the current locale's character encoding, and canonicalizes it
74into one of the canonical names listed in &lsquo;<tt>config.charset</tt>&rsquo;.
75If the canonical name cannot be determined, the result is a non-canonical
76name.
77</p>
78<p>The result must not be freed; it is statically allocated.
79</p>
80<p>The result of this function can be used as an argument to the <code>iconv_open</code>
81function in GNU libc, in GNU libiconv, or in the gnulib provided wrapper
82around the native <code>iconv_open</code> function.  It may not work as an argument
83to the native <code>iconv_open</code> function directly.
84</p></dd></dl>
85
86<p>The handling of unconvertible characters during the conversions can be
87parametrized through the following enumeration type:
88</p>
89<dl>
90<dt><u>Type:</u> <b>enum iconv_ilseq_handler</b>
91<a name="IDX156"></a>
92</dt>
93<dd><p>This type specifies how unconvertible characters in the input are handled.
94</p></dd></dl>
95
96<dl>
97<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_error</b>
98<a name="IDX157"></a>
99</dt>
100<dd><p>This handler causes the function to return with <code>errno</code> set to
101<code>EILSEQ</code>.
102</p></dd></dl>
103
104<dl>
105<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_question_mark</b>
106<a name="IDX158"></a>
107</dt>
108<dd><p>This handler produces one question mark &lsquo;<samp>?</samp>&rsquo; per unconvertible character.
109</p></dd></dl>
110
111<dl>
112<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_escape_sequence</b>
113<a name="IDX159"></a>
114</dt>
115<dd><p>This handler produces an escape sequence <code>\u<var>xxxx</var></code> or
116<code>\U<var>xxxxxxxx</var></code> for each unconvertible character.
117</p></dd></dl>
118
119<a name="IDX160"></a>
120<p>The following functions convert between strings in a specified encoding and
121Unicode strings.
122</p>
123<dl>
124<dt><u>Function:</u> uint8_t * <b>u8_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint8_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
125<a name="IDX161"></a>
126</dt>
127<dt><u>Function:</u> uint16_t * <b>u16_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint16_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
128<a name="IDX162"></a>
129</dt>
130<dt><u>Function:</u> uint32_t * <b>u32_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint32_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
131<a name="IDX163"></a>
132</dt>
<dd><p>Converts an entire string, possibly including NUL bytes, from one encoding
to UTF-8, UTF-16, or UTF-32 encoding, respectively.
135</p>
136<p>Converts a memory region given in encoding <var>fromcode</var>.  <var>fromcode</var> is
137as for the <code>iconv_open</code> function.
138</p>
139<p>The input is in the memory region between <var>src</var> (inclusive) and
140<code><var>src</var> + <var>srclen</var></code> (exclusive).
141</p>
<p>If <var>offsets</var> is not NULL, it should point to an array of <var>srclen</var>
elements of type <code>size_t</code>.  This array is filled with offsets into the
result: for each <code>i</code> at which a character starts in <var>src</var>, the
character starting at <code><var>src</var>[i]</code> corresponds to the character
starting at <code><var>result</var>[<var>offsets</var>[i]]</code>.  All other elements
are set to <code>(size_t)(-1)</code>.
147</p>
148<p><code><var>resultbuf</var></code> and <code>*<var>lengthp</var></code> should be a scratch
149buffer and its size, or <code><var>resultbuf</var></code> can be NULL.
150</p>
151<p>May erase the contents of the memory at <code><var>resultbuf</var></code>.
152</p>
153<p>If successful: The resulting Unicode string (non-NULL) is returned and
154its length stored in <code>*<var>lengthp</var></code>.  The resulting string is
155<code><var>resultbuf</var></code> if no dynamic memory allocation was necessary,
156or a freshly allocated memory block otherwise.
157</p>
158<p>In case of error: NULL is returned and <code>errno</code> is set.
159Particular <code>errno</code> values: <code>EINVAL</code>, <code>EILSEQ</code>, <code>ENOMEM</code>.
160</p></dd></dl>
161
162<dl>
163<dt><u>Function:</u> char * <b>u8_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint8_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
164<a name="IDX164"></a>
165</dt>
166<dt><u>Function:</u> char * <b>u16_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint16_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
167<a name="IDX165"></a>
168</dt>
169<dt><u>Function:</u> char * <b>u32_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint32_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
170<a name="IDX166"></a>
171</dt>
<dd><p>Converts an entire Unicode string, possibly including NUL units, from UTF-8,
UTF-16, or UTF-32 encoding, respectively, to a given encoding.
174</p>
175<p>Converts a memory region to encoding <var>tocode</var>.  <var>tocode</var> is as for
176the <code>iconv_open</code> function.
177</p>
178<p>The input is in the memory region between <var>src</var> (inclusive) and
179<code><var>src</var> + <var>srclen</var></code> (exclusive).
180</p>
<p>If <var>offsets</var> is not NULL, it should point to an array of <var>srclen</var>
elements of type <code>size_t</code>.  This array is filled with offsets into the
result: for each <code>i</code> at which a character starts in <var>src</var>, the
character starting at <code><var>src</var>[i]</code> corresponds to the character
starting at <code><var>result</var>[<var>offsets</var>[i]]</code>.  All other elements
are set to <code>(size_t)(-1)</code>.
186</p>
187<p><code><var>resultbuf</var></code> and <code>*<var>lengthp</var></code> should be a scratch
188buffer and its size, or <code><var>resultbuf</var></code> can be NULL.
189</p>
190<p>May erase the contents of the memory at <code><var>resultbuf</var></code>.
191</p>
<p>If successful: The resulting string (non-NULL) in encoding <var>tocode</var> is
returned and its length stored in <code>*<var>lengthp</var></code>.  The resulting
string is <code><var>resultbuf</var></code> if no dynamic memory allocation was
necessary, or a freshly allocated memory block otherwise.
196</p>
197<p>In case of error: NULL is returned and <code>errno</code> is set.
198Particular <code>errno</code> values: <code>EINVAL</code>, <code>EILSEQ</code>, <code>ENOMEM</code>.
199</p></dd></dl>
200
201<p>The following functions convert between NUL terminated strings in a specified
202encoding and NUL terminated Unicode strings.
203</p>
204<dl>
205<dt><u>Function:</u> uint8_t * <b>u8_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
206<a name="IDX167"></a>
207</dt>
208<dt><u>Function:</u> uint16_t * <b>u16_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
209<a name="IDX168"></a>
210</dt>
211<dt><u>Function:</u> uint32_t * <b>u32_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
212<a name="IDX169"></a>
213</dt>
214<dd><p>Converts a NUL terminated string from a given encoding.
215</p>
<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
217</p>
218<p>Particular <code>errno</code> values: <code>EILSEQ</code>, <code>ENOMEM</code>.
219</p></dd></dl>
220
221<dl>
222<dt><u>Function:</u> char * <b>u8_strconv_to_encoding</b><i> (const uint8_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
223<a name="IDX170"></a>
224</dt>
225<dt><u>Function:</u> char * <b>u16_strconv_to_encoding</b><i> (const uint16_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
226<a name="IDX171"></a>
227</dt>
228<dt><u>Function:</u> char * <b>u32_strconv_to_encoding</b><i> (const uint32_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
229<a name="IDX172"></a>
230</dt>
231<dd><p>Converts a NUL terminated string to a given encoding.
232</p>
233<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
234</p>
235<p>Particular <code>errno</code> values: <code>EILSEQ</code>, <code>ENOMEM</code>.
236</p></dd></dl>
237
238<p>The following functions are shorthands that convert between NUL terminated
239strings in locale encoding and NUL terminated Unicode strings.
240</p>
241<dl>
242<dt><u>Function:</u> uint8_t * <b>u8_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
243<a name="IDX173"></a>
244</dt>
245<dt><u>Function:</u> uint16_t * <b>u16_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
246<a name="IDX174"></a>
247</dt>
248<dt><u>Function:</u> uint32_t * <b>u32_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
249<a name="IDX175"></a>
250</dt>
251<dd><p>Converts a NUL terminated string from the locale encoding.
252</p>
253<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
254</p>
255<p>Particular <code>errno</code> values: <code>ENOMEM</code>.
256</p></dd></dl>
257
258<dl>
259<dt><u>Function:</u> char * <b>u8_strconv_to_locale</b><i> (const uint8_t *<var>string</var>)</i>
260<a name="IDX176"></a>
261</dt>
262<dt><u>Function:</u> char * <b>u16_strconv_to_locale</b><i> (const uint16_t *<var>string</var>)</i>
263<a name="IDX177"></a>
264</dt>
265<dt><u>Function:</u> char * <b>u32_strconv_to_locale</b><i> (const uint32_t *<var>string</var>)</i>
266<a name="IDX178"></a>
267</dt>
268<dd><p>Converts a NUL terminated string to the locale encoding.
269</p>
270<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
271</p>
272<p>Particular <code>errno</code> values: <code>ENOMEM</code>.
273</p></dd></dl>
274<hr size="6">
288<p>
289 <font size="-1">
290  This document was generated on <i>March, 19 2013</i> using <a href="http://www.nongnu.org/texi2html/"><i>texi2html 1.78a</i></a>.
291 </font>
292 <br>
293
294</p>
295</body>
296</html>
297