<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
<html>
<!-- Created on March, 19 2013 by texi2html 1.78a -->
<!--
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
            Karl Berry <karl@freefriends.org>
            Olaf Bachmann <obachman@mathematik.uni-kl.de>
            and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>

-->
<head>
<title>GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;</title>

<meta name="description" content="GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;">
<meta name="keywords" content="GNU libunistring: 5. Conversions between Unicode and encodings &lt;uniconv.h&gt;">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.78a">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">

<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_4.html#SEC11" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_6.html#SEC18" title="Next chapter"> &gt;&gt;
</a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>

<hr size="2">
<a name="uniconv_002eh"></a>
<a name="SEC17"></a>
<h1 class="chapter"> <a href="libunistring.html#TOC17">5. Conversions between Unicode and encodings <code>&lt;uniconv.h&gt;</code></a> </h1>

<p>This include file declares functions for converting between Unicode strings
and <code>char *</code> strings in locale encoding or in other specified encodings.
</p>
<a name="IDX154"></a>
<p>The following function returns the locale encoding.
</p>
<dl>
<dt><u>Function:</u> const char * <b>locale_charset</b><i> ()</i>
<a name="IDX155"></a>
</dt>
<dd><p>Determines the current locale's character encoding, and canonicalizes it
into one of the canonical names listed in ‘<tt>config.charset</tt>’.
If the canonical name cannot be determined, the result is a non-canonical
name.
</p>
<p>The result must not be freed; it is statically allocated.
</p>
<p>The result of this function can be used as an argument to the <code>iconv_open</code>
function in GNU libc, in GNU libiconv, or in the gnulib provided wrapper
around the native <code>iconv_open</code> function. It may not work as an argument
to the native <code>iconv_open</code> function directly.
</p></dd></dl>

<p>The handling of unconvertible characters during the conversions can be
parametrized through the following enumeration type:
</p>
<dl>
<dt><u>Type:</u> <b>enum iconv_ilseq_handler</b>
<a name="IDX156"></a>
</dt>
<dd><p>This type specifies how unconvertible characters in the input are handled.
</p></dd></dl>

<dl>
<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_error</b>
<a name="IDX157"></a>
</dt>
<dd><p>This handler causes the function to return with <code>errno</code> set to
<code>EILSEQ</code>.
</p></dd></dl>

<dl>
<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_question_mark</b>
<a name="IDX158"></a>
</dt>
<dd><p>This handler produces one question mark ‘<samp>?</samp>’ per unconvertible character.
</p></dd></dl>

<dl>
<dt><u>Constant:</u> enum iconv_ilseq_handler <b>iconveh_escape_sequence</b>
<a name="IDX159"></a>
</dt>
<dd><p>This handler produces an escape sequence <code>\u<var>xxxx</var></code> or
<code>\U<var>xxxxxxxx</var></code> for each unconvertible character.
</p></dd></dl>

<a name="IDX160"></a>
<p>The following functions convert between strings in a specified encoding and
Unicode strings.
</p>
<dl>
<dt><u>Function:</u> uint8_t * <b>u8_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint8_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX161"></a>
</dt>
<dt><u>Function:</u> uint16_t * <b>u16_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint16_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX162"></a>
</dt>
<dt><u>Function:</u> uint32_t * <b>u32_conv_from_encoding</b><i> (const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>, const char *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, uint32_t *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX163"></a>
</dt>
<dd><p>Converts an entire string, possibly including NUL bytes, from one encoding
to UTF-8, UTF-16, or UTF-32 encoding, respectively.
</p>
<p>Converts a memory region given in encoding <var>fromcode</var>. <var>fromcode</var> is
as for the <code>iconv_open</code> function.
</p>
<p>The input is in the memory region between <var>src</var> (inclusive) and
<code><var>src</var> + <var>srclen</var></code> (exclusive).
</p>
<p>If <var>offsets</var> is not NULL, it should point to an array of <var>srclen</var>
integers; this array is filled with offsets into the result, i.e. the
character starting at <code><var>src</var>[i]</code> corresponds to the character starting
at <code><var>result</var>[<var>offsets</var>[i]]</code>, and other offsets are set to
<code>(size_t)(-1)</code>.
</p>
<p><code><var>resultbuf</var></code> and <code>*<var>lengthp</var></code> should be a scratch
buffer and its size, or <code><var>resultbuf</var></code> can be NULL.
</p>
<p>May erase the contents of the memory at <code><var>resultbuf</var></code>.
</p>
<p>If successful: The resulting Unicode string (non-NULL) is returned and
its length stored in <code>*<var>lengthp</var></code>. The resulting string is
<code><var>resultbuf</var></code> if no dynamic memory allocation was necessary,
or a freshly allocated memory block otherwise.
</p>
<p>In case of error: NULL is returned and <code>errno</code> is set.
Particular <code>errno</code> values: <code>EINVAL</code>, <code>EILSEQ</code>, <code>ENOMEM</code>.
</p></dd></dl>

<dl>
<dt><u>Function:</u> char * <b>u8_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint8_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX164"></a>
</dt>
<dt><u>Function:</u> char * <b>u16_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint16_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX165"></a>
</dt>
<dt><u>Function:</u> char * <b>u32_conv_to_encoding</b><i> (const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>, const uint32_t *<var>src</var>, size_t <var>srclen</var>, size_t *<var>offsets</var>, char *<var>resultbuf</var>, size_t *<var>lengthp</var>)</i>
<a name="IDX166"></a>
</dt>
<dd><p>Converts an entire Unicode string, possibly including NUL units, from UTF-8,
UTF-16, or UTF-32 encoding, respectively, to a given encoding.
</p>
<p>Converts a memory region to encoding <var>tocode</var>. <var>tocode</var> is as for
the <code>iconv_open</code> function.
</p>
<p>The input is in the memory region between <var>src</var> (inclusive) and
<code><var>src</var> + <var>srclen</var></code> (exclusive).
</p>
<p>If <var>offsets</var> is not NULL, it should point to an array of <var>srclen</var>
integers; this array is filled with offsets into the result, i.e. the
character starting at <code><var>src</var>[i]</code> corresponds to the character starting
at <code><var>result</var>[<var>offsets</var>[i]]</code>, and other offsets are set to
<code>(size_t)(-1)</code>.
</p>
<p><code><var>resultbuf</var></code> and <code>*<var>lengthp</var></code> should be a scratch
buffer and its size, or <code><var>resultbuf</var></code> can be NULL.
</p>
<p>May erase the contents of the memory at <code><var>resultbuf</var></code>.
</p>
<p>If successful: The resulting string (non-NULL), in encoding <var>tocode</var>, is
returned and its length stored in <code>*<var>lengthp</var></code>. The resulting string is
<code><var>resultbuf</var></code> if no dynamic memory allocation was necessary,
or a freshly allocated memory block otherwise.
</p>
<p>In case of error: NULL is returned and <code>errno</code> is set.
Particular <code>errno</code> values: <code>EINVAL</code>, <code>EILSEQ</code>, <code>ENOMEM</code>.
</p></dd></dl>

<p>The following functions convert between NUL terminated strings in a specified
encoding and NUL terminated Unicode strings.
</p>
<dl>
<dt><u>Function:</u> uint8_t * <b>u8_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX167"></a>
</dt>
<dt><u>Function:</u> uint16_t * <b>u16_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX168"></a>
</dt>
<dt><u>Function:</u> uint32_t * <b>u32_strconv_from_encoding</b><i> (const char *<var>string</var>, const char *<var>fromcode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX169"></a>
</dt>
<dd><p>Converts a NUL terminated string from a given encoding.
</p>
<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
</p>
<p>Particular <code>errno</code> values: <code>EILSEQ</code>, <code>ENOMEM</code>.
</p></dd></dl>

<dl>
<dt><u>Function:</u> char * <b>u8_strconv_to_encoding</b><i> (const uint8_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX170"></a>
</dt>
<dt><u>Function:</u> char * <b>u16_strconv_to_encoding</b><i> (const uint16_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX171"></a>
</dt>
<dt><u>Function:</u> char * <b>u32_strconv_to_encoding</b><i> (const uint32_t *<var>string</var>, const char *<var>tocode</var>, enum iconv_ilseq_handler <var>handler</var>)</i>
<a name="IDX172"></a>
</dt>
<dd><p>Converts a NUL terminated string to a given encoding.
</p>
<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
</p>
<p>Particular <code>errno</code> values: <code>EILSEQ</code>, <code>ENOMEM</code>.
</p></dd></dl>

<p>The following functions are shorthands that convert between NUL terminated
strings in locale encoding and NUL terminated Unicode strings.
</p>
<dl>
<dt><u>Function:</u> uint8_t * <b>u8_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
<a name="IDX173"></a>
</dt>
<dt><u>Function:</u> uint16_t * <b>u16_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
<a name="IDX174"></a>
</dt>
<dt><u>Function:</u> uint32_t * <b>u32_strconv_from_locale</b><i> (const char *<var>string</var>)</i>
<a name="IDX175"></a>
</dt>
<dd><p>Converts a NUL terminated string from the locale encoding.
</p>
<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
</p>
<p>Particular <code>errno</code> values: <code>ENOMEM</code>.
</p></dd></dl>

<dl>
<dt><u>Function:</u> char * <b>u8_strconv_to_locale</b><i> (const uint8_t *<var>string</var>)</i>
<a name="IDX176"></a>
</dt>
<dt><u>Function:</u> char * <b>u16_strconv_to_locale</b><i> (const uint16_t *<var>string</var>)</i>
<a name="IDX177"></a>
</dt>
<dt><u>Function:</u> char * <b>u32_strconv_to_locale</b><i> (const uint32_t *<var>string</var>)</i>
<a name="IDX178"></a>
</dt>
<dd><p>Converts a NUL terminated string to the locale encoding.
</p>
<p>The result is <code>malloc</code> allocated, or NULL (with <code>errno</code> set) in case of error.
</p>
<p>Particular <code>errno</code> values: <code>ENOMEM</code>.
</p></dd></dl>
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_4.html#SEC11" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_6.html#SEC18" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
 <font size="-1">
  This document was generated on <i>March, 19 2013</i> using <a href="http://www.nongnu.org/texi2html/"><i>texi2html 1.78a</i></a>.
 </font>
 <br>

</p>
</body>
</html>