<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Facets</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="localization.html" title="Chapter&#160;8.&#160; Localization" /><link rel="prev" href="localization.html" title="Chapter&#160;8.&#160; Localization" /><link rel="next" href="containers.html" title="Chapter&#160;9.&#160; Containers" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Facets</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="localization.html">Prev</a>&#160;</td><th width="60%" align="center">Chapter&#160;8.&#160;
  Localization
  
</th><td width="20%" align="right">&#160;<a accesskey="n" href="containers.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.localization.facet"></a>Facets</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.ctype"></a>ctype</h3></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="facet.ctype.impl.spec"></a>Specializations</h5></div></div></div><p>
For the required specialization <code class="classname">codecvt&lt;wchar_t, char, mbstate_t&gt;</code>,
conversions are made between the internal character set (always UCS4
on GNU/Linux) and whatever the currently selected locale for the
<code class="code">LC_CTYPE</code> category implements.
</p><p>
The two required specializations are implemented as follows:
</p><p>
<code class="code">
ctype&lt;char&gt;
</code>
</p><p>
This is a simple specialization. Implementing this was a piece of cake.
</p><p>
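The member functions of <code class="classname">ctype&lt;char&gt;</code> can be exercised through the classic locale. What follows is only a minimal sketch: the helper names <code class="function">to_upper_classic</code> and <code class="function">is_digit_classic</code> are made up for illustration and are not part of libstdc++.
</p>

```cpp
#include <cassert>
#include <locale>
#include <string>

// Upper-case a string with the ctype<char> facet of the classic "C" locale.
std::string to_upper_classic(std::string text)
{
    const std::ctype<char>& ct =
        std::use_facet<std::ctype<char> >(std::locale::classic());
    if (!text.empty())
        ct.toupper(&text[0], &text[0] + text.size());  // converts in place
    return text;
}

// Classify a single character against the digit mask.
bool is_digit_classic(char c)
{
    const std::ctype<char>& ct =
        std::use_facet<std::ctype<char> >(std::locale::classic());
    return ct.is(std::ctype_base::digit, c);
}
```

<p>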
<code class="code">
ctype&lt;wchar_t&gt;
</code>
</p><p>
This specialization, by specifying all the template parameters, pretty
much ties the hands of implementors. As such, the implementation is
straightforward, involving <code class="function">mbsrtowcs</code> for
conversions from <span class="type">char</span> to <span class="type">wchar_t</span> and
<code class="function">wcsrtombs</code> for conversions from <span class="type">wchar_t</span>
to <span class="type">char</span>.
</p><p>
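The two C library calls involved are <code class="function">mbsrtowcs</code> (multibyte to wide) and <code class="function">wcsrtombs</code> (wide to multibyte). The sketch below shows how they are typically called; it is not the libstdc++ source, and the wrapper names <code class="function">widen_c_string</code> and <code class="function">narrow_c_string</code> are hypothetical. Both wrappers use whatever C locale is currently selected.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Widen a null-terminated multibyte string according to the current C locale.
// Returns an empty string if the input is not valid in that locale.
std::wstring widen_c_string(const char* narrow)
{
    std::mbstate_t state = std::mbstate_t();        // initial conversion state
    const char* src = narrow;
    // With a null destination, mbsrtowcs only counts the wide characters.
    std::size_t count = std::mbsrtowcs(0, &src, 0, &state);
    if (count == static_cast<std::size_t>(-1))
        return std::wstring();

    std::wstring wide(count, L'\0');
    if (count != 0) {
        src = narrow;
        state = std::mbstate_t();
        std::mbsrtowcs(&wide[0], &src, count, &state);
    }
    return wide;
}

// The reverse direction, using wcsrtombs in the same two-pass fashion.
std::string narrow_c_string(const wchar_t* wide)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    std::size_t count = std::wcsrtombs(0, &src, 0, &state);
    if (count == static_cast<std::size_t>(-1))
        return std::string();

    std::string narrow(count, '\0');
    if (count != 0) {
        src = wide;
        state = std::mbstate_t();
        std::wcsrtombs(&narrow[0], &src, count, &state);
    }
    return narrow;
}
```

<p>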
Neither of these two required specializations deals with Unicode
characters.
</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
   How to deal with the global locale issue?
   </p></li><li class="listitem"><p>
   How to deal with types other than <span class="type">char</span>, <span class="type">wchar_t</span>?
   </p></li><li class="listitem"><p>
   Overlap between codecvt/ctype: narrow/widen
   </p></li><li class="listitem"><p>
       <span class="type">mask</span> typedef in <code class="classname">codecvt_base</code>,
       argument types in <span class="type">codecvt</span>. What is known about this type?
   </p></li><li class="listitem"><p>
   Why mask* argument in codecvt?
   </p></li><li class="listitem"><p>
       Can this be made (more) generic? Is there a simple way to
       straighten out the configure-time mess that is a by-product of
       this class?
   </p></li><li class="listitem"><p>
       Get the <span class="type">ctype&lt;wchar_t&gt;::mask</span> stuff under control.
       Need to make some kind of static table, and not do lookup every time
       somebody hits the <code class="code">do_is...</code> functions. Too bad we can't
       just redefine <span class="type">mask</span> for
       <code class="classname">ctype&lt;wchar_t&gt;</code>.
   </p></li><li class="listitem"><p>
       Rename abstract base class. See if just smash-overriding is a
       better approach. Clarify, add sanity to naming.
     </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.2"></a><p><span class="citetitle"><em class="citetitle">
      The GNU C Library
    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6 Character Set Handling and 7 Locales and Internationalization. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.3"></a><p><span class="citetitle"><em class="citetitle">
      Correspondence
    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.4"></a><p><span class="citetitle"><em class="citetitle">
      ISO/IEC 14882:1998 Programming languages - C++
    </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.5"></a><p><span class="citetitle"><em class="citetitle">
      ISO/IEC 9899:1999 Programming languages - C
    </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.6"></a><p><span class="title"><em>
	<a class="link" href="https://www.unix.org/version3/ieee_std.html" target="_top">
	The Open Group Base Specifications, Issue 6 (IEEE Std. 1003.1-2004)
	</a>
      </em>. </span><span class="copyright">Copyright © 1999 
      The Open Group/The Institute of Electrical and Electronics Engineers, Inc.. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.7"></a><p><span class="citetitle"><em class="citetitle">
      The C++ Programming Language, Special Edition
    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
	Addison Wesley
      . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.8"></a><p><span class="citetitle"><em class="citetitle">
      Standard C++ IOStreams and Locales
    </em>. </span><span class="subtitle">
      Advanced Programmer's Guide and Reference
    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
	Addison Wesley Longman
      . </span></span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.codecvt"></a>codecvt</h3></div></div></div><p>
The standard class codecvt attempts to address conversions between
different character encoding schemes. In particular, the standard
attempts to detail conversions between the implementation-defined wide
characters (hereafter referred to as <span class="type">wchar_t</span>) and the standard
type <span class="type">char</span> that is so beloved in classic <span class="quote">“<span class="quote">C</span>”</span>
(now referred to as narrow characters). This document attempts
to describe how the GNU libstdc++ implementation deals with the conversion
between wide and narrow characters, and also presents a framework for dealing
with the huge number of other encodings that iconv can convert,
including Unicode and UTF-8. Design issues and requirements are
addressed, and examples of correct usage for both the required
specializations for wide and narrow characters and the
implementation-provided extended functionality are given.
</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.req"></a>Requirements</h4></div></div></div><p>
Around page 425 of the C++ Standard, this charming heading comes into view:
</p><div class="blockquote"><blockquote class="blockquote"><p>
22.2.1.5 - Template class codecvt
</p></blockquote></div><p>
The text around the codecvt definition gives some clues:
</p><div class="blockquote"><blockquote class="blockquote"><p>
<span class="emphasis"><em>
-1- The class <code class="code">codecvt&lt;internT,externT,stateT&gt;</code> is for use
when converting from one codeset to another, such as from wide characters
to multibyte characters, between wide character encodings such as
Unicode and EUC.
</em></span>
</p></blockquote></div><p>
Hmm. So, in some unspecified way, Unicode encodings and
translations between other character sets should be handled by this
class.
</p><div class="blockquote"><blockquote class="blockquote"><p>
<span class="emphasis"><em>
-2- The <span class="type">stateT</span> argument selects the pair of codesets being mapped between.
</em></span>
</p></blockquote></div><p>
Ah ha! Another clue...
</p><div class="blockquote"><blockquote class="blockquote"><p>
<span class="emphasis"><em>
-3- The instantiations required in the Table 51 (lib.locale.category), namely
<code class="classname">codecvt&lt;wchar_t,char,mbstate_t&gt;</code> and
<code class="classname">codecvt&lt;char,char,mbstate_t&gt;</code>, convert the
implementation-defined native character set.
<code class="classname">codecvt&lt;char,char,mbstate_t&gt;</code> implements a
degenerate conversion; it does not convert at all.
<code class="classname">codecvt&lt;wchar_t,char,mbstate_t&gt;</code> converts between
the native character sets for tiny and wide characters. Instantiations on
<span class="type">mbstate_t</span> perform conversion between encodings known to the library
implementor.  Other encodings can be converted by specializing on a
user-defined <span class="type">stateT</span> type. The <span class="type">stateT</span> object can
contain any state that is useful to communicate to or from the specialized
<code class="function">do_convert</code> member.
</em></span>
</p></blockquote></div><p>
At this point, a couple of points become clear:
</p><p>
One: The standard clearly implies that attempts to add non-required
(yet useful and widely used) conversions need to do so through the
third template parameter, <span class="type">stateT</span>.</p><p>
Two: The required conversions, by specifying <span class="type">mbstate_t</span> as the
third template parameter, imply an implementation strategy that is mostly
(or wholly) based on the underlying C library, and the functions
<code class="function">mbsrtowcs</code> and <code class="function">wcsrtombs</code> in
particular.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.design"></a>Design</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.wchar_t_size"></a><span class="type">wchar_t</span> Size</h5></div></div></div><p>
      The simple implementation detail of <span class="type">wchar_t</span>'s size seems to
      repeatedly confound people. Many systems use a two-byte,
      unsigned integral type to represent wide characters, and use an
      internal encoding of Unicode or UCS2. (See AIX, Microsoft NT,
      Java, others.) Other systems use a four-byte, unsigned integral
      type to represent wide characters, and use an internal encoding
      of UCS4. (GNU/Linux systems using glibc, in particular.) The C
      programming language (and thus C++) does not specify a
      size for the type <span class="type">wchar_t</span>.
    </p><p>
      Thus, portable C++ code cannot assume a byte size (or endianness) either.
    </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.unicode"></a>Support for Unicode</h5></div></div></div><p>
    Probably the most frequently asked question about code conversion
    is: "So dudes, what's the deal with Unicode strings?"
    The dude part is optional, but apparently the usefulness of
    Unicode strings is pretty widely appreciated. The Unicode character
    set (and useful encodings like UTF-8, UCS-4, ISO 8859-10,
    and so on) was not mentioned in the first C++ standard. (The 2011
    standard added support for string literals with different encodings
    and some library facilities for converting between encodings, but the
    notes below have not been updated to reflect that.)
  </p><p>
    A couple of comments:
  </p><p>
    The thought that all one needs to convert between two arbitrary
    codesets is two types and some kind of state argument is
    unfortunate. In particular, encodings may be stateless. The naming
    of the third parameter as <span class="type">stateT</span> is unfortunate, as what is
    really needed is some kind of generalized type that accounts for the
    issues that abstract encodings will need. The minimum information
    that is required includes:
  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
	Identifiers for each of the codesets involved in the
	conversion. For example, using the iconv family of functions
	from the Single Unix Specification (what used to be called
	X/Open) hosted on the GNU/Linux operating system allows
	bi-directional mapping between far more than the following
	tantalizing possibilities:
      </p><p>
	An edited list taken from <code class="code">`iconv --list`</code> on a
	Red Hat 6.2/Intel system:
      </p><div class="blockquote"><blockquote class="blockquote"><pre class="programlisting">
8859_1, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ARABIC, ARABIC7,
ASCII, EUC-CN, EUC-JP, EUC-KR, EUC-TW, GREEK-CCITT, GREEK, GREEK7-OLD,
GREEK7, GREEK8, HEBREW, ISO-8859-1, ISO-8859-2, ISO-8859-3,
ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8,
ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
ISO-8859-15, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
ISO-10646/UTF-8, ISO-10646/UTF8, SHIFT-JIS, SHIFT_JIS, UCS-2, UCS-4,
UCS2, UCS4, UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-8,
UTF-16, UTF8, UTF16.
</pre></blockquote></div><p>
For iconv-based implementations, string literals for each of the
encodings (e.g. "UCS-2" and "UTF-8") are necessary,
although for other,
non-iconv implementations a table of enumerated values or some other
mechanism may be required.
</p></li><li class="listitem"><p>
 Maximum length of the identifying string literal.
</p></li><li class="listitem"><p>
 Some encodings require explicit endianness. As such, some kind
  of endian marker or other byte-order marker will be necessary. See
  "Footnotes for C/C++ developers" in Haible for more information on
  UCS-2/Unicode endian issues. (Summary: big endian seems most likely,
  however implementations, most notably Microsoft, vary.)
</p></li><li class="listitem"><p>
 Types representing the conversion state, for conversions involving
  the machinery in the "C" library, or the conversion descriptor, for
  conversions using iconv (such as the type iconv_t). Note that the
  conversion descriptor encodes more information than a simple encoding
  state type.
</p></li><li class="listitem"><p>
 Conversion descriptors for both directions of encoding (i.e., both
  UCS-2 to UTF-8 and UTF-8 to UCS-2).
</p></li><li class="listitem"><p>
 Something to indicate if the conversion requested is valid.
</p></li><li class="listitem"><p>
 Something to represent if the conversion descriptors are valid.
</p></li><li class="listitem"><p>
 Some way to enforce strict type checking on the internal and
  external types. As part of this, the size of the internal and
  external types will need to be known.
</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.issues"></a>Other Issues</h5></div></div></div><p>
In addition, multi-threaded and multi-locale environments also impact
the design and requirements for code conversions. In particular, they
affect the required specialization
<code class="classname">codecvt&lt;wchar_t, char, mbstate_t&gt;</code>
when implemented using standard "C" functions.
</p><p>
Three problems arise: one big, one of medium importance, and one small.
</p><p>
First, the small: <code class="function">mbsrtowcs</code> and
<code class="function">wcsrtombs</code> may not be multithread-safe
on all systems required by the GNU tools. For GNU/Linux and glibc,
this is not an issue.
</p><p>
Of medium concern, in the grand scope of things, is that the functions
used to implement this specialization work on null-terminated
strings. Buffers, especially file buffers, may not be null-terminated,
thus giving conversions that end prematurely or are otherwise
incorrect. Yikes!
</p><p>
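One way around the null-termination problem is to convert one character at a time with <code class="function">mbrtowc</code>, which takes an explicit byte count. The sketch below illustrates that idea only; it is not how libstdc++ is implemented, and the name <code class="function">widen_buffer</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Widen exactly `length` bytes of `buffer`; no terminating null is required.
// Conversion stops early at an invalid or incomplete multibyte sequence.
std::wstring widen_buffer(const char* buffer, std::size_t length)
{
    std::mbstate_t state = std::mbstate_t();
    std::wstring wide;
    std::size_t pos = 0;
    while (pos < length) {
        wchar_t wc = L'\0';
        std::size_t used = std::mbrtowc(&wc, buffer + pos, length - pos, &state);
        if (used == static_cast<std::size_t>(-1) ||
            used == static_cast<std::size_t>(-2))
            break;                 // invalid (-1) or truncated (-2) sequence
        if (used == 0)
            used = 1;              // an embedded null character occupies one byte
        wide.push_back(wc);
        pos += used;
    }
    return wide;
}
```

<p>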
The last, and fundamental, problem is the assumption of a global
locale for all the "C" functions referenced above. For something like
C++ iostreams (where codecvt is explicitly used) the notion of
multiple locales is fundamental. In practice, most users may not run
into this limitation. However, as a quality of implementation issue,
the GNU C++ library would like to offer a solution that allows
multiple locales and/or simultaneous usage with computationally
correct results. In short, libstdc++ is trying to offer, as an
option, a high-quality implementation, damn the additional complexity!
</p><p>
For the required specialization
<code class="classname">codecvt&lt;wchar_t, char, mbstate_t&gt;</code>,
conversions are made between the internal character set (always UCS4
on GNU/Linux) and whatever the currently selected locale for the
LC_CTYPE category implements.
</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.impl"></a>Implementation</h4></div></div></div><p>
The two required specializations are implemented as follows:
</p><p>
<code class="code">
codecvt&lt;char, char, mbstate_t&gt;
</code>
</p><p>
This is a degenerate (i.e., does nothing) specialization. Implementing
this was a piece of cake.
</p><p>
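The degenerate behaviour is observable from user code: a conversion request reports <code class="code">codecvt_base::noconv</code>, and <code class="function">always_noconv</code> returns true. A minimal sketch follows; the typedef <code class="code">char_codecvt</code> and the two function names are illustrative only.
</p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> char_codecvt;

// Ask the classic locale's facet to "convert" three characters.
std::codecvt_base::result convert_tea()
{
    const char_codecvt& cvt =
        std::use_facet<char_codecvt>(std::locale::classic());
    std::mbstate_t state = std::mbstate_t();
    const char from[] = "tea";
    char to[4] = { 0, 0, 0, 0 };
    const char* from_next = 0;
    char* to_next = 0;
    return cvt.out(state, from, from + 3, from_next, to, to + 3, to_next);
}

// always_noconv() reports whether this facet ever needs to convert.
bool char_codecvt_is_noop()
{
    return std::use_facet<char_codecvt>(std::locale::classic()).always_noconv();
}
```

<p>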
<code class="code">
codecvt&lt;wchar_t, char, mbstate_t&gt;
</code>
</p><p>
This specialization, by specifying all the template parameters, pretty
much ties the hands of implementors. As such, the implementation is
straightforward, involving <code class="function">mbsrtowcs</code> for conversions
from <span class="type">char</span> to <span class="type">wchar_t</span> and
<code class="function">wcsrtombs</code> for conversions from <span class="type">wchar_t</span>
to <span class="type">char</span>.
</p><p>
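From the user's side, the required <code class="classname">codecvt&lt;wchar_t, char, mbstate_t&gt;</code> facet is reached through <code class="function">use_facet</code> and driven with explicit begin/end pointers. The following sketch assumes the classic locale and plain ASCII input; the helper name <code class="function">widen_with_codecvt</code> is made up for this example.
</p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

// Convert narrow text to wide text with the classic locale's codecvt facet.
// Returns an empty string if the facet does not report codecvt_base::ok.
std::wstring widen_with_codecvt(const std::string& narrow)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;
    const wide_codecvt& cvt =
        std::use_facet<wide_codecvt>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();
    // One wide character per input byte is always enough room.
    std::wstring wide(narrow.size(), L'\0');
    const char* from_next = 0;
    wchar_t* to_next = 0;
    std::codecvt_base::result r =
        cvt.in(state,
               narrow.data(), narrow.data() + narrow.size(), from_next,
               &wide[0], &wide[0] + wide.size(), to_next);
    if (r != std::codecvt_base::ok)
        return std::wstring();

    wide.resize(to_next - &wide[0]);   // keep only what was produced
    return wide;
}
```

<p>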
Neither of these two required specializations deals with Unicode
characters. As such, libstdc++ implements a partial specialization
of the <span class="type">codecvt</span> class with an iconv wrapper class,
<code class="classname">encoding_state</code> as the third template parameter.
</p><p>
This implementation should be standards conformant. First of all, the
standard explicitly points out that instantiations on the third
template parameter, <span class="type">stateT</span>, are the proper way to implement
non-required conversions. Second of all, the standard says (in Chapter
17) that partial specializations of required classes are A-OK. Third
of all, the requirements for the <span class="type">stateT</span> type elsewhere in the
standard (see 21.1.2 traits typedefs) only indicate that this type be copy
constructible.
</p><p>
As such, the type <span class="type">encoding_state</span> is defined as a non-templatized,
POD type to be used as the third type of a <span class="type">codecvt</span> instantiation.
This type is just a wrapper class for iconv, and provides an easy interface
to iconv functionality.
</p><p>
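For orientation, this is roughly what a direct call sequence into iconv looks like without any wrapper. It is a sketch under several assumptions: a glibc-style <code class="function">iconv</code> prototype taking <code class="code">char**</code> for the input buffer, and the encoding names "UTF-8" and "UCS-4LE" being accepted by the local iconv. The function name <code class="function">utf8_to_ucs4</code> is hypothetical and unrelated to <code class="classname">encoding_state</code>.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <stdint.h>
#include <string>
#include <vector>

// Convert UTF-8 text to UCS-4 code points using iconv directly.
// Returns an empty vector if the descriptor cannot be opened or the
// input is not valid UTF-8 (so empty input and failure look the same).
std::vector<uint32_t> utf8_to_ucs4(const std::string& utf8)
{
    std::vector<uint32_t> code_points;
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");   // arguments: to, from
    if (cd == (iconv_t)(-1))
        return code_points;

    // Each input byte yields at most one 4-byte code point.
    std::vector<char> bytes(utf8.size() * 4 + 4);
    char* in = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out = &bytes[0];
    std::size_t out_left = bytes.size();
    std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        return code_points;

    // Reassemble little-endian 32-bit values independent of host byte order.
    std::size_t produced = bytes.size() - out_left;
    for (std::size_t i = 0; i + 4 <= produced; i += 4) {
        uint32_t cp = 0;
        for (int b = 3; b >= 0; --b)
            cp = (cp << 8) | static_cast<unsigned char>(bytes[i + b]);
        code_points.push_back(cp);
    }
    return code_points;
}
```

<p>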
There are two constructors for <span class="type">encoding_state</span>:
</p><p>
<code class="code">
encoding_state() : __in_desc(0), __out_desc(0)
</code>
</p><p>
This default constructor sets the internal encoding to some default
(currently UCS4) and the external encoding to whatever is returned by
<code class="code">nl_langinfo(CODESET)</code>.
</p><p>
<code class="code">
encoding_state(const char* __int, const char* __ext)
</code>
</p><p>
This constructor takes as parameters string literals that indicate the
desired internal and external encoding. There are no defaults for
either argument.
</p><p>
One of the issues with iconv is that the string literals identifying
conversions are not standardized. Because of this, the thought of
mandating and/or enforcing some set of pre-determined valid
identifiers seems iffy: thus, a more practical (and non-migraine
inducing) strategy was implemented: end-users can specify any string
(subject to a pre-determined length qualifier, currently 32 bytes) for
encodings. It is up to the user to make sure that these strings are
valid on the target system.
</p><p>
<code class="code">
void
_M_init()
</code>
</p><p>
Strangely enough, this member function attempts to open conversion
descriptors for a given encoding_state object. If the encoding names
are not valid, the conversion descriptors returned will
not be valid, and subsequent calls to the codecvt conversion
functions will return an error.
</p><p>
<code class="code">
bool
_M_good()
</code>
</p><p>
Provides a way to see if the given <span class="type">encoding_state</span> object has been
properly initialized. If the string literals describing the desired
internal and external encoding are not valid, initialization will
fail, and this will return false. If the internal and external
encodings are valid, but <code class="function">iconv_open</code> could not allocate
conversion descriptors, this will also return false. Otherwise, the object is
ready to convert and will return true.
</p><p>
<code class="code">
encoding_state(const encoding_state&amp;)
</code>
</p><p>
As iconv allocates memory and sets up conversion descriptors, the copy
constructor can only copy the member data pertaining to the internal
and external code conversions, and not the conversion descriptors
themselves.
</p><p>
Definitions for all the required codecvt member functions are provided
for this specialization, and usage of <code class="code">codecvt&lt;<em class="replaceable"><code>internal
character type</code></em>, <em class="replaceable"><code>external character type</code></em>, <em class="replaceable"><code>encoding_state</code></em>&gt;</code> is consistent with other
codecvt usage.
</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.use"></a>Use</h4></div></div></div><p>A conversion involving a string literal.</p><pre class="programlisting">
  // VERIFY and initialize_state are assumed to be helpers supplied by the
  // surrounding test code; neither is defined in this excerpt.
  typedef codecvt_base::result                  result;
  typedef unsigned short                        unicode_t;
  typedef unicode_t                             int_type;
  typedef char                                  ext_type;
  typedef encoding_state                        state_type;
  typedef codecvt&lt;int_type, ext_type, state_type&gt; unicode_codecvt;
  // assumed definition: compare() below needs a traits type for int_type
  typedef char_traits&lt;int_type&gt;                 int_traits;

  const ext_type*       e_lit = "black pearl jasmine tea";
  int                   size = strlen(e_lit);
  // the same text as 16-bit units with the bytes swapped (25088 == 0x6200, 'b'),
  // followed by a newline (2560 == 0x0A00) that the comparison below ignores
  int_type              i_lit_base[24] =
  { 25088, 27648, 24832, 25344, 27392, 8192, 28672, 25856, 24832, 29184,
    27648, 8192, 27136, 24832, 29440, 27904, 26880, 28160, 25856, 8192, 29696,
    25856, 24832, 2560
  };
  const int_type*       i_lit = i_lit_base;
  const ext_type*       efrom_next;
  const int_type*       ifrom_next;
  ext_type*             e_arr = new ext_type[size + 1];
  ext_type*             eto_next;
  int_type*             i_arr = new int_type[size + 1];
  int_type*             ito_next;

  // construct a locale object with the specialized facet.
  locale                loc(locale::classic(), new unicode_codecvt);
  // sanity check the constructed locale has the specialized facet.
  VERIFY( has_facet&lt;unicode_codecvt&gt;(loc) );
  const unicode_codecvt&amp; cvt = use_facet&lt;unicode_codecvt&gt;(loc);
  // convert between const char* and unicode strings
  unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1");
  initialize_state(state01);
  result r1 = cvt.in(state01, e_lit, e_lit + size, efrom_next,
                     i_arr, i_arr + size, ito_next);
  VERIFY( r1 == codecvt_base::ok );
  VERIFY( !int_traits::compare(i_arr, i_lit, size) );
  VERIFY( efrom_next == e_lit + size );
  VERIFY( ito_next == i_arr + size );
</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
   a. things that are sketchy, or remain unimplemented:
      the do_encoding, max_length and length member functions
      are only weakly implemented. I have no idea how to do
      this correctly, and in a generic manner.  Nathan?
</p></li><li class="listitem"><p>
   b. conversions involving <span class="type">std::string</span>
  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
      how should operators != and == work for strings of
      different/same encoding?
      </p></li><li class="listitem"><p>
      what is equal? A byte by byte comparison or an
      encoding then byte comparison?
      </p></li><li class="listitem"><p>
      conversions between narrow, wide, and unicode strings
      </p></li></ul></div></li><li class="listitem"><p>
   c. conversions involving std::filebuf and std::ostream
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
      how to initialize the state object in a
      standards-conformant manner?
      </p></li><li class="listitem"><p>
      how to synchronize the "C" and "C++"
      conversion information?
      </p></li><li class="listitem"><p>
      wchar_t/char internal buffers and conversions between
      internal/external buffers?
      </p></li></ul></div></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.2"></a><p><span class="citetitle"><em class="citetitle">
428      The GNU C Library
429    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright �� 2007 FSF. </span><span class="pagenums">
430      Chapters 6 Character Set Handling and 7 Locales and Internationalization
431    . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.3"></a><p><span class="citetitle"><em class="citetitle">
432      Correspondence
433    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright �� 2002 . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.4"></a><p><span class="citetitle"><em class="citetitle">
434      ISO/IEC 14882:1998 Programming languages - C++
435    </em>. </span><span class="copyright">Copyright �� 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.5"></a><p><span class="citetitle"><em class="citetitle">
436      ISO/IEC 9899:1999 Programming languages - C
437    </em>. </span><span class="copyright">Copyright �� 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.6"></a><p><span class="title"><em>
438	<a class="link" href="https://pubs.opengroup.org/onlinepubs/9699919799/" target="_top">
439      System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
440	</a>
441      </em>. </span><span class="copyright">Copyright �� 2008 
442	The Open Group/The Institute of Electrical and Electronics
443	Engineers, Inc.
444      . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.7"></a><p><span class="citetitle"><em class="citetitle">
445      The C++ Programming Language, Special Edition
446    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright �� 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
447	Addison Wesley
448      . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.8"></a><p><span class="citetitle"><em class="citetitle">
449      Standard C++ IOStreams and Locales
450    </em>. </span><span class="subtitle">
451      Advanced Programmer's Guide and Reference
452    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright �� 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
453	Addison Wesley Longman
454      . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.9"></a><p><span class="title"><em>
455	<a class="link" href="http://www.lysator.liu.se/c/na1.html" target="_top">
456      A brief description of Normative Addendum 1
457	</a>
458      </em>. </span><span class="author"><span class="firstname">Clive</span> <span class="surname">Feather</span>. </span><span class="pagenums">Extended Character Sets. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.10"></a><p><span class="title"><em>
459	<a class="link" href="https://tldp.org/HOWTO/Unicode-HOWTO.html" target="_top">
460	  The Unicode HOWTO
461	</a>
462      </em>. </span><span class="author"><span class="firstname">Bruno</span> <span class="surname">Haible</span>. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.11"></a><p><span class="title"><em>
463	<a class="link" href="https://www.cl.cam.ac.uk/~mgk25/unicode.html" target="_top">
464      UTF-8 and Unicode FAQ for Unix/Linux
465	</a>
      </em>. </span><span class="author"><span class="firstname">Markus</span> <span class="surname">Kuhn</span>. </span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.messages"></a>messages</h3></div></div></div><p>
467The <code class="classname">std::messages</code> facet implements message retrieval functionality
468equivalent to Java's <code class="classname">java.text.MessageFormat</code> using either GNU <code class="function">gettext</code>
or IEEE 1003.1-200x functions.
470</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.req"></a>Requirements</h4></div></div></div><p>
471The <code class="classname">std::messages</code> facet is probably the most vaguely defined facet in
472the standard library. It's assumed that this facility was built into
473the standard library in order to convert string literals from one
locale to the other. For instance, the "C" locale's
<code class="code">const char* c = "please"</code> might be converted to a German-localized <code class="code">"bitte"</code>
during program execution.
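</p><p>
A minimal sketch of such a lookup, using only the three standard public
member functions described in this section. The helper name
<code class="code">translate</code> is hypothetical, and passing 0 for the
<code class="code">set</code> and <code class="code">msgid</code> arguments is an assumption, since their
meaning is implementation-defined.
</p><pre class="programlisting">
#include &lt;locale&gt;
#include &lt;string&gt;

// Hypothetical helper: translate `text` through catalog `name` in `loc`,
// returning `text` unchanged when no translation can be found.
std::string translate(const std::locale&amp; loc, const std::string&amp; name,
                      const std::string&amp; text)
{
  const std::messages&lt;char&gt;&amp; m = std::use_facet&lt;std::messages&lt;char&gt; &gt;(loc);
  std::messages&lt;char&gt;::catalog cat = m.open(name, loc);
  if (cat &lt; 0)
    return text;   // the standard's failure value for open()
  // Assumption: set = 0 and msgid = 0, as in this document's own example.
  std::string result = m.get(cat, 0, 0, text);
  m.close(cat);
  return result;
}
</pre><p>
When no catalog of the given name is available, the helper simply hands
back its input, because <code class="code">get()</code> returns the default string if
no message can be found.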
477</p><div class="blockquote"><blockquote class="blockquote"><p>
47822.2.7.1 - Template class messages [lib.locale.messages]
479</p></blockquote></div><p>
480This class has three public member functions, which directly
481correspond to three protected virtual member functions.
482</p><p>
483The public member functions are:
484</p><p>
485<code class="code">catalog open(const string&amp;, const locale&amp;) const</code>
486</p><p>
487<code class="code">string_type get(catalog, int, int, const string_type&amp;) const</code>
488</p><p>
489<code class="code">void close(catalog) const</code>
490</p><p>
491While the virtual functions are:
492</p><p>
493<code class="code">catalog do_open(const string&amp; name, const locale&amp; loc) const</code>
494</p><div class="blockquote"><blockquote class="blockquote"><p>
495<span class="emphasis"><em>
496-1- Returns: A value that may be passed to <code class="code">get()</code> to retrieve a
497message, from the message catalog identified by the string <code class="code">name</code>
498according to an implementation-defined mapping. The result can be used
499until it is passed to <code class="code">close()</code>.  Returns a value less than 0 if no such
500catalog can be opened.
501</em></span>
502</p></blockquote></div><p>
503<code class="code">string_type do_get(catalog cat, int set , int msgid, const string_type&amp; dfault) const</code>
504</p><div class="blockquote"><blockquote class="blockquote"><p>
505<span class="emphasis"><em>
506-3- Requires: A catalog <code class="code">cat</code> obtained from <code class="code">open()</code> and not yet closed.
507-4- Returns: A message identified by arguments <code class="code">set</code>, <code class="code">msgid</code>, and <code class="code">dfault</code>,
508according to an implementation-defined mapping. If no such message can
509be found, returns <code class="code">dfault</code>.
510</em></span>
511</p></blockquote></div><p>
512<code class="code">void do_close(catalog cat) const</code>
513</p><div class="blockquote"><blockquote class="blockquote"><p>
514<span class="emphasis"><em>
-5- Requires: A catalog <code class="code">cat</code> obtained from <code class="code">open()</code> and not yet closed.
516-6- Effects: Releases unspecified resources associated with <code class="code">cat</code>.
517-7- Notes: The limit on such resources, if any, is implementation-defined.
518</em></span>
519</p></blockquote></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.design"></a>Design</h4></div></div></div><p>
520A couple of notes on the standard.
521</p><p>
522First, why is <code class="code">messages_base::catalog</code> specified as a typedef
523to int? This makes sense for implementations that use
524<code class="code">catopen</code> and define <code class="code">nl_catd</code> as int, but not for
525others. Fortunately, it's not heavily used and so only a minor irritant.
526This has been reported as a possible defect in the standard (LWG 2028).
527</p><p>
528Second, by making the member functions <code class="code">const</code>, it is
529impossible to save state in them. Thus, storing away information used
530in the 'open' member function for use in 'get' is impossible. This is
531unfortunate.
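</p><p>
One possible way around this in a user-defined facet is to keep the
state in <code class="code">mutable</code> data members. The sketch below is an
illustration only: the class <code class="code">table_messages</code>, its in-memory
table, and the catalog name <code class="code">"demo"</code> are all hypothetical, and
the unsynchronized mutable members make it unsuitable for
multithreaded use.
</p><pre class="programlisting">
#include &lt;cstddef&gt;
#include &lt;locale&gt;
#include &lt;map&gt;
#include &lt;string&gt;

// Hypothetical facet for illustration; not part of libstdc++.
class table_messages : public std::messages&lt;char&gt;
{
public:
  explicit table_messages(std::size_t refs = 0)
  : std::messages&lt;char&gt;(refs)
  {
    // One built-in catalog named "demo" holding a single translation.
    tables_["demo"]["please"] = "bitte";
  }

protected:
  catalog do_open(const std::string&amp; name, const std::locale&amp;) const override
  {
    if (tables_.find(name) == tables_.end())
      return -1;               // value less than 0: no such catalog
    open_[next_] = name;       // state saved from a const member function
    return next_++;
  }

  string_type do_get(catalog cat, int, int,
                     const string_type&amp; dfault) const override
  {
    std::map&lt;catalog, std::string&gt;::const_iterator c = open_.find(cat);
    if (c == open_.end())
      return dfault;           // catalog not open
    const table&amp; t = tables_.find(c-&gt;second)-&gt;second;
    table::const_iterator m = t.find(dfault);
    return m == t.end() ? dfault : m-&gt;second;
  }

  void do_close(catalog cat) const override { open_.erase(cat); }

private:
  typedef std::map&lt;std::string, std::string&gt; table;
  std::map&lt;std::string, table&gt; tables_;
  // mutable: the do_* functions are const yet must track open catalogs.
  // Not synchronized, so this sketch is not thread-safe.
  mutable std::map&lt;catalog, std::string&gt; open_;
  mutable catalog next_ = 0;
};
</pre><p>
Installed with <code class="code">std::locale(std::locale::classic(), new table_messages)</code>,
the facet is reached through the ordinary public interface:
<code class="code">open("demo", loc)</code> yields a non-negative catalog, and
<code class="code">get(cat, 0, 0, "please")</code> then returns <code class="code">"bitte"</code>.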
532</p><p>
533The 'open' member function in particular seems to be oddly
534designed. The signature seems quite peculiar. Why specify a <code class="code">const
535string&amp; </code> argument, for instance, instead of just <code class="code">const
536char*</code>? Or, why specify a <code class="code">const locale&amp;</code> argument that is
537to be used in the 'get' member function? How, exactly, is this locale
538argument useful? What was the intent? It might make sense if a locale
539argument was associated with a given default message string in the
540'open' member function, for instance. Quite murky and unclear, on
541reflection.
542</p><p>
543Lastly, it seems odd that messages, which explicitly require code
544conversion, don't use the codecvt facet. Because the messages facet
545has only one template parameter, it is assumed that ctype, and not
546codecvt, is to be used to convert between character sets.
547</p><p>
The default message string passed to 'get' is implicitly assumed to be
in the "C" locale. Thus, all source code is assumed to be written in
English, so translations are always from "en_US" to other, explicitly
named locales.
552</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.models"></a>Models</h5></div></div></div><p>
553    This is a relatively simple class, on the face of it. The standard
554    specifies very little in concrete terms, so generic
555    implementations that are conforming yet do very little are the
556    norm. Adding functionality that would be useful to programmers and
557    comparable to Java's java.text.MessageFormat takes a bit of work,
558    and is highly dependent on the capabilities of the underlying
559    operating system.
560  </p><p>
561    Three different mechanisms have been provided, selectable via
562    configure flags:
563  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
564       generic
565     </p><p>
566       This model does very little, and is what is used by default.
567     </p></li><li class="listitem"><p>
568       gnu
569     </p><p>
570       The gnu model is complete and fully tested. It's based on the
571       GNU gettext package, which is part of glibc. It uses the
572       functions <code class="code">textdomain, bindtextdomain, gettext</code> to
573       implement full functionality. Creating message catalogs is a
       relatively straightforward process and is lightly documented
575       below, and fully documented in gettext's distributed
576       documentation.
577     </p></li><li class="listitem"><p>
578       ieee_1003.1-200x
579     </p><p>
580       This is a complete, though untested, implementation based on
581       the IEEE standard. The functions <code class="code">catopen, catgets,
582       catclose</code> are used to retrieve locale-specific messages
583       given the appropriate message catalogs that have been
584       constructed for their use. Note, the script <code class="code">
585       po2msg.sed</code> that is part of the gettext distribution can
586       convert gettext catalogs into catalogs that
587       <code class="code">catopen</code> can use.
588   </p></li></ul></div><p>
589A new, standards-conformant non-virtual member function signature was
590added for 'open' so that a directory could be specified with a given
591message catalog. This simplifies calling conventions for the gnu
592model.
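</p><p>
A sketch of how portable code might call this extension. The helper
name <code class="code">open_catalog</code> is hypothetical. It assumes that the
<code class="code">__GLIBCXX__</code> macro, which the libstdc++ headers define, is an
adequate test for the extra overload, and it falls back to the standard
two-argument form elsewhere.
</p><pre class="programlisting">
#include &lt;locale&gt;
#include &lt;string&gt;

// Hypothetical helper: open `domain` rooted at `dir` when the libstdc++
// extension is available, otherwise use the standard signature.
std::messages&lt;char&gt;::catalog
open_catalog(const std::messages&lt;char&gt;&amp; m, const std::locale&amp; loc,
             const std::string&amp; domain, const char* dir)
{
#ifdef __GLIBCXX__
  return m.open(domain, loc, dir);   // three-argument libstdc++ extension
#else
  (void)dir;                         // no portable way to pass a directory
  return m.open(domain, loc);        // standard two-argument form
#endif
}
</pre><p>
In the gnu model the directory argument is the root under which the
binary catalogs are installed, that is, the <code class="code">(dir)</code> in
<code class="code">(dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>.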
593</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.gnu"></a>The GNU Model</h5></div></div></div><p>
594    The messages facet, because it is retrieving and converting
    between character sets, depends on the ctype and perhaps the
596    codecvt facet in a given locale. In addition, underlying "C"
597    library locale support is necessary for more than just the
598    <code class="code">LC_MESSAGES</code> mask: <code class="code">LC_CTYPE</code> is also
599    necessary. To avoid any unpleasantness, all bits of the "C" mask
600    (i.e. <code class="code">LC_ALL</code>) are set before retrieving messages.
601  </p><p>
    Making the message catalogs can be tricky at first, but becomes
603    quite simple with practice. For complete info, see the gettext
604    documentation. Here's an idea of what is required:
605  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
606       Make a source file with the required string literals that need
607       to be translated. See <code class="code">intl/string_literals.cc</code> for
608       an example.
609     </p></li><li class="listitem"><p>
610       Make initial catalog (see "4 Making the PO Template File" from
611       the gettext docs).</p><p>
612   <code class="code"> xgettext --c++ --debug string_literals.cc -o libstdc++.pot </code>
613   </p></li><li class="listitem"><p>Make language and country-specific locale catalogs.</p><p>
614   <code class="code">cp libstdc++.pot fr_FR.po</code>
615   </p><p>
616   <code class="code">cp libstdc++.pot de_DE.po</code>
617   </p></li><li class="listitem"><p>
618       Edit localized catalogs in emacs so that strings are
619       translated.
620     </p><p>
621   <code class="code">emacs fr_FR.po</code>
622   </p></li><li class="listitem"><p>Make the binary mo files.</p><p>
623   <code class="code">msgfmt fr_FR.po -o fr_FR.mo</code>
624   </p><p>
625   <code class="code">msgfmt de_DE.po -o de_DE.mo</code>
626   </p></li><li class="listitem"><p>Copy the binary files into the correct directory structure.</p><p>
627   <code class="code">cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code>
628   </p><p>
629   <code class="code">cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>
630   </p></li><li class="listitem"><p>Use the new message catalogs.</p><p>
631   <code class="code">locale loc_de("de_DE");</code>
632   </p><p>
633   <code class="code">
634   use_facet&lt;messages&lt;char&gt; &gt;(loc_de).open("libstdc++", locale(), dir);
635   </code>
636   </p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.use"></a>Use</h4></div></div></div><p>
637   A simple example using the GNU model of message conversion.
638 </p><pre class="programlisting">
#include &lt;iostream&gt;
#include &lt;locale&gt;
#include &lt;string&gt;
using namespace std;

void test01()
{
  typedef messages&lt;char&gt;::catalog catalog;
  // Root of the installed catalogs: (dir)/de_DE/LC_MESSAGES/libstdc++.mo
  const char* dir =
  "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++/po/share/locale";
  // Throws std::runtime_error if the de_DE locale is not installed.
  const locale loc_de("de_DE");
  const messages&lt;char&gt;&amp; mssg_de = use_facet&lt;messages&lt;char&gt; &gt;(loc_de);

  // The three-argument open() is the libstdc++ extension described above.
  catalog cat_de = mssg_de.open("libstdc++", loc_de, dir);
  // get() returns its last argument when no translation is found.
  string s01 = mssg_de.get(cat_de, 0, 0, "please");
  string s02 = mssg_de.get(cat_de, 0, 0, "thank you");
  cout &lt;&lt; "please in German: " &lt;&lt; s01 &lt;&lt; '\n';
  cout &lt;&lt; "thank you in German: " &lt;&lt; s02 &lt;&lt; '\n';
  mssg_de.close(cat_de);
}

int main()
{
  test01();
  return 0;
}
658</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
659    Things that are sketchy, or remain unimplemented:
660  </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
661	  _M_convert_from_char, _M_convert_to_char are in flux,
662	  depending on how the library ends up doing character set
663	  conversions. It might not be possible to do a real character
664	  set based conversion, due to the fact that the template
665	  parameter for messages is not enough to instantiate the
666	  codecvt facet (1 supplied, need at least 2 but would prefer
667	  3).
668	</p></li><li class="listitem"><p>
669	  There are issues with gettext needing the global locale set
670	  to extract a message. This dependence on the global locale
	  makes the current "gnu" model not MT-safe. Future versions
	  of glibc (i.e. glibc 2.3.x) will fix this, and the C++ library
673	  bits are already in place.
674	</p></li></ul></div></li><li class="listitem"><p>
    Development versions of the GNU "C" library (glibc 2.3) will allow
    a more efficient, MT-safe implementation of std::messages, and will
677    allow the removal of the _M_name_messages data member. If this is
678    done, it will change the library ABI. The C++ parts to support
679    glibc 2.3 have already been coded, but are not in use: once this
680    version of the "C" library is released, the marked parts of the
681    messages implementation can be switched over to the new "C"
682    library functionality.
683  </p></li><li class="listitem"><p>
684    At some point in the near future, std::numpunct will probably use
685    std::messages facilities to implement truename/falsename
686    correctly. This is currently not done, but entries in
687    libstdc++.pot have already been made for "true" and "false" string
688    literals, so all that remains is the std::numpunct coding and the
689    configure/make hassles to make the installed library search its
690    own catalog. Currently the libstdc++.mo catalog is only searched
691    for the testsuite cases involving messages members.
692  </p></li><li class="listitem"><p> The following member functions:</p><p>
693   <code class="code">
694	catalog
	open(const basic_string&lt;char&gt;&amp; __s, const locale&amp; __loc) const;
696   </code>
697   </p><p>
698   <code class="code">
699   catalog
700   open(const basic_string&lt;char&gt;&amp;, const locale&amp;, const char*) const;
701   </code>
702   </p><p>
   In the "gnu" model, these functions don't actually return a
   "value less than 0 if no such catalog can be opened" as the
   standard requires. As of this writing, it is unknown how to
   query whether a specified message catalog exists using the
   gettext package.
708   </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.2"></a><p><span class="citetitle"><em class="citetitle">
709      The GNU C Library
    </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright &#169; 2007 FSF. </span><span class="pagenums">Chapters 6 Character Set Handling, and 7 Locales and Internationalization
711    . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.3"></a><p><span class="citetitle"><em class="citetitle">
712      Correspondence
    </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright &#169; 2002. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.4"></a><p><span class="citetitle"><em class="citetitle">
714      ISO/IEC 14882:1998 Programming languages - C++
    </em>. </span><span class="copyright">Copyright &#169; 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.5"></a><p><span class="citetitle"><em class="citetitle">
716      ISO/IEC 9899:1999 Programming languages - C
    </em>. </span><span class="copyright">Copyright &#169; 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.6"></a><p><span class="title"><em>
718	<a class="link" href="https://pubs.opengroup.org/onlinepubs/9699919799/" target="_top">
719      System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
720	</a>
      </em>. </span><span class="copyright">Copyright &#169; 2008 
722	The Open Group/The Institute of Electrical and Electronics
723	Engineers, Inc.
724      . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.7"></a><p><span class="citetitle"><em class="citetitle">
725      The C++ Programming Language, Special Edition
    </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright &#169; 2000 Addison Wesley, Inc. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
727	Addison Wesley
728      . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.8"></a><p><span class="citetitle"><em class="citetitle">
729      Standard C++ IOStreams and Locales
730    </em>. </span><span class="subtitle">
731      Advanced Programmer's Guide and Reference
    . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright &#169; 2000 Addison Wesley Longman, Inc. </span><span class="publisher"><span class="publishername">
733	Addison Wesley Longman
734      . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.9"></a><p><span class="title"><em>
735	<a class="link" href="https://docs.oracle.com/en/java/" target="_top">
736	API Specifications, Java Platform
737	</a>
738      </em>. </span><span class="pagenums">java.util.Properties, java.text.MessageFormat,
739java.util.Locale, java.util.ResourceBundle
740    . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.10"></a><p><span class="title"><em>
741	<a class="link" href="https://www.gnu.org/software/gettext/" target="_top">
742      GNU gettext tools, version 0.10.38, Native Language Support
743      Library and Tools.
744	</a>
      </em>. </span></p></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="localization.html">Prev</a>&#160;</td><td width="20%" align="center"><a accesskey="u" href="localization.html">Up</a></td><td width="40%" align="right">&#160;<a accesskey="n" href="containers.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter&#160;8.&#160;
  Localization
  
&#160;</td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top">&#160;Chapter&#160;9.&#160;
749  Containers
750  
751</td></tr></table></div></body></html>