1<HTML>
2<HEAD>
3<!-- This HTML file has been created by texi2html 1.52b
4     from gettext.texi on 29 December 2011 -->
5
6<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7<TITLE>GNU gettext utilities - 11  The Programmer's View</TITLE>
8</HEAD>
9<BODY>
10Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
11<P><HR><P>
12
13
14<H1><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11  The Programmer's View</A></H1>
15
16<P>
17One aim of the current message catalog implementation provided by
18GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
19installer wishes to do so.  So we perhaps should first take a look at
20the solutions we know about.  The people in the POSIX committee did not
21manage to agree on one of the semi-official standards which we'll
22describe below.  In fact they couldn't agree on anything, so they decided
23only to include an example of an interface.  The major Unix vendors
24are split in the usage of the two most important specifications: X/Open's
25catgets vs. Uniforum's gettext interface.  We'll describe them both and
26later explain our solution of this dilemma.
27
28</P>
29
30
31
32<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.1  About <CODE>catgets</CODE></A></H2>
33<P>
34<A NAME="IDX1027"></A>
35
36</P>
37<P>
38The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
39Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
40process of creating this standard seemed to be too slow for some of
41the Unix vendors so they created their implementations on preliminary
42versions of the standard.  Of course this leads again to problems while
43writing platform independent programs: even the usage of <CODE>catgets</CODE>
44does not guarantee a unique interface.
45
46</P>
47<P>
48Another, personal comment on this that only a bunch of committee members
49could have made this interface.  They never really tried to program
50using this interface.  It is a fast, memory-saving implementation, an
51user can happily live with it.  But programmers hate it (at least I and
52some others do...)
53
54</P>
55<P>
56But we must not forget one point: after all the trouble with transferring
57the rights on Unix(tm) they at last came to X/Open, the very same who
58published this specification.  This leads me to making the prediction
59that this interface will be in future Unix standards (e.g. Spec1170) and
60therefore part of all Unix implementation (implementations, which are
61<EM>allowed</EM> to wear this name).
62
63</P>
64
65
66
67<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.1.1  The Interface</A></H3>
68<P>
69<A NAME="IDX1028"></A>
70
71</P>
72<P>
73The interface to the <CODE>catgets</CODE> implementation consists of three
74functions which correspond to those used in file access: <CODE>catopen</CODE>
75to open the catalog for using, <CODE>catgets</CODE> for accessing the message
76tables, and <CODE>catclose</CODE> for closing after work is done.  Prototypes
77for the functions and the needed definitions are in the
78<CODE>&#60;nl_types.h&#62;</CODE> header file.
79
80</P>
81<P>
82<A NAME="IDX1029"></A>
83<CODE>catopen</CODE> is used like in this:
84
85</P>
86
87<PRE>
88nl_catd catd = catopen ("catalog_name", 0);
89</PRE>
90
91<P>
92The function takes as the argument the name of the catalog.  This usual
93refers to the name of the program or the package.  The second parameter
94is not further specified in the standard.  I don't even know whether it
95is implemented consistently among various systems.  So the common advice
96is to use <CODE>0</CODE> as the value.  The return value is a handle to the
97message catalog, equivalent to handles to file returned by <CODE>open</CODE>.
98
99</P>
100<P>
101<A NAME="IDX1030"></A>
102This handle is of course used in the <CODE>catgets</CODE> function which can
103be used like this:
104
105</P>
106
107<PRE>
108char *translation = catgets (catd, set_no, msg_id, "original string");
109</PRE>
110
111<P>
112The first parameter is this catalog descriptor.  The second parameter
113specifies the set of messages in this catalog, in which the message
114described by <CODE>msg_id</CODE> is obtained.  <CODE>catgets</CODE> therefore uses a
115three-stage addressing:
116
117</P>
118
119<PRE>
120catalog name => set number => message ID => translation
121</PRE>
122
123<P>
124The fourth argument is not used to address the translation.  It is given
125as a default value in case when one of the addressing stages fail.  One
126important thing to remember is that although the return type of catgets
127is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed.  It
128should better be <CODE>const char *</CODE>, but the standard is published in
1291988, one year before ANSI C.
130
131</P>
132<P>
133<A NAME="IDX1031"></A>
134The last of these functions is used and behaves as expected:
135
136</P>
137
138<PRE>
139catclose (catd);
140</PRE>
141
142<P>
143After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.
144
145</P>
146
147
148<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.1.2  Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
149<P>
150<A NAME="IDX1032"></A>
151
152</P>
153<P>
154Now that this description seemed to be really easy -- where are the
155problems we speak of?  In fact the interface could be used in a
156reasonable way, but constructing the message catalogs is a pain.  The
157reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
158message ID.  This has to be a numeric value for all messages in a single
159set.  Perhaps you could imagine the problems keeping such a list while
160changing the source code.  Add a new message here, remove one there.  Of
161course there have been developed a lot of tools helping to organize this
162chaos but one as the other fails in one aspect or the other.  We don't
163want to say that the other approach has no problems but they are far
164more easy to manage.
165
166</P>
167
168
169<H2><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.2  About <CODE>gettext</CODE></A></H2>
170<P>
171<A NAME="IDX1033"></A>
172
173</P>
174<P>
175The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
176proposal.  It was submitted there by Sun, who had implemented the
177<CODE>gettext</CODE> function in SunOS 4, around 1990.  Nowadays, the
178<CODE>gettext</CODE> interface is specified by the OpenI18N standard.
179
180</P>
181<P>
182The main point about this solution is that it does not follow the
183method of normal file handling (open-use-close) and that it does not
184burden the programmer with so many tasks, especially the unique key handling.
185Of course here also a unique key is needed, but this key is the message
186itself (how long or short it is).  See section <A HREF="gettext_11.html#SEC190">11.3  Comparing the Two Interfaces</A> for a more
187detailed comparison of the two methods.
188
189</P>
190<P>
191The following section contains a rather detailed description of the
192interface.  We make it that detailed because this is the interface
193we chose for the GNU <CODE>gettext</CODE> Library.  Programmers interested
194in using this library will be interested in this description.
195
196</P>
197
198
199
200<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.2.1  The Interface</A></H3>
201<P>
202<A NAME="IDX1034"></A>
203
204</P>
205<P>
206The minimal functionality an interface must have is a) to select a
207domain the strings are coming from (a single domain for all programs is
208not reasonable because its construction and maintenance is difficult,
209perhaps impossible) and b) to access a string in a selected domain.
210
211</P>
212<P>
213This is principally the description of the <CODE>gettext</CODE> interface.  It
214has a global domain which unqualified usages reference.  Of course this
215domain is selectable by the user.
216
217</P>
218
219<PRE>
220char *textdomain (const char *domain_name);
221</PRE>
222
223<P>
224This provides the possibility to change or query the current status of
225the current global domain of the <CODE>LC_MESSAGE</CODE> category.  The
226argument is a null-terminated string, whose characters must be legal in
227the use in filenames.  If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
228the function returns the current value.  If no value has been set
229before, the name of the default domain is returned: <EM>messages</EM>.
230Please note that although the return value of <CODE>textdomain</CODE> is of
231type <CODE>char *</CODE> no changing is allowed.  It is also important to know
232that no checks of the availability are made.  If the name is not
233available you will see this by the fact that no translations are provided.
234
235</P>
236<P>
237To use a domain set by <CODE>textdomain</CODE> the function
238
239</P>
240
241<PRE>
242char *gettext (const char *msgid);
243</PRE>
244
245<P>
246is to be used.  This is the simplest reasonable form one can imagine.
247The translation of the string <VAR>msgid</VAR> is returned if it is available
248in the current domain.  If it is not available, the argument itself is
249returned.  If the argument is <CODE>NULL</CODE> the result is undefined.
250
251</P>
252<P>
253One thing which should come into mind is that no explicit dependency to
254the used domain is given.  The current value of the domain is used.
255If this changes between two
256executions of the same <CODE>gettext</CODE> call in the program, both calls
257reference a different message catalog.
258
259</P>
260<P>
261For the easiest case, which is normally used in internationalized
262packages, once at the beginning of execution a call to <CODE>textdomain</CODE>
263is issued, setting the domain to a unique name, normally the package
264name.  In the following code all strings which have to be translated are
265filtered through the gettext function.  That's all, the package speaks
266your language.
267
268</P>
269
270
271<H3><A NAME="SEC184" HREF="gettext_toc.html#TOC184">11.2.2  Solving Ambiguities</A></H3>
272<P>
273<A NAME="IDX1035"></A>
274<A NAME="IDX1036"></A>
275<A NAME="IDX1037"></A>
276
277</P>
278<P>
279While this single name domain works well for most applications there
280might be the need to get translations from more than one domain.  Of
281course one could switch between different domains with calls to
282<CODE>textdomain</CODE>, but this is really not convenient nor is it fast.  A
283possible situation could be one case subject to discussion during this
284writing:  all
285error messages of functions in the set of common used functions should
286go into a separate domain <CODE>error</CODE>.  By this mean we would only need
287to translate them once.
288Another case are messages from a library, as these <EM>have</EM> to be
289independent of the current domain set by the application.
290
291</P>
292<P>
293For this reasons there are two more functions to retrieve strings:
294
295</P>
296
297<PRE>
298char *dgettext (const char *domain_name, const char *msgid);
299char *dcgettext (const char *domain_name, const char *msgid,
300                 int category);
301</PRE>
302
303<P>
304Both take an additional argument at the first place, which corresponds
305to the argument of <CODE>textdomain</CODE>.  The third argument of
306<CODE>dcgettext</CODE> allows to use another locale category but <CODE>LC_MESSAGES</CODE>.
307But I really don't know where this can be useful.  If the
308<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has an value beside
309the known ones, the result is undefined.  It should also be noted that
310this function is not part of the second known implementation of this
311function family, the one found in Solaris.
312
313</P>
314<P>
315A second ambiguity can arise by the fact, that perhaps more than one
316domain has the same name.  This can be solved by specifying where the
317needed message catalog files can be found.
318
319</P>
320
321<PRE>
322char *bindtextdomain (const char *domain_name,
323                      const char *dir_name);
324</PRE>
325
326<P>
327Calling this function binds the given domain to a file in the specified
328directory (how this file is determined follows below).  Especially a
329file in the systems default place is not favored against the specified
330file anymore (as it would be by solely using <CODE>textdomain</CODE>).  A
331<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
332associated with <VAR>domain_name</VAR>.  If <VAR>domain_name</VAR> itself is
333<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned.  Here
334again as for all the other functions is true that none of the return
335value must be changed!
336
337</P>
338<P>
339It is important to remember that relative path names for the
340<VAR>dir_name</VAR> parameter can be trouble.  Since the path is always
341computed relative to the current directory different results will be
342achieved when the program executes a <CODE>chdir</CODE> command.  Relative
343paths should always be avoided to avoid dependencies and
344unreliabilities.
345
346</P>
347
348
349<H3><A NAME="SEC185" HREF="gettext_toc.html#TOC185">11.2.3  Locating Message Catalog Files</A></H3>
350<P>
351<A NAME="IDX1038"></A>
352
353</P>
354<P>
355Because many different languages for many different packages have to be
356stored we need some way to add these information to file message catalog
357files.  The way usually used in Unix environments is have this encoding
358in the file name.  This is also done here.  The directory name given in
359<CODE>bindtextdomain</CODE>s second argument (or the default directory),
360followed by the name of the locale, the locale category, and the domain name
361are concatenated:
362
363</P>
364
365<PRE>
366<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
367</PRE>
368
369<P>
370The default value for <VAR>dir_name</VAR> is system specific.  For the GNU
371library, and for packages adhering to its conventions, it's:
372
373<PRE>
374/usr/local/share/locale
375</PRE>
376
377<P>
378<VAR>locale</VAR> is the name of the locale category which is designated by
379<CODE>LC_<VAR>category</VAR></CODE>.  For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
380<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
381The name of the locale category is determined through
382<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.
383<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A>
384When using the function <CODE>dcgettext</CODE>, you can specify the locale category
385through the third argument.
386
387</P>
388
389
390<H3><A NAME="SEC186" HREF="gettext_toc.html#TOC186">11.2.4  How to specify the output character set <CODE>gettext</CODE> uses</A></H3>
391<P>
392<A NAME="IDX1039"></A>
393<A NAME="IDX1040"></A>
394
395</P>
396<P>
397<CODE>gettext</CODE> not only looks up a translation in a message catalog.  It
398also converts the translation on the fly to the desired output character
399set.  This is useful if the user is working in a different character set
400than the translator who created the message catalog, because it avoids
401distributing variants of message catalogs which differ only in the
402character set.
403
404</P>
405<P>
406The output character set is, by default, the value of <CODE>nl_langinfo
407(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
408locale.  But programs which store strings in a locale independent way
409(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
410return the translations in that encoding, by use of the
411<CODE>bind_textdomain_codeset</CODE> function.
412
413</P>
414<P>
415Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
416character set conversion.  Also, when <CODE>gettext</CODE> does not find a
417translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged --
418independently of the current output character set.  It is therefore
419recommended that all <VAR>msgid</VAR>s be US-ASCII strings.
420
421</P>
422<P>
423<DL>
424<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I>
425<DD><A NAME="IDX1041"></A>
426The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
427output character set for message catalogs for domain <VAR>domainname</VAR>.
428The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
429for the <CODE>iconv_open</CODE> function, or a null pointer.
430
431</P>
432<P>
433If the <VAR>codeset</VAR> parameter is the null pointer,
434<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
435for the domain with the name <VAR>domainname</VAR>.  It returns <CODE>NULL</CODE> if
436no codeset has yet been selected.
437
438</P>
439<P>
440The <CODE>bind_textdomain_codeset</CODE> function can be used several times. 
441If used multiple times with the same <VAR>domainname</VAR> argument, the
442later call overrides the settings made by the earlier one.
443
444</P>
445<P>
446The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
447string containing the name of the selected codeset.  The string is
448allocated internally in the function and must not be changed by the
449user.  If the system went out of core during the execution of
450<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
451global variable <VAR>errno</VAR> is set accordingly.
452</DL>
453
454</P>
455
456
457<H3><A NAME="SEC187" HREF="gettext_toc.html#TOC187">11.2.5  Using contexts for solving ambiguities</A></H3>
458<P>
459<A NAME="IDX1042"></A>
460<A NAME="IDX1043"></A>
461<A NAME="IDX1044"></A>
462<A NAME="IDX1045"></A>
463
464</P>
465<P>
466One place where the <CODE>gettext</CODE> functions, if used normally, have big
467problems is within programs with graphical user interfaces (GUIs).  The
468problem is that many of the strings which have to be translated are very
469short.  They have to appear in pull-down menus which restricts the
470length.  But strings which are not containing entire sentences or at
471least large fragments of a sentence may appear in more than one
472situation in the program but might have different translations.  This is
473especially true for the one-word strings which are frequently used in
474GUI programs.
475
476</P>
477<P>
478As a consequence many people say that the <CODE>gettext</CODE> approach is
479wrong and instead <CODE>catgets</CODE> should be used which indeed does not
480have this problem.  But there is a very simple and powerful method to
481handle this kind of problems with the <CODE>gettext</CODE> functions.
482
483</P>
484<P>
485Contexts can be added to strings to be translated.  A context dependent
486translation lookup is when a translation for a given string is searched,
487that is limited to a given context.  The translation for the same string
488in a different context can be different.  The different translations of
489the same string in different contexts can be stored in the in the same
490MO file, and can be edited by the translator in the same PO file.
491
492</P>
493<P>
494The <TT>&lsquo;gettext.h&rsquo;</TT> include file contains the lookup macros for strings
495with contexts.  They are implemented as thin macros and inline functions
496over the functions from <CODE>&#60;libintl.h&#62;</CODE>.
497
498</P>
499<P>
500<A NAME="IDX1046"></A>
501
502<PRE>
503const char *pgettext (const char *msgctxt, const char *msgid);
504</PRE>
505
506<P>
507In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
508literals.  The macro returns the translation of <VAR>msgid</VAR>, restricted
509to the context given by <VAR>msgctxt</VAR>.
510
511</P>
512<P>
513The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
514You should try to make it somehow canonical and never changing.  Because
515every time you change an <VAR>msgctxt</VAR>, the translator will have to review
516the translation of <VAR>msgid</VAR>.
517
518</P>
519<P>
520Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
521be hard.  But you shouldn't use the file name or class name containing the
522<CODE>pgettext</CODE> call -- because it is a common development task to rename
523a file or a class, and it shouldn't cause translator work.  Also you shouldn't
524use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR> --
525because orthography or grammar changes are often applied to such sentences,
526and again, it shouldn't force the translator to do a review.
527
528</P>
529<P>
530The <SAMP>&lsquo;p&rsquo;</SAMP> in <SAMP>&lsquo;pgettext&rsquo;</SAMP> stands for ���particular���: <CODE>pgettext</CODE>
531fetches a particular translation of the <VAR>msgid</VAR>.
532
533</P>
534<P>
535<A NAME="IDX1047"></A>
536<A NAME="IDX1048"></A>
537
538<PRE>
539const char *dpgettext (const char *domain_name,
540                       const char *msgctxt, const char *msgid);
541const char *dcpgettext (const char *domain_name,
542                        const char *msgctxt, const char *msgid,
543                        int category);
544</PRE>
545
546<P>
547These are generalizations of <CODE>pgettext</CODE>.  They behave similarly to
548<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively.  The <VAR>domain_name</VAR>
549argument defines the translation domain.  The <VAR>category</VAR> argument
550allows to use another locale category than <CODE>LC_MESSAGES</CODE>.
551
552</P>
553<P>
554As as example consider the following fictional situation.  A GUI program
555has a menu bar with the following entries:
556
557</P>
558
559<PRE>
560+------------+------------+--------------------------------------+
561| File       | Printer    |                                      |
562+------------+------------+--------------------------------------+
563| Open     | | Select   |
564| New      | | Open     |
565+----------+ | Connect  |
566             +----------+
567</PRE>
568
569<P>
570To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
571<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated there has to be
572at some point in the code a call to a function of the <CODE>gettext</CODE>
573family.  But in two places the string passed into the function would be
574<CODE>Open</CODE>.  The translations might not be the same and therefore we
575are in the dilemma described above.
576
577</P>
578<P>
579What distinguishes the two places is the menu path from the menu root to
580the particular menu entries:
581
582</P>
583
584<PRE>
585Menu|File
586Menu|Printer
587Menu|File|Open
588Menu|File|New
589Menu|Printer|Select
590Menu|Printer|Open
591Menu|Printer|Connect
592</PRE>
593
594<P>
595The context is thus the menu path without its last part.  So, the calls
596look like this:
597
598</P>
599
600<PRE>
601pgettext ("Menu|", "File")
602pgettext ("Menu|", "Printer")
603pgettext ("Menu|File|", "Open")
604pgettext ("Menu|File|", "New")
605pgettext ("Menu|Printer|", "Select")
606pgettext ("Menu|Printer|", "Open")
607pgettext ("Menu|Printer|", "Connect")
608</PRE>
609
610<P>
611Whether or not to use the <SAMP>&lsquo;|&rsquo;</SAMP> character at the end of the context is a
612matter of style.
613
614</P>
615<P>
616For more complex cases, where the <VAR>msgctxt</VAR> or <VAR>msgid</VAR> are not
617string literals, more general macros are available:
618
619</P>
620<P>
621<A NAME="IDX1049"></A>
622<A NAME="IDX1050"></A>
623<A NAME="IDX1051"></A>
624
625<PRE>
626const char *pgettext_expr (const char *msgctxt, const char *msgid);
627const char *dpgettext_expr (const char *domain_name,
628                            const char *msgctxt, const char *msgid);
629const char *dcpgettext_expr (const char *domain_name,
630                             const char *msgctxt, const char *msgid,
631                             int category);
632</PRE>
633
634<P>
635Here <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
636These macros are more general.  But in the case that both argument expressions
637are string literals, the macros without the <SAMP>&lsquo;_expr&rsquo;</SAMP> suffix are more
638efficient.
639
640</P>
641
642
643<H3><A NAME="SEC188" HREF="gettext_toc.html#TOC188">11.2.6  Additional functions for plural forms</A></H3>
644<P>
645<A NAME="IDX1052"></A>
646
647</P>
648<P>
649The functions of the <CODE>gettext</CODE> family described so far (and all the
650<CODE>catgets</CODE> functions as well) have one problem in the real world
651which have been neglected completely in all existing approaches.  What
652is meant here is the handling of plural forms.
653
654</P>
655<P>
656Looking through Unix source code before the time anybody thought about
657internationalization (and, sadly, even afterwards) one can often find
658code similar to the following:
659
660</P>
661
662<PRE>
663   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
664</PRE>
665
666<P>
667After the first complaints from people internationalizing the code people
668either completely avoided formulations like this or used strings like
669<CODE>"file(s)"</CODE>.  Both look unnatural and should be avoided.  First
670tries to solve the problem correctly looked like this:
671
672</P>
673
674<PRE>
675   if (n == 1)
676     printf ("%d file deleted", n);
677   else
678     printf ("%d files deleted", n);
679</PRE>
680
681<P>
682But this does not solve the problem.  It helps languages where the
683plural form of a noun is not simply constructed by adding an
684���s���
685but that is all.  Once again people fell into the trap of believing the
686rules their language is using are universal.  But the handling of plural
687forms differs widely between the language families.  For example,
688Rafal Maszkowski <CODE>&#60;rzm@mat.uni.torun.pl&#62;</CODE> reports:
689
690</P>
691
692<BLOCKQUOTE>
693<P>
694In Polish we use e.g. plik (file) this way:
695
696<PRE>
6971 plik
6982,3,4 pliki
6995-21 pliko'w
70022-24 pliki
70125-31 pliko'w
702</PRE>
703
704<P>
705and so on (o' means 8859-2 oacute which should be rather okreska,
706similar to aogonek).
707</BLOCKQUOTE>
708
709<P>
710There are two things which can differ between languages (and even inside
711language families);
712
713</P>
714
715<UL>
716<LI>
717
718The form how plural forms are built differs.  This is a problem with
719languages which have many irregularities.  German, for instance, is a
720drastic case.  Though English and German are part of the same language
721family (Germanic), the almost regular forming of plural noun forms
722(appending an
723���s���)
724is hardly found in German.
725
726<LI>
727
728The number of plural forms differ.  This is somewhat surprising for
729those who only have experiences with Romanic and Germanic languages
730since here the number is the same (there are two).
731
732But other language families have only one form or many forms.  More
733information on this in an extra section.
734</UL>
735
736<P>
737The consequence of this is that application writers should not try to
738solve the problem in their code.  This would be localization since it is
739only usable for certain, hardcoded language environments.  Instead the
740extended <CODE>gettext</CODE> interface should be used.
741
742</P>
743<P>
744These extra functions are taking instead of the one key string two
745strings and a numerical argument.  The idea behind this is that using
746the numerical argument and the first string as a key, the implementation
747can select using rules specified by the translator the right plural
748form.  The two string arguments then will be used to provide a return
749value in case no message catalog is found (similar to the normal
750<CODE>gettext</CODE> behavior).  In this case the rules for Germanic language
751is used and it is assumed that the first string argument is the singular
752form, the second the plural form.
753
754</P>
755<P>
756This has the consequence that programs without language catalogs can
757display the correct strings only if the program itself is written using
758a Germanic language.  This is a limitation but since the GNU C library
759(as well as the GNU <CODE>gettext</CODE> package) are written as part of the
760GNU package and the coding standards for the GNU project require program
761being written in English, this solution nevertheless fulfills its
762purpose.
763
764</P>
765<P>
766<DL>
767<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
768<DD><A NAME="IDX1053"></A>
769The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
770as it finds the message catalogs in the same way.  But it takes two
771extra arguments.  The <VAR>msgid1</VAR> parameter must contain the singular
772form of the string to be converted.  It is also used as the key for the
773search in the catalog.  The <VAR>msgid2</VAR> parameter is the plural form.
774The parameter <VAR>n</VAR> is used to determine the plural form.  If no
775message catalog is found <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
776otherwise <CODE>msgid2</CODE>.
777
778</P>
779<P>
780An example for the use of this function is:
781
782</P>
783
784<PRE>
785printf (ngettext ("%d file removed", "%d files removed", n), n);
786</PRE>
787
788<P>
789Please note that the numeric value <VAR>n</VAR> has to be passed to the
790<CODE>printf</CODE> function as well.  It is not sufficient to pass it only to
791<CODE>ngettext</CODE>.
792
793</P>
794<P>
795In the English singular case, the number -- always 1 -- can be replaced with
796"one":
797
798</P>
799
800<PRE>
801printf (ngettext ("One file removed", "%d files removed", n), n);
802</PRE>
803
804<P>
805This works because the <SAMP>&lsquo;printf&rsquo;</SAMP> function discards excess arguments that
806are not consumed by the format string.
807
808</P>
809<P>
810It is also possible to use this function when the strings don't contain a
811cardinal number:
812
813</P>
814
815<PRE>
816puts (ngettext ("Delete the selected file?",
817                "Delete the selected files?",
818                n));
819</PRE>
820
821<P>
822In this case the number <VAR>n</VAR> is only used to choose the plural form.
823</DL>
824
825</P>
826<P>
827<DL>
828<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
829<DD><A NAME="IDX1054"></A>
830The <CODE>dngettext</CODE> is similar to the <CODE>dgettext</CODE> function in the
831way the message catalog is selected.  The difference is that it takes
832two extra parameter to provide the correct plural form.  These two
833parameters are handled in the same way <CODE>ngettext</CODE> handles them.
834</DL>
835
836</P>
837<P>
838<DL>
839<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I>
840<DD><A NAME="IDX1055"></A>
841The <CODE>dcngettext</CODE> is similar to the <CODE>dcgettext</CODE> function in the
842way the message catalog is selected.  The difference is that it takes
843two extra parameter to provide the correct plural form.  These two
844parameters are handled in the same way <CODE>ngettext</CODE> handles them.
845</DL>
846
847</P>
848<P>
849Now, how do these functions solve the problem of the plural forms?
850Without the input of linguists (which was not available) it was not
851possible to determine whether there are only a few different forms in
852which plural forms are formed or whether the number can increase with
853every new supported language.
854
855</P>
856<P>
857Therefore the solution implemented is to allow the translator to specify
858the rules of how to select the plural form.  Since the formula varies
859with every language this is the only viable solution except for
860hardcoding the information in the code (which still would require the
861possibility of extensions to not prevent the use of new languages).
862
863</P>
864<P>
865<A NAME="IDX1056"></A>
866<A NAME="IDX1057"></A>
867<A NAME="IDX1058"></A>
868The information about the plural form selection has to be stored in the
869header entry of the PO file (the one with the empty <CODE>msgid</CODE> string).
870The plural form information looks like this:
871
872</P>
873
874<PRE>
875Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
876</PRE>
877
878<P>
879The <CODE>nplurals</CODE> value must be a decimal number which specifies how
880many different plural forms exist for this language.  The string
881following <CODE>plural</CODE> is an expression which is using the C language
882syntax.  Exceptions are that no negative numbers are allowed, numbers
883must be decimal, and the only variable allowed is <CODE>n</CODE>.  Spaces are
884allowed in the expression, but backslash-newlines are not; in the
885examples below the backslash-newlines are present for formatting purposes
886only.  This expression will be evaluated whenever one of the functions
887<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called.  The
888numeric value passed to these functions is then substituted for all uses
889of the variable <CODE>n</CODE> in the expression.  The resulting value then
890must be greater or equal to zero and smaller than the value given as the
891value of <CODE>nplurals</CODE>.
892
893</P>
894<P>
895<A NAME="IDX1059"></A>
896The following rules are known at this point.  The language with families
897are listed.  But this does not necessarily mean the information can be
898generalized for the whole family (as can be easily seen in the table
899below).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>
900
901</P>
902<DL COMPACT>
903
904<DT>Only one form:
905<DD>
906Some languages only require one single form.  There is no distinction
907between the singular and plural form.  An appropriate header entry
908would look like this:
909
910
911<PRE>
912Plural-Forms: nplurals=1; plural=0;
913</PRE>
914
915Languages with this property include:
916
917<DL COMPACT>
918
919<DT>Asian family
920<DD>
921Japanese, Korean, Vietnamese
922<DT>Turkic/Altaic family
923<DD>
924Turkish
925</DL>
926
927<DT>Two forms, singular used for one only
928<DD>
929This is the form used in most existing programs since it is what English
930is using.  A header entry would look like this:
931
932
933<PRE>
934Plural-Forms: nplurals=2; plural=n != 1;
935</PRE>
936
937(Note: this uses the feature of C expressions that boolean expressions
938have to value zero or one.)
939
940Languages with this property include:
941
942<DL COMPACT>
943
944<DT>Germanic family
945<DD>
946Danish, Dutch, English, Faroese, German, Norwegian, Swedish
947<DT>Finno-Ugric family
948<DD>
949Estonian, Finnish
950<DT>Latin/Greek family
951<DD>
952Greek
953<DT>Semitic family
954<DD>
955Hebrew
956<DT>Romanic family
957<DD>
958Italian, Portuguese, Spanish
959<DT>Artificial
960<DD>
961Esperanto
962</DL>
963
964Another language using the same header entry is:
965
966<DL COMPACT>
967
968<DT>Finno-Ugric family
969<DD>
970Hungarian
971</DL>
972
973Hungarian does not appear to have a plural if you look at sentences involving
974cardinal numbers.  For example, ���1 apple��� is ���1 alma���, and ���123 apples��� is
975���123 alma���.  But when the number is not explicit, the distinction between
976singular and plural exists: ���the apple��� is ���az alma���, and ���the apples��� is
977���az alm'{a}k���.  Since <CODE>ngettext</CODE> has to support both types of sentences,
978it is classified here, under ���two forms���.
979
980<DT>Two forms, singular used for zero and one
981<DD>
982Exceptional case in the language family.  The header entry would be:
983
984
985<PRE>
986Plural-Forms: nplurals=2; plural=n&#62;1;
987</PRE>
988
989Languages with this property include:
990
991<DL COMPACT>
992
993<DT>Romanic family
994<DD>
995French, Brazilian Portuguese
996</DL>
997
998<DT>Three forms, special case for zero
999<DD>
1000The header entry would be:
1001
1002
1003<PRE>
1004Plural-Forms: nplurals=3; plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : n != 0 ? 1 : 2;
1005</PRE>
1006
1007Languages with this property include:
1008
1009<DL COMPACT>
1010
1011<DT>Baltic family
1012<DD>
1013Latvian
1014</DL>
1015
1016<DT>Three forms, special cases for one and two
1017<DD>
1018The header entry would be:
1019
1020
1021<PRE>
1022Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
1023</PRE>
1024
1025Languages with this property include:
1026
1027<DL COMPACT>
1028
1029<DT>Celtic
1030<DD>
1031Gaeilge (Irish)
1032</DL>
1033
1034<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]
1035<DD>
1036The header entry would be:
1037
1038
1039<PRE>
1040Plural-Forms: nplurals=3; \
1041    plural=n==1 ? 0 : (n==0 || (n%100 &#62; 0 &#38;&#38; n%100 &#60; 20)) ? 1 : 2;
1042</PRE>
1043
1044Languages with this property include:
1045
1046<DL COMPACT>
1047
1048<DT>Romanic family
1049<DD>
1050Romanian
1051</DL>
1052
1053<DT>Three forms, special case for numbers ending in 1[2-9]
1054<DD>
1055The header entry would look like this:
1056
1057
1058<PRE>
1059Plural-Forms: nplurals=3; \
1060    plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : \
1061           n%10&#62;=2 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1062</PRE>
1063
1064Languages with this property include:
1065
1066<DL COMPACT>
1067
1068<DT>Baltic family
1069<DD>
1070Lithuanian
1071</DL>
1072
1073<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
1074<DD>
1075The header entry would look like this:
1076
1077
1078<PRE>
1079Plural-Forms: nplurals=3; \
1080    plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : \
1081           n%10&#62;=2 &#38;&#38; n%10&#60;=4 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1082</PRE>
1083
1084Languages with this property include:
1085
1086<DL COMPACT>
1087
1088<DT>Slavic family
1089<DD>
1090Croatian, Serbian, Russian, Ukrainian
1091</DL>
1092
1093<DT>Three forms, special cases for 1 and 2, 3, 4
1094<DD>
1095The header entry would look like this:
1096
1097
1098<PRE>
1099Plural-Forms: nplurals=3; \
1100    plural=(n==1) ? 0 : (n&#62;=2 &#38;&#38; n&#60;=4) ? 1 : 2;
1101</PRE>
1102
1103Languages with this property include:
1104
1105<DL COMPACT>
1106
1107<DT>Slavic family
1108<DD>
1109Slovak, Czech
1110</DL>
1111
1112<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4
1113<DD>
1114The header entry would look like this:
1115
1116
1117<PRE>
1118Plural-Forms: nplurals=3; \
1119    plural=n==1 ? 0 : \
1120           n%10&#62;=2 &#38;&#38; n%10&#60;=4 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1121</PRE>
1122
1123Languages with this property include:
1124
1125<DL COMPACT>
1126
1127<DT>Slavic family
1128<DD>
1129Polish
1130</DL>
1131
1132<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04
1133<DD>
1134The header entry would look like this:
1135
1136
1137<PRE>
1138Plural-Forms: nplurals=4; \
1139    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
1140</PRE>
1141
1142Languages with this property include:
1143
1144<DL COMPACT>
1145
1146<DT>Slavic family
1147<DD>
1148Slovenian
1149</DL>
1150</DL>
1151
1152<P>
1153You might now ask, <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type
1154<SAMP>&lsquo;unsigned long&rsquo;</SAMP>.  What about larger integer types?  What about negative
1155numbers?  What about floating-point numbers?
1156
1157</P>
1158<P>
1159About larger integer types, such as <SAMP>&lsquo;uintmax_t&rsquo;</SAMP> or 
1160<SAMP>&lsquo;unsigned long long&rsquo;</SAMP>: they can be handled by reducing the value to a
1161range that fits in an <SAMP>&lsquo;unsigned long&rsquo;</SAMP>.  Simply casting the value to
1162<SAMP>&lsquo;unsigned long&rsquo;</SAMP> would not do the right thing, since it would treat
1163<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and
1164the like.  Here you can exploit the fact that all mentioned plural form
1165formulas eventually become periodic, with a period that is a divisor of 100
1166(or 1000 or 1000000).  So, when you reduce a large value to another one in
1167the range [1000000, 1999999] that ends in the same 6 decimal digits, you
1168can assume that it will lead to the same plural form selection.  This code
1169does this:
1170
1171</P>
1172
1173<PRE>
1174#include &#60;inttypes.h&#62;
1175uintmax_t nbytes = ...;
1176printf (ngettext ("The file has %"PRIuMAX" byte.",
1177                  "The file has %"PRIuMAX" bytes.",
1178                  (nbytes &#62; ULONG_MAX
1179                   ? (nbytes % 1000000) + 1000000
1180                   : nbytes)),
1181        nbytes);
1182</PRE>
1183
1184<P>
1185Negative and floating-point values usually represent physical entities for
1186which singular and plural don't clearly apply.  In such cases, there is no
1187need to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable
1188for all values will do.  For example:
1189
1190</P>
1191
1192<PRE>
1193printf (gettext ("Time elapsed: %.3f seconds"),
1194        num_milliseconds * 0.001);
1195</PRE>
1196
1197<P>
1198Even if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output
1199
1200<PRE>
1201Time elapsed: 1.000 seconds
1202</PRE>
1203
1204<P>
1205is acceptable in English, and similarly for other languages.
1206
1207</P>
1208
1209
1210<H3><A NAME="SEC189" HREF="gettext_toc.html#TOC189">11.2.7  Optimization of the *gettext functions</A></H3>
1211<P>
1212<A NAME="IDX1060"></A>
1213
1214</P>
1215<P>
1216At this point of the discussion we should talk about an advantage of the
1217GNU <CODE>gettext</CODE> implementation.  Some readers might have pointed out
1218that an internationalized program might have a poor performance if some
1219string has to be translated in an inner loop.  While this is unavoidable
1220when the string varies from one run of the loop to the other it is
1221simply a waste of time when the string is always the same.  Take the
1222following example:
1223
1224</P>
1225
1226<PRE>
1227{
1228  while (...)
1229    {
1230      puts (gettext ("Hello world"));
1231    }
1232}
1233</PRE>
1234
1235<P>
1236When the locale selection does not change between two runs the resulting
1237string is always the same.  One way to use this is:
1238
1239</P>
1240
1241<PRE>
1242{
1243  str = gettext ("Hello world");
1244  while (...)
1245    {
1246      puts (str);
1247    }
1248}
1249</PRE>
1250
1251<P>
1252But this solution is not usable in all situation (e.g. when the locale
1253selection changes) nor does it lead to legible code.
1254
1255</P>
1256<P>
1257For this reason, GNU <CODE>gettext</CODE> caches previous translation results.
1258When the same translation is requested twice, with no new message
1259catalogs being loaded in between, <CODE>gettext</CODE> will, the second time,
1260find the result through a single cache lookup.
1261
1262</P>
1263
1264
1265<H2><A NAME="SEC190" HREF="gettext_toc.html#TOC190">11.3  Comparing the Two Interfaces</A></H2>
1266<P>
1267<A NAME="IDX1061"></A>
1268<A NAME="IDX1062"></A>
1269
1270</P>
1271
1272<P>
1273The following discussion is perhaps a little bit colored.  As said
1274above we implemented GNU <CODE>gettext</CODE> following the Uniforum
1275proposal and this surely has its reasons.  But it should show how we
1276came to this decision.
1277
1278</P>
1279<P>
1280First we take a look at the developing process.  When we write an
1281application using NLS provided by <CODE>gettext</CODE> we proceed as always.
1282Only when we come to a string which might be seen by the users and thus
1283has to be translated we use <CODE>gettext("...")</CODE> instead of
1284<CODE>"..."</CODE>.  At the beginning of each source file (or in a central
1285header file) we define
1286
1287</P>
1288
1289<PRE>
1290#define gettext(String) (String)
1291</PRE>
1292
1293<P>
1294Even this definition can be avoided when the system supports the
1295<CODE>gettext</CODE> function in its C library.  When we compile this code the
1296result is the same as if no NLS code is used.  When  you take a look at
1297the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
1298instead of <CODE>gettext("...")</CODE>.  This reduces the number of
1299additional characters per translatable string to <EM>3</EM> (in words:
1300three).
1301
1302</P>
1303<P>
1304When now a production version of the program is needed we simply replace
1305the definition
1306
1307</P>
1308
1309<PRE>
1310#define _(String) (String)
1311</PRE>
1312
1313<P>
1314by
1315
1316</P>
1317<P>
1318<A NAME="IDX1063"></A>
1319
1320<PRE>
1321#include &#60;libintl.h&#62;
1322#define _(String) gettext (String)
1323</PRE>
1324
1325<P>
1326Additionally we run the program <TT>&lsquo;xgettext&rsquo;</TT> on all source code file
1327which contain translatable strings and that's it: we have a running
1328program which does not depend on translations to be available, but which
1329can use any that becomes available.
1330
1331</P>
1332<P>
1333<A NAME="IDX1064"></A>
1334The same procedure can be done for the <CODE>gettext_noop</CODE> invocations
1335(see section <A HREF="gettext_4.html#SEC23">4.7  Special Cases of Translatable Strings</A>).  One usually defines <CODE>gettext_noop</CODE> as a
1336no-op macro.  So you should consider the following code for your project:
1337
1338</P>
1339
1340<PRE>
1341#define gettext_noop(String) String
1342#define N_(String) gettext_noop (String)
1343</PRE>
1344
1345<P>
1346<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>.  The <TT>&lsquo;Makefile&rsquo;</TT> in
1347the <TT>&lsquo;po/&rsquo;</TT> directory of GNU <CODE>gettext</CODE> knows by default both of the
1348mentioned short forms so you are invited to follow this proposal for
1349your own ease.
1350
1351</P>
1352<P>
1353Now to <CODE>catgets</CODE>.  The main problem is the work for the
1354programmer.  Every time he comes to a translatable string he has to
1355define a number (or a symbolic constant) which has also be defined in
1356the message catalog file.  He also has to take care for duplicate
1357entries, duplicate message IDs etc.  If he wants to have the same
1358quality in the message catalog as the GNU <CODE>gettext</CODE> program
1359provides he also has to put the descriptive comments for the strings and
1360the location in all source code files in the message catalog.  This is
1361nearly a Mission: Impossible.
1362
1363</P>
1364<P>
1365But there are also some points people might call advantages speaking for
1366<CODE>catgets</CODE>.  If you have a single word in a string and this string
1367is used in different contexts it is likely that in one or the other
1368language the word has different translations.  Example:
1369
1370</P>
1371
1372<PRE>
1373printf ("%s: %d", gettext ("number"), number_of_errors)
1374
1375printf ("you should see %d %s", number_count,
1376        number_count == 1 ? gettext ("number") : gettext ("numbers"))
1377</PRE>
1378
1379<P>
1380Here we have to translate two times the string <CODE>"number"</CODE>.  Even
1381if you do not speak a language beside English it might be possible to
1382recognize that the two words have a different meaning.  In German the
1383first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
1384to <CODE>"Zahl"</CODE>.
1385
1386</P>
1387<P>
1388Now you can say that this example is really esoteric.  And you are
1389right!  This is exactly how we felt about this problem and decide that
1390it does not weight that much.  The solution for the above problem could
1391be very easy:
1392
1393</P>
1394
1395<PRE>
1396printf ("%s %d", gettext ("number:"), number_of_errors)
1397
1398printf (number_count == 1 ? gettext ("you should see %d number")
1399                          : gettext ("you should see %d numbers"),
1400        number_count)
1401</PRE>
1402
1403<P>
1404We believe that we can solve all conflicts with this method.  If it is
1405difficult one can also consider changing one of the conflicting string a
1406little bit.  But it is not impossible to overcome.
1407
1408</P>
1409<P>
1410<CODE>catgets</CODE> allows same original entry to have different translations,
1411but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities
1412of this kind: See section <A HREF="gettext_11.html#SEC184">11.2.2  Solving Ambiguities</A>.
1413
1414</P>
1415
1416
1417<H2><A NAME="SEC191" HREF="gettext_toc.html#TOC191">11.4  Using libintl.a in own programs</A></H2>
1418
1419<P>
1420Starting with version 0.9.4 the library <CODE>libintl.h</CODE> should be
1421self-contained.  I.e., you can use it in your own programs without
1422providing additional functions.  The <TT>&lsquo;Makefile&rsquo;</TT> will put the header
1423and the library in directories selected using the <CODE>$(prefix)</CODE>.
1424
1425</P>
1426
1427
1428<H2><A NAME="SEC192" HREF="gettext_toc.html#TOC192">11.5  Being a <CODE>gettext</CODE> grok</A></H2>
1429
1430<P>
1431<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
1432revised.
1433
1434</P>
1435<P>
1436To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
1437is surely helpful to read the source code.  But for those who don't want
1438to spend that much time in reading the (sometimes complicated) code here
1439is a list comments:
1440
1441</P>
1442
1443<UL>
1444<LI>Changing the language at runtime
1445
1446<A NAME="IDX1065"></A>
1447
1448For interactive programs it might be useful to offer a selection of the
1449used language at runtime.  To understand how to do this one need to know
1450how the used language is determined while executing the <CODE>gettext</CODE>
1451function.  The method which is presented here only works correctly
1452with the GNU implementation of the <CODE>gettext</CODE> functions.
1453
1454In the function <CODE>dcgettext</CODE> at every call the current setting of
1455the highest priority environment variable is determined and used.
1456Highest priority means here the following list with decreasing
1457priority:
1458
1459
1460<OL>
1461<LI><CODE>LANGUAGE</CODE>
1462
1463<A NAME="IDX1066"></A>
1464 
1465<A NAME="IDX1067"></A>
1466<LI><CODE>LC_ALL</CODE>
1467
1468<A NAME="IDX1068"></A>
1469<A NAME="IDX1069"></A>
1470<A NAME="IDX1070"></A>
1471<A NAME="IDX1071"></A>
1472<A NAME="IDX1072"></A>
1473<A NAME="IDX1073"></A>
1474<LI><CODE>LC_xxx</CODE>, according to selected locale category
1475
1476<A NAME="IDX1074"></A>
1477<LI><CODE>LANG</CODE>
1478
1479</OL>
1480
1481Afterwards the path is constructed using the found value and the
1482translation file is loaded if available.
1483
1484What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes?  According
1485to the process explained above the new value of this variable is found
1486as soon as the <CODE>dcgettext</CODE> function is called.  But this also means
1487the (perhaps) different message catalog file is loaded.  In other
1488words: the used language is changed.
1489
1490But there is one little hook.  The code for gcc-2.7.0 and up provides
1491some optimization.  This optimization normally prevents the calling of
1492the <CODE>dcgettext</CODE> function as long as no new catalog is loaded.  But
1493if <CODE>dcgettext</CODE> is not called the program also cannot find the
1494<CODE>LANGUAGE</CODE> variable be changed (see section <A HREF="gettext_11.html#SEC189">11.2.7  Optimization of the *gettext functions</A>).  A
1495solution for this is very easy.  Include the following code in the
1496language switching function.
1497
1498
1499<PRE>
1500  /* Change language.  */
1501  setenv ("LANGUAGE", "fr", 1);
1502
1503  /* Make change known.  */
1504  {
1505    extern int  _nl_msg_cat_cntr;
1506    ++_nl_msg_cat_cntr;
1507  }
1508</PRE>
1509
1510<A NAME="IDX1075"></A>
1511The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>&lsquo;loadmsgcat.c&rsquo;</TT>.
1512You don't need to know what this is for.  But it can be used to detect
1513whether a <CODE>gettext</CODE> implementation is GNU gettext and not non-GNU
1514system's native gettext implementation.
1515
1516</UL>
1517
1518
1519
1520<H2><A NAME="SEC193" HREF="gettext_toc.html#TOC193">11.6  Temporary Notes for the Programmers Chapter</A></H2>
1521
1522<P>
1523<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
1524revised.
1525
1526</P>
1527
1528
1529
1530<H3><A NAME="SEC194" HREF="gettext_toc.html#TOC194">11.6.1  Temporary - Two Possible Implementations</A></H3>
1531
1532<P>
1533There are two competing methods for language independent messages:
1534the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
1535method.  The <CODE>catgets</CODE> method indexes messages by integers; the
1536<CODE>gettext</CODE> method indexes them by their English translations.
1537The <CODE>catgets</CODE> method has been around longer and is supported
1538by more vendors.  The <CODE>gettext</CODE> method is supported by Sun,
1539and it has been heard that the COSE multi-vendor initiative is
1540supporting it.  Neither method is a POSIX standard; the POSIX.1
1541committee had a lot of disagreement in this area.
1542
1543</P>
1544<P>
1545Neither one is in the POSIX standard.  There was much disagreement
1546in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
1547vs. <CODE>catgets</CODE> (XPG).  In the end the committee couldn't
1548agree on anything, so no messaging system was included as part
1549of the standard.  I believe the informative annex of the standard
1550includes the XPG3 messaging interfaces, ���...as an example of
1551a messaging system that has been implemented...���
1552
1553</P>
1554<P>
1555They were very careful not to say anywhere that you should use one
1556set of interfaces over the other.  For more on this topic please
1557see the Programming for Internationalization FAQ.
1558
1559</P>
1560
1561
1562<H3><A NAME="SEC195" HREF="gettext_toc.html#TOC195">11.6.2  Temporary - About <CODE>catgets</CODE></A></H3>
1563
1564<P>
1565There have been a few discussions of late on the use of
1566<CODE>catgets</CODE> as a base.  I think it important to present both
1567sides of the argument and hence am opting to play devil's advocate
1568for a little bit.
1569
1570</P>
1571<P>
1572I'll not deny the fact that <CODE>catgets</CODE> could have been designed
1573a lot better.  It currently has quite a number of limitations and
1574these have already been pointed out.
1575
1576</P>
1577<P>
1578However there is a great deal to be said for consistency and
1579standardization.  A common recurring problem when writing Unix
1580software is the myriad portability problems across Unix platforms.
1581It seems as if every Unix vendor had a look at the operating system
1582and found parts they could improve upon.  Undoubtedly, these
1583modifications are probably innovative and solve real problems.
1584However, software developers have a hard time keeping up with all
1585these changes across so many platforms.
1586
1587</P>
1588<P>
1589And this has prompted the Unix vendors to begin to standardize their
1590systems.  Hence the impetus for Spec1170.  Every major Unix vendor
1591has committed to supporting this standard and every Unix software
1592developer waits with glee the day they can write software to this
1593standard and simply recompile (without having to use autoconf)
1594across different platforms.
1595
1596</P>
1597<P>
1598As I understand it, Spec1170 is roughly based upon version 4 of the
1599X/Open Portability Guidelines (XPG4).  Because <CODE>catgets</CODE> and
1600friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
1601is a part of Spec1170 and hence will become a standardized component
1602of all Unix systems.
1603
1604</P>
1605
1606
1607<H3><A NAME="SEC196" HREF="gettext_toc.html#TOC196">11.6.3  Temporary - Why a single implementation</A></H3>
1608
1609<P>
1610Now it seems kind of wasteful to me to have two different systems
1611installed for accessing message catalogs.  If we do want to remedy
1612<CODE>catgets</CODE> deficiencies why don't we try to expand <CODE>catgets</CODE>
1613(in a compatible manner) rather than implement an entirely new system.
1614Otherwise, we'll end up with two message catalog access systems installed
1615with an operating system - one set of routines for packages using GNU
1616<CODE>gettext</CODE> for their internationalization, and another set of routines
1617(catgets) for all other software.  Bloated?
1618
1619</P>
1620<P>
1621Supposing another catalog access system is implemented.  Which do
1622we recommend?  At least for Linux, we need to attract as many
1623software developers as possible.  Hence we need to make it as easy
1624for them to port their software as possible.  Which means supporting
1625<CODE>catgets</CODE>.  We will be implementing the <CODE>libintl</CODE> code
1626within our <CODE>libc</CODE>, but does this mean we also have to incorporate
1627another message catalog access scheme within our <CODE>libc</CODE> as well?
1628And what about people who are going to be using the <CODE>libintl</CODE>
1629+ non-<CODE>catgets</CODE> routines.  When they port their software to
1630other platforms, they're now going to have to include the front-end
1631(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
1632access routines) with their software instead of just including the
1633<CODE>libintl</CODE> code with their software.
1634
1635</P>
1636<P>
1637Message catalog support is however only the tip of the iceberg.
1638What about the data for the other locale categories?  They also have
1639a number of deficiencies.  Are we going to abandon them as well and
1640develop another duplicate set of routines (should <CODE>libintl</CODE>
1641expand beyond message catalog support)?
1642
1643</P>
1644<P>
1645Like many parts of Unix that can be improved upon, we're stuck with balancing
1646compatibility with the past with useful improvements and innovations for
1647the future.
1648
1649</P>
1650
1651
1652<H3><A NAME="SEC197" HREF="gettext_toc.html#TOC197">11.6.4  Temporary - Notes</A></H3>
1653
1654<P>
1655X/Open agreed very late on the standard form so that many
1656implementations differ from the final form.  Both of my system (old
1657Linux catgets and Ultrix-4) have a strange variation.
1658
1659</P>
1660<P>
1661OK.  After incorporating the last changes I have to spend some time on
1662making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions.  So in future
1663Solaris is not the only system having <CODE>gettext</CODE>.
1664
1665</P>
1666<P><HR><P>
1667Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
1668</BODY>
1669</HTML>
1670