<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 29 December 2011 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 11 The Programmer's View</TITLE>
</HEAD>
<BODY>


<H1><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11 The Programmer's View</A></H1>

<P>
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so. So we should perhaps first take a look at
the solutions we know about. The people on the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below. In fact they couldn't agree on anything, so they decided
only to include an example of an interface. The major Unix vendors
are split between the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface. We'll describe them both and
later explain our solution to this dilemma.

</P>



<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.1 About <CODE>catgets</CODE></A></H2>
<P>
<A NAME="IDX1027"></A>

</P>
<P>
The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this again leads to problems when
writing platform-independent programs: even the use of <CODE>catgets</CODE>
does not guarantee a uniform interface.
</P>
<P>
Another, personal comment on this: only a bunch of committee members
could have made this interface. They never really tried to program
using this interface. It is a fast, memory-saving implementation; a
user can happily live with it. But programmers hate it (at least I and
some others do...)

</P>
<P>
But we must not forget one point: after all the trouble with transferring
the rights on Unix(tm) they at last came to X/Open, the very same who
published this specification. This leads me to predict
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (implementations which are
<EM>allowed</EM> to wear this name).

</P>



<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.1.1 The Interface</A></H3>
<P>
<A NAME="IDX1028"></A>

</P>
<P>
The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> for accessing the message
tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes
for the functions and the needed definitions are in the
<CODE>&lt;nl_types.h&gt;</CODE> header file.

</P>
<P>
<A NAME="IDX1029"></A>
<CODE>catopen</CODE> is used like this:

</P>

<PRE>
nl_catd catd = catopen ("catalog_name", 0);
</PRE>

<P>
The function takes as its first argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use <CODE>0</CODE> as the value. The return value is a handle to the
message catalog, equivalent to the file handles returned by <CODE>open</CODE>.
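</P>

<P>
As a minimal sketch of how this call might be wrapped in a program, the
following helper opens a catalog and checks the result. The helper name
<CODE>open_catalog</CODE> is hypothetical, and the check relies on an assumption
that is not spelled out above: that <CODE>catopen</CODE> reports failure by
returning <CODE>(nl_catd) -1</CODE>.

</P>

<PRE>
#include &lt;nl_types.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical helper: open the catalog NAME and store its handle in
   *CATD_OUT.  Returns 0 on success and -1 on failure.  Assumes that
   catopen signals failure by returning (nl_catd) -1.  */
static int
open_catalog (const char *name, nl_catd *catd_out)
{
  nl_catd catd = catopen (name, 0);

  if (catd == (nl_catd) -1)
    {
      perror ("catopen");
      return -1;
    }
  *catd_out = catd;
  return 0;
}
</PRE>

<P>
A caller would keep the handle stored through <CODE>catd_out</CODE> for as long as
messages from the catalog are needed.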
</P>
<P>
<A NAME="IDX1030"></A>
This handle is of course used in the <CODE>catgets</CODE> function which can
be used like this:

</P>

<PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
</PRE>

<P>
The first parameter is this catalog descriptor. The second parameter
specifies the set of messages in this catalog in which the message
described by <CODE>msg_id</CODE> is looked up. <CODE>catgets</CODE> therefore uses a
three-stage addressing:

</P>

<PRE>
catalog name => set number => message ID => translation
</PRE>

<P>
The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of <CODE>catgets</CODE>
is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed. It
should really be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.

</P>
<P>
<A NAME="IDX1031"></A>
The last of these functions is used and behaves as expected:

</P>

<PRE>
catclose (catd);
</PRE>

<P>
After this, no <CODE>catgets</CODE> call using the descriptor is legal anymore.

</P>


<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
<P>
<A NAME="IDX1032"></A>

</P>
<P>
Now that this description seems so easy -- where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value, unique among all messages in a single
set. Perhaps you can imagine the problems of keeping such a list while
changing the source code.
Add a new message here, remove one there. Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.

</P>


<H2><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.2 About <CODE>gettext</CODE></A></H2>
<P>
<A NAME="IDX1033"></A>

</P>
<P>
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal. It was submitted there by Sun, who had implemented the
<CODE>gettext</CODE> function in SunOS 4, around 1990. Nowadays, the
<CODE>gettext</CODE> interface is specified by the OpenI18N standard.

</P>
<P>
The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here too, but this key is the message
itself (however long or short it is). See section <A HREF="gettext_11.html#SEC190">11.3 Comparing the Two Interfaces</A> for a more
detailed comparison of the two methods.

</P>
<P>
The following section contains a rather detailed description of the
interface. We make it that detailed because this is the interface
we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
in using this library will be interested in this description.

</P>



<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.2.1 The Interface</A></H3>
<P>
<A NAME="IDX1034"></A>

</P>
<P>
The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance is difficult,
perhaps impossible) and b) to access a string in a selected domain.
</P>
<P>
This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.

</P>

<PRE>
char *textdomain (const char *domain_name);
</PRE>

<P>
This provides the possibility to change or query the current status of
the current global domain of the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string, whose characters must be legal for
use in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE> no changing is allowed. It is also important to know
that no checks of the availability are made. If the name is not
available you will see this by the fact that no translations are provided.

</P>
<P>
To use a domain set by <CODE>textdomain</CODE> the function

</P>

<PRE>
char *gettext (const char *msgid);
</PRE>

<P>
is to be used. This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain. If it is not available, the argument itself is
returned. If the argument is <CODE>NULL</CODE> the result is undefined.

</P>
<P>
One thing which should come to mind is that no explicit dependency on
the used domain is given. The current value of the domain is used.
If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
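</P>

<P>
The two calls might be combined as in the following sketch. The domain
name <CODE>hello-example</CODE> and both helper names are hypothetical. The
call to <CODE>setlocale</CODE> is an assumption of this example; it is there
so that the locale is taken from the user's environment before any lookup.

</P>

<PRE>
#include &lt;libintl.h&gt;
#include &lt;locale.h&gt;

/* Hypothetical package name; a real program would use its own.  */
#define EXAMPLE_DOMAIN "hello-example"

/* Take the locale from the environment and make EXAMPLE_DOMAIN the
   current global domain.  Returns the domain name now in effect,
   which must not be modified.  */
static const char *
init_translations (void)
{
  setlocale (LC_ALL, "");
  return textdomain (EXAMPLE_DOMAIN);
}

/* Look up a message in the current domain.  When no catalog for
   EXAMPLE_DOMAIN is available, gettext returns its argument.  */
static const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
</PRE>

<P>
Since no check of the domain's availability is made, a program built this
way simply prints the untranslated string when no catalog is installed.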
</P>
<P>
In the easiest case, which is the normal one for internationalized
packages, a call to <CODE>textdomain</CODE> is issued once at the beginning of
execution, setting the domain to a unique name, normally the package
name. In the code that follows, all strings which have to be translated are
filtered through the <CODE>gettext</CODE> function. That's all; the package speaks
your language.

</P>


<H3><A NAME="SEC184" HREF="gettext_toc.html#TOC184">11.2.2 Solving Ambiguities</A></H3>
<P>
<A NAME="IDX1035"></A>
<A NAME="IDX1036"></A>
<A NAME="IDX1037"></A>

</P>
<P>
While this single-domain approach works well for most applications there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. A
possible situation could be one case subject to discussion during this
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. By this means we would only need
to translate them once.
Another case is messages from a library, as these <EM>have</EM> to be
independent of the current domain set by the application.

</P>
<P>
For these reasons there are two more functions to retrieve strings:

</P>

<PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
</PRE>

<P>
Both take an additional argument in the first position, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows using a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful.
If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.

</P>
<P>
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.

</P>

<PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
</PRE>

<P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be when solely using <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned. Here
again, as for all the other functions, none of the return
values may be changed!

</P>
<P>
It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can be trouble. Since the path is always
computed relative to the current directory, different results will be
achieved when the program executes a <CODE>chdir</CODE> command. Relative
paths should always be avoided to avoid dependencies and
unreliabilities.

</P>


<H3><A NAME="SEC185" HREF="gettext_toc.html#TOC185">11.2.3 Locating Message Catalog Files</A></H3>
<P>
<A NAME="IDX1038"></A>

</P>
<P>
Because many different languages for many different packages have to be
stored we need some way to add this information to the message catalog
files.
The way usually used in Unix environments is to have this encoding
in the file name. This is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the name of the locale, the locale category, and the domain name
are concatenated:

</P>

<PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
</PRE>

<P>
The default value for <VAR>dir_name</VAR> is system specific. For the GNU
library, and for packages adhering to its conventions, it's:

<PRE>
/usr/local/share/locale
</PRE>

<P>
<VAR>locale</VAR> is the name of the locale category which is designated by
<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
The name of the locale category is determined through
<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.
<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A>
When using the function <CODE>dcgettext</CODE>, you can specify the locale category
through the third argument.

</P>


<H3><A NAME="SEC186" HREF="gettext_toc.html#TOC186">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A></H3>
<P>
<A NAME="IDX1039"></A>
<A NAME="IDX1040"></A>

</P>
<P>
<CODE>gettext</CODE> not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.
</P>
<P>
The output character set is, by default, the value of <CODE>nl_langinfo
(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
locale. But programs which store strings in a locale independent way
(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
return the translations in that encoding, by use of the
<CODE>bind_textdomain_codeset</CODE> function.

</P>
<P>
Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
character set conversion. Also, when <CODE>gettext</CODE> does not find a
translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged --
independently of the current output character set. It is therefore
recommended that all <VAR>msgid</VAR>s be US-ASCII strings.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I>
<DD><A NAME="IDX1041"></A>
The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
output character set for message catalogs for domain <VAR>domainname</VAR>.
The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
for the <CODE>iconv_open</CODE> function, or a null pointer.

</P>
<P>
If the <VAR>codeset</VAR> parameter is the null pointer,
<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
for the domain with the name <VAR>domainname</VAR>. It returns <CODE>NULL</CODE> if
no codeset has yet been selected.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function can be used several times.
If used multiple times with the same <VAR>domainname</VAR> argument, the
later call overrides the settings made by the earlier one.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
string containing the name of the selected codeset.
The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
global variable <VAR>errno</VAR> is set accordingly.
</DL>

</P>


<H3><A NAME="SEC187" HREF="gettext_toc.html#TOC187">11.2.5 Using contexts for solving ambiguities</A></H3>
<P>
<A NAME="IDX1042"></A>
<A NAME="IDX1043"></A>
<A NAME="IDX1044"></A>
<A NAME="IDX1045"></A>

</P>
<P>
One place where the <CODE>gettext</CODE> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts the
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.

</P>
<P>
As a consequence many people say that the <CODE>gettext</CODE> approach is
wrong and instead <CODE>catgets</CODE> should be used which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the <CODE>gettext</CODE> functions.

</P>
<P>
Contexts can be added to strings to be translated. A context-dependent
translation lookup is a search for the translation of a given string
that is limited to a given context. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.
</P>
<P>
The <TT>‘gettext.h’</TT> include file contains the lookup macros for strings
with contexts. They are implemented as thin macros and inline functions
over the functions from <CODE>&lt;libintl.h&gt;</CODE>.

</P>
<P>
<A NAME="IDX1046"></A>

<PRE>
const char *pgettext (const char *msgctxt, const char *msgid);
</PRE>

<P>
In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
literals. The macro returns the translation of <VAR>msgid</VAR>, restricted
to the context given by <VAR>msgctxt</VAR>.

</P>
<P>
The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change a <VAR>msgctxt</VAR>, the translator will have to review
the translation of <VAR>msgid</VAR>.

</P>
<P>
Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
be hard. But you shouldn't use the file name or class name containing the
<CODE>pgettext</CODE> call -- because it is a common development task to rename
a file or a class, and it shouldn't cause translator work. Also you shouldn't
use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR> --
because orthography or grammar changes are often applied to such sentences,
and again, it shouldn't force the translator to do a review.

</P>
<P>
The <SAMP>‘p’</SAMP> in <SAMP>‘pgettext’</SAMP> stands for “particular”: <CODE>pgettext</CODE>
fetches a particular translation of the <VAR>msgid</VAR>.

</P>
<P>
<A NAME="IDX1047"></A>
<A NAME="IDX1048"></A>

<PRE>
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
</PRE>

<P>
These are generalizations of <CODE>pgettext</CODE>.
They behave similarly to
<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively. The <VAR>domain_name</VAR>
argument defines the translation domain. The <VAR>category</VAR> argument
allows using a locale category other than <CODE>LC_MESSAGES</CODE>.

</P>
<P>
As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:

</P>

<PRE>
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
</PRE>

<P>
To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated there has to be
at some point in the code a call to a function of the <CODE>gettext</CODE>
family. But in two places the string passed into the function would be
<CODE>Open</CODE>. The translations might not be the same and therefore we
are in the dilemma described above.

</P>
<P>
What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

</P>

<PRE>
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
</PRE>

<P>
The context is thus the menu path without its last part. So, the calls
look like this:

</P>

<PRE>
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
</PRE>

<P>
Whether or not to use the <SAMP>‘|’</SAMP> character at the end of the context is a
matter of style.
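</P>

<P>
To illustrate how such a context-restricted lookup can sit on top of plain
<CODE>gettext</CODE>, here is a rough sketch that needs only
<CODE>&lt;libintl.h&gt;</CODE>. The helper name <CODE>context_gettext</CODE> is
hypothetical. The sketch also assumes something this section does not state:
that the catalog key for a message with context is the context, the byte
<CODE>'\004'</CODE>, and then the message itself. It is not the code from
<TT>‘gettext.h’</TT>.

</P>

<PRE>
#include &lt;libintl.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: look up MSGID within MSGCTXT in the current
   domain.  Assumes the catalog key is MSGCTXT, the byte '\004', then
   MSGID.  Returns MSGID itself when no translation is found.  */
static const char *
context_gettext (const char *msgctxt, const char *msgid)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_len = strlen (msgid);
  char *key = malloc (ctxt_len + 1 + id_len + 1);
  const char *translation;

  if (key == NULL)
    return msgid;
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';
  /* Copy MSGID together with its terminating null byte.  */
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);

  translation = gettext (key);
  /* gettext hands back its argument when nothing is found; in that
     case fall back to MSGID rather than to the glued key.  */
  if (translation == key)
    translation = msgid;
  free (key);
  return translation;
}
</PRE>

<P>
With such a helper, <CODE>context_gettext ("Menu|File|", "Open")</CODE> and
<CODE>context_gettext ("Menu|Printer|", "Open")</CODE> are looked up under
different keys, so a catalog can supply a different translation for each.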
</P>
<P>
For more complex cases, where the <VAR>msgctxt</VAR> or <VAR>msgid</VAR> are not
string literals, more general macros are available:

</P>
<P>
<A NAME="IDX1049"></A>
<A NAME="IDX1050"></A>
<A NAME="IDX1051"></A>

<PRE>
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
</PRE>

<P>
Here <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
These macros are more general. But in the case that both argument expressions
are string literals, the macros without the <SAMP>‘_expr’</SAMP> suffix are more
efficient.

</P>


<H3><A NAME="SEC188" HREF="gettext_toc.html#TOC188">11.2.6 Additional functions for plural forms</A></H3>
<P>
<A NAME="IDX1052"></A>

</P>
<P>
The functions of the <CODE>gettext</CODE> family described so far (and all the
<CODE>catgets</CODE> functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

</P>
<P>
Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

</P>

<PRE>
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</PRE>

<P>
After the first complaints from people internationalizing the code people
either completely avoided formulations like this or used strings like
<CODE>"file(s)"</CODE>. Both look unnatural and should be avoided.
First
tries to solve the problem correctly looked like this:

</P>

<PRE>
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
</PRE>

<P>
But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an
“s”,
but that is all. Once again people fell into the trap of believing the
rules their language is using are universal. But the handling of plural
forms differs widely between the language families. For example,
Rafal Maszkowski <CODE>&lt;rzm@mat.uni.torun.pl&gt;</CODE> reports:

</P>

<BLOCKQUOTE>
<P>
In Polish we use e.g. plik (file) this way:

<PRE>
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
</PRE>

<P>
and so on (o' means 8859-2 oacute which should be rather okreska,
similar to aogonek).
</BLOCKQUOTE>

<P>
There are two things which can differ between languages (and even inside
language families):

</P>

<UL>
<LI>

The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
“s”)
is hardly found in German.

<LI>

The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this in an extra section.
</UL>

<P>
The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments.
Instead the
extended <CODE>gettext</CODE> interface should be used.

</P>
<P>
These extra functions take two strings and a numerical argument
instead of the one key string. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
<CODE>gettext</CODE> behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

</P>
<P>
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU <CODE>gettext</CODE> package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1053"></A>
The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The <VAR>msgid1</VAR> parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The <VAR>msgid2</VAR> parameter is the plural form.
The parameter <VAR>n</VAR> is used to determine the plural form. If no
message catalog is found <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
otherwise <CODE>msgid2</CODE>.
</P>
<P>
An example for the use of this function is:

</P>

<PRE>
printf (ngettext ("%d file removed", "%d files removed", n), n);
</PRE>

<P>
Please note that the numeric value <VAR>n</VAR> has to be passed to the
<CODE>printf</CODE> function as well. It is not sufficient to pass it only to
<CODE>ngettext</CODE>.

</P>
<P>
In the English singular case, the number -- always 1 -- can be replaced with
"one":

</P>

<PRE>
printf (ngettext ("One file removed", "%d files removed", n), n);
</PRE>

<P>
This works because the <SAMP>‘printf’</SAMP> function discards excess arguments that
are not consumed by the format string.

</P>
<P>
It is also possible to use this function when the strings don't contain a
cardinal number:

</P>

<PRE>
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
</PRE>

<P>
In this case the number <VAR>n</VAR> is only used to choose the plural form.
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1054"></A>
The <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I>
<DD><A NAME="IDX1055"></A>
The <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

</P>
<P>
Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are formed or whether the number can increase with
every new supported language.

</P>
<P>
Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages).

</P>
<P>
<A NAME="IDX1056"></A>
<A NAME="IDX1057"></A>
<A NAME="IDX1058"></A>
The information about the plural form selection has to be stored in the
header entry of the PO file (the one with the empty <CODE>msgid</CODE> string).
The plural form information looks like this:

</P>

<PRE>
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
</PRE>

<P>
The <CODE>nplurals</CODE> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <CODE>plural</CODE> is an expression which uses the C language
syntax.
Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is <CODE>n</CODE>.  Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only.  This expression will be evaluated whenever one of the functions
<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called.  The
numeric value passed to these functions is then substituted for all uses
of the variable <CODE>n</CODE> in the expression.  The resulting value
must be greater than or equal to zero and smaller than the value of
<CODE>nplurals</CODE>.

</P>
<P>
<A NAME="IDX1059"></A>
The following rules are known at this point.  The languages are listed
together with their families.  But this does not necessarily mean the
information can be generalized for the whole family (as can be easily
seen in the table below).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>

</P>
<DL COMPACT>

<DT>Only one form:
<DD>
Some languages only require one single form.  There is no distinction
between the singular and plural form.  An appropriate header entry
would look like this:


<PRE>
Plural-Forms: nplurals=1; plural=0;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Asian family
<DD>
Japanese, Korean, Vietnamese
<DT>Turkic/Altaic family
<DD>
Turkish
</DL>

<DT>Two forms, singular used for one only
<DD>
This is the form used in most existing programs since it is what English
is using.  A header entry would look like this:


<PRE>
Plural-Forms: nplurals=2; plural=n != 1;
</PRE>

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
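<P>
To make the evaluation concrete, the following sketch writes three of the
expressions from this section as plain C functions.  The function names
are made up for this illustration, and the code only mimics what happens
to the expression; it does not claim to show how GNU <CODE>gettext</CODE>
evaluates it internally.
</P>

```c
/* Plural-Forms: nplurals=2; plural=n != 1;
   A boolean C expression yields 0 (singular) or 1 (plural).  */
unsigned long int
plural_two_forms (unsigned long int n)
{
  return n != 1;
}

/* Plural-Forms: nplurals=2; plural=n>1;
   Zero and one both select form 0.  */
unsigned long int
plural_zero_one_singular (unsigned long int n)
{
  return n > 1;
}

/* Plural-Forms: nplurals=3; the three-form rule listed in this section
   for Croatian, Serbian, Russian, and Ukrainian.  */
unsigned long int
plural_three_forms_slavic (unsigned long int n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

<P>
For <VAR>n</VAR> = 21 the three-form expression yields 0, for <VAR>n</VAR> = 22 it
yields 1, and for <VAR>n</VAR> = 11 or 12 it yields 2.  The result is the index
of the plural form that is selected.
</P>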

Languages with this property include:

<DL COMPACT>

<DT>Germanic family
<DD>
Danish, Dutch, English, Faroese, German, Norwegian, Swedish
<DT>Finno-Ugric family
<DD>
Estonian, Finnish
<DT>Latin/Greek family
<DD>
Greek
<DT>Semitic family
<DD>
Hebrew
<DT>Romanic family
<DD>
Italian, Portuguese, Spanish
<DT>Artificial
<DD>
Esperanto
</DL>

Another language using the same header entry is:

<DL COMPACT>

<DT>Finno-Ugric family
<DD>
Hungarian
</DL>

Hungarian does not appear to have a plural if you look at sentences involving
cardinal numbers.  For example, “1 apple” is “1 alma”, and “123 apples” is
“123 alma”.  But when the number is not explicit, the distinction between
singular and plural exists: “the apple” is “az alma”, and “the apples” is
“az almák”.  Since <CODE>ngettext</CODE> has to support both types of sentences,
Hungarian is classified here, under “two forms”.

<DT>Two forms, singular used for zero and one
<DD>
Exceptional case in the language family.  The header entry would be:


<PRE>
Plural-Forms: nplurals=2; plural=n>1;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
French, Brazilian Portuguese
</DL>

<DT>Three forms, special case for zero
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Latvian
</DL>

<DT>Three forms, special cases for one and two
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? \
    1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Celtic
<DD>
Gaeilge (Irish)
</DL>

<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
Romanian
</DL>

<DT>Three forms, special case for numbers ending in 1[2-9]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Lithuanian
</DL>

<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Croatian, Serbian, Russian, Ukrainian
</DL>

<DT>Three forms, special cases for 1 and 2, 3, 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovak, Czech
</DL>

<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? \
           0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Polish
</DL>

<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovenian
</DL>
</DL>

<P>
You might now ask: <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type
<SAMP>‘unsigned long’</SAMP>.  What about larger integer types?  What about negative
numbers?  What about floating-point numbers?

</P>
<P>
About larger integer types, such as <SAMP>‘uintmax_t’</SAMP> or
<SAMP>‘unsigned long long’</SAMP>: they can be handled by reducing the value to a
range that fits in an <SAMP>‘unsigned long’</SAMP>.  Simply casting the value to
<SAMP>‘unsigned long’</SAMP> would not do the right thing, since it would treat
<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and
the like.  Here you can exploit the fact that all mentioned plural form
formulas eventually become periodic, with a period that is a divisor of 100
(or 1000 or 1000000).  So, when you reduce a large value to another one in
the range [1000000, 1999999] that ends in the same 6 decimal digits, you
can assume that it will lead to the same plural form selection.  This code
does this:

</P>

<PRE>
#include <inttypes.h>
#include <limits.h>
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ?
                     (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
</PRE>

<P>
Negative and floating-point values usually represent physical entities for
which singular and plural don't clearly apply.  In such cases, there is no
need to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable
for all values will do.  For example:

</P>

<PRE>
printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);
</PRE>

<P>
Even if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output

<PRE>
Time elapsed: 1.000 seconds
</PRE>

<P>
is acceptable in English, and similarly for other languages.

</P>


<H3><A NAME="SEC189" HREF="gettext_toc.html#TOC189">11.2.7 Optimization of the *gettext functions</A></H3>
<P>
<A NAME="IDX1060"></A>

</P>
<P>
At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation.  Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one run of the loop to the next, it is
simply a waste of time when the string is always the same.  Take the
following example:

</P>

<PRE>
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
</PRE>

<P>
When the locale selection does not change between two runs the resulting
string is always the same.  One way to exploit this is:

</P>

<PRE>
{
  str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
</PRE>

<P>
But this solution is not usable in all situations (e.g. when the locale
selection changes) nor does it lead to legible code.

</P>
<P>
For this reason, GNU <CODE>gettext</CODE> caches previous translation results.
When the same translation is requested twice, with no new message
catalogs being loaded in between, <CODE>gettext</CODE> will, the second time,
find the result through a single cache lookup.

</P>


<H2><A NAME="SEC190" HREF="gettext_toc.html#TOC190">11.3 Comparing the Two Interfaces</A></H2>
<P>
<A NAME="IDX1061"></A>
<A NAME="IDX1062"></A>

</P>

<P>
The following discussion is perhaps a little bit colored.  As said
above, we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal, and this surely has its reasons.  But it should show how we
came to this decision.

</P>
<P>
First we take a look at the development process.  When we write an
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>.  At the beginning of each source file (or in a central
header file) we define

</P>

<PRE>
#define gettext(String) (String)
</PRE>

<P>
Even this definition can be avoided when the system supports the
<CODE>gettext</CODE> function in its C library.  When we compile this code the
result is the same as if no NLS code were used.  When you take a look at
the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
instead of <CODE>gettext("...")</CODE>.  This reduces the number of
additional characters per translatable string to <EM>3</EM> (in words:
three).
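
</P>
<P>
The dummy definition above and a real one that calls <CODE>gettext</CODE> can
also be kept side by side in one central header and selected at compile
time.  The following is only a sketch: it assumes a preprocessor symbol
<CODE>ENABLE_NLS</CODE> that the build system defines when NLS is wanted
(packages using the GNU <CODE>gettext</CODE> Autoconf macros conventionally use
this name), and the function <CODE>greeting</CODE> is a hypothetical example.

```c
#ifdef ENABLE_NLS
/* NLS enabled: the short form calls the real gettext function.  */
# include <libintl.h>
# define _(String) gettext (String)
#else
/* NLS disabled: the macro leaves the string untouched.  */
# define _(String) (String)
#endif

/* Hypothetical example function with one translatable string.  */
const char *
greeting (void)
{
  return _("Hello world");
}
```

Without <CODE>ENABLE_NLS</CODE> the function simply returns the original
English string, so the program behaves as if no NLS code were present.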

</P>
<P>
When a production version of the program is needed we simply replace
the definition

</P>

<PRE>
#define _(String) (String)
</PRE>

<P>
by

</P>
<P>
<A NAME="IDX1063"></A>

<PRE>
#include <libintl.h>
#define _(String) gettext (String)
</PRE>

<P>
Additionally we run the program <TT>‘xgettext’</TT> on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that becomes available.

</P>
<P>
<A NAME="IDX1064"></A>
The same procedure can be done for the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext_4.html#SEC23">4.7 Special Cases of Translatable Strings</A>).  One usually defines <CODE>gettext_noop</CODE> as a
no-op macro.  So you should consider the following code for your project:

</P>

<PRE>
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
</PRE>

<P>
<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>.  The <TT>‘Makefile’</TT> in
the <TT>‘po/’</TT> directory of GNU <CODE>gettext</CODE> knows by default both of the
mentioned short forms so you are invited to follow this proposal for
your own ease.

</P>
<P>
Now to <CODE>catgets</CODE>.  The main problem is the work for the
programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
entries, duplicate message IDs etc.  If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog.
This is
nearly a Mission: Impossible.

</P>
<P>
But there are also some points people might call advantages speaking for
<CODE>catgets</CODE>.  If you have a single word in a string and this string
is used in different contexts it is likely that in one or the other
language the word has different translations.  Example:

</P>

<PRE>
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
</PRE>

<P>
Here we have to translate the string <CODE>"number"</CODE> twice.  Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning.  In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.

</P>
<P>
Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem and decided that
it does not weigh that much.  The solution for the above problem could
be very easy:

</P>

<PRE>
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
</PRE>

<P>
We believe that we can solve all conflicts with this method.  If it is
difficult one can also consider changing one of the conflicting strings a
little bit.  But it is not impossible to overcome.

</P>
<P>
<CODE>catgets</CODE> allows the same original entry to have different translations,
but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities
of this kind: See section <A HREF="gettext_11.html#SEC184">11.2.2 Solving Ambiguities</A>.
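
</P>
<P>
That approach attaches a context to the msgid.  For the
<CODE>"number"</CODE> example it could be used as sketched here.  In a real
project one would include the <TT>‘gettext.h’</TT> file shipped with GNU
<CODE>gettext</CODE> and call its <CODE>pgettext</CODE> macro; the sketch below only
mimics that macro so that it is self-contained.  The names <CODE>C_</CODE>,
<CODE>ctx_gettext</CODE>, and <CODE>number_label</CODE> and the two context strings
are assumptions made for this illustration.

```c
#include <libintl.h>

/* The context and the msgid are joined by the EOT byte to form the
   lookup key, as the pgettext macro of GNU gettext does.  */
#define CONTEXT_GLUE "\004"

/* Looks up "context\004msgid".  If the catalog has no entry, gettext
   returns its argument unchanged and we fall back to the bare msgid.  */
const char *
ctx_gettext (const char *ctxt_and_id, const char *msgid)
{
  const char *translation = gettext (ctxt_and_id);
  return translation == ctxt_and_id ? msgid : translation;
}

/* Hypothetical short form: context first, then the msgid.  */
#define C_(Ctxt, Id) ctx_gettext (Ctxt CONTEXT_GLUE Id, Id)

/* The same English word, looked up under two different contexts.  */
const char *
number_label (int as_count)
{
  return as_count ? C_("quantity", "number") : C_("numeral", "number");
}
```

A German translator could then translate the first entry to
<CODE>"Anzahl"</CODE> and the second to <CODE>"Zahl"</CODE>, while the untranslated
program still prints <CODE>"number"</CODE> in both places.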

</P>


<H2><A NAME="SEC191" HREF="gettext_toc.html#TOC191">11.4 Using libintl.a in own programs</A></H2>

<P>
Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be
self-contained.  I.e., you can use it in your own programs without
providing additional functions.  The <TT>‘Makefile’</TT> will put the header
and the library in directories selected using the <CODE>$(prefix)</CODE>.

</P>


<H2><A NAME="SEC192" HREF="gettext_toc.html#TOC192">11.5 Being a <CODE>gettext</CODE> grok</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.

</P>
<P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:

</P>

<UL>
<LI>Changing the language at runtime

<A NAME="IDX1065"></A>

For interactive programs it might be useful to offer a selection of the
used language at runtime.  To understand how to do this one needs to know
how the used language is determined while executing the <CODE>gettext</CODE>
function.  The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.

In the function <CODE>dcgettext</CODE> at every call the current setting of
the highest priority environment variable is determined and used.
Highest priority means here the following list with decreasing
priority:


<OL>
<LI><CODE>LANGUAGE</CODE>

<A NAME="IDX1066"></A>

<A NAME="IDX1067"></A>
<LI><CODE>LC_ALL</CODE>

<A NAME="IDX1068"></A>
<A NAME="IDX1069"></A>
<A NAME="IDX1070"></A>
<A NAME="IDX1071"></A>
<A NAME="IDX1072"></A>
<A NAME="IDX1073"></A>
<LI><CODE>LC_xxx</CODE>, according to selected locale category

<A NAME="IDX1074"></A>
<LI><CODE>LANG</CODE>

</OL>

Afterwards the path is constructed using the found value and the
translation file is loaded if available.

What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes?  According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called.  But this also means
the (perhaps) different message catalog file is loaded.  In other
words: the used language is changed.

But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded.  But
if <CODE>dcgettext</CODE> is not called the program also cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_11.html#SEC189">11.2.7 Optimization of the *gettext functions</A>).  A
solution for this is very easy.  Include the following code in the
language switching function.


<PRE>
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
</PRE>

<A NAME="IDX1075"></A>
The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>‘loadmsgcat.c’</TT>.
You don't need to know what this is for.
But it can be used to detect
whether a <CODE>gettext</CODE> implementation is GNU gettext and not a non-GNU
system's native gettext implementation.

</UL>



<H2><A NAME="SEC193" HREF="gettext_toc.html#TOC193">11.6 Temporary Notes for the Programmers Chapter</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.

</P>



<H3><A NAME="SEC194" HREF="gettext_toc.html#TOC194">11.6.1 Temporary - Two Possible Implementations</A></H3>

<P>
There are two competing methods for language independent messages:
the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
method.  The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their English translations.
The <CODE>catgets</CODE> method has been around longer and is supported
by more vendors.  The <CODE>gettext</CODE> method is supported by Sun,
and it has been heard that the COSE multi-vendor initiative is
supporting it.  Neither method is a POSIX standard; the POSIX.1
committee had a lot of disagreement in this area.

</P>
<P>
Neither one is in the POSIX standard.  There was much disagreement
in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
vs. <CODE>catgets</CODE> (XPG).  In the end the committee couldn't
agree on anything, so no messaging system was included as part
of the standard.  I believe the informative annex of the standard
includes the XPG3 messaging interfaces, “...as an example of
a messaging system that has been implemented...”

</P>
<P>
They were very careful not to say anywhere that you should use one
set of interfaces over the other.  For more on this topic please
see the Programming for Internationalization FAQ.

</P>


<H3><A NAME="SEC195" HREF="gettext_toc.html#TOC195">11.6.2 Temporary - About <CODE>catgets</CODE></A></H3>

<P>
There have been a few discussions of late on the use of
<CODE>catgets</CODE> as a base.  I think it important to present both
sides of the argument and hence am opting to play devil's advocate
for a little bit.

</P>
<P>
I'll not deny the fact that <CODE>catgets</CODE> could have been designed
a lot better.  It currently has quite a number of limitations and
these have already been pointed out.

</P>
<P>
However there is a great deal to be said for consistency and
standardization.  A common recurring problem when writing Unix
software is the myriad portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon.  Undoubtedly, these
modifications are innovative and solve real problems.
However, software developers have a hard time keeping up with all
these changes across so many platforms.

</P>
<P>
And this has prompted the Unix vendors to begin to standardize their
systems.  Hence the impetus for Spec1170.  Every major Unix vendor
has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
standard and simply recompile (without having to use autoconf)
across different platforms.

</P>
<P>
As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guidelines (XPG4).  Because <CODE>catgets</CODE> and
friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
is a part of Spec1170 and hence will become a standardized component
of all Unix systems.

</P>


<H3><A NAME="SEC196" HREF="gettext_toc.html#TOC196">11.6.3 Temporary - Why a single implementation</A></H3>

<P>
Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs.  If we do want to remedy
the deficiencies of <CODE>catgets</CODE>, why don't we try to expand <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system - one set of routines for packages using GNU
<CODE>gettext</CODE> for their internationalization, and another set of routines
(catgets) for all other software.  Bloated?

</P>
<P>
Supposing another catalog access system is implemented.  Which do
we recommend?  At least for Linux, we need to attract as many
software developers as possible.  Hence we need to make it as easy
for them to port their software as possible.  Which means supporting
<CODE>catgets</CODE>.  We will be implementing the <CODE>libintl</CODE> code
within our <CODE>libc</CODE>, but does this mean we also have to incorporate
another message catalog access scheme within our <CODE>libc</CODE> as well?
And what about people who are going to be using the <CODE>libintl</CODE>
+ non-<CODE>catgets</CODE> routines?  When they port their software to
other platforms, they're now going to have to include the front-end
(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
access routines) with their software instead of just including the
<CODE>libintl</CODE> code with their software.

</P>
<P>
Message catalog support is however only the tip of the iceberg.
What about the data for the other locale categories?  They also have
a number of deficiencies.
Are we going to abandon them as well and
develop another duplicate set of routines (should <CODE>libintl</CODE>
expand beyond message catalog support)?

</P>
<P>
As with many parts of Unix that can be improved upon, we're stuck with balancing
compatibility with the past against useful improvements and innovations for
the future.

</P>


<H3><A NAME="SEC197" HREF="gettext_toc.html#TOC197">11.6.4 Temporary - Notes</A></H3>

<P>
X/Open agreed very late on the standard form so that many
implementations differ from the final form.  Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

</P>
<P>
OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions.  So in the future
Solaris will not be the only system having <CODE>gettext</CODE>.

</P>
</BODY>
</HTML>