<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 29 December 2011 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 4 Preparing Program Sources</TITLE>
</HEAD>
<BODY>


<H1><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4 Preparing Program Sources</A></H1>
<P>
<A NAME="IDX113"></A>

</P>

<P>
For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU <CODE>gettext</CODE> when the program
initializes, usually from the <CODE>main</CODE> function. Last, you should
identify, adjust and mark all constant strings in your program
needing translation.

</P>



<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.1 Importing the <CODE>gettext</CODE> declaration</A></H2>

<P>
Presuming that your set of programs, or package, has been adjusted
so all needed GNU <CODE>gettext</CODE> files are available, and your
<TT>‘Makefile’</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC210">13 The Maintainer's View</A>), each C module
having translated C strings should contain the line:

</P>
<P>
<A NAME="IDX114"></A>

<PRE>
#include <libintl.h>
</PRE>

<P>
Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
calls with a format string that could be a translated C string (even if
the C string comes from a different C module) should contain the line:

</P>

<PRE>
#include <libintl.h>
</PRE>



<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.2 Triggering <CODE>gettext</CODE> Operations</A></H2>

<P>
<A NAME="IDX115"></A>
The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

</P>

<PRE>
int
main (int argc, char *argv[])
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
</PRE>

<P>
<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
<TT>‘config.h’</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
or <CODE>hello</CODE> sources for more information.

</P>
<P>
<A NAME="IDX116"></A>
<A NAME="IDX117"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>. This latter category is responsible for determining
character classes with the <CODE>isalnum</CODE> etc. functions from
<TT>‘ctype.h’</TT>, which could be wrong, especially for programs that
process some kind of input language. For example, this would mean that
source code using the ç (c-cedilla character) is runnable in
France but not in the U.S.

</P>
<P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale category other than <CODE>LC_ALL</CODE> is
used. The standards say that formats in addition to the one known in the
<CODE>"C"</CODE> locale might be recognized. But some systems seem to reject
numbers in the <CODE>"C"</CODE> locale format.
In some situations, it might
also be a problem with the notation itself, which makes it impossible to
recognize whether the number is in the <CODE>"C"</CODE> locale or the local
format. This can happen if thousands separator characters are used.
Some locales define this character, according to the national
conventions, as <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.

</P>
<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above by a sequence of <CODE>setlocale</CODE> lines:

</P>

<PRE>
{
  ...
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  ...
}
</PRE>

<P>
<A NAME="IDX118"></A>
<A NAME="IDX119"></A>
<A NAME="IDX120"></A>
<A NAME="IDX121"></A>
<A NAME="IDX122"></A>
<A NAME="IDX123"></A>
<A NAME="IDX124"></A>
On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
<CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
<CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available. On some systems
which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
a substitute for it is defined in GNU gettext's <CODE><libintl.h></CODE> and
in GNU gnulib's <CODE><locale.h></CODE>.

</P>
<P>
Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
declared in the <CODE><ctype.h></CODE> standard header and some functions
declared in the <CODE><string.h></CODE> and <CODE><stdlib.h></CODE> standard headers.
If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the modules <SAMP>‘c-ctype’</SAMP>, <SAMP>‘c-strcase’</SAMP>,
<SAMP>‘c-strcasestr’</SAMP>, <SAMP>‘c-strtod’</SAMP>, <SAMP>‘c-strtold’</SAMP> in the GNU gnulib
source distribution.
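</P>
<P>
As a rough illustration of what hardwiring the C locale means, the
following sketch classifies characters by fixed ASCII ranges, so its result
does not change after <CODE>setlocale (LC_CTYPE, "")</CODE> has been called.
The names <CODE>ascii_isalpha</CODE>, <CODE>ascii_isdigit</CODE> and
<CODE>is_identifier</CODE> are made up for this example and are not the
gnulib API; the code also assumes an ASCII-compatible execution character
set.

</P>

<PRE>
#include <stdbool.h>

/* Hypothetical helper: true only for the 52 ASCII letters,
   whatever the current LC_CTYPE setting is.  */
static bool
ascii_isalpha (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical helper: true only for the ASCII digits 0 to 9.  */
static bool
ascii_isdigit (unsigned char c)
{
  return c >= '0' && c <= '9';
}

/* Return true if S looks like an identifier of a small input language:
   a letter or underscore, followed by letters, digits or underscores.  */
bool
is_identifier (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  if (!(ascii_isalpha (*p) || *p == '_'))
    return false;
  for (p++; *p != '\0'; p++)
    if (!(ascii_isalpha (*p) || ascii_isdigit (*p) || *p == '_'))
      return false;
  return true;
}
</PRE>

<P>
With helpers of this kind, a parser accepts <CODE>"foo_1"</CODE> and rejects
<CODE>"1foo"</CODE> in every locale, whereas the standard <CODE>isalpha</CODE>
might accept additional characters in some locales.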

</P>
<P>
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a <CODE>setlocale</CODE> call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

</P>


<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.3 Preparing Translatable Strings</A></H2>

<P>
<A NAME="IDX125"></A>
Before strings can be marked for translation, they sometimes need to
be adjusted. Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections. What you have to keep in mind while doing that is the
following.

</P>

<UL>
<LI>

Decent English style.

<LI>

Entire sentences.

<LI>

Split at paragraphs.

<LI>

Use format strings instead of string concatenation.

<LI>

Avoid unusual markup and unusual control characters.
</UL>

<P>
Let's look at some examples of these guidelines.

</P>
<P>
<A NAME="IDX126"></A>
Translatable strings should be in good English style. If slang language
with abbreviations and shortcuts is used, often translators will not
understand the message and will produce very inappropriate translations.

</P>

<PRE>
"%s: is parameter\n"
</PRE>

<P>
This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
<EM>the</EM> parameter?

</P>

<PRE>
"No match"
</PRE>

<P>
The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire? Does it mean "The given object does
not match the template"? Does it mean "The template does not fit for any
of the objects"?

</P>
<P>
<A NAME="IDX127"></A>
In both cases, adding more words to the message will help both the
translator and the English speaking user.

</P>
<P>
<A NAME="IDX128"></A>
Translatable strings should be entire sentences. It is often not possible
to translate single verbs or adjectives in a substitutable way.

</P>

<PRE>
printf ("File %s is %s protected", filename, rw ? "write" : "read");
</PRE>

<P>
Most translators will not look at the source and will thus only see the
string <CODE>"File %s is %s protected"</CODE>, which is unintelligible. Change
this to

</P>

<PRE>
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
</PRE>

<P>
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction. A French
translator for example translates "write protected" as "protected
against writing".

</P>
<P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence. There are
usually more interdependencies between words than in English. The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.

</P>
<P>
Often sentences don't fit into a single line.
If a sentence is output
using two subsequent <CODE>printf</CODE> statements, like this

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
</PRE>

<P>
the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong together.
It is necessary to merge the two <CODE>printf</CODE> statements so that the
translator can handle the entire sentence at once and decide at which
place to insert a line break in the translation (if at all):

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
</PRE>

<P>
You may now ask: how about two or more adjacent sentences? Like in this case:

</P>

<PRE>
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
</PRE>

<P>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because then it
is easier for the translator to understand and translate both. On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two. (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them once only.)

</P>
<P>
<A NAME="IDX129"></A>
Translatable strings should be limited to one paragraph; don't let a
single message be longer than ten lines. The reason is that when the
translatable string changes, the translator is faced with the task of
updating the entire translated string.
Maybe only a single word will
have changed in the English string, but the translator doesn't see that
(with the current translation tools), therefore she has to proofread
the entire message.

</P>
<P>
<A NAME="IDX130"></A>
Many GNU programs have a <SAMP>‘--help’</SAMP> output that extends over several
screen pages. It is a courtesy towards the translators to split such a
message into several ones of five to ten lines each. While doing that,
you can also attempt to split the documented options into groups,
such as the input options, the output options, and the informative
output options. This will help every user to find the option he is
looking for.

</P>
<P>
<A NAME="IDX131"></A>
<A NAME="IDX132"></A>
Hardcoded string concatenation is sometimes used to construct English
strings:

</P>

<PRE>
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
</PRE>

<P>
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order
of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
to use a format string:

</P>

<PRE>
sprintf (s, "Replace %s with %s?", object1, object2);
</PRE>

<P>
<A NAME="IDX133"></A>
A similar case is compile time concatenation of strings. The ISO C 99
include file <CODE><inttypes.h></CODE> contains a macro <CODE>PRId64</CODE> that
can be used as a formatting directive for outputting an <SAMP>‘int64_t’</SAMP>
integer through <CODE>printf</CODE>. It expands to a constant string, usually
"d" or "ld" or "lld" or something like this, depending on the platform.
Assume you have code like

</P>

<PRE>
printf ("The amount is %0" PRId64 "\n", number);
</PRE>

<P>
The <CODE>gettext</CODE> tools and library have special support for these
<CODE><inttypes.h></CODE> macros. You can therefore simply write

</P>

<PRE>
printf (gettext ("The amount is %0" PRId64 "\n"), number);
</PRE>

<P>
The PO file will contain the string "The amount is %0<PRId64>\n".
The translators will provide a translation containing "%0<PRId64>"
as well, and at runtime the <CODE>gettext</CODE> function's result will
contain the appropriate constant string, "d" or "ld" or "lld".

</P>
<P>
This works only for the predefined <CODE><inttypes.h></CODE> macros. If
you have defined your own similar macros, let's say <SAMP>‘MYPRId64’</SAMP>,
that are not known to <CODE>xgettext</CODE>, the solution for this problem
is to change the code like this:

</P>

<PRE>
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
</PRE>

<P>
This means you put the platform dependent code in one statement, and the
internationalization code in a different statement. Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.

</P>
<P>
<A NAME="IDX134"></A>
<A NAME="IDX135"></A>
All this applies to other programming languages as well. For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.
As in C, in Java you would change

</P>

<PRE>
System.out.println("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
</PRE>

<P>
Similarly, in C#, you would change

</P>

<PRE>
Console.WriteLine("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
</PRE>

<P>
<A NAME="IDX136"></A>
<A NAME="IDX137"></A>
Unusual markup or control characters should not be used in translatable
strings. Translators will likely not understand the particular meaning
of the markup or control characters.

</P>
<P>
For example, if you have a convention that <SAMP>‘|’</SAMP> delimits the
left-hand and right-hand part of some GUI elements, translators will
often not understand it without specific comments. It might be
better to have the translator translate the left-hand and right-hand
part separately.

</P>
<P>
Another example is the <SAMP>‘argp’</SAMP> convention to use a single <SAMP>‘\v’</SAMP>
(vertical tab) control character to delimit two sections inside a
string. This is flawed. Some translators may convert it to a simple
newline, some to blank lines. With some PO file editors it may not be
easy to even enter a vertical tab control character. So, you cannot
be sure that the translation will contain a <SAMP>‘\v’</SAMP> character at the
corresponding position. The solution is, again, to let the translator
translate two separate strings and combine at run-time the two translated
strings with the <SAMP>‘\v’</SAMP> required by the convention.
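</P>
<P>
The run-time combination could be sketched as follows. The helper name
<CODE>join_with_vtab</CODE> is hypothetical and belongs neither to
<SAMP>‘argp’</SAMP> nor to GNU <CODE>gettext</CODE>; it only assumes that the
two parts have already been translated separately.

</P>

<PRE>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "BEFORE\vAFTER" from two strings that were
   translated separately.  Returns a malloc'ed string that the caller
   must free, or NULL if memory allocation fails.  */
char *
join_with_vtab (const char *before, const char *after)
{
  size_t before_len = strlen (before);
  size_t after_len = strlen (after);
  char *result = malloc (before_len + 1 + after_len + 1);

  if (result == NULL)
    return NULL;
  memcpy (result, before, before_len);
  result[before_len] = '\v';
  /* Copy AFTER including its terminating NUL byte.  */
  memcpy (result + before_len + 1, after, after_len + 1);
  return result;
}
</PRE>

<P>
A caller would pass the results of two separate <CODE>gettext</CODE> calls
as <CODE>before</CODE> and <CODE>after</CODE>, so that the translator never
sees the <SAMP>‘\v’</SAMP> character at all.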

</P>
<P>
HTML markup, however, is common enough that it's probably ok to use in
translatable strings. But please bear in mind that the GNU gettext tools
don't verify that the translations are well-formed HTML.

</P>


<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.4 How Marks Appear in Sources</A></H2>
<P>
<A NAME="IDX138"></A>

</P>
<P>
All strings requiring translation should be marked in the C sources. Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro. There are
only a few such possible functions or macros meant for translation,
and their names are said to be marking keywords. The marking is
attached to strings themselves, rather than to what we do with them.
This approach has more uses. A blatant example is an error message
produced by formatting. The format string needs translation, as
well as some strings inserted through some <SAMP>‘%s’</SAMP> specification
in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
<SAMP>‘error_string_out()’</SAMP> routine, say.

</P>
<P>
This marking operation has two goals. The first goal of marking
is for triggering the retrieval of the translation, at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions. But this is not universal usage, and some translatable
strings appear in structured initializations. See section <A HREF="gettext_4.html#SEC23">4.7 Special Cases of Translatable Strings</A>.

</P>
<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.

</P>
<P>
The canonical keyword for marking translatable strings is
<SAMP>‘gettext’</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package. For packages making only light use of the <SAMP>‘gettext’</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>. However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.

</P>
<P>
<A NAME="IDX139"></A>
Many packages use <SAMP>‘_’</SAMP> (a simple underline) as a keyword,
and write <SAMP>‘_("Translatable string")’</SAMP> instead of <SAMP>‘gettext
("Translatable string")’</SAMP>. Further, the coding rule from the GNU standards
that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially. The real, genuine keyword is truly
<SAMP>‘gettext’</SAMP> indeed.
It is fairly easy for those wanting to use
<SAMP>‘_’</SAMP> instead of <SAMP>‘gettext’</SAMP> to declare:

</P>

<PRE>
#include <libintl.h>
#define _(String) gettext (String)
</PRE>

<P>
instead of merely using <SAMP>‘#include <libintl.h>’</SAMP>.

</P>
<P>
The marking keywords <SAMP>‘gettext’</SAMP> and <SAMP>‘_’</SAMP> take the translatable
string as sole argument. It is also possible to define marking functions
that take it at another argument position. It is even possible to make
the marked argument position depend on the total number of arguments of
the function call; this is useful in C++. All this is achieved using
<CODE>xgettext</CODE>'s <SAMP>‘--keyword’</SAMP> option.

</P>
<P>
Note also that long strings can be split across lines, into multiple
adjacent string tokens. Automatic string concatenation is performed
at compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
supports this syntax.

</P>
<P>
Later on, the maintenance is relatively easy. If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
<SAMP>‘_()’</SAMP> if you think it should be translated. For example, <SAMP>‘"%s"’</SAMP>
is an example of string <EM>not</EM> requiring translation. But
<SAMP>‘"%s: %d"’</SAMP> <EM>does</EM> require translation, because in French, unlike
in English, it's customary to put a space before a colon.

</P>


<H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">4.5 Marking Translatable Strings</A></H2>
<P>
<A NAME="IDX140"></A>

</P>
<P>
In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable. Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.

</P>
<P>
<A NAME="IDX141"></A>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
prior to using these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:

</P>

<PRE>
etags src/*.[hc] lib/*.[hc]
</PRE>

<P>
presuming here you want to process all <TT>‘.h’</TT> and <TT>‘.c’</TT> files
from the <TT>‘src/’</TT> and <TT>‘lib/’</TT> directories. This command will
explore all said files and create a <TT>‘TAGS’</TT> file in your root
directory, somewhat summarizing the contents using a special file
format Emacs can understand.

</P>
<P>
<A NAME="IDX142"></A>
For packages following the GNU coding standards, there is
a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
all directories and for all files containing source code.

</P>
<P>
Once your <TT>‘TAGS’</TT> file is ready, the following commands assist
the programmer at marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands.
This empty PO file will slowly
fill in while you mark strings as translatable in your program sources.

</P>
<DL COMPACT>

<DT><KBD>,</KBD>
<DD>
<A NAME="IDX143"></A>
Search through program sources for a string which looks like a
candidate for translation (<CODE>po-tags-search</CODE>).

<DT><KBD>M-,</KBD>
<DD>
<A NAME="IDX144"></A>
Mark the last string found with <SAMP>‘_()’</SAMP> (<CODE>po-mark-translatable</CODE>).

<DT><KBD>M-.</KBD>
<DD>
<A NAME="IDX145"></A>
Mark the last string found with a keyword taken from a set of possible
keywords. This command with a prefix allows some management of these
keywords (<CODE>po-select-mark-and-mark</CODE>).

</DL>

<P>
<A NAME="IDX146"></A>
The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window. If the string is too big to fit whole in this window, it is
positioned so only its end is shown. In any case, the cursor
is left in the PO file window. If the shown string would be better
presented differently in different native languages, you may mark it
using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
and skip to the next string by merely repeating the <KBD>,</KBD> command.

</P>
<P>
A string is a good candidate for translation if it contains a sequence
of three or more letters. A string containing at most two letters in
a row will be considered as a candidate if it has more letters than
non-letters. The command disregards strings containing no letters,
or isolated letters only. It also disregards strings within comments,
or strings already marked with some keyword PO mode knows (see below).

</P>
<P>
If you have never told Emacs about some <TT>‘TAGS’</TT> file to use, the
command will request that you specify one from the minibuffer, the
first time you use the command. You may later change your <TT>‘TAGS’</TT>
file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
which will ask you to name the precise <TT>‘TAGS’</TT> file you want
to use. See section ‘Tag Tables’ in <CITE>The Emacs Editor</CITE>.

</P>
<P>
Each time you use the <KBD>,</KBD> command, the search resumes from where it was
left by the previous search, and goes through all program sources,
obeying the <TT>‘TAGS’</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u
,</KBD>), you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.

</P>
<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence. However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.

</P>
<P>
<A NAME="IDX147"></A>
<A NAME="IDX148"></A>
The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
recently found string with the <SAMP>‘_’</SAMP> keyword. The <KBD>M-.</KBD>
(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
one keyword from the minibuffer and use that keyword for marking
the string.
Both commands will automatically create a new PO file
untranslated entry for the string being marked, and make it the
current entry (making it easy for you to immediately proceed to its
translation, if you feel like doing it right away). It is possible
that the modifications made to the program source by <KBD>M-,</KBD> or
<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
to break and re-indent this line differently. You may use the <KBD>O</KBD>
command from PO mode, or any other window changing command from
Emacs, to break out into the program source window, and do any
needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window, if you want command
<KBD>,</KBD> for the next string, say.

</P>
<P>
The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
have to explicitly type all keywords all the time. The first such
speedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD><KBD>RET</KBD></KBD> at the prompt.
The second speedup is that you may type any non-ambiguous prefix of the
keyword you really mean, and the command will complete it automatically
for you. This also means that PO mode has to <EM>know</EM> all
your possible keywords, and that it will not accept mistyped keywords.

</P>
<P>
If you reply <KBD>?</KBD> to the keyword request, the command gives a
list of all known keywords, from which you may choose. When the
command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
updating any program source or PO file buffer, and does some simple
keyword management instead. In this case, the command asks for a
keyword, written in full, which becomes a new allowed keyword for
later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
becomes the <EM>preferred</EM> keyword for later commands.
By typing
an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
changes the <EM>preferred</EM> keyword and does nothing more.

</P>
<P>
All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped. If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
<SAMP>‘gettext’</SAMP> and <SAMP>‘_’</SAMP> are known as keywords, and <SAMP>‘gettext’</SAMP>
is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
prefer <SAMP>‘_’</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.

</P>


<H2><A NAME="SEC22" HREF="gettext_toc.html#TOC22">4.6 Special Comments preceding Keywords</A></H2>

<P>
<A NAME="IDX149"></A>
In C programs strings are often used within calls of functions from the
<CODE>printf</CODE> family. The special thing about these format strings is
that they can contain format specifiers introduced with <KBD>%</KBD>. Assume
we have the code

</P>

<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
</PRE>

<P>
A possible German translation for the above string might be:

</P>

<PRE>
"%d Zeichen lang ist die Zeichenkette `%s'"
</PRE>

<P>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the
<CODE>printf</CODE> call has not.
This will most probably lead to problems because now the length of the
string is regarded as the address.

</P>
<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case
and the <SAMP>‘-c’</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file. Thus consistent
use of <SAMP>‘msgfmt -c’</SAMP> will catch the error, so that it cannot
cause problems at runtime.

</P>
<P>
To keep the word order of the above German translation and still be
correct, one would have to write

</P>

<PRE>
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
</PRE>

<P>
The routines in <CODE>msgfmt</CODE> know about this special notation.

</P>
<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>‘.po’</TT> file.
This might cause problems because the string might contain what looks
like a format specifier, but the string is not used in <CODE>printf</CODE>.

</P>
<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be a format string. There is no absolute rule for this,
only a heuristic. In the <TT>‘.po’</TT> file the entry is marked using the
<CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC15">3 The Format of PO Files</A>).

</P>
<P>
<A NAME="IDX150"></A>
<A NAME="IDX151"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess it wrong. This is true and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.
If the <CODE>xgettext</CODE> program finds a comment containing the words
<CODE>xgettext:c-format</CODE> on the same line as the <CODE>gettext</CODE> keyword,
or on the line immediately preceding it, it will mark the string in any case with
the <CODE>c-format</CODE> flag. This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string although
it really is one and should be tested. Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must come
before the string to be translated.

</P>
<P>
This situation happens quite often. The <CODE>printf</CODE> function is often
called with strings which do not contain a format specifier. Of course
one would normally use <CODE>fputs</CODE>, but it does happen. In this case
<CODE>xgettext</CODE> does not recognize the string as a format string, but what
happens if the translation introduces a valid format specifier? The
<CODE>printf</CODE> function will try to access one of the parameters, but none
exists because the original code does not pass any parameters.

</P>
<P>
<CODE>xgettext</CODE> of course could make a wrong decision the other way
round, i.e. a string marked as a format string actually is not a format
string. In this case <CODE>msgfmt</CODE> might give too many warnings and
could refuse to compile the <TT>‘.po’</TT> file. The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.

</P>
<P>
If a string is marked with <CODE>c-format</CODE> and this is not correct, the
user can find out who is responsible for the decision. See
section <A HREF="gettext_5.html#SEC28">5.1 Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
used for solving this problem.
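</P>
<P>
As a minimal sketch of the two special comments described in this section, the
following C fragment shows where they would be placed. The helper names
<CODE>format_length_message</CODE>, <CODE>print_done</CODE> and <CODE>print_discount</CODE>
and the two extra messages are hypothetical and chosen only for illustration; how
<CODE>xgettext</CODE> would classify these strings without the comments is an assumption.
</P>

<PRE>
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the length message from the example
   above into buf.  The argument order is fixed in the code, so a
   translation has to use "%1$s" and "%2$d" to reorder the output.  */
static int
format_length_message (char *buf, size_t size, const char *s)
{
  return snprintf (buf, size,
                   gettext ("String `%s' has %d characters\n"),
                   s, (int) strlen (s));
}

/* Hypothetical helper: the msgid has no format specifier, but it is
   passed to printf, so the check is requested explicitly.  */
static int
print_done (void)
{
  /* xgettext:c-format */
  return printf (gettext ("Done.\n"));
}

/* Hypothetical helper: the '%' is literal text written with fputs,
   so the format string check is switched off for this msgid.  */
static int
print_discount (void)
{
  /* xgettext:no-c-format */
  return fputs (gettext ("Save 50% today\n"), stdout);
}
</PRE>

<P>
When no message catalog is found, <CODE>gettext</CODE> returns its argument
unchanged, so the fragment behaves like the untranslated program.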
</P>


<H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">4.7 Special Cases of Translatable Strings</A></H2>

<P>
<A NAME="IDX152"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something like this.
Consider the following case:

</P>

<PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
</PRE>

<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>,
because a function call such as <CODE>gettext</CODE> is not allowed in the
initializer of a static array.
What is to be done? We have to fulfill two tasks. First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC28">5.1 Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.

</P>
<P>
The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array. So one solution can look like this:

</P>

<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
</PRE>

<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.
How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC28">5.1 Invoking the <CODE>xgettext</CODE> Program</A>.

</P>
<P>
The above is of course not the only solution. You could also use
the following one:

</P>

<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
</PRE>

<P>
But this has a drawback. The programmer has to take care to
use <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there instead would pass the string through the
translation lookup twice, which could in rare cases have unpredictable results.

</P>
<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it does turn out to be difficult in some
situation, you can use this second method there.

</P>


<H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">4.8 Letting Users Report Translation Bugs</A></H2>

<P>
Code sometimes has bugs, and translations sometimes have bugs too. The
users need to be able to report them. Reporting translation bugs to the
programmer or maintainer of a package is not very useful, since the
maintainer must never change a translation, except on behalf of the
translator. Hence the translation bugs must be reported to the
translators.

</P>
<P>
Here is a way to organize this so that the maintainer does not need to
forward translation bug reports, nor even keep a list of the addresses of
the translators or their translation teams.
</P>
<P>
Every program has a place where it shows the bug report address. For
GNU programs, it is the code which handles the “--help” option,
typically in a function called “usage”. In this place, instruct the
translator to add her own bug reporting address. For example, if that
code has a statement

</P>

<PRE>
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
</PRE>

<P>
you can add some translator instructions like this:

</P>

<PRE>
/* TRANSLATORS: The placeholder indicates the bug-reporting address
   for this package.  Please add _another line_ saying
   "Report translation bugs to <...>\n" with the address for translation
   bugs (typically your translation team's web or email address).  */
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
</PRE>

<P>
These will be extracted by <SAMP>‘xgettext’</SAMP>, leading to a .pot file that
contains this:

</P>

<PRE>
#. TRANSLATORS: The placeholder indicates the bug-reporting address
#. for this package.  Please add _another line_ saying
#. "Report translation bugs to <...>\n" with the address for translation
#. bugs (typically your translation team's web or email address).
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
</PRE>



<H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">4.9 Marking Proper Names for Translation</A></H2>

<P>
Should names of persons, cities, locations etc. be marked for translation
or not? People who only know languages that can be written with Latin
letters (English, Spanish, French, German, etc.) are tempted to say “no”,
because names usually do not change when transported between these languages.
However, in general when translating from one script to another, names
are translated too, usually phonetically or by transliteration.
For example, Russian or Greek names are converted to the Latin alphabet when
being translated to English, and English or French names are converted
to the Katakana script when being translated to Japanese. This is
necessary because the speakers of the target language in general cannot
read the script the name is originally written in.

</P>
<P>
As a programmer, you should therefore make sure that names are marked
for translation, with a special comment telling the translators that it
is a proper name and how to pronounce it. Like this:

</P>

<PRE>
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        _("Francois Pinard"));
</PRE>

<P>
As a translator, you should use some care when translating names, because
it is frustrating if people see their names mutilated or distorted. If
your language uses the Latin script, all you need to do is to reproduce
the name as perfectly as you can within the usual character set of your
language. In this particular case, this means providing a translation
containing the c-cedilla character. If your language uses a different
script and the people speaking it don't usually read Latin words, it means
transliteration; but you should still give, in parentheses, the original
writing of the name -- for the sake of the people that do read the Latin
script. Here is an example, using Greek as the target script:

</P>

<PRE>
#. This is a proper name.  See the gettext
#. manual, section Names.  Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "φρασοα πιναρ"
       " (Francois Pinard)"
</PRE>

<P>
Because translation of names is such a sensitive domain, it is a good
idea to test your translation before submitting it.

</P>
<P>
The translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
has set up a POT file and translation domain consisting of program author
names, with better facilities for the translator than those presented here.
Namely, there the original name is written directly in Unicode (rather
than with Unicode escapes or HTML entities), and the pronunciation is
denoted using the International Phonetic Alphabet (see
<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).

</P>
<P>
However, we don't recommend this approach for all POT files in all packages,
because this would force translators to use PO files in UTF-8 encoding,
which is - in the current state of software (as of 2003) - a major hassle
for translators using GNU Emacs or XEmacs with po-mode.

</P>


<H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">4.10 Preparing Library Sources</A></H2>

<P>
When you are preparing a library, not a program, for the use of
<CODE>gettext</CODE>, only a few details are different. Here we assume that
the library has a translation domain and a POT file of its own. (If
it uses the translation domain and POT file of the main program, then
the previous sections apply without changes.)

</P>

<OL>
<LI>

The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.
It's the
responsibility of the main program to set the locale. The library's
documentation should mention this fact, so that developers of programs
using the library are aware of it.

<LI>

The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
would interfere with the text domain set by the main program.

<LI>

The initialization code for a program was


<PRE>
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
</PRE>

For a library it is reduced to


<PRE>
  bindtextdomain (PACKAGE, LOCALEDIR);
</PRE>

If your library's API doesn't already have an initialization function,
you need to create one, containing at least the <CODE>bindtextdomain</CODE>
invocation. However, you usually don't need to export and document this
initialization function: It is sufficient that all entry points of the
library call the initialization function if it hasn't been called before.
The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called. Like this:


<PRE>
#include <stdbool.h>
#include <libintl.h>

static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* This function is part of the exported API.  */
struct foo *
create_foo (...)
{
  /* Must ensure the initialization is performed.  */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API.  The argument must be
   non-NULL and have been created through create_foo().  */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before.  */
  ...
}
</PRE>

<LI>

The usual declaration of the <SAMP>‘_’</SAMP> macro in each source file was


<PRE>
#include <libintl.h>
#define _(String) gettext (String)
</PRE>

for a program. For a library, which has its own translation domain,
it reads like this:


<PRE>
#include <libintl.h>
#define _(String) dgettext (PACKAGE, String)
</PRE>

In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
Similarly, the <CODE>dngettext</CODE> function should be used in place of the
<CODE>ngettext</CODE> function.
</OL>

<P><HR><P>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_3.html">previous</A>, <A HREF="gettext_5.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
</BODY>
</HTML>