1<HTML>
2<HEAD>
3<!-- This HTML file has been created by texi2html 1.52b
4     from gettext.texi on 29 December 2011 -->
5
6<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7<TITLE>GNU gettext utilities - 4  Preparing Program Sources</TITLE>
8</HEAD>
9<BODY>
11<P><HR><P>
12
13
14<H1><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4  Preparing Program Sources</A></H1>
15<P>
16<A NAME="IDX113"></A>
17
18</P>
19
20<P>
21For the programmer, changes to the C source code fall into three
22categories.  First, you have to make the localization functions
23known to all modules needing message translation.  Second, you should
24properly trigger the operation of GNU <CODE>gettext</CODE> when the program
25initializes, usually from the <CODE>main</CODE> function.  Last, you should
26identify, adjust and mark all constant strings in your program
27needing translation.
28
29</P>
30
31
32
33<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.1  Importing the <CODE>gettext</CODE> declaration</A></H2>
34
35<P>
36Presuming that your set of programs, or package, has been adjusted
37so all needed GNU <CODE>gettext</CODE> files are available, and your
38<TT>&lsquo;Makefile&rsquo;</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC210">13  The Maintainer's View</A>), each C module
39having translated C strings should contain the line:
40
41</P>
42<P>
43<A NAME="IDX114"></A>
44
45<PRE>
#include &#60;libintl.h&#62;
47</PRE>
48
49<P>
50Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
51calls with a format string that could be a translated C string (even if
52the C string comes from a different C module) should contain the line:
53
54</P>
55
56<PRE>
#include &#60;libintl.h&#62;
58</PRE>
59
60
61
62<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.2  Triggering <CODE>gettext</CODE> Operations</A></H2>
63
64<P>
65<A NAME="IDX115"></A>
66The initialization of locale data should be done with more or less
67the same code in every program, as demonstrated below:
68
69</P>
70
71<PRE>
int
main (int argc, char *argv[])
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
81</PRE>
82
83<P>
84<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
85<TT>&lsquo;config.h&rsquo;</TT> or by the Makefile.  For now consult the <CODE>gettext</CODE>
86or <CODE>hello</CODE> sources for more information.
87
88</P>
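<P>
The initialization sequence can be wrapped in a small helper, as in the
following minimal sketch.  The name <CODE>init_i18n</CODE> and the fallback
values for <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> are assumptions made
for illustration only; a real package takes both values from its build system.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical fallbacks, for illustration only: a real package gets
   these from config.h or from the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical helper.  Returns 0 on success, -1 if the message domain
   could not be set up.  */
static int
init_i18n (void)
{
  /* A NULL result means the environment names a locale the system does
     not know; the program then keeps running in the "C" locale.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

<P>
Calling such a helper once, early in <CODE>main</CODE>, keeps the three
calls together and makes a failure to select the message domain visible.
</P>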
89<P>
90<A NAME="IDX116"></A>
91<A NAME="IDX117"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>.  This latter category determines the character
classes reported by <CODE>isalnum</CODE> and the other functions from
<TT>&lsquo;ctype.h&rsquo;</TT>, which can give wrong results in programs that
process some kind of input language.  For example, this would mean that
source code using the &ccedil; (c-cedilla) character could be accepted in
France but rejected in the U.S.
100
101</P>
102<P>
Some systems also have problems parsing numbers with the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE>
locale is in effect.  The standards allow formats in addition to the one
known in the <CODE>"C"</CODE> locale to be recognized.  But some systems
seem to reject numbers in the <CODE>"C"</CODE> locale format.  In some
situations, the notation itself might also make it impossible to
recognize whether the number is in the <CODE>"C"</CODE> locale or the local
format.  This can happen if thousands separator characters are used.
Some locales define this character, according to the national
conventions, to be <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.
114
115</P>
116<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above by a sequence of <CODE>setlocale</CODE> lines:
119
120</P>
121
122<PRE>
{
  ...
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  ...
}
129</PRE>
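<P>
As a hedged illustration of the effect, the sketch below selects only these
two categories and then parses a number.  Because <CODE>LC_NUMERIC</CODE> is
never touched, it stays in the <CODE>"C"</CODE> locale and <CODE>'.'</CODE>
remains the decimal point.  The function names are hypothetical.
</P>

```c
#include <libintl.h>   /* may supply LC_MESSAGES where <locale.h> lacks it */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: select only the categories the program needs.  */
static void
init_selected_locales (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Hypothetical helper: LC_NUMERIC was left alone, so strtod still
   expects the "C" locale notation, with '.' as the decimal point.  */
static double
parse_config_number (const char *text)
{
  return strtod (text, NULL);
}
```

<P>
A program that reads configuration files or a programming language this way
parses numbers identically on every system, while its messages are still
translated.
</P>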
130
131<P>
132<A NAME="IDX118"></A>
133<A NAME="IDX119"></A>
134<A NAME="IDX120"></A>
135<A NAME="IDX121"></A>
136<A NAME="IDX122"></A>
137<A NAME="IDX123"></A>
138<A NAME="IDX124"></A>
139On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
140<CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
141<CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available.  On some systems
142which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
143a substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE> and
144in GNU gnulib's <CODE>&#60;locale.h&#62;</CODE>.
145
146</P>
147<P>
148Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
149declared in the <CODE>&#60;ctype.h&#62;</CODE> standard header and some functions
150declared in the <CODE>&#60;string.h&#62;</CODE> and <CODE>&#60;stdlib.h&#62;</CODE> standard headers.
151If this is not
152desirable in your application (for example in a compiler's parser),
153you can use a set of substitute functions which hardwire the C locale,
154such as found in the modules <SAMP>&lsquo;c-ctype&rsquo;</SAMP>, <SAMP>&lsquo;c-strcase&rsquo;</SAMP>,
155<SAMP>&lsquo;c-strcasestr&rsquo;</SAMP>, <SAMP>&lsquo;c-strtod&rsquo;</SAMP>, <SAMP>&lsquo;c-strtold&rsquo;</SAMP> in the GNU gnulib
156source distribution.
157
158</P>
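<P>
To show what hardwiring the C locale means, here is a minimal sketch of such
a substitute.  It is not the gnulib code; the name <CODE>ascii_isalnum</CODE>
is hypothetical, and the sketch assumes an ASCII-compatible execution
character set in which the letters are contiguous.
</P>

```c
#include <stdbool.h>

/* Hypothetical stand-in for isalnum that never consults the locale.
   Assumes an ASCII-compatible execution character set.  */
static bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```

<P>
Unlike <CODE>isalnum</CODE>, a function of this kind gives the same answer
for a byte such as 0xE7 whatever <CODE>LC_CTYPE</CODE> is set to.
</P>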
159<P>
It is also possible to switch the locale back and forth between the
161environment dependent locale and the C locale, but this approach is
162normally avoided because a <CODE>setlocale</CODE> call is expensive,
163because it is tedious to determine the places where a locale switch
164is needed in a large program's source, and because switching a locale
165is not multithread-safe.
166
167</P>
168
169
170<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.3  Preparing Translatable Strings</A></H2>
171
172<P>
173<A NAME="IDX125"></A>
Before strings can be marked for translation, they sometimes need to
175be adjusted.  Usually preparing a string for translation is done right
176before marking it, during the marking phase which is described in the
177next sections.  What you have to keep in mind while doing that is the
178following.
179
180</P>
181
182<UL>
183<LI>
184
185Decent English style.
186
187<LI>
188
189Entire sentences.
190
191<LI>
192
193Split at paragraphs.
194
195<LI>
196
197Use format strings instead of string concatenation.
198
199<LI>
200
201Avoid unusual markup and unusual control characters.
202</UL>
203
204<P>
205Let's look at some examples of these guidelines.
206
207</P>
208<P>
209<A NAME="IDX126"></A>
210Translatable strings should be in good English style.  If slang language
211with abbreviations and shortcuts is used, often translators will not
212understand the message and will produce very inappropriate translations.
213
214</P>
215
216<PRE>
"%s: is parameter\n"
218</PRE>
219
220<P>
221This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
222<EM>the</EM> parameter?
223
224</P>
225
226<PRE>
"No match"
228</PRE>
229
230<P>
231The ambiguity in this message makes it unintelligible: Is the program
232attempting to set something on fire? Does it mean "The given object does
233not match the template"? Does it mean "The template does not fit for any
234of the objects"?
235
236</P>
237<P>
238<A NAME="IDX127"></A>
239In both cases, adding more words to the message will help both the
240translator and the English speaking user.
241
242</P>
243<P>
244<A NAME="IDX128"></A>
245Translatable strings should be entire sentences.  It is often not possible
246to translate single verbs or adjectives in a substitutable way.
247
248</P>
249
250<PRE>
printf ("File %s is %s protected", filename, rw ? "write" : "read");
252</PRE>
253
254<P>
255Most translators will not look at the source and will thus only see the
256string <CODE>"File %s is %s protected"</CODE>, which is unintelligible.  Change
257this to
258
259</P>
260
261<PRE>
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
264</PRE>
265
266<P>
267This way the translator will not only understand the message, she will
268also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".
271
272</P>
273<P>
274Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
276number (singular/plural) of another part of the sentence.  There are
277usually more interdependencies between words than in English.  The
278consequence is that asking a translator to translate two half-sentences
279and then combining these two half-sentences through dumb string concatenation
280will not work, for many languages, even though it would work for English.
281That's why translators need to handle entire sentences.
282
283</P>
284<P>
285Often sentences don't fit into a single line.  If a sentence is output
286using two subsequent <CODE>printf</CODE> statements, like this
287
288</P>
289
290<PRE>
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
293</PRE>
294
295<P>
296the translator would have to translate two half sentences, but nothing
297in the POT file would tell her that the two half sentences belong together.
298It is necessary to merge the two <CODE>printf</CODE> statements so that the
299translator can handle the entire sentence at once and decide at which
300place to insert a line break in the translation (if at all):
301
302</P>
303
304<PRE>
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
307</PRE>
308
309<P>
310You may now ask: how about two or more adjacent sentences? Like in this case:
311
312</P>
313
314<PRE>
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
317</PRE>
318
319<P>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because this
makes it easier for the translator to understand and translate both.  On
323the other hand, if one of the two messages is a stereotypic one, occurring
324in other places as well, you will do a favour to the translator by not
325merging the two.  (Identical messages occurring in several places are
326combined by xgettext, so the translator has to handle them once only.)
327
328</P>
329<P>
330<A NAME="IDX129"></A>
331Translatable strings should be limited to one paragraph; don't let a
332single message be longer than ten lines.  The reason is that when the
333translatable string changes, the translator is faced with the task of
334updating the entire translated string.  Maybe only a single word will
335have changed in the English string, but the translator doesn't see that
336(with the current translation tools), therefore she has to proofread
337the entire message.
338
339</P>
340<P>
341<A NAME="IDX130"></A>
342Many GNU programs have a <SAMP>&lsquo;--help&rsquo;</SAMP> output that extends over several
343screen pages.  It is a courtesy towards the translators to split such a
344message into several ones of five to ten lines each.  While doing that,
345you can also attempt to split the documented options into groups,
346such as the input options, the output options, and the informative
347output options.  This will help every user to find the option he is
348looking for.
349
350</P>
351<P>
352<A NAME="IDX131"></A>
353<A NAME="IDX132"></A>
354Hardcoded string concatenation is sometimes used to construct English
355strings:
356
357</P>
358
359<PRE>
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
365</PRE>
366
367<P>
368In order to present to the translator only entire sentences, and also
369because in some languages the translator might want to swap the order
370of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
371to use a format string:
372
373</P>
374
375<PRE>
sprintf (s, "Replace %s with %s?", object1, object2);
377</PRE>
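<P>
The same idea can be combined with a bounded buffer, as in this minimal
sketch.  The helper name <CODE>format_replace_prompt</CODE> is hypothetical,
and the call to <CODE>gettext</CODE> anticipates the marking described in
the following sections; without a message catalog it simply returns the
English format string.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: build the whole question from one format string,
   so that the translator sees the entire sentence.  Returns 0 on
   success, -1 if the result did not fit into BUF.  */
static int
format_replace_prompt (char *buf, size_t size,
                       const char *object1, const char *object2)
{
  int n = snprintf (buf, size, gettext ("Replace %s with %s?"),
                    object1, object2);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

<P>
Using <CODE>snprintf</CODE> here is a design choice rather than a requirement:
a translation may be longer than the English original, so checking for
truncation is prudent.
</P>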
378
379<P>
380<A NAME="IDX133"></A>
381A similar case is compile time concatenation of strings.  The ISO C 99
382include file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
383can be used as a formatting directive for outputting an <SAMP>&lsquo;int64_t&rsquo;</SAMP>
384integer through <CODE>printf</CODE>.  It expands to a constant string, usually
385"d" or "ld" or "lld" or something like this, depending on the platform.
386Assume you have code like
387
388</P>
389
390<PRE>
printf ("The amount is %0" PRId64 "\n", number);
392</PRE>
393
394<P>
395The <CODE>gettext</CODE> tools and library have special support for these
396<CODE>&#60;inttypes.h&#62;</CODE> macros.  You can therefore simply write
397
398</P>
399
400<PRE>
printf (gettext ("The amount is %0" PRId64 "\n"), number);
402</PRE>
403
404<P>
405The PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
406The translators will provide a translation containing "%0&#60;PRId64&#62;"
407as well, and at runtime the <CODE>gettext</CODE> function's result will
408contain the appropriate constant string, "d" or "ld" or "lld".
409
410</P>
411<P>
412This works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros.  If
413you have defined your own similar macros, let's say <SAMP>&lsquo;MYPRId64&rsquo;</SAMP>,
414that are not known to <CODE>xgettext</CODE>, the solution for this problem
415is to change the code like this:
416
417</P>
418
419<PRE>
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
423</PRE>
424
425<P>
This means that you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether it is printed in decimal, octal or hexadecimal.
431
432</P>
433<P>
434<A NAME="IDX134"></A>
435<A NAME="IDX135"></A>
436All this applies to other programming languages as well.  For example, in
437Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.  As in C, in Java you would change
439
440</P>
441
442<PRE>
System.out.println("Replace "+object1+" with "+object2+"?");
444</PRE>
445
446<P>
447into a statement involving a format string:
448
449</P>
450
451<PRE>
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
455</PRE>
456
457<P>
458Similarly, in C#, you would change
459
460</P>
461
462<PRE>
Console.WriteLine("Replace "+object1+" with "+object2+"?");
464</PRE>
465
466<P>
467into a statement involving a format string:
468
469</P>
470
471<PRE>
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
474</PRE>
475
476<P>
477<A NAME="IDX136"></A>
478<A NAME="IDX137"></A>
479Unusual markup or control characters should not be used in translatable
480strings.  Translators will likely not understand the particular meaning
481of the markup or control characters.
482
483</P>
484<P>
485For example, if you have a convention that <SAMP>&lsquo;|&rsquo;</SAMP> delimits the
486left-hand and right-hand part of some GUI elements, translators will
487often not understand it without specific comments.  It might be
488better to have the translator translate the left-hand and right-hand
489part separately.
490
491</P>
492<P>
493Another example is the <SAMP>&lsquo;argp&rsquo;</SAMP> convention to use a single <SAMP>&lsquo;\v&rsquo;</SAMP>
494(vertical tab) control character to delimit two sections inside a
495string.  This is flawed.  Some translators may convert it to a simple
496newline, some to blank lines.  With some PO file editors it may not be
497easy to even enter a vertical tab control character.  So, you cannot
498be sure that the translation will contain a <SAMP>&lsquo;\v&rsquo;</SAMP> character, at the
499corresponding position.  The solution is, again, to let the translator
500translate two separate strings and combine at run-time the two translated
501strings with the <SAMP>&lsquo;\v&rsquo;</SAMP> required by the convention.
502
503</P>
504<P>
505HTML markup, however, is common enough that it's probably ok to use in
506translatable strings.  But please bear in mind that the GNU gettext tools
507don't verify that the translations are well-formed HTML.
508
509</P>
510
511
512<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.4  How Marks Appear in Sources</A></H2>
513<P>
514<A NAME="IDX138"></A>
515
516</P>
517<P>
518All strings requiring translation should be marked in the C sources.  Marking
519is done in such a way that each translatable string appears to be
520the sole argument of some function or preprocessor macro.  There are
521only a few such possible functions or macros meant for translation,
522and their names are said to be marking keywords.  The marking is
523attached to strings themselves, rather than to what we do with them.
524This approach has more uses.  A blatant example is an error message
525produced by formatting.  The format string needs translation, as
526well as some strings inserted through some <SAMP>&lsquo;%s&rsquo;</SAMP> specification
527in the format, while the result from <CODE>sprintf</CODE> may have so many
528different instances that it is impractical to list them all in some
529<SAMP>&lsquo;error_string_out()&rsquo;</SAMP> routine, say.
530
531</P>
532<P>
This marking operation has two goals.  The first goal of marking
is to trigger the retrieval of the translation, at run time.
535The keyword is possibly resolved into a routine able to dynamically
536return the proper translation, as far as possible or wanted, for the
537argument string.  Most localizable strings are found in executable
538positions, that is, attached to variables or given as parameters to
539functions.  But this is not universal usage, and some translatable
540strings appear in structured initializations.  See section <A HREF="gettext_4.html#SEC23">4.7  Special Cases of Translatable Strings</A>.
541
542</P>
543<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
547
548</P>
549<P>
550The canonical keyword for marking translatable strings is
<SAMP>&lsquo;gettext&rsquo;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
552package.  For packages making only light use of the <SAMP>&lsquo;gettext&rsquo;</SAMP>
553keyword, macro or function, it is easily used <EM>as is</EM>.  However,
554for packages using the <CODE>gettext</CODE> interface more heavily, it
555is usually more convenient to give the main keyword a shorter, less
556obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
558their program sources to remind them forcefully, all the time, that they
559are internationalized.  Further, a long keyword has the disadvantage
560of using more horizontal space, forcing more indentation work on
561sources for those trying to keep them within 79 or 80 columns.
562
563</P>
564<P>
565<A NAME="IDX139"></A>
566Many packages use <SAMP>&lsquo;_&rsquo;</SAMP> (a simple underline) as a keyword,
567and write <SAMP>&lsquo;_("Translatable string")&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext
("Translatable string")&rsquo;</SAMP>.  Further, the rule from the GNU coding
standards that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
571So, the textual overhead per translatable string is reduced to
572only three characters: the underline and the two parentheses.
573However, even if GNU <CODE>gettext</CODE> uses this convention internally,
574it does not offer it officially.  The real, genuine keyword is truly
575<SAMP>&lsquo;gettext&rsquo;</SAMP> indeed.  It is fairly easy for those wanting to use
576<SAMP>&lsquo;_&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext&rsquo;</SAMP> to declare:
577
578</P>
579
580<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
583</PRE>
584
585<P>
586instead of merely using <SAMP>&lsquo;#include &#60;libintl.h&#62;&rsquo;</SAMP>.
587
588</P>
589<P>
590The marking keywords <SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> take the translatable
591string as sole argument.  It is also possible to define marking functions
592that take it at another argument position.  It is even possible to make
593the marked argument position depend on the total number of arguments of
594the function call; this is useful in C++.  All this is achieved using
595<CODE>xgettext</CODE>'s <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.
596
597</P>
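<P>
The two kinds of marking keyword described above can be sketched as follows.
The helper names <CODE>greeting</CODE> and <CODE>log_message</CODE> are
hypothetical; the <SAMP>&lsquo;--keyword&rsquo;</SAMP> invocation in the
comment is an example of how <CODE>xgettext</CODE> could be told about them,
not a required command line.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper whose translatable string is its second argument.
   xgettext does not know this function by default; an invocation such as
       xgettext --keyword=_ --keyword=log_message:2 src/*.c
   would mark argument 2 of log_message, and the sole argument of _,
   for extraction.  */
static int
log_message (FILE *stream, const char *msgid)
{
  return fputs (gettext (msgid), stream);
}

/* Hypothetical example of the short keyword in use.  */
static const char *
greeting (void)
{
  return _("Hello, world!");
}
```

<P>
Without an installed message catalog, both helpers fall back to the English
string, which is what <CODE>gettext</CODE> returns when it finds no
translation.
</P>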
598<P>
599Note also that long strings can be split across lines, into multiple
600adjacent string tokens.  Automatic string concatenation is performed
601at compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
602supports this syntax.
603
604</P>
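<P>
As a minimal sketch of this syntax, a long message can be written as two
adjacent string tokens that the compiler joins into one.  The function name
<CODE>charset_mismatch_format</CODE> is hypothetical.
</P>

```c
#include <libintl.h>

/* Hypothetical helper.  The two adjacent string tokens form a single
   string literal at compile time, so the translator is presented with
   one message containing the entire sentence.  */
static const char *
charset_mismatch_format (void)
{
  return gettext ("Locale charset \"%s\" is different from\n"
                  "input file charset \"%s\".\n");
}
```

<P>
This form avoids the backslash-newline continuation inside a string, which
is easy to break when the source is re-indented.
</P>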
605<P>
606Later on, the maintenance is relatively easy.  If, as a programmer,
607you add or modify a string, you will have to ask yourself if the
608new or altered string requires translation, and include it within
<SAMP>&lsquo;_()&rsquo;</SAMP> if you think it should be translated.  For example, <SAMP>&lsquo;"%s"&rsquo;</SAMP>
is a string that does <EM>not</EM> require translation.  But
611<SAMP>&lsquo;"%s: %d"&rsquo;</SAMP> <EM>does</EM> require translation, because in French, unlike
612in English, it's customary to put a space before a colon.
613
614</P>
615
616
617<H2><A NAME="SEC21" HREF="gettext_toc.html#TOC21">4.5  Marking Translatable Strings</A></H2>
618<P>
619<A NAME="IDX140"></A>
620
621</P>
622<P>
623In PO mode, one set of features is meant more for the programmer than
624for the translator, and allows him to interactively mark which strings,
625in a set of program sources, are translatable, and which are not.
626Even if it is a fairly easy job for a programmer to find and mark
627such strings by other means, using any editor of his choice, PO mode
628makes this work more comfortable.  Further, this gives translators
629who feel a little like programmers, or programmers who feel a little
630like translators, a tool letting them work at marking translatable
631strings in the program sources, while simultaneously producing a set of
632translation in some language, for the package being internationalized.
633
634</P>
635<P>
636<A NAME="IDX141"></A>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
639prior to using these PO file commands.  This is easy to do.  In any
640shell window, change the directory to the root of your project, then
641execute a command resembling:
642
643</P>
644
645<PRE>
etags src/*.[hc] lib/*.[hc]
647</PRE>
648
649<P>
650presuming here you want to process all <TT>&lsquo;.h&rsquo;</TT> and <TT>&lsquo;.c&rsquo;</TT> files
651from the <TT>&lsquo;src/&rsquo;</TT> and <TT>&lsquo;lib/&rsquo;</TT> directories.  This command will
652explore all said files and create a <TT>&lsquo;TAGS&rsquo;</TT> file in your root
653directory, somewhat summarizing the contents using a special file
654format Emacs can understand.
655
656</P>
657<P>
658<A NAME="IDX142"></A>
659For packages following the GNU coding standards, there is
660a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
661all directories and for all files containing source code.
662
663</P>
664<P>
665Once your <TT>&lsquo;TAGS&rsquo;</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
667But these commands are necessarily driven from within a PO file
668window, and it is likely that you do not even have such a PO file yet.
669This is not a problem at all, as you may safely open a new, empty PO
670file, mainly for using these commands.  This empty PO file will slowly
671fill in while you mark strings as translatable in your program sources.
672
673</P>
674<DL COMPACT>
675
676<DT><KBD>,</KBD>
677<DD>
678<A NAME="IDX143"></A>
679Search through program sources for a string which looks like a
680candidate for translation (<CODE>po-tags-search</CODE>).
681
682<DT><KBD>M-,</KBD>
683<DD>
684<A NAME="IDX144"></A>
685Mark the last string found with <SAMP>&lsquo;_()&rsquo;</SAMP> (<CODE>po-mark-translatable</CODE>).
686
687<DT><KBD>M-.</KBD>
688<DD>
689<A NAME="IDX145"></A>
690Mark the last string found with a keyword taken from a set of possible
691keywords.  This command with a prefix allows some management of these
692keywords (<CODE>po-select-mark-and-mark</CODE>).
693
694</DL>
695
696<P>
697<A NAME="IDX146"></A>
698The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
699occurrence of a string which looks like a possible candidate for
700translation, and displays the program source in another Emacs window,
701positioned in such a way that the string is near the top of this other
702window.  If the string is too big to fit whole in this window, it is
703positioned so only its end is shown.  In any case, the cursor
704is left in the PO file window.  If the shown string would be better
705presented differently in different native languages, you may mark it
706using <KBD>M-,</KBD> or <KBD>M-.</KBD>.  Otherwise, you might rather ignore it
707and skip to the next string by merely repeating the <KBD>,</KBD> command.
708
709</P>
710<P>
711A string is a good candidate for translation if it contains a sequence
712of three or more letters.  A string containing at most two letters in
713a row will be considered as a candidate if it has more letters than
714non-letters.  The command disregards strings containing no letters,
715or isolated letters only.  It also disregards strings within comments,
716or strings already marked with some keyword PO mode knows (see below).
717
718</P>
719<P>
720If you have never told Emacs about some <TT>&lsquo;TAGS&rsquo;</TT> file to use, the
721command will request that you specify one from the minibuffer, the
722first time you use the command.  You may later change your <TT>&lsquo;TAGS&rsquo;</TT>
723file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
724which will ask you to name the precise <TT>&lsquo;TAGS&rsquo;</TT> file you want
to use.  See section &lsquo;Tag Tables&rsquo; in <CITE>The Emacs Editor</CITE>.
726
727</P>
728<P>
729Each time you use the <KBD>,</KBD> command, the search resumes from where it was
730left by the previous search, and goes through all program sources,
731obeying the <TT>&lsquo;TAGS&rsquo;</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u ,</KBD>),
you may request that the search be restarted all over again
734from the first program source; but in this case, strings that you
735recently marked as translatable will be automatically skipped.
736
737</P>
738<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands.  For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence.  However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or a <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
746
747</P>
748<P>
749<A NAME="IDX147"></A>
750<A NAME="IDX148"></A>
751The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
752recently found string with the <SAMP>&lsquo;_&rsquo;</SAMP> keyword.  The <KBD>M-.</KBD>
753(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
754one keyword from the minibuffer and use that keyword for marking
755the string.  Both commands will automatically create a new PO file
756untranslated entry for the string being marked, and make it the
757current entry (making it easy for you to immediately proceed to its
758translation, if you feel like doing it right away).  It is possible
759that the modifications made to the program source by <KBD>M-,</KBD> or
760<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
761to break and re-indent this line differently.  You may use the <KBD>O</KBD>
762command from PO mode, or any other window changing command from
763Emacs, to break out into the program source window, and do any
764needed adjustments.  You will have to use some regular Emacs command
765to return the cursor to the PO file window, if you want command
766<KBD>,</KBD> for the next string, say.
767
768</P>
769<P>
770The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
771have to explicitly type all keywords all the time.  The first such
772speedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD>RET</KBD> at the prompt.
774The second speedup is that you may type any non-ambiguous prefix of the
775keyword you really mean, and the command will complete it automatically
776for you.  This also means that PO mode has to <EM>know</EM> all
777your possible keywords, and that it will not accept mistyped keywords.
778
779</P>
780<P>
781If you reply <KBD>?</KBD> to the keyword request, the command gives a
782list of all known keywords, from which you may choose.  When the
783command is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
784updating any program source or PO file buffer, and does some simple
785keyword management instead.  In this case, the command asks for a
786keyword, written in full, which becomes a new allowed keyword for
787later <KBD>M-.</KBD> commands.  Moreover, this new keyword automatically
788becomes the <EM>preferred</EM> keyword for later commands.  By typing
789an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
790changes the <EM>preferred</EM> keyword and does nothing more.
791
792</P>
793<P>
794All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
795when scanning for strings, and strings already marked by any of those
796known keywords are automatically skipped.  If many PO files are opened
797simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
<SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> are known as keywords, and <SAMP>&lsquo;gettext&rsquo;</SAMP>
is preferred for the <KBD>M-.</KBD> command.  In fact, there is no point in
preferring <SAMP>&lsquo;_&rsquo;</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
804
805</P>
806
807
808<H2><A NAME="SEC22" HREF="gettext_toc.html#TOC22">4.6  Special Comments preceding Keywords</A></H2>
809
810<P>
811<A NAME="IDX149"></A>
812In C programs strings are often used within calls of functions from the
813<CODE>printf</CODE> family.  The special thing about these format strings is
814that they can contain format specifiers introduced with <KBD>%</KBD>.  Assume
815we have the code
816
817</P>
818
819<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
821</PRE>
822
823<P>
824A possible German translation for the above string might be:
825
826</P>
827
828<PRE>
829"%d Zeichen lang ist die Zeichenkette `%s'"
830</PRE>
831
832<P>
A C programmer, even one who cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the
<CODE>printf</CODE> call has not.  This will most probably lead to problems,
because now the length of the string is treated as the address of a string.
838
839</P>
840<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the <SAMP>&lsquo;-c&rsquo;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file.  Thus consistent
use of <SAMP>&lsquo;msgfmt -c&rsquo;</SAMP> will catch the error, so that it cannot
cause problems at runtime.
848
849</P>
850<P>
To keep the word order of the German translation above and still be
correct, one would have to write
853
854</P>
855
856<PRE>
857"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
858</PRE>
859
860<P>
861The routines in <CODE>msgfmt</CODE> know about this special notation.
862
863</P>
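<P>
The effect of the numbered specifiers can be sketched with a small
function.  This is only an illustration: the helper name
<CODE>format_length_message</CODE> is hypothetical, the German string is
written directly into the code instead of coming from a message catalog,
and numbered specifiers are assumed to be supported by the C library
(they are a POSIX extension, not part of ISO C).

</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  The arguments are passed
   in the original order (string first, length second); the numbered
   specifiers %1$s and %2$d select them regardless of where they appear
   in the format string.  */
int
format_length_message (char *buf, size_t size, const char *s)
{
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'",
                   s, (int) strlen (s));
}
```

<P>
The argument list is the same as for the English format string; only the
format string itself differs.

</P>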
864<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>&lsquo;.po&rsquo;</TT> file.
Doing so might cause problems, because a string might contain what looks
like a format specifier even though it is never passed to <CODE>printf</CODE>.
869
870</P>
871<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be format strings.  There is no absolute rule for this,
only a heuristic.  In the <TT>&lsquo;.po&rsquo;</TT> file the entry is marked using the
<CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC15">3  The Format of PO Files</A>).
876
877</P>
878<P>
879<A NAME="IDX150"></A>
880<A NAME="IDX151"></A>
The careful reader might now say that this again can cause problems:
the heuristic might guess wrong.  This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.  If <CODE>xgettext</CODE> finds a
comment containing the words <CODE>xgettext:c-format</CODE> on the same
line as the <CODE>gettext</CODE> keyword or on the immediately preceding
line, it will mark the string with the <CODE>c-format</CODE> flag in any
case.  This kind of comment should be used when <CODE>xgettext</CODE> does
not recognize the string as a format string but it really is one and
should be tested.  Please note that when the comment is on the same
line as the <CODE>gettext</CODE> keyword, it must come before the string
to be translated.
893
894</P>
895<P>
896This situation happens quite often.  The <CODE>printf</CODE> function is often
897called with strings which do not contain a format specifier.  Of course
898one would normally use <CODE>fputs</CODE> but it does happen.  In this case
899<CODE>xgettext</CODE> does not recognize this as a format string but what
900happens if the translation introduces a valid format specifier?  The
901<CODE>printf</CODE> function will try to access one of the parameters but none
902exists because the original code does not pass any parameters.
903
904</P>
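<P>
A minimal sketch of this situation follows.  The function name
<CODE>format_ready</CODE> and the message text are made up for the example.
The message contains no format specifier, yet it is used as a format
string, so the comment on the line before the <CODE>gettext</CODE> keyword
asks <CODE>xgettext</CODE> to flag it as <CODE>c-format</CODE> anyway.

</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.  The translated message
   is used as a format string although the original text contains no
   format specifier, so the heuristic would not flag it by itself.  */
int
format_ready (char *buf, size_t size)
{
  /* xgettext:c-format */
  return snprintf (buf, size, gettext ("Installation complete.\n"));
}
```

<P>
With the flag in place, <SAMP>&lsquo;msgfmt -c&rsquo;</SAMP> can reject a translation
that introduces a format specifier the code supplies no argument for.

</P>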
905<P>
Of course, <CODE>xgettext</CODE> could also make a wrong decision the other
way round, i.e. a string marked as a format string actually is not a format
string.  In this case <CODE>msgfmt</CODE> might give too many warnings and,
when invoked with <SAMP>&lsquo;-c&rsquo;</SAMP>, refuse to produce the MO file.  The method to
prevent this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.
912
913</P>
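<P>
The opposite case can be sketched in the same way.  The function name
<CODE>progress_done_message</CODE> and the message text are made up for
the example.  The sequence <SAMP>&lsquo;% d&rsquo;</SAMP> in the message might be
taken for a format specifier (a space flag followed by <CODE>d</CODE>),
although the string is only ever written with <CODE>fputs</CODE>.

</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.  The returned string is
   meant to be written with fputs, never passed to printf, so the
   percent sign is literal text.  */
const char *
progress_done_message (void)
{
  /* xgettext:no-c-format */
  return gettext ("Progress: 100% done\n");
}
```

<P>
A caller would then write the message with
<CODE>fputs (progress_done_message (), stdout)</CODE>.

</P>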
914<P>
915If a string is marked with <CODE>c-format</CODE> and this is not correct the
916user can find out who is responsible for the decision.  See
917section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
918used for solving this problem.
919
920</P>
921
922
923<H2><A NAME="SEC23" HREF="gettext_toc.html#TOC23">4.7  Special Cases of Translatable Strings</A></H2>
924
925<P>
926<A NAME="IDX152"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or a similar keyword.
Consider the following case:
930
931</P>
932
933<PRE>
934{
935  static const char *messages[] = {
936    "some very meaningful message",
937    "and another one"
938  };
939  const char *string;
940  ...
941  string
942    = index &#62; 1 ? "a default message" : messages[index];
943
  fputs (string, stdout);
945  ...
946}
947</PRE>
948
949<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>.
What is to be done?  We have to fulfill two tasks.  First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
956
957</P>
958<P>
959The first task can be fulfilled by creating a new keyword, which names a
960no-op.  For the second we have to mark all access points to a string
961from the array.  So one solution can look like this:
962
963</P>
964
965<PRE>
966#define gettext_noop(String) String
967
968{
969  static const char *messages[] = {
970    gettext_noop ("some very meaningful message"),
971    gettext_noop ("and another one")
972  };
973  const char *string;
974  ...
975  string
976    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);
977
  fputs (string, stdout);
979  ...
980}
981</PRE>
982
983<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in every case.  How to make <CODE>xgettext</CODE>
aware of the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC28">5.1  Invoking the <CODE>xgettext</CODE> Program</A>.
987
988</P>
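<P>
Written out as a complete function, this first solution might look like
the following sketch.  The function name <CODE>lookup_message</CODE> and the
bounds check against the array size are assumptions made for the example;
they are not part of the fragment above.

</P>

```c
#include <libintl.h>
#include <stddef.h>

/* No-op marker: lets xgettext find the strings without translating
   them at initialization time.  */
#define gettext_noop(String) String

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical accessor, for illustration only.  Every path that
   returns a string goes through gettext exactly once.  */
const char *
lookup_message (size_t index)
{
  size_t count = sizeof messages / sizeof messages[0];

  return index < count ? gettext (messages[index])
                       : gettext ("a default message");
}
```

<P>
Because the array is only ever read through this one accessor, it is easy
to check that no string reaches the output untranslated.

</P>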
989<P>
The above is of course not the only solution.  You could also come up
with the following one:
992
993</P>
994
995<PRE>
996#define gettext_noop(String) String
997
998{
999  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
1001    gettext_noop ("and another one")
1002  };
1003  const char *string;
1004  ...
1005  string
1006    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];
1007
  fputs (gettext (string), stdout);
1009  ...
1010}
1011</PRE>
1012
1013<P>
But this has a drawback.  The programmer has to take care to use
<CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there instead would pass an already translated
string to <CODE>gettext</CODE> a second time, which in rare cases could have
unpredictable results.
1017
1018</P>
1019<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in every case.  But this analysis is
generally not very difficult.  If it does turn out to be difficult in some
situation, you can use this second method there.
1024
1025</P>
1026
1027
1028<H2><A NAME="SEC24" HREF="gettext_toc.html#TOC24">4.8  Letting Users Report Translation Bugs</A></H2>
1029
1030<P>
1031Code sometimes has bugs, but translations sometimes have bugs too.  The
1032users need to be able to report them.  Reporting translation bugs to the
1033programmer or maintainer of a package is not very useful, since the
1034maintainer must never change a translation, except on behalf of the
1035translator.  Hence the translation bugs must be reported to the
1036translators.
1037
1038</P>
1039<P>
1040Here is a way to organize this so that the maintainer does not need to
1041forward translation bug reports, nor even keep a list of the addresses of
1042the translators or their translation teams.
1043
1044</P>
1045<P>
Every program has a place where it shows the bug report address.  For
GNU programs, it is the code which handles the &ldquo;--help&rdquo; option,
typically in a function called &ldquo;usage&rdquo;.  In this place, instruct the
translator to add her own bug reporting address.  For example, if that
code has a statement
1051
1052</P>
1053
1054<PRE>
1055printf (_("Report bugs to &#60;%s&#62;.\n"), PACKAGE_BUGREPORT);
1056</PRE>
1057
1058<P>
1059you can add some translator instructions like this:
1060
1061</P>
1062
1063<PRE>
1064/* TRANSLATORS: The placeholder indicates the bug-reporting address
1065   for this package.  Please add _another line_ saying
1066   "Report translation bugs to &#60;...&#62;\n" with the address for translation
1067   bugs (typically your translation team's web or email address).  */
1068printf (_("Report bugs to &#60;%s&#62;.\n"), PACKAGE_BUGREPORT);
1069</PRE>
1070
1071<P>
1072These will be extracted by <SAMP>&lsquo;xgettext&rsquo;</SAMP>, leading to a .pot file that
1073contains this:
1074
1075</P>
1076
1077<PRE>
1078#. TRANSLATORS: The placeholder indicates the bug-reporting address
1079#. for this package.  Please add _another line_ saying
1080#. "Report translation bugs to &#60;...&#62;\n" with the address for translation
1081#. bugs (typically your translation team's web or email address).
1082#: src/hello.c:178
1083#, c-format
1084msgid "Report bugs to &#60;%s&#62;.\n"
1085msgstr ""
1086</PRE>
1087
1088
1089
1090<H2><A NAME="SEC25" HREF="gettext_toc.html#TOC25">4.9  Marking Proper Names for Translation</A></H2>
1091
1092<P>
1093Should names of persons, cities, locations etc. be marked for translation
1094or not?  People who only know languages that can be written with Latin
letters (English, Spanish, French, German, etc.) are tempted to say &ldquo;no&rdquo;,
1096because names usually do not change when transported between these languages.
1097However, in general when translating from one script to another, names
1098are translated too, usually phonetically or by transliteration.  For
1099example, Russian or Greek names are converted to the Latin alphabet when
1100being translated to English, and English or French names are converted
1101to the Katakana script when being translated to Japanese.  This is
1102necessary because the speakers of the target language in general cannot
1103read the script the name is originally written in.
1104
1105</P>
1106<P>
1107As a programmer, you should therefore make sure that names are marked
1108for translation, with a special comment telling the translators that it
1109is a proper name and how to pronounce it.  Like this:
1110
1111</P>
1112
1113<PRE>
1114printf (_("Written by %s.\n"),
1115        /* TRANSLATORS: This is a proper name.  See the gettext
1116           manual, section Names.  Note this is actually a non-ASCII
1117           name: The first name is (with Unicode escapes)
1118           "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1119           Pronunciation is like "fraa-swa pee-nar".  */
1120        _("Francois Pinard"));
1121</PRE>
1122
1123<P>
1124As a translator, you should use some care when translating names, because
1125it is frustrating if people see their names mutilated or distorted.  If
1126your language uses the Latin script, all you need to do is to reproduce
1127the name as perfectly as you can within the usual character set of your
1128language.  In this particular case, this means to provide a translation
1129containing the c-cedilla character.  If your language uses a different
1130script and the people speaking it don't usually read Latin words, it means
1131transliteration; but you should still give, in parentheses, the original
1132writing of the name -- for the sake of the people that do read the Latin
1133script.  Here is an example, using Greek as the target script:
1134
1135</P>
1136
1137<PRE>
1138#. This is a proper name.  See the gettext
1139#. manual, section Names.  Note this is actually a non-ASCII
1140#. name: The first name is (with Unicode escapes)
1141#. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1142#. Pronunciation is like "fraa-swa pee-nar".
1143msgid "Francois Pinard"
msgstr "φρασοα πιναρ"
1145       " (Francois Pinard)"
1146</PRE>
1147
1148<P>
1149Because translation of names is such a sensitive domain, it is a good
1150idea to test your translation before submitting it.
1151
1152</P>
1153<P>
1154The translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
1155has set up a POT file and translation domain consisting of program author
1156names, with better facilities for the translator than those presented here.
1157Namely, there the original name is written directly in Unicode (rather
1158than with Unicode escapes or HTML entities), and the pronunciation is
1159denoted using the International Phonetic Alphabet (see
1160<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).
1161
1162</P>
1163<P>
1164However, we don't recommend this approach for all POT files in all packages,
1165because this would force translators to use PO files in UTF-8 encoding,
1166which is - in the current state of software (as of 2003) - a major hassle
1167for translators using GNU Emacs or XEmacs with po-mode.
1168
1169</P>
1170
1171
1172<H2><A NAME="SEC26" HREF="gettext_toc.html#TOC26">4.10  Preparing Library Sources</A></H2>
1173
1174<P>
1175When you are preparing a library, not a program, for the use of
1176<CODE>gettext</CODE>, only a few details are different.  Here we assume that
1177the library has a translation domain and a POT file of its own.  (If
1178it uses the translation domain and POT file of the main program, then
1179the previous sections apply without changes.)
1180
1181</P>
1182
1183<OL>
1184<LI>
1185
1186The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.  It's the
1187responsibility of the main program to set the locale.  The library's
1188documentation should mention this fact, so that developers of programs
1189using the library are aware of it.
1190
1191<LI>
1192
1193The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
1194would interfere with the text domain set by the main program.
1195
1196<LI>
1197
1198The initialization code for a program was
1199
1200
1201<PRE>
1202  setlocale (LC_ALL, "");
1203  bindtextdomain (PACKAGE, LOCALEDIR);
1204  textdomain (PACKAGE);
1205</PRE>
1206
1207For a library it is reduced to
1208
1209
1210<PRE>
1211  bindtextdomain (PACKAGE, LOCALEDIR);
1212</PRE>
1213
1214If your library's API doesn't already have an initialization function,
1215you need to create one, containing at least the <CODE>bindtextdomain</CODE>
1216invocation.  However, you usually don't need to export and document this
1217initialization function: It is sufficient that all entry points of the
1218library call the initialization function if it hasn't been called before.
1219The typical idiom used to achieve this is a static boolean variable that
1220indicates whether the initialization function has been called. Like this:
1221
1222
1223<PRE>
#include &#60;stdbool.h&#62;

static bool libfoo_initialized;
1225
1226static void
1227libfoo_initialize (void)
1228{
1229  bindtextdomain (PACKAGE, LOCALEDIR);
1230  libfoo_initialized = true;
1231}
1232
1233/* This function is part of the exported API.  */
1234struct foo *
1235create_foo (...)
1236{
1237  /* Must ensure the initialization is performed.  */
1238  if (!libfoo_initialized)
1239    libfoo_initialize ();
1240  ...
1241}
1242
1243/* This function is part of the exported API.  The argument must be
1244   non-NULL and have been created through create_foo().  */
1245int
1246foo_refcount (struct foo *argument)
1247{
1248  /* No need to invoke the initialization function here, because
1249     create_foo() must already have been called before.  */
1250  ...
1251}
1252</PRE>
1253
1254<LI>
1255
1256The usual declaration of the <SAMP>&lsquo;_&rsquo;</SAMP> macro in each source file was
1257
1258
1259<PRE>
1260#include &#60;libintl.h&#62;
1261#define _(String) gettext (String)
1262</PRE>
1263
1264for a program.  For a library, which has its own translation domain,
1265it reads like this:
1266
1267
1268<PRE>
1269#include &#60;libintl.h&#62;
1270#define _(String) dgettext (PACKAGE, String)
1271</PRE>
1272
1273In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
1274Similarly, the <CODE>dngettext</CODE> function should be used in place of the
1275<CODE>ngettext</CODE> function.
1276</OL>
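<P>
Taken together, the points above might be combined as in the following
sketch.  The values given to <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE>
and the two entry points <CODE>foo_describe_count</CODE> and
<CODE>foo_status</CODE> are assumptions made for the example; a real
library would get the two macros from its build system.

</P>

```c
#include <libintl.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed values, for illustration only.  */
#define PACKAGE "libfoo"
#define LOCALEDIR "/usr/share/locale"

/* The library looks up messages in its own domain.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

/* No setlocale and no textdomain call: both belong to the program.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical entry point: formats a count with the proper plural
   form, using dngettext in place of ngettext.  */
int
foo_describe_count (char *buf, size_t size, unsigned long n)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu item", "%lu items", n), n);
}

/* Hypothetical entry point: returns a translated status string.  */
const char *
foo_status (void)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return _("ready");
}
```

<P>
Neither entry point touches the locale or the default text domain, so a
program using the library keeps full control over both.

</P>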
1277
1280</BODY>
1281</HTML>
1282