1\input texinfo          @c -*-texinfo-*-
2@c %**start of header
3@c The @documentencoding is needed for makeinfo, but not for texi2html.
4@c @ifhtml
5@c @documentencoding UTF-8
6@c @end ifhtml
7@setfilename gettext.info
8@settitle GNU @code{gettext} utilities
9@finalout
10@c Indices:
11@c   am = autoconf macro  @amindex
12@c   cp = concept         @cindex
13@c   ef = emacs function  @efindex
14@c   em = emacs mode      @emindex
15@c   ev = emacs variable  @evindex
16@c   fn = function        @findex
17@c   kw = keyword         @kwindex
18@c   op = option          @opindex
19@c   pg = program         @pindex
20@c   vr = variable        @vindex
21@c Unused predefined indices:
22@c   tp = type            @tindex
23@c   ky = keystroke       @kindex
24@defcodeindex am
25@defcodeindex ef
26@defindex em
27@defcodeindex ev
28@defcodeindex kw
29@defcodeindex op
30@syncodeindex ef em
31@syncodeindex ev em
32@syncodeindex fn cp
33@syncodeindex kw cp
34@c %**end of header
35
36@include version.texi
37
38@ifinfo
39@dircategory GNU Gettext Utilities
40@direntry
41* gettext: (gettext).                          GNU gettext utilities.
42* autopoint: (gettext)autopoint Invocation.    Copy gettext infrastructure.
43* envsubst: (gettext)envsubst Invocation.      Expand environment variables.
44* gettextize: (gettext)gettextize Invocation.  Prepare a package for gettext.
45* msgattrib: (gettext)msgattrib Invocation.    Select part of a PO file.
46* msgcat: (gettext)msgcat Invocation.          Combine several PO files.
47* msgcmp: (gettext)msgcmp Invocation.          Compare a PO file and template.
48* msgcomm: (gettext)msgcomm Invocation.        Match two PO files.
49* msgconv: (gettext)msgconv Invocation.        Convert PO file to encoding.
50* msgen: (gettext)msgen Invocation.            Create an English PO file.
51* msgexec: (gettext)msgexec Invocation.        Process a PO file.
52* msgfilter: (gettext)msgfilter Invocation.    Pipe a PO file through a filter.
53* msgfmt: (gettext)msgfmt Invocation.          Make MO files out of PO files.
54* msggrep: (gettext)msggrep Invocation.        Select part of a PO file.
55* msginit: (gettext)msginit Invocation.        Create a fresh PO file.
56* msgmerge: (gettext)msgmerge Invocation.      Update a PO file from template.
57* msgunfmt: (gettext)msgunfmt Invocation.      Uncompile MO file into PO file.
58* msguniq: (gettext)msguniq Invocation.        Unify duplicates for PO file.
59* ngettext: (gettext)ngettext Invocation.      Translate a message with plural.
60* xgettext: (gettext)xgettext Invocation.      Extract strings into a PO file.
61* ISO639: (gettext)Language Codes.             ISO 639 language codes.
62* ISO3166: (gettext)Country Codes.             ISO 3166 country codes.
63@end direntry
64@end ifinfo
65
66@ifinfo
67This file provides documentation for GNU @code{gettext} utilities.
68It also serves as a reference for the free Translation Project.
69
70@copying
71Copyright (C) 1995-1998, 2001-2006 Free Software Foundation, Inc.
72
73This manual is free documentation.  It is dually licensed under the
74GNU FDL and the GNU GPL.  This means that you can redistribute this
75manual under either of these two licenses, at your choice.
76
77This manual is covered by the GNU FDL.  Permission is granted to copy,
78distribute and/or modify this document under the terms of the
79GNU Free Documentation License (FDL), either version 1.2 of the
80License, or (at your option) any later version published by the
81Free Software Foundation (FSF); with no Invariant Sections, with no
82Front-Cover Text, and with no Back-Cover Texts.
83A copy of the license is included in @ref{GNU FDL}.
84
85This manual is covered by the GNU GPL.  You can redistribute it and/or
86modify it under the terms of the GNU General Public License (GPL), either
87version 2 of the License, or (at your option) any later version published
88by the Free Software Foundation (FSF).
89A copy of the license is included in @ref{GNU GPL}.
90@end copying
91@end ifinfo
92
93@titlepage
94@title GNU gettext tools, version @value{VERSION}
95@subtitle Native Language Support Library and Tools
96@subtitle Edition @value{EDITION}, @value{UPDATED}
97@author Ulrich Drepper
98@author Jim Meyering
99@author Fran@,{c}ois Pinard
100@author Bruno Haible
101
102@ifnothtml
103@page
104@vskip 0pt plus 1filll
105@c @insertcopying
106Copyright (C) 1995-1998, 2001-2006 Free Software Foundation, Inc.
107
108This manual is free documentation.  It is dually licensed under the
109GNU FDL and the GNU GPL.  This means that you can redistribute this
110manual under either of these two licenses, at your choice.
111
112This manual is covered by the GNU FDL.  Permission is granted to copy,
113distribute and/or modify this document under the terms of the
114GNU Free Documentation License (FDL), either version 1.2 of the
115License, or (at your option) any later version published by the
116Free Software Foundation (FSF); with no Invariant Sections, with no
117Front-Cover Text, and with no Back-Cover Texts.
118A copy of the license is included in @ref{GNU FDL}.
119
120This manual is covered by the GNU GPL.  You can redistribute it and/or
121modify it under the terms of the GNU General Public License (GPL), either
122version 2 of the License, or (at your option) any later version published
123by the Free Software Foundation (FSF).
124A copy of the license is included in @ref{GNU GPL}.
125@end ifnothtml
126@end titlepage
127
128@ifnottex
129@c Table of Contents
130@contents
131@end ifnottex
132
133@ifinfo
134@node Top, Introduction, (dir), (dir)
135@top GNU @code{gettext} utilities
136
137This manual documents the GNU gettext tools and the GNU libintl library,
138version @value{VERSION}.
139
140@menu
141* Introduction::                Introduction
142* Users::                       The User's View
143* PO Files::                    The Format of PO Files
144* Sources::                     Preparing Program Sources
145* Template::                    Making the PO Template File
146* Creating::                    Creating a New PO File
147* Updating::                    Updating Existing PO Files
148* Editing::                     Editing PO Files
149* Manipulating::                Manipulating PO Files
150* Binaries::                    Producing Binary MO Files
151* Programmers::                 The Programmer's View
152* Translators::                 The Translator's View
153* Maintainers::                 The Maintainer's View
154* Installers::                  The Installer's and Distributor's View
155* Programming Languages::       Other Programming Languages
156* Conclusion::                  Concluding Remarks
157
158* Language Codes::              ISO 639 language codes
159* Country Codes::               ISO 3166 country codes
160* Licenses::                    Licenses
161
162* Program Index::               Index of Programs
163* Option Index::                Index of Command-Line Options
164* Variable Index::              Index of Environment Variables
165* PO Mode Index::               Index of Emacs PO Mode Commands
166* Autoconf Macro Index::        Index of Autoconf Macros
167* Index::                       General Index
168
169@detailmenu
170 --- The Detailed Node Listing ---
171
172Introduction
173
174* Why::                         The Purpose of GNU @code{gettext}
175* Concepts::                    I18n, L10n, and Such
176* Aspects::                     Aspects in Native Language Support
177* Files::                       Files Conveying Translations
178* Overview::                    Overview of GNU @code{gettext}
179
180The User's View
181
182* Matrix::                      The Current @file{ABOUT-NLS} Matrix
183* End Users::                   Magic for End Users
184
185Preparing Program Sources
186
187* Importing::                   Importing the @code{gettext} declaration
188* Triggering::                  Triggering @code{gettext} Operations
189* Preparing Strings::           Preparing Translatable Strings
190* Mark Keywords::               How Marks Appear in Sources
191* Marking::                     Marking Translatable Strings
192* c-format Flag::               Telling something about the following string
193* Special cases::               Special Cases of Translatable Strings
194* Names::                       Marking Proper Names for Translation
195* Libraries::                   Preparing Library Sources
196
197Making the PO Template File
198
199* xgettext Invocation::         Invoking the @code{xgettext} Program
200
201Creating a New PO File
202
203* msginit Invocation::          Invoking the @code{msginit} Program
204* Header Entry::                Filling in the Header Entry
205
206Updating Existing PO Files
207
208* msgmerge Invocation::         Invoking the @code{msgmerge} Program
209
210Editing PO Files
211
212* KBabel::                      KDE's PO File Editor
213* Gtranslator::                 GNOME's PO File Editor
214* PO Mode::                     Emacs's PO File Editor
215
216Emacs's PO File Editor
217
218* Installation::                Completing GNU @code{gettext} Installation
219* Main PO Commands::            Main Commands
220* Entry Positioning::           Entry Positioning
221* Normalizing::                 Normalizing Strings in Entries
222* Translated Entries::          Translated Entries
223* Fuzzy Entries::               Fuzzy Entries
224* Untranslated Entries::        Untranslated Entries
225* Obsolete Entries::            Obsolete Entries
226* Modifying Translations::      Modifying Translations
227* Modifying Comments::          Modifying Comments
228* Subedit::                     Mode for Editing Translations
229* C Sources Context::           C Sources Context
230* Auxiliary::                   Consulting Auxiliary PO Files
231* Compendium::                  Using Translation Compendia
232
233Using Translation Compendia
234
235* Creating Compendia::          Merging translations for later use
236* Using Compendia::             Using older translations if they fit
237
238Manipulating PO Files
239
240* msgcat Invocation::           Invoking the @code{msgcat} Program
241* msgconv Invocation::          Invoking the @code{msgconv} Program
242* msggrep Invocation::          Invoking the @code{msggrep} Program
243* msgfilter Invocation::        Invoking the @code{msgfilter} Program
244* msguniq Invocation::          Invoking the @code{msguniq} Program
245* msgcomm Invocation::          Invoking the @code{msgcomm} Program
246* msgcmp Invocation::           Invoking the @code{msgcmp} Program
247* msgattrib Invocation::        Invoking the @code{msgattrib} Program
248* msgen Invocation::            Invoking the @code{msgen} Program
249* msgexec Invocation::          Invoking the @code{msgexec} Program
250* libgettextpo::                Writing your own programs that process PO files
251
252Producing Binary MO Files
253
254* msgfmt Invocation::           Invoking the @code{msgfmt} Program
255* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
256* MO Files::                    The Format of GNU MO Files
257
258The Programmer's View
259
260* catgets::                     About @code{catgets}
261* gettext::                     About @code{gettext}
262* Comparison::                  Comparing the two interfaces
263* Using libintl.a::             Using libintl.a in own programs
264* gettext grok::                Being a @code{gettext} grok
265* Temp Programmers::            Temporary Notes for the Programmers Chapter
266
267About @code{catgets}
268
269* Interface to catgets::        The interface
270* Problems with catgets::       Problems with the @code{catgets} interface?!
271
272About @code{gettext}
273
274* Interface to gettext::        The interface
275* Ambiguities::                 Solving ambiguities
276* Locating Catalogs::           Locating message catalog files
277* Charset conversion::          How to request conversion to Unicode
278* Contexts::                    Solving ambiguities in GUI programs
279* Plural forms::                Additional functions for handling plurals
280* Optimized gettext::           Optimization of the *gettext functions
281
282Temporary Notes for the Programmers Chapter
283
284* Temp Implementations::        Temporary - Two Possible Implementations
285* Temp catgets::                Temporary - About @code{catgets}
286* Temp WSI::                    Temporary - Why a single implementation
287* Temp Notes::                  Temporary - Notes
288
289The Translator's View
290
291* Trans Intro 0::               Introduction 0
292* Trans Intro 1::               Introduction 1
293* Discussions::                 Discussions
294* Organization::                Organization
295* Information Flow::            Information Flow
296* Prioritizing messages::       How to find which messages to translate first
297
298Organization
299
300* Central Coordination::        Central Coordination
301* National Teams::              National Teams
302* Mailing Lists::               Mailing Lists
303
304National Teams
305
306* Sub-Cultures::                Sub-Cultures
307* Organizational Ideas::        Organizational Ideas
308
309The Maintainer's View
310
311* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
312* Prerequisites::               Prerequisite Works
313* gettextize Invocation::       Invoking the @code{gettextize} Program
314* Adjusting Files::             Files You Must Create or Alter
315* autoconf macros::             Autoconf macros for use in @file{configure.in}
316* CVS Issues::                  Integrating with CVS
317* Release Management::          Creating a Distribution Tarball
318
319Files You Must Create or Alter
320
321* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
322* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
323* po/Makevars::                 @file{Makevars} in @file{po/}
324* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
325* configure.in::                @file{configure.in} at top level
326* config.guess::                @file{config.guess}, @file{config.sub} at top level
327* mkinstalldirs::               @file{mkinstalldirs} at top level
328* aclocal::                     @file{aclocal.m4} at top level
329* acconfig::                    @file{acconfig.h} at top level
330* config.h.in::                 @file{config.h.in} at top level
331* Makefile::                    @file{Makefile.in} at top level
332* src/Makefile::                @file{Makefile.in} in @file{src/}
333* lib/gettext.h::               @file{gettext.h} in @file{lib/}
334
335Autoconf macros for use in @file{configure.in}
336
337* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
338* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
339* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
340* AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
341* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
342* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
343
344Integrating with CVS
345
346* Distributed CVS::             Avoiding version mismatch in distributed development
347* Files under CVS::             Files to put under CVS version control
348* autopoint Invocation::        Invoking the @code{autopoint} Program
349
350Other Programming Languages
351
352* Language Implementors::       The Language Implementor's View
353* Programmers for other Languages::  The Programmer's View
354* Translators for other Languages::  The Translator's View
355* Maintainers for other Languages::  The Maintainer's View
356* List of Programming Languages::  Individual Programming Languages
357* List of Data Formats::        Internationalizable Data
358
359The Translator's View
360
361* c-format::                    C Format Strings
362* objc-format::                 Objective C Format Strings
363* sh-format::                   Shell Format Strings
364* python-format::               Python Format Strings
365* lisp-format::                 Lisp Format Strings
366* elisp-format::                Emacs Lisp Format Strings
367* librep-format::               librep Format Strings
368* scheme-format::               Scheme Format Strings
369* smalltalk-format::            Smalltalk Format Strings
370* java-format::                 Java Format Strings
371* csharp-format::               C# Format Strings
372* awk-format::                  awk Format Strings
373* object-pascal-format::        Object Pascal Format Strings
374* ycp-format::                  YCP Format Strings
375* tcl-format::                  Tcl Format Strings
376* perl-format::                 Perl Format Strings
377* php-format::                  PHP Format Strings
378* gcc-internal-format::         GCC internal Format Strings
379* qt-format::                   Qt Format Strings
380* boost-format::                Boost Format Strings
381
382Individual Programming Languages
383
384* C::                           C, C++, Objective C
385* sh::                          sh - Shell Script
386* bash::                        bash - Bourne-Again Shell Script
387* Python::                      Python
388* Common Lisp::                 GNU clisp - Common Lisp
389* clisp C::                     GNU clisp C sources
390* Emacs Lisp::                  Emacs Lisp
391* librep::                      librep
392* Scheme::                      GNU guile - Scheme
393* Smalltalk::                   GNU Smalltalk
394* Java::                        Java
395* C#::                          C#
396* gawk::                        GNU awk
397* Pascal::                      Pascal - Free Pascal Compiler
398* wxWidgets::                   wxWidgets library
399* YCP::                         YCP - YaST2 scripting language
400* Tcl::                         Tcl - Tk's scripting language
401* Perl::                        Perl
402* PHP::                         PHP Hypertext Preprocessor
403* Pike::                        Pike
404* GCC-source::                  GNU Compiler Collection sources
405
406sh - Shell Script
407
408* Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
409* gettext.sh::                  Contents of @code{gettext.sh}
410* gettext Invocation::          Invoking the @code{gettext} program
411* ngettext Invocation::         Invoking the @code{ngettext} program
412* envsubst Invocation::         Invoking the @code{envsubst} program
413* eval_gettext Invocation::     Invoking the @code{eval_gettext} function
414* eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
415
416Perl
417
418* General Problems::            General Problems Parsing Perl Code
419* Default Keywords::            Which Keywords Will xgettext Look For?
420* Special Keywords::            How to Extract Hash Keys
421* Quote-like Expressions::      What are Strings And Quote-like Expressions?
422* Interpolation I::             Invalid String Interpolation
423* Interpolation II::            Valid String Interpolation
424* Parentheses::                 When To Use Parentheses
425* Long Lines::                  How To Grok with Long Lines
426* Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
427
428Internationalizable Data
429
430* POT::                         POT - Portable Object Template
431* RST::                         Resource String Table
432* Glade::                       Glade - GNOME user interface description
433
434Concluding Remarks
435
436* History::                     History of GNU @code{gettext}
437* References::                  Related Readings
438
439Language Codes
440
441* Usual Language Codes::        Two-letter ISO 639 language codes
442* Rare Language Codes::         Three-letter ISO 639 language codes
443
444Licenses
445
446* GNU GPL::                     GNU General Public License
447* GNU LGPL::                    GNU Lesser General Public License
448* GNU FDL::                     GNU Free Documentation License
449
450@end detailmenu
451@end menu
452
453@end ifinfo
454
455@node Introduction, Users, Top, Top
456@chapter Introduction
457
458This chapter explains the goals sought in the creation
459of GNU @code{gettext} and the free Translation Project.
460Then, it explains a few broad concepts around
461Native Language Support, and positions message translation with regard
462to other aspects of national and cultural variance, as they apply
463to programs.  It also surveys those files used to convey the
464translations.  It explains how the various tools interact in the
465initial generation of these files, and later, how the maintenance
466cycle should usually operate.
467
468@cindex sex
469@cindex he, she, and they
470@cindex she, he, and they
471In this manual, we use @emph{he} when speaking of the programmer or
472maintainer, @emph{she} when speaking of the translator, and @emph{they}
473when speaking of the installers or end users of the translated program.
474This is only a convenience for clarifying the documentation.  It is
475@emph{absolutely} not meant to imply that some roles are more appropriate
476to males or females.  Besides, as you might guess, GNU @code{gettext}
477is meant to be useful for people using computers, whatever their sex,
478race, religion or nationality!
479
480@cindex bug report address
481Please send suggestions and corrections to:
482
483@example
484@group
485@r{Internet address:}
486    bug-gnu-gettext@@gnu.org
487@end group
488@end example
489
490@noindent
491Please include the manual's edition number and update date in your messages.
492
493@menu
494* Why::                         The Purpose of GNU @code{gettext}
495* Concepts::                    I18n, L10n, and Such
496* Aspects::                     Aspects in Native Language Support
497* Files::                       Files Conveying Translations
498* Overview::                    Overview of GNU @code{gettext}
499@end menu
500
501@node Why, Concepts, Introduction, Introduction
502@section The Purpose of GNU @code{gettext}
503
Usually, programs are written and documented in English, and use
English at execution time to interact with users.  This is true
not only of GNU software, but also of a great deal of commercial
and free software.  Using a common language is quite handy for
communication between developers, maintainers and users from all
countries.  On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply @emph{love} to see their computer screen showing
a lot less English, and far more of their own language.
514
515@cindex Translation Project
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it.  They have no confidence at all that the dream might ever
come true.  Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance to get all of us nearer
the achievement of a truly multi-lingual set of programs.
523
524GNU @code{gettext} is an important step for the Translation Project,
525as it is an asset on which we may build many other steps.  This package
526offers to programmers, translators and even users, a well integrated
527set of tools and documentation.  Specifically, the GNU @code{gettext}
528utilities are a set of tools that provides a framework within which
529other free packages may produce multi-lingual messages.  These tools
530include
531
532@itemize @bullet
533@item
534A set of conventions about how programs should be written to support
535message catalogs.
536
537@item
538A directory and file naming organization for the message catalogs
539themselves.
540
541@item
542A runtime library supporting the retrieval of translated messages.
543
544@item
A few stand-alone programs to massage, in various ways, the sets of
translatable strings or already translated strings.
547
548@item
549A library supporting the parsing and creation of files containing
550translated messages.
551
552@item
A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or to XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps prepare these sets
and bring them up to date.
557@end itemize
558
GNU @code{gettext} is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible.  Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.
564
The Translation Project also uses the GNU @code{gettext} distribution
as a vehicle for documenting its structure and methods.  This goes
beyond the strict technicalities of documenting GNU @code{gettext}
proper.  This way, translators will find in a single place, as
far as possible, all they need to know to do their
translating work properly.  This supplemental documentation might also
help programmers, and even curious users, understand how GNU
@code{gettext} is related to the remainder of the Translation
Project, and consequently get a glimpse of the @emph{big picture}.
574
575@node Concepts, Aspects, Why, Introduction
576@section I18n, L10n, and Such
577
578@cindex i18n
579@cindex l10n
Two long words appear all the time when we discuss support of native
language in programs, and these words have a precise meaning, worth
explaining here, once and for all in this document.  The words are
@emph{internationalization} and @emph{localization}.  Many people,
tired of writing these long words over and over again, have taken up the
habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time@dots{}
591@cindex internationalization
By @dfn{internationalization}, one refers to the operation by which a
program, or a set of programs turned into a package, is made aware of and
able to support multiple languages.  This is a generalization process,
by which the programs are untied from using only English strings or
other English-specific habits, and connected to generic ways of doing
the same, instead.  Program developers may use various techniques to
internationalize their programs.  Some of these have been standardized.
GNU @code{gettext} offers one of these standards.  @xref{Programmers}.
600
601@cindex localization
By @dfn{localization}, one means the operation by which, in a set
of programs already internationalized, one gives the program all
needed information so that it can adapt itself to handle its input
and output in a fashion which is correct for some native language and
cultural habits.  This is a particularisation process, by which generic
methods already implemented in an internationalized program are used
in specific ways.  The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the @dfn{locale} for this language
or country.  Users achieve localization of programs by setting
special environment variables to proper values, prior to executing those
programs, identifying which locale should be used.
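
As an illustration of how a program picks up these environment
variables, here is a minimal sketch in C.  It relies only on the
standard @code{setlocale} function; which locale names are actually
available depends on the system, so the fallback branch is part of
the sketch rather than an afterthought.

@example
#include <locale.h>
#include <stdio.h>

int
main (void)
@{
  /* An empty locale name asks the C library to consult the user's
     environment (LC_ALL, the LC_* category variables, LANG).  */
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    @{
      /* The requested locale is not available; the program keeps
         running in the locale it started with, normally "C".  */
      fputs ("requested locale not available\n", stderr);
      name = setlocale (LC_ALL, NULL);
    @}
  printf ("active locale: %s\n", name);
  return 0;
@}
@end example

Run with, for instance, @samp{LANG=fr_FR.UTF-8} in the environment,
such a program would be expected to report that locale when it is
installed, and to stay in its initial locale otherwise.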
616
In fact, locale message support is only one component of the cultural
data that makes up a particular locale.  There is a whole host of
routines and functions provided to aid programmers in developing
internationalized software and which allow them to access the data
stored in a particular locale.  When someone refers to a
particular locale, they are referring to the data stored
within that particular locale.  Similarly, if a programmer refers
to ``accessing the locale routines'', they are referring to the
complete suite of routines that access all of the locale's information.
626
627@cindex NLS
628@cindex Native Language Support
629@cindex Natural Language Support
630One uses the expression @dfn{Native Language Support}, or merely NLS,
631for speaking of the overall activity or feature encompassing both
632internationalization and localization, allowing for multi-lingual
633interactions in a program.  In a nutshell, one could say that
634internationalization is the operation by which further localizations
635are made possible.
636
Also, very roughly speaking, when it comes to multi-lingual messages,
internationalization is usually taken care of by programmers, and
localization is usually taken care of by translators.
640
641@node Aspects, Files, Concepts, Introduction
642@section Aspects in Native Language Support
643
644@cindex translation aspects
645For a totally multi-lingual distribution, there are many things to
646translate beyond output messages.
647
648@itemize @bullet
649@item
650As of today, GNU @code{gettext} offers a complete toolset for
651translating messages output by C programs.  Perl scripts and shell
652scripts will also need to be translated.  Even if there are today some hooks
653by which this can be done, these hooks are not integrated as well as they
654should be.
655
656@item
657Some programs, like @code{autoconf} or @code{bison}, are able
658to produce other programs (or scripts).  Even if the generating
659programs themselves are internationalized, the generated programs they
660produce may need internationalization on their own, and this indirect
661internationalization could be automated right from the generating
662program.  In fact, quite usually, generating and generated programs
663could be internationalized independently, as the effort needed is
664fairly orthogonal.
665
666@item
667A few programs include textual tables which might need translation
668themselves, independently of the strings contained in the program
669itself.  For example, @w{RFC 1345} gives an English description for each
670character which the @code{recode} program is able to reconstruct at execution.
671Since these descriptions are extracted from the RFC by mechanical means,
672translating them properly would require a prior translation of the RFC
673itself.
674
675@item
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
offering translated versions for program options as well.
679
680@item
Many programs read, interpret, compile, or are somewhat driven by
input files which are texts containing keywords, identifiers, or
replies which are inherently translatable.  For example, one may want
@code{gcc} to allow diacriticized characters in identifiers or use
translated keywords; @samp{rm -i} might accept something other than
@samp{y} or @samp{n} for replies, etc.  Even if the program will
eventually produce most of its output in foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
localized or not.
690
691@item
692The manual accompanying a package, as well as all documentation files
693in the distribution, could surely be translated, too.  Translating a
694manual, with the intent of later keeping up with updates, is a major
695undertaking in itself, generally.
696
697@end itemize
698
699As we already stressed, translation is only one aspect of locales.
700Other internationalization aspects are system services and are handled
701in GNU @code{libc}.  There
702are many attributes that are needed to define a country's cultural
conventions.  These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
numbers, the symbols for currency, etc.  These local @dfn{rules} are
706termed the country's locale.  The locale represents the knowledge
707needed to support the country's native attributes.
708
709@cindex locale facets
710There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
713related to locales.  See the GNU @code{libc} manual for details.
714
715@table @emph
716
717@item Characters and Codesets
718@cindex codeset
719@cindex encoding
720@cindex character encoding
721@cindex locale facet, LC_CTYPE
722
The codeset most commonly used throughout the USA and most
English-speaking parts of the world is the ASCII codeset.  However, there are
many characters needed by various locales that are not found within
this codeset.  The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages.  However, in
many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
doesn't even handle the major European currency.  Hence each locale
will need to specify which codeset it needs to use and will need
to have the appropriate character handling routines to cope with
the codeset.
733
734@item Currency
735@cindex currency symbols
736@cindex locale facet, LC_MONETARY
737
The symbols used vary from country to country, as does the position
of the symbol.  Software needs to be able to transparently
740display currency figures in the native mode for each locale.
741
742@item Dates
743@cindex date format
744@cindex locale facet, LC_TIME
745
The format of dates varies between locales.  For example, Christmas day
747in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
748Other countries might use @w{ISO 8601} dates, etc.
749
750Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
751or otherwise.  Some locales require time to be specified in 24-hour
752mode rather than as AM or PM.  Further, the nature and yearly extent
753of the Daylight Saving correction vary widely between countries.
754
755@item Numbers
756@cindex number format
757@cindex locale facet, LC_NUMERIC
758
759Numbers can be represented differently in different locales.
760For example, the following numbers are all written correctly for
761their respective locales:
762
763@example
76412,345.67       English
76512.345,67       German
766 12345,67       French
7671,2345.67       Asia
768@end example
769
770Some programs could go further and use different unit systems, like
771English units or Metric units, or even take into account variants
772about how numbers are spelled in full.
773
774@item Messages
775@cindex messages
776@cindex locale facet, LC_MESSAGES
777
778The most obvious area is the language support within a locale.  This is
779where GNU @code{gettext} provides the means for developers and users to
780easily change the language that the software uses to communicate to
781the user.
782
783@end table
784
785@cindex Linux
786Components of locale outside of message handling are standardized in
787the ISO C standard and the SUSV2 specification.  GNU @code{libc}
fully implements this, and most other modern systems provide more
or less reasonable support for at least some of the missing components.
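
As an illustration of the facets that ISO C standardizes, here is a
minimal sketch using only @code{setlocale} and @code{localeconv} from
@file{locale.h}.  The helper names @code{select_user_locale} and
@code{decimal_separator} are made up for this example.

```c
#include <locale.h>
#include <stddef.h>

/* Ask the C library to select every locale category from the
   environment (LC_ALL, LC_xxx, LANG).  Returns 1 on success and
   0 if the requested locale is not available on this system.  */
static int
select_user_locale (void)
{
  return setlocale (LC_ALL, "") != NULL;
}

/* Return the decimal separator of the locale currently selected for
   the LC_NUMERIC category.  Before any call to setlocale the program
   runs in the "C" locale, where the separator is ".".  */
static const char *
decimal_separator (void)
{
  return localeconv ()->decimal_point;
}
```

A program would typically call @code{select_user_locale} once at
startup; after that, @code{decimal_separator} reflects the user's
choice, for instance a comma in many European locales.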
790
791@node Files, Overview, Aspects, Introduction
792@section Files Conveying Translations
793
794@cindex files, @file{.po} and @file{.mo}
The letters PO in @file{.po} files stand for Portable Object, to
distinguish them from @file{.mo} files, where MO stands for Machine
797Object.  This paradigm, as well as the PO file format, is inspired
798by the NLS standard developed by Uniforum, and first implemented by
799Sun in their Solaris system.
800
801PO files are meant to be read and edited by humans, and associate each
802original, translatable string of a given package with its translation
803in a particular target language.  A single PO file is dedicated to
804a single target language.  If a package supports many languages,
805there is one such PO file per language supported, and each package
806has its own set of PO files.  These PO files are best created by
807the @code{xgettext} program, and later updated or refreshed through
808the @code{msgmerge} program.  Program @code{xgettext} extracts all
809marked messages from a set of C files and initializes a PO file with
810empty translations.  Program @code{msgmerge} takes care of adjusting
811PO files between releases of the corresponding sources, commenting
812obsolete entries, initializing new ones, and updating all source
line references.  Files ending with @file{.pot} are a kind of base
translation file found in distributions, in PO file format.
815
816MO files are meant to be read by programs, and are binary in nature.
817A few systems already offer tools for creating and handling MO files
818as part of the Native Language Support coming with the system, but the
819format of these MO files is often different from system to system,
820and non-portable.  The tools already provided with these systems don't
821support all the features of GNU @code{gettext}.  Therefore GNU
822@code{gettext} uses its own format for MO files.  Files ending with
823@file{.gmo} are really MO files, when it is known that these files use
824the GNU format.
825
826@node Overview,  , Files, Introduction
827@section Overview of GNU @code{gettext}
828
829@cindex overview of @code{gettext}
830@cindex big picture
831@cindex tutorial of @code{gettext} usage
832The following diagram summarizes the relation between the files
833handled by GNU @code{gettext} and the tools acting on these files.
834It is followed by somewhat detailed explanations, which you should
835read while keeping an eye on the diagram.  Having a clear understanding
836of these interrelations will surely help programmers, translators
837and maintainers.
838
839@example
840@ifhtml
841@group
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
862@end group
863@end ifhtml
864@ifnothtml
865@group
866Original C Sources ---> Preparation ---> Marked C Sources ---.
867                                                             |
868              .---------<--- GNU gettext Library             |
869.--- make <---+                                              |
870|             `---------<--------------------+---------------'
871|                                            |
872|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
873|   |                                            |              ^
874|   |                                            `---.          |
875|   `---.                                            +---> PO editor ---.
876|       +----> msgmerge ------> LANG.po ---->--------'                  |
877|   .---'                                                               |
878|   |                                                                   |
879|   `-------------<---------------.                                     |
880|                                 +--- New LANG.po <--------------------'
881|   .--- LANG.gmo <--- msgfmt <---'
882|   |
883|   `---> install ---> /.../LANG/PACKAGE.mo ---.
884|                                              +---> "Hello world!"
885`-------> install ---> /.../bin/PROGRAM -------'
886@end group
887@end ifnothtml
888@end example
889
890@cindex marking translatable strings
891As a programmer, the first step to bringing GNU @code{gettext}
892into your package is identifying, right in the C sources, those strings
893which are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources.  Besides this, some other simple, standard changes are needed to
897properly initialize the translation library.  @xref{Sources}, for
898more information about all this.
899
900For newly written software the strings of course can and should be
901marked while writing it.  The @code{gettext} approach makes this
902very easy.  Simply put the following lines at the beginning of each file
903or in a central header file:
904
905@example
906@group
907#define _(String) (String)
908#define N_(String) String
909#define textdomain(Domain)
910#define bindtextdomain(Package, Directory)
911@end group
912@end example
913
914@noindent
915Doing this allows you to prepare the sources for internationalization.
916Later when you feel ready for the step to use the @code{gettext} library
917simply replace these definitions by the following:
918
919@cindex include file @file{libintl.h}
920@example
921@group
922#include <libintl.h>
923#define _(String) gettext (String)
924#define gettext_noop(String) String
925#define N_(String) gettext_noop (String)
926@end group
927@end example
928
929@cindex link with @file{libintl}
930@cindex Linux
931@noindent
932and link against @file{libintl.a} or @file{libintl.so}.  Note that on
933GNU systems, you don't need to link with @code{libintl} because the
934@code{gettext} library functions are already contained in GNU libc.
935That is all you have to change.
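
To show how these definitions fit together, here is a minimal sketch of
a source file using them.  The values of @code{PACKAGE} and
@code{LOCALEDIR}, as well as the helper names @code{init_i18n} and
@code{greeting}, are hypothetical; a real package normally obtains the
first two from its build system.

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical values, chosen only for this sketch.  */
#define PACKAGE "hello"
#define LOCALEDIR "/usr/share/locale"

/* N_() only marks the strings for extraction by xgettext; a static
   initializer cannot call gettext, so translation happens later.  */
static const char *const greetings[] = {
  N_("Hello, world!"),
  N_("Goodbye!")
};

/* Select the user's locale and tell gettext where the message
   catalogs of this package are installed.  Call once at startup.  */
static void
init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

/* Translate a marked string at the time it is actually used.  When
   no translation is found, gettext returns the original string.  */
static const char *
greeting (int i)
{
  return _(greetings[i]);
}
```

The split between @code{N_} at the point of definition and @code{_} at
the point of use is what lets strings in static tables be translated.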
936
937@cindex template PO file
938@cindex files, @file{.pot}
939Once the C sources have been modified, the @code{xgettext} program
940is used to find and extract all translatable strings, and create a
941PO template file out of all these.  This @file{@var{package}.pot} file
942contains all original program strings.  It has sets of pointers to
943exactly where in C sources each string is used.  All translations
944are set to empty.  The letter @code{t} in @file{.pot} marks this as
945a Template PO file, not yet oriented towards any particular language.
946@xref{xgettext Invocation}, for more details about how one calls the
@code{xgettext} program.  If you are @emph{really} lazy, you might
be interested in working a lot more right away, and preparing the
949whole distribution setup (@pxref{Maintainers}).  By doing so, you
950spare yourself typing the @code{xgettext} command, as @code{make}
951should now generate the proper things automatically for you!
952
953The first time through, there is no @file{@var{lang}.po} yet, so the
954@code{msgmerge} step may be skipped and replaced by a mere copy of
955@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
956represents the target language.  See @ref{Creating} for details.
957
958Then comes the initial translation of messages.  Translation in
959itself is a whole matter, still exclusively meant for humans,
and whose complexity far exceeds the scope of this manual.
Nevertheless, a few hints are given in some other chapter of this
manual (@pxref{Translators}).  You will also find there indications
about how to contact translating teams, or become part of them,
964for sharing your translating concerns with others who target the same
965native language.
966
967While adding the translated messages into the @file{@var{lang}.po}
968PO file, if you are not using one of the dedicated PO file editors
969(@pxref{Editing}), you are on your own
970for ensuring that your efforts fully respect the PO file format, and quoting
conventions (@pxref{PO Files}).  This is surely not an impossible task,
as this is the way many people handled PO files around 1995.
On the other hand, by using a PO file editor, most details
of PO file format are taken care of for you, but you have to acquire
some familiarity with the PO file editor itself.
976
977If some common translations have already been saved into a compendium
978PO file, translators may use PO mode for initializing untranslated
979entries from the compendium, and also save selected translations into
980the compendium, updating it (@pxref{Compendium}).  Compendium files
981are meant to be exchanged between members of a given translation team.
982
983Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvement, maintainers react by
985modifying programs in various ways.  The fact that a package has
986already been internationalized should not make maintainers shy
987of adding new strings, or modifying strings already translated.
988They just do their job the best they can.  For the Translation
989Project to work smoothly, it is important that maintainers do not
990carry translation concerns on their already loaded shoulders, and that
991translators be kept as free as possible of programming concerns.
992
The only concern maintainers should have is carefully marking new
strings as translatable, when they should be; they should not otherwise
worry about them being translated, as this will come in proper time.
Consequently, when programs and their strings are adjusted in various
ways by maintainers, and for matters usually unrelated to translation,
@code{xgettext} constructs @file{@var{package}.pot} files which
evolve over time, so the translations carried by @file{@var{lang}.po}
are slowly fading out of date.
1001
1002@cindex evolution of packages
1003It is important for translators (and even maintainers) to understand
1004that package translation is a continuous process in the lifetime of a
1005package, and not something which is done once and for all at the start.
1006After an initial burst of translation activity for a given package,
1007interventions are needed once in a while, because here and there,
1008translated entries become obsolete, and new untranslated entries
1009appear, needing translation.
1010
1011The @code{msgmerge} program has the purpose of refreshing an already
1012existing @file{@var{lang}.po} file, by comparing it with a newer
1013@file{@var{package}.pot} template file, extracted by @code{xgettext}
1014out of recent C sources.  The refreshing operation adjusts all
1015references to C source locations for strings, since these strings
1016move as programs are modified.  Also, @code{msgmerge} comments out as
1017obsolete, in @file{@var{lang}.po}, those already translated entries
1018which are no longer used in the program sources (@pxref{Obsolete
1019Entries}).  It finally discovers new strings and inserts them in
1020the resulting PO file as untranslated entries (@pxref{Untranslated
1021Entries}).  @xref{msgmerge Invocation}, for more information about what
1022@code{msgmerge} really does.
1023
1024Whatever route or means taken, the goal is to obtain an updated
1025@file{@var{lang}.po} file offering translations for all strings.
1026
1027The temporal mobility, or fluidity of PO files, is an integral part of
1028the translation game, and should be well understood, and accepted.
1029People resisting it will have a hard time participating in the
1030Translation Project, or will give a hard time to other participants!  In
1031particular, maintainers should relax and include all available official
1032PO files in their distributions, even if these have not recently been
1033updated, without exerting pressure on the translator teams to get the
1034job done.  The pressure should rather come
1035from the community of users speaking a particular language, and
1036maintainers should consider themselves fairly relieved of any concern
1037about the adequacy of translation files.  On the other hand, translators
1038should reasonably try updating the PO files they are responsible for,
1039while the package is undergoing pretest, prior to an official
1040distribution.
1041
1042Once the PO file is complete and dependable, the @code{msgfmt} program
1043is used for turning the PO file into a machine-oriented format, which
1044may yield efficient retrieval of translations by the programs of the
1045package, whenever needed at runtime (@pxref{MO Files}).  @xref{msgfmt
1046Invocation}, for more information about all modes of execution
1047for the @code{msgfmt} program.
1048
1049Finally, the modified and marked C sources are compiled and linked
1050with the GNU @code{gettext} library, usually through the operation of
1051@code{make}, given a suitable @file{Makefile} exists for the project,
1052and the resulting executable is installed somewhere users will find it.
1053The MO files themselves should also be properly installed.  Given the
1054appropriate environment variables are set (@pxref{End Users}), the
1055program should localize itself automatically, whenever it executes.
1056
1057The remainder of this manual has the purpose of explaining in depth the various
1058steps outlined above.
1059
1060@node Users, PO Files, Introduction, Top
1061@chapter The User's View
1062
When GNU @code{gettext} has truly reached its goal, average users
1064should feel some kind of astonished pleasure, seeing the effect of
1065that strange kind of magic that just makes their own native language
1066appear everywhere on their screens.  As for naive users, they would
1067ideally have no special pleasure about it, merely taking their own
1068language for @emph{granted}, and becoming rather unhappy otherwise.
1069
1070So, let's try to describe here how we would like the magic to operate,
1071as we want the users' view to be the simplest, among all ways one
1072could look at GNU @code{gettext}.  All other software engineers:
1073programmers, translators, maintainers, should work together in such a
1074way that the magic becomes possible.  This is a long and progressive
1075undertaking, and information is available about the progress of the
1076Translation Project.
1077
1078When a package is distributed, there are two kinds of users:
1079@dfn{installers} who fetch the distribution, unpack it, configure
1080it, compile it and install it for themselves or others to use; and
1081@dfn{end users} that call programs of the package, once these have
1082been installed at their site.  GNU @code{gettext} is offering magic
1083for both installers and end users.
1084
1085@menu
1086* Matrix::                      The Current @file{ABOUT-NLS} Matrix
1087* End Users::                   Magic for End Users
1088@end menu
1089
1090@node Matrix, End Users, Users, Users
1091@section The Current @file{ABOUT-NLS} Matrix
1092@cindex Translation Matrix
1093@cindex available translations
1094@cindex @file{ABOUT-NLS} file
1095
1096Languages are not equally supported in all packages using GNU
1097@code{gettext}.  To know if some package uses GNU @code{gettext}, one
1098may check the distribution for the @file{ABOUT-NLS} information file, for
1099some @file{@var{ll}.po} files, often kept together into some @file{po/}
directory, or for an @file{intl/} directory.  Internationalized packages
usually have many @file{@var{ll}.po} files, where @var{ll} represents
the language.  @xref{End Users}, for a complete description of the format
for @var{ll}.
1104
1105More generally, a matrix is available for showing the current state
1106of the Translation Project, listing which packages are prepared for
1107multi-lingual messages, and which languages are supported by each.
1108Because this information changes often, this matrix is not kept within
1109this GNU @code{gettext} manual.  This information is often found in
1110file @file{ABOUT-NLS} from various distributions, but is also as old as
1111the distribution itself.  A recent copy of this @file{ABOUT-NLS} file,
1112containing up-to-date information, should generally be found on the
1113Translation Project sites, and also on most GNU archive sites.
1114
1115@node End Users,  , Matrix, Users
1116@section Magic for End Users
1117@cindex setting up @code{gettext} at run time
1118@cindex selecting message language
1119@cindex language selection
1120
1121@vindex LANG@r{, environment variable}
1122We consider here those packages using GNU @code{gettext} internally,
1123and for which the installers did not disable translation at
1124@emph{configure} time.  Then, users only have to set the @code{LANG}
1125environment variable to the appropriate @samp{@var{ll}_@var{CC}}
1126combination prior to using the programs in the package.  @xref{Matrix}.
1127For example, let's presume a German site.  At the shell prompt, users
1128merely have to execute @w{@samp{setenv LANG de_DE}} (in @code{csh}) or
1129@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}).  They could even do
1130this from their @file{.login} or @file{.profile} file.
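
To make the shape of the @samp{@var{ll}_@var{CC}} value concrete, here
is a small sketch that splits such a name into its two parts.  The
function @code{split_locale_name} is a made-up helper written for this
illustration; it is not part of GNU @code{gettext}, which handles the
value of @code{LANG} internally.

```c
#include <string.h>

/* Split NAME, expected to look like ll_CC, into its language and
   country parts.  A trailing ".codeset" or "@modifier" is ignored.
   LANG_BUF and COUNTRY_BUF must each hold at least 8 bytes.
   Returns 1 on success, 0 if NAME does not have the ll_CC shape.  */
static int
split_locale_name (const char *name, char *lang_buf, char *country_buf)
{
  const char *underscore = strchr (name, '_');
  size_t lang_len, country_len;

  if (underscore == NULL)
    return 0;
  lang_len = (size_t) (underscore - name);
  country_len = strcspn (underscore + 1, ".@");
  if (lang_len == 0 || lang_len > 7
      || country_len == 0 || country_len > 7)
    return 0;
  memcpy (lang_buf, name, lang_len);
  lang_buf[lang_len] = '\0';
  memcpy (country_buf, underscore + 1, country_len);
  country_buf[country_len] = '\0';
  return 1;
}
```

For the German site above, @samp{de_DE} yields the language @samp{de}
and the country @samp{DE}; a name such as @samp{C} has no country part
and is rejected by this sketch.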
1131
1132@node PO Files, Sources, Users, Top
1133@chapter The Format of PO Files
1134@cindex PO files' format
1135@cindex file format, @file{.po}
1136
The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly those
1139PO files which are textual, editable files.  This chapter explains
1140the format of PO files.
1141
1142A PO file is made up of many entries, each entry holding the relation
1143between an original untranslated string and its corresponding
1144translation.  All entries in a given PO file usually pertain
1145to a single project, and all translations are expressed in a single
1146target language.  One PO file @dfn{entry} has the following schematic
1147structure:
1148
1149@example
1150@var{white-space}
1151#  @var{translator-comments}
1152#. @var{extracted-comments}
1153#: @var{reference}@dots{}
1154#, @var{flag}@dots{}
1155#| msgid @var{previous-untranslated-string}
1156msgid @var{untranslated-string}
1157msgstr @var{translated-string}
1158@end example
1159
1160The general structure of a PO file should be well understood by
1161the translator.  When using PO mode, very little has to be known
1162about the format details, as PO mode takes care of them for her.
1163
1164A simple entry can look like this:
1165
1166@example
1167#: lib/error.c:116
1168msgid "Unknown system error"
1169msgstr "Error desconegut del sistema"
1170@end example
1171
1172@cindex comments, translator
1173@cindex comments, automatic
1174@cindex comments, extracted
1175Entries begin with some optional white space.  Usually, when generated
1176through GNU @code{gettext} tools, there is exactly one blank line
1177between entries.  Then comments follow, on lines all starting with the
character @code{#}.  There are two kinds of comments.  Those which have
some white space immediately following the @code{#} are the
@var{translator comments}; they are created and maintained exclusively
by the translator.  Those which have some non-white character just after
the @code{#} are the @var{automatic comments}; they are created and
maintained automatically by GNU @code{gettext} tools.  Comment lines
1184starting with @code{#.} contain comments given by the programmer, directed
1185at the translator; these comments are called @var{extracted comments}
1186because the @code{xgettext} program extracts them from the program's
1187source code.  Comment lines starting with @code{#:} contain references to
1188the program's source code.  Comment lines starting with @code{#,} contain
1189flags; more about these below.  Comment lines starting with @code{#|}
1190contain the previous untranslated string for which the translator gave
1191a translation.
1192
1193All comments, of either kind, are optional.
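
The comment kinds described above are told apart by the character that
follows the @code{#}.  The following sketch shows one way a program
could classify a line; the type @code{po_comment_kind} and the function
@code{classify_po_comment} are invented for this example and are not
part of the GNU @code{gettext} tools.

```c
enum po_comment_kind
{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT,  /* "#" followed by white space */
  PO_EXTRACTED_COMMENT,   /* "#." given by the programmer */
  PO_REFERENCE,           /* "#:" source code references */
  PO_FLAGS,               /* "#," flags such as fuzzy */
  PO_PREVIOUS,            /* "#|" previous untranslated string */
  PO_OTHER                /* "#" followed by anything else */
};

/* Classify one line of a PO file by looking at its first two
   characters.  A bare "#" counts as an empty translator comment.  */
static enum po_comment_kind
classify_po_comment (const char *line)
{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  switch (line[1])
    {
    case '\0':
    case ' ':
    case '\t':
    case '\n':
      return PO_TRANSLATOR_COMMENT;
    case '.':
      return PO_EXTRACTED_COMMENT;
    case ':':
      return PO_REFERENCE;
    case ',':
      return PO_FLAGS;
    case '|':
      return PO_PREVIOUS;
    default:
      return PO_OTHER;
    }
}
```

Applied to the simple entry shown earlier, the line
@samp{#: lib/error.c:116} is classified as a reference, while the
@code{msgid} and @code{msgstr} lines are not comments at all.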
1194
1195@kwindex msgid
1196@kwindex msgstr
1197After white space and comments, entries show two strings, namely
1198first the untranslated string as it appears in the original program
1199sources, and then, the translation of this string.  The original
1200string is introduced by the keyword @code{msgid}, and the translation,
1201by @code{msgstr}.  The two strings, untranslated and translated,
1202are quoted in various ways in the PO file, using @code{"}
1203delimiters and @code{\} escapes, but the translator does not really
1204have to pay attention to the precise quoting format, as PO mode fully
1205takes care of quoting for her.
1206
1207The @code{msgid} strings, as well as automatic comments, are produced
1208and managed by other GNU @code{gettext} tools, and PO mode does not
1209provide means for the translator to alter these.  The most she can
1210do is merely deleting them, and only by deleting the whole entry.
1211On the other hand, the @code{msgstr} string, as well as translator
1212comments, are really meant for the translator, and PO mode gives her
1213the full control she needs.
1214
1215The comment lines beginning with @code{#,} are special because they are
1216not completely ignored by the programs as comments generally are.  The
1217comma separated list of @var{flag}s is used by the @code{msgfmt}
1218program to give the user some better diagnostic messages.  Currently
1219there are two forms of flags defined:
1220
1221@table @code
1222@item fuzzy
1223@kwindex fuzzy@r{ flag}
1224This flag can be generated by the @code{msgmerge} program or it can be
1225inserted by the translator herself.  It shows that the @code{msgstr}
1226string might not be a correct translation (anymore).  Only the translator
1227can judge if the translation requires further modification, or is
1228acceptable as is.  Once satisfied with the translation, she then removes
this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
flag when it has paired a @code{msgid} with a @code{msgstr} only by
fuzzy search.  @xref{Fuzzy Entries}.
1232
1233@item c-format
1234@kwindex c-format@r{ flag}
1235@itemx no-c-format
1236@kwindex no-c-format@r{ flag}
1237These flags should not be added by a human.  Instead only the
1238@code{xgettext} program adds them.  In an automated PO file processing
1239system as proposed here the user changes would be thrown away again as
1240soon as the @code{xgettext} program generates a new template file.
1241
1242The @code{c-format} flag tells that the untranslated string and the
1243translation are supposed to be C format strings.  The @code{no-c-format}
1244flag tells that they are not C format strings, even though the untranslated
1245string happens to look like a C format string (with @samp{%} directives).
1246
In case the @code{c-format} flag is given for a string, the @code{msgfmt}
program does some more tests to check the validity of the translation.
1249@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1250
1251@item objc-format
1252@kwindex objc-format@r{ flag}
1253@itemx no-objc-format
1254@kwindex no-objc-format@r{ flag}
1255Likewise for Objective C, see @ref{objc-format}.
1256
1257@item sh-format
1258@kwindex sh-format@r{ flag}
1259@itemx no-sh-format
1260@kwindex no-sh-format@r{ flag}
1261Likewise for Shell, see @ref{sh-format}.
1262
1263@item python-format
1264@kwindex python-format@r{ flag}
1265@itemx no-python-format
1266@kwindex no-python-format@r{ flag}
1267Likewise for Python, see @ref{python-format}.
1268
1269@item lisp-format
1270@kwindex lisp-format@r{ flag}
1271@itemx no-lisp-format
1272@kwindex no-lisp-format@r{ flag}
1273Likewise for Lisp, see @ref{lisp-format}.
1274
1275@item elisp-format
1276@kwindex elisp-format@r{ flag}
1277@itemx no-elisp-format
1278@kwindex no-elisp-format@r{ flag}
1279Likewise for Emacs Lisp, see @ref{elisp-format}.
1280
1281@item librep-format
1282@kwindex librep-format@r{ flag}
1283@itemx no-librep-format
1284@kwindex no-librep-format@r{ flag}
1285Likewise for librep, see @ref{librep-format}.
1286
1287@item scheme-format
1288@kwindex scheme-format@r{ flag}
1289@itemx no-scheme-format
1290@kwindex no-scheme-format@r{ flag}
1291Likewise for Scheme, see @ref{scheme-format}.
1292
1293@item smalltalk-format
1294@kwindex smalltalk-format@r{ flag}
1295@itemx no-smalltalk-format
1296@kwindex no-smalltalk-format@r{ flag}
1297Likewise for Smalltalk, see @ref{smalltalk-format}.
1298
1299@item java-format
1300@kwindex java-format@r{ flag}
1301@itemx no-java-format
1302@kwindex no-java-format@r{ flag}
1303Likewise for Java, see @ref{java-format}.
1304
1305@item csharp-format
1306@kwindex csharp-format@r{ flag}
1307@itemx no-csharp-format
1308@kwindex no-csharp-format@r{ flag}
1309Likewise for C#, see @ref{csharp-format}.
1310
1311@item awk-format
1312@kwindex awk-format@r{ flag}
1313@itemx no-awk-format
1314@kwindex no-awk-format@r{ flag}
1315Likewise for awk, see @ref{awk-format}.
1316
1317@item object-pascal-format
1318@kwindex object-pascal-format@r{ flag}
1319@itemx no-object-pascal-format
1320@kwindex no-object-pascal-format@r{ flag}
1321Likewise for Object Pascal, see @ref{object-pascal-format}.
1322
1323@item ycp-format
1324@kwindex ycp-format@r{ flag}
1325@itemx no-ycp-format
1326@kwindex no-ycp-format@r{ flag}
1327Likewise for YCP, see @ref{ycp-format}.
1328
1329@item tcl-format
1330@kwindex tcl-format@r{ flag}
1331@itemx no-tcl-format
1332@kwindex no-tcl-format@r{ flag}
1333Likewise for Tcl, see @ref{tcl-format}.
1334
1335@item perl-format
1336@kwindex perl-format@r{ flag}
1337@itemx no-perl-format
1338@kwindex no-perl-format@r{ flag}
1339Likewise for Perl, see @ref{perl-format}.
1340
1341@item perl-brace-format
1342@kwindex perl-brace-format@r{ flag}
1343@itemx no-perl-brace-format
1344@kwindex no-perl-brace-format@r{ flag}
1345Likewise for Perl brace, see @ref{perl-format}.
1346
1347@item php-format
1348@kwindex php-format@r{ flag}
1349@itemx no-php-format
1350@kwindex no-php-format@r{ flag}
1351Likewise for PHP, see @ref{php-format}.
1352
1353@item gcc-internal-format
1354@kwindex gcc-internal-format@r{ flag}
1355@itemx no-gcc-internal-format
1356@kwindex no-gcc-internal-format@r{ flag}
1357Likewise for the GCC sources, see @ref{gcc-internal-format}.
1358
1359@item qt-format
1360@kwindex qt-format@r{ flag}
1361@itemx no-qt-format
1362@kwindex no-qt-format@r{ flag}
1363Likewise for Qt, see @ref{qt-format}.
1364
1365@item boost-format
1366@kwindex boost-format@r{ flag}
1367@itemx no-boost-format
1368@kwindex no-boost-format@r{ flag}
1369Likewise for Boost, see @ref{boost-format}.
1370
1371@end table
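
The kind of test that a @code{c-format} flag makes possible can be
pictured with the following sketch, which compares the conversion
characters found in a @code{msgid} and in its @code{msgstr}.  This is a
deliberately simplified illustration with invented function names; it
is not the algorithm used by @code{msgfmt}, and it ignores positional
directives such as @samp{%1$d} as well as differences in length
modifiers.

```c
#include <string.h>

/* Collect into OUT (capacity OUT_SIZE, at least 1) the conversion
   character of each printf-style directive in FORMAT.  "%%" is a
   literal percent sign and is skipped.  */
static void
conversion_sequence (const char *format, char *out, size_t out_size)
{
  size_t n = 0;
  const char *p = format;

  while ((p = strchr (p, '%')) != NULL)
    {
      p++;
      if (*p == '%')
        {
          p++;
          continue;
        }
      /* Skip flags, width, precision and length modifiers.  */
      p += strspn (p, "-+ #0123456789.*hlLqjzt");
      if (*p == '\0')
        break;
      if (n + 1 < out_size)
        out[n++] = *p;
      p++;
    }
  out[n] = '\0';
}

/* Return 1 if MSGID and MSGSTR use the same conversions in the same
   order, 0 otherwise.  */
static int
same_directives (const char *msgid, const char *msgstr)
{
  char a[64];
  char b[64];

  conversion_sequence (msgid, a, sizeof a);
  conversion_sequence (msgstr, b, sizeof b);
  return strcmp (a, b) == 0;
}
```

A translation that swaps @samp{%s} and @samp{%d} without using
positional directives is reported as a mismatch by this sketch, which
is the sort of mistake the flag is meant to help catch.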
1372
1373@kwindex msgctxt
1374@cindex context, in PO files
1375It is also possible to have entries with a context specifier. They look like
1376this:
1377
1378@example
1379@var{white-space}
1380#  @var{translator-comments}
1381#. @var{extracted-comments}
1382#: @var{reference}@dots{}
1383#, @var{flag}@dots{}
1384#| msgctxt @var{previous-context}
1385#| msgid @var{previous-untranslated-string}
1386msgctxt @var{context}
1387msgid @var{untranslated-string}
1388msgstr @var{translated-string}
1389@end example
1390
1391The context serves to disambiguate messages with the same
1392@var{untranslated-string}.  It is possible to have several entries with
1393the same @var{untranslated-string} in a PO file, provided that they each
1394have a different @var{context}.  Note that an empty @var{context} string
1395and an absent @code{msgctxt} line do not mean the same thing.
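
The difference between an empty context and an absent one can be made concrete with a small sketch.  It assumes the convention that a context, when present, is joined to the @code{msgid} by an EOT byte (@code{\004}) to form the lookup key; the function name @code{build_lookup_key} is hypothetical and only serves the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Build the key under which a message would be looked up.
   Assumption: a context, when present, is joined to the msgid by an
   EOT byte ('\004').  A NULL context stands for an absent msgctxt line.
   Returns 0 on success, -1 if the buffer is too small.  */
int
build_lookup_key (char *buf, size_t size,
                  const char *context, const char *msgid)
{
  int len;

  if (context == NULL)
    len = snprintf (buf, size, "%s", msgid);
  else
    len = snprintf (buf, size, "%s\004%s", context, msgid);
  return (len >= 0 && (size_t) len < size) ? 0 : -1;
}
```

Under this assumption, an empty context still contributes the separator byte, so the two keys differ and the two entries can coexist in one PO file.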
1396
1397@kwindex msgid_plural
1398@cindex plural forms, in PO files
A different kind of entry is used for translations which involve
plural forms.
1401
1402@example
1403@var{white-space}
1404#  @var{translator-comments}
1405#. @var{extracted-comments}
1406#: @var{reference}@dots{}
1407#, @var{flag}@dots{}
1408#| msgid @var{previous-untranslated-string-singular}
1409#| msgid_plural @var{previous-untranslated-string-plural}
1410msgid @var{untranslated-string-singular}
1411msgid_plural @var{untranslated-string-plural}
1412msgstr[0] @var{translated-string-case-0}
1413...
1414msgstr[N] @var{translated-string-case-n}
1415@end example
1416
1417Such an entry can look like this:
1418
1419@example
1420#: src/msgcmp.c:338 src/po-lex.c:699
1421#, c-format
1422msgid "found %d fatal error"
1423msgid_plural "found %d fatal errors"
1424msgstr[0] "s'ha trobat %d error fatal"
1425msgstr[1] "s'han trobat %d errors fatals"
1426@end example
1427
1428Here also, a @code{msgctxt} context can be specified before @code{msgid},
1429like above.
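
On the program side, an entry of this kind is typically retrieved with @code{ngettext}.  The following is a minimal sketch using the message from the example above; the helper name @code{format_error_count} is an assumption made for the illustration.

```c
#include <libintl.h>
#include <stdio.h>

/* Format the message for N errors into BUF.  ngettext selects the
   msgstr[i] that matches N from the catalog; when no catalog is found
   it falls back to the English singular for N == 1 and to the English
   plural otherwise.  Returns the value of snprintf.  */
int
format_error_count (char *buf, size_t size, int n)
{
  const char *format = ngettext ("found %d fatal error",
                                 "found %d fatal errors",
                                 (unsigned long) n);
  return snprintf (buf, size, format, n);
}
```

Both English strings are passed in the source, which is why the PO entry carries a @code{msgid} and a @code{msgid_plural} line.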
1430
The @var{previous-untranslated-string} is optionally inserted by the
@code{msgmerge} program, at the same time as it marks a message fuzzy.
It helps the translator to see which changes were made by the developers
to the @var{untranslated-string}.
1435
Sometimes a few lines, usually whitespace or comments, follow the
very last entry of a PO file.  Such lines are not part of any entry;
they will be dropped when the PO file is processed by the tools, and
they may disturb some PO file editors.
1440
1441The remainder of this section may be safely skipped by those using
1442a PO file editor, yet it may be interesting for everybody to have a better
1443idea of the precise format of a PO file.  On the other hand, those
1444wishing to modify PO files by hand should carefully continue reading on.
1445
1446Each of @var{untranslated-string} and @var{translated-string} respects
1447the C syntax for a character string, including the surrounding quotes
1448and embedded backslashed escape sequences.  When the time comes
1449to write multi-line strings, one should not use escaped newlines.
1450Instead, a closing quote should follow the last character on the
1451line to be continued, and an opening quote should resume the string
1452at the beginning of the following PO file line.  For example:
1453
1454@example
1455msgid ""
1456"Here is an example of how one might continue a very long string\n"
1457"for the common case the string represents multi-line output.\n"
1458@end example
1459
1460@noindent
1461In this example, the empty string is used on the first line, to
1462allow better alignment of the @code{H} from the word @samp{Here}
1463over the @code{f} from the word @samp{for}.  In this example, the
1464@code{msgid} keyword is followed by three strings, which are meant
1465to be concatenated.  Concatenating the empty string does not change
1466the resulting overall string, but it is a way for us to comply with
1467the necessity of @code{msgid} to be followed by a string on the same
1468line, while keeping the multi-line presentation left-justified, as
1469we find this to be a cleaner disposition.  The empty string could have
1470been omitted, but only if the string starting with @samp{Here} was
1471promoted on the first line, right after @code{msgid}.@footnote{This
1472limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.}  It was not really
necessary either to switch between the last two quoted strings immediately
after the newline @samp{\n}; the switch could have occurred after @emph{any}
other character.  We just did it this way because it is neater.
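
Since PO strings follow the C syntax, the same behaviour can be checked in C itself.  The sketch below spells the example string in two ways, once as in the PO example and once with the break placed in the middle of a word; the names @code{pieces}, @code{other_split} and @code{same_string} are chosen for this illustration only.

```c
#include <string.h>

/* The msgid written as three adjacent pieces, as in the PO example.  */
static const char pieces[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same text, with the switch between quoted strings placed in the
   middle of a word instead of right after the "\n".  */
static const char other_split[] =
  "Here is an example of how one might continue a very long str"
  "ing\nfor the common case the string represents multi-line output.\n";

/* Return 1 if both spellings denote the same string, 0 otherwise.  */
int
same_string (void)
{
  return strcmp (pieces, other_split) == 0;
}
```

The leading empty string and the position of the breaks affect only the layout of the file, not the string that results.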
1477
1478@cindex newlines in PO files
1479One should carefully distinguish between end of lines marked as
1480@samp{\n} @emph{inside} quotes, which are part of the represented
1481string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
1483
1484@cindex comments in PO files
1485Outside strings, white lines and comments may be used freely.
1486Comments start at the beginning of a line with @samp{#} and extend
1487until the end of the PO file line.  Comments written by translators
1488should have the initial @samp{#} immediately followed by some white
1489space.  If the @samp{#} is not immediately followed by white space,
1490this comment is most likely generated and managed by specialized GNU
1491tools, and might disappear or be replaced unexpectedly when the PO
1492file is given to @code{msgmerge}.
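
As an illustration of these conventions, a tool reading a PO file could classify comment lines by the character that follows the @samp{#}.  This is only a sketch: the enumeration and the function name @code{classify_po_comment} are hypothetical, and the markers are the ones shown in the entry layouts earlier in this chapter.

```c
#include <stddef.h>

enum po_comment_kind
{
  PO_NOT_COMMENT,      /* line does not start with '#' */
  PO_TRANSLATOR,       /* "# ..." or a bare "#": written by translators */
  PO_EXTRACTED,        /* "#. ..." extracted comments */
  PO_REFERENCE,        /* "#: ..." source references */
  PO_FLAG,             /* "#, ..." flags */
  PO_PREVIOUS,         /* "#| ..." previous msgctxt or msgid */
  PO_OTHER_GENERATED   /* any other "#x": assumed to be tool-managed */
};

/* Classify one PO file line by looking at its first two characters.  */
enum po_comment_kind
classify_po_comment (const char *line)
{
  if (line == NULL || line[0] != '#')
    return PO_NOT_COMMENT;
  switch (line[1])
    {
    case '\0': case ' ': case '\t': case '\n': case '\r':
      return PO_TRANSLATOR;
    case '.':
      return PO_EXTRACTED;
    case ':':
      return PO_REFERENCE;
    case ',':
      return PO_FLAG;
    case '|':
      return PO_PREVIOUS;
    default:
      return PO_OTHER_GENERATED;
    }
}
```

Only lines classified as @code{PO_TRANSLATOR} would be safe places for a translator's own notes.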
1493
1494@node Sources, Template, PO Files, Top
1495@chapter Preparing Program Sources
1496@cindex preparing programs for translation
1497
1498@c FIXME: Rewrite (the whole chapter).
1499
1500For the programmer, changes to the C source code fall into three
1501categories.  First, you have to make the localization functions
1502known to all modules needing message translation.  Second, you should
1503properly trigger the operation of GNU @code{gettext} when the program
1504initializes, usually from the @code{main} function.  Last, you should
1505identify, adjust and mark all constant strings in your program
1506needing translation.
1507
1508@menu
1509* Importing::                   Importing the @code{gettext} declaration
1510* Triggering::                  Triggering @code{gettext} Operations
1511* Preparing Strings::           Preparing Translatable Strings
1512* Mark Keywords::               How Marks Appear in Sources
1513* Marking::                     Marking Translatable Strings
1514* c-format Flag::               Telling something about the following string
1515* Special cases::               Special Cases of Translatable Strings
1516* Names::                       Marking Proper Names for Translation
1517* Libraries::                   Preparing Library Sources
1518@end menu
1519
1520@node Importing, Triggering, Sources, Sources
1521@section Importing the @code{gettext} declaration
1522
1523Presuming that your set of programs, or package, has been adjusted
1524so all needed GNU @code{gettext} files are available, and your
1525@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1526having translated C strings should contain the line:
1527
1528@cindex include file @file{libintl.h}
1529@example
1530#include <libintl.h>
1531@end example
1532
1533Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
1534calls with a format string that could be a translated C string (even if
1535the C string comes from a different C module) should contain the line:
1536
1537@example
1538#include <libintl.h>
1539@end example
1540
1541@node Triggering, Preparing Strings, Importing, Sources
1542@section Triggering @code{gettext} Operations
1543
1544@cindex initialization
1545The initialization of locale data should be done with more or less
1546the same code in every program, as demonstrated below:
1547
1548@example
1549@group
1550int
1551main (int argc, char *argv[])
1552@{
1553  @dots{}
1554  setlocale (LC_ALL, "");
1555  bindtextdomain (PACKAGE, LOCALEDIR);
1556  textdomain (PACKAGE);
1557  @dots{}
1558@}
1559@end group
1560@end example
1561
1562@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1563@file{config.h} or by the Makefile.  For now consult the @code{gettext}
1564or @code{hello} sources for more information.
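
A more complete sketch of this initialization follows.  The fallback definitions of @code{PACKAGE} and @code{LOCALEDIR} are assumptions made so that the fragment compiles on its own; in a real package they would come from @file{config.h} or from the compiler command line.  The function name @code{init_i18n} is likewise hypothetical.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Fallback values for illustration only; normally provided by
   config.h or by the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Set up the locale and the message domain.  Returns the name of the
   current text domain, or NULL if the setup failed.  */
const char *
init_i18n (void)
{
  /* Adopt the locale requested by the user's environment.  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the catalogs of this package are installed.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return NULL;
  /* Make this package's domain the default for gettext calls.  */
  return textdomain (PACKAGE);
}
```

Calling such a function once, early in @code{main}, keeps the three calls together and makes it easy to report a failed setup.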
1565
1566@cindex locale facet, LC_ALL
1567@cindex locale facet, LC_CTYPE
The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}.  This latter category is responsible for determining
character classes with the @code{isalnum} etc. functions from
@file{ctype.h}, which could be wrong, especially for programs that
process some kind of input language.  For example, this would mean that
source code using the @,{c} (c-cedilla character) is runnable in
France but not in the U.S.
1576
Some systems also have problems with parsing numbers using the
@code{scanf} functions if a locale category other than @code{LC_ALL} is
used.  The standards say that formats in addition to the one known in the
@code{"C"} locale might be recognized.  But some systems seem to reject
numbers in the @code{"C"} locale format.  In some situations, it might
1582also be a problem with the notation itself which makes it impossible to
1583recognize whether the number is in the @code{"C"} locale or the local
1584format.  This can happen if thousands separator characters are used.
1585Some locales define this character according to the national
1586conventions to @code{'.'} which is the same character used in the
1587@code{"C"} locale to denote the decimal point.
1588
1589So it is sometimes necessary to replace the @code{LC_ALL} line in the
1590code above by a sequence of @code{setlocale} lines
1591
1592@example
1593@group
1594@{
1595  @dots{}
1596  setlocale (LC_CTYPE, "");
1597  setlocale (LC_MESSAGES, "");
1598  @dots{}
1599@}
1600@end group
1601@end example
1602
1603@cindex locale facet, LC_CTYPE
1604@cindex locale facet, LC_COLLATE
1605@cindex locale facet, LC_MONETARY
1606@cindex locale facet, LC_NUMERIC
1607@cindex locale facet, LC_TIME
1608@cindex locale facet, LC_MESSAGES
1609@cindex locale facet, LC_RESPONSES
1610@noindent
1611On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1612@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1613@code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
1614which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1615a substitute for it is defined in GNU gettext's @code{<libintl.h>}.
1616
1617Note that changing the @code{LC_CTYPE} also affects the functions
1618declared in the @code{<ctype.h>} standard header.  If this is not
1619desirable in your application (for example in a compiler's parser),
1620you can use a set of substitute functions which hardwire the C locale,
1621such as found in the @code{<c-ctype.h>} and @code{<c-ctype.c>} files
1622in the gettext source distribution.
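
The idea behind such substitute functions can be sketched in a few lines.  The names below are hypothetical and are not the ones used in the gettext sources; the sketch also assumes an ASCII-compatible execution character set, in which the letters of each case are contiguous.

```c
/* Character classification that hardwires the C locale.  Unlike the
   <ctype.h> functions, the results never change with setlocale.
   Assumption: an ASCII-compatible execution character set.  */

int
ascii_isalpha (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int
ascii_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

int
ascii_isalnum (int c)
{
  return ascii_isalpha (c) || ascii_isdigit (c);
}
```

A parser built on functions like these accepts the same identifiers whatever locale the user has selected.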
1623
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
1626normally avoided because a @code{setlocale} call is expensive,
1627because it is tedious to determine the places where a locale switch
1628is needed in a large program's source, and because switching a locale
1629is not multithread-safe.
1630
1631@node Preparing Strings, Mark Keywords, Triggering, Sources
1632@section Preparing Translatable Strings
1633
1634@cindex marking strings, preparations
1635Before strings can be marked for translations, they sometimes need to
1636be adjusted.  Usually preparing a string for translation is done right
1637before marking it, during the marking phase which is described in the
1638next sections.  What you have to keep in mind while doing that is the
1639following.
1640
1641@itemize @bullet
1642@item
1643Decent English style.
1644
1645@item
1646Entire sentences.
1647
1648@item
1649Split at paragraphs.
1650
1651@item
1652Use format strings instead of string concatenation.
1653
1654@item
1655Avoid unusual markup and unusual control characters.
1656@end itemize
1657
1658@noindent
1659Let's look at some examples of these guidelines.
1660
1661@cindex style
1662Translatable strings should be in good English style.  If slang language
1663with abbreviations and shortcuts is used, often translators will not
1664understand the message and will produce very inappropriate translations.
1665
1666@example
1667"%s: is parameter\n"
1668@end example
1669
1670@noindent
1671This is nearly untranslatable: Is the displayed item @emph{a} parameter or
1672@emph{the} parameter?
1673
1674@example
1675"No match"
1676@end example
1677
1678@noindent
1679The ambiguity in this message makes it unintelligible: Is the program
1680attempting to set something on fire? Does it mean "The given object does
1681not match the template"? Does it mean "The template does not fit for any
1682of the objects"?
1683
1684@cindex ambiguities
1685In both cases, adding more words to the message will help both the
1686translator and the English speaking user.
1687
1688@cindex sentences
1689Translatable strings should be entire sentences.  It is often not possible
1690to translate single verbs or adjectives in a substitutable way.
1691
1692@example
1693printf ("File %s is %s protected", filename, rw ? "write" : "read");
1694@end example
1695
1696@noindent
1697Most translators will not look at the source and will thus only see the
1698string @code{"File %s is %s protected"}, which is unintelligible.  Change
1699this to
1700
1701@example
1702printf (rw ? "File %s is write protected" : "File %s is read protected",
1703        filename);
1704@end example
1705
1706@noindent
1707This way the translator will not only understand the message, she will
1708also be able to find the appropriate grammatical construction.  A French
1709translator for example translates "write protected" like "protected
1710against writing".
1711
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
1714number (singular/plural) of another part of the sentence.  There are
1715usually more interdependencies between words than in English.  The
1716consequence is that asking a translator to translate two half-sentences
1717and then combining these two half-sentences through dumb string concatenation
1718will not work, for many languages, even though it would work for English.
1719That's why translators need to handle entire sentences.
1720
1721Often sentences don't fit into a single line.  If a sentence is output
1722using two subsequent @code{printf} statements, like this
1723
1724@example
1725printf ("Locale charset \"%s\" is different from\n", lcharset);
1726printf ("input file charset \"%s\".\n", fcharset);
1727@end example
1728
1729@noindent
1730the translator would have to translate two half sentences, but nothing
1731in the POT file would tell her that the two half sentences belong together.
1732It is necessary to merge the two @code{printf} statements so that the
1733translator can handle the entire sentence at once and decide at which
1734place to insert a line break in the translation (if at all):
1735
1736@example
1737printf ("Locale charset \"%s\" is different from\n\
1738input file charset \"%s\".\n", lcharset, fcharset);
1739@end example
1740
1741You may now ask: how about two or more adjacent sentences? Like in this case:
1742
1743@example
1744puts ("Apollo 13 scenario: Stack overflow handling failed.");
1745puts ("On the next stack overflow we will crash!!!");
1746@end example
1747
1748@noindent
Should these two statements be merged into a single one?  I would recommend
merging them if the two sentences are related to each other, because then it
makes it easier for the translator to understand and translate both.  On
1752the other hand, if one of the two messages is a stereotypic one, occurring
1753in other places as well, you will do a favour to the translator by not
1754merging the two.  (Identical messages occurring in several places are
1755combined by xgettext, so the translator has to handle them once only.)
1756
1757@cindex paragraphs
1758Translatable strings should be limited to one paragraph; don't let a
1759single message be longer than ten lines.  The reason is that when the
1760translatable string changes, the translator is faced with the task of
1761updating the entire translated string.  Maybe only a single word will
1762have changed in the English string, but the translator doesn't see that
1763(with the current translation tools), therefore she has to proofread
1764the entire message.
1765
1766@cindex help option
1767Many GNU programs have a @samp{--help} output that extends over several
1768screen pages.  It is a courtesy towards the translators to split such a
1769message into several ones of five to ten lines each.  While doing that,
1770you can also attempt to split the documented options into groups,
1771such as the input options, the output options, and the informative
1772output options.  This will help every user to find the option he is
1773looking for.
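
A help routine split along these lines might look like the following sketch.  The program name @samp{frob} and its options are invented for the illustration, and @code{print_help} is a hypothetical function name.

```c
#include <libintl.h>
#include <stdio.h>

/* Print the help text as several short translatable messages, one per
   group of options, instead of one very long message.
   Returns 0 on success, -1 if any write failed.  */
int
print_help (FILE *out)
{
  int failed = 0;

  failed |= fputs (gettext ("Usage: frob [OPTION]... [FILE]...\n"),
                   out) == EOF;
  failed |= fputs (gettext ("Input options:\n"
                            "  -f, --from=FILE    read input from FILE\n"),
                   out) == EOF;
  failed |= fputs (gettext ("Output options:\n"
                            "  -o, --output=FILE  write output to FILE\n"),
                   out) == EOF;
  failed |= fputs (gettext ("Informative output:\n"
                            "  -h, --help         display this help\n"
                            "      --version      display the version\n"),
                   out) == EOF;
  return failed ? -1 : 0;
}
```

When one option changes, only the message of its group becomes fuzzy, and the translator has a few lines to proofread rather than several screen pages.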
1774
1775@cindex string concatenation
1776@cindex concatenation of strings
1777Hardcoded string concatenation is sometimes used to construct English
1778strings:
1779
1780@example
1781strcpy (s, "Replace ");
1782strcat (s, object1);
1783strcat (s, " with ");
1784strcat (s, object2);
1785strcat (s, "?");
1786@end example
1787
1788@noindent
1789In order to present to the translator only entire sentences, and also
1790because in some languages the translator might want to swap the order
1791of @code{object1} and @code{object2}, it is necessary to change this
1792to use a format string:
1793
1794@example
1795sprintf (s, "Replace %s with %s?", object1, object2);
1796@end example
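
How a translation can swap the two objects is sketched below.  The sketch assumes a @code{printf} implementation that supports the POSIX numbered argument syntax (@code{%1$s}, @code{%2$s}); the second format string stands in for a translation and is invented for the illustration, as is the function name @code{format_replace_prompt}.

```c
#include <stdio.h>

/* The original format, written with numbered arguments.  */
const char *const english_format = "Replace %1$s with %2$s?";

/* Invented stand-in for a translation that needs the opposite order.  */
const char *const swapped_format = "Use %2$s in place of %1$s?";

/* Format the prompt with the given format string.  The argument order
   in the call never changes; only the format string decides where
   each object appears.  Returns the value of snprintf.  */
int
format_replace_prompt (char *buf, size_t size, const char *format,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}
```

With hardcoded concatenation, no choice of translated pieces could produce the second ordering.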
1797
1798@cindex @code{inttypes.h}
1799A similar case is compile time concatenation of strings.  The ISO C 99
1800include file @code{<inttypes.h>} contains a macro @code{PRId64} that
1801can be used as a formatting directive for outputting an @samp{int64_t}
1802integer through @code{printf}.  It expands to a constant string, usually
1803"d" or "ld" or "lld" or something like this, depending on the platform.
1804Assume you have code like
1805
1806@example
1807printf ("The amount is %0" PRId64 "\n", number);
1808@end example
1809
1810@noindent
1811The @code{gettext} tools and library have special support for these
1812@code{<inttypes.h>} macros.  You can therefore simply write
1813
1814@example
1815printf (gettext ("The amount is %0" PRId64 "\n"), number);
1816@end example
1817
1818@noindent
1819The PO file will contain the string "The amount is %0<PRId64>\n".
1820The translators will provide a translation containing "%0<PRId64>"
1821as well, and at runtime the @code{gettext} function's result will
1822contain the appropriate constant string, "d" or "ld" or "lld".
1823
1824This works only for the predefined @code{<inttypes.h>} macros.  If
1825you have defined your own similar macros, let's say @samp{MYPRId64},
1826that are not known to @code{xgettext}, the solution for this problem
1827is to change the code like this:
1828
1829@example
1830char buf1[100];
1831sprintf (buf1, "%0" MYPRId64, number);
1832printf (gettext ("The amount is %s\n"), buf1);
1833@end example
1834
This means that you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.
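
The two-step approach can be wrapped into a small function, sketched here with @code{snprintf} so that both buffers are bounded explicitly.  The definition of @samp{MYPRId64} and the function name @code{format_amount} are assumptions made for the sake of a self-contained fragment.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project-specific macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Format NUMBER into OUT in two steps.  Returns the value of the
   final snprintf.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  /* Platform dependent part: nothing here needs translation.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);
  /* Internationalized part: the translator only sees "%s".  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```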
1840
1841@cindex Java, string concatenation
1842@cindex C#, string concatenation
1843All this applies to other programming languages as well.  For example, in
1844Java and C#, string concatenation is very frequently used, because it is a
1845compiler built-in operator.  Like in C, in Java, you would change
1846
1847@example
1848System.out.println("Replace "+object1+" with "+object2+"?");
1849@end example
1850
1851@noindent
1852into a statement involving a format string:
1853
1854@example
1855System.out.println(
1856    MessageFormat.format("Replace @{0@} with @{1@}?",
1857                         new Object[] @{ object1, object2 @}));
1858@end example
1859
1860@noindent
1861Similarly, in C#, you would change
1862
1863@example
1864Console.WriteLine("Replace "+object1+" with "+object2+"?");
1865@end example
1866
1867@noindent
1868into a statement involving a format string:
1869
1870@example
1871Console.WriteLine(
1872    String.Format("Replace @{0@} with @{1@}?", object1, object2));
1873@end example
1874
1875@cindex markup
1876@cindex control characters
1877Unusual markup or control characters should not be used in translatable
1878strings.  Translators will likely not understand the particular meaning
1879of the markup or control characters.
1880
1881For example, if you have a convention that @samp{|} delimits the
1882left-hand and right-hand part of some GUI elements, translators will
1883often not understand it without specific comments.  It might be
1884better to have the translator translate the left-hand and right-hand
1885part separately.
1886
1887Another example is the @samp{argp} convention to use a single @samp{\v}
1888(vertical tab) control character to delimit two sections inside a
1889string.  This is flawed.  Some translators may convert it to a simple
1890newline, some to blank lines.  With some PO file editors it may not be
1891easy to even enter a vertical tab control character.  So, you cannot
1892be sure that the translation will contain a @samp{\v} character, at the
1893corresponding position.  The solution is, again, to let the translator
1894translate two separate strings and combine at run-time the two translated
1895strings with the @samp{\v} required by the convention.
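
Such a run-time combination might be written as in the sketch below.  The two messages and the function name @code{build_doc} are invented for the illustration; only the @samp{\v} separator comes from the convention described above.

```c
#include <libintl.h>
#include <stdio.h>

/* Assemble a documentation string from two separately translated
   halves.  The vertical tab is inserted by the program, so it cannot
   get lost or altered in a translation.
   Returns 0 on success, -1 if BUF is too small.  */
int
build_doc (char *buf, size_t size)
{
  int len = snprintf (buf, size, "%s\v%s",
                      gettext ("Frobnicate the given files."),
                      gettext ("Report bugs to the maintainers."));
  return (len >= 0 && (size_t) len < size) ? 0 : -1;
}
```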
1896
1897HTML markup, however, is common enough that it's probably ok to use in
1898translatable strings.  But please bear in mind that the GNU gettext tools
1899don't verify that the translations are well-formed HTML.
1900
1901@node Mark Keywords, Marking, Preparing Strings, Sources
1902@section How Marks Appear in Sources
1903@cindex marking strings that require translation
1904
1905All strings requiring translation should be marked in the C sources.  Marking
1906is done in such a way that each translatable string appears to be
1907the sole argument of some function or preprocessor macro.  There are
1908only a few such possible functions or macros meant for translation,
1909and their names are said to be marking keywords.  The marking is
1910attached to strings themselves, rather than to what we do with them.
1911This approach has more uses.  A blatant example is an error message
1912produced by formatting.  The format string needs translation, as
1913well as some strings inserted through some @samp{%s} specification
1914in the format, while the result from @code{sprintf} may have so many
1915different instances that it is impractical to list them all in some
1916@samp{error_string_out()} routine, say.
1917
1918This marking operation has two goals.  The first goal of marking
1919is for triggering the retrieval of the translation, at run time.
1920The keyword is possibly resolved into a routine able to dynamically
1921return the proper translation, as far as possible or wanted, for the
1922argument string.  Most localizable strings are found in executable
1923positions, that is, attached to variables or given as parameters to
1924functions.  But this is not universal usage, and some translatable
1925strings appear in structured initializations.  @xref{Special cases}.
1926
The second goal of the marking operation is to help @code{xgettext}
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
1930
The canonical keyword for marking translatable strings is
@samp{gettext}; it gave its name to the whole GNU @code{gettext}
package.  For packages making only light use of the @samp{gettext}
1934keyword, macro or function, it is easily used @emph{as is}.  However,
1935for packages using the @code{gettext} interface more heavily, it
1936is usually more convenient to give the main keyword a shorter, less
1937obtrusive name.  Indeed, the keyword might appear on a lot of strings
1938all over the package, and programmers usually do not want nor need
1939their program sources to remind them forcefully, all the time, that they
1940are internationalized.  Further, a long keyword has the disadvantage
1941of using more horizontal space, forcing more indentation work on
1942sources for those trying to keep them within 79 or 80 columns.
1943
1944@cindex @code{_}, a macro to mark strings for translation
1945Many packages use @samp{_} (a simple underline) as a keyword,
1946and write @samp{_("Translatable string")} instead of @samp{gettext
("Translatable string")}.  Further, the rule from the GNU coding standards
that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
1950So, the textual overhead per translatable string is reduced to
1951only three characters: the underline and the two parentheses.
1952However, even if GNU @code{gettext} uses this convention internally,
1953it does not offer it officially.  The real, genuine keyword is truly
1954@samp{gettext} indeed.  It is fairly easy for those wanting to use
1955@samp{_} instead of @samp{gettext} to declare:
1956
1957@example
1958#include <libintl.h>
1959#define _(String) gettext (String)
1960@end example
1961
1962@noindent
1963instead of merely using @samp{#include <libintl.h>}.
1964
1965The marking keywords @samp{gettext} and @samp{_} take the translatable
1966string as sole argument.  It is also possible to define marking functions
1967that take it at another argument position.  It is even possible to make
1968the marked argument position depend on the total number of arguments of
1969the function call; this is useful in C++.  All this is achieved using
1970@code{xgettext}'s @samp{--keyword} option.
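
As a sketch of a marking function that takes the string at another position, consider the hypothetical helper below.  Its name @code{log_message} is an assumption; the keyword specification in the comment uses the @samp{@var{id}:@var{argnum}} form accepted by the @samp{--keyword} option.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical logging helper.  The translatable string is the SECOND
   argument, so xgettext has to be told where to look, for instance:

       xgettext --keyword=log_message:2 ...

   Returns the value of fprintf.  */
int
log_message (FILE *stream, const char *msgid)
{
  return fprintf (stream, "%s\n", gettext (msgid));
}

/* Example call: the string literal is what xgettext would extract.  */
int
report_missing_config (void)
{
  return log_message (stderr, "Cannot open the configuration file");
}
```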
1971
1972Note also that long strings can be split across lines, into multiple
1973adjacent string tokens.  Automatic string concatenation is performed
1974at compile time according to ISO C and ISO C++; @code{xgettext} also
1975supports this syntax.
1976
1977Later on, the maintenance is relatively easy.  If, as a programmer,
1978you add or modify a string, you will have to ask yourself if the
1979new or altered string requires translation, and include it within
@samp{_()} if you think it should be translated.  For example, @samp{"%s"}
is a string @emph{not} requiring translation.  But
@samp{"%s: %d"} @emph{does} require translation, because in French, unlike
in English, it's customary to put a space before a colon.
1984
1985@node Marking, c-format Flag, Mark Keywords, Sources
1986@section Marking Translatable Strings
1987@emindex marking strings for translation
1988
1989In PO mode, one set of features is meant more for the programmer than
1990for the translator, and allows him to interactively mark which strings,
1991in a set of program sources, are translatable, and which are not.
1992Even if it is a fairly easy job for a programmer to find and mark
1993such strings by other means, using any editor of his choice, PO mode
1994makes this work more comfortable.  Further, this gives translators
1995who feel a little like programmers, or programmers who feel a little
1996like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
1999
2000@emindex @code{etags}, using for marking strings
The set of program sources, targeted by the PO mode commands described
here, should have an Emacs tags table constructed for your project,
2003prior to using these PO file commands.  This is easy to do.  In any
2004shell window, change the directory to the root of your project, then
2005execute a command resembling:
2006
2007@example
2008etags src/*.[hc] lib/*.[hc]
2009@end example
2010
2011@noindent
2012presuming here you want to process all @file{.h} and @file{.c} files
2013from the @file{src/} and @file{lib/} directories.  This command will
2014explore all said files and create a @file{TAGS} file in your root
2015directory, somewhat summarizing the contents using a special file
2016format Emacs can understand.
2017
2018@emindex @file{TAGS}, and marking translatable strings
2019For packages following the GNU coding standards, there is
2020a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2021all directories and for all files containing source code.
2022
2023Once your @file{TAGS} file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
2025But these commands are necessarily driven from within a PO file
2026window, and it is likely that you do not even have such a PO file yet.
2027This is not a problem at all, as you may safely open a new, empty PO
2028file, mainly for using these commands.  This empty PO file will slowly
2029fill in while you mark strings as translatable in your program sources.
2030
2031@table @kbd
2032@item ,
2033@efindex ,@r{, PO Mode command}
2034Search through program sources for a string which looks like a
2035candidate for translation (@code{po-tags-search}).
2036
2037@item M-,
2038@efindex M-,@r{, PO Mode command}
2039Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2040
2041@item M-.
2042@efindex M-.@r{, PO Mode command}
2043Mark the last string found with a keyword taken from a set of possible
2044keywords.  This command with a prefix allows some management of these
2045keywords (@code{po-select-mark-and-mark}).
2046
2047@end table
2048
2049@efindex po-tags-search@r{, PO Mode command}
2050The @kbd{,} (@code{po-tags-search}) command searches for the next
2051occurrence of a string which looks like a possible candidate for
2052translation, and displays the program source in another Emacs window,
2053positioned in such a way that the string is near the top of this other
window.  If the string is too big to fit whole in this window, it is
positioned so only its beginning is shown.  In any case, the cursor
2056is left in the PO file window.  If the shown string would be better
2057presented differently in different native languages, you may mark it
2058using @kbd{M-,} or @kbd{M-.}.  Otherwise, you might rather ignore it
2059and skip to the next string by merely repeating the @kbd{,} command.
2060
2061A string is a good candidate for translation if it contains a sequence
2062of three or more letters.  A string containing at most two letters in
2063a row will be considered as a candidate if it has more letters than
2064non-letters.  The command disregards strings containing no letters,
2065or isolated letters only.  It also disregards strings within comments,
2066or strings already marked with some keyword PO mode knows (see below).
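
The letter-counting rule just described can be sketched in C as follows.  This is not the code PO mode runs, only an illustration of the rule; the function name @code{looks_translatable} is hypothetical, and the checks for comments and for already marked strings are left out.

```c
#include <ctype.h>

/* Return 1 if the string S looks like a candidate for translation
   according to the rule described above, 0 otherwise.  */
int
looks_translatable (const char *s)
{
  int letters = 0;   /* total number of letters */
  int others = 0;    /* total number of non-letters */
  int run = 0;       /* length of the current run of letters */
  int longest = 0;   /* length of the longest run seen so far */

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return 1;                  /* three or more letters in a row */
  if (longest == 2)
    return letters > others;   /* more letters than non-letters */
  return 0;                    /* no letters, or isolated letters only */
}
```

Note that a purely mechanical rule of this kind can only propose candidates; whether a string such as @samp{"%s: %d"} needs translation remains a decision for the programmer.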
2067
2068If you have never told Emacs about some @file{TAGS} file to use, the
2069command will request that you specify one from the minibuffer, the
2070first time you use the command.  You may later change your @file{TAGS}
2071file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2072which will ask you to name the precise @file{TAGS} file you want
2073to use.  @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2074
2075Each time you use the @kbd{,} command, the search resumes from where it was
2076left by the previous search, and goes through all program sources,
2077obeying the @file{TAGS} file, until all sources have been processed.
2078However, by giving a prefix argument to the command @w{(@kbd{C-u
2079,})}, you may request that the search be restarted all over again
2080from the first program source; but in this case, strings that you
2081recently marked as translatable will be automatically skipped.
2082
Using this @kbd{,} command does not prevent the use of other regular
Emacs tags commands.  For example, regular @code{tags-search} or
@code{tags-query-replace} commands may be used without disrupting the
independent @kbd{,} search sequence.  However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
2090
2091@efindex po-mark-translatable@r{, PO Mode command}
2092@efindex po-select-mark-and-mark@r{, PO Mode command}
2093The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2094recently found string with the @samp{_} keyword.  The @kbd{M-.}
2095(@code{po-select-mark-and-mark}) command will request that you type
2096one keyword from the minibuffer and use that keyword for marking
2097the string.  Both commands will automatically create a new PO file
2098untranslated entry for the string being marked, and make it the
2099current entry (making it easy for you to immediately proceed to its
2100translation, if you feel like doing it right away).  It is possible
2101that the modifications made to the program source by @kbd{M-,} or
2102@kbd{M-.} render some source line longer than 80 columns, forcing you
2103to break and re-indent this line differently.  You may use the @kbd{O}
2104command from PO mode, or any other window changing command from
2105Emacs, to break out into the program source window, and do any
needed adjustments.  You will have to use some regular Emacs command
to return the cursor to the PO file window if you then want to use the
@kbd{,} command for the next string, say.
2109
2110The @kbd{M-.} command has a few built-in speedups, so you do not
2111have to explicitly type all keywords all the time.  The first such
2112speedup is that you are presented with a @emph{preferred} keyword,
2113which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2114The second speedup is that you may type any non-ambiguous prefix of the
2115keyword you really mean, and the command will complete it automatically
2116for you.  This also means that PO mode has to @emph{know} all
2117your possible keywords, and that it will not accept mistyped keywords.
2118
2119If you reply @kbd{?} to the keyword request, the command gives a
2120list of all known keywords, from which you may choose.  When the
2121command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2122updating any program source or PO file buffer, and does some simple
2123keyword management instead.  In this case, the command asks for a
2124keyword, written in full, which becomes a new allowed keyword for
2125later @kbd{M-.} commands.  Moreover, this new keyword automatically
2126becomes the @emph{preferred} keyword for later commands.  By typing
2127an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2128changes the @emph{preferred} keyword and does nothing more.
2129
2130All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2131when scanning for strings, and strings already marked by any of those
2132known keywords are automatically skipped.  If many PO files are opened
2133simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
is preferred for the @kbd{M-.} command.  In fact, it is not useful to
prefer @samp{_}, as this one is already built into the @kbd{M-,} command.
2140
2141@node c-format Flag, Special cases, Marking, Sources
2142@section Special Comments preceding Keywords
2143
2144@c FIXME document c-format and no-c-format.
2145
2146@cindex format strings
2147In C programs strings are often used within calls of functions from the
2148@code{printf} family.  The special thing about these format strings is
2149that they can contain format specifiers introduced with @kbd{%}.  Assume
2150we have the code
2151
2152@example
2153printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
2154@end example
2155
2156@noindent
2157A possible German translation for the above string might be:
2158
2159@example
2160"%d Zeichen lang ist die Zeichenkette `%s'"
2161@end example
2162
A C programmer, even one who cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the
@code{printf} call has not.  This will most probably lead to problems,
because now the length of the string is treated as the address of a
string, and the address of the string as an integer.
2168
To prevent errors at runtime caused by translations, the @code{msgfmt}
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
will give an error and refuse to produce a MO file.  Thus consistent
use of @samp{msgfmt -c} will catch the error, so that it cannot
cause problems at runtime.
2176
2177@noindent
For the word order in the above German translation to be correct, one
would have to write
2180
2181@example
2182"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2183@end example
2184
2185@noindent
2186The routines in @code{msgfmt} know about this special notation.
2187
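To see the effect of the numbered notation, the following sketch formats
the message once with the original string and once with the reordered
German translation.  The helper @code{format_length_message} is a
hypothetical name, and its @var{format} parameter merely stands in for
the value @code{gettext} would return.  Note that the @samp{%1$s}
notation is a POSIX extension to @code{printf}, so the sketch assumes a
C library that supports it, such as GNU @code{libc}.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the message about S into BUFFER.
   FORMAT stands in for the (possibly translated) string that
   gettext() would return.  The arguments are always passed in the
   order of the original: first the string, then its length.  */
int
format_length_message (char *buffer, size_t size,
                       const char *format, const char *s)
{
  assert (buffer != NULL && format != NULL && s != NULL);
  return snprintf (buffer, size, format, s, (int) strlen (s));
}
```
@end verbatim

With the original format string the arguments are consumed in order;
with the translated one, @samp{%2$d} picks the second argument and
@samp{%1$s} the first, so the call site needs no change.
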
Because not all strings in a program are format strings, it is not
useful for @code{msgfmt} to test all the strings in the @file{.po} file.
Doing so might cause problems, because a string might contain what looks
like a format specifier even though it is never passed to @code{printf}.

Therefore @code{xgettext} adds a special tag to those messages it
thinks might be format strings.  There is no absolute rule for this,
only a heuristic.  In the @file{.po} file the entry is marked using the
@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2197
2198@kwindex c-format@r{, and @code{xgettext}}
2199@kwindex no-c-format@r{, and @code{xgettext}}
The careful reader now might say that this again can cause problems:
the heuristic might guess wrong.  This is true, and therefore
@code{xgettext} knows about a special kind of comment which lets
the programmer take over the decision.  If the @code{xgettext} program
finds a comment containing the words @code{xgettext:c-format} on the
same line as the @code{gettext} keyword or on the line immediately
preceding it, it will mark the string in any case with
the @code{c-format} flag.  This kind of comment should be used when
@code{xgettext} does not recognize the string as a format string but
it really is one and it should be tested.  Please note that when the
comment is on the same line as the @code{gettext} keyword, it must come
before the string to be translated.
2212
2213This situation happens quite often.  The @code{printf} function is often
2214called with strings which do not contain a format specifier.  Of course
2215one would normally use @code{fputs} but it does happen.  In this case
2216@code{xgettext} does not recognize this as a format string but what
2217happens if the translation introduces a valid format specifier?  The
2218@code{printf} function will try to access one of the parameters but none
2219exists because the original code does not pass any parameters.
2220
@code{xgettext} of course could make a wrong decision the other way
round, i.e.@: a string marked as a format string actually is not a format
string.  In this case @code{msgfmt} might give too many warnings and
would prevent the @file{.po} file from being compiled into a MO file.
The method to prevent this wrong decision is similar to the one used
above, only the comment to use must contain the string
@code{xgettext:no-c-format}.
2227
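As a sketch of how the two comments look in practice, consider the
following fragment.  The function names and message texts are made up
for illustration; only the comment texts @code{xgettext:c-format} and
@code{xgettext:no-c-format} come from the description above.  The second
string contains a percent sign followed by a space and a letter, which a
heuristic could plausibly mistake for a format specifier.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical function.  The string has no format specifier, yet it
   is used as a format, so a translation must not introduce one.  The
   comment on the line before the gettext keyword requests the check.  */
int
format_done (char *buffer, size_t size)
{
  assert (buffer != NULL);
  /* xgettext:c-format */
  return snprintf (buffer, size, gettext ("Processing finished.\n"));
}

/* Hypothetical function.  The string is never used as a format, so
   the percent sign is harmless and the check is switched off.  */
const char *
discount_label (void)
{
  /* xgettext:no-c-format */
  return gettext ("Save 50% during the sale");
}
```
@end verbatim
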
2228If a string is marked with @code{c-format} and this is not correct the
2229user can find out who is responsible for the decision.  See
2230@ref{xgettext Invocation} to see how the @code{--debug} option can be
2231used for solving this problem.
2232
2233@node Special cases, Names, c-format Flag, Sources
2234@section Special Cases of Translatable Strings
2235
2236@cindex marking string initializers
The attentive reader might now point out that it is not always possible
to mark a translatable string with @code{gettext} or a similar keyword.
2239Consider the following case:
2240
2241@example
2242@group
2243@{
2244  static const char *messages[] = @{
2245    "some very meaningful message",
2246    "and another one"
2247  @};
2248  const char *string;
2249  @dots{}
2250  string
2251    = index > 1 ? "a default message" : messages[index];
2252
  fputs (string, stdout);
2254  @dots{}
2255@}
2256@end group
2257@end example
2258
2259While it is no problem to mark the string @code{"a default message"} it
2260is not possible to mark the string initializers for @code{messages}.
2261What is to be done?  We have to fulfill two tasks.  First we have to mark the
2262strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
can find them, and second we have to translate the strings at runtime
2264before printing them.
2265
2266The first task can be fulfilled by creating a new keyword, which names a
2267no-op.  For the second we have to mark all access points to a string
2268from the array.  So one solution can look like this:
2269
2270@example
2271@group
2272#define gettext_noop(String) String
2273
2274@{
2275  static const char *messages[] = @{
2276    gettext_noop ("some very meaningful message"),
2277    gettext_noop ("and another one")
2278  @};
2279  const char *string;
2280  @dots{}
2281  string
2282    = index > 1 ? gettext ("a default message") : gettext (messages[index]);
2283
  fputs (string, stdout);
2285  @dots{}
2286@}
2287@end group
2288@end example
2289
Please convince yourself that the string which is written by
@code{fputs} is translated in any case.  How to make @code{xgettext}
aware of the additional keyword @code{gettext_noop} is explained in
@ref{xgettext Invocation}.
2294
The above is of course not the only solution.  You could also come up
with the following one:
2297
2298@example
2299@group
2300#define gettext_noop(String) String
2301
2302@{
2303  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
2305    gettext_noop ("and another one")
2306  @};
2307  const char *string;
2308  @dots{}
2309  string
2310    = index > 1 ? gettext_noop ("a default message") : messages[index];
2311
  fputs (gettext (string), stdout);
2313  @dots{}
2314@}
2315@end group
2316@end example
2317
But this has a drawback.  The programmer has to take care to use
@code{gettext_noop} for the string @code{"a default message"}.
Using @code{gettext} there instead would cause the string to be passed
through @code{gettext} twice, which in rare cases could have
unpredictable results.

One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case.  This analysis is
generally not very difficult, but where it is, you can fall back
on this second method.
2326
2327@node Names, Libraries, Special cases, Sources
2328@section Marking Proper Names for Translation
2329
2330Should names of persons, cities, locations etc. be marked for translation
2331or not?  People who only know languages that can be written with Latin
2332letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2333because names usually do not change when transported between these languages.
2334However, in general when translating from one script to another, names
2335are translated too, usually phonetically or by transliteration.  For
2336example, Russian or Greek names are converted to the Latin alphabet when
2337being translated to English, and English or French names are converted
2338to the Katakana script when being translated to Japanese.  This is
2339necessary because the speakers of the target language in general cannot
2340read the script the name is originally written in.
2341
2342As a programmer, you should therefore make sure that names are marked
2343for translation, with a special comment telling the translators that it
2344is a proper name and how to pronounce it.  Like this:
2345
2346@example
2347@group
2348printf (_("Written by %s.\n"),
2349        /* TRANSLATORS: This is a proper name.  See the gettext
2350           manual, section Names.  Note this is actually a non-ASCII
2351           name: The first name is (with Unicode escapes)
2352           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2353           Pronunciation is like "fraa-swa pee-nar".  */
2354        _("Francois Pinard"));
2355@end group
2356@end example
2357
2358As a translator, you should use some care when translating names, because
2359it is frustrating if people see their names mutilated or distorted.  If
2360your language uses the Latin script, all you need to do is to reproduce
2361the name as perfectly as you can within the usual character set of your
language.  In this particular case, this means providing a translation
containing the c-cedilla character.  If your language uses a different
2364script and the people speaking it don't usually read Latin words, it means
2365transliteration; but you should still give, in parentheses, the original
2366writing of the name -- for the sake of the people that do read the Latin
2367script.  Here is an example, using Greek as the target script:
2368
2369@example
2370@group
2371#. This is a proper name.  See the gettext
2372#. manual, section Names.  Note this is actually a non-ASCII
2373#. name: The first name is (with Unicode escapes)
2374#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2375#. Pronunciation is like "fraa-swa pee-nar".
2376msgid "Francois Pinard"
2377msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2378       " (Francois Pinard)"
2379@end group
2380@end example
2381
2382Because translation of names is such a sensitive domain, it is a good
2383idea to test your translation before submitting it.
2384
2385The translation project @url{http://sourceforge.net/projects/translation}
2386has set up a POT file and translation domain consisting of program author
2387names, with better facilities for the translator than those presented here.
2388Namely, there the original name is written directly in Unicode (rather
2389than with Unicode escapes or HTML entities), and the pronunciation is
2390denoted using the International Phonetic Alphabet (see
2391@url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}).
2392
2393However, we don't recommend this approach for all POT files in all packages,
2394because this would force translators to use PO files in UTF-8 encoding,
which is, in the current state of software (as of 2003), a major hassle
2396for translators using GNU Emacs or XEmacs with po-mode.
2397
2398@node Libraries,  , Names, Sources
2399@section Preparing Library Sources
2400
2401When you are preparing a library, not a program, for the use of
2402@code{gettext}, only a few details are different.  Here we assume that
2403the library has a translation domain and a POT file of its own.  (If
2404it uses the translation domain and POT file of the main program, then
2405the previous sections apply without changes.)
2406
2407@enumerate
2408@item
2409The library code doesn't call @code{setlocale (LC_ALL, "")}.  It's the
2410responsibility of the main program to set the locale.  The library's
2411documentation should mention this fact, so that developers of programs
2412using the library are aware of it.
2413
2414@item
2415The library code doesn't call @code{textdomain (PACKAGE)}, because it
2416would interfere with the text domain set by the main program.
2417
2418@item
2419The initialization code for a program was
2420
2421@smallexample
2422  setlocale (LC_ALL, "");
2423  bindtextdomain (PACKAGE, LOCALEDIR);
2424  textdomain (PACKAGE);
2425@end smallexample
2426
2427@noindent
2428For a library it is reduced to
2429
2430@smallexample
2431  bindtextdomain (PACKAGE, LOCALEDIR);
2432@end smallexample
2433
2434@noindent
2435If your library's API doesn't already have an initialization function,
2436you need to create one, containing at least the @code{bindtextdomain}
2437invocation.  However, you usually don't need to export and document this
2438initialization function: It is sufficient that all entry points of the
2439library call the initialization function if it hasn't been called before.
2440The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called.  Like this:
2442
2443@example
2444@group
2445static bool libfoo_initialized;
2446
2447static void
2448libfoo_initialize (void)
2449@{
2450  bindtextdomain (PACKAGE, LOCALEDIR);
2451  libfoo_initialized = true;
2452@}
2453
2454/* This function is part of the exported API.  */
2455struct foo *
2456create_foo (...)
2457@{
2458  /* Must ensure the initialization is performed.  */
2459  if (!libfoo_initialized)
2460    libfoo_initialize ();
2461  ...
2462@}
2463
2464/* This function is part of the exported API.  The argument must be
2465   non-NULL and have been created through create_foo().  */
2466int
2467foo_refcount (struct foo *argument)
2468@{
2469  /* No need to invoke the initialization function here, because
2470     create_foo() must already have been called before.  */
2471  ...
2472@}
2473@end group
2474@end example
2475
2476@item
2477The usual declaration of the @samp{_} macro in each source file was
2478
2479@smallexample
2480#include <libintl.h>
2481#define _(String) gettext (String)
2482@end smallexample
2483
2484@noindent
2485for a program.  For a library, which has its own translation domain,
2486it reads like this:
2487
2488@smallexample
2489#include <libintl.h>
2490#define _(String) dgettext (PACKAGE, String)
2491@end smallexample
2492
2493In other words, @code{dgettext} is used instead of @code{gettext}.
2494Similarly, the @code{dngettext} function should be used in place of the
2495@code{ngettext} function.
2496@end enumerate
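
Putting the points above together, a library source file might look
roughly like the following sketch.  The name
@code{libfoo_describe_refcount}, the placeholder values of
@code{PACKAGE} and @code{LOCALEDIR}, and the message texts are
assumptions for the example; in a real package the two macros would
normally be supplied by the build system.

@verbatim
```c
#include <assert.h>
#include <libintl.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholders; normally provided by the build system.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Library-private shorthand: always look up in the library's domain.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  /* No setlocale() and no textdomain() here: both belong to the
     main program.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical entry point: writes a message about COUNT references
   into BUFFER, using dngettext instead of ngettext.  */
int
libfoo_describe_refcount (char *buffer, size_t size, unsigned long count)
{
  assert (buffer != NULL);
  if (!libfoo_initialized)
    libfoo_initialize ();
  if (count == 0)
    return snprintf (buffer, size, "%s", _("no references"));
  return snprintf (buffer, size,
                   dngettext (PACKAGE, "%lu reference", "%lu references",
                              count),
                   count);
}
```
@end verbatim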
2497
2498@node Template, Creating, Sources, Top
2499@chapter Making the PO Template File
2500@cindex PO template file
2501
2502After preparing the sources, the programmer creates a PO template file.
This chapter explains how to use @code{xgettext} for this purpose.
2504
2505@code{xgettext} creates a file named @file{@var{domainname}.po}.  You
2506should then rename it to @file{@var{domainname}.pot}.  (Why doesn't
2507@code{xgettext} create it under the name @file{@var{domainname}.pot}
2508right away?  The answer is: for historical reasons.  When @code{xgettext}
2509was specified, the distinction between a PO file and PO file template
2510was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
2511
2512@c FIXME: Rewrite.
2513
2514@menu
2515* xgettext Invocation::         Invoking the @code{xgettext} Program
2516@end menu
2517
2518@node xgettext Invocation,  , Template, Template
2519@section Invoking the @code{xgettext} Program
2520
2521@include xgettext.texi
2522
2523@node Creating, Updating, Template, Top
2524@chapter Creating a New PO File
2525@cindex creating a new PO file
2526
2527When starting a new translation, the translator creates a file called
2528@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
2529file with modifications in the initial comments (at the beginning of the file)
2530and in the header entry (the first entry, near the beginning of the file).
2531
2532The easiest way to do so is by use of the @samp{msginit} program.
2533For example:
2534
2535@example
2536$ cd @var{PACKAGE}-@var{VERSION}
2537$ cd po
2538$ msginit
2539@end example
2540
2541The alternative way is to do the copy and modifications by hand.
2542To do so, the translator copies @file{@var{package}.pot} to
2543@file{@var{LANG}.po}.  Then she modifies the initial comments and
2544the header entry of this file.
2545
2546@menu
2547* msginit Invocation::          Invoking the @code{msginit} Program
2548* Header Entry::                Filling in the Header Entry
2549@end menu
2550
2551@node msginit Invocation, Header Entry, Creating, Creating
2552@section Invoking the @code{msginit} Program
2553
2554@include msginit.texi
2555
2556@node Header Entry,  , msginit Invocation, Creating
2557@section Filling in the Header Entry
2558@cindex header entry of a PO file
2559
2560The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2561"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2562information.  This can be done in any text editor; if Emacs is used
2563and it switched to PO mode automatically (because it has recognized
2564the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2565
The header entry itself can be modified using PO mode: in Emacs,
type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
entry.  You should fill in the following fields.
2569
2570@table @asis
2571@item Project-Id-Version
2572This is the name and version of the package.
2573
2574@item Report-Msgid-Bugs-To
2575This has already been filled in by @code{xgettext}.  It contains an email
2576address or URL where you can report bugs in the untranslated strings:
2577
2578@itemize -
2579@item Strings which are not entire sentences, see the maintainer guidelines
2580in @ref{Preparing Strings}.
2581@item Strings which use unclear terms or require additional context to be
2582understood.
2583@item Strings which make invalid assumptions about notation of date, time or
2584money.
2585@item Pluralisation problems.
2586@item Incorrect English spelling.
2587@item Incorrect formatting.
2588@end itemize
2589
2590@item POT-Creation-Date
2591This has already been filled in by @code{xgettext}.
2592
2593@item PO-Revision-Date
2594You don't need to fill this in.  It will be filled by the PO file editor
2595when you save the file.
2596
2597@item Last-Translator
2598Fill in your name and email address (without double quotes).
2599
2600@item Language-Team
2601Fill in the English name of the language, and the email address or
2602homepage URL of the language team you are part of.
2603
2604Before starting a translation, it is a good idea to get in touch with
2605your translation team, not only to make sure you don't do duplicated work,
2606but also to coordinate difficult linguistic issues.
2607
2608@cindex list of translation teams, where to find
2609In the Free Translation Project, each translation team has its own mailing
2610list.  The up-to-date list of teams can be found at the Free Translation
2611Project's homepage, @uref{http://www.iro.umontreal.ca/contrib/po/HTML/},
2612in the "National teams" area.
2613
2614@item Content-Type
2615@cindex encoding of PO files
2616@cindex charset of PO files
2617Replace @samp{CHARSET} with the character encoding used for your language,
2618in your locale, or UTF-8.  This field is needed for correct operation of the
2619@code{msgmerge} and @code{msgfmt} programs, as well as for users whose
2620locale's character encoding differs from yours (see @ref{Charset conversion}).
2621
2622@cindex @code{locale} program
2623You get the character encoding of your locale by running the shell command
2624@samp{locale charmap}.  If the result is @samp{C} or @samp{ANSI_X3.4-1968},
2625which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
2626locale is not correctly configured.  In this case, ask your translation
2627team which charset to use.  @samp{ASCII} is not usable for any language
2628except Latin.
2629
2630@cindex encoding list
2631Because the PO files must be portable to operating systems with less advanced
2632internationalization facilities, the character encodings that can be used
2633are limited to those supported by both GNU @code{libc} and GNU
2634@code{libiconv}.  These are:
2635@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
2636@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
2637@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
2638@code{ISO-8859-15},
2639@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
2640@code{CP850}, @code{CP866}, @code{CP874},
2641@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
2642@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
2643@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
2644@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
2645@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
2646
2647@c This data is taken from glibc/localedata/SUPPORTED.
2648@cindex Linux
2649In the GNU system, the following encodings are frequently used for the
2650corresponding languages.
2651
2652@cindex encoding for your language
2653@itemize
2654@item @code{ISO-8859-1} for
2655Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
2656English, Estonian, Faroese, Finnish, French, Galician, German,
2657Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
2658Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
2659Walloon,
2660@item @code{ISO-8859-2} for
2661Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
2662Slovenian,
2663@item @code{ISO-8859-3} for Maltese,
2664@item @code{ISO-8859-5} for Macedonian, Serbian,
2665@item @code{ISO-8859-6} for Arabic,
2666@item @code{ISO-8859-7} for Greek,
2667@item @code{ISO-8859-8} for Hebrew,
2668@item @code{ISO-8859-9} for Turkish,
2669@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
2670@item @code{ISO-8859-14} for Welsh,
2671@item @code{ISO-8859-15} for
2672Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
2673Italian, Portuguese, Spanish, Swedish, Walloon,
2674@item @code{KOI8-R} for Russian,
2675@item @code{KOI8-U} for Ukrainian,
2676@item @code{KOI8-T} for Tajik,
2677@item @code{CP1251} for Bulgarian, Byelorussian,
2678@item @code{GB2312}, @code{GBK}, @code{GB18030}
2679for simplified writing of Chinese,
2680@item @code{BIG5}, @code{BIG5-HKSCS}
2681for traditional writing of Chinese,
2682@item @code{EUC-JP} for Japanese,
2683@item @code{EUC-KR} for Korean,
2684@item @code{TIS-620} for Thai,
2685@item @code{GEORGIAN-PS} for Georgian,
2686@item @code{UTF-8} for any language, including those listed above.
2687@end itemize
2688
2689@cindex quote characters, use in PO files
2690@cindex quotation marks
2691When single quote characters or double quote characters are used in
2692translations for your language, and your locale's encoding is one of the
2693ISO-8859-* charsets, it is best if you create your PO files in UTF-8
2694encoding, instead of your locale's encoding.  This is because in UTF-8
2695the real quote characters can be represented (single quote characters:
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
2698real quote characters, whereas users in ISO-8859-* locales will see the
2699vertical apostrophe and the vertical double quote instead (because that's
2700what the character set conversion will transliterate them to).
2701
2702@cindex @code{xmodmap} program, and typing quotation marks
2703To enter such quote characters under X11, you can change your keyboard
2704mapping using the @code{xmodmap} program.  The X11 names of the quote
2705characters are "leftsinglequotemark", "rightsinglequotemark",
2706"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
2707"doublelowquotemark".
2708
2709Note that only recent versions of GNU Emacs support the UTF-8 encoding:
2710Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
2711support the UTF-8 encoding.
2712
2713The character encoding name can be written in either upper or lower case.
2714Usually upper case is preferred.
2715
2716@item Content-Transfer-Encoding
2717Set this to @code{8bit}.
2718
2719@item Plural-Forms
2720This field is optional.  It is only needed if the PO file has plural forms.
2721You can find them by searching for the @samp{msgid_plural} keyword.  The
2722format of the plural forms field is described in @ref{Plural forms}.
2723@end table
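
The check described under @samp{Content-Type} above can also be made
from a program.  The following sketch assumes a POSIX system providing
@code{nl_langinfo}, which on GNU systems normally reports the same name
as @samp{locale charmap}; the helper names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if CODESET is one of the names that
   indicate a plain ASCII locale, i.e. a locale that is probably not
   configured for the translator's language.  */
bool
codeset_is_plain_ascii (const char *codeset)
{
  static const char *const ascii_names[] =
    { "C", "ANSI_X3.4-1968", "ASCII", "US-ASCII" };

  assert (codeset != NULL);
  for (size_t i = 0; i < sizeof ascii_names / sizeof ascii_names[0]; i++)
    if (strcmp (codeset, ascii_names[i]) == 0)
      return true;
  return false;
}

/* Hypothetical helper: returns the character encoding name of the
   locale selected by the user's environment.  */
const char *
current_codeset (void)
{
  setlocale (LC_ALL, "");
  return nl_langinfo (CODESET);
}
```
@end verbatim

If @code{codeset_is_plain_ascii (current_codeset ())} is true, the
advice above applies: ask the translation team which charset to use.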
2724
2725@node Updating, Editing, Creating, Top
2726@chapter Updating Existing PO Files
2727
2728@menu
2729* msgmerge Invocation::         Invoking the @code{msgmerge} Program
2730@end menu
2731
2732@node msgmerge Invocation,  , Updating, Updating
2733@section Invoking the @code{msgmerge} Program
2734
2735@include msgmerge.texi
2736
2737@node Editing, Manipulating, Updating, Top
2738@chapter Editing PO Files
2739@cindex Editing PO Files
2740
2741@menu
2742* KBabel::                      KDE's PO File Editor
2743* Gtranslator::                 GNOME's PO File Editor
2744* PO Mode::                     Emacs's PO File Editor
2745@end menu
2746
2747@node KBabel, Gtranslator, Editing, Editing
2748@section KDE's PO File Editor
2749@cindex KDE PO file editor
2750
2751@node Gtranslator, PO Mode, KBabel, Editing
2752@section GNOME's PO File Editor
2753@cindex GNOME PO file editor
2754
2755@node PO Mode,  , Gtranslator, Editing
2756@section Emacs's PO File Editor
2757@cindex Emacs PO Mode
2758
2759@c FIXME: Rewrite.
2760
For the lucky users of Emacs, PO mode has been created specifically
to provide a cozy environment for editing or modifying PO files.
2764While editing a PO file, PO mode allows for the easy browsing of
2765auxiliary and compendium PO files, as well as for following references into
2766the set of C program sources from which PO files have been derived.
2767It has a few special features, among which are the interactive marking
2768of program strings as translatable, and the validation of PO files
2769with easy repositioning to PO file lines showing errors.
2770
To begin with, besides the main PO mode commands
2772(@pxref{Main PO Commands}), you should know how to move between entries
2773(@pxref{Entry Positioning}), and how to handle untranslated entries
2774(@pxref{Untranslated Entries}).
2775
2776@menu
2777* Installation::                Completing GNU @code{gettext} Installation
2778* Main PO Commands::            Main Commands
2779* Entry Positioning::           Entry Positioning
2780* Normalizing::                 Normalizing Strings in Entries
2781* Translated Entries::          Translated Entries
2782* Fuzzy Entries::               Fuzzy Entries
2783* Untranslated Entries::        Untranslated Entries
2784* Obsolete Entries::            Obsolete Entries
2785* Modifying Translations::      Modifying Translations
2786* Modifying Comments::          Modifying Comments
2787* Subedit::                     Mode for Editing Translations
2788* C Sources Context::           C Sources Context
2789* Auxiliary::                   Consulting Auxiliary PO Files
2790* Compendium::                  Using Translation Compendia
2791@end menu
2792
2793@node Installation, Main PO Commands, PO Mode, PO Mode
2794@subsection Completing GNU @code{gettext} Installation
2795
2796@cindex installing @code{gettext}
2797@cindex @code{gettext} installation
2798Once you have received, unpacked, configured and compiled the GNU
2799@code{gettext} distribution, the @samp{make install} command puts in
2800place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
2801@code{msgmerge}, as well as their available message catalogs.  To
2802top off a comfortable installation, you might also want to make the
2803PO mode available to your Emacs users.
2804
2805@emindex @file{.emacs} customizations
2806@emindex installing PO mode
2807During the installation of the PO mode, you might want to modify your
2808file @file{.emacs}, once and for all, so it contains a few lines looking
2809like:
2810
2811@example
2812(setq auto-mode-alist
2813      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
2814(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
2815@end example
2816
2817Later, whenever you edit some @file{.po}
2818file, or any file having the string @samp{.po.} within its name,
2819Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
2820automatically activates PO mode commands for the associated buffer.
2821The string @emph{PO} appears in the mode line for any buffer for
2822which PO mode is active.  Many PO files may be active at once in a
2823single Emacs session.
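
As a minimal sketch of which file names the regular expression above
selects, one might evaluate the following form, for example in the
@samp{*scratch*} buffer.  The three file names are hypothetical and
serve only as an illustration.

@example
;; string-match returns a number when the name matches, nil otherwise.
(let ((regexp "\\.po\\'\\|\\.po\\."))
  (list (string-match regexp "de.po")         ; ends in .po: matches
        (string-match regexp "de.po.orig")    ; contains .po.: matches
        (string-match regexp "report.txt")))  ; neither: no match
@end example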
2824
2825If you are using Emacs version 20 or newer, and have already installed
2826the appropriate international fonts on your system, you may also tell
2827Emacs how to determine automatically the coding system of every PO file.
2828This will often (but not always) cause the necessary fonts to be loaded
2829and used for displaying the translations on your Emacs screen.  For this
2830to happen, add the lines:
2831
2832@example
2833(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
2834                            'po-find-file-coding-system)
2835(autoload 'po-find-file-coding-system "po-mode")
2836@end example
2837
2838@noindent
2839to your @file{.emacs} file.  If, with this, you still see boxes instead
2840of international characters, try a different font set (via Shift Mouse
2841button 1).
2842
2843@node Main PO Commands, Entry Positioning, Installation, PO Mode
2844@subsection Main PO mode Commands
2845
2846@cindex PO mode (Emacs) commands
2847@emindex commands
After setting up Emacs with something similar to the lines in
@ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window.  This makes the window read-only and establishes
the @code{po-mode-map} keymap.  PO mode is a genuine Emacs mode, and is
not derived from text mode in any way.  Functions found on
@code{po-mode-hook}, if any, will be executed.
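
As a minimal sketch of how @code{po-mode-hook} might be used, the
following lines could be added to the @file{.emacs} file.  The function
name @code{my-po-mode-setup} and the message it prints are hypothetical;
only @code{po-mode-hook} comes from PO mode itself.

@example
;; Hypothetical setup function, run each time PO mode is activated.
(defun my-po-mode-setup ()
  "Announce that PO mode has been activated for the current buffer."
  (message "PO mode activated for %s" (buffer-name)))

(add-hook 'po-mode-hook 'my-po-mode-setup)
@end example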
2854
2855When PO mode is active in a window, the letters @samp{PO} appear
2856in the mode line for that window.  The mode line also displays how
2857many entries of each kind are held in the PO file.  For example,
2858the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}).  Items with a zero count are not shown.  So, in this example, if
2863the fuzzy entries were unfuzzied, the untranslated entries were translated
2864and the obsolete entries were deleted, the mode line would merely display
2865@samp{145t} for the counters.
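
The way such a counter string is put together can be sketched as
follows.  This is only an illustration of the format described above,
not the code PO mode itself uses, and the function name
@code{my-po-counters-string} is hypothetical.

@example
;; Build a string like "132t+3f+10u+2o", leaving out zero counts.
(defun my-po-counters-string (translated fuzzy untranslated obsolete)
  "Return a mode line style summary of the four entry counts."
  (mapconcat 'identity
             (delq nil
                   (list (and (> translated 0)
                              (format "%dt" translated))
                         (and (> fuzzy 0)
                              (format "%df" fuzzy))
                         (and (> untranslated 0)
                              (format "%du" untranslated))
                         (and (> obsolete 0)
                              (format "%do" obsolete))))
             "+"))
@end example

With this sketch, @code{(my-po-counters-string 132 3 10 2)} yields
@samp{132t+3f+10u+2o}, and @code{(my-po-counters-string 145 0 0 0)}
yields @samp{145t}.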
2866
2867The main PO commands are those which do not fit into the other categories of
2868subsequent sections.  These allow for quitting PO mode or for managing windows
2869in special ways.
2870
2871@table @kbd
2872@item _
2873@efindex _@r{, PO Mode command}
2874Undo last modification to the PO file (@code{po-undo}).
2875
2876@item Q
2877@efindex Q@r{, PO Mode command}
2878Quit processing and save the PO file (@code{po-quit}).
2879
2880@item q
2881@efindex q@r{, PO Mode command}
2882Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
2883
2884@item 0
2885@efindex 0@r{, PO Mode command}
Temporarily leave the PO file window (@code{po-other-window}).
2887
2888@item ?
2889@itemx h
2890@efindex ?@r{, PO Mode command}
2891@efindex h@r{, PO Mode command}
2892Show help about PO mode (@code{po-help}).
2893
2894@item =
2895@efindex =@r{, PO Mode command}
2896Give some PO file statistics (@code{po-statistics}).
2897
2898@item V
2899@efindex V@r{, PO Mode command}
2900Batch validate the format of the whole PO file (@code{po-validate}).
2901
2902@end table
2903
2904@efindex _@r{, PO Mode command}
2905@efindex po-undo@r{, PO Mode command}
The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
@emph{undo} facility.  @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}.  Each time @kbd{_} is typed, modifications which the translator
made to the PO file are undone a little more.  For the purpose of
undoing, each PO mode command is atomic.  This is especially true for
the @kbd{@key{RET}} command: all the editing done through a single
use of this command is undone at once, even if the editing itself
involved several actions.  However, while in the editing window, one
can undo the editing work in much finer steps.
2915
2916@efindex Q@r{, PO Mode command}
2917@efindex q@r{, PO Mode command}
2918@efindex po-quit@r{, PO Mode command}
2919@efindex po-confirm-and-quit@r{, PO Mode command}
2920The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
2921(@code{po-confirm-and-quit}) are used when the translator is done with the
2922PO file.  The former is a bit less verbose than the latter.  If the file
2923has been modified, it is saved to disk first.  In both cases, and prior to
2924all this, the commands check if any untranslated messages remain in the
2925PO file and, if so, the translator is asked if she really wants to leave
2926off working with this PO file.  This is the preferred way of getting rid
2927of an Emacs PO file buffer.  Merely killing it through the usual command
2928@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
2929
2930@efindex 0@r{, PO Mode command}
2931@efindex po-other-window@r{, PO Mode command}
2932The command @kbd{0} (@code{po-other-window}) is another, softer way,
2933to leave PO mode, temporarily.  It just moves the cursor to some other
2934Emacs window, and pops one if necessary.  For example, if the translator
just got PO mode to show some source context in some other window, she might
2936discover some apparent bug in the program source that needs correction.
2937This command allows the translator to change sex, become a programmer,
2938and have the cursor right into the window containing the program she
2939(or rather @emph{he}) wants to modify.  By later getting the cursor back
2940in the PO file window, or by asking Emacs to edit this file once again,
2941PO mode is then recovered.
2942
2943@efindex ?@r{, PO Mode command}
2944@efindex h@r{, PO Mode command}
2945@efindex po-help@r{, PO Mode command}
2946The command @kbd{h} (@code{po-help}) displays a summary of all available PO
2947mode commands.  The translator should then type any character to resume
2948normal PO mode operations.  The command @kbd{?} has the same effect
2949as @kbd{h}.
2950
2951@efindex =@r{, PO Mode command}
2952@efindex po-statistics@r{, PO Mode command}
2953The command @kbd{=} (@code{po-statistics}) computes the total number of
2954entries in the PO file, the ordinal of the current entry (counted from
29551), the number of untranslated entries, the number of obsolete entries,
2956and displays all these numbers.
2957
2958@efindex V@r{, PO Mode command}
2959@efindex po-validate@r{, PO Mode command}
2960The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
2961checking and verbose
2962mode over the current PO file.  This command first offers to save the
2963current PO file on disk.  The @code{msgfmt} tool, from GNU @code{gettext},
2964has the purpose of creating a MO file out of a PO file, and PO mode uses
2965the features of this program for checking the overall format of a PO file,
2966as well as all individual entries.
2967
2968@efindex next-error@r{, stepping through PO file validation results}
2969The program @code{msgfmt} runs asynchronously with Emacs, so the
2970translator regains control immediately while her PO file is being studied.
2971Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window.  The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allows the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.
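
A roughly comparable check can be started by hand with the standard
Emacs @code{compile} command, as in the sketch below.  It assumes that
@code{msgfmt} is found in the search path and that the PO file is named
@file{my-file.po}, a hypothetical name.  The exact command line that
PO mode builds may differ.

@example
;; Run msgfmt in checking (-c) and verbose (-v) mode, discarding
;; the MO file; diagnostics land in the *compilation* buffer.
(compile "msgfmt --statistics -c -v -o /dev/null my-file.po")
@end example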
2977
2978@node Entry Positioning, Normalizing, Main PO Commands, PO Mode
2979@subsection Entry Positioning
2980
2981@emindex current entry of a PO file
2982The cursor in a PO file window is almost always part of
2983an entry.  The only exceptions are the special case when the cursor
2984is after the last entry in the file, or when the PO file is
empty.  The entry in which the cursor is found is said to be the
current entry.  Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.
2989
2990@emindex moving through a PO file
Some PO mode commands alter the position of the cursor in a specialized
way.  A few of those special-purpose positioning commands are described
here; the others are described in following sections (for a complete list
try @kbd{C-h m}):
2995
2996@table @kbd
2997
2998@item .
2999@efindex .@r{, PO Mode command}
3000Redisplay the current entry (@code{po-current-entry}).
3001
3002@item n
3003@efindex n@r{, PO Mode command}
3004Select the entry after the current one (@code{po-next-entry}).
3005
3006@item p
3007@efindex p@r{, PO Mode command}
3008Select the entry before the current one (@code{po-previous-entry}).
3009
3010@item <
3011@efindex <@r{, PO Mode command}
3012Select the first entry in the PO file (@code{po-first-entry}).
3013
3014@item >
3015@efindex >@r{, PO Mode command}
3016Select the last entry in the PO file (@code{po-last-entry}).
3017
3018@item m
3019@efindex m@r{, PO Mode command}
3020Record the location of the current entry for later use
3021(@code{po-push-location}).
3022
3023@item r
3024@efindex r@r{, PO Mode command}
3025Return to a previously saved entry location (@code{po-pop-location}).
3026
3027@item x
3028@efindex x@r{, PO Mode command}
3029Exchange the current entry location with the previously saved one
3030(@code{po-exchange-location}).
3031
3032@end table
3033
3034@efindex .@r{, PO Mode command}
3035@efindex po-current-entry@r{, PO Mode command}
3036Any Emacs command able to reposition the cursor may be used
3037to select the current entry in PO mode, including commands which
3038move by characters, lines, paragraphs, screens or pages, and search
3039commands.  However, there is a kind of standard way to display the
3040current entry in PO mode, which usual Emacs commands moving
3041the cursor do not especially try to enforce.  The command @kbd{.}
3042(@code{po-current-entry}) has the sole purpose of redisplaying the
3043current entry properly, after the current entry has been changed by
3044means external to PO mode, or the Emacs screen otherwise altered.
3045
3046It is yet to be decided if PO mode helps the translator, or otherwise
3047irritates her, by forcing a rigid window disposition while she
3048is doing her work.  We originally had quite precise ideas about
3049how windows should behave, but on the other hand, anyone used to
3050Emacs is often happy to keep full control.  Maybe a fixed window
3051disposition might be offered as a PO mode option that the translator
3052might activate or deactivate at will, so it could be offered on an
3053experimental basis.  If nobody feels a real need for using it, or
3054a compulsion for writing it, we should drop this whole idea.
3055The incentive for doing it should come from translators rather than
3056programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
3058how @emph{others} should do translation.
3059
3060@efindex n@r{, PO Mode command}
3061@efindex po-next-entry@r{, PO Mode command}
3062@efindex p@r{, PO Mode command}
3063@efindex po-previous-entry@r{, PO Mode command}
3064The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
3066or preceding, the current one.  If @kbd{n} is given while the
3067cursor is on the last entry of the PO file, or if @kbd{p}
3068is given while the cursor is on the first entry, no move is done.
3069
3070@efindex <@r{, PO Mode command}
3071@efindex po-first-entry@r{, PO Mode command}
3072@efindex >@r{, PO Mode command}
3073@efindex po-last-entry@r{, PO Mode command}
3074The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
3075(@code{po-last-entry}) move the cursor to the first entry, or last
3076entry, of the PO file.  When the cursor is located past the last
3077entry in a PO file, most PO mode commands will return an error saying
3078@samp{After last entry}.  Moreover, the commands @kbd{<} and @kbd{>}
3079have the special property of being able to work even when the cursor
is not inside any PO file entry, and one may use them for nicely
3081correcting this situation.  But even these commands will fail on a
truly empty PO file.  There are development plans for PO mode
to interactively fill an empty PO file from sources.  @xref{Marking}.
3084
3085The translator may decide, before working at the translation of
3086a particular entry, that she needs to browse the remainder of the
3087PO file, maybe for finding the terminology or phraseology used
3088in related entries.  She can of course use the standard Emacs idioms
3089for saving the current cursor location in some register, and use that
3090register for getting back, or else, use the location ring.
3091
3092@efindex m@r{, PO Mode command}
3093@efindex po-push-location@r{, PO Mode command}
3094@efindex r@r{, PO Mode command}
3095@efindex po-pop-location@r{, PO Mode command}
3096PO mode offers another approach, by which cursor locations may be saved
3097onto a special stack.  The command @kbd{m} (@code{po-push-location})
3098merely adds the location of current entry to the stack, pushing
3099the already saved locations under the new one.  The command
3100@kbd{r} (@code{po-pop-location}) consumes the top stack element and
3101repositions the cursor to the entry associated with that top element.
3102This position is then lost, for the next @kbd{r} will move the cursor
3103to the previously saved location, and so on until no locations remain
3104on the stack.
3105
3106If the translator wants the position to be kept on the location stack,
3107maybe for taking a look at the entry associated with the top
3108element, then go elsewhere with the intent of getting back later, she
3109ought to use @kbd{m} immediately after @kbd{r}.
3110
3111@efindex x@r{, PO Mode command}
3112@efindex po-exchange-location@r{, PO Mode command}
3113The command @kbd{x} (@code{po-exchange-location}) simultaneously
3114repositions the cursor to the entry associated with the top element of
3115the stack of saved locations, and replaces that top element with the
3116location of the current entry before the move.  Consequently, repeating
3117the @kbd{x} command toggles alternatively between two entries.
3118For achieving this, the translator will position the cursor on the
3119first entry, use @kbd{m}, then position to the second entry, and
3120merely use @kbd{x} for making the switch.
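
The behavior of the three commands @kbd{m}, @kbd{r} and @kbd{x} can be
modelled by a plain list used as a stack.  The sketch below is only
such a model, written under the assumption that a location is a buffer
position; it is not the code of PO mode, and all the names starting
with @code{my-} are hypothetical.

@example
(defvar my-location-stack nil
  "Hypothetical stack of saved positions, most recent first.")

(defun my-push-location ()
  "Save the current position on top of the stack, as m does."
  (setq my-location-stack (cons (point) my-location-stack)))

(defun my-pop-location ()
  "Go to the most recently saved position and forget it, as r does."
  (if (null my-location-stack)
      (error "The location stack is empty")
    (goto-char (car my-location-stack))
    (setq my-location-stack (cdr my-location-stack))))

(defun my-exchange-location ()
  "Swap the current position with the top of the stack, as x does."
  (if (null my-location-stack)
      (error "The location stack is empty")
    (let ((here (point)))
      (goto-char (car my-location-stack))
      (setcar my-location-stack here))))
@end example

In this model, calling @code{my-exchange-location} twice in a row
brings the cursor back to where it started, which is the toggling
behavior described for @kbd{x}.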
3121
3122@node Normalizing, Translated Entries, Entry Positioning, PO Mode
3123@subsection Normalizing Strings in Entries
3124@cindex string normalization in entries
3125
3126There are many different ways for encoding a particular string into a
3127PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters
by backslash escape sequences.  Some features of PO mode rely on
3130the ability for PO mode to scan an already existing PO file for a
3131particular string encoded into the @code{msgid} field of some entry.
3132Even if PO mode has internally all the built-in machinery for
3133implementing this recognition easily, doing it fast is technically
3134difficult.  To facilitate a solution to this efficiency problem,
3135we decided on a canonical representation for strings.
3136
3137A conventional representation of strings in a PO file is currently
3138under discussion, and PO mode experiments with a canonical representation.
3139Having both @code{xgettext} and PO mode converging towards a uniform
3140way of representing equivalent strings would be useful, as the internal
3141normalization needed by PO mode could be automatically satisfied
3142when using @code{xgettext} from GNU @code{gettext}.  An explicit
3143PO mode normalization should then be only necessary for PO files
3144imported from elsewhere, or for when the convention itself evolves.
3145
3146So, for achieving normalization of at least the strings of a given
3147PO file needing a canonical representation, the following PO mode
3148command is available:
3149
3150@emindex string normalization in entries
3151@table @kbd
3152@item M-x po-normalize
3153@efindex po-normalize@r{, PO Mode command}
3154Tidy the whole PO file by making entries more uniform.
3155
3156@end table
3157
3158The special command @kbd{M-x po-normalize}, which has no associated
3159keys, revises all entries, ensuring that strings of both original
3160and translated entries use uniform internal quoting in the PO file.
3161It also removes any crumb after the last entry.  This command may be
3162useful for PO files freshly imported from elsewhere, or if we ever
3163improve on the canonical quoting format we use.  This canonical format
3164is not only meant for getting cleaner PO files, but also for greatly
3165speeding up @code{msgid} string lookup for some other PO mode commands.
3166
3167@kbd{M-x po-normalize} presently makes three passes over the entries.
3168The first implements heuristics for converting PO files for GNU
3169@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
3170fields were using K&R style C string syntax for multi-line strings.
3171These heuristics may fail for comments not related to obsolete
3172entries and ending with a backslash; they also depend on subsequent
3173passes for finalizing the proper commenting of continued lines for
obsolete entries.  This first pass might disappear once all old PO
files have been adjusted.  The second and third passes normalize
3176all @code{msgid} and @code{msgstr} strings respectively.  They also
3177clean out those trailing backslashes used by XView's @code{msgfmt}
3178for continued lines.
3179
3180@cindex importing PO files
3181Having such an explicit normalizing command allows for importing PO
3182files from other sources, but also eases the evolution of the current
3183convention, evolution driven mostly by aesthetic concerns, as of now.
3184It is easy to make suggested adjustments at a later time, as the
3185normalizing command and eventually, other GNU @code{gettext} tools
3186should greatly automate conformance.  A description of the canonical
3187string format is given below, for the particular benefit of those not
3188having Emacs handy, and who would nevertheless want to handcraft
3189their PO files in nice ways.
3190
3191@cindex multi-line strings
3192Right now, in PO mode, strings are single line or multi-line.  A string
3193goes multi-line if and only if it has @emph{embedded} newlines, that
3194is, if it matches @samp{[^\n]\n+[^\n]}.  So, we would have:
3195
3196@example
3197msgstr "\n\nHello, world!\n\n\n"
3198@end example
3199
3200but, replacing the space by a newline, this becomes:
3201
3202@example
3203msgstr ""
3204"\n"
3205"\n"
3206"Hello,\n"
3207"world!\n"
3208"\n"
3209"\n"
3210@end example
3211
We are deliberately using a caricatural example, here, to make the
point clearer.  Usually, multi-line strings do not look that bad.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for @w{@var{n}
> 1}, the last @var{n}-1 newlines would go together on a separate
string), so making the previous example appear:
3219
3220@example
3221msgstr "\n\n"
3222"Hello,\n"
3223"world!\n"
3224"\n\n"
3225@end example
3226
3227There are a few yet undecided little points about string normalization,
3228to be documented in this manual, once these questions settle.
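
The rule stated earlier in this section, that a string goes multi-line
if and only if it matches @samp{[^\n]\n+[^\n]}, can be sketched as an
Emacs Lisp predicate.  The function name @code{my-multi-line-p} is
hypothetical, and the sketch only restates the rule; it is not taken
from PO mode.

@example
;; A newline is embedded when it has a non-newline character
;; somewhere before it and somewhere after it.
(defun my-multi-line-p (string)
  "Return t if STRING contains an embedded newline, else nil."
  (and (string-match "[^\n]\n+[^\n]" string) t))

(my-multi-line-p "\n\nHello, world!\n\n\n")   ; nil, stays single line
(my-multi-line-p "\n\nHello,\nworld!\n\n\n")  ; t, goes multi-line
@end example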
3229
3230@node Translated Entries, Fuzzy Entries, Normalizing, PO Mode
3231@subsection Translated Entries
3232@cindex translated entries
3233
3234Each PO file entry for which the @code{msgstr} field has been filled with
3235a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
3236is said to be a @dfn{translated} entry.  Only translated entries will
3237later be compiled by GNU @code{msgfmt} and become usable in programs.
3238Other entry types will be excluded; translation will not occur for them.
3239
3240@emindex moving by translated entries
3241Some commands are more specifically related to translated entry processing.
3242
3243@table @kbd
3244@item t
3245@efindex t@r{, PO Mode command}
3246Find the next translated entry (@code{po-next-translated-entry}).
3247
3248@item T
3249@efindex T@r{, PO Mode command}
3250Find the previous translated entry (@code{po-previous-translated-entry}).
3251
3252@end table
3253
3254@efindex t@r{, PO Mode command}
3255@efindex po-next-translated-entry@r{, PO Mode command}
3256@efindex T@r{, PO Mode command}
3257@efindex po-previous-translated-entry@r{, PO Mode command}
3258The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
3259(@code{po-previous-translated-entry}) move forwards or backwards, chasing
for a translated entry.  If none is found, the search is extended and
3261wraps around in the PO file buffer.
3262
3263@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
3264Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}).  However, if the
3266variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having
3267received a new translation first becomes a fuzzy entry, which ought to
3268be later unfuzzied before becoming an official, genuine translated entry.
3269@xref{Fuzzy Entries}.
3270
3271@node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode
3272@subsection Fuzzy Entries
3273@cindex fuzzy entries
3274
3275@cindex attributes of a PO file entry
3276@cindex attribute, fuzzy
3277Each PO file entry may have a set of @dfn{attributes}, which are
3278qualities given a name and explicitly associated with the translation,
3279using a special system comment.  One of these attributes
3280has the name @code{fuzzy}, and entries having this attribute are said
3281to have a fuzzy translation.  They are called fuzzy entries, for short.
3282
3283Fuzzy entries, even if they account for translated entries for
3284most other purposes, usually call for revision by the translator.
3285Those may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
3287file, when this tool hypothesises that some new @code{msgid} has
3288been modified only slightly out of an older one, and chooses to pair
3289what it thinks to be the old translation for the new modified entry.
3290The slight alteration in the original string (the @code{msgid} string)
3291should often be reflected in the translated string, and this requires
3292the intervention of the translator.  For this reason, @code{msgmerge}
3293might mark some entries as being fuzzy.
3294
3295@emindex moving by fuzzy entries
3296Also, the translator may decide herself to mark an entry as fuzzy
3297for her own convenience, when she wants to remember that the entry
3298has to be later revisited.  So, some commands are more specifically
3299related to fuzzy entry processing.
3300
3301@table @kbd
3302@item z
3303@efindex z@r{, PO Mode command}
3304@c better append "-entry" all the time. -ke-
3305Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3306
3307@item Z
3308@efindex Z@r{, PO Mode command}
3309Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3310
3311@item @key{TAB}
3312@efindex TAB@r{, PO Mode command}
3313Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3314
3315@end table
3316
3317@efindex z@r{, PO Mode command}
3318@efindex po-next-fuzzy-entry@r{, PO Mode command}
3319@efindex Z@r{, PO Mode command}
3320@efindex po-previous-fuzzy-entry@r{, PO Mode command}
3321The commands @kbd{z} (@code{po-next-fuzzy-entry}) and @kbd{Z}
3322(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
3323a fuzzy entry.  If none is found, the search is extended and wraps
3324around in the PO file buffer.
3325
3326@efindex TAB@r{, PO Mode command}
3327@efindex po-unfuzzy@r{, PO Mode command}
3328@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3329The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3330attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically chase
for another interesting entry to work on.  The initial value of
3334@code{po-auto-select-on-unfuzzy} is @code{nil}.
3335
3336The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}.  However,
3337if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3338edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
3339ensure some kind of double check, later.  In this case, the usual paradigm
3340is that an entry becomes fuzzy (if not already) whenever the translator
3341modifies it.  If she is satisfied with the translation, she then uses
@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
at the same time.  If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
3344to chase another entry, leaving the entry fuzzy.
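
A translator who wants to work according to this paradigm might add
lines like the following to her @file{.emacs} file.  This is a sketch
under the assumption that both variables are set before any PO file is
visited; the values shown are a matter of personal preference.

@example
;; Mark every entry edited through RET as fuzzy.
(setq po-auto-fuzzy-on-edit t)
;; After TAB removes the fuzzy attribute, move on to another entry.
(setq po-auto-select-on-unfuzzy t)
@end example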
3345
3346@efindex DEL@r{, PO Mode command}
3347@efindex po-fade-out-entry@r{, PO Mode command}
The translator may also use the @kbd{@key{DEL}} command
(@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, as an easy way to leave a reminder that she wants to return
to this entry later.
3352
Also, when time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if some fuzzy string
still exists.
3356
3357@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode
3358@subsection Untranslated Entries
3359@cindex untranslated entries
3360
3361When @code{xgettext} originally creates a PO file, unless told
3362otherwise, it initializes the @code{msgid} field with the untranslated
3363string, and leaves the @code{msgstr} string to be empty.  Such entries,
3364having an empty translation, are said to be @dfn{untranslated} entries.
3365Later, when the programmer slightly modifies some string right in
the program, this change is then reflected in the PO file
3367by the appearance of a new untranslated entry for the modified string.
3368
3369The usual commands moving from entry to entry consider untranslated
3370entries on the same level as active entries.  Untranslated entries
3371are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.
3372
3373@emindex moving by untranslated entries
3374The work of the translator might be (quite naively) seen as the process
3375of seeking for an untranslated entry, editing a translation for
3376it, and repeating these actions until no untranslated entries remain.
3377Some commands are more specifically related to untranslated entry
3378processing.
3379
3380@table @kbd
3381@item u
3382@efindex u@r{, PO Mode command}
3383Find the next untranslated entry (@code{po-next-untranslated-entry}).
3384
3385@item U
3386@efindex U@r{, PO Mode command}
Find the previous untranslated entry (@code{po-previous-untranslated-entry}).
3388
3389@item k
3390@efindex k@r{, PO Mode command}
3391Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3392
3393@end table
3394
3395@efindex u@r{, PO Mode command}
3396@efindex po-next-untranslated-entry@r{, PO Mode command}
3397@efindex U@r{, PO Mode command}
@efindex po-previous-untranslated-entry@r{, PO Mode command}
3399The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
3401chasing for an untranslated entry.  If none is found, the search is
3402extended and wraps around in the PO file buffer.
3403
3404@efindex k@r{, PO Mode command}
3405@efindex po-kill-msgstr@r{, PO Mode command}
3406An entry can be turned back into an untranslated entry by
3407merely emptying its translation, using the command @kbd{k}
3408(@code{po-kill-msgstr}).  @xref{Modifying Translations}.
3409
3410Also, when time comes to quit working on a PO file buffer
3411with the @kbd{q} command, the translator is asked for confirmation,
3412if some untranslated string still exists.
3413
3414@node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode
3415@subsection Obsolete Entries
3416@cindex obsolete entries
3417
3418By @dfn{obsolete} PO file entries, we mean those entries which are
commented out, usually by @code{msgmerge} when it finds that the
3420translation is not needed anymore by the package being localized.
3421
3422The usual commands moving from entry to entry consider obsolete
3423entries on the same level as active entries.  Obsolete entries are
3424easily recognizable by the fact that all their lines start with
3425@code{#}, even those lines containing @code{msgid} or @code{msgstr}.
3426
3427Commands exist for emptying the translation or reinitializing it
3428to the original untranslated string.  Commands interfacing with the
3429kill ring may force some previously saved text into the translation.
3430The user may interactively edit the translation.  All these commands
3431may apply to obsolete entries, carefully leaving the entry obsolete
3432after the fact.
3433
3434@emindex moving by obsolete entries
3435Moreover, some commands are more specifically related to obsolete
3436entry processing.
3437
3438@table @kbd
3439@item o
3440@efindex o@r{, PO Mode command}
3441Find the next obsolete entry (@code{po-next-obsolete-entry}).
3442
3443@item O
3444@efindex O@r{, PO Mode command}
3445Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
3446
3447@item @key{DEL}
3448@efindex DEL@r{, PO Mode command}
3449Make an active entry obsolete, or zap out an obsolete entry
3450(@code{po-fade-out-entry}).
3451
3452@end table
3453
3454@efindex o@r{, PO Mode command}
3455@efindex po-next-obsolete-entry@r{, PO Mode command}
3456@efindex O@r{, PO Mode command}
3457@efindex po-previous-obsolete-entry@r{, PO Mode command}
3458The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
3459(@code{po-previous-obsolete-entry}) move forwards or backwards,
3460chasing for an obsolete entry.  If none is found, the search is
3461extended and wraps around in the PO file buffer.
3462
3463PO mode does not provide ways for un-commenting an obsolete entry
3464and making it active, because this would reintroduce an original
3465untranslated string which does not correspond to any marked string
3466in the program sources.  This goes with the philosophy of never
3467introducing useless @code{msgid} values.
3468
3469@efindex DEL@r{, PO Mode command}
3470@efindex po-fade-out-entry@r{, PO Mode command}
3471@emindex obsolete active entry
3472@emindex comment out PO file entry
3473However, it is possible to comment out an active entry, so making
3474it obsolete.  GNU @code{gettext} utilities will later react to the
3475disappearance of a translation by using the untranslated string.
3476The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
3477a little further towards annihilation.  If the entry is active (it is a
3478translated entry), then it is first made fuzzy.  If it is already fuzzy,
3479then the entry is merely commented out, with confirmation.  If the entry
3480is already obsolete, then it is completely deleted from the PO file.
3481It is easy to recycle the translation so deleted into some other PO file
3482entry, usually one which is untranslated.  @xref{Modifying Translations}.
3483
3484Here is a quite interesting problem to solve for later development of
3485PO mode, for those nights you are not sleepy.  The idea would be that
3486PO mode might become bright enough, one of these days, to make good
3487guesses at retrieving the most probable candidate, among all obsolete
3488entries, for initializing the translation of a newly appeared string.
3489I think it might be a quite hard problem to do this algorithmically, as
3490we have to develop good and efficient measures of string similarity.
Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.
3494
3495@node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode
3496@subsection Modifying Translations
3497@cindex editing translations
3498@emindex editing translations
3499
PO mode prevents direct modification of the PO file by the usual
means Emacs gives for altering a buffer's contents.  By doing so,
it aims to help the translator avoid little clerical errors
about the overall file format, or the proper quoting of strings,
as those errors would be easily made.  Other kinds of errors are
3505still possible, but some may be caught and diagnosed by the batch
3506validation process, which the translator may always trigger by the
3507@kbd{V} command.  For all other errors, the translator has to rely on
3508her own judgment, and also on the linguistic reports submitted to her
3509by the users of the translated package, having the same mother tongue.
3510
When the time comes to create a translation, or to correct an error
diagnosed mechanically or reported by a user, the translator has to
resort to the following commands for modifying the translations.
3514
3515@table @kbd
3516@item @key{RET}
3517@efindex RET@r{, PO Mode command}
3518Interactively edit the translation (@code{po-edit-msgstr}).
3519
3520@item @key{LFD}
3521@itemx C-j
3522@efindex LFD@r{, PO Mode command}
3523@efindex C-j@r{, PO Mode command}
3524Reinitialize the translation with the original, untranslated string
3525(@code{po-msgid-to-msgstr}).
3526
3527@item k
3528@efindex k@r{, PO Mode command}
3529Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
3530
3531@item w
3532@efindex w@r{, PO Mode command}
3533Save the translation on the kill ring, without deleting it
3534(@code{po-kill-ring-save-msgstr}).
3535
3536@item y
3537@efindex y@r{, PO Mode command}
3538Replace the translation, taking the new from the kill ring
3539(@code{po-yank-msgstr}).
3540
3541@end table
3542
3543@efindex RET@r{, PO Mode command}
3544@efindex po-edit-msgstr@r{, PO Mode command}
The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
window meant for entering a new translation, or for modifying an already
existing translation.  The new window contains a copy of the translation
taken from the current PO file entry, all ready for editing, stripped of all
quoting marks, and fully modifiable with the complete set of Emacs editing
commands.  When the translator is done with her modifications, she may use
3551@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
3552results, or @w{@kbd{C-c C-k}} to abort her modifications.  @xref{Subedit},
3553for more information.
3554
3555@efindex LFD@r{, PO Mode command}
3556@efindex C-j@r{, PO Mode command}
3557@efindex po-msgid-to-msgstr@r{, PO Mode command}
3558The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
3559reinitializes the translation with the original string.  This command is
3560normally used when the translator wants to redo a fresh translation of
3561the original string, disregarding any previous work.
3562
3563@evindex po-auto-edit-with-msgid@r{, PO Mode variable}
It is possible to arrange for the @kbd{@key{LFD}} command to be executed
automatically whenever an untranslated entry is edited.  If you set
@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
initialized with the original string, in case none exists already.
3568The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
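
For instance, a translator who prefers to start from the original text
might put something like the following in her Emacs initialization file.
This is only a minimal sketch, assuming PO mode is installed and loaded
in the usual way.

@example
;; Pre-fill an empty translation with the original string whenever
;; an untranslated entry is opened for editing.
(setq po-auto-edit-with-msgid t)
@end example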
3569
3570@emindex starting a string translation
3571In fact, whether it is best to start a translation with an empty
3572string, or rather with a copy of the original string, is a matter of
3573taste or habit.  Sometimes, the source language and the
target language are so different that it is simply best to start writing
3575on an empty page.  At other times, the source and target languages
3576are so close that it would be a waste to retype a number of words
already written in the original string.  A translator may also
3578like having the original string right under her eyes, as she will
3579progressively overwrite the original text with the translation, even
3580if this requires some extra editing work to get rid of the original.
3581
3582@emindex cut and paste for translated strings
3583@efindex k@r{, PO Mode command}
3584@efindex po-kill-msgstr@r{, PO Mode command}
3585@efindex w@r{, PO Mode command}
3586@efindex po-kill-ring-save-msgstr@r{, PO Mode command}
3587The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
3588translation string, so turning the entry into an untranslated
one.  But while doing so, its previous contents are set aside in
a special place, known as the kill ring.  The command @kbd{w}
(@code{po-kill-ring-save-msgstr}) also has the effect of putting a
copy of the translation onto the kill ring, but it otherwise leaves
3593the entry alone, and does @emph{not} remove the translation from the
3594entry.  Both commands use exactly the Emacs kill ring, which is shared
3595between buffers, and which is well known already to Emacs lovers.
3596
3597The translator may use @kbd{k} or @kbd{w} many times in the course
3598of her work, as the kill ring may hold several saved translations.
3599From the kill ring, strings may later be reinserted in various
3600Emacs buffers.  In particular, the kill ring may be used for moving
3601translation strings between different entries of a single PO file
3602buffer, or if the translator is handling many such buffers at once,
3603even between PO files.
3604
3605To facilitate exchanges with buffers which are not in PO mode, the
3606translation string put on the kill ring by the @kbd{k} command is fully
3607unquoted before being saved: external quotes are removed, multi-line
3608strings are concatenated, and backslash escaped sequences are turned
3609into their corresponding characters.  In the special case of obsolete
3610entries, the translation is also uncommented prior to saving.
3611
3612@efindex y@r{, PO Mode command}
3613@efindex po-yank-msgstr@r{, PO Mode command}
3614The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
3615translation of the current entry by a string taken from the kill ring.
3616Following Emacs terminology, we then say that the replacement
3617string is @dfn{yanked} into the PO file buffer.
3618@xref{Yanking, , , emacs, The Emacs Editor}.
3619The first time @kbd{y} is used, the translation receives the value of
3620the most recent addition to the kill ring.  If @kbd{y} is typed once
3621again, immediately, without intervening keystrokes, the translation
3622just inserted is taken away and replaced by the second most recent
3623addition to the kill ring.  By repeating @kbd{y} many times in a row,
3624the translator may travel along the kill ring for saved strings,
3625until she finds the string she really wanted.
3626
3627When a string is yanked into a PO file entry, it is fully and
3628automatically requoted for complying with the format PO files should
have.  Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments.  Once again, translators
should not burden themselves with quoting considerations besides, of
course, whatever the translated string itself requires with respect to
the program using it.
3634
3635Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
3636on the kill ring, as almost any PO mode command replacing translation
3637strings (or the translator comments) automatically saves the old string
3638on the kill ring.  The main exceptions to this general rule are the
3639yanking commands themselves.
3640
3641@emindex using obsolete translations to make new entries
3642To better illustrate the operation of killing and yanking, let's
3643use an actual example, taken from a common situation.  When the
3644programmer slightly modifies some string right in the program, his
3645change is later reflected in the PO file by the appearance
3646of a new untranslated entry for the modified string, and the fact
3647that the entry translating the original or unmodified string becomes
3648obsolete.  In many cases, the translator might spare herself some work
3649by retrieving the unmodified translation from the obsolete entry,
3650then initializing the untranslated entry @code{msgstr} field with
this retrieved translation.  Once this is done, the obsolete entry is
3652not wanted anymore, and may be safely deleted.
3653
3654When the translator finds an untranslated entry and suspects that a
3655slight variant of the translation exists, she immediately uses @kbd{m}
3656to mark the current entry location, then starts chasing obsolete
3657entries with @kbd{o}, hoping to find some translation corresponding
3658to the unmodified string.  Once found, she uses the @kbd{@key{DEL}} command
3659for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
3660the translation, that is, pushes the translation on the kill ring.
3661Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
3662then @emph{yanks} the saved translation right into the @code{msgstr}
3663field.  The translator is then free to use @kbd{@key{RET}} for fine
3664tuning the translation contents, and maybe to later use @kbd{u},
3665then @kbd{m} again, for going on with the next untranslated string.
3666
3667When some sequence of keys has to be typed over and over again, the
3668translator may find it useful to become better acquainted with the Emacs
3669capability of learning these sequences and playing them back under request.
3670@xref{Keyboard Macros, , , emacs, The Emacs Editor}.
3671
3672@node Modifying Comments, Subedit, Modifying Translations, PO Mode
3673@subsection Modifying Comments
3674@cindex editing comments in PO files
3675@emindex editing comments
3676
3677Any translation work done seriously will raise many linguistic
3678difficulties, for which decisions have to be made, and the choices
3679further documented.  These documents may be saved within the
3680PO file in form of translator comments, which the translator
3681is free to create, delete, or modify at will.  These comments may
3682be useful to herself when she returns to this PO file after a while.
3683
Comments not having whitespace after the initial @samp{#}, for example,
those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
comments; they are exclusively created by other @code{gettext} tools.
So, the commands below will never alter such system-added comments,
as they are not meant for the translator to modify.  @xref{PO Files}.
3689
3690The following commands are somewhat similar to those modifying translations,
3691so the general indications given for those apply here.  @xref{Modifying
3692Translations}.
3693
3694@table @kbd
3695
3696@item #
3697@efindex #@r{, PO Mode command}
3698Interactively edit the translator comments (@code{po-edit-comment}).
3699
3700@item K
3701@efindex K@r{, PO Mode command}
Save the translator comments on the kill ring, and delete them
3703(@code{po-kill-comment}).
3704
3705@item W
3706@efindex W@r{, PO Mode command}
Save the translator comments on the kill ring, without deleting them
3708(@code{po-kill-ring-save-comment}).
3709
3710@item Y
3711@efindex Y@r{, PO Mode command}
3712Replace the translator comments, taking the new from the kill ring
3713(@code{po-yank-comment}).
3714
3715@end table
3716
These commands parallel PO mode commands for modifying the translation
strings, and behave much the same way as they do, except that they handle
the part of PO file comments meant for translator usage, rather
than the translation strings.  So, if the descriptions given below are
somewhat succinct, it is because the full details have already been given.
3722@xref{Modifying Translations}.
3723
3724@efindex #@r{, PO Mode command}
3725@efindex po-edit-comment@r{, PO Mode command}
3726The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
3727containing a copy of the translator comments on the current PO file entry.
3728If there are no such comments, PO mode understands that the translator wants
3729to add a comment to the entry, and she is presented with an empty screen.
Comment marks (@code{#}) and the space following them are automatically
removed before editing, and reinstated afterwards.  For translator comments
pertaining to obsolete entries, the uncommenting and recommenting operations
are done twice.  Once in the editing window, the keys @w{@kbd{C-c C-c}}
allow the translator to signal that she is finished editing the comment.
3735@xref{Subedit}, for further details.
3736
3737@evindex po-subedit-mode-hook@r{, PO Mode variable}
3738Functions found on @code{po-subedit-mode-hook}, if any, are executed after
3739the string has been inserted in the edit buffer.
3740
3741@efindex K@r{, PO Mode command}
3742@efindex po-kill-comment@r{, PO Mode command}
3743@efindex W@r{, PO Mode command}
3744@efindex po-kill-ring-save-comment@r{, PO Mode command}
3745@efindex Y@r{, PO Mode command}
3746@efindex po-yank-comment@r{, PO Mode command}
3747The command @kbd{K} (@code{po-kill-comment}) gets rid of all
3748translator comments, while saving those comments on the kill ring.
3749The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
3750a copy of the translator comments on the kill ring, but leaves
3751them undisturbed in the current entry.  The command @kbd{Y}
3752(@code{po-yank-comment}) completely replaces the translator comments
3753by a string taken at the front of the kill ring.  When this command
3754is immediately repeated, the comments just inserted are withdrawn,
3755and replaced by other strings taken along the kill ring.
3756
3757On the kill ring, all strings have the same nature.  There is no
3758distinction between @emph{translation} strings and @emph{translator
3759comments} strings.  So, for example, let's presume the translator
3760has just finished editing a translation, and wants to create a new
3761translator comment to document why the previous translation was
not good, just to remember what the problem was.  Foreseeing that she
3763will do that in her documentation, the translator may want to quote
3764the previous translation in her translator comments.  To do so, she
3765may initialize the translator comments with the previous translation,
3766still at the head of the kill ring.  Because editing already pushed the
3767previous translation on the kill ring, she merely has to type @kbd{M-w}
3768prior to @kbd{#}, and the previous translation will be right there,
3769all ready for being introduced by some explanatory text.
3770
3771On the other hand, presume there are some translator comments already
3772and that the translator wants to add to those comments, instead
3773of wholly replacing them.  Then, she should edit the comment right
3774away with @kbd{#}.  Once inside the editing window, she can use the
3775regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
3776(@code{yank-pop}) to get the previous translation where she likes.
3777
3778@node Subedit, C Sources Context, Modifying Comments, PO Mode
3779@subsection Details of Sub Edition
3780@emindex subedit minor mode
3781
3782The PO subedit minor mode has a few peculiarities worth being described
3783in fuller detail.  It installs a few commands over the usual editing set
3784of Emacs, which are described below.
3785
3786@table @kbd
3787@item C-c C-c
3788@efindex C-c C-c@r{, PO Mode command}
3789Complete edition (@code{po-subedit-exit}).
3790
3791@item C-c C-k
3792@efindex C-c C-k@r{, PO Mode command}
3793Abort edition (@code{po-subedit-abort}).
3794
3795@item C-c C-a
3796@efindex C-c C-a@r{, PO Mode command}
3797Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
3798
3799@end table
3800
3801@emindex exiting PO subedit
3802@efindex C-c C-c@r{, PO Mode command}
3803@efindex po-subedit-exit@r{, PO Mode command}
The window's contents represent a translation for a given message,
3805or a translator comment.  The translator may modify this window to
3806her heart's content.  Once this is done, the command @w{@kbd{C-c C-c}}
3807(@code{po-subedit-exit}) may be used to return the edited translation into
3808the PO file, replacing the original translation, even if it moved out of
3809sight or if buffers were switched.
3810
3811@efindex C-c C-k@r{, PO Mode command}
3812@efindex po-subedit-abort@r{, PO Mode command}
If the translator becomes unsatisfied with her translation or comment,
to the extent she prefers keeping what existed prior to the
@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
(@code{po-subedit-abort}) to discard her edits, while preserving
the original translation or comment.  Another way would be for her to exit
normally with @w{@kbd{C-c C-c}}, then type @kbd{U} once to undo the
whole effect of the last edit.
3820
3821@efindex C-c C-a@r{, PO Mode command}
3822@efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
3823The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
3824allows for glancing through translations
3825already achieved in other languages, directly while editing the current
translation.  This may be quite convenient when the translator is fluent
in many languages, but of course, only makes sense when such completed
3828auxiliary PO files are already available to her (@pxref{Auxiliary}).
3829
3830Functions found on @code{po-subedit-mode-hook}, if any, are executed after
3831the string has been inserted in the edit buffer.
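
As an illustration, the following sketch makes trailing blanks visible in
every subedit buffer.  The function name @code{my-po-subedit-setup} is
hypothetical, and the sketch assumes that the hook runs with the edit buffer
current; @code{show-trailing-whitespace} is a standard Emacs variable.

@example
(defun my-po-subedit-setup ()
  "Highlight blanks at line ends while editing a translation."
  (setq show-trailing-whitespace t))

(add-hook 'po-subedit-mode-hook 'my-po-subedit-setup)
@end example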
3832
While editing her translation, the translator should take care not to
insert unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string if those are not meant to be there, and not to remove
such characters when they are required.  Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @code{<}
at the end of the string being edited, but this @code{<} is not really
part of the string.  On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @code{<} and all whitespace added after
it.  If the translator adds characters after the terminating @code{<}, it
loses its delimiting property and becomes an integral part of the string.
3844If she removes the delimiting @code{<}, then the edited string is taken
3845@emph{as is}, with all trailing newlines, even if invisible.  Also, if
3846the translated string ought to end itself with a genuine @code{<}, then
3847the delimiting @code{<} may not be removed; so the string should appear,
3848in the editing window, as ending with two @code{<} in a row.
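
The rules above can be summarized by the following Emacs Lisp sketch.
It is an illustration only, not the code PO mode actually uses, and the
function name @code{my-po-strip-delimiter} is hypothetical.

@example
(defun my-po-strip-delimiter (string)
  "Return STRING without a final `<' and the whitespace after it.
If STRING does not end that way, return it unchanged."
  (if (string-match "<[ \t\n]*\\'" string)
      (substring string 0 (match-beginning 0))
    string))

(my-po-strip-delimiter "Bonjour<\n")  @result{} "Bonjour"
(my-po-strip-delimiter "Bonjour\n")   @result{} "Bonjour\n"
(my-po-strip-delimiter "a < b<<")     @result{} "a < b<"
@end example

The last call shows why a translation that genuinely ends with @code{<}
must appear with two @code{<} in a row: only the final one is taken as
the delimiter.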
3849
3850@emindex editing multiple entries
3851When a translation (or a comment) is being edited, the translator may move
3852the cursor back into the PO file buffer and freely move to other entries,
browsing at will.  If, with an edit pending, the translator wanders in the
3854PO file buffer, she may decide to start modifying another entry.  Each entry
3855being edited has its own subedit buffer.  It is possible to simultaneously
3856edit the translation @emph{and} the comment of a single entry, or to
3857edit entries in different PO files, all at once.  Typing @kbd{@key{RET}}
3858on a field already being edited merely resumes that particular edit.  Yet,
the translator had better be comfortable handling many Emacs windows!
3860
3861@emindex pending subedits
3862Pending subedits may be completed or aborted in any order, regardless
3863of how or when they were started.  When many subedits are pending and the
3864translator asks for quitting the PO file (with the @kbd{q} command), subedits
3865are automatically resumed one at a time, so she may decide for each of them.
3866
3867@node C Sources Context, Auxiliary, Subedit, PO Mode
3868@subsection C Sources Context
3869@emindex consulting program sources
3870@emindex looking at the source to aid translation
3871@emindex use the source, Luke
3872
3873PO mode is particularly powerful when used with PO files
3874created through GNU @code{gettext} utilities, as those utilities
3875insert special comments in the PO files they generate.
3876Some of these special comments relate the PO file entry to
3877exactly where the untranslated string appears in the program sources.
3878
3879When the translator gets to an untranslated entry, she is fairly
3880often faced with an original string which is not as informative as
3881it normally should be, being succinct, cryptic, or otherwise ambiguous.
3882Before choosing how to translate the string, she needs to understand
3883better what the string really means and how tight the translation has
3884to be.  Most of the time, when problems arise, the only way left to make
3885her judgment is looking at the true program sources from where this
3886string originated, searching for surrounding comments the programmer
3887might have put in there, and looking around for helping clues of
3888@emph{any} kind.
3889
3890Surely, when looking at program sources, the translator will receive
3891more help if she is a fluent programmer.  However, even if she is
3892not versed in programming and feels a little lost in C code, the
translator should not be shy about taking a look, once in a while.
It is most probable that she will still be able to find some of the
hints she needs.  She will quickly learn not to feel uncomfortable
in program code, paying more attention to the programmer's comments,
variable and function names (if he took care to choose them well), and
overall organization, than to the program code itself.
3899
3900@emindex find source fragment for a PO file entry
The following commands are meant to help the translator get
program source context for a PO file entry.
3903
3904@table @kbd
3905@item s
3906@efindex s@r{, PO Mode command}
3907Resume the display of a program source context, or cycle through them
3908(@code{po-cycle-source-reference}).
3909
3910@item M-s
3911@efindex M-s@r{, PO Mode command}
Display a program source context selected by menu
3913(@code{po-select-source-reference}).
3914
3915@item S
3916@efindex S@r{, PO Mode command}
3917Add a directory to the search path for source files
3918(@code{po-consider-source-path}).
3919
3920@item M-S
3921@efindex M-S@r{, PO Mode command}
3922Delete a directory from the search path for source files
3923(@code{po-ignore-source-path}).
3924
3925@end table
3926
3927@efindex s@r{, PO Mode command}
3928@efindex po-cycle-source-reference@r{, PO Mode command}
3929@efindex M-s@r{, PO Mode command}
3930@efindex po-select-source-reference@r{, PO Mode command}
3931The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
3932(@code{po-select-source-reference}) both open another window displaying
3933some source program file, and already positioned in such a way that
3934it shows an actual use of the string to be translated.  By doing
3935so, the command gives source program context for the string.  But if
3936the entry has no source context references, or if all references
3937are unresolved along the search path for program sources, then the
3938command diagnoses this as an error.
3939
3940Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
3941in the PO file window.  If the translator really wants to
3942get into the program source window, she ought to do it explicitly,
3943maybe by using command @kbd{O}.
3944
3945When @kbd{s} is typed for the first time, or for a PO file entry which
is different from the last one used for getting source context, then the
3947command reacts by giving the first context available for this entry,
3948if any.  If some context has already been recently displayed for the
3949current PO file entry, and the translator wandered off to do other
3950things, typing @kbd{s} again will merely resume, in another window,
3951the context last displayed.  In particular, if the translator moved
3952the cursor away from the context in the source file, the command will
3953bring the cursor back to the context.  By using @kbd{s} many times
3954in a row, with no other commands intervening, PO mode will cycle to
3955the next available contexts for this particular entry, getting back
3956to the first context once the last has been shown.
3957
3958The command @kbd{M-s} behaves differently.  Instead of cycling through
3959references, it lets the translator choose a particular reference among
many, and displays that reference.  It is best used with completion:
3961if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
3962response to the question, she will be offered a menu of all possible
3963references, as a reminder of which are the acceptable answers.
3964This command is useful only where there are really many contexts
3965available for a single string to translate.
3966
3967@efindex S@r{, PO Mode command}
3968@efindex po-consider-source-path@r{, PO Mode command}
3969@efindex M-S@r{, PO Mode command}
3970@efindex po-ignore-source-path@r{, PO Mode command}
3971Program source files are usually found relative to where the PO
3972file stands.  As a special provision, when this fails, the file is
3973also looked for, but relative to the directory immediately above it.
3974Those two cases take proper care of most PO files.  However, it might
3975happen that a PO file has been moved, or is edited in a different
place than its normal location.  When this happens, the translator
should tell PO mode in which directory the genuine PO file normally
sits.  Many such directories may be specified, and all together, they
3979constitute what is called the @dfn{search path} for program sources.
3980The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
3981enter a new directory at the front of the search path, and the command
3982@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
3983one of the directories she does not want anymore on the search path.
3984
3985@node Auxiliary, Compendium, C Sources Context, PO Mode
3986@subsection Consulting Auxiliary PO Files
3987@emindex consulting translations to other languages
3988
PO mode is able to help the knowledgeable translator who is fluent in
many languages take advantage of translations already achieved
in other languages she happens to know.  It provides these other
3992language translations as additional context for her own work.  Moreover,
3993it has features to ease the production of translations for many languages
3994at once, for translators preferring to work in this way.
3995
3996@cindex auxiliary PO file
3997@emindex auxiliary PO file
3998An @dfn{auxiliary} PO file is an existing PO file meant for the same
package the translator is working on, but targeted at a different
language.  Commands exist for declaring and handling auxiliary
4001PO files, and also for showing contexts for the entry under work.
4002
4003Here are the auxiliary file commands available in PO mode.
4004
4005@table @kbd
4006@item a
4007@efindex a@r{, PO Mode command}
4008Seek auxiliary files for another translation for the same entry
4009(@code{po-cycle-auxiliary}).
4010
4011@item C-c C-a
4012@efindex C-c C-a@r{, PO Mode command}
4013Switch to a particular auxiliary file (@code{po-select-auxiliary}).
4014
4015@item A
4016@efindex A@r{, PO Mode command}
4017Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
4018
4019@item M-A
4020@efindex M-A@r{, PO Mode command}
4021Remove this PO file from the list of auxiliary files
4022(@code{po-ignore-as-auxiliary}).
4023
4024@end table
4025
4026@efindex A@r{, PO Mode command}
4027@efindex po-consider-as-auxiliary@r{, PO Mode command}
4028@efindex M-A@r{, PO Mode command}
4029@efindex po-ignore-as-auxiliary@r{, PO Mode command}
4030Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
4031PO file to the list of auxiliary files, while command @kbd{M-A}
(@code{po-ignore-as-auxiliary}) just removes it.
4033
4034@efindex a@r{, PO Mode command}
4035@efindex po-cycle-auxiliary@r{, PO Mode command}
4036The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
4037files, round-robin, searching for a translated entry in some other language
having an @code{msgid} field identical to the one for the current entry.
4039The found PO file, if any, takes the place of the current PO file in
4040the display (its window gets on top).  Before doing so, the current PO
4041file is also made into an auxiliary file, if not already.  So, @kbd{a}
4042in this newly displayed PO file will seek another PO file, and so on,
4043so repeating @kbd{a} will eventually yield back the original PO file.
4044
4045@efindex C-c C-a@r{, PO Mode command}
4046@efindex po-select-auxiliary@r{, PO Mode command}
4047The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
4048for her choice of a particular auxiliary file, with completion, and
4049then switches to that selected PO file.  The command also checks if
the selected file has an @code{msgid} field identical to the one for
4051the current entry, and if yes, this entry becomes current.  Otherwise,
4052the cursor of the selected file is left undisturbed.
4053
For all this to work fully, auxiliary PO files will have to be normalized,
in such a way that @code{msgid} fields are written @emph{exactly}
the same way.  It is possible to write @code{msgid} fields in various
ways for representing the same string, and different writings would break the
proper behaviour of the auxiliary file commands of PO mode.  This is not
expected to be much of a problem in practice, as most existing PO files have
their @code{msgid} entries written by the same GNU @code{gettext} tools.
4061
4062@efindex normalize@r{, PO Mode command}
However, PO files initially created by PO mode itself, while marking
strings in source files, are normalized differently.  So are PO
files resulting from the @samp{M-x normalize} command.  Until these
discrepancies between PO mode and other GNU @code{gettext} tools get
fully resolved, the translator should stay aware of normalization issues.
4068
4069@node Compendium,  , Auxiliary, PO Mode
4070@subsection Using Translation Compendia
4071@emindex using translation compendia
4072
4073@cindex compendium
4074A @dfn{compendium} is a special PO file containing a set of
4075translations recurring in many different packages.  The translator can
4076use gettext tools to build a new compendium, to add entries to her
4077compendium, and to initialize untranslated entries, or to update
4078already translated entries, from translations kept in the compendium.
4079
4080@menu
4081* Creating Compendia::          Merging translations for later use
4082* Using Compendia::             Using older translations if they fit
4083@end menu
4084
4085@node Creating Compendia, Using Compendia, Compendium, Compendium
4086@subsubsection Creating Compendia
4087@cindex creating compendia
4088@cindex compendium, creating
4089
4090Basically every PO file consisting of translated entries only can be
4091declared as a valid compendium.  Often the translator wants to have
4092special compendia; let's consider two cases: @cite{concatenating PO
4093files} and @cite{extracting a message subset from a PO file}.
4094
4095@subsubsection Concatenate PO Files
4096
4097@cindex concatenating PO files into a compendium
4098@cindex accumulating translations
4099To concatenate several valid PO files into one compendium file you can
4100use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
4101
4102@example
4103msgcat -o compendium.po file1.po file2.po
4104@end example
4105
By default, @code{msgcat} will accumulate divergent translations
for the same string.  Those occurrences will be marked as @code{fuzzy}
and conspicuously decorated; calling @code{msgcat} on
@file{file1.po}:
4110
4111@example
4112#: src/hello.c:200
4113#, c-format
4114msgid "Report bugs to <%s>.\n"
4115msgstr "Comunicar `bugs' a <%s>.\n"
4116@end example
4117
4118@noindent
4119and @file{file2.po}:
4120
4121@example
4122#: src/bye.c:100
4123#, c-format
4124msgid "Report bugs to <%s>.\n"
4125msgstr "Comunicar \"bugs\" a <%s>.\n"
4126@end example
4127
4128@noindent
4129will result in:
4130
4131@example
4132#: src/hello.c:200 src/bye.c:100
4133#, fuzzy, c-format
4134msgid "Report bugs to <%s>.\n"
4135msgstr ""
4136"#-#-#-#-#  file1.po  #-#-#-#-#\n"
4137"Comunicar `bugs' a <%s>.\n"
4138"#-#-#-#-#  file2.po  #-#-#-#-#\n"
4139"Comunicar \"bugs\" a <%s>.\n"
4140@end example
4141
4142@noindent
4143The translator will have to resolve this ``conflict'' manually; she
4144has to decide whether the first or the second version is appropriate
4145(or provide a new translation), to delete the ``marker lines'', and
4146finally to remove the @code{fuzzy} mark.
4147
If the translator knows in advance that the first translation found for a
message is always the best one, she can make use of the
@samp{--use-first} switch:
4151
4152@example
4153msgcat --use-first -o compendium.po file1.po file2.po
4154@end example
4155
4156A good compendium file must not contain @code{fuzzy} or untranslated
4157entries.  If input files are ``dirty'' you must preprocess the input
4158files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
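
As a minimal sketch of such a postprocessing step, the following commands
build a small ``dirty'' file and keep only its translated, non-fuzzy
entries.  The file names @file{dirty.po} and @file{clean.po} and the three
entries are invented for illustration.

@example
# Hypothetical input: one good, one fuzzy, one untranslated entry.
cat > dirty.po <<'EOF'
msgid "Open"
msgstr "Abrir"

#, fuzzy
msgid "Close"
msgstr "Cerrar"

msgid "Save"
msgstr ""
EOF
# Keep only entries that are translated and not marked fuzzy.
msgattrib --translated --no-fuzzy -o clean.po dirty.po
@end example

@noindent
Under these assumptions, only the entry for @code{"Open"} should remain
in @file{clean.po}, which can then serve as compendium input.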
4159
4160@subsubsection Extract a Message Subset from a PO File
4161@cindex extracting parts of a PO file into a compendium
4162
4163Nobody wants to translate the same messages again and again; thus you
4164may wish to have a compendium file containing @file{getopt.c} messages.
4165
4166To extract a message subset (e.g., all @file{getopt.c} messages) from an
4167existing PO file into one compendium file you can use @samp{msggrep}:
4168
4169@example
4170msggrep --location src/getopt.c -o compendium.po file.po
4171@end example
4172
4173@node Using Compendia,  , Creating Compendia, Compendium
4174@subsubsection Using Compendia
4175
4176You can use a compendium file to initialize a translation from scratch
4177or to update an already existing translation.
4178
4179@subsubsection Initialize a New Translation File
4180@cindex initialize translations from a compendium
4181
Since no PO file with translations exists yet, the translator can
simply use @file{/dev/null} to stand in for the ``old'' translation file.
4184
4185@example
4186msgmerge --compendium compendium.po -o file.po /dev/null file.pot
4187@end example
4188
4189@subsubsection Update an Existing Translation File
4190@cindex update translations from a compendium
4191
4192Concatenate the compendium file(s) and the existing PO, merge the
4193result with the POT file and remove the obsolete entries (optional,
4194here done using @samp{sed}):
4195
4196@example
4197msgcat --use-first -o update.po compendium1.po compendium2.po file.po
4198msgmerge update.po file.pot | sed -e '/^#~/d' > file.po
4199@end example
4200
4201@node Manipulating, Binaries, Editing, Top
4202@chapter Manipulating PO Files
4203@cindex manipulating PO files
4204
4205Sometimes it is necessary to manipulate PO files in a way that is better
4206performed automatically than by hand.  GNU @code{gettext} includes a
4207complete set of tools for this purpose.
4208
4209@cindex merging two PO files
4210When merging two packages into a single package, the resulting POT file
4211will be the concatenation of the two packages' POT files.  Thus the
4212maintainer must concatenate the two existing package translations into
4213a single translation catalog, for each language.  This is best performed
4214using @samp{msgcat}.  It is then the translators' duty to deal with any
4215possible conflicts that arose during the merge.
4216
4217@cindex encoding conversion
4218When a translator takes over the translation job from another translator,
4219but she uses a different character encoding in her locale, she will
4220convert the catalog to her character encoding.  This is best done through
4221the @samp{msgconv} program.
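
A conversion of this kind might look like the sketch below.  The file
names @file{de.po} and @file{de.utf8.po} and the catalog contents are
made up for the example; the source encoding is taken from the
@samp{Content-Type} line of the header entry, so that line must be present.

@example
# Hypothetical catalog declaring ISO-8859-1 in its header entry.
cat > de.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "File"
msgstr "Datei"
EOF
# Convert the catalog to UTF-8; the header is updated accordingly.
msgconv -t UTF-8 -o de.utf8.po de.po
@end example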
4222
4223When a maintainer takes a source file with tagged messages from another
4224package, he should also take the existing translations for this source
4225file (and not let the translators do the same job twice).  One way to do
4226this is through @samp{msggrep}, another is to create a POT file for
4227that source file and use @samp{msgmerge}.
4228
4229@cindex dialect
4230@cindex orthography
4231When a translator wants to adjust some translation catalog for a special
4232dialect or orthography --- for example, German as written in Switzerland
4233versus German as written in Germany --- she needs to apply some text
4234processing to every message in the catalog.  The tool for doing this is
4235@samp{msgfilter}.
4236
4237Another use of @code{msgfilter} is to produce approximately the POT file for
4238which a given PO file was made.  This can be done through a filter command
4239like @samp{msgfilter sed -e d | sed -e '/^# /d'}.  Note that the original
POT file may have had different comments and different plural message counts;
that is why it is better to use the original POT file if available.
4242
4243@cindex checking of translations
4244When a translator wants to check her translations, for example according
4245to orthography rules or using a non-interactive spell checker, she can do
4246so using the @samp{msgexec} program.
4247
4248@cindex duplicate elimination
4249When third party tools create PO or POT files, sometimes duplicates cannot
4250be avoided.  But the GNU @code{gettext} tools give an error when they
4251encounter duplicate msgids in the same file and in the same domain.
4252To merge duplicates, the @samp{msguniq} program can be used.
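
For instance, assuming a file @file{dup.po} (a name chosen only for this
sketch) in which a third party tool emitted the same message twice, the
duplicates could be merged like this:

@example
# Hypothetical input containing the same msgid twice.
cat > dup.po <<'EOF'
msgid "Quit"
msgstr "Salir"

msgid "Quit"
msgstr "Salir"
EOF
# Merge the duplicates into a single entry.
msguniq -o unique.po dup.po
@end example

@noindent
Here both translations agree, so a single plain entry is expected.  When
the translations differ, the merged entry needs the same manual cleanup
as a conflict produced by @code{msgcat}.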
4253
4254@samp{msgcomm} is a more general tool for keeping or throwing away
4255duplicates, occurring in different files.
4256
4257@samp{msgcmp} can be used to check whether a translation catalog is
4258completely translated.
4259
4260@cindex attributes, manipulating
4261@samp{msgattrib} can be used to select and extract only the fuzzy
4262or untranslated messages of a translation catalog.
4263
4264@samp{msgen} is useful as a first step for preparing English translation
4265catalogs.  It copies each message's msgid to its msgstr.
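
A small sketch of this first step, using an invented template
@file{hello.pot} with a single message:

@example
# Hypothetical template with one untranslated message.
cat > hello.pot <<'EOF'
msgid "Hello, world!"
msgstr ""
EOF
# Create an English catalog whose msgstr repeats the msgid.
msgen -o en.po hello.pot
@end example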
4266
4267Finally, for those applications where all these various programs are not
4268sufficient, a library @samp{libgettextpo} is provided that can be used to
4269write other specialized programs that process PO files.
4270
4271@menu
4272* msgcat Invocation::           Invoking the @code{msgcat} Program
4273* msgconv Invocation::          Invoking the @code{msgconv} Program
4274* msggrep Invocation::          Invoking the @code{msggrep} Program
4275* msgfilter Invocation::        Invoking the @code{msgfilter} Program
4276* msguniq Invocation::          Invoking the @code{msguniq} Program
4277* msgcomm Invocation::          Invoking the @code{msgcomm} Program
4278* msgcmp Invocation::           Invoking the @code{msgcmp} Program
4279* msgattrib Invocation::        Invoking the @code{msgattrib} Program
4280* msgen Invocation::            Invoking the @code{msgen} Program
4281* msgexec Invocation::          Invoking the @code{msgexec} Program
4282* libgettextpo::                Writing your own programs that process PO files
4283@end menu
4284
4285@node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating
4286@section Invoking the @code{msgcat} Program
4287
4288@include msgcat.texi
4289
4290@node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating
4291@section Invoking the @code{msgconv} Program
4292
4293@include msgconv.texi
4294
4295@node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating
4296@section Invoking the @code{msggrep} Program
4297
4298@include msggrep.texi
4299
4300@node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating
4301@section Invoking the @code{msgfilter} Program
4302
4303@include msgfilter.texi
4304
4305@node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating
4306@section Invoking the @code{msguniq} Program
4307
4308@include msguniq.texi
4309
4310@node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating
4311@section Invoking the @code{msgcomm} Program
4312
4313@include msgcomm.texi
4314
4315@node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating
4316@section Invoking the @code{msgcmp} Program
4317
4318@include msgcmp.texi
4319
4320@node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating
4321@section Invoking the @code{msgattrib} Program
4322
4323@include msgattrib.texi
4324
4325@node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating
4326@section Invoking the @code{msgen} Program
4327
4328@include msgen.texi
4329
4330@node msgexec Invocation, libgettextpo, msgen Invocation, Manipulating
4331@section Invoking the @code{msgexec} Program
4332
4333@include msgexec.texi
4334
4335@node libgettextpo,  , msgexec Invocation, Manipulating
4336@section Writing your own programs that process PO files
4337
4338For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
4339is not sufficient, a set of C functions is provided in a library, to make it
4340possible to process PO files in your own programs.  When you use this library,
4341you don't need to write routines to parse the PO file; instead, you retrieve
a pointer in memory to each of the messages contained in the PO file.  Functions
4343for writing PO files are not provided at this time.
4344
4345The functions are declared in the header file @samp{<gettext-po.h>}, and are
4346defined in a library called @samp{libgettextpo}.
4347
4348@deftp {Data Type} po_file_t
4349This is a pointer type that refers to the contents of a PO file, after it has
4350been read into memory.
4351@end deftp
4352
4353@deftp {Data Type} po_message_iterator_t
4354This is a pointer type that refers to an iterator that produces a sequence of
4355messages.
4356@end deftp
4357
4358@deftp {Data Type} po_message_t
4359This is a pointer type that refers to a message of a PO file, including its
4360translation.
4361@end deftp
4362
4363@deftypefun po_file_t po_file_read (const char *@var{filename})
4364The @code{po_file_read} function reads a PO file into memory.  The file name
4365is given as argument.  The return value is a handle to the PO file's contents,
4366valid until @code{po_file_free} is called on it.  In case of error, the return
4367value is @code{NULL}, and @code{errno} is set.
4368@end deftypefun
4369
4370@deftypefun void po_file_free (po_file_t @var{file})
4371The @code{po_file_free} function frees a PO file's contents from memory,
4372including all messages that are only implicitly accessible through iterators.
4373@end deftypefun
4374
4375@deftypefun {const char * const *} po_file_domains (po_file_t @var{file})
4376The @code{po_file_domains} function returns the domains for which the given
4377PO file has messages.  The return value is a @code{NULL} terminated array
4378which is valid as long as the @var{file} handle is valid.  For PO files which
4379contain no @samp{domain} directive, the return value contains only one domain,
4380namely the default domain @code{"messages"}.
4381@end deftypefun
4382
4383@deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain})
4384The @code{po_message_iterator} returns an iterator that will produce the
4385messages of @var{file} that belong to the given @var{domain}.  If @var{domain}
4386is @code{NULL}, the default domain is used instead.  To list the messages,
4387use the function @code{po_next_message} repeatedly.
4388@end deftypefun
4389
4390@deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator})
4391The @code{po_message_iterator_free} function frees an iterator previously
4392allocated through the @code{po_message_iterator} function.
4393@end deftypefun
4394
4395@deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator})
4396The @code{po_next_message} function returns the next message from
4397@var{iterator} and advances the iterator.  It returns @code{NULL} when the
4398iterator has reached the end of its message list.
4399@end deftypefun
4400
The following functions return details of a @code{po_message_t}.  Recall
4402that the results are valid as long as the @var{file} handle is valid.
4403
4404@deftypefun {const char *} po_message_msgid (po_message_t @var{message})
4405The @code{po_message_msgid} function returns the @code{msgid} (untranslated
4406English string) of a message.  This is guaranteed to be non-@code{NULL}.
4407@end deftypefun
4408
4409@deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message})
4410The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
4411(untranslated English plural string) of a message with plurals, or @code{NULL}
4412for a message without plural.
4413@end deftypefun
4414
4415@deftypefun {const char *} po_message_msgstr (po_message_t @var{message})
4416The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
4417of a message.  For an untranslated message, the return value is an empty
4418string.
4419@end deftypefun
4420
4421@deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index})
4422The @code{po_message_msgstr_plural} function returns the
4423@code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when
4424the @var{index} is out of range or for a message without plural.
4425@end deftypefun
4426
Here is example code showing how these functions can be used.
4428
4429@example
4430const char *filename = @dots{};
4431po_file_t file = po_file_read (filename);
4432
4433if (file == NULL)
4434  error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
4435@{
4436  const char * const *domains = po_file_domains (file);
4437  const char * const *domainp;
4438
4439  for (domainp = domains; *domainp; domainp++)
4440    @{
4441      const char *domain = *domainp;
4442      po_message_iterator_t iterator = po_message_iterator (file, domain);
4443
4444      for (;;)
4445        @{
          po_message_t message = po_next_message (iterator);
4447
4448          if (message == NULL)
4449            break;
4450          @{
4451            const char *msgid = po_message_msgid (message);
4452            const char *msgstr = po_message_msgstr (message);
4453
4454            @dots{}
4455          @}
4456        @}
4457      po_message_iterator_free (iterator);
4458    @}
4459@}
4460po_file_free (file);
4461@end example
4462
4463@node Binaries, Programmers, Manipulating, Top
4464@chapter Producing Binary MO Files
4465
4466@c FIXME: Rewrite.
4467
4468@menu
4469* msgfmt Invocation::           Invoking the @code{msgfmt} Program
4470* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
4471* MO Files::                    The Format of GNU MO Files
4472@end menu
4473
4474@node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries
4475@section Invoking the @code{msgfmt} Program
4476
4477@include msgfmt.texi
4478
4479@node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries
4480@section Invoking the @code{msgunfmt} Program
4481
4482@include msgunfmt.texi
4483
4484@node MO Files,  , msgunfmt Invocation, Binaries
4485@section The Format of GNU MO Files
4486@cindex MO file's format
4487@cindex file format, @file{.mo}
4488
4489The format of the generated MO files is best described by a picture,
4490which appears below.
4491
4492@cindex magic signature of MO files
The first two words serve to identify the file.  The magic
4494number will always signal GNU MO files.  The number is stored in the
4495byte order of the generating machine, so the magic number really is
4496two numbers: @code{0x950412de} and @code{0xde120495}.  The second
4497word describes the current revision of the file format.  For now the
4498revision is 0.  This might change in future versions, and ensures
4499that the readers of MO files can distinguish new formats from old
4500ones, so that both can be handled correctly.  The version is kept
4501separate from the magic number, instead of using different magic
4502numbers for different formats, mainly because @file{/etc/magic} is
4503not updated often.  It might be better to have magic separated from
4504internal format version identification.
4505
A number of pointers to later tables in the file follow, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them.  This might become useful for later
inserting a few flag bits, an indication of the charset used, new
tables, or other things.
4511
4512Then, at offset @var{O} and offset @var{T} in the picture, two tables
4513of string descriptors can be found.  In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
4515another for the offset of the string in the MO file, counting in bytes
4516from the start of the file.  The first table contains descriptors
4517for the original strings, and is sorted so the original strings
4518are in increasing lexicographical order.  The second table contains
4519descriptors for the translated strings, and is parallel to the first
4520table: to find the corresponding translation one has to access the
4521array slot in the second array with the same index.
4522
Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file.  This also has another advantage: the empty string
in a PO file is usually @emph{translated} by GNU @code{gettext} into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.
4531
4532@cindex hash table, inside MO files
4533The size @var{S} of the hash table can be zero.  In this case, the
4534hash table itself is not contained in the MO file.  Some people might
4535prefer this because a precomputed hashing table takes disk space, and
4536does not win @emph{that} much speed.  The hash table contains indices
4537to the sorted array of strings in the MO file.  Conflict resolution is
4538done by double hashing.  The precise hashing algorithm used is fairly
4539dependent on GNU @code{gettext} code, and is not documented here.
4540
As for the strings themselves, they follow the hash table, and each
4542is terminated with a @key{NUL}, and this @key{NUL} is not counted in
4543the length which appears in the string descriptor.  The @code{msgfmt}
4544program has an option selecting the alignment for MO file strings.
4545With this option, each string is separately aligned so it starts at
4546an offset which is a multiple of the alignment value.  On some RISC
4547machines, a correct alignment will speed things up.
4548
4549@cindex context, in MO files
4550Contexts are stored by storing the concatenation of the context, a
4551@key{EOT} byte, and the original string, instead of the original string.
4552
4553@cindex plural forms, in MO files
4554Plural forms are stored by letting the plural of the original string
4555follow the singular of the original string, separated through a
4556@key{NUL} byte.  The length which appears in the string descriptor
4557includes both.  However, only the singular of the original string
4558takes part in the hash table lookup.  The plural variants of the
4559translation are all stored consecutively, separated through a
4560@key{NUL} byte.  Here also, the length in the string descriptor
4561includes all of them.
4562
4563Nothing prevents a MO file from having embedded @key{NUL}s in strings.
4564However, the program interface currently used already presumes
4565that strings are @key{NUL} terminated, so embedded @key{NUL}s are
somewhat useless.  But the MO file format is general enough that other
interfaces would later be possible if, for example, we ever wanted to
implement wide characters right in MO files, where @key{NUL} bytes may
4569accidentally appear.  (No, we don't want to have wide characters in MO
4570files.  They would make the file unnecessarily large, and the
4571@samp{wchar_t} type being platform dependent, MO files would be
4572platform dependent as well.)
4573
4574This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO file
format will evolve or change over time.  It is even possible that many
4577formats may later be supported concurrently.  But surely, we have to
4578start somewhere, and the MO file format described here is a good start.
4579Nothing is cast in concrete, and the format may later evolve fairly
4580easily, so we should feel comfortable with the current approach.
4581
4582@example
4583@group
4584        byte
4585             +------------------------------------------+
4586          0  | magic number = 0x950412de                |
4587             |                                          |
4588          4  | file format revision = 0                 |
4589             |                                          |
4590          8  | number of strings                        |  == N
4591             |                                          |
4592         12  | offset of table with original strings    |  == O
4593             |                                          |
4594         16  | offset of table with translation strings |  == T
4595             |                                          |
4596         20  | size of hashing table                    |  == S
4597             |                                          |
4598         24  | offset of hashing table                  |  == H
4599             |                                          |
4600             .                                          .
4601             .    (possibly more entries later)         .
4602             .                                          .
4603             |                                          |
4604          O  | length & offset 0th string  ----------------.
4605      O + 8  | length & offset 1st string  ------------------.
4606              ...                                    ...   | |
4607O + ((N-1)*8)| length & offset (N-1)th string           |  | |
4608             |                                          |  | |
4609          T  | length & offset 0th translation  ---------------.
4610      T + 8  | length & offset 1st translation  -----------------.
4611              ...                                    ...   | | | |
4612T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
4613             |                                          |  | | | |
4614          H  | start hash table                         |  | | | |
4615              ...                                    ...   | | | |
4616  H + S * 4  | end hash table                           |  | | | |
4617             |                                          |  | | | |
4618             | NUL terminated 0th string  <----------------' | | |
4619             |                                          |    | | |
4620             | NUL terminated 1st string  <------------------' | |
4621             |                                          |      | |
4622              ...                                    ...       | |
4623             |                                          |      | |
4624             | NUL terminated 0th translation  <---------------' |
4625             |                                          |        |
4626             | NUL terminated 1st translation  <-----------------'
4627             |                                          |
4628              ...                                    ...
4629             |                                          |
4630             +------------------------------------------+
4631@end group
4632@end example
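
To look at these header words in practice, a plain @code{od} dump is
often enough.  The sketch below does not depend on any existing catalog:
it hand-builds a 28-byte header in a hypothetical file @file{demo.mo}
(little-endian byte order, @var{N} = 0, and @var{O}, @var{T} and @var{H}
all pointing just past the header) and then prints the seven words.

@example
# Hand-built header: magic, revision 0, N = 0, O = T = 28, S = 0, H = 28.
printf '\336\022\004\225\000\000\000\000\000\000\000\000' > demo.mo
printf '\034\000\000\000\034\000\000\000' >> demo.mo
printf '\000\000\000\000\034\000\000\000' >> demo.mo
# Dump the seven header words as 32-bit hexadecimal values,
# with decimal byte offsets in the first column.
od -A d -t x4 -N 28 demo.mo
@end example

@noindent
On a little-endian host the first word is printed as @code{950412de}.
On a host of the opposite byte order the same file shows
@code{de120495}, which tells a reader that the remaining words have to
be byte-swapped before use.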
4633
4634@node Programmers, Translators, Binaries, Top
4635@chapter The Programmer's View
4636
4637@c FIXME: Reorganize whole chapter.
4638
4639One aim of the current message catalog implementation provided by
4640GNU @code{gettext} was to use the system's message catalog handling, if the
4641installer wishes to do so.  So we perhaps should first take a look at
4642the solutions we know about.  The people in the POSIX committee did not
4643manage to agree on one of the semi-official standards which we'll
4644describe below.  In fact they couldn't agree on anything, so they decided
4645only to include an example of an interface.  The major Unix vendors
4646are split in the usage of the two most important specifications: X/Open's
4647catgets vs. Uniforum's gettext interface.  We'll describe them both and
4648later explain our solution of this dilemma.
4649
4650@menu
4651* catgets::                     About @code{catgets}
4652* gettext::                     About @code{gettext}
4653* Comparison::                  Comparing the two interfaces
4654* Using libintl.a::             Using libintl.a in own programs
4655* gettext grok::                Being a @code{gettext} grok
4656* Temp Programmers::            Temporary Notes for the Programmers Chapter
4657@end menu
4658
4659@node catgets, gettext, Programmers, Programmers
4660@section About @code{catgets}
4661@cindex @code{catgets}, X/Open specification
4662
4663The @code{catgets} implementation is defined in the X/Open Portability
4664Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
4665process of creating this standard seemed to be too slow for some of
4666the Unix vendors so they created their implementations on preliminary
4667versions of the standard.  Of course this leads again to problems while
4668writing platform independent programs: even the usage of @code{catgets}
4669does not guarantee a unique interface.
4670
Another, personal comment on this is that only a bunch of committee members
could have made this interface.  They never really tried to program
using this interface.  It is a fast, memory-saving implementation; a
user can happily live with it.  But programmers hate it (at least I and
some others do@dots{})
4676
4677But we must not forget one point: after all the trouble with transferring
4678the rights on Unix(tm) they at last came to X/Open, the very same who
4679published this specification.  This leads me to making the prediction
4680that this interface will be in future Unix standards (e.g.@: Spec1170) and
therefore part of all Unix implementations (implementations which are
4682@emph{allowed} to wear this name).
4683
4684@menu
4685* Interface to catgets::        The interface
4686* Problems with catgets::       Problems with the @code{catgets} interface?!
4687@end menu
4688
4689@node Interface to catgets, Problems with catgets, catgets, catgets
4690@subsection The Interface
4691@cindex interface to @code{catgets}
4692
4693The interface to the @code{catgets} implementation consists of three
4694functions which correspond to those used in file access: @code{catopen}
4695to open the catalog for using, @code{catgets} for accessing the message
4696tables, and @code{catclose} for closing after work is done.  Prototypes
4697for the functions and the needed definitions are in the
4698@code{<nl_types.h>} header file.
4699
4700@cindex @code{catopen}, a @code{catgets} function
@code{catopen} is used like this:
4702
4703@example
4704nl_catd catd = catopen ("catalog_name", 0);
4705@end example
4706
The function takes the name of the catalog as its first argument.  This
usually refers to the name of the program or the package.  The second parameter
is not further specified in the standard.  I don't even know whether it
is implemented consistently among various systems.  So the common advice
is to use @code{0} as the value.  The return value is a handle to the
message catalog, equivalent to the file handles returned by @code{open}.
4713
4714@cindex @code{catgets}, a @code{catgets} function
4715This handle is of course used in the @code{catgets} function which can
4716be used like this:
4717
4718@example
4719char *translation = catgets (catd, set_no, msg_id, "original string");
4720@end example
4721
The first parameter is this catalog descriptor.  The second parameter
specifies the set of messages in this catalog from which the message
identified by @code{msg_id} is obtained.  @code{catgets} therefore uses a
three-stage addressing:
4726
4727@display
4728catalog name @result{} set number @result{} message ID @result{} translation
4729@end display
4730
4731@c Anybody else loving Haskell??? :-) -- Uli
4732
The fourth argument is not used to address the translation.  It is given
as a default value in case one of the addressing stages fails.  One
important thing to remember is that although the return type of @code{catgets}
is @code{char *} the resulting string @emph{must not} be changed.  It
would better be @code{const char *}, but the standard was published in
1988, one year before ANSI C.
4739
4740@noindent
4741@cindex @code{catclose}, a @code{catgets} function
4742The last of these functions is used and behaves as expected:
4743
4744@example
4745catclose (catd);
4746@end example
4747
4748After this no @code{catgets} call using the descriptor is legal anymore.
4749
4750@node Problems with catgets,  , Interface to catgets, catgets
4751@subsection Problems with the @code{catgets} Interface?!
4752@cindex problems with @code{catgets} interface
4753
This description makes it all seem really easy, so where are the
problems we speak of?  In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain.  The
reason for this lies in the third argument of @code{catgets}: the unique
message ID.  This has to be a numeric value that is unique among all
messages in a single set.  Perhaps you can imagine the problems of keeping
such a list up to date while changing the source code.  Add a new message
here, remove one there.  Of course a lot of tools have been developed to
help organize this chaos, but each of them fails in one aspect or another.
We don't want to say that the other approach has no problems, but they
are far easier to manage.
4765
4766@node gettext, Comparison, catgets, Programmers
4767@section About @code{gettext}
4768@cindex @code{gettext}, a programmer's view
4769
4770The definition of the @code{gettext} interface comes from a Uniforum
4771proposal.  It was submitted there by Sun, who had implemented the
4772@code{gettext} function in SunOS 4, around 1990.  Nowadays, the
4773@code{gettext} interface is specified by the OpenI18N standard.
4774
The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here as well, but this key is the message
itself (however long or short it is).  See @ref{Comparison} for a more
detailed comparison of the two methods.
4781
4782The following section contains a rather detailed description of the
4783interface.  We make it that detailed because this is the interface
4784we chose for the GNU @code{gettext} Library.  Programmers interested
4785in using this library will be interested in this description.
4786
4787@menu
4788* Interface to gettext::        The interface
4789* Ambiguities::                 Solving ambiguities
4790* Locating Catalogs::           Locating message catalog files
4791* Charset conversion::          How to request conversion to Unicode
4792* Contexts::                    Solving ambiguities in GUI programs
4793* Plural forms::                Additional functions for handling plurals
4794* Optimized gettext::           Optimization of the *gettext functions
4795@end menu
4796
4797@node Interface to gettext, Ambiguities, gettext, gettext
4798@subsection The Interface
4799@cindex @code{gettext} interface
4800
4801The minimal functionality an interface must have is a) to select a
4802domain the strings are coming from (a single domain for all programs is
4803not reasonable because its construction and maintenance is difficult,
4804perhaps impossible) and b) to access a string in a selected domain.
4805
This is essentially the description of the @code{gettext} interface.  It
has a global domain which unqualified usages reference.  Of course this
domain is selectable by the user.
4809
4810@example
4811char *textdomain (const char *domain_name);
4812@end example
4813
This provides the possibility to change or query the current global
domain of the @code{LC_MESSAGES} category.  The
argument is a null-terminated string, whose characters must be legal
in file names.  If the @var{domain_name} argument is @code{NULL},
the function returns the current value.  If no value has been set
before, the name of the default domain is returned: @emph{messages}.
Please note that although the return value of @code{textdomain} is of
type @code{char *}, it must not be changed.  It is also important to know
that no check is made whether the domain is available.  If it is not
available, you will notice this by the fact that no translations are provided.
4824
4825@noindent
4826To use a domain set by @code{textdomain} the function
4827
4828@example
4829char *gettext (const char *msgid);
4830@end example
4831
4832@noindent
4833is to be used.  This is the simplest reasonable form one can imagine.
4834The translation of the string @var{msgid} is returned if it is available
4835in the current domain.  If it is not available, the argument itself is
4836returned.  If the argument is @code{NULL} the result is undefined.
4837
One thing to keep in mind is that the call does not name the domain
explicitly.  The current value of the domain for the
@code{LC_MESSAGES} locale is used.  If this changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.
4843
In the easiest case, which is the normal one in internationalized
packages, a call to @code{textdomain} is issued once at the beginning of
execution, setting the domain to a unique name, normally the package
name.  In the code that follows, all strings which have to be translated are
filtered through the @code{gettext} function.  That's all, the package speaks
your language.
4850
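A minimal sketch of this start-up sequence could look as follows.  The
helper name @code{init_translations} and the domain name @code{"hello"} are
assumptions chosen for illustration.  The call to @code{setlocale} is
included because the locale has to be taken from the environment before
any catalog can be selected.

```c
#include <libintl.h>
#include <locale.h>

/* Assumed package name, used as the domain name.  */
#define PACKAGE "hello"

/* Hypothetical start-up helper.  Returns the name of the global
   domain after the call, or NULL if textdomain failed.  The
   returned string must not be modified.  */
const char *
init_translations (void)
{
  /* Adopt the user's locale so that LC_MESSAGES has a value.  */
  setlocale (LC_ALL, "");
  /* Select the global domain that unqualified gettext calls use.  */
  return textdomain (PACKAGE);
}
```

After this call, @code{gettext ("some string")} looks the string up in the
@code{hello} domain and returns its argument when no translation is found.
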
4851@node Ambiguities, Locating Catalogs, Interface to gettext, gettext
4852@subsection Solving Ambiguities
4853@cindex several domains
4854@cindex domain ambiguities
4855@cindex large package
4856
While this single-domain approach works well for most applications, there
might be the need to get translations from more than one domain.  Of
course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast.  One
possible situation was under discussion at the time of this
writing: all error messages of functions in the set of commonly used
functions should go into a separate domain @code{error}.  By this means
we would only need to translate them once.
Another case is messages from a library, as these @emph{have} to be
independent of the current domain set by the application.
4868
4869@noindent
For these reasons there are two more functions to retrieve strings:
4871
4872@example
4873char *dgettext (const char *domain_name, const char *msgid);
4874char *dcgettext (const char *domain_name, const char *msgid,
4875                 int category);
4876@end example
4877
Both take an additional argument in the first position, which corresponds
to the argument of @code{textdomain}.  The third argument of
@code{dcgettext} allows the use of a locale category other than
@code{LC_MESSAGES}.
But I really don't know where this can be useful.  If the
@var{domain_name} is @code{NULL} or @var{category} has a value other than
the known ones, the result is undefined.  It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
4886
A second ambiguity can arise from the fact that more than one
domain may have the same name.  This can be solved by specifying where the
needed message catalog files can be found.
4890
4891@example
4892char *bindtextdomain (const char *domain_name,
4893                      const char *dir_name);
4894@end example
4895
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below).  In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using @code{textdomain}).  A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}.  If @var{domain_name} itself is
@code{NULL} nothing happens and a @code{NULL} pointer is returned.  Here
again, as for all the other functions, none of the return values may be
changed!
4905
It is important to remember that relative path names for the
@var{dir_name} parameter can cause trouble.  Since the path is always
computed relative to the current directory, different results will be
obtained after the program executes a @code{chdir} call.  Relative
paths should therefore always be avoided.
4912
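For a library, the two mechanisms described in this section are typically
combined as in the following sketch.  The domain name @code{"libexample"},
the directory, and both function names are assumptions made for
illustration.  Note the absolute directory name, and that the global
domain selected by the application is never touched.

```c
#include <libintl.h>

/* Assumed values, for illustration only.  */
#define LIB_DOMAIN "libexample"
#define LIB_LOCALEDIR "/usr/local/share/locale"

/* Hypothetical initializer: bind the library's own domain to an
   absolute directory.  Returns the bound directory name, or NULL
   on failure.  The returned string must not be modified.  */
const char *
lib_init_translations (void)
{
  return bindtextdomain (LIB_DOMAIN, LIB_LOCALEDIR);
}

/* Hypothetical lookup used inside the library: it names the domain
   explicitly instead of relying on the application's textdomain.  */
const char *
lib_translate (const char *msgid)
{
  return dgettext (LIB_DOMAIN, msgid);
}
```
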
4913@node Locating Catalogs, Charset conversion, Ambiguities, gettext
4914@subsection Locating Message Catalog Files
4915@cindex message catalog files location
4916
Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message catalog
files.  The way usually used in Unix environments is to encode it
in the file name.  This is also done here.  The directory name given in
@code{bindtextdomain}'s second argument (or the default directory),
followed by the value of the locale, the name of the locale category,
and the domain name are concatenated:
4924
4925@example
4926@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
4927@end example
4928
4929The default value for @var{dir_name} is system specific.  For the GNU
4930library, and for packages adhering to its conventions, it's:
4931@example
4932/usr/local/share/locale
4933@end example
4934
4935@noindent
4936@var{locale} is the value of the locale whose name is this
4937@code{LC_@var{category}}.  For @code{gettext} and @code{dgettext} this
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
systems, e.g.@: Ultrix, don't have @code{LC_MESSAGES}.  Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
4942The value of the locale is determined through
4943@code{setlocale (LC_@var{category}, NULL)}.
4944@footnote{When the system does not support @code{setlocale} its behavior
4945in setting the locale values is simulated by looking at the environment
4946variables.}
4947@code{dcgettext} specifies the locale category by the third argument.
4948
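The concatenation described above can be written down as a small sketch.
The helper name @code{catalog_file_name} is hypothetical, and the sketch
only composes the single file name shown in the pattern; it does not try
any fallback locale names that an implementation may search in addition.

```c
#include <stdio.h>

/* Hypothetical helper: compose
     DIR_NAME/LOCALE/CATEGORY_NAME/DOMAIN_NAME.mo
   into BUF.  CATEGORY_NAME is a string such as "LC_MESSAGES".
   Returns 0 on success, -1 if BUF is too small.  */
int
catalog_file_name (char *buf, size_t size, const char *dir_name,
                   const char *locale, const char *category_name,
                   const char *domain_name)
{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category_name, domain_name);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```
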
4949@node Charset conversion, Contexts, Locating Catalogs, gettext
4950@subsection How to specify the output character set @code{gettext} uses
4951@cindex charset conversion at runtime
4952@cindex encoding conversion at runtime
4953
4954@code{gettext} not only looks up a translation in a message catalog.  It
4955also converts the translation on the fly to the desired output character
4956set.  This is useful if the user is working in a different character set
4957than the translator who created the message catalog, because it avoids
4958distributing variants of message catalogs which differ only in the
4959character set.
4960
4961The output character set is, by default, the value of @code{nl_langinfo
4962(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
4963locale.  But programs which store strings in a locale independent way
4964(e.g.@: UTF-8) can request that @code{gettext} and related functions
4965return the translations in that encoding, by use of the
4966@code{bind_textdomain_codeset} function.
4967
4968Note that the @var{msgid} argument to @code{gettext} is not subject to
4969character set conversion.  Also, when @code{gettext} does not find a
4970translation for @var{msgid}, it returns @var{msgid} unchanged --
4971independently of the current output character set.  It is therefore
4972recommended that all @var{msgid}s be US-ASCII strings.
4973
4974@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
4975The @code{bind_textdomain_codeset} function can be used to specify the
4976output character set for message catalogs for domain @var{domainname}.
4977The @var{codeset} argument must be a valid codeset name which can be used
4978for the @code{iconv_open} function, or a null pointer.
4979
4980If the @var{codeset} parameter is the null pointer,
4981@code{bind_textdomain_codeset} returns the currently selected codeset
4982for the domain with the name @var{domainname}.  It returns @code{NULL} if
4983no codeset has yet been selected.
4984
4985The @code{bind_textdomain_codeset} function can be used several times. 
4986If used multiple times with the same @var{domainname} argument, the
4987later call overrides the settings made by the earlier one.
4988
The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset.  The string is
allocated internally in the function and must not be changed by the
user.  If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
@end deftypefun
4995@end deftypefun
4996
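As a sketch, a program that keeps all its strings in UTF-8 might request
that encoding once at start-up.  The helper name
@code{request_utf8_output} and the domain name @code{"hello"} are
assumptions made for illustration.

```c
#include <libintl.h>

/* Assumed package name, used as the domain name.  */
#define PACKAGE "hello"

/* Hypothetical helper: ask that translations from PACKAGE be
   returned in UTF-8, whatever the locale's character set is.
   Returns the selected codeset name, or NULL on failure.  The
   returned string must not be modified.  */
const char *
request_utf8_output (void)
{
  return bind_textdomain_codeset (PACKAGE, "UTF-8");
}
```

A later call @code{bind_textdomain_codeset ("hello", NULL)} then reports
the codeset that was selected.
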
4997@node Contexts, Plural forms, Charset conversion, gettext
4998@subsection Using contexts for solving ambiguities
4999@cindex context
5000@cindex GUI programs
5001@cindex translating menu entries
5002@cindex menu entries
5003
One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts their
length.  But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need different translations.  This is
especially true for the one-word strings which are frequently used in
GUI programs.
5013
5014As a consequence many people say that the @code{gettext} approach is
5015wrong and instead @code{catgets} should be used which indeed does not
5016have this problem.  But there is a very simple and powerful method to
5017handle this kind of problems with the @code{gettext} functions.
5018
Contexts can be added to strings to be translated.  A context-dependent
translation lookup searches for a translation of a given string
that is limited to a given context.  The translation for the same string
in a different context can be different.  The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.
5025
5026The @file{gettext.h} include file contains the lookup macros for strings
5027with contexts.  They are implemented as thin macros and inline functions
5028over the functions from @code{<libintl.h>}.
5029
5030@findex pgettext
5031@example
5032const char *pgettext (const char *msgctxt, const char *msgid);
5033@end example
5034
5035In a call of this macro, @var{msgctxt} and @var{msgid} must be string
5036literals.  The macro returns the translation of @var{msgid}, restricted
5037to the context given by @var{msgctxt}.
5038
The @var{msgctxt} string is visible in the PO file to the translator.
You should try to make it canonical and unchanging, because
every time you change a @var{msgctxt}, the translator will have to review
the translation of @var{msgid}.
5043
5044Finding a canonical @var{msgctxt} string that doesn't change over time can
5045be hard.  But you shouldn't use the file name or class name containing the
5046@code{pgettext} call -- because it is a common development task to rename
5047a file or a class, and it shouldn't cause translator work.  Also you shouldn't
5048use a comment in the form of a complete English sentence as @var{msgctxt} --
5049because orthography or grammar changes are often applied to such sentences,
5050and again, it shouldn't force the translator to do a review.
5051
5052The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
5053fetches a particular translation of the @var{msgid}.
5054
5055@findex dpgettext
5056@findex dcpgettext
5057@example
5058const char *dpgettext (const char *domain_name,
5059                       const char *msgctxt, const char *msgid);
5060const char *dcpgettext (const char *domain_name,
5061                        const char *msgctxt, const char *msgid,
5062                        int category);
5063@end example
5064
These are generalizations of @code{pgettext}.  They behave similarly to
@code{dgettext} and @code{dcgettext}, respectively.  The @var{domain_name}
argument defines the translation domain.  The @var{category} argument
allows the use of a locale category other than @code{LC_MESSAGES}.
5069
As an example consider the following fictional situation.  A GUI program
has a menu bar with the following entries:
5072
5073@smallexample
5074+------------+------------+--------------------------------------+
5075| File       | Printer    |                                      |
5076+------------+------------+--------------------------------------+
5077| Open     | | Select   |
5078| New      | | Open     |
5079+----------+ | Connect  |
5080             +----------+
5081@end smallexample
5082
5083To have the strings @code{File}, @code{Printer}, @code{Open},
5084@code{New}, @code{Select}, and @code{Connect} translated there has to be
5085at some point in the code a call to a function of the @code{gettext}
5086family.  But in two places the string passed into the function would be
5087@code{Open}.  The translations might not be the same and therefore we
5088are in the dilemma described above.
5089
5090What distinguishes the two places is the menu path from the menu root to
5091the particular menu entries:
5092
5093@smallexample
5094Menu|File
5095Menu|Printer
5096Menu|File|Open
5097Menu|File|New
5098Menu|Printer|Select
5099Menu|Printer|Open
5100Menu|Printer|Connect
5101@end smallexample
5102
5103The context is thus the menu path without its last part.  So, the calls
5104look like this:
5105
5106@smallexample
5107pgettext ("Menu|", "File")
5108pgettext ("Menu|", "Printer")
5109pgettext ("Menu|File|", "Open")
5110pgettext ("Menu|File|", "New")
5111pgettext ("Menu|Printer|", "Select")
5112pgettext ("Menu|Printer|", "Open")
5113pgettext ("Menu|Printer|", "Connect")
5114@end smallexample
5115
5116Whether or not to use the @samp{|} character at the end of the context is a
5117matter of style.
5118
5119For more complex cases, where the @var{msgctxt} or @var{msgid} are not
5120string literals, more general macros are available:
5121
5122@findex pgettext_expr
5123@findex dpgettext_expr
5124@findex dcpgettext_expr
5125@example
5126const char *pgettext_expr (const char *msgctxt, const char *msgid);
5127const char *dpgettext_expr (const char *domain_name,
5128                            const char *msgctxt, const char *msgid);
5129const char *dcpgettext_expr (const char *domain_name,
5130                             const char *msgctxt, const char *msgid,
5131                             int category);
5132@end example
5133
5134Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
5135These macros are more general.  But in the case that both argument expressions
5136are string literals, the macros without the @samp{_expr} suffix are more
5137efficient.
5138
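To illustrate how such a lookup can be layered over the functions from
@code{<libintl.h>}, here is a simplified sketch.  The name
@code{my_pgettext_expr} is hypothetical and this is not the code from
@file{gettext.h}.  It assumes that the catalog stores a message with a
context under the key formed by the context, the byte @code{\004}, and
the @var{msgid}.

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a context-aware lookup in the current
   domain.  Assumption: the catalog key is MSGCTXT, then the byte
   '\004', then MSGID.  */
const char *
my_pgettext_expr (const char *msgctxt, const char *msgid)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_len = strlen (msgid);
  char *key = malloc (ctxt_len + 1 + id_len + 1);
  const char *translation;

  if (key == NULL)
    return msgid;

  /* Build the combined key "msgctxt\004msgid".  */
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);

  translation = gettext (key);
  /* gettext returns its argument when nothing is found; in that case
     fall back to the plain msgid, not to the combined key.  */
  if (translation == key)
    translation = msgid;

  free (key);
  return translation;
}
```
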
5139@node Plural forms, Optimized gettext, Contexts, gettext
5140@subsection Additional functions for plural forms
5141@cindex plural forms
5142
The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches.  What
is meant here is the handling of plural forms.
5147
5148Looking through Unix source code before the time anybody thought about
5149internationalization (and, sadly, even afterwards) one can often find
5150code similar to the following:
5151
5152@smallexample
5153   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
5154@end smallexample
5155
5156@noindent
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}.  Both look unnatural and should be avoided.  The first
attempts to solve the problem correctly looked like this:
5161
5162@smallexample
5163   if (n == 1)
5164     printf ("%d file deleted", n);
5165   else
5166     printf ("%d files deleted", n);
5167@end smallexample
5168
5169But this does not solve the problem.  It helps languages where the
5170plural form of a noun is not simply constructed by adding an
5171@ifhtml
‘s’
5173@end ifhtml
5174@ifnothtml
5175`s'
5176@end ifnothtml
5177but that is all.  Once again people fell into the trap of believing the
5178rules their language is using are universal.  But the handling of plural
5179forms differs widely between the language families.  For example,
5180Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
5181
5182@quotation
5183In Polish we use e.g.@: plik (file) this way:
5184@example
51851 plik
51862,3,4 pliki
51875-21 pliko'w
518822-24 pliki
518925-31 pliko'w
5190@end example
5191and so on (o' means 8859-2 oacute which should be rather okreska,
5192similar to aogonek).
5193@end quotation
5194
There are two things which can differ between languages (and even inside
language families):
5197
5198@itemize @bullet
5199@item
The way plural forms are built differs.  This is a problem with
languages which have many irregularities.  German, for instance, is a
drastic case.  Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
5205@ifhtml
‘s’)
5207@end ifhtml
5208@ifnothtml
5209`s')
5210@end ifnothtml
5211is hardly found in German.
5212
5213@item
The number of plural forms differs.  This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms.  More
information on this is given in an extra section.
5220@end itemize
5221
5222The consequence of this is that application writers should not try to
5223solve the problem in their code.  This would be localization since it is
5224only usable for certain, hardcoded language environments.  Instead the
5225extended @code{gettext} interface should be used.
5226
These extra functions take two strings and a numerical argument instead
of the one key string.  The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior).  In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.
5236
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language.  This is a limitation but since the GNU C library
(as well as the GNU @code{gettext} package) are written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
5244
5245@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
5246The @code{ngettext} function is similar to the @code{gettext} function
5247as it finds the message catalogs in the same way.  But it takes two
5248extra arguments.  The @var{msgid1} parameter must contain the singular
5249form of the string to be converted.  It is also used as the key for the
5250search in the catalog.  The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form.  If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
5254
5255An example for the use of this function is:
5256
5257@smallexample
5258printf (ngettext ("%d file removed", "%d files removed", n), n);
5259@end smallexample
5260
5261Please note that the numeric value @var{n} has to be passed to the
5262@code{printf} function as well.  It is not sufficient to pass it only to
5263@code{ngettext}.
5264
5265In the English singular case, the number -- always 1 -- can be replaced with
5266"one":
5267
5268@smallexample
5269printf (ngettext ("One file removed", "%d files removed", n), n);
5270@end smallexample
5271
5272@noindent
5273This works because the @samp{printf} function discards excess arguments that
5274are not consumed by the format string.
5275
5276It is also possible to use this function when the strings don't contain a
5277cardinal number:
5278
5279@smallexample
5280puts (ngettext ("Delete the selected file?",
5281                "Delete the selected files?",
5282                n));
5283@end smallexample
5284
5285In this case the number @var{n} is only used to choose the plural form.
5286@end deftypefun
5287
5288@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
5293@end deftypefun
5294
5295@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
The @code{dcngettext} function is similar to the @code{dcgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
5300@end deftypefun
5301
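As a small sketch of how these functions are used together with the
@code{printf} family, the following hypothetical helper
@code{format_removed} formats a message into a buffer.  The helper name
and the @code{%lu} conversion are choices made for this illustration.
Without a message catalog it falls back to the Germanic rule described
above.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write "N file(s) removed" in the right
   plural form into BUF.  N is passed twice: once to ngettext to
   select the form, once to snprintf to be printed.  Returns the
   value of snprintf.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```
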
5302Now, how do these functions solve the problem of the plural forms?
5303Without the input of linguists (which was not available) it was not
5304possible to determine whether there are only a few different forms in
5305which plural forms are formed or whether the number can increase with
5306every new supported language.
5307
5308Therefore the solution implemented is to allow the translator to specify
5309the rules of how to select the plural form.  Since the formula varies
5310with every language this is the only viable solution except for
5311hardcoding the information in the code (which still would require the
5312possibility of extensions to not prevent the use of new languages).
5313
5314@cindex specifying plural form in a PO file
5315@kwindex nplurals@r{, in a PO file header}
5316@kwindex plural@r{, in a PO file header}
5317The information about the plural form selection has to be stored in the
5318header entry of the PO file (the one with the empty @code{msgid} string).
5319The plural form information looks like this:
5320
5321@smallexample
5322Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
5323@end smallexample
5324
5325The @code{nplurals} value must be a decimal number which specifies how
5326many different plural forms exist for this language.  The string
5327following @code{plural} is an expression which is using the C language
5328syntax.  Exceptions are that no negative numbers are allowed, numbers
5329must be decimal, and the only variable allowed is @code{n}.  Spaces are
5330allowed in the expression, but backslash-newlines are not; in the
5331examples below the backslash-newlines are present for formatting purposes
5332only.  This expression will be evaluated whenever one of the functions
5333@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
5334numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression.  The resulting value then
must be greater than or equal to zero and smaller than the value given as
the value of @code{nplurals}.
5338
5339@noindent
5340@cindex plural form formulas
The following rules are known at this point.  The languages are listed
with their families.  But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome.  Send appropriate information to
@email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.}
5346
5347@table @asis
5348@item Only one form:
5349Some languages only require one single form.  There is no distinction
5350between the singular and plural form.  An appropriate header entry
5351would look like this:
5352
5353@smallexample
5354Plural-Forms: nplurals=1; plural=0;
5355@end smallexample
5356
5357@noindent
5358Languages with this property include:
5359
5360@table @asis
5361@item Asian family
5362Japanese, Korean, Vietnamese
5363@item Turkic/Altaic family
5364Turkish
5365@end table
5366
5367@item Two forms, singular used for one only
5368This is the form used in most existing programs since it is what English
5369is using.  A header entry would look like this:
5370
5371@smallexample
5372Plural-Forms: nplurals=2; plural=n != 1;
5373@end smallexample
5374
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
5377
5378@noindent
5379Languages with this property include:
5380
5381@table @asis
5382@item Germanic family
5383Danish, Dutch, English, Faroese, German, Norwegian, Swedish
5384@item Finno-Ugric family
5385Estonian, Finnish
5386@item Latin/Greek family
5387Greek
5388@item Semitic family
5389Hebrew
5390@item Romanic family
5391Italian, Portuguese, Spanish
5392@item Artificial
5393Esperanto
5394@end table
5395
5396@noindent
5397Another language using the same header entry is:
5398
5399@table @asis
5400@item Finno-Ugric family
5401Hungarian
5402@end table
5403
5404Hungarian does not appear to have a plural if you look at sentences involving
5405cardinal numbers.  For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is
5406``123 alma''.  But when the number is not explicit, the distinction between
5407singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is
5408``az alm@'{a}k''.  Since @code{ngettext} has to support both types of sentences,
5409it is classified here, under ``two forms''.
5410
5411@item Two forms, singular used for zero and one
5412Exceptional case in the language family.  The header entry would be:
5413
5414@smallexample
5415Plural-Forms: nplurals=2; plural=n>1;
5416@end smallexample
5417
5418@noindent
5419Languages with this property include:
5420
5421@table @asis
5422@item Romanic family
5423French, Brazilian Portuguese
5424@end table
5425
5426@item Three forms, special case for zero
5427The header entry would be:
5428
5429@smallexample
5430Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
5431@end smallexample
5432
5433@noindent
5434Languages with this property include:
5435
5436@table @asis
5437@item Baltic family
5438Latvian
5439@end table
5440
5441@item Three forms, special cases for one and two
5442The header entry would be:
5443
5444@smallexample
5445Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
5446@end smallexample
5447
5448@noindent
5449Languages with this property include:
5450
5451@table @asis
5452@item Celtic
5453Gaeilge (Irish)
5454@end table
5455
5456@item Three forms, special case for numbers ending in 00 or [2-9][0-9]
5457The header entry would be:
5458
5459@smallexample
5460Plural-Forms: nplurals=3; \
5461    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
5462@end smallexample
5463
5464@noindent
5465Languages with this property include:
5466
5467@table @asis
5468@item Romanic family
5469Romanian
5470@end table
5471
5472@item Three forms, special case for numbers ending in 1[2-9]
5473The header entry would look like this:
5474
5475@smallexample
5476Plural-Forms: nplurals=3; \
5477    plural=n%10==1 && n%100!=11 ? 0 : \
5478           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
5479@end smallexample
5480
5481@noindent
5482Languages with this property include:
5483
5484@table @asis
5485@item Baltic family
5486Lithuanian
5487@end table
5488
5489@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
5490The header entry would look like this:
5491
5492@smallexample
5493Plural-Forms: nplurals=3; \
5494    plural=n%10==1 && n%100!=11 ? 0 : \
5495           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
5496@end smallexample
5497
5498@noindent
5499Languages with this property include:
5500
5501@table @asis
5502@item Slavic family
5503Croatian, Serbian, Russian, Ukrainian
5504@end table
5505
5506@item Three forms, special cases for 1 and 2, 3, 4
5507The header entry would look like this:
5508
5509@smallexample
5510Plural-Forms: nplurals=3; \
5511    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
5512@end smallexample
5513
5514@noindent
5515Languages with this property include:
5516
5517@table @asis
5518@item Slavic family
5519Slovak, Czech
5520@end table
5521
5522@item Three forms, special case for one and some numbers ending in 2, 3, or 4
5523The header entry would look like this:
5524
5525@smallexample
5526Plural-Forms: nplurals=3; \
5527    plural=n==1 ? 0 : \
5528           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
5529@end smallexample
5530
5531@noindent
5532Languages with this property include:
5533
5534@table @asis
5535@item Slavic family
5536Polish
5537@end table
5538
5539@item Four forms, special case for one and all numbers ending in 02, 03, or 04
5540The header entry would look like this:
5541
5542@smallexample
5543Plural-Forms: nplurals=4; \
5544    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
5545@end smallexample
5546
5547@noindent
5548Languages with this property include:
5549
5550@table @asis
5551@item Slavic family
5552Slovenian
5553@end table
5554@end table
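
To see how such a formula selects a form, it can be transcribed directly
into C.  The following is only a sketch: the function names
@code{plural_index_russian} and @code{plural_index_polish} are made up
for this illustration and are not part of @code{libintl}; the
expressions are copied from the corresponding header entries above.

@smallexample
/* Rule shared by Croatian, Serbian, Russian and Ukrainian.  */
static unsigned long
plural_index_russian (unsigned long n)
@{
  return n%10==1 && n%100!=11 ? 0 :
         n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@}

/* Rule for Polish: only the number one itself selects form 0.  */
static unsigned long
plural_index_polish (unsigned long n)
@{
  return n==1 ? 0 :
         n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@}
@end smallexample

@noindent
Both functions return 1 for 22 and 2 for 12, but they differ for 21:
the first rule yields 0, while the Polish rule yields 2.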
5555
You might now ask: @code{ngettext} handles only numbers @var{n} of type
5557@samp{unsigned long}.  What about larger integer types?  What about negative
5558numbers?  What about floating-point numbers?
5559
5560About larger integer types, such as @samp{uintmax_t} or 
5561@samp{unsigned long long}: they can be handled by reducing the value to a
5562range that fits in an @samp{unsigned long}.  Simply casting the value to
5563@samp{unsigned long} would not do the right thing, since it would treat
5564@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
5565the like.  Here you can exploit the fact that all mentioned plural form
5566formulas eventually become periodic, with a period that is a divisor of 100
5567(or 1000 or 1000000).  So, when you reduce a large value to another one in
5568the range [1000000, 1999999] that ends in the same 6 decimal digits, you
5569can assume that it will lead to the same plural form selection.  This code
5570does this:
5571
5572@smallexample
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <libintl.h>
5574uintmax_t nbytes = ...;
5575printf (ngettext ("The file has %"PRIuMAX" byte.",
5576                  "The file has %"PRIuMAX" bytes.",
5577                  (nbytes > ULONG_MAX
5578                   ? (nbytes % 1000000) + 1000000
5579                   : nbytes)),
5580        nbytes);
5581@end smallexample
5582
5583Negative and floating-point values usually represent physical entities for
5584which singular and plural don't clearly apply.  In such cases, there is no
5585need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
5586for all values will do.  For example:
5587
5588@smallexample
5589printf (gettext ("Time elapsed: %.3f seconds"),
5590        num_milliseconds * 0.001);
5591@end smallexample
5592
5593@noindent
5594Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
5595@smallexample
5596Time elapsed: 1.000 seconds
5597@end smallexample
5598@noindent
5599is acceptable in English, and similarly for other languages.
5600
5601@node Optimized gettext,  , Plural forms, gettext
5602@subsection Optimization of the *gettext functions
5603@cindex optimization of @code{gettext} functions
5604
At this point in the discussion we should talk about an advantage of the
GNU @code{gettext} implementation.  Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same.  Take the
following example:
5612
5613@example
5614@group
5615@{
5616  while (@dots{})
5617    @{
5618      puts (gettext ("Hello world"));
5619    @}
5620@}
5621@end group
5622@end example
5623
5624@noindent
When the locale selection does not change between two runs, the resulting
string is always the same.  One way to exploit this is:
5627
5628@example
5629@group
5630@{
5631  str = gettext ("Hello world");
5632  while (@dots{})
5633    @{
5634      puts (str);
5635    @}
5636@}
5637@end group
5638@end example
5639
5640@noindent
But this solution is not usable in all situations (e.g.@: when the locale
selection changes), nor does it lead to legible code.
5643
5644For this reason, GNU @code{gettext} caches previous translation results.
5645When the same translation is requested twice, with no new message
5646catalogs being loaded in between, @code{gettext} will, the second time,
5647find the result through a single cache lookup.
5648
5649@node Comparison, Using libintl.a, gettext, Programmers
5650@section Comparing the Two Interfaces
5651@cindex @code{gettext} vs @code{catgets}
5652@cindex comparison of interfaces
5653
5654@c FIXME: arguments to catgets vs. gettext
5655@c Partly done 950718 -- drepper
5656
5657The following discussion is perhaps a little bit colored.  As said
5658above we implemented GNU @code{gettext} following the Uniforum
5659proposal and this surely has its reasons.  But it should show how we
5660came to this decision.
5661
First we take a look at the development process.  When we write an
application using NLS provided by @code{gettext}, we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated, we use @code{gettext("@dots{}")} instead of
@code{"@dots{}"}.  At the beginning of each source file (or in a central
header file) we define
5668
5669@example
5670#define gettext(String) (String)
5671@end example
5672
Even this definition can be avoided when the system supports the
@code{gettext} function in its C library.  When we compile this code, the
result is the same as if no NLS code were used.  When you take a look at
the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
instead of @code{gettext("@dots{}")}.  This reduces the number of
additional characters per translatable string to @emph{3} (in words:
three).
5680
When a production version of the program is needed, we simply replace
the definition
5683
5684@example
5685#define _(String) (String)
5686@end example
5687
5688@noindent
5689by
5690
5691@cindex include file @file{libintl.h}
5692@example
5693#include <libintl.h>
5694#define _(String) gettext (String)
5695@end example
5696
5697@noindent
Additionally, we run the program @file{xgettext} on all source code files
that contain translatable strings, and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.
5702
5703@cindex @code{N_}, a convenience macro
5704The same procedure can be done for the @code{gettext_noop} invocations
5705(@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
5706no-op macro.  So you should consider the following code for your project:
5707
5708@example
5709#define gettext_noop(String) String
5710#define N_(String) gettext_noop (String)
5711@end example
5712
5713@code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
5714the @file{po/} directory of GNU @code{gettext} knows by default both of the
5715mentioned short forms so you are invited to follow this proposal for
5716your own ease.
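
As a sketch of how the two short forms work together, consider strings
stored in a static table.  @code{N_} marks them for extraction without
translating them, since a function call is not allowed in a static
initializer; @code{_} translates them at the point of use.  The table
@code{weekday_names} and the function @code{weekday_label} are
hypothetical names chosen only for this illustration.

@example
#include <libintl.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical table: the strings are only marked here,
   not translated.  */
static const char *const weekday_names[] =
  @{ N_("Monday"), N_("Tuesday"), N_("Wednesday") @};

/* Translate at the point of use.  When no matching catalog
   entry is found, gettext returns its argument unchanged.  */
const char *
weekday_label (int index)
@{
  return _(weekday_names[index]);
@}
@end example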
5717
Now to @code{catgets}.  The main problem is the work for the
programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
entries, duplicate message IDs etc.  If he wants to have the same
quality in the message catalog as the GNU @code{gettext} program
provides, he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog.  This is
nearly a Mission: Impossible.
5727
5728But there are also some points people might call advantages speaking for
5729@code{catgets}.  If you have a single word in a string and this string
5730is used in different contexts it is likely that in one or the other
5731language the word has different translations.  Example:
5732
5733@example
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
5738@end example
5739
Here we have to translate the string @code{"number"} twice.  Even
if you do not speak a language besides English, it might be possible to
recognize that the two words have a different meaning.  In German the
first appearance has to be translated to @code{"Anzahl"} and the second
to @code{"Zahl"}.
5745
Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem, and we decided that
it does not weigh that much.  The solution for the above problem could
be very easy:
5750
5751@example
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
5757@end example
5758
We believe that we can solve all conflicts with this method.  If it is
difficult, one can also consider changing one of the conflicting strings a
little bit.  But it is not impossible to overcome.
5762
@code{catgets} allows the same original entry to have different translations,
but @code{gettext} has another, scalable approach for solving ambiguities
of this kind: see @ref{Ambiguities}.
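
The following sketch shows the idea behind that approach, under the
assumption that the message catalog stores a context-qualified entry
under a key formed by the context, the byte @samp{\004}, and the
original string.  The helper @code{context_gettext} and the macro
@code{C_} are defined here only for illustration; in a real package,
prefer the facilities described in the referenced section.

@example
#include <libintl.h>

/* Look up the context-qualified key.  If no translation is
   found, gettext returns its argument, and we fall back to the
   plain original string.  */
static const char *
context_gettext (const char *msg_ctxt_id, const char *msgid)
@{
  const char *translation = gettext (msg_ctxt_id);
  return translation == msg_ctxt_id ? msgid : translation;
@}

/* Both arguments must be string literals.  */
#define C_(Context, String) \
  context_gettext (Context "\004" String, String)
@end example

@noindent
With such a macro, @code{C_("error count", "number")} and
@code{C_("digit", "number")} are distinct catalog entries, so a German
translator can render the first as @code{"Anzahl"} and the second as
@code{"Zahl"}.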
5766
5767@node Using libintl.a, gettext grok, Comparison, Programmers
5768@section Using libintl.a in own programs
5769
Starting with version 0.9.4 the library @file{libintl.a} should be
5771self-contained.  I.e., you can use it in your own programs without
5772providing additional functions.  The @file{Makefile} will put the header
5773and the library in directories selected using the @code{$(prefix)}.
5774
5775@node gettext grok, Temp Programmers, Using libintl.a, Programmers
5776@section Being a @code{gettext} grok
5777
5778@strong{ NOTE: } This documentation section is outdated and needs to be
5779revised.
5780
To fully exploit the functionality of the GNU @code{gettext} library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
5785
5786@itemize @bullet
5787@item Changing the language at runtime
5788@cindex language selection at runtime
5789
For interactive programs it might be useful to offer a selection of the
language used at runtime.  To understand how to do this, one needs to know
how the language is determined while executing the @code{gettext}
function.  The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.
5795
5796In the function @code{dcgettext} at every call the current setting of
5797the highest priority environment variable is determined and used.
5798Highest priority means here the following list with decreasing
5799priority:
5800
5801@enumerate
5802@vindex LANGUAGE@r{, environment variable}
5803@item @code{LANGUAGE}
5804@vindex LC_ALL@r{, environment variable}
5805@item @code{LC_ALL}
5806@vindex LC_CTYPE@r{, environment variable}
5807@vindex LC_NUMERIC@r{, environment variable}
5808@vindex LC_TIME@r{, environment variable}
5809@vindex LC_COLLATE@r{, environment variable}
5810@vindex LC_MONETARY@r{, environment variable}
5811@vindex LC_MESSAGES@r{, environment variable}
5812@item @code{LC_xxx}, according to selected locale
5813@vindex LANG@r{, environment variable}
5814@item @code{LANG}
5815@end enumerate
5816
5817Afterwards the path is constructed using the found value and the
5818translation file is loaded if available.
5819
5820What happens now when the value for, say, @code{LANGUAGE} changes?  According
5821to the process explained above the new value of this variable is found
5822as soon as the @code{dcgettext} function is called.  But this also means
5823the (perhaps) different message catalog file is loaded.  In other
5824words: the used language is changed.
5825
But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded.  But
if @code{dcgettext} is not called, the program also cannot notice that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}).  The
solution for this is very easy.  Include the following code in the
language switching function.
5833
5834@example
5835  /* Change language.  */
5836  setenv ("LANGUAGE", "fr", 1);
5837
5838  /* Make change known.  */
5839  @{
5840    extern int  _nl_msg_cat_cntr;
5841    ++_nl_msg_cat_cntr;
5842  @}
5843@end example
5844
5845@cindex @code{_nl_msg_cat_cntr}
The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
You don't need to know what this is for.  But it can be used to detect
whether a @code{gettext} implementation is GNU gettext and not a non-GNU
system's native gettext implementation.
5850
5851@end itemize
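
The two steps described in the item above can be wrapped in a single
function.  This is a minimal sketch, assuming the GNU implementation
(for @code{_nl_msg_cat_cntr}) and a system that provides @code{setenv};
the function name @code{switch_language} is hypothetical.

@example
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200112L   /* for setenv */
#endif
#include <stdlib.h>

/* Defined by the GNU implementation only.  */
extern int _nl_msg_cat_cntr;

/* Select LANGUAGE for subsequent gettext calls.
   Return 0 on success, -1 if the environment could not be
   changed.  */
int
switch_language (const char *language)
@{
  if (setenv ("LANGUAGE", language, 1) != 0)
    return -1;

  /* Make the change known to the translation cache.  */
  ++_nl_msg_cat_cntr;
  return 0;
@}
@end example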
5852
5853@node Temp Programmers,  , gettext grok, Programmers
5854@section Temporary Notes for the Programmers Chapter
5855
5856@strong{ NOTE: } This documentation section is outdated and needs to be
5857revised.
5858
5859@menu
5860* Temp Implementations::        Temporary - Two Possible Implementations
5861* Temp catgets::                Temporary - About @code{catgets}
5862* Temp WSI::                    Temporary - Why a single implementation
5863* Temp Notes::                  Temporary - Notes
5864@end menu
5865
5866@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
5867@subsection Temporary - Two Possible Implementations
5868
5869There are two competing methods for language independent messages:
5870the X/Open @code{catgets} method, and the Uniforum @code{gettext}
5871method.  The @code{catgets} method indexes messages by integers; the
5872@code{gettext} method indexes them by their English translations.
5873The @code{catgets} method has been around longer and is supported
5874by more vendors.  The @code{gettext} method is supported by Sun,
5875and it has been heard that the COSE multi-vendor initiative is
5876supporting it.  Neither method is a POSIX standard; the POSIX.1
5877committee had a lot of disagreement in this area.
5878
5879Neither one is in the POSIX standard.  There was much disagreement
5880in the POSIX.1 committee about using the @code{gettext} routines
5881vs. @code{catgets} (XPG).  In the end the committee couldn't
5882agree on anything, so no messaging system was included as part
5883of the standard.  I believe the informative annex of the standard
5884includes the XPG3 messaging interfaces, ``@dots{}as an example of
5885a messaging system that has been implemented@dots{}''
5886
5887They were very careful not to say anywhere that you should use one
5888set of interfaces over the other.  For more on this topic please
5889see the Programming for Internationalization FAQ.
5890
5891@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
5892@subsection Temporary - About @code{catgets}
5893
5894There have been a few discussions of late on the use of
5895@code{catgets} as a base.  I think it important to present both
5896sides of the argument and hence am opting to play devil's advocate
5897for a little bit.
5898
5899I'll not deny the fact that @code{catgets} could have been designed
5900a lot better.  It currently has quite a number of limitations and
5901these have already been pointed out.
5902
5903However there is a great deal to be said for consistency and
5904standardization.  A common recurring problem when writing Unix
5905software is the myriad portability problems across Unix platforms.
5906It seems as if every Unix vendor had a look at the operating system
5907and found parts they could improve upon.  Undoubtedly, these
5908modifications are probably innovative and solve real problems.
5909However, software developers have a hard time keeping up with all
5910these changes across so many platforms.
5911
5912And this has prompted the Unix vendors to begin to standardize their
5913systems.  Hence the impetus for Spec1170.  Every major Unix vendor
5914has committed to supporting this standard and every Unix software
5915developer waits with glee the day they can write software to this
5916standard and simply recompile (without having to use autoconf)
5917across different platforms.
5918
5919As I understand it, Spec1170 is roughly based upon version 4 of the
5920X/Open Portability Guidelines (XPG4).  Because @code{catgets} and
5921friends are defined in XPG4, I'm led to believe that @code{catgets}
5922is a part of Spec1170 and hence will become a standardized component
5923of all Unix systems.
5924
5925@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
5926@subsection Temporary - Why a single implementation
5927
5928Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs.  If we do want to remedy the
deficiencies of @code{catgets}, why don't we try to expand @code{catgets}
(in a compatible manner) rather than implement an entirely new system?
5932Otherwise, we'll end up with two message catalog access systems installed
5933with an operating system - one set of routines for packages using GNU
5934@code{gettext} for their internationalization, and another set of routines
5935(catgets) for all other software.  Bloated?
5936
5937Supposing another catalog access system is implemented.  Which do
5938we recommend?  At least for Linux, we need to attract as many
5939software developers as possible.  Hence we need to make it as easy
5940for them to port their software as possible.  Which means supporting
5941@code{catgets}.  We will be implementing the @code{libintl} code
5942within our @code{libc}, but does this mean we also have to incorporate
5943another message catalog access scheme within our @code{libc} as well?
And what about people who are going to be using the @code{libintl}
+ non-@code{catgets} routines?  When they port their software to
5946other platforms, they're now going to have to include the front-end
5947(@code{libintl}) code plus the back-end code (the non-@code{catgets}
5948access routines) with their software instead of just including the
5949@code{libintl} code with their software.
5950
5951Message catalog support is however only the tip of the iceberg.
What about the data for the other locale categories?  They also have
5953a number of deficiencies.  Are we going to abandon them as well and
5954develop another duplicate set of routines (should @code{libintl}
5955expand beyond message catalog support)?
5956
5957Like many parts of Unix that can be improved upon, we're stuck with balancing
5958compatibility with the past with useful improvements and innovations for
5959the future.
5960
5961@node Temp Notes,  , Temp WSI, Temp Programmers
5962@subsection Temporary - Notes
5963
X/Open agreed very late on the standard form so that many
implementations differ from the final form.  Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux @code{libc} @code{gettext} functions.  So in the future
Solaris will not be the only system having @code{gettext}.
5971
5972@node Translators, Maintainers, Programmers, Top
5973@chapter The Translator's View
5974
5975@c FIXME: Reorganize whole chapter.
5976
5977@menu
5978* Trans Intro 0::               Introduction 0
5979* Trans Intro 1::               Introduction 1
5980* Discussions::                 Discussions
5981* Organization::                Organization
5982* Information Flow::            Information Flow
5983* Prioritizing messages::       How to find which messages to translate first
5984@end menu
5985
5986@node Trans Intro 0, Trans Intro 1, Translators, Translators
5987@section Introduction 0
5988
5989@strong{ NOTE: } This documentation section is outdated and needs to be
5990revised.
5991
5992Free software is going international!  The Translation Project is a way
5993to get maintainers, translators and users all together, so free software
5994will gradually become able to speak many native languages.
5995
5996The GNU @code{gettext} tool set contains @emph{everything} maintainers
5997need for internationalizing their packages for messages.  It also
5998contains quite useful tools for helping translators at localizing
5999messages to their native language, once a package has already been
6000internationalized.
6001
6002To achieve the Translation Project, we need many interested
6003people who like their own language and write it well, and who are also
6004able to synergize with other translators speaking the same language.
6005If you'd like to volunteer to @emph{work} at translating messages,
6006please send mail to your translating team.
6007
6008Each team has its own mailing list, courtesy of Linux
6009International.  You may reach your translating team at the address
6010@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
6011code for your language.  Language codes are @emph{not} the same as
6012country codes given in @w{ISO 3166}.  The following translating teams
6013exist:
6014
6015@quotation
6016Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
6017Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
6018@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
6019Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
6020@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
6021Swedish @code{sv} and Turkish @code{tr}.
6022@end quotation
6023
6024@noindent
6025For example, you may reach the Chinese translating team by writing to
6026@file{zh@@li.org}.  When you become a member of the translating team
6027for your own language, you may subscribe to its list.  For example,
6028Swedish people can send a message to @w{@file{sv-request@@li.org}},
6029having this message body:
6030
6031@example
6032subscribe
6033@end example
6034
6035Keep in mind that team members should be interested in @emph{working}
6036at translations, or at solving translational difficulties, rather than
6037merely lurking around.  If your team does not exist yet and you want to
6038start one, please write to @w{@file{translation@@iro.umontreal.ca}};
6039you will then reach the coordinator for all translator teams.
6040
6041A handful of GNU packages have already been adapted and provided
6042with message translations for several languages.  Translation
6043teams have begun to organize, using these packages as a starting
6044point.  But there are many more packages and many languages for
6045which we have no volunteer translators.  If you would like to
6046volunteer to work at translating messages, please send mail to
6047@file{translation@@iro.umontreal.ca} indicating what language(s)
6048you can work on.
6049
6050@node Trans Intro 1, Discussions, Trans Intro 0, Translators
6051@section Introduction 1
6052
6053@strong{ NOTE: } This documentation section is outdated and needs to be
6054revised.
6055
6056This is now official, GNU is going international!  Here is the
6057announcement submitted for the January 1995 GNU Bulletin:
6058
6059@quotation
6060A handful of GNU packages have already been adapted and provided
6061with message translations for several languages.  Translation
6062teams have begun to organize, using these packages as a starting
6063point.  But there are many more packages and many languages
6064for which we have no volunteer translators.  If you'd like to
6065volunteer to work at translating messages, please send mail to
6066@samp{translation@@iro.umontreal.ca} indicating what language(s)
6067you can work on.
6068@end quotation
6069
This document should answer many questions for those who are curious about
the process or would like to contribute.  Please at least skim over it;
we hope this will cut down a little on the high volume of e-mail generated
by this collective effort towards internationalization of free software.
6074
Most free programming which is widely shared is done in English, and
currently, English is used as the main communicating language between
national communities collaborating on free software.  This very document
is written in English.  This will not change in the foreseeable future.
6079
However, there is a strong appetite from national communities for
having more software able to write using national language and habits,
and there is an on-going effort to modify free software in such a way
that it becomes able to do so.  The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that
internationalization of free software is destined to succeed.
6086
To suggest clarifications, additions or corrections to this
document, please e-mail @file{translation@@iro.umontreal.ca}.
6089
6090@node Discussions, Organization, Trans Intro 1, Translators
6091@section Discussions
6092
6093@strong{ NOTE: } This documentation section is outdated and needs to be
6094revised.
6095
Facing this internationalization effort, a few users expressed their
concerns.  Some of these doubts are presented and discussed here.
6098
6099@itemize @bullet
6100@item Smaller groups
6101
Some languages are not spoken by a very large number of people, so people
speaking them sometimes consider that there may not be all that much
demand for such versions of free software packages.  Moreover, many people
who are @emph{into computers}, in some countries, generally seem to prefer
English versions of their software.
6107
On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speak their mother tongue.  They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.
6113
6114@item Misinterpretation
6115
6116Other users are shy to push forward their own language, seeing in this
6117some kind of misplaced propaganda.  Someone thought there must be some
6118users of the language over the networks pestering other people with it.
6119
6120But any spoken language is worth localization, because there are
6121people behind the language for whom the language is important and
6122dear to their hearts.
6123
6124@item Odd translations
6125
6126The biggest problem is to find the right translations so that
6127everybody can understand the messages.  Translations are usually a
6128little odd.  Some people get used to English, to the extent they may
6129find translations into their own language ``rather pushy, obnoxious
6130and sometimes even hilarious.''  As a French speaking man, I have
6131the experience of those instruction manuals for goods, so poorly
6132translated in French in Korea or Taiwan@dots{}
6133
6134The fact is that we sometimes have to create a kind of national
6135computer culture, and this is not easy without the collaboration of
6136many people liking their mother tongue.  This is why translations are
6137better achieved by people knowing and loving their own language, and
6138ready to work together at improving the results they obtain.
6139
6140@item Dependencies over the GPL or LGPL
6141
6142Some people wonder if using GNU @code{gettext} necessarily brings their
6143package under the protective wing of the GNU General Public License or
6144the GNU Library General Public License, when they do not want to make
6145their program free, or want other kinds of freedom.  The simplest
6146answer is ``normally not''.
6147
6148The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
6149contents of @code{libintl}, is covered by the GNU Library General Public
6150License.  The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
6151rest of the GNU @code{gettext} package, is covered by the GNU General
6152Public License.
6153
6154The mere marking of localizable strings in a package, or conditional
6155inclusion of a few lines for initialization, is not really including
6156GPL'ed or LGPL'ed code.  However, since the localization routines in
6157@code{libintl} are under the LGPL, the LGPL needs to be considered.
6158It gives the right to distribute the complete unmodified source of
6159@code{libintl} even with non-free programs.  It also gives the right
6160to use @code{libintl} as a shared library, even for non-free programs.
6161But it gives the right to use @code{libintl} as a static library or
6162to incorporate @code{libintl} into another library only to free
6163software.
6164
6165@end itemize
6166
6167@node Organization, Information Flow, Discussions, Translators
6168@section Organization
6169
6170@strong{ NOTE: } This documentation section is outdated and needs to be
6171revised.
6172
6173On a larger scale, the true solution would be to organize some kind of
6174fairly precise set up in which volunteers could participate.  I gave
6175some thought to this idea lately, and realize there will be some
6176touchy points.  I thought of writing to Richard Stallman to launch
6177such a project, but feel it might be good to shake out the ideas
between ourselves first.  Most probably Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe.  Food for thought, in any case!
6181
I guess we have to set up something early, somehow, that will help
6183many possible contributors of the same language to interlock and avoid
6184work duplication, and further be put in contact for solving together
6185problems particular to their tongue (in most languages, there are many
6186difficulties peculiar to translating technical English).  My Swedish
6187contributor acknowledged these difficulties, and I'm well aware of
6188them for French.
6189
This is surely not a technical issue, but we should manage things so that
the effort of locale contributors is maximally useful, despite the national
team layer interface between contributors and maintainers.
6193
6194The Translation Project needs some setup for coordinating language
6195coordinators.  Localizing evolving programs will surely
6196become a permanent and continuous activity in the free software community,
6197once well started.
6198The setup should be minimally completed and tested before GNU
6199@code{gettext} becomes an official reality.  The e-mail address
@file{translation@@iro.umontreal.ca} has been set up for receiving
6201offers from volunteers and general e-mail on these topics.  This address
6202reaches the Translation Project coordinator.
6203
6204@menu
6205* Central Coordination::        Central Coordination
6206* National Teams::              National Teams
6207* Mailing Lists::               Mailing Lists
6208@end menu
6209
6210@node Central Coordination, National Teams, Organization, Organization
6211@subsection Central Coordination
6212
I also think GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups.  Some kind of group
of groups.  My opinion is that it would be good that GNU delegates
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees can be published in
@file{gnu.announce}.
6219
My role as coordinator would simply be to refer to Ulrich any
German-speaking volunteer interested in the localization of free software
packages, and maybe to help national groups organize initially, while
maintaining national registries until the national groups are ready to
take over.  In fact, the coordinator should make it easier for volunteers
to get in contact with one another to create national teams, which should
then select one coordinator per language, or country (regionalized
language).  If well done, the coordination should be useful without being
an overwhelming task, for the time it takes to put delegations in place.
6229
6230@node National Teams, Mailing Lists, Central Coordination, Organization
6231@subsection National Teams
6232
6233I suggest we look for volunteer coordinators/editors for individual
6234languages.  These people will scan contributions of translation files
6235for various programs, for their own languages, and will ensure high
6236and uniform standards of diction.
6237
From my current experience with other people these days, those who
6239provide localizations are very enthusiastic about the process, and are
6240more interested in the localization process than in the program they
6241localize, and want to do many programs, not just one.  This seems
6242to confirm that having a coordinator/editor for each language is a
6243good idea.
6244
6245We need to choose someone who is good at writing clear and concise
6246prose in the language in question.  That is hard---we can't check
it ourselves.  So we need to ask a few people to judge each other's
writing and select the one who is best.
6249
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it has generated already.  I shudder to think
what will happen when this is launched, for real, officially,
worldwide.  Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?
6255
I assume that your German is not much better than my French so that
I would not be able to judge these formulations.  What I would
suggest is that for each language there be a group of people who
maintain the PO files and judge changes.  I suspect there will
be cultural differences in how such groups of people behave.
Some will have relaxed ways, reach consensus easily, and have anyone
in the group relate to the maintainers, while others will fight to
the death, organize heavy administrations up to national standards, and
use strict channels.
6265
The German team is setting a good example.  Right now, there are
maybe half a dozen people revising each other's translations and
discussing the linguistic issues.  I do not even have all the names.
6269Ulrich Drepper is taking care of coordinating the German team.
6270He subscribed to all my pretest lists, so I do not even have to warn
6271him specifically of incoming releases.
6272
I'm sure that it is a good idea to get teams for each language working
on translations.  That will make the translations better and more
consistent.
6276
6277@menu
6278* Sub-Cultures::                Sub-Cultures
6279* Organizational Ideas::        Organizational Ideas
6280@end menu
6281
6282@node Sub-Cultures, Organizational Ideas, National Teams, National Teams
6283@subsubsection Sub-Cultures
6284
6285Taking French for example, there are a few sub-cultures around computers
6286which developed diverging vocabularies.  Picking volunteers here and
there without addressing this problem in an organized way, early in the
6288project, might produce a distasteful mix of internationalized programs,
6289and possibly trigger endless quarrels among those who really care.
6290
6291Keeping some kind of unity in the way French localization of
6292internationalized programs is achieved is a difficult (and delicate) job.
Knowing the Latin character of French people (:-), if we take this
the wrong way, we could end up nowhere, or waste a lot of energy.
Maybe we should begin to address this problem seriously @emph{before}
GNU @code{gettext} becomes officially published.  And I suspect that this
means soon!
6298
6299@node Organizational Ideas,  , Sub-Cultures, National Teams
6300@subsubsection Organizational Ideas
6301
6302I expect the next big changes after the official release.  Please note
6303that I use the German translation of the short GPL message.  We need
to set a few good examples before the localization goes out for real
6305in the free software community.  Here are a few points to discuss:
6306
6307@itemize @bullet
6308@item
6309Each group should have one FTP server (at least one master).
6310
6311@item
The files on the server should reflect the latest version (of
course!) and the server should also contain an RCS directory with the
corresponding archives (I don't have this now).
6315
6316@item
There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
Emacs).
6320
6321@item
A @dfn{core group} should judge questionable changes (for now
this group consists solely of me, but I ask some others occasionally;
this also seems to work).
6325
6326@end itemize
6327
6328@node Mailing Lists,  , National Teams, Organization
6329@subsection Mailing Lists
6330
6331If we get any inquiries about GNU @code{gettext}, send them on to:
6332
6333@example
6334@file{translation@@iro.umontreal.ca}
6335@end example
6336
The @file{*-pretest} lists are quite useful to me; maybe the idea could
be generalized to many GNU and non-GNU packages.  But to each maintainer
his or her own way!
6340
6341Fran@,{c}ois, we have a mechanism in place here at
6342@file{gnu.ai.mit.edu} to track teams, support mailing lists for
6343them and log members.  We have a slight preference that you use it.
6344If this is OK with you, I can get you clued in.
6345
Things are changing!  A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, hosted at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
@code{majordomo}.  These lists have been @emph{very} dependable
so far@dots{}
6353
I suspect that the German team will organize a mailing list for itself,
located in Germany, and so forth for other countries.  But before they
organize for real, it could surely be useful to offer mailing lists
located at the FSF to each national team.  So yes, please explain to me
how I should proceed to create and handle them.
6359
We should create temporary mailing lists, one per country, to help
people organize.  Temporary, because once the volunteers from a country
have regrouped and structured themselves, it would be fair for them to
bring @emph{their} list back home and manage it as they want.  My feeling
is that, in the long run, each team should run its own list, from within
its own country.  There should also be some central list to which all
teams could subscribe as they see fit, as long as each team is
represented in it.
6367
6368@node Information Flow, Prioritizing messages, Organization, Translators
6369@section Information Flow
6370
6371@strong{ NOTE: } This documentation section is outdated and needs to be
6372revised.
6373
There will surely be some discussion about these messages after the
packages are finally released.  If people now send you some proposals
for better messages, how do you proceed?  Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
receive both the translations and the coordination concerns about them.
6379
6380If I put one of my things to pretest, Ulrich receives the announcement
6381and passes it on to the German team, who make last minute revisions.
6382Then he submits the translation files to me @emph{as the maintainer}.
6383For free packages I do not maintain, I would not even hear about it.
6384This scheme could be made to work for the whole Translation Project,
I think.  For security reasons, maybe Ulrich (national coordinators,
in fact) should update the central registry kept at the Translation Project
(Jim, me, or Len's recruits) once in a while.
6388
6389In December/January, I was aggressively ready to internationalize
6390all of GNU, giving myself the duty of one small GNU package per week
6391or so, taking many weeks or months for bigger packages.  But it does
6392not work this way.  I first did all the things I'm responsible for.
6393I've nothing against some missionary work on other maintainers, but
I'm also losing a lot of energy over it---same debates over again.
6395
And when the first localized packages are released we'll get a lot of
responses about ugly translations :-).  Surely, and so we need to have
a fairly good idea beforehand about how to handle the information
flow between the national teams and the package maintainers.
6400
6401Please start saving somewhere a quick history of each PO file.  I know
6402for sure that the file format will change, allowing for comments.
It would be nice if each file had a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not
yet been accepted by the GNU decision makers.  I'll tell you when I
have more information about this.
6408
6409@node Prioritizing messages,  , Information Flow, Translators
6410@section Prioritizing messages: How to determine which messages to translate first
6411
A translator sometimes has only a limited amount of time per week to
spend on a package, and some packages have quite large message catalogs
(over 1000 messages).  Therefore she wishes to translate first the
messages that are the most visible to the user, or that occur most
frequently.  This section describes how to determine these ``most
urgent'' messages.  It also applies to determining the ``next most
urgent'' messages after the message catalog has already been partially
translated.
6419
In a first step, she uses the programs as a user would.  While she
does this, the GNU @code{gettext} library logs to a file the messages
that are not yet translated and for which the program requested a
translation.
6423
6424In a second step, she uses the PO mode to translate precisely this set
6425of messages.
6426
6427@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
Here are more details.  The GNU @code{libintl} library (but not the
corresponding functions in GNU @code{libc}) supports an environment variable
@code{GETTEXT_LOG_UNTRANSLATED}.  When this variable is set to a file name,
the GNU @code{libintl} library logs into that file the messages for which
@code{gettext()} and related functions couldn't find a translation.  If
the file doesn't exist, it will be created as needed.  On systems with
GNU @code{libc}, a shared library @samp{preloadable_libintl.so} is
provided that can be used with the ELF @samp{LD_PRELOAD} mechanism.
6436
6437So, in the first step, the translator uses these commands on systems with
6438GNU @code{libc}:
6439
6440@smallexample
6441$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
6442$ export LD_PRELOAD
6443$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
6444$ export GETTEXT_LOG_UNTRANSLATED
6445@end smallexample
6446
6447@noindent
6448and these commands on other systems:
6449
6450@smallexample
6451$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
6452$ export GETTEXT_LOG_UNTRANSLATED
6453@end smallexample
6454
6455Then she uses and peruses the programs.  (It is a good and recommended
6456practice to use the programs for which you provide translations: it
6457gives you the needed context.)  When done, she removes the environment
6458variables:
6459
6460@smallexample
6461$ unset LD_PRELOAD
6462$ unset GETTEXT_LOG_UNTRANSLATED
6463@end smallexample
6464
6465The second step starts with removing duplicates:
6466
6467@smallexample
6468$ msguniq $HOME/gettextlogused > missing.po
6469@end smallexample
6470
6471The result is a PO file, but needs some preprocessing before a PO file editor
6472can be used with it.  First, it is a multi-domain PO file, containing
6473messages from many translation domains.  Second, it lacks all translator
6474comments and source references.  Here is how to get a list of the affected
6475translation domains:
6476
6477@smallexample
6478$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
6479@end smallexample
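
@noindent
For illustration, the @code{sed} command above picks out the
@samp{domain} lines that introduce the messages of each domain.
A multi-domain file such as @file{missing.po} might look roughly like the
following excerpt; the domain names and messages shown here are made up
for the example, and the exact layout may differ between @code{gettext}
versions:

@smallexample
domain "coreutils"
msgid "invalid argument"
msgstr ""

domain "tar"
msgid "Not enough memory"
msgstr ""
@end smallexample

@noindent
For such a file, the command would print @samp{coreutils} and @samp{tar},
one per line.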
6480
Then the translator can handle the domains one by one.  For simplicity,
let's use shell variables to denote the language, domain and source
package.
6484
6485@smallexample
6486$ lang=nl             # your language
6487$ domain=coreutils    # the name of the domain to be handled
6488$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
6489@end smallexample
6490
She takes the latest copy of @file{$lang.po} from the Translation Project,
or from the package (in most cases, @file{$package/po/$lang.po}), or
creates a fresh one if she's the first translator (@pxref{Creating}).
The commands below assume that this copy is stored as
@file{$domain.$lang.po} in the current directory.
She then uses the following commands to mark the non-urgent messages as
``obsolete''.  (This doesn't mean that these messages, whether translated
or untranslated, will go away.  It simply means that the PO file editor
will ignore them in the following editing session.)
6498
6499@smallexample
6500$ msggrep --domain=$domain missing.po | grep -v '^domain' \
6501  > $domain-missing.po
6502$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
6503  > $domain.$lang-urgent.po
6504@end smallexample
6505
Then she translates @file{$domain.$lang-urgent.po} using a PO file editor
(@pxref{Editing}).
(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
preserve obsolete messages, as they should.)
Finally she restores the non-urgent messages (with their earlier
translations, for those which were already translated) through this command:
6512
6513@smallexample
6514$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
6515  > $domain.$lang.po
6516@end smallexample
6517
6518Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
6519
6520@node Maintainers, Installers, Translators, Top
6521@chapter The Maintainer's View
6522@cindex package maintainer's view of @code{gettext}
6523
6524The maintainer of a package has many responsibilities.  One of them
6525is ensuring that the package will install easily on many platforms,
6526and that the magic we described earlier (@pxref{Users}) will work
6527for installers and end users.
6528
6529Of course, there are many possible ways by which GNU @code{gettext}
6530might be integrated in a distribution, and this chapter does not cover
6531them in all generality.  Instead, it details one possible approach which
6532is especially adequate for many free software distributions following GNU
6533standards, or even better, Gnits standards, because GNU @code{gettext}
6534is purposely for helping the internationalization of the whole GNU
6535project, and as many other good free packages as possible.  So, the
6536maintainer's view presented here presumes that the package already has
6537a @file{configure.in} file and uses GNU Autoconf.
6538
Nevertheless, GNU @code{gettext} may surely be useful for free packages
not following GNU standards and conventions, but the maintainers of such
packages might have to show imagination and initiative in organizing
their distributions so that @code{gettext} works for them in all situations.
There are surely many such packages out there.
6544
6545Even if @code{gettext} methods are now stabilizing, slight adjustments
6546might be needed between successive @code{gettext} versions, so you
should ideally review this chapter in subsequent releases, looking
6548for changes.
6549
6550@menu
6551* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
6552* Prerequisites::               Prerequisite Works
6553* gettextize Invocation::       Invoking the @code{gettextize} Program
6554* Adjusting Files::             Files You Must Create or Alter
6555* autoconf macros::             Autoconf macros for use in @file{configure.in}
6556* CVS Issues::                  Integrating with CVS
6557* Release Management::          Creating a Distribution Tarball
6558@end menu
6559
6560@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
6561@section Flat or Non-Flat Directory Structures
6562
Some free software packages are distributed as @code{tar} files which unpack
in a single directory; these are said to be @dfn{flat} distributions.
Other free software packages have a one-level hierarchy of subdirectories, using
6566for example a subdirectory named @file{doc/} for the Texinfo manual and
6567man pages, another called @file{lib/} for holding functions meant to
6568replace or complement C libraries, and a subdirectory @file{src/} for
6569holding the proper sources for the package.  These other distributions
6570are said to be @dfn{non-flat}.
6571
6572We cannot say much about flat distributions.  A flat
6573directory structure has the disadvantage of increasing the difficulty
6574of updating to a new version of GNU @code{gettext}.  Also, if you have
6575many PO files, this could somewhat pollute your single directory.
6576Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
6577scripts, @code{sed} scripts and complicated Makefile rules, which don't
fit well into an existing flat structure.  For these reasons, we
recommend using a non-flat approach in this case as well.
6580
6581Maybe because GNU @code{gettext} itself has a non-flat structure,
6582we have more experience with this approach, and this is what will be
described in the remainder of this chapter.  Some maintainers might
6584use this as an opportunity to unflatten their package structure.
6585
6586@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
6587@section Prerequisite Works
6588@cindex converting a package to use @code{gettext}
6589@cindex migration from earlier versions of @code{gettext}
6590@cindex upgrading to new versions of @code{gettext}
6591
Some preparatory work is required before GNU @code{gettext} can be used
in one of your packages.  These tasks are general enough that they
do not fit the point-by-point descriptions used in the remainder
of this chapter.  So, we describe them here.
6596
6597@itemize @bullet
6598@item
6599Before attempting to use @code{gettextize} you should install some
6600other packages first.
6601Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
6602@code{gettext} are already installed at your site, and if not, proceed
to do this first.  If you have to install these things, beware that
6604GNU @code{m4} must be fully installed before GNU Autoconf is even
6605@emph{configured}.
6606
6607To further ease the task of a package maintainer the @code{automake}
6608package was designed and implemented.  GNU @code{gettext} now uses this
tool and the @file{Makefile}s in the @file{intl/} and @file{po/}
directories therefore know about all the goals necessary for using
@code{automake} and @file{libintl} in one project.
6612
6613Those four packages are only needed by you, as a maintainer; the
6614installers of your own package and end users do not really need any of
6615GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
6616for successfully installing and running your package, with messages
6617properly translated.  But this is not completely true if you provide
6618internationalized shell scripts within your own package: GNU
6619@code{gettext} shall then be installed at the user site if the end users
6620want to see the translation of shell script messages.
6621
6622@item
6623Your package should use Autoconf and have a @file{configure.in} or
6624@file{configure.ac} file.
If it does not, you have to learn how.  The Autoconf documentation
is quite well written; it is a good idea to print it and become
familiar with it.
6628
6629@item
6630Your C sources should have already been modified according to
6631instructions given earlier in this manual.  @xref{Sources}.
6632
6633@item
6634Your @file{po/} directory should receive all PO files submitted to you
6635by the translator teams, each having @file{@var{ll}.po} as a name.
It is not usually easy to get translation
work done before your package gets internationalized and available!
6638Since the cycle has to start somewhere, the easiest for the maintainer
6639is to start with absolutely no PO files, and wait until various
6640translator teams get interested in your package, and submit PO files.
6641
6642@end itemize
6643
6644It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions.  As a maintainer, your role is
to authenticate the origin of the submission as being the representative
of the appropriate translation team of the Translation Project (forward
6648the submission to @file{translation@@iro.umontreal.ca} in case of doubt),
6649to ensure that the PO file format is not severely broken and does not
6650prevent successful installation, and for the rest, to merely put these
6651PO files in @file{po/} for distribution.
6652
6653As a maintainer, you do not have to take on your shoulders the
6654responsibility of checking if the translations are adequate or
6655complete, and should avoid diving into linguistic matters.  Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project.  Keep in mind that translator teams are @emph{not}
driven by maintainers.  You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
their team.  The simplest might be to send them the @file{ABOUT-NLS} file.
6662
6663Maintainers should @emph{never ever} apply PO file bug reports
themselves, short-cutting translation teams.  If some translator has
difficulty getting some of her points through her team, it should not be
6666an option for her to directly negotiate translations with maintainers.
6667Teams ought to settle their problems themselves, if any.  If you, as
6668a maintainer, ever think there is a real problem with a team, please
6669never try to @emph{solve} a team's problem on your own.
6670
6671@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
6672@section Invoking the @code{gettextize} Program
6673
6674@include gettextize.texi
6675
6676@node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers
6677@section Files You Must Create or Alter
6678@cindex @code{gettext} files
6679
6680Besides files which are automatically added through @code{gettextize},
6681there are many files needing revision for properly interacting with
6682GNU @code{gettext}.  If you are closely following GNU standards for
6683Makefile engineering and auto-configuration, the adaptations should
6684be easier to achieve.  Here is a point by point description of the
6685changes needed in each.
6686
6687So, here comes a list of files, each one followed by a description of
all alterations it needs.  Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself, or from the GNU
@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello}
or @uref{http://www.gnu.franken.de/ke/hello/}).  You may indeed
refer to the source code of the GNU @code{gettext} and GNU @code{hello}
packages, as they are intended to be good examples for using GNU
@code{gettext} functionality.
6695
6696@menu
6697* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
6698* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
6699* po/Makevars::                 @file{Makevars} in @file{po/}
6700* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
6701* configure.in::                @file{configure.in} at top level
6702* config.guess::                @file{config.guess}, @file{config.sub} at top level
6703* mkinstalldirs::               @file{mkinstalldirs} at top level
6704* aclocal::                     @file{aclocal.m4} at top level
6705* acconfig::                    @file{acconfig.h} at top level
6706* config.h.in::                 @file{config.h.in} at top level
6707* Makefile::                    @file{Makefile.in} at top level
6708* src/Makefile::                @file{Makefile.in} in @file{src/}
6709* lib/gettext.h::               @file{gettext.h} in @file{lib/}
6710@end menu
6711
6712@node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files
6713@subsection @file{POTFILES.in} in @file{po/}
6714@cindex @file{POTFILES.in} file
6715
6716The @file{po/} directory should receive a file named
6717@file{POTFILES.in}.  This file tells which files, among all program
6718sources, have marked strings needing translation.  Here is an example
6719of such a file:
6720
6721@example
6722@group
6723# List of source files containing translatable strings.
6724# Copyright (C) 1995 Free Software Foundation, Inc.
6725
6726# Common library files
6727lib/error.c
6728lib/getopt.c
6729lib/xmalloc.c
6730
6731# Package source files
6732src/gettext.c
6733src/msgfmt.c
6734src/xgettext.c
6735@end group
6736@end example
6737
6738@noindent
Hash-marked comments and blank lines are ignored.  All other lines
6740list those source files containing strings marked for translation
6741(@pxref{Mark Keywords}), in a notation relative to the top level
6742of your whole distribution, rather than the location of the
6743@file{POTFILES.in} file itself.
6744
6745When a C file is automatically generated by a tool, like @code{flex} or
6746@code{bison}, that doesn't introduce translatable strings by itself,
6747it is recommended to list in @file{po/POTFILES.in} the real source file
6748(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
6749case of @code{bison}), not the generated C file.
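
As a sketch, assume a hypothetical package whose parser is written in
@file{src/parse.y} and whose scanner is written in @file{src/scan.l}
(these file names are invented for the illustration).  Following the
recommendation above, the corresponding @file{po/POTFILES.in} entries
would be:

@example
@group
# Grammar and scanner sources, not the C files generated from them
src/parse.y
src/scan.l
@end group
@end example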
6750
6751@node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files
6752@subsection @file{LINGUAS} in @file{po/}
6753@cindex @file{LINGUAS} file
6754
6755The @file{po/} directory should also receive a file named
6756@file{LINGUAS}.  This file contains the list of available translations.
It is a whitespace-separated list.  Hash-marked comments and blank lines
are ignored.  Here is an example file:
6759
6760@example
6761@group
6762# Set of available languages.
6763de fr
6764@end group
6765@end example
6766
6767@noindent
6768This example means that German and French PO files are available, so
6769that these languages are currently supported by your package.  If you
6770want to further restrict, at installation time, the set of installed
6771languages, this should not be done by modifying the @file{LINGUAS} file,
6772but rather by using the @code{LINGUAS} environment variable
6773(@pxref{Installers}).
6774
It is recommended that you add the ``languages'' @samp{en@@quot} and
@samp{en@@boldquot} to the @file{LINGUAS} file.  @code{en@@quot} is a
6777variant of English message catalogs (@code{en}) which uses real quotation
6778marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
6779and @samp{'}.  @code{en@@boldquot} is a variant of @code{en@@quot} that
6780additionally outputs quoted pieces of text in a bold font, when used in
6781a terminal emulator which supports the VT100 escape sequences (such as
6782@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
6783
6784These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
6785are constructed automatically, not by translators; to support them, you
6786need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
6787@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
6788in the @file{po/} directory.  You can copy them from GNU gettext's @file{po/}
6789directory; they are also installed by running @code{gettextize}.
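
Continuing the example file shown earlier in this section, a
@file{LINGUAS} file that also enables these two catalogs might look like
this (a sketch; the set of languages is only illustrative):

@example
@group
# Set of available languages.
de fr en@@quot en@@boldquot
@end group
@end example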
6790
6791@node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files
6792@subsection @file{Makevars} in @file{po/}
6793@cindex @file{Makevars} file
6794
6795The @file{po/} directory also has a file named @file{Makevars}.  It
6796contains variables that are specific to your project.  @file{po/Makevars}
6797gets inserted into the @file{po/Makefile} when the latter is created.
6798The variables thus take effect when the POT file is created or updated,
6799and when the message catalogs get installed.
6800
6801The first three variables can be left unmodified if your package has a
6802single message domain and, accordingly, a single @file{po/} directory.
Only packages which have multiple @file{po/} directories at different
locations need to adjust the first three variables defined in
@file{Makevars}.
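
For reference, in the @file{Makevars} template that @code{gettextize}
installs, these three variables typically look as follows.  Treat this
as a sketch and check the file installed by your version of
@code{gettext}, since the template may differ between releases:

@example
@group
# Usually the message domain is the same as the package name.
DOMAIN = $(PACKAGE)

# These two variables depend on the location of this directory.
subdir = po
top_builddir = ..
@end group
@end example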
6806
6807@node po/Rules-*, configure.in, po/Makevars, Adjusting Files
6808@subsection Extending @file{Makefile} in @file{po/}
6809@cindex @file{Makefile.in.in} extensions
6810
6811All files called @file{Rules-*} in the @file{po/} directory get appended to
6812the @file{po/Makefile} when it is created.  They present an opportunity to
6813add rules for special PO files to the Makefile, without needing to mess
6814with @file{po/Makefile.in.in}.
6815
6816@cindex quotation marks
6817@vindex LANGUAGE@r{, environment variable}
6818GNU gettext comes with a @file{Rules-quot} file, containing rules for
6819building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}.  The
6820effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
6821environment variable to @samp{en@@quot} will get messages with proper
6822looking symmetric Unicode quotation marks instead of abusing the ASCII
6823grave accent and the ASCII apostrophe for indicating quotations.  To
6824enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
6825file.  The effect of @file{en@@boldquot.po} is that people who set
6826@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
6827marks, but also the quoted text will be shown in a bold font on terminals
6828and consoles.  This catalog is useful only for command-line programs, not
6829GUI programs.  To enable it, similarly add @code{en@@boldquot} to the
6830@file{po/LINGUAS} file.
6831
6832Similarly, you can create rules for building message catalogs for the
6833@file{sr@@latin} locale -- Serbian written with the Latin alphabet --
6834from those for the @file{sr} locale -- Serbian written with Cyrillic
6835letters.  See @ref{msgfilter Invocation}.
6836
6837@node configure.in, config.guess, po/Rules-*, Adjusting Files
6838@subsection @file{configure.in} at top level
6839
@file{configure.in} or @file{configure.ac}: this is the source from which
6841@code{autoconf} generates the @file{configure} script.
6842
6843@enumerate
6844@item Declare the package and version.
6845@cindex package and version declaration in @file{configure.in}
6846
6847This is done by a set of lines like these:
6848
6849@example
6850PACKAGE=gettext
6851VERSION=@value{VERSION}
6852AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
6853AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
6854AC_SUBST(PACKAGE)
6855AC_SUBST(VERSION)
6856@end example
6857
6858@noindent
6859or, if you are using GNU @code{automake}, by a line like this:
6860
6861@example
6862AM_INIT_AUTOMAKE(gettext, @value{VERSION})
6863@end example
6864
6865@noindent
Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
6868should appear in the packaged @code{tar} file name of your distribution
6869(@file{gettext-@value{VERSION}.tar.gz}, here).
6870
6871@item Check for internationalization support.
6872
6873Here is the main @code{m4} macro for triggering internationalization
6874support.  Just add this line to @file{configure.in}:
6875
6876@example
6877AM_GNU_GETTEXT
6878@end example
6879
6880@noindent
This call is purposely simple, even though it triggers many configure-time
checks and actions.
6883
If you have suppressed the @file{intl/} subdirectory by calling
@code{gettextize} without the @samp{--intl} option, this call should read
6886
6887@example
6888AM_GNU_GETTEXT([external])
6889@end example
6890
6891@item Have output files created.
6892
6893The @code{AC_OUTPUT} directive, at the end of your @file{configure.in}
6894file, needs to be modified in two ways:
6895
6896@example
6897AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
6898[@var{existing additional actions}])
6899@end example
6900
6901The modification to the first argument to @code{AC_OUTPUT} asks
6902for substitution in the @file{intl/} and @file{po/} directories.
6903Note the @samp{.in} suffix used for @file{po/} only.  This is because
6904the distributed file is really @file{po/Makefile.in.in}.
6905
If you have suppressed the @file{intl/} subdirectory by calling
@code{gettextize} without the @samp{--intl} option, then you don't need to
add @code{intl/Makefile} to the @code{AC_OUTPUT} line.
6909
6910@end enumerate
6911
6912If, after doing the recommended modifications, a command like
6913@samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with
6914a trace similar to this:
6915
6916@smallexample
6917configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
6918../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
6919m4/lock.m4:224: gl_LOCK is expanded from...
6920m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
6921m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
6922m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
6923configure.ac:44: the top level
6924configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE
6925@end smallexample
6926
6927@noindent
you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the
@file{configure.ac} file, after @samp{AC_PROG_CC} but before
6930@samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC}
6931invocation.  This is necessary because of ordering restrictions imposed
6932by GNU autoconf.
6933
6934@node config.guess, mkinstalldirs, configure.in, Adjusting Files
6935@subsection @file{config.guess}, @file{config.sub} at top level
6936
6937If you haven't suppressed the @file{intl/} subdirectory,
6938you need to add the GNU @file{config.guess} and @file{config.sub} files
6939to your distribution.  They are needed because the @file{intl/} directory
6940has platform dependent support for determining the locale's character
6941encoding and therefore needs to identify the platform.
6942
You can obtain the newest version of @file{config.guess} and
@file{config.sub} from the CVS of the @samp{config} project at
@url{http://savannah.gnu.org/}.  The commands to fetch them are
6946@smallexample
6947$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess'
6948$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub'
6949@end smallexample
6950@noindent
6951Less recent versions are also contained in the GNU @code{automake} and
6952GNU @code{libtool} packages.
6953
6954Normally, @file{config.guess} and @file{config.sub} are put at the
top level of a distribution.  But it is also possible to put them in a
subdirectory, together with other configuration support files like
6957@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
6958All you need to do, other than moving the files, is to add the following line
6959to your @file{configure.in}.
6960
6961@example
6962AC_CONFIG_AUX_DIR([@var{subdir}])
6963@end example
6964
6965@node mkinstalldirs, aclocal, config.guess, Adjusting Files
6966@subsection @file{mkinstalldirs} at top level
6967@cindex @file{mkinstalldirs} file
6968
With earlier versions of GNU gettext, you needed to add the GNU
@file{mkinstalldirs} script to your distribution.  This is not needed any
more.  You can remove it if you are not also using an automake version older
than automake 1.9.
6973
6974@node aclocal, acconfig, mkinstalldirs, Adjusting Files
6975@subsection @file{aclocal.m4} at top level
6976@cindex @file{aclocal.m4} file
6977
If you do not have an @file{aclocal.m4} file in your distribution,
the simplest approach is to concatenate the files @file{codeset.m4},
6980@file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4},
6981@file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intmax.m4},
6982@file{inttypes_h.m4}, @file{inttypes-pri.m4}, @file{lcmessage.m4},
6983@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, @file{lock.m4},
6984@file{longdouble.m4}, @file{longlong.m4}, @file{nls.m4}, @file{po.m4},
6985@file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4},
6986@file{stdint_h.m4}, @file{uintmax_t.m4}, @file{ulonglong.m4},
6987@file{visibility.m4}, @file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4}
6988from GNU @code{gettext}'s
6989@file{m4/} directory into a single file.  If you have suppressed the
6990@file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4},
6991@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
6992@file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated.
6993
6994If you are not using GNU @code{automake} 1.8 or newer, you will need to
6995add a file @file{mkdirp.m4} from a newer automake distribution to the
6996list of files above.
6997
6998If you already have an @file{aclocal.m4} file, then you will have
6999to merge the said macro files into your @file{aclocal.m4}.  Note that if
7000you are upgrading from a previous release of GNU @code{gettext}, you
7001should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
7002etc.), as they usually
7003change a little from one release of GNU @code{gettext} to the next.
7004Their contents may vary as we get more experience with strange systems
7005out there.
7006
7007If you are using GNU @code{automake} 1.5 or newer, it is enough to put
7008these macro files into a subdirectory named @file{m4/} and add the line
7009
7010@example
7011ACLOCAL_AMFLAGS = -I m4
7012@end example
7013
7014@noindent
7015to your top level @file{Makefile.am}.
7016
These macros check for the internationalization support functions
and related information.  Hopefully, once stabilized, these macros
7019might be integrated in the standard Autoconf set, because this
7020piece of @code{m4} code will be the same for all projects using GNU
7021@code{gettext}.
7022
7023@node acconfig, config.h.in, aclocal, Adjusting Files
7024@subsection @file{acconfig.h} at top level
7025@cindex @file{acconfig.h} file
7026
Earlier GNU @code{gettext} releases required you to put definitions for
@code{ENABLE_NLS}, @code{HAVE_GETTEXT}, @code{HAVE_LC_MESSAGES},
@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
@file{acconfig.h} file.  This is not needed any more; you can remove
them from your @file{acconfig.h} file unless your package uses them
independently from the @file{intl/} directory.
7033
7034@node config.h.in, Makefile, acconfig, Adjusting Files
7035@subsection @file{config.h.in} at top level
7036@cindex @file{config.h.in} file
7037
7038The include file template that holds the C macros to be defined by
7039@code{configure} is usually called @file{config.h.in} and may be
7040maintained either manually or automatically.
7041
7042If @code{gettextize} has created an @file{intl/} directory, this file
7043must be called @file{config.h.in} and must be at the top level.  If,
7044however, you have suppressed the @file{intl/} directory by calling
@code{gettextize} without the @samp{--intl} option, then you can choose the
7046name of this file and its location freely.
7047
7048If it is maintained automatically, by use of the @samp{autoheader}
7049program, you need to do nothing about it.  This is the case in particular
7050if you are using GNU @code{automake}.
7051
7052If it is maintained manually, and if @code{gettextize} has created an
7053@file{intl/} directory, you should switch to using @samp{autoheader}.
7054The list of C macros to be added for the sake of the @file{intl/}
7055directory is just too long to be maintained manually; it also changes
7056between different versions of GNU @code{gettext}.
7057
If it is maintained manually, and if on the other hand you have
suppressed the @file{intl/} directory by calling @code{gettextize}
without the @samp{--intl} option, then you can get away with adding the
following lines to @file{config.h.in}:
7062
7063@example
7064/* Define to 1 if translation of program messages to the user's
7065   native language is requested. */
7066#undef ENABLE_NLS
7067@end example
7068
7069@node Makefile, src/Makefile, config.h.in, Adjusting Files
7070@subsection @file{Makefile.in} at top level
7071
7072Here are a few modifications you need to make to your main, top-level
7073@file{Makefile.in} file.
7074
7075@enumerate
7076@item
7077Add the following lines near the beginning of your @file{Makefile.in},
7078so the @samp{dist:} goal will work properly (as explained further down):
7079
7080@example
7081PACKAGE = @@PACKAGE@@
7082VERSION = @@VERSION@@
7083@end example
7084
7085@item
7086Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
7087distributed.
7088
7089@item
Wherever you process subdirectories in your @file{Makefile.in}, be sure
you also process the subdirectories @samp{intl} and @samp{po}.  Special
rules in the @file{Makefile}s take care of the case where no
internationalization is wanted.

If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
which the new subdirectories must be handled include @samp{installdirs},
@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
7099
Here is an example of a canonical order of processing.  In this
example, we also define @code{SUBDIRS} in @file{Makefile.in} so that it
can be used further in the @samp{dist:} goal.
7103
7104@example
7105SUBDIRS = doc intl lib src po
7106@end example
7107
7108Note that you must arrange for @samp{make} to descend into the
@code{intl} directory before descending into other directories containing
code which makes use of the @code{libintl.h} header file.  For this
7111reason, here we mention @code{intl} before @code{lib} and @code{src}.
7112
7113@item
7114A delicate point is the @samp{dist:} goal, as both
7115@file{intl/Makefile} and @file{po/Makefile} will later assume that the
proper directory has been set up from the main @file{Makefile}.  Here is
an example of what the @samp{dist:} goal might look like:
7118
7119@example
7120distdir = $(PACKAGE)-$(VERSION)
7121dist: Makefile
7122	rm -fr $(distdir)
7123	mkdir $(distdir)
7124	chmod 777 $(distdir)
7125	for file in $(DISTFILES); do \
7126	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
7127	done
7128	for subdir in $(SUBDIRS); do \
7129	  mkdir $(distdir)/$$subdir || exit 1; \
7130	  chmod 777 $(distdir)/$$subdir; \
7131	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
7132	done
7133	tar chozf $(distdir).tar.gz $(distdir)
7134	rm -fr $(distdir)
7135@end example
7136
7137@end enumerate
7138
7139Note that if you are using GNU @code{automake}, @file{Makefile.in} is
7140automatically generated from @file{Makefile.am}, and all needed changes
7141to @file{Makefile.am} are already made by running @samp{gettextize}.
7142
7143@node src/Makefile, lib/gettext.h, Makefile, Adjusting Files
7144@subsection @file{Makefile.in} in @file{src/}
7145
7146Some of the modifications made in the main @file{Makefile.in} will
7147also be needed in the @file{Makefile.in} from your package sources,
7148which we assume here to be in the @file{src/} subdirectory.  Here are
7149all the modifications needed in @file{src/Makefile.in}:
7150
7151@enumerate
7152@item
7153In view of the @samp{dist:} goal, you should have these lines near the
7154beginning of @file{src/Makefile.in}:
7155
7156@example
7157PACKAGE = @@PACKAGE@@
7158VERSION = @@VERSION@@
7159@end example
7160
7161@item
7162If not done already, you should guarantee that @code{top_srcdir}
7163gets defined.  This will serve for @code{cpp} include files.  Just add
7164the line:
7165
7166@example
7167top_srcdir = @@top_srcdir@@
7168@end example
7169
7170@item
You might also want to define @code{subdir} as @samp{src}, later
allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}.  At least, the @samp{dist:} goal below assumes that
you used:
7175
7176@example
7177subdir = src
7178@end example
7179
7180@item
The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:
7183
7184@example
7185bindtextdomain (@var{PACKAGE}, LOCALEDIR);
7186textdomain (@var{PACKAGE});
7187@end example
7188
7189To make LOCALEDIR known to the program, add the following lines to
7190@file{Makefile.in}:
7191
7192@example
7193datadir = @@datadir@@
7194localedir = $(datadir)/locale
7195DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
7196@end example
7197
7198Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
7199@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
7200
7201@item
7202You should ensure that the final linking will use @code{@@LIBINTL@@} or
7203@code{@@LTLIBINTL@@} as a library.  @code{@@LIBINTL@@} is for use without
@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.  An
easy way to achieve this is to arrange for it to get into @code{LIBS}, like
this:
7207
7208@example
7209LIBS = @@LIBINTL@@ @@LIBS@@
7210@end example
7211
In most packages internationalized with GNU @code{gettext}, one will
find a directory @file{lib/} in which a library containing some helper
functions will be built.  (You need at least the few functions which the
GNU @code{gettext} Library itself needs.)  However, some of the functions
in @file{lib/} also give messages to the user, which of course should be
translated, too.  To take care of this, the support library (say
@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
@code{@@LIBS@@} in the above example.  So one has to write this:
7220
7221@example
7222LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
7223@end example
7224
7225@item
You should also ensure that directory @file{intl/} will be searched for
C preprocessor include files in all circumstances.  So, you have to
arrange for both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} to
be given to the C compiler.
7230
7231@item
Your @samp{dist:} goal has to be consistent with the others.  Here is a
reasonable definition for it:
7234
7235@example
7236distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
7237dist: Makefile $(DISTFILES)
7238	for file in $(DISTFILES); do \
7239	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
7240	done
7241@end example
7242
7243@end enumerate
7244
7245Note that if you are using GNU @code{automake}, @file{Makefile.in} is
7246automatically generated from @file{Makefile.am}, and the first three
7247changes and the last change are not necessary.  The remaining needed
7248@file{Makefile.am} modifications are the following:
7249
7250@enumerate
7251@item
7252To make LOCALEDIR known to the program, add the following to
7253@file{Makefile.am}:
7254
7255@example
7256<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
7257@end example
7258
7259@noindent
7260for each specific module or compilation unit, or
7261
7262@example
7263AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
7264@end example
7265
7266for all modules and compilation units together.  Furthermore, add this
7267line to define @samp{localedir}:
7268
7269@example
7270localedir = $(datadir)/locale
7271@end example
7272
7273@item
7274To ensure that the final linking will use @code{@@LIBINTL@@} or
7275@code{@@LTLIBINTL@@} as a library, add the following to
7276@file{Makefile.am}:
7277
7278@example
7279<program>_LDADD = @@LIBINTL@@
7280@end example
7281
7282@noindent
7283for each specific program, or
7284
7285@example
7286LDADD = @@LIBINTL@@
7287@end example
7288
for all programs together.  Remember that when you use @code{libtool}
to link a program, you need to use @code{@@LTLIBINTL@@} instead of
@code{@@LIBINTL@@} for that program.
7292
7293@item
If you have an @file{intl/} directory, whose contents are created by
7295@code{gettextize}, then to ensure that it will be searched for
7296C preprocessor include files in all circumstances, add something like
7297this to @file{Makefile.am}:
7298
7299@example
7300AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
7301@end example
7302
7303@end enumerate
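
As a rough illustration of how the @samp{-DLOCALEDIR} definition above is
consumed by the program, here is a minimal C sketch.  The helper name
@code{init_i18n}, the fallback values given to @code{PACKAGE} and
@code{LOCALEDIR}, and the inline stand-ins used when @code{ENABLE_NLS} is
not set are assumptions made for this example only; a real package gets
these from @file{config.h}, the compiler command line and @file{gettext.h}.

```c
#include <locale.h>
#include <stddef.h>

/* PACKAGE and LOCALEDIR normally come from config.h and from the
   -DLOCALEDIR=... compiler option; these fallbacks are placeholders
   for this sketch only.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

#if ENABLE_NLS
# include <libintl.h>
#else
/* Stand-ins so the calls below still compile when NLS is disabled.  */
# define bindtextdomain(Domain, Dir) ((void) (Domain), (const char *) (Dir))
# define textdomain(Domain) ((const char *) (Domain))
#endif

/* Hypothetical start-up helper: selects the user's locale, binds the
   package's message domain to LOCALEDIR and makes it the default domain.
   Returns 0 on success, -1 if one of the libintl calls failed.  */
int
init_i18n (void)
{
  const char *dir;
  const char *domain;

  setlocale (LC_ALL, "");
  dir = bindtextdomain (PACKAGE, LOCALEDIR);
  if (dir == NULL)
    return -1;
  domain = textdomain (PACKAGE);
  if (domain == NULL)
    return -1;
  return 0;
}
```

A program would typically call such a helper once, at the very beginning of
@code{main}, before emitting any translatable message.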
7304
7305@node lib/gettext.h,  , src/Makefile, Adjusting Files
7306@subsection @file{gettext.h} in @file{lib/}
7307@cindex @file{gettext.h} file
7308@cindex turning off NLS support
7309@cindex disabling NLS
7310
7311Internationalization of packages, as provided by GNU @code{gettext}, is
7312optional.  It can be turned off in two situations:
7313
7314@itemize @bullet
7315@item
7316When the installer has specified @samp{./configure --disable-nls}.  This
7317can be useful when small binaries are more important than features, for
7318example when building utilities for boot diskettes.  It can also be useful
7319in order to get some specific C compiler warnings about code quality with
7320some older versions of GCC (older than 3.0).
7321
7322@item
When the package does not include the @file{intl/} subdirectory, and the
@file{libintl.h} header (with its associated @code{libintl} library, if any)
is not already installed on the system, it is preferable that the package
builds without internationalization support, rather than giving a
compilation error.
7328@end itemize
7329
7330A C preprocessor macro can be used to detect these two cases.  Usually,
7331when @code{libintl.h} was found and not explicitly disabled, the
7332@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
7333configuration file (usually called @file{config.h}).  In the two negative
7334situations, however, this macro will not be defined, thus it will evaluate
7335to 0 in C preprocessor expressions.
7336
7337@cindex include file @file{libintl.h}
7338@file{gettext.h} is a convenience header file for conditional use of
7339@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro.  If
7340@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
7341defines no-op substitutes for the libintl.h functions.  We recommend
7342the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
7343so that portability to older systems is guaranteed and installers can
7344turn off internationalization if they want to.  In the C code, you will
7345then write
7346
7347@example
7348#include "gettext.h"
7349@end example
7350
7351@noindent
7352instead of
7353
7354@example
7355#include <libintl.h>
7356@end example
7357
@file{gettext.h} is usually located in a directory containing
auxiliary include files.  In many GNU packages, there is a directory
7360@file{lib/} containing helper functions; @file{gettext.h} fits there.
7361In other packages, it can go into the @file{src} directory.
7362
Do not install the @file{gettext.h} file in public locations.  Every
package that needs it should contain its own copy.
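
To make the role of @code{ENABLE_NLS} more concrete, the following C sketch
imitates, in a much simplified form, the kind of conditional that
@file{gettext.h} provides.  It is not the contents of the real header; the
function names @code{nls_is_enabled} and @code{greeting} are hypothetical
and exist only for this illustration.

```c
/* An undefined macro evaluates to 0 inside #if, so this test covers both
   "--disable-nls" and "libintl.h was not found".  */
#if ENABLE_NLS
# include <libintl.h>
#else
/* No-op substitute: hand back the untranslated message.  */
# define gettext(Msgid) ((const char *) (Msgid))
#endif

/* Hypothetical helper reporting which branch was compiled in.  */
int
nls_is_enabled (void)
{
#if ENABLE_NLS
  return 1;
#else
  return 0;
#endif
}

/* The calling code is written the same way in both configurations.  */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

With NLS disabled, @code{greeting} simply returns the original English
string, so the rest of the program needs no conditional code of its own.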
7365
7366@node autoconf macros, CVS Issues, Adjusting Files, Maintainers
7367@section Autoconf macros for use in @file{configure.in}
7368@cindex autoconf macros for @code{gettext}
7369
7370GNU @code{gettext} installs macros for use in a package's
7371@file{configure.in} or @file{configure.ac}.
7372@xref{Top, , Introduction, autoconf, The Autoconf Manual}.
7373The primary macro is, of course, @code{AM_GNU_GETTEXT}.
7374
7375@menu
7376* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
7377* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
7378* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
7379* AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
7380* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
7381* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
7382@end menu
7383
7384@node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros
7385@subsection AM_GNU_GETTEXT in @file{gettext.m4}
7386
7387@amindex AM_GNU_GETTEXT
7388The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
7389function family in either the C library or a separate @code{libintl}
7390library (shared or static libraries are both supported) or in the package's
7391@file{intl/} directory.  It also invokes @code{AM_PO_SUBDIRS}, thus preparing
7392the @file{po/} directories of the package for building.
7393
7394@code{AM_GNU_GETTEXT} accepts up to three optional arguments.  The general
7395syntax is
7396
7397@example
7398AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}])
7399@end example
7400
7401@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
7402@c it is of no use for packages other than GNU gettext itself.  (Such packages
7403@c are not allowed to install the shared libintl.  But if they use libtool,
7404@c then it is in order to install shared libraries that depend on libintl.)
7405@var{intlsymbol} can be @samp{external} or @samp{no-libtool}.  The default
7406(if it is not specified or empty) is @samp{no-libtool}.  @var{intlsymbol}
7407should be @samp{external} for packages with no @file{intl/} directory.
7408For packages with an @file{intl/} directory, you can either use an
7409@var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external}
7410and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere.
7411The two ways to specify the existence of an @file{intl/} directory are
7412equivalent.  At build time, a static library
7413@code{$(top_builddir)/intl/libintl.a} will then be created.
7414
7415If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
7416gettext implementations (in libc or libintl) without the @code{ngettext()}
7417function will be ignored.  If @var{needsymbol} is specified and is
7418@samp{need-formatstring-macros}, then GNU gettext implementations that don't
7419support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
7420Only one @var{needsymbol} can be specified.  These requirements can also be
7421specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere.  To specify
7422more than one requirement, just specify the strongest one among them, or
7423invoke the @code{AM_GNU_GETTEXT_NEED} macro several times.  The hierarchy
7424among the various alternatives is as follows: @samp{need-formatstring-macros}
7425implies @samp{need-ngettext}.
7426
7427@var{intldir} is used to find the intl libraries.  If empty, the value
7428@samp{$(top_builddir)/intl/} is used.
7429
7430The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
7431available and should be used.  If so, it sets the @code{USE_NLS} variable
7432to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
7433generated configuration file (usually called @file{config.h}); it sets
7434the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
7435for use in a Makefile (@code{LIBINTL} for use without libtool,
7436@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
7437@code{CPPFLAGS} if necessary.  In the negative case, it sets
7438@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
7439to empty and doesn't change @code{CPPFLAGS}.
7440
7441The complexities that @code{AM_GNU_GETTEXT} deals with are the following:
7442
7443@itemize @bullet
7444@item
7445@cindex @code{libintl} library
7446Some operating systems have @code{gettext} in the C library, for example
7447glibc.  Some have it in a separate library @code{libintl}.  GNU @code{libintl}
7448might have been installed as part of the GNU @code{gettext} package.
7449
7450@item
7451GNU @code{libintl}, if installed, is not necessarily already in the search
7452path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
7453the library search path).
7454
7455@item
7456Except for glibc, the operating system's native @code{gettext} cannot
7457exploit the GNU mo files, doesn't have the necessary locale dependency
7458features, and cannot convert messages from the catalog's text encoding
7459to the user's locale encoding.
7460
7461@item
7462GNU @code{libintl}, if installed, is not necessarily already in the
7463run time library search path.  To avoid the need for setting an environment
7464variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
7465run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
7466variables.  This works on most systems, but not on some operating systems
7467with limited shared library support, like SCO.
7468
7469@item
7470GNU @code{libintl} relies on POSIX/XSI @code{iconv}.  The macro checks for
7471linker options needed to use iconv and appends them to the @code{LIBINTL}
7472and @code{LTLIBINTL} variables.
7473@end itemize
7474
7475@node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros
7476@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
7477
7478@amindex AM_GNU_GETTEXT_VERSION
7479The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
7480the GNU gettext infrastructure that is used by the package.
7481
7482The use of this macro is optional; only the @code{autopoint} program makes
7483use of it (@pxref{CVS Issues}).
7484
7485
7486@node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros
7487@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}
7488
7489@amindex AM_GNU_GETTEXT_NEED
7490The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
7491GNU gettext implementation.  The syntax is
7492
7493@example
7494AM_GNU_GETTEXT_NEED([@var{needsymbol}])
7495@end example
7496
7497If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
7498(in libc or libintl) without the @code{ngettext()} function will be ignored.
7499If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
7500implementations that don't support the ISO C 99 @file{<inttypes.h>}
7501formatstring macros will be ignored.
7502
7503The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
7504account.
7505
7506The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
7507the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
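
The two requirements correspond to two kinds of C code.  The sketch below
shows one function that depends on @code{ngettext()} and one whose format
string uses an @file{<inttypes.h>} macro.  The function names and messages
are invented for this example, and the fallback macros in the @code{#else}
branch are simplified stand-ins for the case where NLS is disabled.

```c
#include <inttypes.h>
#include <stdio.h>

#if ENABLE_NLS
# include <libintl.h>
#else
/* Simplified stand-ins used when NLS is disabled.  */
# define gettext(Msgid) ((const char *) (Msgid))
# define ngettext(Msgid1, Msgid2, N) \
    ((N) == 1 ? (const char *) (Msgid1) : (const char *) (Msgid2))
#endif

/* Plural-aware message: this is the kind of code that calls for
   need-ngettext.  */
int
format_file_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size, ngettext ("%lu file", "%lu files", n), n);
}

/* Format string built with an <inttypes.h> macro: this is the kind of
   code that calls for need-formatstring-macros.  */
int
format_offset (char *buf, size_t size, int64_t offset)
{
  return snprintf (buf, size, gettext ("offset %" PRId64), offset);
}
```

A package containing only the first kind of call would declare
@samp{need-ngettext}; one containing the second kind would declare
@samp{need-formatstring-macros}, which covers both.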
7508
7509@node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros
7510@subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
7511
7512@amindex AM_GNU_GETTEXT_INTL_SUBDIR
7513The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the
7514@code{AM_GNU_GETTEXT} macro, although invoked with the first argument
7515@samp{external}, should also prepare for building the @file{intl/}
7516subdirectory.
7517
7518The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after
7519the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
7520
7521The use of this macro requires GNU automake 1.10 or newer and
7522GNU autoconf 2.61 or newer.
7523
7524@node AM_PO_SUBDIRS, AM_ICONV, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros
7525@subsection AM_PO_SUBDIRS in @file{po.m4}
7526
7527@amindex AM_PO_SUBDIRS
The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
package for building.  This macro should be used in internationalized
programs written in programming languages other than C, C++, or Objective C,
for example @code{sh}, @code{Python}, @code{Lisp}.  See @ref{Programming
Languages} for a list of programming languages that support localization
through PO files.
7534
7535The @code{AM_PO_SUBDIRS} macro determines whether internationalization
7536should be used.  If so, it sets the @code{USE_NLS} variable to @samp{yes},
7537otherwise to @samp{no}.  It also determines the right values for Makefile
7538variables in each @file{po/} directory.
7539
7540@node AM_ICONV,  , AM_PO_SUBDIRS, autoconf macros
7541@subsection AM_ICONV in @file{iconv.m4}
7542
7543@amindex AM_ICONV
7544The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
7545@code{iconv} function family in either the C library or a separate
7546@code{libiconv} library.  If found, it sets the @code{am_cv_func_iconv}
7547variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
7548generated configuration file (usually called @file{config.h}); it defines
7549@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
7550second argument of @code{iconv()} is of type @samp{const char **} or
7551@samp{char **}; it sets the variables @code{LIBICONV} and
7552@code{LTLIBICONV} to the linker options for use in a Makefile
7553(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
7554libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
7555necessary.  If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
7556empty and doesn't change @code{CPPFLAGS}.
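
As an illustration of how @code{ICONV_CONST} is meant to be used, here is a
minimal C sketch of a conversion helper.  The function name
@code{convert_string} is hypothetical, and the empty fallback definition of
@code{ICONV_CONST} assumes a system where @code{iconv()} takes a
@samp{char **} second argument; in a configured package the definition
comes from @file{config.h}, and the code would normally be guarded by
@code{HAVE_ICONV}.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Normally supplied by config.h; the empty fallback is an assumption
   matching systems where iconv() takes `char **`.  */
#ifndef ICONV_CONST
# define ICONV_CONST
#endif

/* Hypothetical helper: converts the NUL-terminated string IN from
   encoding FROM to encoding TO, storing a NUL-terminated result in OUT
   (at most OUTSIZE bytes).  Returns 0 on success, -1 on failure.
   Shift-state flushing for stateful encodings is not handled.  */
int
convert_string (const char *from, const char *to,
                const char *in, char *out, size_t outsize)
{
  iconv_t cd;
  const char *inptr = in;
  char *outptr = out;
  size_t inleft = strlen (in);
  size_t outleft;
  size_t res;

  if (outsize == 0)
    return -1;
  outleft = outsize - 1;        /* reserve room for the terminator */

  cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  /* The cast adapts to either prototype of iconv().  */
  res = iconv (cd, (ICONV_CONST char **) &inptr, &inleft,
               &outptr, &outleft);
  iconv_close (cd);
  if (res == (size_t) -1)
    return -1;

  *outptr = '\0';
  return 0;
}
```

When linking such code, @code{$(LIBICONV)} or @code{$(LTLIBICONV)} would be
added to the link command, as described above.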
7557
7558The complexities that @code{AM_ICONV} deals with are the following:
7559
7560@itemize @bullet
7561@item
7562@cindex @code{libiconv} library
7563Some operating systems have @code{iconv} in the C library, for example
7564glibc.  Some have it in a separate library @code{libiconv}, for example
7565OSF/1 or FreeBSD.  Regardless of the operating system, GNU @code{libiconv}
7566might have been installed.  In that case, it should be used instead of the
7567operating system's native @code{iconv}.
7568
7569@item
7570GNU @code{libiconv}, if installed, is not necessarily already in the search
7571path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
7572the library search path).
7573
7574@item
GNU @code{libiconv} is binary incompatible with some operating systems'
7576native @code{iconv}, for example on FreeBSD.  Use of an @file{iconv.h}
7577and @file{libiconv.so} that don't fit together would produce program
7578crashes.
7579
7580@item
7581GNU @code{libiconv}, if installed, is not necessarily already in the
7582run time library search path.  To avoid the need for setting an environment
7583variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
7584run time search path options to the @code{LIBICONV} variable.  This works
7585on most systems, but not on some operating systems with limited shared
7586library support, like SCO.
7587@end itemize
7588
7589@file{iconv.m4} is distributed with the GNU gettext package because
7590@file{gettext.m4} relies on it.
7591
7592@node CVS Issues, Release Management, autoconf macros, Maintainers
7593@section Integrating with CVS
7594
Many projects use CVS for distributed development, version control and
source backup.  This section gives some advice on how to manage the uses
of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}.
7598
7599@menu
7600* Distributed CVS::             Avoiding version mismatch in distributed development
7601* Files under CVS::             Files to put under CVS version control
7602* autopoint Invocation::        Invoking the @code{autopoint} Program
7603@end menu
7604
7605@node Distributed CVS, Files under CVS, CVS Issues, CVS Issues
7606@subsection Avoiding version mismatch in distributed development
7607
In a project with multiple developers that uses CVS, there should be a
single developer who occasionally, when there is a desire to upgrade to
a new @code{gettext} version, runs @code{gettextize}, performs the
changes listed in @ref{Adjusting Files}, and then commits the changes
to the CVS.
7613
7614It is highly recommended that all developers on a project use the same
7615version of GNU @code{gettext} in the package.  In other words, if a
7616developer runs @code{gettextize}, he should go the whole way, make the
7617necessary remaining changes and commit his changes to the CVS.
Otherwise the following problems are likely to occur:
7619
7620@itemize @bullet
7621@item
Apparent version mismatch between developers.  Since some
@code{gettext}-specific portions of the @file{configure.in},
@file{configure.ac}, @file{Makefile.am} and @file{Makefile.in} files
depend on the @code{gettext} version, the use of infrastructure files
belonging to different @code{gettext} versions can easily lead to build
errors.
7627
7628@item
Hidden version mismatch.  A version mismatch can also cause the package
to malfunction in ways that go unnoticed by the developers.
The worst case of hidden version mismatch is that internationalization
of the package doesn't work at all.
7633
7634@item
7635Release risks.  All developers implicitly perform constant testing on
7636a package.  This is important in the days and weeks before a release.
If the person who makes the release tar files uses a different version
of GNU @code{gettext} than the other developers, the distribution will
be less well tested than if all had been using the same @code{gettext}
version.  For example, a platform-specific bug may go undiscovered in
this situation.
7642@end itemize
7643
7644@node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues
7645@subsection Files to put under CVS version control
7646
7647There are basically three ways to deal with generated files in the
7648context of a CVS repository, such as @file{configure} generated from
7649@file{configure.in}, @code{@var{parser}.c} generated from
7650@code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by
7651@code{gettextize} or @code{autopoint}.
7652
7653@enumerate
7654@item
7655All generated files are always committed into the repository.
7656
7657@item
7658All generated files are committed into the repository occasionally,
7659for example each time a release is made.
7660
7661@item
7662Generated files are never committed into the repository.
7663@end enumerate
7664
7665Each of these three approaches has different advantages and drawbacks.
7666
7667@enumerate
7668@item
The advantage is that anyone can check out the CVS at any moment and
get a working build.  The drawbacks are:  1a. It requires frequent
@samp{cvs commit} actions by the maintainers.  1b. The repository grows
in size quite fast.
7673
7674@item
The advantage is that anyone can check out the CVS, and the usual
@samp{./configure; make} will work.  The drawbacks are:  2a. Whoever
checks out the repository needs tools like GNU @code{automake},
GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes
he even needs particular versions of them.  2b. When a release is made
and a commit is made on the generated files, the other developers get
conflicts on the generated files after doing @samp{cvs update}.  Although
these conflicts are easy to resolve, they are annoying.
7683
7684@item
7685The advantage is less work for the maintainers.  The drawback is that
7686anyone who checks out the CVS not only needs tools like GNU @code{automake},
7687GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that
7688he needs to perform a package specific pre-build step before being able
7689to "./configure; make".
7690@end enumerate
7691
7692For the first and second approach, all files modified or brought in
7693by the occasional @code{gettextize} invocation and update should be
7694committed into the CVS.
7695
7696For the third approach, the maintainer can omit from the CVS repository
7697all the files that @code{gettextize} mentions as "copy".  Instead, he
7698adds to the @file{configure.in} or @file{configure.ac} a line of the
7699form
7700
7701@example
7702AM_GNU_GETTEXT_VERSION(@value{VERSION})
7703@end example
7704
7705@noindent
7706and adds to the package's pre-build script an invocation of
7707@samp{autopoint}.  For everyone who checks out the CVS, this
7708@code{autopoint} invocation will copy into the right place the
7709@code{gettext} infrastructure files that have been omitted from the CVS.
7710
7711The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
7712the version of the @code{gettext} infrastructure that the package wants
7713to use.  It is also the minimum version number of the @samp{autopoint}
7714program.  So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
7715developers can have any version >= 0.11.5 installed; the package will work
7716with the 0.11.5 infrastructure in all developers' builds.  When the
maintainer then runs @code{gettextize} from, say, version 0.12.1 on the package,
7718the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
7719into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
7720use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer
7721installed.
7722
7723@node autopoint Invocation,  , Files under CVS, CVS Issues
7724@subsection Invoking the @code{autopoint} Program
7725
7726@include autopoint.texi
7727
7728@node Release Management,  , CVS Issues, Maintainers
7729@section Creating a Distribution Tarball
7730
7731@cindex release
7732@cindex distribution tarball
7733In projects that use GNU @code{automake}, the usual commands for creating
7734a distribution tarball, @samp{make dist} or @samp{make distcheck},
7735automatically update the PO files as needed.
7736
7737If GNU @code{automake} is not used, the maintainer needs to perform this
7738update before making a release:
7739
7740@example
7741$ ./configure
7742$ (cd po; make update-po)
7743$ make distclean
7744@end example
7745
7746@node Installers, Programming Languages, Maintainers, Top
7747@chapter The Installer's and Distributor's View
7748@cindex package installer's view of @code{gettext}
7749@cindex package distributor's view of @code{gettext}
7750@cindex package build and installation options
7751@cindex setting up @code{gettext} at build time
7752
By default, packages that fully use GNU @code{gettext} internally
are installed in such a way that they allow translation of
messages.  At @emph{configuration} time, those packages should
7756automatically detect whether the underlying host system already provides
7757the GNU @code{gettext} functions.  If not,
7758the GNU @code{gettext} library should be automatically prepared
7759and used.  Installers may use special options at configuration
7760time for changing this behavior.  The command @samp{./configure
7761--with-included-gettext} bypasses system @code{gettext} to
7762use the included GNU @code{gettext} instead,
7763while @samp{./configure --disable-nls}
7764produces programs totally unable to translate messages.
7765
7766@vindex LINGUAS@r{, environment variable}
Internationalized packages usually have many @file{@var{ll}.po}
files.  Unless
7769translations are disabled, all those available are installed together
7770with the package.  However, the environment variable @code{LINGUAS}
7771may be set, prior to configuration, to limit the installed set.
7772@code{LINGUAS} should then contain a space separated list of two-letter
7773codes, stating which languages are allowed.
7774
7775@node Programming Languages, Conclusion, Installers, Top
7776@chapter Other Programming Languages
7777
7778While the presentation of @code{gettext} focuses mostly on C and
7779implicitly applies to C++ as well, its scope is far broader than that:
7780Many programming languages, scripting languages and other textual data
7781like GUI resources or package descriptions can make use of the gettext
7782approach.
7783
7784@menu
7785* Language Implementors::       The Language Implementor's View
7786* Programmers for other Languages::  The Programmer's View
7787* Translators for other Languages::  The Translator's View
7788* Maintainers for other Languages::  The Maintainer's View
7789* List of Programming Languages::  Individual Programming Languages
7790* List of Data Formats::        Internationalizable Data
7791@end menu
7792
7793@node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages
7794@section The Language Implementor's View
7795@cindex programming languages
7796@cindex scripting languages
7797
All programming and scripting languages that have the notion of strings
are eligible to support @code{gettext}.  Supporting @code{gettext}
means the following:
7801
7802@enumerate
7803@item
You should add to the language a syntax for translatable strings.  In
principle, a function call of @code{gettext} would do, but a shorthand
syntax helps keep internationalized programs legible.  For
example, in C we use the syntax @code{_("string")}, and in GNU awk we use
the shorthand @code{_"string"}.
7809
7810@item
7811You should arrange that evaluation of such a translatable string at
7812runtime calls the @code{gettext} function, or performs equivalent
7813processing.
7814
7815@item
7816Similarly, you should make the functions @code{ngettext},
7817@code{dcgettext}, @code{dcngettext} available from within the language.
7818These functions are less often used, but are nevertheless necessary for
7819particular purposes: @code{ngettext} for correct plural handling, and
7820@code{dcgettext} and @code{dcngettext} for obeying other locale
7821environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
7822@code{LC_MONETARY}.  For these latter functions, you need to make the
7823@code{LC_*} constants, available in the C header @code{<locale.h>},
7824referenceable from within the language, usually either as enumeration
7825values or as strings.
7826
7827@item
7828You should allow the programmer to designate a message domain, either by
7829making the @code{textdomain} function available from within the
7830language, or by introducing a magic variable called @code{TEXTDOMAIN}.
7831Similarly, you should allow the programmer to designate where to search
7832for message catalogs, by providing access to the @code{bindtextdomain}
7833function.
7834
7835@item
7836You should either perform a @code{setlocale (LC_ALL, "")} call during
7837the startup of your language runtime, or allow the programmer to do so.
7838Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
7839@code{LC_CTYPE} locale facets are not both set.
7840
7841@item
7842A programmer should have a way to extract translatable strings from a
7843program into a PO file.  The GNU @code{xgettext} program is being
7844extended to support very different programming languages.  Please
contact the GNU @code{gettext} maintainers to help them do this.  If
7846the string extractor is best integrated into your language's parser, GNU
7847@code{xgettext} can function as a front end to your string extractor.
7848
7849@item
7850The language's library should have a string formatting facility where
7851the arguments of a format string are denoted by a positional number or a
7852name.  This is needed because for some languages and some messages with
7853more than one substitutable argument, the translation will need to
output the substituted arguments in a different order.  @xref{c-format Flag}.
7855
7856@item
7857If the language has more than one implementation, and not all of the
7858implementations use @code{gettext}, but the programs should be portable
7859across implementations, you should provide a no-i18n emulation, that
7860makes the other implementations accept programs written for yours,
7861without actually translating the strings.
7862
7863@item
7864To help the programmer in the task of marking translatable strings,
7865which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
7866you are welcome to
7867contact the GNU @code{gettext} maintainers, so they can add support for
7868your language to @file{po-mode.el}.
7869@end enumerate
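
As a rough illustration of the list above, the following C sketch shows
the calls a language runtime might make on the programmer's behalf.  The
names @code{runtime_init_i18n}, @code{runtime_translate},
@code{runtime_translate_plural} and @code{runtime_translate_category}
are hypothetical and only serve this example; the functions they wrap
(@code{setlocale}, @code{bindtextdomain}, @code{textdomain},
@code{gettext}, @code{ngettext}, @code{dcgettext}) are the ones declared
in @code{<locale.h>} and @code{<libintl.h>}.  The sketch assumes that
these functions are available, either from the C library or from
@file{libintl}.

@example
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical startup hook of a language runtime.
   Returns 0 on success and -1 on failure.  */
int
runtime_init_i18n (const char *domain, const char *localedir)
@{
  /* Honor the user's locale environment variables.  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the message catalogs of DOMAIN are installed.  */
  if (bindtextdomain (domain, localedir) == NULL)
    return -1;
  /* Make DOMAIN the default message domain.  */
  if (textdomain (domain) == NULL)
    return -1;
  return 0;
@}

/* What evaluating a translatable string could boil down to.  */
const char *
runtime_translate (const char *msgid)
@{
  return gettext (msgid);
@}

/* Plural-aware lookup, selecting a form according to N.  */
const char *
runtime_translate_plural (const char *msgid, const char *msgid_plural,
                          unsigned long n)
@{
  return ngettext (msgid, msgid_plural, n);
@}

/* Lookup that obeys a locale category other than LC_MESSAGES,
   for example LC_TIME.  A NULL domain means the default domain.  */
const char *
runtime_translate_category (const char *msgid, int category)
@{
  return dcgettext (NULL, msgid, category);
@}
@end example

@noindent
When no message catalog is found for the domain, the lookup functions
return their untranslated argument, so a runtime built along these lines
falls back to the original strings.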
7870
7871On the implementation side, three approaches are possible, with
7872different effects on portability and copyright:
7873
7874@itemize @bullet
7875@item
7876You may integrate the GNU @code{gettext}'s @file{intl/} directory in
7877your package, as described in @ref{Maintainers}.  This allows you to
7878have internationalization on all kinds of platforms.  Note that when you
7879then distribute your package, it legally falls under the GNU General
7880Public License, and the GNU project will be glad about your contribution
7881to the Free Software pool.
7882
7883@item
7884You may link against GNU @code{gettext} functions if they are found in
7885the C library.  For example, an autoconf test for @code{gettext()} and
7886@code{ngettext()} will detect this situation.  For the moment, this test
7887will succeed on GNU systems and not on other platforms.  No severe
7888copyright restrictions apply.
7889
7890@item
7891You may emulate or reimplement the GNU @code{gettext} functionality.
7892This has the advantage of full portability and no copyright
7893restrictions, but also the drawback that you have to reimplement the GNU
7894@code{gettext} features (such as the @code{LANGUAGE} environment
7895variable, the locale aliases database, the automatic charset conversion,
7896and plural handling).
7897@end itemize
7898
7899@node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages
7900@section The Programmer's View
7901
7902For the programmer, the general procedure is the same as for the C
7903language.  The Emacs PO mode marking supports other languages, and the GNU
7904@code{xgettext} string extractor recognizes other languages based on the
7905file extension or a command-line option.  In some languages,
7906@code{setlocale} is not needed because it is already performed by the
7907underlying language runtime.
7908
7909@node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages
7910@section The Translator's View
7911
7912The translator works exactly as in the C language case.  The only
7913difference is that when translating format strings, she has to be aware
7914of the language's particular syntax for positional arguments in format
7915strings.
7916
7917@menu
7918* c-format::                    C Format Strings
7919* objc-format::                 Objective C Format Strings
7920* sh-format::                   Shell Format Strings
7921* python-format::               Python Format Strings
7922* lisp-format::                 Lisp Format Strings
7923* elisp-format::                Emacs Lisp Format Strings
7924* librep-format::               librep Format Strings
7925* scheme-format::               Scheme Format Strings
7926* smalltalk-format::            Smalltalk Format Strings
7927* java-format::                 Java Format Strings
7928* csharp-format::               C# Format Strings
7929* awk-format::                  awk Format Strings
7930* object-pascal-format::        Object Pascal Format Strings
7931* ycp-format::                  YCP Format Strings
7932* tcl-format::                  Tcl Format Strings
7933* perl-format::                 Perl Format Strings
7934* php-format::                  PHP Format Strings
7935* gcc-internal-format::         GCC internal Format Strings
7936* qt-format::                   Qt Format Strings
7937* boost-format::                Boost Format Strings
7938@end menu
7939
7940@node c-format, objc-format, Translators for other Languages, Translators for other Languages
7941@subsection C Format Strings
7942
7943C format strings are described in POSIX (IEEE P1003.1 2001), section
7944XSH 3 fprintf(),
7945@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
7946See also the fprintf() manual page,
7947@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
7948@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
7949
7950Although format strings with positions that reorder arguments, such as
7951
7952@example
7953"Only %2$d bytes free on '%1$s'."
7954@end example
7955
7956@noindent
7957which is semantically equivalent to
7958
7959@example
7960"'%s' has only %d bytes free."
7961@end example
7962
7963@noindent
7964are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
7965on this reordering ability: On the few platforms where @code{printf()},
7966@code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
7967or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
7968activates these replacement functions automatically.
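
The following C fragment is a small sketch of what this means for the
programmer.  The helper name @code{report_free_space} is hypothetical,
and its @var{format} parameter stands in for the string returned by a
@code{gettext} call.  The sketch assumes that @code{snprintf} accepts
positions in format strings, natively or through the replacement
functions mentioned above.

@example
#include <stdio.h>

/* Hypothetical helper: writes a message about DISK having BYTES bytes
   free into BUF.  FORMAT stands in for a translated format string.
   The arguments are always passed in the order (disk, bytes); only
   the format string decides in which order they are printed.  */
int
report_free_space (char *buf, size_t size, const char *format,
                   const char *disk, int bytes)
@{
  return snprintf (buf, size, format, disk, bytes);
@}
@end example

@noindent
The call itself never changes: the same arguments can be formatted with
@code{"'%s' has only %d bytes free."} or with
@code{"Only %2$d bytes free on '%1$s'."}, whichever the translator
supplies.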
7969
7970@cindex outdigits
7971@cindex Arabic digits
7972As a special feature for Farsi (Persian) and maybe Arabic, translators can
7973insert an @samp{I} flag into numeric format directives.  For example, the
7974translation of @code{"%d"} can be @code{"%Id"}.  The effect of this flag,
7975on systems with GNU @code{libc}, is that in the output, the ASCII digits are
7976replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
7977facet.  On other systems, the @code{gettext} function removes this flag,
7978so that it has no effect.
7979
7980Note that the programmer should @emph{not} put this flag into the
7981untranslated string.  (Putting the @samp{I} format directive flag into an
7982@var{msgid} string would lead to undefined behaviour on platforms without
7983glibc when NLS is disabled.)
7984
7985@node objc-format, sh-format, c-format, Translators for other Languages
7986@subsection Objective C Format Strings
7987
Objective C format strings are like C format strings.  They support an
additional format directive: "%@@", which when executed consumes an argument
of type @code{Object *}.
7991
7992@node sh-format, python-format, objc-format, Translators for other Languages
7993@subsection Shell Format Strings
7994
7995Shell format strings, as supported by GNU gettext and the @samp{envsubst}
7996program, are strings with references to shell variables in the form
7997@code{$@var{variable}} or @code{$@{@var{variable}@}}.  References of the form
7998@code{$@{@var{variable}-@var{default}@}},
7999@code{$@{@var{variable}:-@var{default}@}},
8000@code{$@{@var{variable}=@var{default}@}},
8001@code{$@{@var{variable}:=@var{default}@}},
8002@code{$@{@var{variable}+@var{replacement}@}},
8003@code{$@{@var{variable}:+@var{replacement}@}},
8004@code{$@{@var{variable}?@var{ignored}@}},
8005@code{$@{@var{variable}:?@var{ignored}@}},
8006that would be valid inside shell scripts, are not supported.  The
8007@var{variable} names must consist solely of alphanumeric or underscore
8008ASCII characters, not start with a digit and be nonempty; otherwise such
8009a variable reference is ignored.
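
The rule for variable names can be sketched as a small C function.  This
is only an illustration of the rule stated above, not the code that GNU
@code{gettext} or @samp{envsubst} actually uses; the function name
@code{is_sh_format_variable_name} is hypothetical.

@example
#include <stddef.h>

/* Returns 1 if NAME, of length LEN, is acceptable as a variable name
   in a shell format string, and 0 otherwise.  Only ASCII letters,
   ASCII digits and underscore are allowed; the name must be nonempty
   and must not start with a digit.  */
int
is_sh_format_variable_name (const char *name, size_t len)
@{
  size_t i;

  if (len == 0)
    return 0;
  if (name[0] >= '0' && name[0] <= '9')
    return 0;
  for (i = 0; i < len; i++)
    @{
      char c = name[i];

      if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '_'))
        return 0;
    @}
  return 1;
@}
@end example

@noindent
Under this check, @code{HOME} and @code{_x1} are accepted, while
@code{1abc}, @code{a-b} and the empty name are rejected.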
8010
8011@node python-format, lisp-format, sh-format, Translators for other Languages
8012@subsection Python Format Strings
8013
8014Python format strings are described in
8015@w{Python Library reference} /
8016@w{2. Built-in Types, Exceptions and Functions} /
8017@w{2.2. Built-in Types} /
8018@w{2.2.6. Sequence Types} /
@w{2.2.6.2. String Formatting Operations},
8020@uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}.
8021
8022@node lisp-format, elisp-format, python-format, Translators for other Languages
8023@subsection Lisp Format Strings
8024
8025Lisp format strings are described in the Common Lisp HyperSpec,
8026chapter 22.3 @w{Formatted Output},
8027@uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}.
8028
8029@node elisp-format, librep-format, lisp-format, Translators for other Languages
8030@subsection Emacs Lisp Format Strings
8031
8032Emacs Lisp format strings are documented in the Emacs Lisp reference,
8033section @w{Formatting Strings},
8034@uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
8035Note that as of version 21, XEmacs supports numbered argument specifications
8036in format strings while FSF Emacs doesn't.
8037
8038@node librep-format, scheme-format, elisp-format, Translators for other Languages
8039@subsection librep Format Strings
8040
8041librep format strings are documented in the librep manual, section
8042@w{Formatted Output},
8043@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
8044@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
8045
8046@node scheme-format, smalltalk-format, librep-format, Translators for other Languages
8047@subsection Scheme Format Strings
8048
8049Scheme format strings are documented in the SLIB manual, section
8050@w{Format Specification}.
8051
8052@node smalltalk-format, java-format, scheme-format, Translators for other Languages
8053@subsection Smalltalk Format Strings
8054
8055Smalltalk format strings are described in the GNU Smalltalk documentation,
8056class @code{CharArray}, methods @samp{bindWith:} and
8057@samp{bindWithArguments:}.
8058@uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
8059In summary, a directive starts with @samp{%} and is followed by @samp{%}
8060or a nonzero digit (@samp{1} to @samp{9}).
8061
8062@node java-format, csharp-format, smalltalk-format, Translators for other Languages
8063@subsection Java Format Strings
8064
8065Java format strings are described in the JDK documentation for class
8066@code{java.text.MessageFormat},
8067@uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}.
8068See also the ICU documentation
8069@uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}.
8070
8071@node csharp-format, awk-format, java-format, Translators for other Languages
8072@subsection C# Format Strings
8073
8074C# format strings are described in the .NET documentation for class
8075@code{System.String} and in
8076@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
8077
8078@node awk-format, object-pascal-format, csharp-format, Translators for other Languages
8079@subsection awk Format Strings
8080
8081awk format strings are described in the gawk documentation, section
8082@w{Printf},
8083@uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
8084
8085@node object-pascal-format, ycp-format, awk-format, Translators for other Languages
8086@subsection Object Pascal Format Strings
8087
8088Where is this documented?
8089
8090@node ycp-format, tcl-format, object-pascal-format, Translators for other Languages
8091@subsection YCP Format Strings
8092
8093YCP sformat strings are described in the libycp documentation
8094@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
8095In summary, a directive starts with @samp{%} and is followed by @samp{%}
8096or a nonzero digit (@samp{1} to @samp{9}).
8097
8098@node tcl-format, perl-format, ycp-format, Translators for other Languages
8099@subsection Tcl Format Strings
8100
8101Tcl format strings are described in the @file{format.n} manual page,
8102@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
8103
8104@node perl-format, php-format, tcl-format, Translators for other Languages
8105@subsection Perl Format Strings
8106
There are two kinds of format strings in Perl: those acceptable to the
8108Perl built-in function @code{printf}, labelled as @samp{perl-format},
8109and those acceptable to the @code{libintl-perl} function @code{__x},
8110labelled as @samp{perl-brace-format}.
8111
8112Perl @code{printf} format strings are described in the @code{sprintf}
8113section of @samp{man perlfunc}.
8114
8115Perl brace format strings are described in the
8116@file{Locale::TextDomain(3pm)} manual page of the CPAN package
8117libintl-perl.  In brief, Perl format uses placeholders put between
8118braces (@samp{@{} and @samp{@}}).  The placeholder must have the syntax
8119of simple identifiers.
8120
8121@node php-format, gcc-internal-format, perl-format, Translators for other Languages
8122@subsection PHP Format Strings
8123
8124PHP format strings are described in the documentation of the PHP function
8125@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
8126@uref{http://www.php.net/manual/en/function.sprintf.php}.
8127
8128@node gcc-internal-format, qt-format, php-format, Translators for other Languages
8129@subsection GCC internal Format Strings
8130
8131These format strings are used inside the GCC sources.  In such a format
8132string, a directive starts with @samp{%}, is optionally followed by a
8133size specifier @samp{l}, an optional flag @samp{+}, another optional flag
8134@samp{#}, and is finished by a specifier: @samp{%} denotes a literal
8135percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
8136@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
8137denote an unsigned integer, @samp{.*s} denotes a string preceded by a
8138width specification, @samp{H} denotes a @samp{location_t *} pointer,
8139@samp{D} denotes a general declaration, @samp{F} denotes a function
8140declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
8141@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
8142denotes a programming language, @samp{O} denotes a binary operator,
8143@samp{P} denotes a function parameter, @samp{Q} denotes an assignment
8144operator, @samp{V} denotes a const/volatile qualifier.
8145
8146@node qt-format, boost-format, gcc-internal-format, Translators for other Languages
8147@subsection Qt Format Strings
8148
8149Qt format strings are described in the documentation of the QString class
8150@uref{file:/usr/lib/qt-3.0.5/doc/html/qstring.html}.
In summary, a directive consists of a @samp{%} followed by a digit.  The same
directive cannot occur more than once in a format string.
8153
8154@node boost-format,  , qt-format, Translators for other Languages
8155@subsection Boost Format Strings
8156
8157Boost format strings are described in the documentation of the
8158@code{boost::format} class, at
8159@uref{http://www.boost.org/libs/format/doc/format.html}.
8160In summary, a directive has either the same syntax as in a C format string,
8161such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
8162@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
8163between percent signs, such as @samp{%1%}.
8164
8165@node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
8166@section The Maintainer's View
8167
8168For the maintainer, the general procedure differs from the C language
8169case in two ways.
8170
8171@itemize @bullet
8172@item
8173For those languages that don't use GNU gettext, the @file{intl/} directory
8174is not needed and can be omitted.  This means that the maintainer calls the
8175@code{gettextize} program without the @samp{--intl} option, and that he
8176invokes the @code{AM_GNU_GETTEXT} autoconf macro via
8177@samp{AM_GNU_GETTEXT([external])}.
8178
8179@item
8180If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
8181variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
8182match the @code{xgettext} options for that particular programming language.
8183If the package uses more than one programming language with @code{gettext}
8184support, it becomes necessary to change the POT file construction rule
8185in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
8186invocation per programming language, each with the options appropriate for
8187that language, and to combine the resulting files using @code{msgcat}.
8188@end itemize
8189
8190@node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages
8191@section Individual Programming Languages
8192
8193@c Here is a list of programming languages, as used for Free Software projects
8194@c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
8195@c are marked with a star.
8196@c   C                       3580     *
8197@c   Perl                    1911     *
8198@c   C++                     1379     *
8199@c   Java                    1200     *
8200@c   PHP                     1051     *
8201@c   Python                   613     *
8202@c   Unix Shell               357     *
8203@c   Tcl                      266     *
8204@c   SQL                      174
8205@c   JavaScript               118
8206@c   Assembly                 108
8207@c   Scheme                    51
8208@c   Ruby                      47
8209@c   Lisp                      45     *
8210@c   Objective C               39     *
8211@c   PL/SQL                    29
8212@c   Fortran                   25
8213@c   Ada                       24
8214@c   Delphi                    22
8215@c   Awk                       19     *
8216@c   Pascal                    19
8217@c   ML                        19
8218@c   Eiffel                    17
8219@c   Emacs-Lisp                14     *
8220@c   Zope                      14
8221@c   ASP                       12
8222@c   Forth                     12
8223@c   Cold Fusion               10
8224@c   Haskell                    9
8225@c   Visual Basic               9
8226@c   C#                         6     *
8227@c   Smalltalk                  6     *
8228@c   Basic                      5
8229@c   Erlang                     5
8230@c   Modula                     5
8231@c   Object Pascal              5     *
8232@c   Rexx                       5
8233@c   Dylan                      4
8234@c   Prolog                     4
8235@c   APL                        3
8236@c   PROGRESS                   2
8237@c   Euler                      1
8238@c   Euphoria                   1
8239@c   Pliant                     1
8240@c   Simula                     1
8241@c   XBasic                     1
8242@c   Logo                       0
8243@c   Other Scripting Engines   49
8244@c   Other                    116
8245
8246@menu
8247* C::                           C, C++, Objective C
8248* sh::                          sh - Shell Script
8249* bash::                        bash - Bourne-Again Shell Script
8250* Python::                      Python
8251* Common Lisp::                 GNU clisp - Common Lisp
8252* clisp C::                     GNU clisp C sources
8253* Emacs Lisp::                  Emacs Lisp
8254* librep::                      librep
8255* Scheme::                      GNU guile - Scheme
8256* Smalltalk::                   GNU Smalltalk
8257* Java::                        Java
8258* C#::                          C#
8259* gawk::                        GNU awk
8260* Pascal::                      Pascal - Free Pascal Compiler
8261* wxWidgets::                   wxWidgets library
8262* YCP::                         YCP - YaST2 scripting language
8263* Tcl::                         Tcl - Tk's scripting language
8264* Perl::                        Perl
8265* PHP::                         PHP Hypertext Preprocessor
8266* Pike::                        Pike
8267* GCC-source::                  GNU Compiler Collection sources
8268@end menu
8269
8270@node C, sh, List of Programming Languages, List of Programming Languages
8271@subsection C, C++, Objective C
8272@cindex C and C-like languages
8273
8274@table @asis
8275@item RPMs
8276gcc, gpp, gobjc, glibc, gettext
8277
8278@item File extension
8279For C: @code{c}, @code{h}.
8280@*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}.
8281@*For Objective C: @code{m}.
8282
8283@item String syntax
8284@code{"abc"}
8285
8286@item gettext shorthand
8287@code{_("abc")}
8288
8289@item gettext/ngettext functions
8290@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
8291@code{dngettext}, @code{dcngettext}
8292
8293@item textdomain
8294@code{textdomain} function
8295
8296@item bindtextdomain
8297@code{bindtextdomain} function
8298
8299@item setlocale
8300Programmer must call @code{setlocale (LC_ALL, "")}
8301
8302@item Prerequisite
8303@code{#include <libintl.h>}
8304@*@code{#include <locale.h>}
8305@*@code{#define _(string) gettext (string)}
8306
8307@item Use or emulate GNU gettext
8308Use
8309
8310@item Extractor
8311@code{xgettext -k_}
8312
8313@item Formatting with positions
8314@code{fprintf "%2$d %1$d"}
8315@*In C++: @code{autosprintf "%2$d %1$d"}
8316(@pxref{Top, , Introduction, autosprintf, GNU autosprintf})
8317
8318@item Portability
autoconf (@file{gettext.m4}) and @code{#if ENABLE_NLS}
8320
8321@item po-mode marking
8322yes
8323@end table
8324
8325The following examples are available in the @file{examples} directory:
8326@code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt}, 
8327@code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets},
8328@code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}.
8329
8330@node sh, bash, C, List of Programming Languages
8331@subsection sh - Shell Script
8332@cindex shell scripts
8333
8334@table @asis
8335@item RPMs
8336bash, gettext
8337
8338@item File extension
8339@code{sh}
8340
8341@item String syntax
8342@code{"abc"}, @code{'abc'}, @code{abc}
8343
8344@item gettext shorthand
8345@code{"`gettext \"abc\"`"}
8346
8347@item gettext/ngettext functions
8348@pindex gettext
8349@pindex ngettext
8350@code{gettext}, @code{ngettext} programs
8351@*@code{eval_gettext}, @code{eval_ngettext} shell functions
8352
8353@item textdomain
8354@vindex TEXTDOMAIN@r{, environment variable}
8355environment variable @code{TEXTDOMAIN}
8356
8357@item bindtextdomain
8358@vindex TEXTDOMAINDIR@r{, environment variable}
8359environment variable @code{TEXTDOMAINDIR}
8360
8361@item setlocale
8362automatic
8363
8364@item Prerequisite
8365@code{. gettext.sh}
8366
8367@item Use or emulate GNU gettext
8368use
8369
8370@item Extractor
8371@code{xgettext}
8372
8373@item Formatting with positions
8374---
8375
8376@item Portability
8377fully portable
8378
8379@item po-mode marking
8380---
8381@end table
8382
8383An example is available in the @file{examples} directory: @code{hello-sh}.
8384
8385@menu
8386* Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
8387* gettext.sh::                  Contents of @code{gettext.sh}
8388* gettext Invocation::          Invoking the @code{gettext} program
8389* ngettext Invocation::         Invoking the @code{ngettext} program
8390* envsubst Invocation::         Invoking the @code{envsubst} program
8391* eval_gettext Invocation::     Invoking the @code{eval_gettext} function
8392* eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
8393@end menu
8394
8395@node Preparing Shell Scripts, gettext.sh, sh, sh
8396@subsubsection Preparing Shell Scripts for Internationalization
8397@cindex preparing shell scripts for translation
8398
8399Preparing a shell script for internationalization is conceptually similar
8400to the steps described in @ref{Sources}.  The concrete steps for shell
8401scripts are as follows.
8402
8403@enumerate
8404@item
8405Insert the line
8406
8407@smallexample
8408. gettext.sh
8409@end smallexample
8410
8411near the top of the script.  @code{gettext.sh} is a shell function library
8412that provides the functions
8413@code{eval_gettext} (see @ref{eval_gettext Invocation}) and
8414@code{eval_ngettext} (see @ref{eval_ngettext Invocation}).
8415You have to ensure that @code{gettext.sh} can be found in the @code{PATH}.
8416
8417@item
8418Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment
8419variables.  Usually @code{TEXTDOMAIN} is the package or program name, and
8420@code{TEXTDOMAINDIR} is the absolute pathname corresponding to
8421@code{$prefix/share/locale}, where @code{$prefix} is the installation location.
8422
8423@smallexample
8424TEXTDOMAIN=@@PACKAGE@@
8425export TEXTDOMAIN
8426TEXTDOMAINDIR=@@LOCALEDIR@@
8427export TEXTDOMAINDIR
8428@end smallexample
8429
8430@item
8431Prepare the strings for translation, as described in @ref{Preparing Strings}.
8432
8433@item
8434Simplify translatable strings so that they don't contain command substitution
8435(@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like
8436@code{$@{@var{variable}-@var{default}@}}), access to positional arguments
8437(like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like
8438@code{$?}). This can always be done through simple local code restructuring.
8439For example,
8440
8441@smallexample
8442echo "Usage: $0 [OPTION] FILE..."
8443@end smallexample
8444
8445becomes
8446
8447@smallexample
8448program_name=$0
8449echo "Usage: $program_name [OPTION] FILE..."
8450@end smallexample
8451
8452Similarly,
8453
8454@smallexample
8455echo "Remaining files: `ls | wc -l`"
8456@end smallexample
8457
8458becomes
8459
8460@smallexample
8461filecount="`ls | wc -l`"
8462echo "Remaining files: $filecount"
8463@end smallexample
8464
8465@item
8466For each translatable string, change the output command @samp{echo} or
8467@samp{$echo} to @samp{gettext} (if the string contains no references to
8468shell variables) or to @samp{eval_gettext} (if it refers to shell variables),
8469followed by a no-argument @samp{echo} command (to account for the terminating
8470newline). Similarly, for cases with plural handling, replace a conditional
8471@samp{echo} command with an invocation of @samp{ngettext} or
8472@samp{eval_ngettext}, followed by a no-argument @samp{echo} command.
8473
8474When doing this, you also need to add an extra backslash before the dollar
8475sign in references to shell variables, so that the @samp{eval_gettext}
8476function receives the translatable string before the variable values are
8477substituted into it. For example,
8478
8479@smallexample
8480echo "Remaining files: $filecount"
8481@end smallexample
8482
8483becomes
8484
8485@smallexample
8486eval_gettext "Remaining files: \$filecount"; echo
8487@end smallexample
8488
8489If the output command is not @samp{echo}, you can make it use @samp{echo}
8490nevertheless, through the use of backquotes. However, note that inside
8491backquotes, backslashes must be doubled to be effective (because the
8492backquoting eats one level of backslashes). For example, assuming that
8493@samp{error} is a shell function that signals an error,
8494
8495@smallexample
8496error "file not found: $filename"
8497@end smallexample
8498
8499is first transformed into
8500
8501@smallexample
8502error "`echo \"file not found: \$filename\"`"
8503@end smallexample
8504
8505which then becomes
8506
8507@smallexample
8508error "`eval_gettext \"file not found: \\\$filename\"`"
8509@end smallexample
8510@end enumerate
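
The two levels of backslash processing can be observed without any
translation machinery.  The following sketch is only an illustration: it
uses @code{printf} in place of @code{eval_gettext} to show which string the
command actually receives, and the variable names @code{one_level} and
@code{two_levels} are made up for this example.  The assignments need no
outer double quotes, so the inner quotes are written without backslashes.

@smallexample
filename=/tmp/data

# One backslash: the backquotes consume it, so the inner double-quoted
# string expands $filename before the command sees it.
one_level=`printf '%s' "file not found: \$filename"`

# Three backslashes: the backquotes reduce \\ to \ and \$ to $, leaving
# \$filename for the inner double quotes, which pass a literal dollar sign.
two_levels=`printf '%s' "file not found: \\\$filename"`

echo "$one_level"
echo "$two_levels"
@end smallexample

@noindent
The first @code{echo} prints the message with the file name already
substituted; the second prints it with a literal @code{$filename}, which is
the form that @code{eval_gettext} needs to receive as its @var{msgid}.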
8511
8512@node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh
8513@subsubsection Contents of @code{gettext.sh}
8514
8515@code{gettext.sh}, contained in the run-time package of GNU gettext, provides
8516the following:
8517
8518@itemize @bullet
8519@item $echo
8520The variable @code{echo} is set to a command that outputs its first argument
8521and a newline, without interpreting backslashes in the argument string.
8522
8523@item eval_gettext
8524See @ref{eval_gettext Invocation}.
8525
8526@item eval_ngettext
8527See @ref{eval_ngettext Invocation}.
8528@end itemize
8529
8530@node gettext Invocation, ngettext Invocation, gettext.sh, sh
8531@subsubsection Invoking the @code{gettext} program
8532
8533@include rt-gettext.texi
8534
8535@node ngettext Invocation, envsubst Invocation, gettext Invocation, sh
8536@subsubsection Invoking the @code{ngettext} program
8537
8538@include rt-ngettext.texi
8539
8540@node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh
8541@subsubsection Invoking the @code{envsubst} program
8542
8543@include rt-envsubst.texi
8544
8545@node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh
8546@subsubsection Invoking the @code{eval_gettext} function
8547
8548@cindex @code{eval_gettext} function, usage
8549@example
8550eval_gettext @var{msgid}
8551@end example
8552
8553@cindex lookup message translation
8554This function outputs the native language translation of a textual message,
8555performing dollar-substitution on the result.  Note that only shell variables
8556mentioned in @var{msgid} will be dollar-substituted in the result.
8557
8558@node eval_ngettext Invocation,  , eval_gettext Invocation, sh
8559@subsubsection Invoking the @code{eval_ngettext} function
8560
8561@cindex @code{eval_ngettext} function, usage
8562@example
8563eval_ngettext @var{msgid} @var{msgid-plural} @var{count}
8564@end example
8565
8566@cindex lookup plural message translation
8567This function outputs the native language translation of a textual message
8568whose grammatical form depends on a number, performing dollar-substitution
8569on the result.  Note that only shell variables mentioned in @var{msgid} or
8570@var{msgid-plural} will be dollar-substituted in the result.
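
As a minimal sketch of both functions, assuming that the GNU gettext
runtime is installed so that @code{gettext.sh} is found in the @code{PATH}:
the domain name @code{example-domain} and the variable names are
hypothetical, and no catalog is assumed to exist for that domain, so the
untranslated strings are used.  Command substitution is written as
@code{$(...)} here, so a single backslash before each dollar sign suffices.

@smallexample
# Only proceed if the function library is installed.
if command -v gettext.sh >/dev/null 2>&1; then
  . gettext.sh
  TEXTDOMAIN=example-domain   # hypothetical domain name
  export TEXTDOMAIN

  filename=report.txt
  filecount=3

  # The backslash keeps the dollar sign literal, so the function receives
  # the msgid before the variable value is substituted into it.
  missing=$(eval_gettext "file not found: \$filename")

  # The third argument selects between the singular and the plural form.
  remaining=$(eval_ngettext "Remaining file: \$filecount" \
                            "Remaining files: \$filecount" "$filecount")

  echo "$missing"
  echo "$remaining"
fi
@end smallexample

@noindent
When the output is to be printed directly rather than captured, the usual
form is @code{eval_gettext "..."; echo}, as described in
@ref{Preparing Shell Scripts}.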
8571
8572@node bash, Python, sh, List of Programming Languages
8573@subsection bash - Bourne-Again Shell Script
8574@cindex bash
8575
8576GNU @code{bash} 2.0 or newer has a special shorthand for translating a
8577string and substituting variable values in it: @code{$"msgid"}.  But
8578the use of this construct is @strong{discouraged}, due to the security
8579holes it opens and due to its portability problems.
8580
8581The security holes of @code{$"..."} come from the fact that after looking up
8582the translation of the string, @code{bash} processes it like it processes
8583any double-quoted string: dollar and backquote processing, like @samp{eval}
8584does.
8585
8586@enumerate
8587@item
8588In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
8589JOHAB, some double-byte characters have a second byte whose value is
8590@code{0x60}.  For example, the byte sequence @code{\xe0\x60} is a single
character in these locales.  Many versions of @code{bash} (all versions
up to bash-2.05, and newer versions on platforms without the
@code{mbsrtowcs()} function) don't know about character boundaries and see a
backquote character where there is only a particular Chinese character.  Thus
@code{bash} can start executing part of the translation as a command list.  This situation can occur
8596even without the translator being aware of it: if the translator provides
8597translations in the UTF-8 encoding, it is the @code{gettext()} function which
8598will, during its conversion from the translator's encoding to the user's
8599locale's encoding, produce the dangerous @code{\x60} bytes.
8600
8601@item
8602A translator could - voluntarily or inadvertently - use backquotes
8603@code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations.
8604The enclosed strings would be executed as command lists by the shell.
8605@end enumerate
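
The second hole can be illustrated without @code{bash} or a message
catalog.  The following sketch is an illustration, not a description of how
@code{bash} is implemented: the variable @code{translation} stands in for a
catalog entry supplied by a translator, and @code{eval} reproduces the
double-quote processing described above.  All variable names are made up
for this example.

@smallexample
# A stand-in for a translated string that contains a command substitution.
translation='Remaining files: $(echo INJECTED)'

# Treated as plain data, the text is passed through unchanged.
safe=$(printf '%s' "$translation")

# Processed like a double-quoted string, the embedded command is executed.
unsafe=$(eval "printf '%s' \"$translation\"")

echo "$safe"
echo "$unsafe"
@end smallexample

@noindent
The first @code{echo} prints the translation verbatim, while the second
prints the output of the embedded @code{echo} command in its place.  By
contrast, @code{eval_gettext} performs only dollar-substitution of the shell
variables mentioned in the @var{msgid} (@pxref{eval_gettext Invocation}).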
8606
8607The portability problem is that @code{bash} must be built with
8608internationalization support; this is normally not the case on systems
8609that don't have the @code{gettext()} function in libc.
8610
8611@node Python, Common Lisp, bash, List of Programming Languages
8612@subsection Python
8613@cindex Python
8614
8615@table @asis
8616@item RPMs
8617python
8618
8619@item File extension
8620@code{py}
8621
8622@item String syntax
8623@code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'},
8624@*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"},
8625@*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''},
8626@*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""}
8627
8628@item gettext shorthand
8629@code{_('abc')} etc.
8630
8631@item gettext/ngettext functions
8632@code{gettext.gettext}, @code{gettext.dgettext},
8633@code{gettext.ngettext}, @code{gettext.dngettext},
8634also @code{ugettext}, @code{ungettext}
8635
8636@item textdomain
8637@code{gettext.textdomain} function, or
8638@code{gettext.install(@var{domain})} function
8639
8640@item bindtextdomain
8641@code{gettext.bindtextdomain} function, or
8642@code{gettext.install(@var{domain},@var{localedir})} function
8643
8644@item setlocale
8645not used by the gettext emulation
8646
8647@item Prerequisite
8648@code{import gettext}
8649
8650@item Use or emulate GNU gettext
8651emulate
8652
8653@item Extractor
8654@code{xgettext}
8655
8656@item Formatting with positions
8657@code{'...%(ident)d...' % @{ 'ident': value @}}
8658
8659@item Portability
8660fully portable
8661
8662@item po-mode marking
8663---
8664@end table
8665
8666An example is available in the @file{examples} directory: @code{hello-python}.
8667
8668@node Common Lisp, clisp C, Python, List of Programming Languages
8669@subsection GNU clisp - Common Lisp
8670@cindex Common Lisp
8671@cindex Lisp
8672@cindex clisp
8673
8674@table @asis
8675@item RPMs
8676clisp 2.28 or newer
8677
8678@item File extension
8679@code{lisp}
8680
8681@item String syntax
8682@code{"abc"}
8683
8684@item gettext shorthand
8685@code{(_ "abc")}, @code{(ENGLISH "abc")}
8686
8687@item gettext/ngettext functions
8688@code{i18n:gettext}, @code{i18n:ngettext}
8689
8690@item textdomain
8691@code{i18n:textdomain}
8692
8693@item bindtextdomain
8694@code{i18n:textdomaindir}
8695
8696@item setlocale
8697automatic
8698
8699@item Prerequisite
8700---
8701
8702@item Use or emulate GNU gettext
8703use
8704
8705@item Extractor
8706@code{xgettext -k_ -kENGLISH}
8707
8708@item Formatting with positions
8709@code{format "~1@@*~D ~0@@*~D"}
8710
8711@item Portability
8712On platforms without gettext, no translation.
8713
8714@item po-mode marking
8715---
8716@end table
8717
8718An example is available in the @file{examples} directory: @code{hello-clisp}.
8719
8720@node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages
8721@subsection GNU clisp C sources
8722@cindex clisp C sources
8723
8724@table @asis
8725@item RPMs
8726clisp
8727
8728@item File extension
8729@code{d}
8730
8731@item String syntax
8732@code{"abc"}
8733
8734@item gettext shorthand
8735@code{ENGLISH ? "abc" : ""}
8736@*@code{GETTEXT("abc")}
8737@*@code{GETTEXTL("abc")}
8738
8739@item gettext/ngettext functions
8740@code{clgettext}, @code{clgettextl}
8741
8742@item textdomain
8743---
8744
8745@item bindtextdomain
8746---
8747
8748@item setlocale
8749automatic
8750
8751@item Prerequisite
8752@code{#include "lispbibl.c"}
8753
8754@item Use or emulate GNU gettext
8755use
8756
8757@item Extractor
8758@code{clisp-xgettext}
8759
8760@item Formatting with positions
8761@code{fprintf "%2$d %1$d"}
8762
8763@item Portability
8764On platforms without gettext, no translation.
8765
8766@item po-mode marking
8767---
8768@end table
8769
8770@node Emacs Lisp, librep, clisp C, List of Programming Languages
8771@subsection Emacs Lisp
8772@cindex Emacs Lisp
8773
8774@table @asis
8775@item RPMs
8776emacs, xemacs
8777
8778@item File extension
8779@code{el}
8780
8781@item String syntax
8782@code{"abc"}
8783
8784@item gettext shorthand
8785@code{(_"abc")}
8786
8787@item gettext/ngettext functions
8788@code{gettext}, @code{dgettext} (xemacs only)
8789
8790@item textdomain
8791@code{domain} special form (xemacs only)
8792
8793@item bindtextdomain
8794@code{bind-text-domain} function (xemacs only)
8795
8796@item setlocale
8797automatic
8798
8799@item Prerequisite
8800---
8801
8802@item Use or emulate GNU gettext
8803use
8804
8805@item Extractor
8806@code{xgettext}
8807
8808@item Formatting with positions
8809@code{format "%2$d %1$d"}
8810
8811@item Portability
8812Only XEmacs.  Without @code{I18N3} defined at build time, no translation.
8813
8814@item po-mode marking
8815---
8816@end table
8817
8818@node librep, Scheme, Emacs Lisp, List of Programming Languages
8819@subsection librep
8820@cindex @code{librep} Lisp
8821
8822@table @asis
8823@item RPMs
8824librep 0.15.3 or newer
8825
8826@item File extension
8827@code{jl}
8828
8829@item String syntax
8830@code{"abc"}
8831
8832@item gettext shorthand
8833@code{(_"abc")}
8834
8835@item gettext/ngettext functions
8836@code{gettext}
8837
8838@item textdomain
8839@code{textdomain} function
8840
8841@item bindtextdomain
8842@code{bindtextdomain} function
8843
8844@item setlocale
8845---
8846
8847@item Prerequisite
8848@code{(require 'rep.i18n.gettext)}
8849
8850@item Use or emulate GNU gettext
8851use
8852
8853@item Extractor
8854@code{xgettext}
8855
8856@item Formatting with positions
8857@code{format "%2$d %1$d"}
8858
8859@item Portability
8860On platforms without gettext, no translation.
8861
8862@item po-mode marking
8863---
8864@end table
8865
8866An example is available in the @file{examples} directory: @code{hello-librep}.
8867
8868@node Scheme, Smalltalk, librep, List of Programming Languages
8869@subsection GNU guile - Scheme
8870@cindex Scheme
8871@cindex guile
8872
8873@table @asis
8874@item RPMs
8875guile
8876
8877@item File extension
8878@code{scm}
8879
8880@item String syntax
8881@code{"abc"}
8882
8883@item gettext shorthand
8884@code{(_ "abc")}
8885
8886@item gettext/ngettext functions
8887@code{gettext}, @code{ngettext}
8888
8889@item textdomain
8890@code{textdomain}
8891
8892@item bindtextdomain
8893@code{bindtextdomain}
8894
8895@item setlocale
8896@code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))}
8897
8898@item Prerequisite
8899@code{(use-modules (ice-9 format))}
8900
8901@item Use or emulate GNU gettext
8902use
8903
8904@item Extractor
8905@code{xgettext -k_}
8906
8907@item Formatting with positions
8908@c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))}
8909@c not yet supported
8910---
8911
8912@item Portability
8913On platforms without gettext, no translation.
8914
8915@item po-mode marking
8916---
8917@end table
8918
8919An example is available in the @file{examples} directory: @code{hello-guile}.
8920
8921@node Smalltalk, Java, Scheme, List of Programming Languages
8922@subsection GNU Smalltalk
8923@cindex Smalltalk
8924
8925@table @asis
8926@item RPMs
8927smalltalk
8928
8929@item File extension
8930@code{st}
8931
8932@item String syntax
8933@code{'abc'}
8934
8935@item gettext shorthand
8936@code{NLS ? 'abc'}
8937
8938@item gettext/ngettext functions
8939@code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:}
8940
8941@item textdomain
8942@code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain}
8943object).@*
Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'}
8945
8946@item bindtextdomain
8947@code{LcMessages>>#domain:localeDirectory:}, see above.
8948
8949@item setlocale
8950Automatic if you use @code{I18N Locale default}.
8951
8952@item Prerequisite
8953@code{PackageLoader fileInPackage: 'I18N'!}
8954
8955@item Use or emulate GNU gettext
8956emulate
8957
8958@item Extractor
8959@code{xgettext}
8960
8961@item Formatting with positions
8962@code{'%1 %2' bindWith: 'Hello' with: 'world'}
8963
8964@item Portability
8965fully portable
8966
8967@item po-mode marking
8968---
8969@end table
8970
8971An example is available in the @file{examples} directory:
8972@code{hello-smalltalk}.
8973
8974@node Java, C#, Smalltalk, List of Programming Languages
8975@subsection Java
8976@cindex Java
8977
8978@table @asis
8979@item RPMs
8980java, java2
8981
8982@item File extension
8983@code{java}
8984
8985@item String syntax
@code{"abc"}
8987
8988@item gettext shorthand
@code{_("abc")}
8990
8991@item gettext/ngettext functions
8992@code{GettextResource.gettext}, @code{GettextResource.ngettext}
8993
8994@item textdomain
---, use @code{ResourceBundle.getBundle} instead
8996
8997@item bindtextdomain
8998---, use CLASSPATH instead
8999
9000@item setlocale
9001automatic
9002
9003@item Prerequisite
9004---
9005
9006@item Use or emulate GNU gettext
9007---, uses a Java specific message catalog format
9008
9009@item Extractor
9010@code{xgettext -k_}
9011
9012@item Formatting with positions
9013@code{MessageFormat.format "@{1,number@} @{0,number@}"}
9014
9015@item Portability
9016fully portable
9017
9018@item po-mode marking
9019---
9020@end table
9021
9022Before marking strings as internationalizable, uses of the string
9023concatenation operator need to be converted to @code{MessageFormat}
9024applications.  For example, @code{"file "+filename+" not found"} becomes
9025@code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}.
Only after this is done can the strings be marked and extracted.
9027
9028GNU gettext uses the native Java internationalization mechanism, namely
9029@code{ResourceBundle}s.  There are two formats of @code{ResourceBundle}s:
9030@code{.properties} files and @code{.class} files.  The @code{.properties}
9031format is a text file which the translators can directly edit, like PO
files, but which doesn't support plural forms.  The @code{.class}
format, by contrast, is compiled from @code{.java} source code and can support
plural forms (provided it is accessed through an appropriate API, see below).
9035
9036To convert a PO file to a @code{.properties} file, the @code{msgcat}
9037program can be used with the option @code{--properties-output}.  To convert
9038a @code{.properties} file back to a PO file, the @code{msgcat} program
9039can be used with the option @code{--properties-input}.  All the tools
9040that manipulate PO files can work with @code{.properties} files as well,
9041if given the @code{--properties-input} and/or @code{--properties-output}
9042option.
9043
9044To convert a PO file to a ResourceBundle class, the @code{msgfmt} program
9045can be used with the option @code{--java} or @code{--java2}.  To convert a
9046ResourceBundle back to a PO file, the @code{msgunfmt} program can be used
9047with the option @code{--java}.
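
The @code{.properties} round trip can be sketched as a short shell session.
This assumes that the GNU gettext tools are installed; the file names, the
temporary directory and the sample message are hypothetical and serve only
as an illustration.

@smallexample
# Only proceed if the gettext tools are installed.
if command -v msgcat >/dev/null 2>&1; then
  workdir=`mktemp -d`

  # A tiny PO file with a header entry and one translated message.
  cat > "$workdir/fr.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Operation completed."
msgstr "Operation terminee."
EOF

  # PO file to .properties file, and back to a PO file.
  msgcat --properties-output -o "$workdir/fr.properties" "$workdir/fr.po"
  msgcat --properties-input -o "$workdir/roundtrip.po" \
         "$workdir/fr.properties"

  cat "$workdir/fr.properties"
fi
@end smallexample

@noindent
Converting to a ResourceBundle class with @code{msgfmt --java} or
@code{msgfmt --java2} follows the same pattern but additionally needs a Java
compiler, so it is not shown here.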
9048
9049Two different programmatic APIs can be used to access ResourceBundles.
9050Note that both APIs work with all kinds of ResourceBundles, whether
9051GNU gettext generated classes, or other @code{.class} or @code{.properties}
9052files.
9053
9054@enumerate
9055@item
9056The @code{java.util.ResourceBundle} API.
9057
9058In particular, its @code{getString} function returns a string translation.
9059Note that a missing translation yields a @code{MissingResourceException}.
9060
9061This has the advantage of being the standard API.  And it does not require
9062any additional libraries, only the @code{msgcat} generated @code{.properties}
9063files or the @code{msgfmt} generated @code{.class} files.  But it cannot do
9064plural handling, even if the resource was generated by @code{msgfmt} from
9065a PO file with plural handling.
9066
9067@item
9068The @code{gnu.gettext.GettextResource} API.
9069
9070Reference documentation in Javadoc 1.1 style format
9071is in the @uref{javadoc1/tree.html,javadoc1 directory} and
9072in Javadoc 2 style format
9073in the @uref{javadoc2/index.html,javadoc2 directory}.
9074
9075Its @code{gettext} function returns a string translation.  Note that when
9076a translation is missing, the @var{msgid} argument is returned unchanged.
9077
9078This has the advantage of having the @code{ngettext} function for plural
9079handling.
9080
9081@cindex @code{libintl} for Java
9082To use this API, one needs the @code{libintl.jar} file which is part of
9083the GNU gettext package and distributed under the LGPL.
9084@end enumerate
9085
9086Three examples, using the second API, are available in the @file{examples}
9087directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}.
9088
9089Now, to make use of the API and define a shorthand for @samp{getString},
9090there are three idioms that you can choose from:
9091
9092@itemize @bullet
9093@item
9094(This one assumes Java 1.5 or newer.)
9095In a unique class of your project, say @samp{Util}, define a static variable
9096holding the @code{ResourceBundle} instance and the shorthand:
9097
9098@smallexample
9099private static ResourceBundle myResources =
9100  ResourceBundle.getBundle("domain-name");
9101public static String _(String s) @{
9102  return myResources.getString(s);
9103@}
9104@end smallexample
9105
9106All classes containing internationalized strings then contain
9107
9108@smallexample
9109import static Util._;
9110@end smallexample
9111
9112@noindent
9113and the shorthand is used like this:
9114
9115@smallexample
9116System.out.println(_("Operation completed."));
9117@end smallexample
9118
9119@item
9120In a unique class of your project, say @samp{Util}, define a static variable
9121holding the @code{ResourceBundle} instance:
9122
9123@smallexample
9124public static ResourceBundle myResources =
9125  ResourceBundle.getBundle("domain-name");
9126@end smallexample
9127
9128All classes containing internationalized strings then contain
9129
9130@smallexample
9131private static ResourceBundle res = Util.myResources;
9132private static String _(String s) @{ return res.getString(s); @}
9133@end smallexample
9134
9135@noindent
9136and the shorthand is used like this:
9137
9138@smallexample
9139System.out.println(_("Operation completed."));
9140@end smallexample
9141
9142@item
9143You add a class with a very short name, say @samp{S}, containing just the
9144definition of the resource bundle and of the shorthand:
9145
9146@smallexample
9147public class S @{
9148  public static ResourceBundle myResources =
9149    ResourceBundle.getBundle("domain-name");
9150  public static String _(String s) @{
9151    return myResources.getString(s);
9152  @}
9153@}
9154@end smallexample
9155
9156@noindent
9157and the shorthand is used like this:
9158
9159@smallexample
9160System.out.println(S._("Operation completed."));
9161@end smallexample
9162@end itemize
9163
Which of the three idioms you choose depends on whether your project
requires portability to Java versions prior to Java 1.5 and, if so, whether
copying two lines of code into every class is more acceptable in your project
than a class with a single-letter name.
9168
9169@node C#, gawk, Java, List of Programming Languages
9170@subsection C#
9171@cindex C#
9172
9173@table @asis
9174@item RPMs
9175pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
9176
9177@item File extension
9178@code{cs}
9179
9180@item String syntax
9181@code{"abc"}, @code{@@"abc"}
9182
9183@item gettext shorthand
@code{_("abc")}
9185
9186@item gettext/ngettext functions
9187@code{GettextResourceManager.GetString},
9188@code{GettextResourceManager.GetPluralString}
9189
9190@item textdomain
9191@code{new GettextResourceManager(domain)}
9192
9193@item bindtextdomain
9194---, compiled message catalogs are located in subdirectories of the directory
9195containing the executable
9196
9197@item setlocale
9198automatic
9199
9200@item Prerequisite
9201---
9202
9203@item Use or emulate GNU gettext
9204---, uses a C# specific message catalog format
9205
9206@item Extractor
9207@code{xgettext -k_}
9208
9209@item Formatting with positions
9210@code{String.Format "@{1@} @{0@}"}
9211
9212@item Portability
9213fully portable
9214
9215@item po-mode marking
9216---
9217@end table
9218
9219Before marking strings as internationalizable, uses of the string
9220concatenation operator need to be converted to @code{String.Format}
9221invocations.  For example, @code{"file "+filename+" not found"} becomes
9222@code{String.Format("file @{0@} not found", filename)}.
Only after this is done can the strings be marked and extracted.
9224
9225GNU gettext uses the native C#/.NET internationalization mechanism, namely
9226the classes @code{ResourceManager} and @code{ResourceSet}.  Applications
9227use the @code{ResourceManager} methods to retrieve the native language
9228translation of strings.  An instance of @code{ResourceSet} is the in-memory
9229representation of a message catalog file.  The @code{ResourceManager} loads
9230and accesses @code{ResourceSet} instances as needed to look up the
9231translations.
9232
9233There are two formats of @code{ResourceSet}s that can be directly loaded by
9234the C# runtime: @code{.resources} files and @code{.dll} files.
9235
9236@itemize @bullet
9237@item
9238The @code{.resources} format is a binary file usually generated through the
9239@code{resgen} or @code{monoresgen} utility, but which doesn't support plural
9240forms.  @code{.resources} files can also be embedded in .NET @code{.exe} files.
9241This only affects whether a file system access is performed to load the message
9242catalog; it doesn't affect the contents of the message catalog.
9243
9244@item
9245On the other hand, the @code{.dll} format is a binary file that is compiled
9246from @code{.cs} source code and can support plural forms (provided it is
9247accessed through the GNU gettext API, see below).
9248@end itemize
9249
9250Note that these .NET @code{.dll} and @code{.exe} files are not tied to a
9251particular platform; their file format and GNU gettext for C# can be used
9252on any platform.
9253
9254To convert a PO file to a @code{.resources} file, the @code{msgfmt} program
9255can be used with the option @samp{--csharp-resources}.  To convert a
9256@code{.resources} file back to a PO file, the @code{msgunfmt} program can be
9257used with the option @samp{--csharp-resources}.  You can also, in some cases,
9258use the @code{resgen} program (from the @code{pnet} package) or the
9259@code{monoresgen} program (from the @code{mono}/@code{mcs} package).  These
9260programs can also convert a @code{.resources} file back to a PO file.  But
9261beware: as of this writing (January 2004), the @code{monoresgen} converter is
9262quite buggy and the @code{resgen} converter ignores the encoding of the PO
9263files.
9264
9265To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be
9266used with the option @code{--csharp}.  The result will be a @code{.dll} file
9267containing a subclass of @code{GettextResourceSet}, which itself is a subclass
9268of @code{ResourceSet}.  To convert a @code{.dll} file containing a
9269@code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt}
9270program can be used with the option @code{--csharp}.
9271
9272The advantages of the @code{.dll} format over the @code{.resources} format
9273are:
9274
9275@enumerate
9276@item
9277Freedom to localize: Users can add their own translations to an application
9278after it has been built and distributed.  Whereas when the programmer uses
9279a @code{ResourceManager} constructor provided by the system, the set of
9280@code{.resources} files for an application must be specified when the
9281application is built and cannot be extended afterwards.
9282@c If this were the only issue with the @code{.resources} format, one could
9283@c use the @code{ResourceManager.CreateFileBasedResourceManager} function.
9284
9285@item
9286Plural handling: A message catalog in @code{.dll} format supports the plural
9287handling function @code{GetPluralString}.  Whereas @code{.resources} files can
9288only contain data and only support lookups that depend on a single string.
9289
9290@item
9291The @code{GettextResourceManager} that loads the message catalogs in
9292@code{.dll} format also provides for inheritance on a per-message basis.
9293For example, in Austrian (@code{de_AT}) locale, translations from the German
9294(@code{de}) message catalog will be used for messages not found in the
9295Austrian message catalog.  This has the consequence that the Austrian
9296translators need only translate those few messages for which the translation
9297into Austrian differs from the German one.  Whereas when working with
9298@code{.resources} files, each message catalog must provide the translations
9299of all messages by itself.
9300
9301@item
9302The @code{GettextResourceManager} that loads the message catalogs in
@code{.dll} format also provides for a fallback: The English @var{msgid} is
returned when no translation can be found.  By contrast, when working with
@code{.resources} files, a language-neutral @code{.resources} file must
explicitly be provided as a fallback.
9307@end enumerate
9308
On the side of the programmatic APIs, the programmer can use either the
standard @code{ResourceManager} API or the GNU @code{GettextResourceManager}
API.  The latter is an extension of the former, because
9312@code{GettextResourceManager} is a subclass of @code{ResourceManager}.
9313
9314@enumerate
9315@item
9316The @code{System.Resources.ResourceManager} API.
9317
9318This API works with resources in @code{.resources} format.
9319
9320The creation of the @code{ResourceManager} is done through
9321@smallexample
9322  new ResourceManager(domainname, Assembly.GetExecutingAssembly())
9323@end smallexample
9324@noindent
9325
9326The @code{GetString} function returns a string's translation.  Note that this
9327function returns null when a translation is missing (i.e.@: not even found in
9328the fallback resource file).
9329
9330@item
9331The @code{GNU.Gettext.GettextResourceManager} API.
9332
9333This API works with resources in @code{.dll} format.
9334
9335Reference documentation is in the
9336@uref{csharpdoc/index.html,csharpdoc directory}.
9337
9338The creation of the @code{ResourceManager} is done through
9339@smallexample
9340  new GettextResourceManager(domainname)
9341@end smallexample
9342
9343The @code{GetString} function returns a string's translation.  Note that when
9344a translation is missing, the @var{msgid} argument is returned unchanged.
9345
9346The @code{GetPluralString} function returns a string translation with plural
9347handling, like the @code{ngettext} function in C.
9348
9349@cindex @code{libintl} for C#
9350To use this API, one needs the @code{GNU.Gettext.dll} file which is part of
9351the GNU gettext package and distributed under the LGPL.
9352@end enumerate
9353
9354You can also mix both approaches: use the
9355@code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use
9356only the @code{ResourceManager} type and only the @code{GetString} method.
9357This is appropriate when you want to profit from the tools for PO files,
9358but don't want to change an existing source code that uses
9359@code{ResourceManager} and don't (yet) need the @code{GetPluralString} method.
9360
9361Two examples, using the second API, are available in the @file{examples}
9362directory: @code{hello-csharp}, @code{hello-csharp-forms}.
9363
9364Now, to make use of the API and define a shorthand for @samp{GetString},
9365there are two idioms that you can choose from:
9366
9367@itemize @bullet
9368@item
9369In a unique class of your project, say @samp{Util}, define a static variable
9370holding the @code{ResourceManager} instance:
9371
9372@smallexample
9373public static GettextResourceManager MyResourceManager =
9374  new GettextResourceManager("domain-name");
9375@end smallexample
9376
9377All classes containing internationalized strings then contain
9378
9379@smallexample
9380private static GettextResourceManager Res = Util.MyResourceManager;
9381private static String _(String s) @{ return Res.GetString(s); @}
9382@end smallexample
9383
9384@noindent
9385and the shorthand is used like this:
9386
9387@smallexample
9388Console.WriteLine(_("Operation completed."));
9389@end smallexample
9390
9391@item
9392You add a class with a very short name, say @samp{S}, containing just the
9393definition of the resource manager and of the shorthand:
9394
9395@smallexample
9396public class S @{
9397  public static GettextResourceManager MyResourceManager =
9398    new GettextResourceManager("domain-name");
9399  public static String _(String s) @{
9400     return MyResourceManager.GetString(s);
9401  @}
9402@}
9403@end smallexample
9404
9405@noindent
9406and the shorthand is used like this:
9407
9408@smallexample
9409Console.WriteLine(S._("Operation completed."));
9410@end smallexample
9411@end itemize
9412
Which of the two idioms you choose will depend on whether copying two lines
of code into every class is more acceptable in your project than a class
with a single-letter name.
9416
9417@node gawk, Pascal, C#, List of Programming Languages
9418@subsection GNU awk
9419@cindex awk
9420@cindex gawk
9421
9422@table @asis
9423@item RPMs
9424gawk 3.1 or newer
9425
9426@item File extension
9427@code{awk}
9428
9429@item String syntax
9430@code{"abc"}
9431
9432@item gettext shorthand
9433@code{_"abc"}
9434
9435@item gettext/ngettext functions
9436@code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0
9437
9438@item textdomain
9439@code{TEXTDOMAIN} variable
9440
9441@item bindtextdomain
9442@code{bindtextdomain} function
9443
9444@item setlocale
9445automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0
9446
9447@item Prerequisite
9448---
9449
9450@item Use or emulate GNU gettext
9451use
9452
9453@item Extractor
9454@code{xgettext}
9455
9456@item Formatting with positions
9457@code{printf "%2$d %1$d"} (GNU awk only)
9458
9459@item Portability
9460On platforms without gettext, no translation.  On non-GNU awks, you must
9461define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain}
9462yourself.
9463
9464@item po-mode marking
9465---
9466@end table
9467
9468An example is available in the @file{examples} directory: @code{hello-gawk}.
9469
9470@node Pascal, wxWidgets, gawk, List of Programming Languages
9471@subsection Pascal - Free Pascal Compiler
9472@cindex Pascal
9473@cindex Free Pascal
9474@cindex Object Pascal
9475
9476@table @asis
9477@item RPMs
9478fpk
9479
9480@item File extension
9481@code{pp}, @code{pas}
9482
9483@item String syntax
9484@code{'abc'}
9485
9486@item gettext shorthand
9487automatic
9488
9489@item gettext/ngettext functions
9490---, use @code{ResourceString} data type instead
9491
9492@item textdomain
9493---, use @code{TranslateResourceStrings} function instead
9494
9495@item bindtextdomain
9496---, use @code{TranslateResourceStrings} function instead
9497
9498@item setlocale
9499automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
9500
9501@item Prerequisite
9502@code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;}
9503
9504@item Use or emulate GNU gettext
9505emulate partially
9506
9507@item Extractor
9508@code{ppc386} followed by @code{xgettext} or @code{rstconv}
9509
9510@item Formatting with positions
9511@code{uses sysutils;}@*@code{format "%1:d %0:d"}
9512
9513@item Portability
9514?
9515
9516@item po-mode marking
9517---
9518@end table
9519
9520The Pascal compiler has special support for the @code{ResourceString} data
9521type.  It generates a @code{.rst} file.  This is then converted to a
9522@code{.pot} file by use of @code{xgettext} or @code{rstconv}.  At runtime,
9523a @code{.mo} file corresponding to translations of this @code{.pot} file
9524can be loaded using the @code{TranslateResourceStrings} function in the
9525@code{gettext} unit.
9526
9527An example is available in the @file{examples} directory: @code{hello-pascal}.
9528
9529@node wxWidgets, YCP, Pascal, List of Programming Languages
9530@subsection wxWidgets library
9531@cindex @code{wxWidgets} library
9532
9533@table @asis
9534@item RPMs
9535wxGTK, gettext
9536
9537@item File extension
9538@code{cpp}
9539
9540@item String syntax
9541@code{"abc"}
9542
9543@item gettext shorthand
9544@code{_("abc")}
9545
9546@item gettext/ngettext functions
9547@code{wxLocale::GetString}, @code{wxGetTranslation}
9548
9549@item textdomain
9550@code{wxLocale::AddCatalog}
9551
9552@item bindtextdomain
9553@code{wxLocale::AddCatalogLookupPathPrefix}
9554
9555@item setlocale
9556@code{wxLocale::Init}, @code{wxSetLocale}
9557
9558@item Prerequisite
9559@code{#include <wx/intl.h>}
9560
9561@item Use or emulate GNU gettext
9562emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp}
9563
9564@item Extractor
9565@code{xgettext}
9566
9567@item Formatting with positions
@code{wxString::Format} supports positions if and only if the system has
9569@code{wprintf()}, @code{vswprintf()} functions and they support positions
9570according to POSIX.
9571
9572@item Portability
9573fully portable
9574
9575@item po-mode marking
9576yes
9577@end table
9578
9579@node YCP, Tcl, wxWidgets, List of Programming Languages
9580@subsection YCP - YaST2 scripting language
9581@cindex YCP
9582@cindex YaST2 scripting language
9583
9584@table @asis
9585@item RPMs
9586libycp, libycp-devel, yast2-core, yast2-core-devel
9587
9588@item File extension
9589@code{ycp}
9590
9591@item String syntax
9592@code{"abc"}
9593
9594@item gettext shorthand
9595@code{_("abc")}
9596
9597@item gettext/ngettext functions
9598@code{_()} with 1 or 3 arguments
9599
9600@item textdomain
9601@code{textdomain} statement
9602
9603@item bindtextdomain
9604---
9605
9606@item setlocale
9607---
9608
9609@item Prerequisite
9610---
9611
9612@item Use or emulate GNU gettext
9613use
9614
9615@item Extractor
9616@code{xgettext}
9617
9618@item Formatting with positions
9619@code{sformat "%2 %1"}
9620
9621@item Portability
9622fully portable
9623
9624@item po-mode marking
9625---
9626@end table
9627
9628An example is available in the @file{examples} directory: @code{hello-ycp}.
9629
9630@node Tcl, Perl, YCP, List of Programming Languages
9631@subsection Tcl - Tk's scripting language
9632@cindex Tcl
9633@cindex Tk's scripting language
9634
9635@table @asis
9636@item RPMs
9637tcl
9638
9639@item File extension
9640@code{tcl}
9641
9642@item String syntax
9643@code{"abc"}
9644
9645@item gettext shorthand
9646@code{[_ "abc"]}
9647
9648@item gettext/ngettext functions
9649@code{::msgcat::mc}
9650
9651@item textdomain
9652---
9653
9654@item bindtextdomain
9655---, use @code{::msgcat::mcload} instead
9656
9657@item setlocale
9658automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
9659
9660@item Prerequisite
9661@code{package require msgcat}
9662@*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}}
9663
9664@item Use or emulate GNU gettext
9665---, uses a Tcl specific message catalog format
9666
9667@item Extractor
9668@code{xgettext -k_}
9669
9670@item Formatting with positions
9671@code{format "%2\$d %1\$d"}
9672
9673@item Portability
9674fully portable
9675
9676@item po-mode marking
9677---
9678@end table
9679
9680Two examples are available in the @file{examples} directory:
9681@code{hello-tcl}, @code{hello-tcl-tk}.
9682
9683Before marking strings as internationalizable, substitutions of variables
9684into the string need to be converted to @code{format} applications.  For
9685example, @code{"file $filename not found"} becomes
9686@code{[format "file %s not found" $filename]}.
Only after this is done can the strings be marked and extracted.
9688After marking, this example becomes
9689@code{[format [_ "file %s not found"] $filename]} or
9690@code{[msgcat::mc "file %s not found" $filename]}.  Note that the
9691@code{msgcat::mc} function implicitly calls @code{format} when more than one
9692argument is given.
9693
9694@node Perl, PHP, Tcl, List of Programming Languages
9695@subsection Perl
9696@cindex Perl
9697
9698@table @asis
9699@item RPMs
9700perl
9701
9702@item File extension
9703@code{pl}, @code{PL}, @code{pm}, @code{cgi}
9704
9705@item String syntax
9706@itemize @bullet
9707
9708@item @code{"abc"}
9709
9710@item @code{'abc'}
9711
9712@item @code{qq (abc)}
9713
9714@item @code{q (abc)}
9715
9716@item @code{qr /abc/}
9717
9718@item @code{qx (/bin/date)}
9719
9720@item @code{/pattern match/}
9721
9722@item @code{?pattern match?}
9723
9724@item @code{s/substitution/operators/}
9725
9726@item @code{$tied_hash@{"message"@}}
9727
9728@item @code{$tied_hash_reference->@{"message"@}}
9729
9730@item etc., issue the command @samp{man perlsyn} for details
9731
9732@end itemize
9733
9734@item gettext shorthand
9735@code{__} (double underscore)
9736
9737@item gettext/ngettext functions
9738@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
9739@code{dngettext}, @code{dcngettext}
9740
9741@item textdomain
9742@code{textdomain} function
9743
9744@item bindtextdomain
9745@code{bindtextdomain} function
9746
9747@item bind_textdomain_codeset 
9748@code{bind_textdomain_codeset} function
9749
9750@item setlocale
9751Use @code{setlocale (LC_ALL, "");}
9752
9753@item Prerequisite
9754@code{use POSIX;}
9755@*@code{use Locale::TextDomain;} (included in the package libintl-perl
9756which is available on the Comprehensive Perl Archive Network CPAN,
9757http://www.cpan.org/).
9758
9759@item Use or emulate GNU gettext
9760platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
9761
9762@item Extractor
9763@code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k}
9764
9765@item Formatting with positions
9766Both kinds of format strings support formatting with positions.
9767@*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer)
9768@*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)}
9769
9770@item Portability
9771The @code{libintl-perl} package is platform independent but is not
9772part of the Perl core.  The programmer is responsible for
9773providing a dummy implementation of the required functions if the 
9774package is not installed on the target system.
9775
9776@item po-mode marking
9777---
9778
9779@item Documentation
9780Included in @code{libintl-perl}, available on CPAN
9781(http://www.cpan.org/).
9782
9783@end table
9784
9785An example is available in the @file{examples} directory: @code{hello-perl}.
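
The portability note above asks the programmer to supply a dummy
implementation when @code{libintl-perl} is not installed.  The following is
a minimal sketch of one way to do that, not an official recipe.  It assumes
a text domain called @samp{my-package} (a made-up name) and covers only the
@code{__} shorthand; a real program would need a fallback for every function
it uses.

@example
use strict;
use warnings;

BEGIN @{
    # Try to load Locale::TextDomain and bind the hypothetical
    # text domain "my-package".
    my $loaded = eval @{
        require Locale::TextDomain;
        Locale::TextDomain->import ('my-package');
        1;
    @};
    unless ($loaded) @{
        # Fallback: return the untranslated message unchanged.
        *__ = sub @{ return $_[0] @};
    @}
@}

print __("Operation completed."), "\n";
@end example

@noindent
Doing the work in a @code{BEGIN} block means that @code{__} is already
defined when the rest of the file is compiled, whichever branch was taken.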
9786
9787@cindex marking Perl sources
9788
9789The @code{xgettext} parser backend for Perl differs significantly from
9790the parser backends for other programming languages, just as Perl
9791itself differs significantly from other programming languages.  The
9792Perl parser backend offers many more string marking facilities than
the other backends, but it also has some Perl-specific limitations, the
worst probably being its imperfection.
9795
9796@menu
9797* General Problems::            General Problems Parsing Perl Code
9798* Default Keywords::            Which Keywords Will xgettext Look For?
9799* Special Keywords::            How to Extract Hash Keys
9800* Quote-like Expressions::      What are Strings And Quote-like Expressions?
9801* Interpolation I::             Invalid String Interpolation
9802* Interpolation II::            Valid String Interpolation
9803* Parentheses::                 When To Use Parentheses
* Long Lines::                  How To Cope with Long Lines
9805* Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
9806@end menu
9807
9808@node General Problems, Default Keywords,  , Perl
9809@subsubsection General Problems Parsing Perl Code
9810
It is often heard that only Perl can parse Perl.  This is not true.
Perl cannot be @emph{parsed} at all; it can only be @emph{executed}.
9813Perl has various built-in ambiguities that can only be resolved at runtime.
9814
9815The following example may illustrate one common problem:
9816
9817@example
9818print gettext "Hello World!";
9819@end example
9820
9821Although this example looks like a bullet-proof case of a function
9822invocation, it is not:
9823
9824@example
9825open gettext, ">testfile" or die;
9826print gettext "Hello world!"
9827@end example
9828
9829In this context, the string @code{gettext} looks more like a
9830file handle.  But not necessarily:
9831
9832@example
9833use Locale::Messages qw (:libintl_h);
9834open gettext ">testfile" or die;
9835print gettext "Hello world!";
9836@end example
9837
9838Now, the file is probably syntactically incorrect, provided that the module
9839@code{Locale::Messages} found first in the Perl include path exports a
9840function @code{gettext}.  But what if the module
9841@code{Locale::Messages} really looks like this?
9842
9843@example
9844use vars qw (*gettext);
9845
98461;
9847@end example
9848
9849In this case, the string @code{gettext} will be interpreted as a file
9850handle again, and the above example will create a file @file{testfile}
9851and write the string ``Hello world!'' into it.  Even advanced
9852control flow analysis will not really help:
9853
9854@example
9855if (0.5 < rand) @{
9856   eval "use Sane";
9857@} else @{
9858   eval "use InSane";
9859@}
9860print gettext "Hello world!";
9861@end example
9862
9863If the module @code{Sane} exports a function @code{gettext} that does
9864what we expect, and the module @code{InSane} opens a file for writing
9865and associates the @emph{handle} @code{gettext} with this output
9866stream, we are clueless again about what will happen at runtime.  It is
9867completely unpredictable.  The truth is that Perl has so many ways to
9868fill its symbol table at runtime that it is impossible to interpret a
9869particular piece of code without executing it.
9870
9871Of course, @code{xgettext} will not execute your Perl sources while
9872scanning for translatable strings, but rather use heuristics in order
9873to guess what you meant.
9874
9875Another problem is the ambiguity of the slash and the question mark.
9876Their interpretation depends on the context:
9877
9878@example
9879# A pattern match.
9880print "OK\n" if /foobar/;
9881
9882# A division.
9883print 1 / 2;
9884
9885# Another pattern match.
9886print "OK\n" if ?foobar?;
9887
9888# Conditional.
9889print $x ? "foo" : "bar";
9890@end example
9891
9892The slash may either act as the division operator or introduce a
9893pattern match, whereas the question mark may act as the ternary
9894conditional operator or as a pattern match, too.  Other programming
9895languages like @code{awk} present similar problems, but the consequences of a
9896misinterpretation are particularly nasty with Perl sources.  In @code{awk}
9897for instance, a statement can never exceed one line and the parser
9898can recover from a parsing error at the next newline and interpret
9899the rest of the input stream correctly.  Perl is different, as a
9900pattern match is terminated by the next appearance of the delimiter
9901(the slash or the question mark) in the input stream, regardless of
9902the semantic context.  If a slash is really a division sign but
9903mis-interpreted as a pattern match, the rest of the input file is most
9904probably parsed incorrectly.
9905
9906If you find that @code{xgettext} fails to extract strings from
9907portions of your sources, you should therefore look out for slashes
9908and/or question marks preceding these sections.  You may have come
9909across a bug in @code{xgettext}'s Perl parser (and of course you
should report that bug).  In the meantime you should consider
reformulating your code in a manner less challenging to @code{xgettext}.
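
What counts as less challenging depends on the heuristics of the parser,
which are not specified here, so the following is only a sketch of the kind
of rewrite that may help: spell pattern matches with an explicit @code{m}
and a binding operator, avoid the @code{?pattern?} form, and let a division
sign follow a variable or a closing parenthesis.  The variable names are
made up for the illustration.

@example
use strict;
use warnings;

my $line  = "foobar";
my $total = 10;

# An explicit "m" and a binding operator leave no doubt that this
# is a pattern match rather than a division.
print "OK\n" if $line =~ m/foobar/;

# A slash that directly follows a variable or a closing parenthesis
# can only be a division.
my $half = ($total) / 2;
print "$half\n";
@end example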
9912
9913@node Default Keywords, Special Keywords, General Problems, Perl
9914@subsubsection Which keywords will xgettext look for?
9915@cindex Perl default keywords
9916
9917Unless you instruct @code{xgettext} otherwise by invoking it with one
9918of the options @code{--keyword} or @code{-k}, it will recognize the
9919following keywords in your Perl sources:
9920
9921@itemize @bullet
9922
9923@item @code{gettext}
9924
9925@item @code{dgettext}
9926
9927@item @code{dcgettext}
9928
9929@item @code{ngettext:1,2}
9930
9931The first (singular) and the second (plural) argument will be
9932extracted.
9933
9934@item @code{dngettext:1,2}
9935
9936The first (singular) and the second (plural) argument will be
9937extracted.
9938
9939@item @code{dcngettext:1,2}
9940
9941The first (singular) and the second (plural) argument will be
9942extracted.
9943
9944@item @code{gettext_noop}
9945
9946@item @code{%gettext}
9947
9948The keys of lookups into the hash @code{%gettext} will be extracted.
9949
9950@item @code{$gettext}
9951
9952The keys of lookups into the hash reference @code{$gettext} will be extracted.
9953
9954@end itemize
9955
9956@node Special Keywords, Quote-like Expressions, Default Keywords, Perl
9957@subsubsection How to Extract Hash Keys
9958@cindex Perl special keywords for hash-lookups
9959
9960Translating messages at runtime is normally performed by looking up the
9961original string in the translation database and returning the
9962translated version.  The ``natural'' Perl implementation is a hash
9963lookup, and, of course, @code{xgettext} supports such practice.
9964
9965@example
9966print __"Hello world!";
9967print $__@{"Hello world!"@};
9968print $__->@{"Hello world!"@};
9969print $$__@{"Hello world!"@};
9970@end example  
9971
9972The above four lines all do the same thing.  The Perl module 
9973@code{Locale::TextDomain} exports by default a hash @code{%__} that
9974is tied to the function @code{__()}.  It also exports a reference
9975@code{$__} to @code{%__}.
9976
If an argument to the @code{xgettext} option @code{--keyword}
(or @code{-k}) starts with a percent sign, the rest of the keyword is
interpreted as the name of a hash.  If it starts with a dollar
sign, the rest of the keyword is interpreted as a reference to a
hash.
9982
9983Note that you can omit the quotation marks (single or double) around
9984the hash key (almost) whenever Perl itself allows it:
9985
9986@example
9987print $gettext@{Error@};
9988@end example
9989
The exact rule is: you can omit the surrounding quotes when the hash
9991key is a valid C (!) identifier, i.e.@: when it starts with an
9992underscore or an ASCII letter and is followed by an arbitrary number
9993of underscores, ASCII letters or digits.  Other Unicode characters
9994are @emph{not} allowed, regardless of the @code{use utf8} pragma.
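
For programs that do not use @code{Locale::TextDomain}, a hash that
translates on lookup can be built with Perl's @code{tie} mechanism.  The
following is a rough sketch under stated assumptions: the package name
@code{My::GettextHash} and the helper @code{translate} are hypothetical, and
@code{translate} merely returns its argument where a real program would
call @code{gettext}.  The hash is named @code{%gettext} so that it matches
the default keywords.

@example
use strict;
use warnings;

package My::GettextHash;

require Tie::Hash;
our @@ISA = ('Tie::StdHash');

# Hypothetical stand-in; a real program would call gettext here.
sub translate @{ return $_[0] @}

# Every lookup into the tied hash goes through FETCH.
sub FETCH @{
    my ($self, $msgid) = @@_;
    return translate ($msgid);
@}

package main;

tie my %gettext, 'My::GettextHash';
my $gettext = \%gettext;

print $gettext@{"Hello world!"@}, "\n";
print $gettext->@{"Hello world!"@}, "\n";
print "$gettext@{Error@}\n";
@end example

@noindent
Because the lookup happens in @code{FETCH}, nothing needs to be stored in
the hash itself; the key is the message to translate.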
9995
9996@node Quote-like Expressions, Interpolation I, Special Keywords, Perl
9997@subsubsection What are Strings And Quote-like Expressions?
9998@cindex Perl quote-like expressions
9999
10000Perl offers a plethora of different string constructs.  Those that can
10001be used either as arguments to functions or inside braces for hash
10002lookups are generally supported by @code{xgettext}.  
10003
10004@itemize @bullet
10005@item @strong{double-quoted strings}
10006@*
10007@example
10008print gettext "Hello World!";
10009@end example
10010
10011@item @strong{single-quoted strings}
10012@*
10013@example
10014print gettext 'Hello World!';
10015@end example
10016
10017@item @strong{the operator qq}
10018@*
10019@example
10020print gettext qq |Hello World!|;
10021print gettext qq <E-mail: <guido\@@imperia.net>>;
10022@end example
10023
10024The operator @code{qq} is fully supported.  You can use arbitrary
10025delimiters, including the four bracketing delimiters (round, angle,
10026square, curly) that nest.
10027
10028@item @strong{the operator q}
10029@*
10030@example
10031print gettext q |Hello World!|;
10032print gettext q <E-mail: <guido@@imperia.net>>;
10033@end example
10034
10035The operator @code{q} is fully supported.  You can use arbitrary
10036delimiters, including the four bracketing delimiters (round, angle,
10037square, curly) that nest.
10038
10039@item @strong{the operator qx}
10040@*
10041@example
10042print gettext qx ;LANGUAGE=C /bin/date;
10043print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
10044@end example
10045
10046The operator @code{qx} is fully supported.  You can use arbitrary
10047delimiters, including the four bracketing delimiters (round, angle,
10048square, curly) that nest.
10049
10050The example is actually a useless use of @code{gettext}.  It will
10051invoke the @code{gettext} function on the output of the command
10052specified with the @code{qx} operator.  The feature was included
10053in order to make the interface consistent (the parser will extract
10054all strings and quote-like expressions).
10055
10056@item @strong{here documents}
10057@*
10058@example
10059@group
10060print gettext <<'EOF';
10061program not found in $PATH
10062EOF
10063
10064print ngettext <<EOF, <<"EOF";
10065one file deleted
10066EOF
10067several files deleted
10068EOF
10069@end group
10070@end example
10071
10072Here-documents are recognized.  If the delimiter is enclosed in single
10073quotes, the string is not interpolated.  If it is enclosed in double
10074quotes or has no quotes at all, the string is interpolated.
10075
10076Delimiters that start with a digit are not supported!
10077
10078@end itemize
10079
10080@node Interpolation I, Interpolation II, Quote-like Expressions, Perl
10081@subsubsection Invalid Uses Of String Interpolation
10082@cindex Perl invalid string interpolation
10083
10084Perl is capable of interpolating variables into strings.  This offers
10085some nice features in localized programs but can also lead to
10086problems.
10087
10088A common error is a construct like the following:
10089
10090@example
10091print gettext "This is the program $0!\n";
10092@end example
10093
10094Perl will interpolate at runtime the value of the variable @code{$0}
10095into the argument of the @code{gettext()} function.  Hence, this
10096argument is not a string constant but a variable argument (@code{$0}
10097is a global variable that holds the name of the Perl script being
10098executed).  The interpolation is performed by Perl before the string
10099argument is passed to @code{gettext()} and will therefore depend on
10100the name of the script which can only be determined at runtime.
10101Consequently, it is almost impossible that a translation can be looked
10102up at runtime (except if, by accident, the interpolated string is found
10103in the message catalog).
10104
10105The @code{xgettext} program will therefore terminate parsing with a fatal
10106error if it encounters a variable inside of an extracted string.  In
10107general, this will happen for all kinds of string interpolations that
10108cannot be safely performed at compile time.  If you absolutely know
10109what you are doing, you can always circumvent this behavior:
10110
10111@example
10112my $know_what_i_am_doing = "This is program $0!\n";
10113print gettext $know_what_i_am_doing;
10114@end example
10115
10116Since the parser only recognizes strings and quote-like expressions,
10117but not variables or other terms, the above construct will be
10118accepted.  You will have to find another way, however, to let your
10119original string make it into your message catalog.
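
One common alternative is to keep the variable out of the message
altogether and substitute it after translation with a format directive.
The sketch below uses a stand-in @code{gettext} that returns its argument
unchanged, only so that it runs without @code{libintl-perl}; a real program
would import @code{gettext} from @code{Locale::Messages} instead.

@example
use strict;
use warnings;

# Stand-in so that the sketch is self-contained.
sub gettext @{ return $_[0] @}

# The message is a constant string, so it can be extracted and
# looked up; $0 is filled in only after the translation.
printf gettext ("This is the program %s!\n"), $0;
@end example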
10120
If @code{xgettext} is invoked with the option @code{--extract-all}
(or @code{-a}), variable interpolation will be accepted.  Rationale: You will
generally use this option in order to prepare your sources for
internationalization.
10125
10126Please see the manual page @samp{man perlop} for details of strings and
10127quote-like expressions that are subject to interpolation and those
10128that are not.  Safe interpolations (that will not lead to a fatal
10129error) are:
10130
10131@itemize @bullet
10132
10133@item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
10134(newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
10135@code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
10136(escape, ESC).
10137
10138@item octal chars, like @code{\033}
10139@*
10140Note that octal escapes in the range of 400-777 are translated into a 
10141UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
10142
10143@item hex chars, like @code{\x1b}
10144
10145@item wide hex chars, like @code{\x@{263a@}}
10146@*
10147Note that this escape is translated into a UTF-8 representation,
10148regardless of the presence of the @code{use utf8} pragma.
10149
10150@item control chars, like @code{\c[} (CTRL-[)
10151
10152@item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
10153@*
10154Note that this escape is translated into a UTF-8 representation,
10155regardless of the presence of the @code{use utf8} pragma.
10156@end itemize
10157
10158The following escapes are considered partially safe:
10159
10160@itemize @bullet
10161
10162@item @code{\l} lowercase next char
10163
10164@item @code{\u} uppercase next char
10165
10166@item @code{\L} lowercase till \E
10167
10168@item @code{\U} uppercase till \E
10169
10170@item @code{\E} end case modification
10171
10172@item @code{\Q} quote non-word characters till \E
10173
10174@end itemize
10175
10176These escapes are only considered safe if the string consists of
10177ASCII characters only.  Translation of characters outside the range
10178defined by ASCII is locale-dependent and can actually only be performed 
10179at runtime; @code{xgettext} doesn't do these locale-dependent translations
10180at extraction time.
10181
10182Except for the modifier @code{\Q}, these translations, albeit valid,
10183are generally useless and only obfuscate your sources.  If a
10184translation can be safely performed at compile time you can just as
10185well write what you mean.
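
As a small illustration of the last point, each string below that uses a
case-modifying escape is equal to a plain literal, and the plain literal is
the clearer one to hand to translators.  The comparisons exist only to make
the point in this example.

@example
use strict;
use warnings;

# Case-modifying escapes applied to constant ASCII text ...
my $shouted     = "\Uhello\E world";
my $capitalized = "\uhello world";

# ... produce exactly what could have been written directly.
print "same\n" if $shouted     eq "HELLO world";
print "same\n" if $capitalized eq "Hello world";
@end example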
10186
10187@node Interpolation II, Parentheses, Interpolation I, Perl
10188@subsubsection Valid Uses Of String Interpolation
10189@cindex Perl valid string interpolation
10190
10191Perl is often used to generate sources for other programming languages
10192or arbitrary file formats.  Web applications that output HTML code
10193make a prominent example for such usage.
10194
10195You will often come across situations where you want to intersperse
10196code written in the target (programming) language with translatable
10197messages, like in the following HTML example:
10198
10199@example
10200print gettext <<EOF;
10201<h1>My Homepage</h1>
10202<script language="JavaScript"><!--
10203for (i = 0; i < 100; ++i) @{
10204    alert ("Thank you so much for visiting my homepage!");
10205@}
10206//--></script>
10207EOF
10208@end example
10209
10210The parser will extract the entire here document, and it will appear
10211entirely in the resulting PO file, including the JavaScript snippet
embedded in the HTML code.  If you overuse constructs like
the above, you run the risk that the translators of your package
will look for a less challenging project.  You should consider an
alternative expression here:
10216
10217@example
10218print <<EOF;
10219<h1>$gettext@{"My Homepage"@}</h1>
10220<script language="JavaScript"><!--
10221for (i = 0; i < 100; ++i) @{
10222    alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
10223@}
10224//--></script>
10225EOF
10226@end example
10227
Only the translatable portions of the code will be extracted here, and
the resulting PO file will be noticeably more readable.
10230
10231You can interpolate hash lookups in all strings or quote-like
10232expressions that are subject to interpolation (see the manual page
10233@samp{man perlop} for details).  Double interpolation is invalid, however:
10234
10235@example
10236# TRANSLATORS: Replace "the earth" with the name of your planet.
10237print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
10238@end example
10239
The @code{qq}-quoted string is recognized as an argument to @code{gettext} in
the first place, and checked for invalid variable interpolation.  The
dollar sign of hash-dereferencing will therefore terminate the parser
with an ``invalid interpolation'' error.
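
A possible way around this, sketched here with a stand-in function rather
than the real @code{libintl-perl} one, is to translate both strings
separately and combine them with a format directive:

@example
use strict;
use warnings;

# Stand-in so that the sketch is self-contained.
sub gettext @{ return $_[0] @}

# TRANSLATORS: %s is the name of a planet.
my $welcome = gettext ("Welcome to %s!\n");
# TRANSLATORS: Replace "the earth" with the name of your planet.
my $planet = gettext ("the earth");
printf $welcome, $planet;
@end example

@noindent
Both arguments are now constant strings, so neither contains a variable
at the point where it is extracted.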
10244
10245It is valid to interpolate hash lookups in regular expressions:
10246
10247@example
10248if ($var =~ /$gettext@{"the earth"@}/) @{
10249   print gettext "Match!\n";
10250@}
10251s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
10252@end example
10253
10254@node Parentheses, Long Lines, Interpolation II, Perl
10255@subsubsection When To Use Parentheses
10256@cindex Perl parentheses
10257
10258In Perl, parentheses around function arguments are mostly optional.
10259@code{xgettext} will always assume that all
10260recognized keywords (except for hashes and hash references) are names
10261of properly prototyped functions, and will (hopefully) only require
10262parentheses where Perl itself requires them.  All constructs in the
10263following example are therefore ok to use:
10264
10265@example
10266@group
10267print gettext ("Hello World!\n");
10268print gettext "Hello World!\n";
10269print dgettext ($package => "Hello World!\n");
10270print dgettext $package, "Hello World!\n";
10271
10272# The "fat comma" => turns the left-hand side argument into a
10273# single-quoted string!
10274print dgettext smellovision => "Hello World!\n";
10275
10276# The following assignment only works with prototyped functions.
10277# Otherwise, the functions will act as "greedy" list operators and
10278# eat up all following arguments.
10279my $anonymous_hash = @{
10280   planet => gettext "earth",
10281   cakes => ngettext "one cake", "several cakes", $n,
10282   still => $works,
10283@};
10284# The same without fat comma:
10285my $other_hash = @{
10286   'planet', gettext "earth",
10287   'cakes', ngettext "one cake", "several cakes", $n,
10288   'still', $works,
10289@};
10290
10291# Parentheses are only significant for the first argument.
10292print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
10293@end group
10294@end example
10295
10296@node Long Lines, Perl Pitfalls, Parentheses, Perl
@subsubsection How To Cope with Long Lines
10298@cindex Perl long lines
10299
10300The necessity of long messages can often lead to a cumbersome or
10301unreadable coding style.  Perl has several options that may prevent
10302you from writing unreadable code, and
10303@code{xgettext} does its best to do likewise.  This is where the dot
10304operator (the string concatenation operator) may come in handy:
10305
10306@example
10307@group
10308print gettext ("This is a very long"
10309               . " message that is still"
10310               . " readable, because"
10311               . " it is split into"
10312               . " multiple lines.\n");
10313@end group
10314@end example
10315
10316Perl is smart enough to concatenate these constant string fragments
10317into one long string at compile time, and so is
10318@code{xgettext}.  You will only find one long message in the resulting
10319POT file.
10320
Note that Perl 6 (since renamed Raku) uses the tilde (@samp{~}) as the
string concatenation operator, and the dot (@samp{.}) for method calls.
This syntax is not supported by @code{xgettext}.
10325
10326If embedded newline characters are not an issue, or even desired, you
10327may also insert newline characters inside quoted strings wherever you
10328feel like it:
10329
10330@example
10331@group
10332print gettext ("<em>In HTML output
10333embedded newlines are generally no
10334problem, since adjacent whitespace
10335is always rendered into a single
10336space character.</em>");
10337@end group
10338@end example
10339
You may also consider using here documents:
10341
10342@example
10343@group
10344print gettext <<EOF;
10345<em>In HTML output
10346embedded newlines are generally no
10347problem, since adjacent whitespace
10348is always rendered into a single
10349space character.</em>
10350EOF
10351@end group
10352@end example
10353
10354Please do not forget that the line breaks are real, i.e.@: they
10355translate into newline characters that will consequently show up in
10356the resulting POT file.
10357
10358@node Perl Pitfalls,  , Long Lines, Perl
10359@subsubsection Bugs, Pitfalls, And Things That Do Not Work
10360@cindex Perl pitfalls
10361
10362The foregoing sections should have proven that
10363@code{xgettext} is quite smart in extracting translatable strings from
Perl sources.  Yet some more or less exotic constructs that could be
expected to work actually do not work.
10366
10367One of the more relevant limitations can be found in the
10368implementation of variable interpolation inside quoted strings.  Only
10369simple hash lookups can be used there:
10370
10371@example
10372print <<EOF;
10373$gettext@{"The dot operator"
10374          . " does not work"
          . " here!"@}
10376Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
10377inside quoted strings or quote-like expressions.
10378EOF
10379@end example
10380
10381This is valid Perl code and will actually trigger invocations of the
10382@code{gettext} function at runtime.  Yet, the Perl parser in
10383@code{xgettext} will fail to recognize the strings.  A less obvious
10384example can be found in the interpolation of regular expressions:
10385
10386@example
10387s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
10388@end example
10389
10390The modifier @code{e} will cause the substitution to be interpreted as
10391an evaluable statement.  Consequently, at runtime the function
10392@code{gettext()} is called, but again, the parser fails to extract the
10393string ``Sunday''.  Use a temporary variable as a simple workaround if
10394you really happen to need this feature:
10395
10396@example
10397my $sunday = gettext "Sunday";
10398s/<!--START_OF_WEEK-->/$sunday/;
10399@end example
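
The same approach can be sketched for the here document shown above:
translate the strings with ordinary function calls first, then
interpolate plain scalars.  This is only an illustration; it assumes
that @code{gettext} is imported from the @code{Locale::Messages} module
of libintl-perl, and the variable names are made up for the example.

@example
use Locale::Messages qw (:libintl_h);  # assumed source of gettext

# The dot operator is fine inside a regular function call.
my $warning = gettext ("The dot operator"
                       . " works"
                       . " here!");
my $hint = gettext ("interpolate plain scalars");

# Only plain scalars are interpolated, so nothing is hidden
# from xgettext inside the quoted text.
print <<EOF;
$warning
Likewise, you can $hint
inside quoted strings or quote-like expressions.
EOF
@end example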
10400
10401Hash slices would also be handy but are not recognized:
10402
10403@example
10404my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
10405                        'Thursday', 'Friday', 'Saturday'@};
10406# Or even:
10407@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
10408                         Friday Saturday) @};
10409@end example
10410
10411This is perfectly valid usage of the tied hash @code{%gettext} but the
10412strings are not recognized and therefore will not be extracted.
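
A possible workaround, shown here as a minimal sketch, is to spell out
one function call per string, so that every literal appears as the
argument of a recognized keyword.  The import line is an assumption
(libintl-perl's @code{Locale::Messages}); any module providing
@code{gettext} would do.

@example
use Locale::Messages qw (:libintl_h);  # assumed source of gettext

# One gettext call per weekday; each literal is a keyword argument
# and can therefore be extracted.
my @@weekdays = (gettext ('Sunday'), gettext ('Monday'),
                gettext ('Tuesday'), gettext ('Wednesday'),
                gettext ('Thursday'), gettext ('Friday'),
                gettext ('Saturday'));
@end example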
10413
10414Another caveat of the current version is its rudimentary support for
10415non-ASCII characters in identifiers.  You may encounter serious
10416problems if you use identifiers with characters outside the range of
10417'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
10418
10419Maybe some of these missing features will be implemented in future
10420versions, but since you can always make do without them at minimal effort,
10421these todos have very low priority.
10422
A nasty problem arises with brace format strings that already contain
braces as part of the normal text, for example the usage strings
typically encountered in programs:
10426
10427@example
10428die "usage: $0 @{OPTIONS@} FILENAME...\n";
10429@end example
10430
10431If you want to internationalize this code with Perl brace format strings,
10432you will run into a problem:
10433
10434@example
10435die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0);
10436@end example
10437
10438Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}}
10439is not and should probably be translated. Yet, there is no way to teach
10440the Perl parser in @code{xgettext} to recognize the first one, and leave
10441the other one alone.
10442
There are two possible work-arounds for this problem.  If you are
sure that your program will run under Perl 5.8.0 or newer (these
Perl versions handle positional parameters in @code{printf()}), or
if you are sure that the translator will not have to reorder the arguments
in her translation (for example if you have only one brace placeholder
in your string, or if it describes a syntax, as in this one), you can
mark the string as @code{no-perl-brace-format} and use @code{printf()}
or, as here, @code{sprintf()}:
10450
10451@example
10452# xgettext: no-perl-brace-format
10453die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0);
10454@end example
10455
If you want to use the more portable Perl brace format, you will have to
put placeholders in place of the literal braces:
10458
10459@example
10460die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n",
10461         program => $0, '[' => '@{', ']' => '@}');
10462@end example
10463
Perl brace format strings know no escaping mechanism.  No matter what
such an escaping mechanism looked like, it would either give the programmer a
hard time, make translating Perl brace format strings heavy-going, or
result in a performance penalty at runtime, when the format directives
get executed.  Most of the time you will happily get along with
@code{printf()} for this special case.
10470
10471@node PHP, Pike, Perl, List of Programming Languages
10472@subsection PHP Hypertext Preprocessor
10473@cindex PHP
10474
10475@table @asis
10476@item RPMs
10477mod_php4, mod_php4-core, phpdoc
10478
10479@item File extension
10480@code{php}, @code{php3}, @code{php4}
10481
10482@item String syntax
10483@code{"abc"}, @code{'abc'}
10484
10485@item gettext shorthand
10486@code{_("abc")}
10487
10488@item gettext/ngettext functions
10489@code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0
10490also @code{ngettext}, @code{dngettext}, @code{dcngettext}
10491
10492@item textdomain
10493@code{textdomain} function
10494
10495@item bindtextdomain
10496@code{bindtextdomain} function
10497
10498@item setlocale
10499Programmer must call @code{setlocale (LC_ALL, "")}
10500
10501@item Prerequisite
10502---
10503
10504@item Use or emulate GNU gettext
10505use
10506
10507@item Extractor
10508@code{xgettext}
10509
10510@item Formatting with positions
10511@code{printf "%2\$d %1\$d"}
10512
10513@item Portability
10514On platforms without gettext, the functions are not available.
10515
10516@item po-mode marking
10517---
10518@end table
10519
10520An example is available in the @file{examples} directory: @code{hello-php}.
10521
10522@node Pike, GCC-source, PHP, List of Programming Languages
10523@subsection Pike
10524@cindex Pike
10525
10526@table @asis
10527@item RPMs
10528roxen
10529
10530@item File extension
10531@code{pike}
10532
10533@item String syntax
10534@code{"abc"}
10535
10536@item gettext shorthand
10537---
10538
10539@item gettext/ngettext functions
10540@code{gettext}, @code{dgettext}, @code{dcgettext}
10541
10542@item textdomain
10543@code{textdomain} function
10544
10545@item bindtextdomain
10546@code{bindtextdomain} function
10547
10548@item setlocale
10549@code{setlocale} function
10550
10551@item Prerequisite
10552@code{import Locale.Gettext;}
10553
10554@item Use or emulate GNU gettext
10555use
10556
10557@item Extractor
10558---
10559
10560@item Formatting with positions
10561---
10562
10563@item Portability
10564On platforms without gettext, the functions are not available.
10565
10566@item po-mode marking
10567---
10568@end table
10569
10570@node GCC-source,  , Pike, List of Programming Languages
10571@subsection GNU Compiler Collection sources
10572@cindex GCC-source
10573
10574@table @asis
10575@item RPMs
10576gcc
10577
10578@item File extension
10579@code{c}, @code{h}.
10580
10581@item String syntax
10582@code{"abc"}
10583
10584@item gettext shorthand
10585@code{_("abc")}
10586
10587@item gettext/ngettext functions
10588@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
10589@code{dngettext}, @code{dcngettext}
10590
10591@item textdomain
10592@code{textdomain} function
10593
10594@item bindtextdomain
10595@code{bindtextdomain} function
10596
10597@item setlocale
10598Programmer must call @code{setlocale (LC_ALL, "")}
10599
10600@item Prerequisite
10601@code{#include "intl.h"}
10602
10603@item Use or emulate GNU gettext
use
10605
10606@item Extractor
10607@code{xgettext -k_}
10608
10609@item Formatting with positions
10610---
10611
10612@item Portability
10613Uses autoconf macros
10614
10615@item po-mode marking
10616yes
10617@end table
10618
10619@c This is the template for new languages.
10620@ignore
10621
10622@ node
10623@ subsection 
10624
10625@table @asis
10626@item RPMs
10627
10628@item File extension
10629
10630@item String syntax
10631
10632@item gettext shorthand
10633
10634@item gettext/ngettext functions
10635
10636@item textdomain
10637
10638@item bindtextdomain
10639
10640@item setlocale
10641
10642@item Prerequisite
10643
10644@item Use or emulate GNU gettext
10645
10646@item Extractor
10647
10648@item Formatting with positions
10649
10650@item Portability
10651
10652@item po-mode marking
10653@end table
10654
10655@end ignore
10656
10657@node List of Data Formats,  , List of Programming Languages, Programming Languages
10658@section Internationalizable Data
10659
10660Here is a list of other data formats which can be internationalized
10661using GNU gettext.
10662
10663@menu
10664* POT::                         POT - Portable Object Template
10665* RST::                         Resource String Table
10666* Glade::                       Glade - GNOME user interface description
10667@end menu
10668
10669@node POT, RST, List of Data Formats, List of Data Formats
10670@subsection POT - Portable Object Template
10671
10672@table @asis
10673@item RPMs
10674gettext
10675
10676@item File extension
10677@code{pot}, @code{po}
10678
10679@item Extractor
10680@code{xgettext}
10681@end table
10682
10683@node RST, Glade, POT, List of Data Formats
10684@subsection Resource String Table
10685@cindex RST
10686
10687@table @asis
10688@item RPMs
10689fpk
10690
10691@item File extension
10692@code{rst}
10693
10694@item Extractor
10695@code{xgettext}, @code{rstconv}
10696@end table
10697
10698@node Glade,  , RST, List of Data Formats
10699@subsection Glade - GNOME user interface description
10700
10701@table @asis
10702@item RPMs
10703glade, libglade, glade2, libglade2, intltool
10704
10705@item File extension
10706@code{glade}, @code{glade2}
10707
10708@item Extractor
10709@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
10710@end table
10711
10712@c This is the template for new data formats.
10713@ignore
10714
10715@ node
10716@ subsection 
10717
10718@table @asis
10719@item RPMs
10720
10721@item File extension
10722
10723@item Extractor
10724@end table
10725
10726@end ignore
10727
10728@node Conclusion, Language Codes, Programming Languages, Top
10729@chapter Concluding Remarks
10730
We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far.  We then give
a few pointers for those who want to do further research or reading
about Native Language Support matters.
10735
10736@menu
10737* History::                     History of GNU @code{gettext}
10738* References::                  Related Readings
10739@end menu
10740
10741@node History, References, Conclusion, Conclusion
10742@section History of GNU @code{gettext}
10743@cindex history of GNU @code{gettext}
10744
Internationalization concerns and algorithms have been informally
and casually discussed for years in GNU, sometimes around GNU
@code{libc}, maybe around the upcoming @code{Hurd}, or otherwise
(nobody clearly remembers).  And even then, when the work started for
real, it was somewhat independent of these previous discussions.
10750
10751This all began in July 1994, when Patrick D'Cruze had the idea and
10752initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
10753He then asked Jim Meyering, the maintainer, how to get those changes
10754folded into an official release.  That first draft was full of
10755@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
nicer ways.  Patrick and Jim shared some attempts and experiments
in this area.  Then, feeling that this might eventually have a deeper
impact on GNU, Jim wanted to know what the standards were, and contacted
Richard Stallman, who very quickly and verbally described an overall
design for what was meant to become @code{glocale}, at that time.
10761
10762Jim implemented @code{glocale} and got a lot of exhausting feedback
10763from Patrick and Richard, of course, but also from Mitchum DSouza
10764(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
10765MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
10766pulling in various directions, not always compatible, to the extent
10767that after a couple of test releases, @code{glocale} was torn apart.
10768In particular, Paul Eggert -- always keeping an eye on developments
10769in Solaris -- advocated the use of the @code{gettext} API over
10770@code{glocale}'s @code{catgets}-based API.
10771
While Jim took some distance and time and became a dad for a second
time, Roland wanted to get GNU @code{libc} internationalized, and
got Ulrich Drepper involved in that project.  Instead of starting
from @code{glocale}, Ulrich rewrote something from scratch, but
more conforming to the set of guidelines that emerged out of the
@code{glocale} effort.  Then, Ulrich got people from the previous
forum to involve themselves in this new project, and the switch
from @code{glocale} to what was first named @code{msgutils}, renamed
@code{nlsutils}, and later @code{gettext}, became officially accepted
by Richard in May 1995 or so.
10782
10783Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
10784in April 1995.  The first official release of the package, including
10785PO mode, occurred in July 1995, and was numbered 0.7.  Other people
10786contributed to the effort by providing a discussion forum around
10787Ulrich, writing little pieces of code, or testing.  These are quoted
10788in the @code{THANKS} file which comes with the GNU @code{gettext}
10789distribution.
10790
While this was being done, Fran@,{c}ois adapted half a dozen
GNU packages to @code{glocale} first, then later to @code{gettext},
putting them in pretest, thus providing along the way an effective
user environment for fine tuning the evolving tools.  He also took
10795the responsibility of organizing and coordinating the Translation
10796Project.  After nearly a year of informal exchanges between people from
10797many countries, translator teams started to exist in May 1995, through
10798the creation and support by Patrick D'Cruze of twenty unmoderated
10799mailing lists for that many native languages, and two moderated
10800lists: one for reaching all teams at once, the other for reaching
10801all willing maintainers of internationalized free software packages.
10802
10803Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
10804of Greg McGary, as a kind of contribution to Ulrich's package.
10805He also gave a hand with the GNU @code{gettext} Texinfo manual.
10806
In 1997, Ulrich Drepper released GNU libc 2.0, which included the
10808@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
10809
10810In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
10811function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
10812which is the first free C library with full internationalization support.
10813
Being quite busy in his role as General Maintainer of GNU libc,
Ulrich handed over the GNU @code{gettext} maintenance to Bruno Haible in
2000.  Bruno added the plural form handling to the tools as well, added
support for UTF-8 and CJK locales, and wrote a few new tools for
manipulating PO files.
10819
10820@node References,  , History, Conclusion
10821@section Related Readings
10822@cindex related reading
10823@cindex bibliography
10824
10825@strong{ NOTE: } This documentation section is outdated and needs to be
10826revised.
10827
10828Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
10829bibliography on internationalization matters, called
10830@cite{Internationalization Reference List}, which is available as:
10831@example
10832ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
10833@end example
10834
10835Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
10836Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
10837Internationalisation}.  This FAQ discusses writing programs which
10838can handle different language conventions, character sets, etc.;
10839and is applicable to all character set encodings, with particular
10840emphasis on @w{ISO 8859-1}.  It is regularly published in Usenet
10841groups @file{comp.unix.questions}, @file{comp.std.internat},
10842@file{comp.software.international}, @file{comp.lang.c},
10843@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
10844and @file{news.answers}.  The home location of this document is:
10845@example
10846ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
10847@end example
10848
10849Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
10850matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
10851over the responsibility of maintaining it.  It may be found as:
10852@example
10853ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
10854     ...locale-tutorial-0.8.txt.gz
10855@end example
10856@noindent
10857This site is mirrored in:
10858@example
10859ftp://ftp.ibp.fr/pub/linux/sunsite/
10860@end example
10861
A French version of the same tutorial should be available at:
10863@example
10864ftp://ftp.ibp.fr/pub/linux/french/docs/
10865@end example
10866@noindent
10867together with French translations of many Linux-related documents.
10868
10869@node Language Codes, Country Codes, Conclusion, Top
10870@appendix Language Codes
10871@cindex language codes
10872@cindex ISO 639
10873
10874The @w{ISO 639} standard defines two-letter codes for many languages, and
10875three-letter codes for more rarely used languages.
10876All abbreviations for languages used in the Translation Project should
10877come from this standard.
10878
10879@menu
10880* Usual Language Codes::        Two-letter ISO 639 language codes
10881* Rare Language Codes::         Three-letter ISO 639 language codes
10882@end menu
10883
10884@node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes
10885@appendixsec Usual Language Codes
10886
10887For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
10888codes.
10889
10890@table @samp
10891@include iso-639.texi
10892@end table
10893
10894@node Rare Language Codes,  , Usual Language Codes, Language Codes
10895@appendixsec Rare Language Codes
10896
For rarely used languages, the @w{ISO 639-2} standard defines three-letter
codes.  Here is the current list, reduced to only living languages with at least
one million speakers.
10900
10901@table @samp
10902@include iso-639-2.texi
10903@end table
10904
10905@node Country Codes, Licenses, Language Codes, Top
10906@appendix Country Codes
10907@cindex country codes
10908@cindex ISO 3166
10909
The @w{ISO 3166} standard defines two-character codes for many countries
and territories.  All abbreviations for countries used in the Translation
Project should come from this standard.
10913
10914@table @samp
10915@include iso-3166.texi
10916@end table
10917
10918@node Licenses, Program Index, Country Codes, Top
10919@appendix Licenses
10920@cindex Licenses
10921
10922The files of this package are covered by the licenses indicated in each
10923particular file or directory.  Here is a summary:
10924
10925@itemize @bullet
10926@item
10927The @code{libintl} and @code{libasprintf} libraries are covered by the
10928GNU Library General Public License (LGPL).  
10929A copy of the license is included in @ref{GNU LGPL}.
10930
10931@item
10932The executable programs of this package and the @code{libgettextpo} library
10933are covered by the GNU General Public License (GPL).
10934A copy of the license is included in @ref{GNU GPL}.
10935
10936@item
10937This manual is free documentation.  It is dually licensed under the
10938GNU FDL and the GNU GPL.  This means that you can redistribute this
10939manual under either of these two licenses, at your choice.
10940@*
10941This manual is covered by the GNU FDL.  Permission is granted to copy,
10942distribute and/or modify this document under the terms of the
10943GNU Free Documentation License (FDL), either version 1.2 of the
10944License, or (at your option) any later version published by the
10945Free Software Foundation (FSF); with no Invariant Sections, with no
10946Front-Cover Text, and with no Back-Cover Texts.
10947A copy of the license is included in @ref{GNU FDL}.
10948@*
10949This manual is covered by the GNU GPL.  You can redistribute it and/or
10950modify it under the terms of the GNU General Public License (GPL), either
10951version 2 of the License, or (at your option) any later version published
10952by the Free Software Foundation (FSF).
10953A copy of the license is included in @ref{GNU GPL}.
10954@end itemize
10955
10956@menu
10957* GNU GPL::                     GNU General Public License
10958* GNU LGPL::                    GNU Lesser General Public License
10959* GNU FDL::                     GNU Free Documentation License
10960@end menu
10961
10962@page
10963@include gpl.texi
10964@page
10965@include lgpl.texi
10966@page
10967@include fdl.texi
10968
10969@node Program Index, Option Index, Licenses, Top
10970@unnumbered Program Index
10971
10972@printindex pg
10973
10974@node Option Index, Variable Index, Program Index, Top
10975@unnumbered Option Index
10976
10977@printindex op
10978
10979@node Variable Index, PO Mode Index, Option Index, Top
10980@unnumbered Variable Index
10981
10982@printindex vr
10983
10984@node PO Mode Index, Autoconf Macro Index, Variable Index, Top
10985@unnumbered PO Mode Index
10986
10987@printindex em
10988
10989@node Autoconf Macro Index, Index, PO Mode Index, Top
10990@unnumbered Autoconf Macro Index
10991
10992@printindex am
10993
10994@node Index,  , Autoconf Macro Index, Top
10995@unnumbered General Index
10996
10997@printindex cp
10998
10999@iftex
11000@c Table of Contents
11001@contents
11002@end iftex
11003
11004@bye
11005
11006@c Local variables:
11007@c texinfo-column-for-description: 32
11008@c End:
11009