1\input texinfo          @c -*-texinfo-*-
2@c %**start of header
3@setfilename gettext.info
4@c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo
5@c for info and html output, but to false in texi2html.
6@ifnottex
7@ifclear texi2html
8@set makeinfo
9@end ifclear
10@end ifnottex
11@c The @documentencoding is needed for makeinfo; texi2html 1.52
12@c doesn't recognize it.
13@ifset makeinfo
14@documentencoding UTF-8
15@end ifset
16@settitle GNU @code{gettext} utilities
17@finalout
18@c Indices:
19@c   am = autoconf macro  @amindex
20@c   cp = concept         @cindex
21@c   ef = emacs function  @efindex
22@c   em = emacs mode      @emindex
23@c   ev = emacs variable  @evindex
24@c   fn = function        @findex
25@c   kw = keyword         @kwindex
26@c   op = option          @opindex
27@c   pg = program         @pindex
28@c   vr = variable        @vindex
29@c Unused predefined indices:
30@c   tp = type            @tindex
31@c   ky = keystroke       @kindex
32@defcodeindex am
33@defcodeindex ef
34@defindex em
35@defcodeindex ev
36@defcodeindex kw
37@defcodeindex op
38@syncodeindex ef em
39@syncodeindex ev em
40@syncodeindex fn cp
41@syncodeindex kw cp
42@ifclear texi2html
43@firstparagraphindent insert
44@end ifclear
45@c %**end of header
46
47@include version.texi
48
49@ifinfo
50@dircategory GNU Gettext Utilities
51@direntry
52* gettext: (gettext).                          GNU gettext utilities.
53* autopoint: (gettext)autopoint Invocation.    Copy gettext infrastructure.
54* envsubst: (gettext)envsubst Invocation.      Expand environment variables.
55* gettextize: (gettext)gettextize Invocation.  Prepare a package for gettext.
56* msgattrib: (gettext)msgattrib Invocation.    Select part of a PO file.
57* msgcat: (gettext)msgcat Invocation.          Combine several PO files.
58* msgcmp: (gettext)msgcmp Invocation.          Compare a PO file and template.
59* msgcomm: (gettext)msgcomm Invocation.        Match two PO files.
60* msgconv: (gettext)msgconv Invocation.        Convert PO file to encoding.
61* msgen: (gettext)msgen Invocation.            Create an English PO file.
62* msgexec: (gettext)msgexec Invocation.        Process a PO file.
63* msgfilter: (gettext)msgfilter Invocation.    Pipe a PO file through a filter.
64* msgfmt: (gettext)msgfmt Invocation.          Make MO files out of PO files.
65* msggrep: (gettext)msggrep Invocation.        Select part of a PO file.
66* msginit: (gettext)msginit Invocation.        Create a fresh PO file.
67* msgmerge: (gettext)msgmerge Invocation.      Update a PO file from template.
68* msgunfmt: (gettext)msgunfmt Invocation.      Uncompile MO file into PO file.
69* msguniq: (gettext)msguniq Invocation.        Unify duplicates for PO file.
70* ngettext: (gettext)ngettext Invocation.      Translate a message with plural.
71* xgettext: (gettext)xgettext Invocation.      Extract strings into a PO file.
72* ISO639: (gettext)Language Codes.             ISO 639 language codes.
73* ISO3166: (gettext)Country Codes.             ISO 3166 country codes.
74@end direntry
75@end ifinfo
76
77@ifinfo
78This file provides documentation for GNU @code{gettext} utilities.
79It also serves as a reference for the free Translation Project.
80
81@copying
82Copyright (C) 1995-1998, 2001-2007 Free Software Foundation, Inc.
83
84This manual is free documentation.  It is dually licensed under the
85GNU FDL and the GNU GPL.  This means that you can redistribute this
86manual under either of these two licenses, at your choice.
87
88This manual is covered by the GNU FDL.  Permission is granted to copy,
89distribute and/or modify this document under the terms of the
90GNU Free Documentation License (FDL), either version 1.2 of the
91License, or (at your option) any later version published by the
92Free Software Foundation (FSF); with no Invariant Sections, with no
93Front-Cover Text, and with no Back-Cover Texts.
94A copy of the license is included in @ref{GNU FDL}.
95
96This manual is covered by the GNU GPL.  You can redistribute it and/or
97modify it under the terms of the GNU General Public License (GPL), either
98version 2 of the License, or (at your option) any later version published
99by the Free Software Foundation (FSF).
100A copy of the license is included in @ref{GNU GPL}.
101@end copying
102@end ifinfo
103
104@titlepage
105@title GNU gettext tools, version @value{VERSION}
106@subtitle Native Language Support Library and Tools
107@subtitle Edition @value{EDITION}, @value{UPDATED}
108@author Ulrich Drepper
109@author Jim Meyering
110@author Fran@,{c}ois Pinard
111@author Bruno Haible
112
113@ifnothtml
114@page
115@vskip 0pt plus 1filll
116@c @insertcopying
117Copyright (C) 1995-1998, 2001-2007 Free Software Foundation, Inc.
118
119This manual is free documentation.  It is dually licensed under the
120GNU FDL and the GNU GPL.  This means that you can redistribute this
121manual under either of these two licenses, at your choice.
122
123This manual is covered by the GNU FDL.  Permission is granted to copy,
124distribute and/or modify this document under the terms of the
125GNU Free Documentation License (FDL), either version 1.2 of the
126License, or (at your option) any later version published by the
127Free Software Foundation (FSF); with no Invariant Sections, with no
128Front-Cover Text, and with no Back-Cover Texts.
129A copy of the license is included in @ref{GNU FDL}.
130
131This manual is covered by the GNU GPL.  You can redistribute it and/or
132modify it under the terms of the GNU General Public License (GPL), either
133version 2 of the License, or (at your option) any later version published
134by the Free Software Foundation (FSF).
135A copy of the license is included in @ref{GNU GPL}.
136@end ifnothtml
137@end titlepage
138
139@ifnottex
140@c Table of Contents
141@contents
142@end ifnottex
143
144@ifset makeinfo
145@node Top, Introduction, (dir), (dir)
146@top GNU @code{gettext} utilities
147
148This manual documents the GNU gettext tools and the GNU libintl library,
149version @value{VERSION}.
150
151@menu
152* Introduction::                Introduction
153* Users::                       The User's View
154* PO Files::                    The Format of PO Files
155* Sources::                     Preparing Program Sources
156* Template::                    Making the PO Template File
157* Creating::                    Creating a New PO File
158* Updating::                    Updating Existing PO Files
159* Editing::                     Editing PO Files
160* Manipulating::                Manipulating PO Files
161* Binaries::                    Producing Binary MO Files
162* Programmers::                 The Programmer's View
163* Translators::                 The Translator's View
164* Maintainers::                 The Maintainer's View
165* Installers::                  The Installer's and Distributor's View
166* Programming Languages::       Other Programming Languages
167* Conclusion::                  Concluding Remarks
168
169* Language Codes::              ISO 639 language codes
170* Country Codes::               ISO 3166 country codes
171* Licenses::                    Licenses
172
173* Program Index::               Index of Programs
174* Option Index::                Index of Command-Line Options
175* Variable Index::              Index of Environment Variables
176* PO Mode Index::               Index of Emacs PO Mode Commands
177* Autoconf Macro Index::        Index of Autoconf Macros
178* Index::                       General Index
179
180@detailmenu
181 --- The Detailed Node Listing ---
182
183Introduction
184
185* Why::                         The Purpose of GNU @code{gettext}
186* Concepts::                    I18n, L10n, and Such
187* Aspects::                     Aspects in Native Language Support
188* Files::                       Files Conveying Translations
189* Overview::                    Overview of GNU @code{gettext}
190
191The User's View
192
193* System Installation::         Questions During Operating System Installation
194* Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
195* Setting the POSIX Locale::    How to Specify the Locale According to POSIX
196* Installing Localizations::    How to Install Additional Translations
197
198Setting the POSIX Locale
199
200* Locale Names::                How a Locale Specification Looks Like
201* Locale Environment Variables:: Which Environment Variable Specfies What
202* The LANGUAGE variable::       How to Specify a Priority List of Languages
203
204Preparing Program Sources
205
206* Importing::                   Importing the @code{gettext} declaration
207* Triggering::                  Triggering @code{gettext} Operations
208* Preparing Strings::           Preparing Translatable Strings
209* Mark Keywords::               How Marks Appear in Sources
210* Marking::                     Marking Translatable Strings
211* c-format Flag::               Telling something about the following string
212* Special cases::               Special Cases of Translatable Strings
213* Bug Report Address::          Letting Users Report Translation Bugs
214* Names::                       Marking Proper Names for Translation
215* Libraries::                   Preparing Library Sources
216
217Making the PO Template File
218
219* xgettext Invocation::         Invoking the @code{xgettext} Program
220
221Creating a New PO File
222
223* msginit Invocation::          Invoking the @code{msginit} Program
224* Header Entry::                Filling in the Header Entry
225
226Updating Existing PO Files
227
228* msgmerge Invocation::         Invoking the @code{msgmerge} Program
229
230Editing PO Files
231
232* KBabel::                      KDE's PO File Editor
233* Gtranslator::                 GNOME's PO File Editor
234* PO Mode::                     Emacs's PO File Editor
235* Compendium::                  Using Translation Compendia
236
237Emacs's PO File Editor
238
239* Installation::                Completing GNU @code{gettext} Installation
240* Main PO Commands::            Main Commands
241* Entry Positioning::           Entry Positioning
242* Normalizing::                 Normalizing Strings in Entries
243* Translated Entries::          Translated Entries
244* Fuzzy Entries::               Fuzzy Entries
245* Untranslated Entries::        Untranslated Entries
246* Obsolete Entries::            Obsolete Entries
247* Modifying Translations::      Modifying Translations
248* Modifying Comments::          Modifying Comments
249* Subedit::                     Mode for Editing Translations
250* C Sources Context::           C Sources Context
251* Auxiliary::                   Consulting Auxiliary PO Files
252
253Using Translation Compendia
254
255* Creating Compendia::          Merging translations for later use
256* Using Compendia::             Using older translations if they fit
257
258Manipulating PO Files
259
260* msgcat Invocation::           Invoking the @code{msgcat} Program
261* msgconv Invocation::          Invoking the @code{msgconv} Program
262* msggrep Invocation::          Invoking the @code{msggrep} Program
263* msgfilter Invocation::        Invoking the @code{msgfilter} Program
264* msguniq Invocation::          Invoking the @code{msguniq} Program
265* msgcomm Invocation::          Invoking the @code{msgcomm} Program
266* msgcmp Invocation::           Invoking the @code{msgcmp} Program
267* msgattrib Invocation::        Invoking the @code{msgattrib} Program
268* msgen Invocation::            Invoking the @code{msgen} Program
269* msgexec Invocation::          Invoking the @code{msgexec} Program
270* Colorizing::                  Highlighting parts of PO files
271* libgettextpo::                Writing your own programs that process PO files
272
273Highlighting parts of PO files
274
275* The --color option::          Triggering colorized output
276* The TERM variable::           The environment variable @code{TERM}
277* The --style option::          The @code{--style} option
278* Style rules::                 Style rules for PO files
279* Customizing less::            Customizing @code{less} for viewing PO files
280
281Producing Binary MO Files
282
283* msgfmt Invocation::           Invoking the @code{msgfmt} Program
284* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
285* MO Files::                    The Format of GNU MO Files
286
287The Programmer's View
288
289* catgets::                     About @code{catgets}
290* gettext::                     About @code{gettext}
291* Comparison::                  Comparing the two interfaces
292* Using libintl.a::             Using libintl.a in own programs
293* gettext grok::                Being a @code{gettext} grok
294* Temp Programmers::            Temporary Notes for the Programmers Chapter
295
296About @code{catgets}
297
298* Interface to catgets::        The interface
299* Problems with catgets::       Problems with the @code{catgets} interface?!
300
301About @code{gettext}
302
303* Interface to gettext::        The interface
304* Ambiguities::                 Solving ambiguities
305* Locating Catalogs::           Locating message catalog files
306* Charset conversion::          How to request conversion to Unicode
307* Contexts::                    Solving ambiguities in GUI programs
308* Plural forms::                Additional functions for handling plurals
309* Optimized gettext::           Optimization of the *gettext functions
310
311Temporary Notes for the Programmers Chapter
312
313* Temp Implementations::        Temporary - Two Possible Implementations
314* Temp catgets::                Temporary - About @code{catgets}
315* Temp WSI::                    Temporary - Why a single implementation
316* Temp Notes::                  Temporary - Notes
317
318The Translator's View
319
320* Trans Intro 0::               Introduction 0
321* Trans Intro 1::               Introduction 1
322* Discussions::                 Discussions
323* Organization::                Organization
324* Information Flow::            Information Flow
325* Prioritizing messages::       How to find which messages to translate first
326
327Organization
328
329* Central Coordination::        Central Coordination
330* National Teams::              National Teams
331* Mailing Lists::               Mailing Lists
332
333National Teams
334
335* Sub-Cultures::                Sub-Cultures
336* Organizational Ideas::        Organizational Ideas
337
338The Maintainer's View
339
340* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
341* Prerequisites::               Prerequisite Works
342* gettextize Invocation::       Invoking the @code{gettextize} Program
343* Adjusting Files::             Files You Must Create or Alter
344* autoconf macros::             Autoconf macros for use in @file{configure.ac}
345* CVS Issues::                  Integrating with CVS
346* Release Management::          Creating a Distribution Tarball
347
348Files You Must Create or Alter
349
350* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
351* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
352* po/Makevars::                 @file{Makevars} in @file{po/}
353* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
354* configure.ac::                @file{configure.ac} at top level
355* config.guess::                @file{config.guess}, @file{config.sub} at top level
356* mkinstalldirs::               @file{mkinstalldirs} at top level
357* aclocal::                     @file{aclocal.m4} at top level
358* acconfig::                    @file{acconfig.h} at top level
359* config.h.in::                 @file{config.h.in} at top level
360* Makefile::                    @file{Makefile.in} at top level
361* src/Makefile::                @file{Makefile.in} in @file{src/}
362* lib/gettext.h::               @file{gettext.h} in @file{lib/}
363
364Autoconf macros for use in @file{configure.ac}
365
366* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
367* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
368* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
369* AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
370* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
371* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
372
373Integrating with CVS
374
375* Distributed CVS::             Avoiding version mismatch in distributed development
376* Files under CVS::             Files to put under CVS version control
377* autopoint Invocation::        Invoking the @code{autopoint} Program
378
379Other Programming Languages
380
381* Language Implementors::       The Language Implementor's View
382* Programmers for other Languages::  The Programmer's View
383* Translators for other Languages::  The Translator's View
384* Maintainers for other Languages::  The Maintainer's View
385* List of Programming Languages::  Individual Programming Languages
386* List of Data Formats::        Internationalizable Data
387
388The Translator's View
389
390* c-format::                    C Format Strings
391* objc-format::                 Objective C Format Strings
392* sh-format::                   Shell Format Strings
393* python-format::               Python Format Strings
394* lisp-format::                 Lisp Format Strings
395* elisp-format::                Emacs Lisp Format Strings
396* librep-format::               librep Format Strings
397* scheme-format::               Scheme Format Strings
398* smalltalk-format::            Smalltalk Format Strings
399* java-format::                 Java Format Strings
400* csharp-format::               C# Format Strings
401* awk-format::                  awk Format Strings
402* object-pascal-format::        Object Pascal Format Strings
403* ycp-format::                  YCP Format Strings
404* tcl-format::                  Tcl Format Strings
405* perl-format::                 Perl Format Strings
406* php-format::                  PHP Format Strings
407* gcc-internal-format::         GCC internal Format Strings
408* qt-format::                   Qt Format Strings
409* kde-format::                  KDE Format Strings
410* boost-format::                Boost Format Strings
411
412Individual Programming Languages
413
414* C::                           C, C++, Objective C
415* sh::                          sh - Shell Script
416* bash::                        bash - Bourne-Again Shell Script
417* Python::                      Python
418* Common Lisp::                 GNU clisp - Common Lisp
419* clisp C::                     GNU clisp C sources
420* Emacs Lisp::                  Emacs Lisp
421* librep::                      librep
422* Scheme::                      GNU guile - Scheme
423* Smalltalk::                   GNU Smalltalk
424* Java::                        Java
425* C#::                          C#
426* gawk::                        GNU awk
427* Pascal::                      Pascal - Free Pascal Compiler
428* wxWidgets::                   wxWidgets library
429* YCP::                         YCP - YaST2 scripting language
430* Tcl::                         Tcl - Tk's scripting language
431* Perl::                        Perl
432* PHP::                         PHP Hypertext Preprocessor
433* Pike::                        Pike
434* GCC-source::                  GNU Compiler Collection sources
435
436sh - Shell Script
437
438* Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
439* gettext.sh::                  Contents of @code{gettext.sh}
440* gettext Invocation::          Invoking the @code{gettext} program
441* ngettext Invocation::         Invoking the @code{ngettext} program
442* envsubst Invocation::         Invoking the @code{envsubst} program
443* eval_gettext Invocation::     Invoking the @code{eval_gettext} function
444* eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
445
446Perl
447
448* General Problems::            General Problems Parsing Perl Code
449* Default Keywords::            Which Keywords Will xgettext Look For?
450* Special Keywords::            How to Extract Hash Keys
451* Quote-like Expressions::      What are Strings And Quote-like Expressions?
452* Interpolation I::             Invalid String Interpolation
453* Interpolation II::            Valid String Interpolation
454* Parentheses::                 When To Use Parentheses
455* Long Lines::                  How To Grok with Long Lines
456* Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
457
458Internationalizable Data
459
460* POT::                         POT - Portable Object Template
461* RST::                         Resource String Table
462* Glade::                       Glade - GNOME user interface description
463
464Concluding Remarks
465
466* History::                     History of GNU @code{gettext}
467* References::                  Related Readings
468
469Language Codes
470
471* Usual Language Codes::        Two-letter ISO 639 language codes
472* Rare Language Codes::         Three-letter ISO 639 language codes
473
474Licenses
475
476* GNU GPL::                     GNU General Public License
477* GNU LGPL::                    GNU Lesser General Public License
478* GNU FDL::                     GNU Free Documentation License
479
480@end detailmenu
481@end menu
482
483@end ifset
484
485@node Introduction, Users, Top, Top
486@chapter Introduction
487
488This chapter explains the goals sought in the creation
489of GNU @code{gettext} and the free Translation Project.
490Then, it explains a few broad concepts around
491Native Language Support, and positions message translation with regard
492to other aspects of national and cultural variance, as they apply
493to programs.  It also surveys those files used to convey the
494translations.  It explains how the various tools interact in the
495initial generation of these files, and later, how the maintenance
496cycle should usually operate.
497
498@cindex sex
499@cindex he, she, and they
500@cindex she, he, and they
501In this manual, we use @emph{he} when speaking of the programmer or
502maintainer, @emph{she} when speaking of the translator, and @emph{they}
503when speaking of the installers or end users of the translated program.
504This is only a convenience for clarifying the documentation.  It is
505@emph{absolutely} not meant to imply that some roles are more appropriate
506to males or females.  Besides, as you might guess, GNU @code{gettext}
507is meant to be useful for people using computers, whatever their sex,
508race, religion or nationality!
509
510@cindex bug report address
511Please send suggestions and corrections to:
512
513@example
514@group
515@r{Internet address:}
516    bug-gnu-gettext@@gnu.org
517@end group
518@end example
519
520@noindent
521Please include the manual's edition number and update date in your messages.
522
523@menu
524* Why::                         The Purpose of GNU @code{gettext}
525* Concepts::                    I18n, L10n, and Such
526* Aspects::                     Aspects in Native Language Support
527* Files::                       Files Conveying Translations
528* Overview::                    Overview of GNU @code{gettext}
529@end menu
530
531@node Why, Concepts, Introduction, Introduction
532@section The Purpose of GNU @code{gettext}
533
534Usually, programs are written and documented in English, and use
535English at execution time to interact with users.  This is true
536not only of GNU software, but also of a great deal of proprietary
537and free software.  Using a common language is quite handy for
538communication between developers, maintainers and users from all
539countries.  On the other hand, most people are less comfortable with
540English than with their own native language, and would prefer to
541use their mother tongue for day to day's work, as far as possible.
542Many would simply @emph{love} to see their computer screen showing
543a lot less of English, and far more of their own language.
544
545@cindex Translation Project
546However, to many people, this dream might appear so far fetched that
547they may believe it is not even worth spending time thinking about
548it.  They have no confidence at all that the dream might ever
549become true.  Yet some have not lost hope, and have organized themselves.
550The Translation Project is a formalization of this hope into a
551workable structure, which has a good chance to get all of us nearer
552the achievement of a truly multi-lingual set of programs.
553
554GNU @code{gettext} is an important step for the Translation Project,
555as it is an asset on which we may build many other steps.  This package
556offers to programmers, translators and even users, a well integrated
557set of tools and documentation.  Specifically, the GNU @code{gettext}
558utilities are a set of tools that provides a framework within which
559other free packages may produce multi-lingual messages.  These tools
560include
561
562@itemize @bullet
563@item
564A set of conventions about how programs should be written to support
565message catalogs.
566
567@item
568A directory and file naming organization for the message catalogs
569themselves.
570
571@item
572A runtime library supporting the retrieval of translated messages.
573
574@item
575A few stand-alone programs to massage in various ways the sets of
576translatable strings, or already translated strings.
577
578@item
579A library supporting the parsing and creation of files containing
580translated messages.
581
582@item
583A special mode for Emacs@footnote{In this manual, all mentions of Emacs
584refers to either GNU Emacs or to XEmacs, which people sometimes call FSF
585Emacs and Lucid Emacs, respectively.} which helps preparing these sets
586and bringing them up to date.
587@end itemize
588
589GNU @code{gettext} is designed to minimize the impact of
590internationalization on program sources, keeping this impact as small
591and hardly noticeable as possible.  Internationalization has better
592chances of succeeding if it is very light weighted, or at least,
593appear to be so, when looking at program sources.
594
595The Translation Project also uses the GNU @code{gettext} distribution
596as a vehicle for documenting its structure and methods.  This goes
597beyond the strict technicalities of documenting the GNU @code{gettext}
598proper.  By so doing, translators will find in a single place, as
599far as possible, all they need to know for properly doing their
600translating work.  Also, this supplemental documentation might also
601help programmers, and even curious users, in understanding how GNU
602@code{gettext} is related to the remainder of the Translation
603Project, and consequently, have a glimpse at the @emph{big picture}.
604
605@node Concepts, Aspects, Why, Introduction
606@section I18n, L10n, and Such
607
608@cindex i18n
609@cindex l10n
610Two long words appear all the time when we discuss support of native
611language in programs, and these words have a precise meaning, worth
612being explained here, once and for all in this document.  The words are
613@emph{internationalization} and @emph{localization}.  Many people,
614tired of writing these long words over and over again, took the
615habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
616and last letter of each word, and replacing the run of intermediate
617letters by a number merely telling how many such letters there are.
618But in this manual, in the sake of clarity, we will patiently write
619the names in full, each time@dots{}
620
621@cindex internationalization
622By @dfn{internationalization}, one refers to the operation by which a
623program, or a set of programs turned into a package, is made aware of and
624able to support multiple languages.  This is a generalization process,
625by which the programs are untied from calling only English strings or
626other English specific habits, and connected to generic ways of doing
627the same, instead.  Program developers may use various techniques to
628internationalize their programs.  Some of these have been standardized.
629GNU @code{gettext} offers one of these standards.  @xref{Programmers}.
630
631@cindex localization
632By @dfn{localization}, one means the operation by which, in a set
633of programs already internationalized, one gives the program all
634needed information so that it can adapt itself to handle its input
635and output in a fashion which is correct for some native language and
636cultural habits.  This is a particularisation process, by which generic
637methods already implemented in an internationalized program are used
638in specific ways.  The programming environment puts several functions
639to the programmers disposal which allow this runtime configuration.
640The formal description of specific set of cultural habits for some
641country, together with all associated translations targeted to the
642same native language, is called the @dfn{locale} for this language
643or country.  Users achieve localization of programs by setting proper
644values to special environment variables, prior to executing those
645programs, identifying which locale should be used.
646
647In fact, locale message support is only one component of the cultural
648data that makes up a particular locale.  There are a whole host of
649routines and functions provided to aid programmers in developing
650internationalized software and which allow them to access the data
651stored in a particular locale.  When someone presently refers to a
652particular locale, they are obviously referring to the data stored
653within that particular locale.  Similarly, if a programmer is referring
654to ``accessing the locale routines'', they are referring to the
655complete suite of routines that access all of the locale's information.
656
657@cindex NLS
658@cindex Native Language Support
659@cindex Natural Language Support
660One uses the expression @dfn{Native Language Support}, or merely NLS,
661for speaking of the overall activity or feature encompassing both
662internationalization and localization, allowing for multi-lingual
663interactions in a program.  In a nutshell, one could say that
664internationalization is the operation by which further localizations
665are made possible.
666
667Also, very roughly said, when it comes to multi-lingual messages,
668internationalization is usually taken care of by programmers, and
669localization is usually taken care of by translators.
670
671@node Aspects, Files, Concepts, Introduction
672@section Aspects in Native Language Support
673
674@cindex translation aspects
675For a totally multi-lingual distribution, there are many things to
676translate beyond output messages.
677
678@itemize @bullet
679@item
680As of today, GNU @code{gettext} offers a complete toolset for
681translating messages output by C programs.  Perl scripts and shell
682scripts will also need to be translated.  Even if there are today some hooks
683by which this can be done, these hooks are not integrated as well as they
684should be.
685
686@item
687Some programs, like @code{autoconf} or @code{bison}, are able
688to produce other programs (or scripts).  Even if the generating
689programs themselves are internationalized, the generated programs they
690produce may need internationalization on their own, and this indirect
691internationalization could be automated right from the generating
692program.  In fact, quite usually, generating and generated programs
693could be internationalized independently, as the effort needed is
694fairly orthogonal.
695
696@item
697A few programs include textual tables which might need translation
698themselves, independently of the strings contained in the program
699itself.  For example, @w{RFC 1345} gives an English description for each
700character which the @code{recode} program is able to reconstruct at execution.
701Since these descriptions are extracted from the RFC by mechanical means,
702translating them properly would require a prior translation of the RFC
703itself.
704
705@item
706Almost all programs accept options, which are often worded out so to
707be descriptive for the English readers; one might want to consider
708offering translated versions for program options as well.
709
710@item
711Many programs read, interpret, compile, or are somewhat driven by
712input files which are texts containing keywords, identifiers, or
713replies which are inherently translatable.  For example, one may want
714@code{gcc} to allow diacriticized characters in identifiers or use
715translated keywords; @samp{rm -i} might accept something else than
716@samp{y} or @samp{n} for replies, etc.  Even if the program will
717eventually make most of its output in the foreign languages, one has
718to decide whether the input syntax, option values, etc., are to be
719localized or not.
720
721@item
722The manual accompanying a package, as well as all documentation files
723in the distribution, could surely be translated, too.  Translating a
724manual, with the intent of later keeping up with updates, is a major
725undertaking in itself, generally.
726
727@end itemize
728
729As we already stressed, translation is only one aspect of locales.
730Other internationalization aspects are system services and are handled
731in GNU @code{libc}.  There
732are many attributes that are needed to define a country's cultural
733conventions.  These attributes include beside the country's native
734language, the formatting of the date and time, the representation of
735numbers, the symbols for currency, etc.  These local @dfn{rules} are
736termed the country's locale.  The locale represents the knowledge
737needed to support the country's native attributes.
738
739@cindex locale categories
740There are a few major areas which may vary between countries and
741hence, define what a locale must describe.  The following list helps
742putting multi-lingual messages into the proper context of other tasks
743related to locales.  See the GNU @code{libc} manual for details.
744
745@table @emph
746
747@item Characters and Codesets
748@cindex codeset
749@cindex encoding
750@cindex character encoding
751@cindex locale category, LC_CTYPE
752
753The codeset most commonly used through out the USA and most English
754speaking parts of the world is the ASCII codeset.  However, there are
755many characters needed by various locales that are not found within
756this codeset.  The 8-bit @w{ISO 8859-1} code set has most of the special
757characters needed to handle the major European languages.  However, in
758many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
759doesn't even handle the major European currency.  Hence each locale
760will need to specify which codeset they need to use and will need
761to have the appropriate character handling routines to cope with
762the codeset.
763
764@item Currency
765@cindex currency symbols
766@cindex locale category, LC_MONETARY
767
768The symbols used vary from country to country as does the position
769used by the symbol.  Software needs to be able to transparently
770display currency figures in the native mode for each locale.
771
772@item Dates
773@cindex date format
774@cindex locale category, LC_TIME
775
776The format of date varies between locales.  For example, Christmas day
777in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
778Other countries might use @w{ISO 8601} dates, etc.
779
780Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
781or otherwise.  Some locales require time to be specified in 24-hour
782mode rather than as AM or PM.  Further, the nature and yearly extent
783of the Daylight Saving correction vary widely between countries.
784
785@item Numbers
786@cindex number format
787@cindex locale category, LC_NUMERIC
788
789Numbers can be represented differently in different locales.
790For example, the following numbers are all written correctly for
791their respective locales:
792
793@example
79412,345.67       English
79512.345,67       German
796 12345,67       French
7971,2345.67       Asia
798@end example
799
800Some programs could go further and use different unit systems, like
801English units or Metric units, or even take into account variants
802about how numbers are spelled in full.
803
804@item Messages
805@cindex messages
806@cindex locale category, LC_MESSAGES
807
808The most obvious area is the language support within a locale.  This is
809where GNU @code{gettext} provides the means for developers and users to
810easily change the language that the software uses to communicate to
811the user.
812
813@end table
814
815@cindex locale categories
816These areas of cultural conventions are called @emph{locale categories}.
817It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
818categories} would be a better term, because each ``locale category''
819describes an area or task that requires localization.  The concrete data
820that describes the cultural conventions for such an area and for a particular
821culture is also called a @emph{locale category}.  In this sense, a locale
822is composed of several locale categories: the locale category describing
823the codeset, the locale category describing the formatting of numbers,
824the locale category containing the translated messages, and so on.
825
826@cindex Linux
827Components of locale outside of message handling are standardized in
828the ISO C standard and the POSIX:2001 standard (also known as the SUSV3
829specification).  GNU @code{libc}
830fully implements this, and most other modern systems provide a more
831or less reasonable support for at least some of the missing components.
832
833@node Files, Overview, Aspects, Introduction
834@section Files Conveying Translations
835
836@cindex files, @file{.po} and @file{.mo}
837The letters PO in @file{.po} files means Portable Object, to
838distinguish it from @file{.mo} files, where MO stands for Machine
839Object.  This paradigm, as well as the PO file format, is inspired
840by the NLS standard developed by Uniforum, and first implemented by
841Sun in their Solaris system.
842
843PO files are meant to be read and edited by humans, and associate each
844original, translatable string of a given package with its translation
845in a particular target language.  A single PO file is dedicated to
846a single target language.  If a package supports many languages,
847there is one such PO file per language supported, and each package
848has its own set of PO files.  These PO files are best created by
849the @code{xgettext} program, and later updated or refreshed through
850the @code{msgmerge} program.  Program @code{xgettext} extracts all
851marked messages from a set of C files and initializes a PO file with
852empty translations.  Program @code{msgmerge} takes care of adjusting
853PO files between releases of the corresponding sources, commenting
854obsolete entries, initializing new ones, and updating all source
855line references.  Files ending with @file{.pot} are kind of base
856translation files found in distributions, in PO file format.
857
858MO files are meant to be read by programs, and are binary in nature.
859A few systems already offer tools for creating and handling MO files
860as part of the Native Language Support coming with the system, but the
861format of these MO files is often different from system to system,
862and non-portable.  The tools already provided with these systems don't
863support all the features of GNU @code{gettext}.  Therefore GNU
864@code{gettext} uses its own format for MO files.  Files ending with
865@file{.gmo} are really MO files, when it is known that these files use
866the GNU format.
867
868@node Overview,  , Files, Introduction
869@section Overview of GNU @code{gettext}
870
871@cindex overview of @code{gettext}
872@cindex big picture
873@cindex tutorial of @code{gettext} usage
874The following diagram summarizes the relation between the files
875handled by GNU @code{gettext} and the tools acting on these files.
876It is followed by somewhat detailed explanations, which you should
877read while keeping an eye on the diagram.  Having a clear understanding
878of these interrelations will surely help programmers, translators
879and maintainers.
880
881@ifhtml
882@example
883@group
884Original C Sources ���������> Preparation ���������> Marked C Sources ������������
885                                                             ���
886              ������������������������������<��������� GNU gettext Library             ���
887������������ make <������������                                              ���
888���             ������������������������������<���������������������������������������������������������������������������������������������������������������
889���                                            ���
890���   ������������������<��������� PACKAGE.pot <��������� xgettext <������������   ������������<��������� PO Compendium
891���   ���                                            ���              ���
892���   ���                                            ���������������          ���
893���   ���������������                                            ������������> PO editor ������������
894���       ���������������> msgmerge ������������������> LANG.po ������������>���������������������������                  ���
895���   ���������������                                                               ���
896���   ���                                                                   ���
897���   ������������������������������������������<������������������������������������������������                                     ���
898���                                 ������������ New LANG.po <���������������������������������������������������������������
899���   ������������ LANG.gmo <��������� msgfmt <������������
900���   ���
901���   ������������> install ���������> /.../LANG/PACKAGE.mo ������������
902���                                              ������������> "Hello world!"
903������������������������> install ���������> /.../bin/PROGRAM ������������������������
904@end group
905@end example
906@end ifhtml
907@ifnothtml
908@example
909@group
910Original C Sources ---> Preparation ---> Marked C Sources ---.
911                                                             |
912              .---------<--- GNU gettext Library             |
913.--- make <---+                                              |
914|             `---------<--------------------+---------------'
915|                                            |
916|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
917|   |                                            |              ^
918|   |                                            `---.          |
919|   `---.                                            +---> PO editor ---.
920|       +----> msgmerge ------> LANG.po ---->--------'                  |
921|   .---'                                                               |
922|   |                                                                   |
923|   `-------------<---------------.                                     |
924|                                 +--- New LANG.po <--------------------'
925|   .--- LANG.gmo <--- msgfmt <---'
926|   |
927|   `---> install ---> /.../LANG/PACKAGE.mo ---.
928|                                              +---> "Hello world!"
929`-------> install ---> /.../bin/PROGRAM -------'
930@end group
931@end example
932@end ifnothtml
933
934@cindex marking translatable strings
935As a programmer, the first step to bringing GNU @code{gettext}
936into your package is identifying, right in the C sources, those strings
937which are meant to be translatable, and those which are untranslatable.
938This tedious job can be done a little more comfortably using emacs PO
939mode, but you can use any means familiar to you for modifying your
940C sources.  Beside this some other simple, standard changes are needed to
941properly initialize the translation library.  @xref{Sources}, for
942more information about all this.
943
944For newly written software the strings of course can and should be
945marked while writing it.  The @code{gettext} approach makes this
946very easy.  Simply put the following lines at the beginning of each file
947or in a central header file:
948
949@example
950@group
951#define _(String) (String)
952#define N_(String) String
953#define textdomain(Domain)
954#define bindtextdomain(Package, Directory)
955@end group
956@end example
957
958@noindent
959Doing this allows you to prepare the sources for internationalization.
960Later when you feel ready for the step to use the @code{gettext} library
961simply replace these definitions by the following:
962
963@cindex include file @file{libintl.h}
964@example
965@group
966#include <libintl.h>
967#define _(String) gettext (String)
968#define gettext_noop(String) String
969#define N_(String) gettext_noop (String)
970@end group
971@end example
972
973@cindex link with @file{libintl}
974@cindex Linux
975@noindent
976and link against @file{libintl.a} or @file{libintl.so}.  Note that on
977GNU systems, you don't need to link with @code{libintl} because the
978@code{gettext} library functions are already contained in GNU libc.
979That is all you have to change.
980
981@cindex template PO file
982@cindex files, @file{.pot}
983Once the C sources have been modified, the @code{xgettext} program
984is used to find and extract all translatable strings, and create a
985PO template file out of all these.  This @file{@var{package}.pot} file
986contains all original program strings.  It has sets of pointers to
987exactly where in C sources each string is used.  All translations
988are set to empty.  The letter @code{t} in @file{.pot} marks this as
989a Template PO file, not yet oriented towards any particular language.
990@xref{xgettext Invocation}, for more details about how one calls the
991@code{xgettext} program.  If you are @emph{really} lazy, you might
992be interested at working a lot more right away, and preparing the
993whole distribution setup (@pxref{Maintainers}).  By doing so, you
994spare yourself typing the @code{xgettext} command, as @code{make}
995should now generate the proper things automatically for you!
996
997The first time through, there is no @file{@var{lang}.po} yet, so the
998@code{msgmerge} step may be skipped and replaced by a mere copy of
999@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
1000represents the target language.  See @ref{Creating} for details.
1001
1002Then comes the initial translation of messages.  Translation in
1003itself is a whole matter, still exclusively meant for humans,
1004and whose complexity far overwhelms the level of this manual.
1005Nevertheless, a few hints are given in some other chapter of this
1006manual (@pxref{Translators}).  You will also find there indications
1007about how to contact translating teams, or becoming part of them,
1008for sharing your translating concerns with others who target the same
1009native language.
1010
1011While adding the translated messages into the @file{@var{lang}.po}
1012PO file, if you are not using one of the dedicated PO file editors
1013(@pxref{Editing}), you are on your own
1014for ensuring that your efforts fully respect the PO file format, and quoting
1015conventions (@pxref{PO Files}).  This is surely not an impossible task,
1016as this is the way many people have handled PO files around 1995.
1017On the other hand, by using a PO file editor, most details
1018of PO file format are taken care of for you, but you have to acquire
1019some familiarity with PO file editor itself.
1020
1021If some common translations have already been saved into a compendium
1022PO file, translators may use PO mode for initializing untranslated
1023entries from the compendium, and also save selected translations into
1024the compendium, updating it (@pxref{Compendium}).  Compendium files
1025are meant to be exchanged between members of a given translation team.
1026
1027Programs, or packages of programs, are dynamic in nature: users write
1028bug reports and suggestion for improvements, maintainers react by
1029modifying programs in various ways.  The fact that a package has
1030already been internationalized should not make maintainers shy
1031of adding new strings, or modifying strings already translated.
1032They just do their job the best they can.  For the Translation
1033Project to work smoothly, it is important that maintainers do not
1034carry translation concerns on their already loaded shoulders, and that
1035translators be kept as free as possible of programming concerns.
1036
1037The only concern maintainers should have is carefully marking new
1038strings as translatable, when they should be, and do not otherwise
1039worry about them being translated, as this will come in proper time.
1040Consequently, when programs and their strings are adjusted in various
1041ways by maintainers, and for matters usually unrelated to translation,
1042@code{xgettext} would construct @file{@var{package}.pot} files which are
1043evolving over time, so the translations carried by @file{@var{lang}.po}
1044are slowly fading out of date.
1045
1046@cindex evolution of packages
1047It is important for translators (and even maintainers) to understand
1048that package translation is a continuous process in the lifetime of a
1049package, and not something which is done once and for all at the start.
1050After an initial burst of translation activity for a given package,
1051interventions are needed once in a while, because here and there,
1052translated entries become obsolete, and new untranslated entries
1053appear, needing translation.
1054
1055The @code{msgmerge} program has the purpose of refreshing an already
1056existing @file{@var{lang}.po} file, by comparing it with a newer
1057@file{@var{package}.pot} template file, extracted by @code{xgettext}
1058out of recent C sources.  The refreshing operation adjusts all
1059references to C source locations for strings, since these strings
1060move as programs are modified.  Also, @code{msgmerge} comments out as
1061obsolete, in @file{@var{lang}.po}, those already translated entries
1062which are no longer used in the program sources (@pxref{Obsolete
1063Entries}).  It finally discovers new strings and inserts them in
1064the resulting PO file as untranslated entries (@pxref{Untranslated
1065Entries}).  @xref{msgmerge Invocation}, for more information about what
1066@code{msgmerge} really does.
1067
1068Whatever route or means taken, the goal is to obtain an updated
1069@file{@var{lang}.po} file offering translations for all strings.
1070
1071The temporal mobility, or fluidity of PO files, is an integral part of
1072the translation game, and should be well understood, and accepted.
1073People resisting it will have a hard time participating in the
1074Translation Project, or will give a hard time to other participants!  In
1075particular, maintainers should relax and include all available official
1076PO files in their distributions, even if these have not recently been
1077updated, without exerting pressure on the translator teams to get the
1078job done.  The pressure should rather come
1079from the community of users speaking a particular language, and
1080maintainers should consider themselves fairly relieved of any concern
1081about the adequacy of translation files.  On the other hand, translators
1082should reasonably try updating the PO files they are responsible for,
1083while the package is undergoing pretest, prior to an official
1084distribution.
1085
1086Once the PO file is complete and dependable, the @code{msgfmt} program
1087is used for turning the PO file into a machine-oriented format, which
1088may yield efficient retrieval of translations by the programs of the
1089package, whenever needed at runtime (@pxref{MO Files}).  @xref{msgfmt
1090Invocation}, for more information about all modes of execution
1091for the @code{msgfmt} program.
1092
1093Finally, the modified and marked C sources are compiled and linked
1094with the GNU @code{gettext} library, usually through the operation of
1095@code{make}, given a suitable @file{Makefile} exists for the project,
1096and the resulting executable is installed somewhere users will find it.
1097The MO files themselves should also be properly installed.  Given the
1098appropriate environment variables are set (@pxref{Setting the POSIX Locale}),
1099the program should localize itself automatically, whenever it executes.
1100
1101The remainder of this manual has the purpose of explaining in depth the various
1102steps outlined above.
1103
1104@node Users, PO Files, Introduction, Top
1105@chapter The User's View
1106
1107Nowadays, when users log into a computer, they usually find that all
1108their programs show messages in their native language -- at least for
1109users of languages with an active free software community, like French or
1110German; to a lesser extent for languages with a smaller participation in
1111free software and the GNU project, like Hindi and Filipino.
1112
1113How does this work?  How can the user influence the language that is used
1114by the programs?  This chapter will answer it.
1115
1116@menu
1117* System Installation::         Questions During Operating System Installation
1118* Setting the GUI Locale::      How to Specify the Locale Used by GUI Programs
1119* Setting the POSIX Locale::    How to Specify the Locale According to POSIX
1120* Installing Localizations::    How to Install Additional Translations
1121@end menu
1122
1123@node System Installation, Setting the GUI Locale, Users, Users
1124@section Operating System Installation
1125
1126The default language is often already specified during operating system
1127installation.  When the operating system is installed, the installer
1128typically asks for the language used for the installation process and,
1129separately, for the language to use in the installed system.  Some OS
1130installers only ask for the language once.
1131
1132This determines the system-wide default language for all users.  But the
1133installers often give the possibility to install extra localizations for
1134additional languages.  For example, the localizations of KDE (the K
1135Desktop Environment) and OpenOffice.org are often bundled separately,
1136as one installable package per language.
1137
1138At this point it is good to consider the intended use of the machine: If
1139it is a machine designated for personal use, additional localizations are
1140probably not necessary.  If, however, the machine is in use in an
1141organization or company that has international relationships, one can
1142consider the needs of guest users.  If you have a guest from abroad, for
1143a week, what could be his preferred locales?  It may be worth installing
1144these additional localizations ahead of time, since they cost only a bit
1145of disk space at this point.
1146
1147The system-wide default language is the locale configuration that is used
1148when a new user account is created.  But the user can have his own locale
1149configuration that is different from the one of the other users of the
1150same machine.  He can specify it, typically after the first login, as
1151described in the next section.
1152
1153@node Setting the GUI Locale, Setting the POSIX Locale, System Installation, Users
1154@section Setting the Locale Used by GUI Programs
1155
1156The immediately available programs in a user's desktop come from a group
1157of programs called a ``desktop environment''; it usually includes the window
1158manager, a web browser, a text editor, and more.  The most common free
1159desktop environments are KDE, GNOME, and Xfce.
1160
1161The locale used by GUI programs of the desktop environment can be specified
1162in a configuration screen called ``control center'', ``language settings''
1163or ``country settings''.
1164
1165Individual GUI programs that are not part of the desktop environment can
1166have their locale specified either in a settings panel, or through environment
1167variables.
1168
1169For some programs, it is possible to specify the locale through environment
1170variables, possibly even to a different locale than the desktop's locale.
1171This means, instead of starting a program through a menu or from the file
1172system, you can start it from the command-line, after having set some
1173environment variables.  The environment variables can be those specified
1174in the next section (@ref{Setting the POSIX Locale}); for some versions of
1175KDE, however, the locale is specified through a variable @code{KDE_LANG},
1176rather than @code{LANG} or @code{LC_ALL}.
1177
1178@node Setting the POSIX Locale, Installing Localizations, Setting the GUI Locale, Users
1179@section Setting the Locale through Environment Variables
1180
1181As a user, if your language has been installed for this package, in the
1182simplest case, you only have to set the @code{LANG} environment variable
1183to the appropriate @samp{@var{ll}_@var{CC}} combination.  For example,
1184let's suppose that you speak German and live in Germany.  At the shell
1185prompt, merely execute 
1186@w{@samp{setenv LANG de_DE}} (in @code{csh}),
1187@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or
1188@w{@samp{export LANG=de_DE}} (in @code{bash}).  This can be done from your
1189@file{.login} or @file{.profile} file, once and for all.
1190
1191@menu
1192* Locale Names::                How a Locale Specification Looks Like
1193* Locale Environment Variables:: Which Environment Variable Specfies What
1194* The LANGUAGE variable::       How to Specify a Priority List of Languages
1195@end menu
1196
1197@node Locale Names, Locale Environment Variables, Setting the POSIX Locale, Setting the POSIX Locale
1198@subsection Locale Names
1199
1200A locale name usually has the form @samp{@var{ll}_@var{CC}}.  Here
1201@samp{@var{ll}} is an @w{ISO 639} two-letter language code, and
1202@samp{@var{CC}} is an @w{ISO 3166} two-letter country code.  For example,
1203for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}.
1204You find a list of the language codes in appendix @ref{Language Codes} and
1205a list of the country codes in appendix @ref{Country Codes}.
1206
1207You might think that the country code specification is redundant.  But in
1208fact, some languages have dialects in different countries.  For example,
1209@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil.  The country
1210code serves to distinguish the dialects.
1211
1212Many locale names have an extended syntax
1213@samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character
1214encoding.  These are in use because between 2000 and 2005, most users have
1215switched to locales in UTF-8 encoding.  For example, the German locale on
1216glibc systems is nowadays @samp{de_DE.UTF-8}.  The older name @samp{de_DE}
1217still refers to the German locale as of 2000 that stores characters in
1218ISO-8859-1 encoding -- a text encoding that cannot even accomodate the Euro
1219currency sign.
1220
1221Some locale names use @samp{@var{ll}_@var{CC}.@@@var{variant}} instead of
1222@samp{@var{ll}_@var{CC}}.  The @samp{@@@var{variant}} can denote any kind of
1223characteristics that is not already implied by the language @var{ll} and
1224the country @var{CC}.  It can denote a particular monetary unit.  For example,
1225on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro
1226currency, in contrast to the older locale @samp{de_DE} which implies the use
1227of the currency before 2002.  It can also denote a dialect of the language,
1228or the script used to write text (for example, @samp{sr_RS@@latin} uses the
1229Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian),
1230or the orthography rules, or similar.
1231
1232On other systems, some variations of this scheme are used, such as
1233@samp{@var{ll}}.  You can get the list of locales supported by your system
1234for your language by running the command @samp{locale -a | grep '^@var{ll}'}.
1235
1236There is also a special locale, called @samp{C}.
1237@c Don't mention that this locale also has the name "POSIX". When we talk about
1238@c the "POSIX locale", we mean the "locale as specified in the POSIX way", and
1239@c mentioning a locale called "POSIX" would bring total confusion.
1240When it is used, it disables all localization: in this locale, all programs
1241standardized by POSIX use English messages and an unspecified character
1242encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on
1243the operating system).
1244
1245@node Locale Environment Variables, The LANGUAGE variable, Locale Names, Setting the POSIX Locale
1246@subsection Locale Environment Variables
1247@cindex setting up @code{gettext} at run time
1248@cindex selecting message language
1249@cindex language selection
1250
1251A locale is composed of several @emph{locale categories}, see @ref{Aspects}.
1252When a program looks up locale dependent values, it does this according to
1253the following environment variables, in priority order:
1254
1255@enumerate
1256@vindex LANGUAGE@r{, environment variable}
1257@item @code{LANGUAGE}
1258@vindex LC_ALL@r{, environment variable}
1259@item @code{LC_ALL}
1260@vindex LC_CTYPE@r{, environment variable}
1261@vindex LC_NUMERIC@r{, environment variable}
1262@vindex LC_TIME@r{, environment variable}
1263@vindex LC_COLLATE@r{, environment variable}
1264@vindex LC_MONETARY@r{, environment variable}
1265@vindex LC_MESSAGES@r{, environment variable}
1266@item @code{LC_xxx}, according to selected locale category:
1267@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1268@code{LC_MONETARY}, @code{LC_MESSAGES}, ...
1269@vindex LANG@r{, environment variable}
1270@item @code{LANG}
1271@end enumerate
1272
1273Variables whose value is set but is empty are ignored in this lookup.
1274
1275@code{LANG} is the normal environment variable for specifying a locale.
1276As a user, you normally set this variable (unless some of the other variables
1277have already been set by the system, in @file{/etc/profile} or similar
1278initialization files).
1279
1280@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE},
1281@code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment
1282variables meant to override @code{LANG} and affecting a single locale
1283category only.  For example, assume you are a Swedish user in Spain, and you
1284want your programs to handle numbers and dates according to Spanish
1285conventions, and only the messages should be in Swedish.  Then you could
1286create a locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the
1287@code{localedef} program.  But it is simpler, and achieves the same effect,
1288to set the @code{LANG} variable to @code{es_ES.UTF-8} and the
1289@code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales come
1290already preinstalled with the operating system.
1291
1292@code{LC_ALL} is an environment variable that overrides all of these.
1293It is typically used in scripts that run particular programs.  For example,
1294@code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make
1295sure that the configuration tests don't operate in locale dependent ways.
1296
1297Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in
1298similar initialization files.  As a user, you therefore have to unset this
1299variable if you want to set @code{LANG} and optionally some of the other
1300@code{LC_xxx} variables.
1301
1302The @code{LANGUAGE} variable is described in the next subsection.
1303
1304@node The LANGUAGE variable,  , Locale Environment Variables, Setting the POSIX Locale
1305@subsection Specifying a Priority List of Languages
1306
1307Not all programs have translations for all languages.  By default, an
1308English message is shown in place of a nonexistent translation.  If you
1309understand other languages, you can set up a priority list of languages.
1310This is done through a different environment variable, called
1311@code{LANGUAGE}.  GNU @code{gettext} gives preference to @code{LANGUAGE}
1312over @code{LC_ALL} and @code{LANG} for the purpose of message handling,
1313but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary
1314language; this is required by other parts of the system libraries.
1315For example, some Swedish users who would rather read translations in
1316German than English for when Swedish is not available, set @code{LANGUAGE}
1317to @samp{sv:de} while leaving @code{LANG} to @samp{sv_SE}.
1318
1319Special advice for Norwegian users: The language code for Norwegian
1320bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} recently (in 2003).
1321During the transition period, while some message catalogs for this language
1322are installed under @samp{nb} and some older ones under @samp{no}, it is
1323recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that
1324both newer and older translations are used.
1325
1326In the @code{LANGUAGE} environment variable, but not in the other
1327environment variables, @samp{@var{ll}_@var{CC}} combinations can be
1328abbreviated as @samp{@var{ll}} to denote the language's main dialect.
1329For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in
1330Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal)
1331in this context.
1332
1333Note: The variable @code{LANGUAGE} is ignored if the locale is set to
1334@samp{C}.  In other words, you have to first enable localization, by setting
1335@code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can
1336use a language priority list through the @code{LANGUAGE} variable.
1337
1338@node Installing Localizations,  , Setting the POSIX Locale, Users
1339@section Installing Translations for Particular Programs
1340@cindex Translation Matrix
1341@cindex available translations
1342
1343Languages are not equally well supported in all packages using GNU
1344@code{gettext}, and more translations are added over time.  Usually, you
1345use the translations that are shipped with the operating system
1346or with particular packages that you install afterwards.  But you can also
1347install newer localizations directly.  For doing this, you will need an
1348understanding where each localization file is stored on the file system.
1349
1350@cindex @file{ABOUT-NLS} file
1351For programs that participate in the Translation Project, you can start
1352looking for translations here:
1353@url{http://translationproject.org/team/index.html}.
1354A snapshot of this information is also found in the @file{ABOUT-NLS} file
1355that is shipped with GNU gettext.
1356
1357For programs that are part of the KDE project, the starting point is:
1358@url{http://i18n.kde.org/}.
1359
1360For programs that are part of the GNOME project, the starting point is:
1361@url{http://www.gnome.org/i18n/}.
1362
1363For other programs, you may check whether the program's source code package
1364contains some @file{@var{ll}.po} files; often they are kept together in a
1365directory called @file{po/}.  Each @file{@var{ll}.po} file contains the
1366message translations for the language whose abbreviation of @var{ll}.
1367
1368@node PO Files, Sources, Users, Top
1369@chapter The Format of PO Files
1370@cindex PO files' format
1371@cindex file format, @file{.po}
1372
1373The GNU @code{gettext} toolset helps programmers and translators
1374at producing, updating and using translation files, mainly those
1375PO files which are textual, editable files.  This chapter explains
1376the format of PO files.
1377
1378A PO file is made up of many entries, each entry holding the relation
1379between an original untranslated string and its corresponding
1380translation.  All entries in a given PO file usually pertain
1381to a single project, and all translations are expressed in a single
1382target language.  One PO file @dfn{entry} has the following schematic
1383structure:
1384
1385@example
1386@var{white-space}
1387#  @var{translator-comments}
1388#. @var{extracted-comments}
1389#: @var{reference}@dots{}
1390#, @var{flag}@dots{}
1391#| msgid @var{previous-untranslated-string}
1392msgid @var{untranslated-string}
1393msgstr @var{translated-string}
1394@end example
1395
1396The general structure of a PO file should be well understood by
1397the translator.  When using PO mode, very little has to be known
1398about the format details, as PO mode takes care of them for her.
1399
1400A simple entry can look like this:
1401
1402@example
1403#: lib/error.c:116
1404msgid "Unknown system error"
1405msgstr "Error desconegut del sistema"
1406@end example
1407
1408@cindex comments, translator
1409@cindex comments, automatic
1410@cindex comments, extracted
1411Entries begin with some optional white space.  Usually, when generated
1412through GNU @code{gettext} tools, there is exactly one blank line
1413between entries.  Then comments follow, on lines all starting with the
1414character @code{#}.  There are two kinds of comments: those which have
1415some white space immediately following the @code{#} - the @var{translator
1416comments} -, which comments are created and maintained exclusively by the
1417translator, and those which have some non-white character just after the
1418@code{#} - the @var{automatic comments} -, which comments are created and
1419maintained automatically by GNU @code{gettext} tools.  Comment lines
1420starting with @code{#.} contain comments given by the programmer, directed
1421at the translator; these comments are called @var{extracted comments}
1422because the @code{xgettext} program extracts them from the program's
1423source code.  Comment lines starting with @code{#:} contain references to
1424the program's source code.  Comment lines starting with @code{#,} contain
1425flags; more about these below.  Comment lines starting with @code{#|}
1426contain the previous untranslated string for which the translator gave
1427a translation.
1428
1429All comments, of either kind, are optional.
1430
1431@kwindex msgid
1432@kwindex msgstr
1433After white space and comments, entries show two strings, namely
1434first the untranslated string as it appears in the original program
1435sources, and then, the translation of this string.  The original
1436string is introduced by the keyword @code{msgid}, and the translation,
1437by @code{msgstr}.  The two strings, untranslated and translated,
1438are quoted in various ways in the PO file, using @code{"}
1439delimiters and @code{\} escapes, but the translator does not really
1440have to pay attention to the precise quoting format, as PO mode fully
1441takes care of quoting for her.
1442
1443The @code{msgid} strings, as well as automatic comments, are produced
1444and managed by other GNU @code{gettext} tools, and PO mode does not
1445provide means for the translator to alter these.  The most she can
1446do is merely deleting them, and only by deleting the whole entry.
1447On the other hand, the @code{msgstr} string, as well as translator
1448comments, are really meant for the translator, and PO mode gives her
1449the full control she needs.
1450
1451The comment lines beginning with @code{#,} are special because they are
1452not completely ignored by the programs as comments generally are.  The
1453comma separated list of @var{flag}s is used by the @code{msgfmt}
1454program to give the user some better diagnostic messages.  Currently
1455there are two forms of flags defined:
1456
1457@table @code
1458@item fuzzy
1459@kwindex fuzzy@r{ flag}
1460This flag can be generated by the @code{msgmerge} program or it can be
1461inserted by the translator herself.  It shows that the @code{msgstr}
1462string might not be a correct translation (anymore).  Only the translator
1463can judge if the translation requires further modification, or is
1464acceptable as is.  Once satisfied with the translation, she then removes
1465this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
1466when it combined the @code{msgid} and @code{msgstr} entries after fuzzy
1467search only.  @xref{Fuzzy Entries}.
1468
1469@item c-format
1470@kwindex c-format@r{ flag}
1471@itemx no-c-format
1472@kwindex no-c-format@r{ flag}
1473These flags should not be added by a human.  Instead only the
1474@code{xgettext} program adds them.  In an automated PO file processing
1475system as proposed here the user changes would be thrown away again as
1476soon as the @code{xgettext} program generates a new template file.
1477
1478The @code{c-format} flag tells that the untranslated string and the
1479translation are supposed to be C format strings.  The @code{no-c-format}
1480flag tells that they are not C format strings, even though the untranslated
1481string happens to look like a C format string (with @samp{%} directives).
1482
1483In case the @code{c-format} flag is given for a string the @code{msgfmt}
1484does some more tests to check to validity of the translation.
1485@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1486
1487@item objc-format
1488@kwindex objc-format@r{ flag}
1489@itemx no-objc-format
1490@kwindex no-objc-format@r{ flag}
1491Likewise for Objective C, see @ref{objc-format}.
1492
1493@item sh-format
1494@kwindex sh-format@r{ flag}
1495@itemx no-sh-format
1496@kwindex no-sh-format@r{ flag}
1497Likewise for Shell, see @ref{sh-format}.
1498
1499@item python-format
1500@kwindex python-format@r{ flag}
1501@itemx no-python-format
1502@kwindex no-python-format@r{ flag}
1503Likewise for Python, see @ref{python-format}.
1504
1505@item lisp-format
1506@kwindex lisp-format@r{ flag}
1507@itemx no-lisp-format
1508@kwindex no-lisp-format@r{ flag}
1509Likewise for Lisp, see @ref{lisp-format}.
1510
1511@item elisp-format
1512@kwindex elisp-format@r{ flag}
1513@itemx no-elisp-format
1514@kwindex no-elisp-format@r{ flag}
1515Likewise for Emacs Lisp, see @ref{elisp-format}.
1516
1517@item librep-format
1518@kwindex librep-format@r{ flag}
1519@itemx no-librep-format
1520@kwindex no-librep-format@r{ flag}
1521Likewise for librep, see @ref{librep-format}.
1522
1523@item scheme-format
1524@kwindex scheme-format@r{ flag}
1525@itemx no-scheme-format
1526@kwindex no-scheme-format@r{ flag}
1527Likewise for Scheme, see @ref{scheme-format}.
1528
1529@item smalltalk-format
1530@kwindex smalltalk-format@r{ flag}
1531@itemx no-smalltalk-format
1532@kwindex no-smalltalk-format@r{ flag}
1533Likewise for Smalltalk, see @ref{smalltalk-format}.
1534
1535@item java-format
1536@kwindex java-format@r{ flag}
1537@itemx no-java-format
1538@kwindex no-java-format@r{ flag}
1539Likewise for Java, see @ref{java-format}.
1540
1541@item csharp-format
1542@kwindex csharp-format@r{ flag}
1543@itemx no-csharp-format
1544@kwindex no-csharp-format@r{ flag}
1545Likewise for C#, see @ref{csharp-format}.
1546
1547@item awk-format
1548@kwindex awk-format@r{ flag}
1549@itemx no-awk-format
1550@kwindex no-awk-format@r{ flag}
1551Likewise for awk, see @ref{awk-format}.
1552
1553@item object-pascal-format
1554@kwindex object-pascal-format@r{ flag}
1555@itemx no-object-pascal-format
1556@kwindex no-object-pascal-format@r{ flag}
1557Likewise for Object Pascal, see @ref{object-pascal-format}.
1558
1559@item ycp-format
1560@kwindex ycp-format@r{ flag}
1561@itemx no-ycp-format
1562@kwindex no-ycp-format@r{ flag}
1563Likewise for YCP, see @ref{ycp-format}.
1564
1565@item tcl-format
1566@kwindex tcl-format@r{ flag}
1567@itemx no-tcl-format
1568@kwindex no-tcl-format@r{ flag}
1569Likewise for Tcl, see @ref{tcl-format}.
1570
1571@item perl-format
1572@kwindex perl-format@r{ flag}
1573@itemx no-perl-format
1574@kwindex no-perl-format@r{ flag}
1575Likewise for Perl, see @ref{perl-format}.
1576
1577@item perl-brace-format
1578@kwindex perl-brace-format@r{ flag}
1579@itemx no-perl-brace-format
1580@kwindex no-perl-brace-format@r{ flag}
1581Likewise for Perl brace, see @ref{perl-format}.
1582
1583@item php-format
1584@kwindex php-format@r{ flag}
1585@itemx no-php-format
1586@kwindex no-php-format@r{ flag}
1587Likewise for PHP, see @ref{php-format}.
1588
1589@item gcc-internal-format
1590@kwindex gcc-internal-format@r{ flag}
1591@itemx no-gcc-internal-format
1592@kwindex no-gcc-internal-format@r{ flag}
1593Likewise for the GCC sources, see @ref{gcc-internal-format}.
1594
1595@item qt-format
1596@kwindex qt-format@r{ flag}
1597@itemx no-qt-format
1598@kwindex no-qt-format@r{ flag}
1599Likewise for Qt, see @ref{qt-format}.
1600
1601@item kde-format
1602@kwindex kde-format@r{ flag}
1603@itemx no-kde-format
1604@kwindex no-kde-format@r{ flag}
1605Likewise for KDE, see @ref{kde-format}.
1606
1607@item boost-format
1608@kwindex boost-format@r{ flag}
1609@itemx no-boost-format
1610@kwindex no-boost-format@r{ flag}
1611Likewise for Boost, see @ref{boost-format}.
1612
1613@end table
1614
1615@kwindex msgctxt
1616@cindex context, in PO files
1617It is also possible to have entries with a context specifier. They look like
1618this:
1619
1620@example
1621@var{white-space}
1622#  @var{translator-comments}
1623#. @var{extracted-comments}
1624#: @var{reference}@dots{}
1625#, @var{flag}@dots{}
1626#| msgctxt @var{previous-context}
1627#| msgid @var{previous-untranslated-string}
1628msgctxt @var{context}
1629msgid @var{untranslated-string}
1630msgstr @var{translated-string}
1631@end example
1632
1633The context serves to disambiguate messages with the same
1634@var{untranslated-string}.  It is possible to have several entries with
1635the same @var{untranslated-string} in a PO file, provided that they each
1636have a different @var{context}.  Note that an empty @var{context} string
1637and an absent @code{msgctxt} line do not mean the same thing.
1638
1639@kwindex msgid_plural
1640@cindex plural forms, in PO files
1641A different kind of entries is used for translations which involve
1642plural forms.
1643
1644@example
1645@var{white-space}
1646#  @var{translator-comments}
1647#. @var{extracted-comments}
1648#: @var{reference}@dots{}
1649#, @var{flag}@dots{}
1650#| msgid @var{previous-untranslated-string-singular}
1651#| msgid_plural @var{previous-untranslated-string-plural}
1652msgid @var{untranslated-string-singular}
1653msgid_plural @var{untranslated-string-plural}
1654msgstr[0] @var{translated-string-case-0}
1655...
1656msgstr[N] @var{translated-string-case-n}
1657@end example
1658
1659Such an entry can look like this:
1660
1661@example
1662#: src/msgcmp.c:338 src/po-lex.c:699
1663#, c-format
1664msgid "found %d fatal error"
1665msgid_plural "found %d fatal errors"
1666msgstr[0] "s'ha trobat %d error fatal"
1667msgstr[1] "s'han trobat %d errors fatals"
1668@end example
1669
1670Here also, a @code{msgctxt} context can be specified before @code{msgid},
1671like above.
1672
1673The @var{previous-untranslated-string} is optionally inserted by the
1674@code{msgmerge} program, at the same time when it marks a message fuzzy.
1675It helps the translator to see which changes were done by the developers
1676on the @var{untranslated-string}.
1677
1678It happens that some lines, usually whitespace or comments, follow the
1679very last entry of a PO file.  Such lines are not part of any entry,
1680and will be dropped when the PO file is processed by the tools, or may
1681disturb some PO file editors.
1682
1683The remainder of this section may be safely skipped by those using
1684a PO file editor, yet it may be interesting for everybody to have a better
1685idea of the precise format of a PO file.  On the other hand, those
1686wishing to modify PO files by hand should carefully continue reading on.
1687
1688Each of @var{untranslated-string} and @var{translated-string} respects
1689the C syntax for a character string, including the surrounding quotes
1690and embedded backslashed escape sequences.  When the time comes
1691to write multi-line strings, one should not use escaped newlines.
1692Instead, a closing quote should follow the last character on the
1693line to be continued, and an opening quote should resume the string
1694at the beginning of the following PO file line.  For example:
1695
1696@example
1697msgid ""
1698"Here is an example of how one might continue a very long string\n"
1699"for the common case the string represents multi-line output.\n"
1700@end example
1701
1702@noindent
1703In this example, the empty string is used on the first line, to
1704allow better alignment of the @code{H} from the word @samp{Here}
1705over the @code{f} from the word @samp{for}.  In this example, the
1706@code{msgid} keyword is followed by three strings, which are meant
1707to be concatenated.  Concatenating the empty string does not change
1708the resulting overall string, but it is a way for us to comply with
1709the necessity of @code{msgid} to be followed by a string on the same
1710line, while keeping the multi-line presentation left-justified, as
1711we find this to be a cleaner disposition.  The empty string could have
1712been omitted, but only if the string starting with @samp{Here} was
1713promoted on the first line, right after @code{msgid}.@footnote{This
1714limitation is not imposed by GNU @code{gettext}, but is for compatibility
1715with the @code{msgfmt} implementation on Solaris.} It was not really necessary
1716either to switch between the two last quoted strings immediately after
1717the newline @samp{\n}, the switch could have occurred after @emph{any}
1718other character, we just did it this way because it is neater.
1719
1720@cindex newlines in PO files
1721One should carefully distinguish between end of lines marked as
1722@samp{\n} @emph{inside} quotes, which are part of the represented
1723string, and end of lines in the PO file itself, outside string quotes,
1724which have no incidence on the represented string.
1725
1726@cindex comments in PO files
1727Outside strings, white lines and comments may be used freely.
1728Comments start at the beginning of a line with @samp{#} and extend
1729until the end of the PO file line.  Comments written by translators
1730should have the initial @samp{#} immediately followed by some white
1731space.  If the @samp{#} is not immediately followed by white space,
1732this comment is most likely generated and managed by specialized GNU
1733tools, and might disappear or be replaced unexpectedly when the PO
1734file is given to @code{msgmerge}.
1735
1736@node Sources, Template, PO Files, Top
1737@chapter Preparing Program Sources
1738@cindex preparing programs for translation
1739
1740@c FIXME: Rewrite (the whole chapter).
1741
1742For the programmer, changes to the C source code fall into three
1743categories.  First, you have to make the localization functions
1744known to all modules needing message translation.  Second, you should
1745properly trigger the operation of GNU @code{gettext} when the program
1746initializes, usually from the @code{main} function.  Last, you should
1747identify, adjust and mark all constant strings in your program
1748needing translation.
1749
1750@menu
1751* Importing::                   Importing the @code{gettext} declaration
1752* Triggering::                  Triggering @code{gettext} Operations
1753* Preparing Strings::           Preparing Translatable Strings
1754* Mark Keywords::               How Marks Appear in Sources
1755* Marking::                     Marking Translatable Strings
1756* c-format Flag::               Telling something about the following string
1757* Special cases::               Special Cases of Translatable Strings
1758* Bug Report Address::          Letting Users Report Translation Bugs
1759* Names::                       Marking Proper Names for Translation
1760* Libraries::                   Preparing Library Sources
1761@end menu
1762
1763@node Importing, Triggering, Sources, Sources
1764@section Importing the @code{gettext} declaration
1765
1766Presuming that your set of programs, or package, has been adjusted
1767so all needed GNU @code{gettext} files are available, and your
1768@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1769having translated C strings should contain the line:
1770
1771@cindex include file @file{libintl.h}
1772@example
1773#include <libintl.h>
1774@end example
1775
1776Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
1777calls with a format string that could be a translated C string (even if
1778the C string comes from a different C module) should contain the line:
1779
1780@example
1781#include <libintl.h>
1782@end example
1783
1784@node Triggering, Preparing Strings, Importing, Sources
1785@section Triggering @code{gettext} Operations
1786
1787@cindex initialization
1788The initialization of locale data should be done with more or less
1789the same code in every program, as demonstrated below:
1790
1791@example
1792@group
1793int
1794main (int argc, char *argv[])
1795@{
1796  @dots{}
1797  setlocale (LC_ALL, "");
1798  bindtextdomain (PACKAGE, LOCALEDIR);
1799  textdomain (PACKAGE);
1800  @dots{}
1801@}
1802@end group
1803@end example
1804
1805@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1806@file{config.h} or by the Makefile.  For now consult the @code{gettext}
1807or @code{hello} sources for more information.
1808
1809@cindex locale category, LC_ALL
1810@cindex locale category, LC_CTYPE
1811The use of @code{LC_ALL} might not be appropriate for you.
1812@code{LC_ALL} includes all locale categories and especially
1813@code{LC_CTYPE}.  This latter category is responsible for determining
1814character classes with the @code{isalnum} etc. functions from
1815@file{ctype.h} which could especially for programs, which process some
1816kind of input language, be wrong.  For example this would mean that a
1817source code using the @,{c} (c-cedilla character) is runnable in
1818France but not in the U.S.
1819
1820Some systems also have problems with parsing numbers using the
1821@code{scanf} functions if an other but the @code{LC_ALL} locale category is
1822used.  The standards say that additional formats but the one known in the
1823@code{"C"} locale might be recognized.  But some systems seem to reject
1824numbers in the @code{"C"} locale format.  In some situation, it might
1825also be a problem with the notation itself which makes it impossible to
1826recognize whether the number is in the @code{"C"} locale or the local
1827format.  This can happen if thousands separator characters are used.
1828Some locales define this character according to the national
1829conventions to @code{'.'} which is the same character used in the
1830@code{"C"} locale to denote the decimal point.
1831
1832So it is sometimes necessary to replace the @code{LC_ALL} line in the
1833code above by a sequence of @code{setlocale} lines
1834
1835@example
1836@group
1837@{
1838  @dots{}
1839  setlocale (LC_CTYPE, "");
1840  setlocale (LC_MESSAGES, "");
1841  @dots{}
1842@}
1843@end group
1844@end example
1845
1846@cindex locale category, LC_CTYPE
1847@cindex locale category, LC_COLLATE
1848@cindex locale category, LC_MONETARY
1849@cindex locale category, LC_NUMERIC
1850@cindex locale category, LC_TIME
1851@cindex locale category, LC_MESSAGES
1852@cindex locale category, LC_RESPONSES
1853@noindent
1854On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1855@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1856@code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
1857which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1858a substitute for it is defined in GNU gettext's @code{<libintl.h>} and
1859in GNU gnulib's @code{<locale.h>}.
1860
1861Note that changing the @code{LC_CTYPE} also affects the functions
1862declared in the @code{<ctype.h>} standard header and some functions
1863declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers.
1864If this is not
1865desirable in your application (for example in a compiler's parser),
1866you can use a set of substitute functions which hardwire the C locale,
1867such as found in the modules @samp{c-ctype}, @samp{c-strcase},
1868@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib
1869source distribution.
1870
1871It is also possible to switch the locale forth and back between the
1872environment dependent locale and the C locale, but this approach is
1873normally avoided because a @code{setlocale} call is expensive,
1874because it is tedious to determine the places where a locale switch
1875is needed in a large program's source, and because switching a locale
1876is not multithread-safe.
1877
1878@node Preparing Strings, Mark Keywords, Triggering, Sources
1879@section Preparing Translatable Strings
1880
1881@cindex marking strings, preparations
1882Before strings can be marked for translations, they sometimes need to
1883be adjusted.  Usually preparing a string for translation is done right
1884before marking it, during the marking phase which is described in the
1885next sections.  What you have to keep in mind while doing that is the
1886following.
1887
1888@itemize @bullet
1889@item
1890Decent English style.
1891
1892@item
1893Entire sentences.
1894
1895@item
1896Split at paragraphs.
1897
1898@item
1899Use format strings instead of string concatenation.
1900
1901@item
1902Avoid unusual markup and unusual control characters.
1903@end itemize
1904
1905@noindent
1906Let's look at some examples of these guidelines.
1907
1908@cindex style
1909Translatable strings should be in good English style.  If slang language
1910with abbreviations and shortcuts is used, often translators will not
1911understand the message and will produce very inappropriate translations.
1912
1913@example
1914"%s: is parameter\n"
1915@end example
1916
1917@noindent
1918This is nearly untranslatable: Is the displayed item @emph{a} parameter or
1919@emph{the} parameter?
1920
1921@example
1922"No match"
1923@end example
1924
1925@noindent
1926The ambiguity in this message makes it unintelligible: Is the program
1927attempting to set something on fire? Does it mean "The given object does
1928not match the template"? Does it mean "The template does not fit for any
1929of the objects"?
1930
1931@cindex ambiguities
1932In both cases, adding more words to the message will help both the
1933translator and the English speaking user.
1934
1935@cindex sentences
1936Translatable strings should be entire sentences.  It is often not possible
1937to translate single verbs or adjectives in a substitutable way.
1938
1939@example
1940printf ("File %s is %s protected", filename, rw ? "write" : "read");
1941@end example
1942
1943@noindent
1944Most translators will not look at the source and will thus only see the
1945string @code{"File %s is %s protected"}, which is unintelligible.  Change
1946this to
1947
1948@example
1949printf (rw ? "File %s is write protected" : "File %s is read protected",
1950        filename);
1951@end example
1952
1953@noindent
1954This way the translator will not only understand the message, she will
1955also be able to find the appropriate grammatical construction.  A French
1956translator for example translates "write protected" like "protected
1957against writing".
1958
1959Entire sentences are also important because in many languages, the
1960declination of some word in a sentence depends on the gender or the
1961number (singular/plural) of another part of the sentence.  There are
1962usually more interdependencies between words than in English.  The
1963consequence is that asking a translator to translate two half-sentences
1964and then combining these two half-sentences through dumb string concatenation
1965will not work, for many languages, even though it would work for English.
1966That's why translators need to handle entire sentences.
1967
1968Often sentences don't fit into a single line.  If a sentence is output
1969using two subsequent @code{printf} statements, like this
1970
1971@example
1972printf ("Locale charset \"%s\" is different from\n", lcharset);
1973printf ("input file charset \"%s\".\n", fcharset);
1974@end example
1975
1976@noindent
1977the translator would have to translate two half sentences, but nothing
1978in the POT file would tell her that the two half sentences belong together.
1979It is necessary to merge the two @code{printf} statements so that the
1980translator can handle the entire sentence at once and decide at which
1981place to insert a line break in the translation (if at all):
1982
1983@example
1984printf ("Locale charset \"%s\" is different from\n\
1985input file charset \"%s\".\n", lcharset, fcharset);
1986@end example
1987
1988You may now ask: how about two or more adjacent sentences? Like in this case:
1989
1990@example
1991puts ("Apollo 13 scenario: Stack overflow handling failed.");
1992puts ("On the next stack overflow we will crash!!!");
1993@end example
1994
1995@noindent
1996Should these two statements merged into a single one? I would recommend to
1997merge them if the two sentences are related to each other, because then it
1998makes it easier for the translator to understand and translate both.  On
1999the other hand, if one of the two messages is a stereotypic one, occurring
2000in other places as well, you will do a favour to the translator by not
2001merging the two.  (Identical messages occurring in several places are
2002combined by xgettext, so the translator has to handle them once only.)
2003
2004@cindex paragraphs
2005Translatable strings should be limited to one paragraph; don't let a
2006single message be longer than ten lines.  The reason is that when the
2007translatable string changes, the translator is faced with the task of
2008updating the entire translated string.  Maybe only a single word will
2009have changed in the English string, but the translator doesn't see that
2010(with the current translation tools), therefore she has to proofread
2011the entire message.
2012
2013@cindex help option
2014Many GNU programs have a @samp{--help} output that extends over several
2015screen pages.  It is a courtesy towards the translators to split such a
2016message into several ones of five to ten lines each.  While doing that,
2017you can also attempt to split the documented options into groups,
2018such as the input options, the output options, and the informative
2019output options.  This will help every user to find the option he is
2020looking for.
2021
2022@cindex string concatenation
2023@cindex concatenation of strings
2024Hardcoded string concatenation is sometimes used to construct English
2025strings:
2026
2027@example
2028strcpy (s, "Replace ");
2029strcat (s, object1);
2030strcat (s, " with ");
2031strcat (s, object2);
2032strcat (s, "?");
2033@end example
2034
2035@noindent
2036In order to present to the translator only entire sentences, and also
2037because in some languages the translator might want to swap the order
2038of @code{object1} and @code{object2}, it is necessary to change this
2039to use a format string:
2040
2041@example
2042sprintf (s, "Replace %s with %s?", object1, object2);
2043@end example
2044
2045@cindex @code{inttypes.h}
2046A similar case is compile time concatenation of strings.  The ISO C 99
2047include file @code{<inttypes.h>} contains a macro @code{PRId64} that
2048can be used as a formatting directive for outputting an @samp{int64_t}
2049integer through @code{printf}.  It expands to a constant string, usually
2050"d" or "ld" or "lld" or something like this, depending on the platform.
2051Assume you have code like
2052
2053@example
2054printf ("The amount is %0" PRId64 "\n", number);
2055@end example
2056
2057@noindent
2058The @code{gettext} tools and library have special support for these
2059@code{<inttypes.h>} macros.  You can therefore simply write
2060
2061@example
2062printf (gettext ("The amount is %0" PRId64 "\n"), number);
2063@end example
2064
2065@noindent
2066The PO file will contain the string "The amount is %0<PRId64>\n".
2067The translators will provide a translation containing "%0<PRId64>"
2068as well, and at runtime the @code{gettext} function's result will
2069contain the appropriate constant string, "d" or "ld" or "lld".
2070
2071This works only for the predefined @code{<inttypes.h>} macros.  If
2072you have defined your own similar macros, let's say @samp{MYPRId64},
2073that are not known to @code{xgettext}, the solution for this problem
2074is to change the code like this:
2075
2076@example
2077char buf1[100];
2078sprintf (buf1, "%0" MYPRId64, number);
2079printf (gettext ("The amount is %s\n"), buf1);
2080@end example
2081
2082This means, you put the platform dependent code in one statement, and the
2083internationalization code in a different statement.  Note that a buffer length
2084of 100 is safe, because all available hardware integer types are limited to
2085128 bits, and to print a 128 bit integer one needs at most 54 characters,
2086regardless whether in decimal, octal or hexadecimal.
2087
2088@cindex Java, string concatenation
2089@cindex C#, string concatenation
2090All this applies to other programming languages as well.  For example, in
2091Java and C#, string concatenation is very frequently used, because it is a
2092compiler built-in operator.  Like in C, in Java, you would change
2093
2094@example
2095System.out.println("Replace "+object1+" with "+object2+"?");
2096@end example
2097
2098@noindent
2099into a statement involving a format string:
2100
2101@example
2102System.out.println(
2103    MessageFormat.format("Replace @{0@} with @{1@}?",
2104                         new Object[] @{ object1, object2 @}));
2105@end example
2106
2107@noindent
2108Similarly, in C#, you would change
2109
2110@example
2111Console.WriteLine("Replace "+object1+" with "+object2+"?");
2112@end example
2113
2114@noindent
2115into a statement involving a format string:
2116
2117@example
2118Console.WriteLine(
2119    String.Format("Replace @{0@} with @{1@}?", object1, object2));
2120@end example
2121
2122@cindex markup
2123@cindex control characters
2124Unusual markup or control characters should not be used in translatable
2125strings.  Translators will likely not understand the particular meaning
2126of the markup or control characters.
2127
2128For example, if you have a convention that @samp{|} delimits the
2129left-hand and right-hand part of some GUI elements, translators will
2130often not understand it without specific comments.  It might be
2131better to have the translator translate the left-hand and right-hand
2132part separately.
2133
2134Another example is the @samp{argp} convention to use a single @samp{\v}
2135(vertical tab) control character to delimit two sections inside a
2136string.  This is flawed.  Some translators may convert it to a simple
2137newline, some to blank lines.  With some PO file editors it may not be
2138easy to even enter a vertical tab control character.  So, you cannot
2139be sure that the translation will contain a @samp{\v} character, at the
2140corresponding position.  The solution is, again, to let the translator
2141translate two separate strings and combine at run-time the two translated
2142strings with the @samp{\v} required by the convention.
2143
2144HTML markup, however, is common enough that it's probably ok to use in
2145translatable strings.  But please bear in mind that the GNU gettext tools
2146don't verify that the translations are well-formed HTML.
2147
2148@node Mark Keywords, Marking, Preparing Strings, Sources
2149@section How Marks Appear in Sources
2150@cindex marking strings that require translation
2151
2152All strings requiring translation should be marked in the C sources.  Marking
2153is done in such a way that each translatable string appears to be
2154the sole argument of some function or preprocessor macro.  There are
2155only a few such possible functions or macros meant for translation,
2156and their names are said to be marking keywords.  The marking is
2157attached to strings themselves, rather than to what we do with them.
2158This approach has more uses.  A blatant example is an error message
2159produced by formatting.  The format string needs translation, as
2160well as some strings inserted through some @samp{%s} specification
2161in the format, while the result from @code{sprintf} may have so many
2162different instances that it is impractical to list them all in some
2163@samp{error_string_out()} routine, say.
2164
2165This marking operation has two goals.  The first goal of marking
2166is for triggering the retrieval of the translation, at run time.
2167The keyword is possibly resolved into a routine able to dynamically
2168return the proper translation, as far as possible or wanted, for the
2169argument string.  Most localizable strings are found in executable
2170positions, that is, attached to variables or given as parameters to
2171functions.  But this is not universal usage, and some translatable
2172strings appear in structured initializations.  @xref{Special cases}.
2173
2174The second goal of the marking operation is to help @code{xgettext}
2175at properly extracting all translatable strings when it scans a set
2176of program sources and produces PO file templates.
2177
2178The canonical keyword for marking translatable strings is
2179@samp{gettext}, it gave its name to the whole GNU @code{gettext}
2180package.  For packages making only light use of the @samp{gettext}
2181keyword, macro or function, it is easily used @emph{as is}.  However,
2182for packages using the @code{gettext} interface more heavily, it
2183is usually more convenient to give the main keyword a shorter, less
2184obtrusive name.  Indeed, the keyword might appear on a lot of strings
2185all over the package, and programmers usually do not want nor need
2186their program sources to remind them forcefully, all the time, that they
2187are internationalized.  Further, a long keyword has the disadvantage
2188of using more horizontal space, forcing more indentation work on
2189sources for those trying to keep them within 79 or 80 columns.
2190
2191@cindex @code{_}, a macro to mark strings for translation
2192Many packages use @samp{_} (a simple underline) as a keyword,
2193and write @samp{_("Translatable string")} instead of @samp{gettext
2194("Translatable string")}.  Further, the coding rule, from GNU standards,
2195wanting that there is a space between the keyword and the opening
2196parenthesis is relaxed, in practice, for this particular usage.
2197So, the textual overhead per translatable string is reduced to
2198only three characters: the underline and the two parentheses.
2199However, even if GNU @code{gettext} uses this convention internally,
2200it does not offer it officially.  The real, genuine keyword is truly
2201@samp{gettext} indeed.  It is fairly easy for those wanting to use
2202@samp{_} instead of @samp{gettext} to declare:
2203
2204@example
2205#include <libintl.h>
2206#define _(String) gettext (String)
2207@end example
2208
2209@noindent
2210instead of merely using @samp{#include <libintl.h>}.
2211
2212The marking keywords @samp{gettext} and @samp{_} take the translatable
2213string as sole argument.  It is also possible to define marking functions
2214that take it at another argument position.  It is even possible to make
2215the marked argument position depend on the total number of arguments of
2216the function call; this is useful in C++.  All this is achieved using
2217@code{xgettext}'s @samp{--keyword} option.
2218
2219Note also that long strings can be split across lines, into multiple
2220adjacent string tokens.  Automatic string concatenation is performed
2221at compile time according to ISO C and ISO C++; @code{xgettext} also
2222supports this syntax.
2223
2224Later on, the maintenance is relatively easy.  If, as a programmer,
2225you add or modify a string, you will have to ask yourself if the
2226new or altered string requires translation, and include it within
2227@samp{_()} if you think it should be translated.  For example, @samp{"%s"}
2228is an example of string @emph{not} requiring translation.  But
2229@samp{"%s: %d"} @emph{does} require translation, because in French, unlike
2230in English, it's customary to put a space before a colon.
2231
2232@node Marking, c-format Flag, Mark Keywords, Sources
2233@section Marking Translatable Strings
2234@emindex marking strings for translation
2235
2236In PO mode, one set of features is meant more for the programmer than
2237for the translator, and allows him to interactively mark which strings,
2238in a set of program sources, are translatable, and which are not.
2239Even if it is a fairly easy job for a programmer to find and mark
2240such strings by other means, using any editor of his choice, PO mode
2241makes this work more comfortable.  Further, this gives translators
2242who feel a little like programmers, or programmers who feel a little
2243like translators, a tool letting them work at marking translatable
2244strings in the program sources, while simultaneously producing a set of
2245translation in some language, for the package being internationalized.
2246
2247@emindex @code{etags}, using for marking strings
2248The set of program sources, targeted by the PO mode commands describe
2249here, should have an Emacs tags table constructed for your project,
2250prior to using these PO file commands.  This is easy to do.  In any
2251shell window, change the directory to the root of your project, then
2252execute a command resembling:
2253
2254@example
2255etags src/*.[hc] lib/*.[hc]
2256@end example
2257
2258@noindent
2259presuming here you want to process all @file{.h} and @file{.c} files
2260from the @file{src/} and @file{lib/} directories.  This command will
2261explore all said files and create a @file{TAGS} file in your root
2262directory, somewhat summarizing the contents using a special file
2263format Emacs can understand.
2264
2265@emindex @file{TAGS}, and marking translatable strings
2266For packages following the GNU coding standards, there is
2267a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2268all directories and for all files containing source code.
2269
2270Once your @file{TAGS} file is ready, the following commands assist
2271the programmer at marking translatable strings in his set of sources.
2272But these commands are necessarily driven from within a PO file
2273window, and it is likely that you do not even have such a PO file yet.
2274This is not a problem at all, as you may safely open a new, empty PO
2275file, mainly for using these commands.  This empty PO file will slowly
2276fill in while you mark strings as translatable in your program sources.
2277
2278@table @kbd
2279@item ,
2280@efindex ,@r{, PO Mode command}
2281Search through program sources for a string which looks like a
2282candidate for translation (@code{po-tags-search}).
2283
2284@item M-,
2285@efindex M-,@r{, PO Mode command}
2286Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2287
2288@item M-.
2289@efindex M-.@r{, PO Mode command}
2290Mark the last string found with a keyword taken from a set of possible
2291keywords.  This command with a prefix allows some management of these
2292keywords (@code{po-select-mark-and-mark}).
2293
2294@end table
2295
2296@efindex po-tags-search@r{, PO Mode command}
2297The @kbd{,} (@code{po-tags-search}) command searches for the next
2298occurrence of a string which looks like a possible candidate for
2299translation, and displays the program source in another Emacs window,
2300positioned in such a way that the string is near the top of this other
2301window.  If the string is too big to fit whole in this window, it is
2302positioned so only its end is shown.  In any case, the cursor
2303is left in the PO file window.  If the shown string would be better
2304presented differently in different native languages, you may mark it
2305using @kbd{M-,} or @kbd{M-.}.  Otherwise, you might rather ignore it
2306and skip to the next string by merely repeating the @kbd{,} command.
2307
2308A string is a good candidate for translation if it contains a sequence
2309of three or more letters.  A string containing at most two letters in
2310a row will be considered as a candidate if it has more letters than
2311non-letters.  The command disregards strings containing no letters,
2312or isolated letters only.  It also disregards strings within comments,
2313or strings already marked with some keyword PO mode knows (see below).
2314
2315If you have never told Emacs about some @file{TAGS} file to use, the
2316command will request that you specify one from the minibuffer, the
2317first time you use the command.  You may later change your @file{TAGS}
2318file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2319which will ask you to name the precise @file{TAGS} file you want
2320to use.  @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2321
2322Each time you use the @kbd{,} command, the search resumes from where it was
2323left by the previous search, and goes through all program sources,
2324obeying the @file{TAGS} file, until all sources have been processed.
2325However, by giving a prefix argument to the command @w{(@kbd{C-u
2326,})}, you may request that the search be restarted all over again
2327from the first program source; but in this case, strings that you
2328recently marked as translatable will be automatically skipped.
2329
2330Using this @kbd{,} command does not prevent using of other regular
2331Emacs tags commands.  For example, regular @code{tags-search} or
2332@code{tags-query-replace} commands may be used without disrupting the
2333independent @kbd{,} search sequence.  However, as implemented, the
2334@emph{initial} @kbd{,} command (or the @kbd{,} command is used with a
2335prefix) might also reinitialize the regular Emacs tags searching to the
2336first tags file, this reinitialization might be considered spurious.
2337
2338@efindex po-mark-translatable@r{, PO Mode command}
2339@efindex po-select-mark-and-mark@r{, PO Mode command}
2340The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2341recently found string with the @samp{_} keyword.  The @kbd{M-.}
2342(@code{po-select-mark-and-mark}) command will request that you type
2343one keyword from the minibuffer and use that keyword for marking
2344the string.  Both commands will automatically create a new PO file
2345untranslated entry for the string being marked, and make it the
2346current entry (making it easy for you to immediately proceed to its
2347translation, if you feel like doing it right away).  It is possible
2348that the modifications made to the program source by @kbd{M-,} or
2349@kbd{M-.} render some source line longer than 80 columns, forcing you
2350to break and re-indent this line differently.  You may use the @kbd{O}
2351command from PO mode, or any other window changing command from
2352Emacs, to break out into the program source window, and do any
2353needed adjustments.  You will have to use some regular Emacs command
2354to return the cursor to the PO file window, if you want command
2355@kbd{,} for the next string, say.
2356
2357The @kbd{M-.} command has a few built-in speedups, so you do not
2358have to explicitly type all keywords all the time.  The first such
2359speedup is that you are presented with a @emph{preferred} keyword,
2360which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2361The second speedup is that you may type any non-ambiguous prefix of the
2362keyword you really mean, and the command will complete it automatically
2363for you.  This also means that PO mode has to @emph{know} all
2364your possible keywords, and that it will not accept mistyped keywords.
2365
2366If you reply @kbd{?} to the keyword request, the command gives a
2367list of all known keywords, from which you may choose.  When the
2368command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2369updating any program source or PO file buffer, and does some simple
2370keyword management instead.  In this case, the command asks for a
2371keyword, written in full, which becomes a new allowed keyword for
2372later @kbd{M-.} commands.  Moreover, this new keyword automatically
2373becomes the @emph{preferred} keyword for later commands.  By typing
2374an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2375changes the @emph{preferred} keyword and does nothing more.
2376
2377All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2378when scanning for strings, and strings already marked by any of those
2379known keywords are automatically skipped.  If many PO files are opened
2380simultaneously, each one has its own independent set of known keywords.
2381There is no provision in PO mode, currently, for deleting a known
2382keyword, you have to quit the file (maybe using @kbd{q}) and reopen
2383it afresh.  When a PO file is newly brought up in an Emacs window, only
2384@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
2385is preferred for the @kbd{M-.} command.  In fact, this is not useful to
2386prefer @samp{_}, as this one is already built in the @kbd{M-,} command.
2387
2388@node c-format Flag, Special cases, Marking, Sources
2389@section Special Comments preceding Keywords
2390
2391@c FIXME document c-format and no-c-format.
2392
2393@cindex format strings
2394In C programs strings are often used within calls of functions from the
2395@code{printf} family.  The special thing about these format strings is
2396that they can contain format specifiers introduced with @kbd{%}.  Assume
2397we have the code
2398
2399@example
2400printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
2401@end example
2402
2403@noindent
2404A possible German translation for the above string might be:
2405
2406@example
2407"%d Zeichen lang ist die Zeichenkette `%s'"
2408@end example
2409
2410A C programmer, even if he cannot speak German, will recognize that
2411there is something wrong here.  The order of the two format specifiers
2412is changed but of course the arguments in the @code{printf} don't have.
2413This will most probably lead to problems because now the length of the
2414string is regarded as the address.
2415
2416To prevent errors at runtime caused by translations the @code{msgfmt}
2417tool can check statically whether the arguments in the original and the
2418translation string match in type and number.  If this is not the case
2419and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
2420will give an error and refuse to produce a MO file.  Thus consequent
2421use of @samp{msgfmt -c} will catch the error, so that it cannot cause
2422cause problems at runtime.
2423
2424@noindent
2425If the word order in the above German translation would be correct one
2426would have to write
2427
2428@example
2429"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2430@end example
2431
2432@noindent
2433The routines in @code{msgfmt} know about this special notation.
2434
2435Because not all strings in a program must be format strings it is not
2436useful for @code{msgfmt} to test all the strings in the @file{.po} file.
2437This might cause problems because the string might contain what looks
2438like a format specifier, but the string is not used in @code{printf}.
2439
2440Therefore the @code{xgettext} adds a special tag to those messages it
2441thinks might be a format string.  There is no absolute rule for this,
2442only a heuristic.  In the @file{.po} file the entry is marked using the
2443@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2444
2445@kwindex c-format@r{, and @code{xgettext}}
2446@kwindex no-c-format@r{, and @code{xgettext}}
2447The careful reader now might say that this again can cause problems.
2448The heuristic might guess it wrong.  This is true and therefore
2449@code{xgettext} knows about a special kind of comment which lets
2450the programmer take over the decision.  If in the same line as or
2451the immediately preceding line to the @code{gettext} keyword
2452the @code{xgettext} program finds a comment containing the words
2453@code{xgettext:c-format}, it will mark the string in any case with
2454the @code{c-format} flag.  This kind of comment should be used when
2455@code{xgettext} does not recognize the string as a format string but
2456it really is one and it should be tested.  Please note that when the
2457comment is in the same line as the @code{gettext} keyword, it must be
2458before the string to be translated.
2459
2460This situation happens quite often.  The @code{printf} function is often
2461called with strings which do not contain a format specifier.  Of course
2462one would normally use @code{fputs} but it does happen.  In this case
2463@code{xgettext} does not recognize this as a format string but what
2464happens if the translation introduces a valid format specifier?  The
2465@code{printf} function will try to access one of the parameters but none
2466exists because the original code does not pass any parameters.
2467
2468@code{xgettext} of course could make a wrong decision the other way
2469round, i.e.@: a string marked as a format string actually is not a format
2470string.  In this case the @code{msgfmt} might give too many warnings and
2471would prevent translating the @file{.po} file.  The method to prevent
2472this wrong decision is similar to the one used above, only the comment
2473to use must contain the string @code{xgettext:no-c-format}.
2474
2475If a string is marked with @code{c-format} and this is not correct the
2476user can find out who is responsible for the decision.  See
2477@ref{xgettext Invocation} to see how the @code{--debug} option can be
2478used for solving this problem.
2479
2480@node Special cases, Bug Report Address, c-format Flag, Sources
2481@section Special Cases of Translatable Strings
2482
2483@cindex marking string initializers
2484The attentive reader might now point out that it is not always possible
2485to mark translatable string with @code{gettext} or something like this.
2486Consider the following case:
2487
2488@example
2489@group
2490@{
2491  static const char *messages[] = @{
2492    "some very meaningful message",
2493    "and another one"
2494  @};
2495  const char *string;
2496  @dots{}
2497  string
2498    = index > 1 ? "a default message" : messages[index];
2499
2500  fputs (string);
2501  @dots{}
2502@}
2503@end group
2504@end example
2505
2506While it is no problem to mark the string @code{"a default message"} it
2507is not possible to mark the string initializers for @code{messages}.
2508What is to be done?  We have to fulfill two tasks.  First we have to mark the
2509strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
2510can find them, and second we have to translate the string at runtime
2511before printing them.
2512
2513The first task can be fulfilled by creating a new keyword, which names a
2514no-op.  For the second we have to mark all access points to a string
2515from the array.  So one solution can look like this:
2516
2517@example
2518@group
2519#define gettext_noop(String) String
2520
2521@{
2522  static const char *messages[] = @{
2523    gettext_noop ("some very meaningful message"),
2524    gettext_noop ("and another one")
2525  @};
2526  const char *string;
2527  @dots{}
2528  string
2529    = index > 1 ? gettext ("a default message") : gettext (messages[index]);
2530
2531  fputs (string);
2532  @dots{}
2533@}
2534@end group
2535@end example
2536
2537Please convince yourself that the string which is written by
2538@code{fputs} is translated in any case.  How to get @code{xgettext} know
2539the additional keyword @code{gettext_noop} is explained in @ref{xgettext
2540Invocation}.
2541
2542The above is of course not the only solution.  You could also come along
2543with the following one:
2544
2545@example
2546@group
2547#define gettext_noop(String) String
2548
2549@{
2550  static const char *messages[] = @{
2551    gettext_noop ("some very meaningful message",
2552    gettext_noop ("and another one")
2553  @};
2554  const char *string;
2555  @dots{}
2556  string
2557    = index > 1 ? gettext_noop ("a default message") : messages[index];
2558
2559  fputs (gettext (string));
2560  @dots{}
2561@}
2562@end group
2563@end example
2564
2565But this has a drawback.  The programmer has to take care that
2566he uses @code{gettext_noop} for the string @code{"a default message"}.
2567A use of @code{gettext} could have in rare cases unpredictable results.
2568
2569One advantage is that you need not make control flow analysis to make
2570sure the output is really translated in any case.  But this analysis is
2571generally not very difficult.  If it should be in any situation you can
2572use this second method in this situation.
2573
2574@node Bug Report Address, Names, Special cases, Sources
2575@section Letting Users Report Translation Bugs
2576
2577Code sometimes has bugs, but translations sometimes have bugs too.  The
2578users need to be able to report them.  Reporting translation bugs to the
2579programmer or maintainer of a package is not very useful, since the
2580maintainer must never change a translation, except on behalf of the
2581translator.  Hence the translation bugs must be reported to the
2582translators.
2583
2584Here is a way to organize this so that the maintainer does not need to
2585forward translation bug reports, nor even keep a list of the addresses of
2586the translators or their translation teams.
2587
2588Every program has a place where is shows the bug report address.  For
2589GNU programs, it is the code which handles the ``--help'' option,
2590typically in a function called ``usage''.  In this place, instruct the
2591translator to add her own bug reporting address.  For example, if that
2592code has a statement
2593
2594@example
2595@group
2596printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2597@end group
2598@end example
2599
2600you can add some translator instructions like this:
2601
2602@example
2603@group
2604/* TRANSLATORS: The placeholder indicates the bug-reporting address
2605   for this package.  Please add _another line_ saying
2606   "Report translation bugs to <...>\n" with the address for translation
2607   bugs (typically your translation team's web or email address).  */
2608printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
2609@end group
2610@end example
2611
2612These will be extracted by @samp{xgettext}, leading to a .pot file that
2613contains this:
2614
2615@example
2616@group
2617#. TRANSLATORS: The placeholder indicates the bug-reporting address
2618#. for this package.  Please add _another line_ saying
2619#. "Report translation bugs to <...>\n" with the address for translation
2620#. bugs (typically your translation team's web or email address).
2621#: src/hello.c:178
2622#, c-format
2623msgid "Report bugs to <%s>.\n"
2624msgstr ""
2625@end group
2626@end example
2627
2628@node Names, Libraries, Bug Report Address, Sources
2629@section Marking Proper Names for Translation
2630
2631Should names of persons, cities, locations etc. be marked for translation
2632or not?  People who only know languages that can be written with Latin
2633letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2634because names usually do not change when transported between these languages.
2635However, in general when translating from one script to another, names
2636are translated too, usually phonetically or by transliteration.  For
2637example, Russian or Greek names are converted to the Latin alphabet when
2638being translated to English, and English or French names are converted
2639to the Katakana script when being translated to Japanese.  This is
2640necessary because the speakers of the target language in general cannot
2641read the script the name is originally written in.
2642
2643As a programmer, you should therefore make sure that names are marked
2644for translation, with a special comment telling the translators that it
2645is a proper name and how to pronounce it.  Like this:
2646
2647@example
2648@group
2649printf (_("Written by %s.\n"),
2650        /* TRANSLATORS: This is a proper name.  See the gettext
2651           manual, section Names.  Note this is actually a non-ASCII
2652           name: The first name is (with Unicode escapes)
2653           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2654           Pronunciation is like "fraa-swa pee-nar".  */
2655        _("Francois Pinard"));
2656@end group
2657@end example
2658
2659As a translator, you should use some care when translating names, because
2660it is frustrating if people see their names mutilated or distorted.  If
2661your language uses the Latin script, all you need to do is to reproduce
2662the name as perfectly as you can within the usual character set of your
2663language.  In this particular case, this means to provide a translation
2664containing the c-cedilla character.  If your language uses a different
2665script and the people speaking it don't usually read Latin words, it means
2666transliteration; but you should still give, in parentheses, the original
2667writing of the name -- for the sake of the people that do read the Latin
2668script.  Here is an example, using Greek as the target script:
2669
2670@example
2671@group
2672#. This is a proper name.  See the gettext
2673#. manual, section Names.  Note this is actually a non-ASCII
2674#. name: The first name is (with Unicode escapes)
2675#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2676#. Pronunciation is like "fraa-swa pee-nar".
2677msgid "Francois Pinard"
2678msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2679       " (Francois Pinard)"
2680@end group
2681@end example
2682
2683Because translation of names is such a sensitive domain, it is a good
2684idea to test your translation before submitting it.
2685
2686The translation project @url{http://sourceforge.net/projects/translation}
2687has set up a POT file and translation domain consisting of program author
2688names, with better facilities for the translator than those presented here.
2689Namely, there the original name is written directly in Unicode (rather
2690than with Unicode escapes or HTML entities), and the pronunciation is
2691denoted using the International Phonetic Alphabet (see
2692@url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}).
2693
2694However, we don't recommend this approach for all POT files in all packages,
2695because this would force translators to use PO files in UTF-8 encoding,
2696which is - in the current state of software (as of 2003) - a major hassle
2697for translators using GNU Emacs or XEmacs with po-mode.
2698
2699@node Libraries,  , Names, Sources
2700@section Preparing Library Sources
2701
2702When you are preparing a library, not a program, for the use of
2703@code{gettext}, only a few details are different.  Here we assume that
2704the library has a translation domain and a POT file of its own.  (If
2705it uses the translation domain and POT file of the main program, then
2706the previous sections apply without changes.)
2707
2708@enumerate
2709@item
2710The library code doesn't call @code{setlocale (LC_ALL, "")}.  It's the
2711responsibility of the main program to set the locale.  The library's
2712documentation should mention this fact, so that developers of programs
2713using the library are aware of it.
2714
2715@item
2716The library code doesn't call @code{textdomain (PACKAGE)}, because it
2717would interfere with the text domain set by the main program.
2718
2719@item
2720The initialization code for a program was
2721
2722@smallexample
2723  setlocale (LC_ALL, "");
2724  bindtextdomain (PACKAGE, LOCALEDIR);
2725  textdomain (PACKAGE);
2726@end smallexample
2727
2728@noindent
2729For a library it is reduced to
2730
2731@smallexample
2732  bindtextdomain (PACKAGE, LOCALEDIR);
2733@end smallexample
2734
2735@noindent
2736If your library's API doesn't already have an initialization function,
2737you need to create one, containing at least the @code{bindtextdomain}
2738invocation.  However, you usually don't need to export and document this
2739initialization function: It is sufficient that all entry points of the
2740library call the initialization function if it hasn't been called before.
2741The typical idiom used to achieve this is a static boolean variable that
2742indicates whether the initialization function has been called. Like this:
2743
2744@example
2745@group
2746static bool libfoo_initialized;
2747
2748static void
2749libfoo_initialize (void)
2750@{
2751  bindtextdomain (PACKAGE, LOCALEDIR);
2752  libfoo_initialized = true;
2753@}
2754
2755/* This function is part of the exported API.  */
2756struct foo *
2757create_foo (...)
2758@{
2759  /* Must ensure the initialization is performed.  */
2760  if (!libfoo_initialized)
2761    libfoo_initialize ();
2762  ...
2763@}
2764
2765/* This function is part of the exported API.  The argument must be
2766   non-NULL and have been created through create_foo().  */
2767int
2768foo_refcount (struct foo *argument)
2769@{
2770  /* No need to invoke the initialization function here, because
2771     create_foo() must already have been called before.  */
2772  ...
2773@}
2774@end group
2775@end example
2776
2777@item
2778The usual declaration of the @samp{_} macro in each source file was
2779
2780@smallexample
2781#include <libintl.h>
2782#define _(String) gettext (String)
2783@end smallexample
2784
2785@noindent
2786for a program.  For a library, which has its own translation domain,
2787it reads like this:
2788
2789@smallexample
2790#include <libintl.h>
2791#define _(String) dgettext (PACKAGE, String)
2792@end smallexample
2793
2794In other words, @code{dgettext} is used instead of @code{gettext}.
2795Similarly, the @code{dngettext} function should be used in place of the
2796@code{ngettext} function.
2797@end enumerate
2798
2799@node Template, Creating, Sources, Top
2800@chapter Making the PO Template File
2801@cindex PO template file
2802
2803After preparing the sources, the programmer creates a PO template file.
2804This section explains how to use @code{xgettext} for this purpose.
2805
2806@code{xgettext} creates a file named @file{@var{domainname}.po}.  You
2807should then rename it to @file{@var{domainname}.pot}.  (Why doesn't
2808@code{xgettext} create it under the name @file{@var{domainname}.pot}
2809right away?  The answer is: for historical reasons.  When @code{xgettext}
2810was specified, the distinction between a PO file and PO file template
2811was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
2812
2813@c FIXME: Rewrite.
2814
2815@menu
2816* xgettext Invocation::         Invoking the @code{xgettext} Program
2817@end menu
2818
2819@node xgettext Invocation,  , Template, Template
2820@section Invoking the @code{xgettext} Program
2821
2822@include xgettext.texi
2823
2824@node Creating, Updating, Template, Top
2825@chapter Creating a New PO File
2826@cindex creating a new PO file
2827
2828When starting a new translation, the translator creates a file called
2829@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
2830file with modifications in the initial comments (at the beginning of the file)
2831and in the header entry (the first entry, near the beginning of the file).
2832
2833The easiest way to do so is by use of the @samp{msginit} program.
2834For example:
2835
2836@example
2837$ cd @var{PACKAGE}-@var{VERSION}
2838$ cd po
2839$ msginit
2840@end example
2841
2842The alternative way is to do the copy and modifications by hand.
2843To do so, the translator copies @file{@var{package}.pot} to
2844@file{@var{LANG}.po}.  Then she modifies the initial comments and
2845the header entry of this file.
2846
2847@menu
2848* msginit Invocation::          Invoking the @code{msginit} Program
2849* Header Entry::                Filling in the Header Entry
2850@end menu
2851
2852@node msginit Invocation, Header Entry, Creating, Creating
2853@section Invoking the @code{msginit} Program
2854
2855@include msginit.texi
2856
2857@node Header Entry,  , msginit Invocation, Creating
2858@section Filling in the Header Entry
2859@cindex header entry of a PO file
2860
2861The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2862"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2863information.  This can be done in any text editor; if Emacs is used
2864and it switched to PO mode automatically (because it has recognized
2865the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2866
2867Modifying the header entry can already be done using PO mode: in Emacs,
2868type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
2869entry.  You should fill in the following fields.
2870
2871@table @asis
2872@item Project-Id-Version
2873This is the name and version of the package.  Fill it in if it has not
2874already been filled in by @code{xgettext}.
2875
2876@item Report-Msgid-Bugs-To
2877This has already been filled in by @code{xgettext}.  It contains an email
2878address or URL where you can report bugs in the untranslated strings:
2879
2880@itemize -
2881@item Strings which are not entire sentences, see the maintainer guidelines
2882in @ref{Preparing Strings}.
2883@item Strings which use unclear terms or require additional context to be
2884understood.
2885@item Strings which make invalid assumptions about notation of date, time or
2886money.
2887@item Pluralisation problems.
2888@item Incorrect English spelling.
2889@item Incorrect formatting.
2890@end itemize
2891
2892@item POT-Creation-Date
2893This has already been filled in by @code{xgettext}.
2894
2895@item PO-Revision-Date
2896You don't need to fill this in.  It will be filled by the PO file editor
2897when you save the file.
2898
2899@item Last-Translator
2900Fill in your name and email address (without double quotes).
2901
2902@item Language-Team
2903Fill in the English name of the language, and the email address or
2904homepage URL of the language team you are part of.
2905
2906Before starting a translation, it is a good idea to get in touch with
2907your translation team, not only to make sure you don't do duplicated work,
2908but also to coordinate difficult linguistic issues.
2909
2910@cindex list of translation teams, where to find
2911In the Free Translation Project, each translation team has its own mailing
2912list.  The up-to-date list of teams can be found at the Free Translation
2913Project's homepage, @uref{http://translationproject.org/}, in the "Teams"
2914area.
2915
2916@item Content-Type
2917@cindex encoding of PO files
2918@cindex charset of PO files
2919Replace @samp{CHARSET} with the character encoding used for your language,
2920in your locale, or UTF-8.  This field is needed for correct operation of the
2921@code{msgmerge} and @code{msgfmt} programs, as well as for users whose
2922locale's character encoding differs from yours (see @ref{Charset conversion}).
2923
2924@cindex @code{locale} program
2925You get the character encoding of your locale by running the shell command
2926@samp{locale charmap}.  If the result is @samp{C} or @samp{ANSI_X3.4-1968},
2927which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
2928locale is not correctly configured.  In this case, ask your translation
2929team which charset to use.  @samp{ASCII} is not usable for any language
2930except Latin.
2931
2932@cindex encoding list
2933Because the PO files must be portable to operating systems with less advanced
2934internationalization facilities, the character encodings that can be used
2935are limited to those supported by both GNU @code{libc} and GNU
2936@code{libiconv}.  These are:
2937@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
2938@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
2939@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
2940@code{ISO-8859-15},
2941@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
2942@code{CP850}, @code{CP866}, @code{CP874},
2943@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
2944@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
2945@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
2946@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
2947@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
2948
2949@c This data is taken from glibc/localedata/SUPPORTED.
2950@cindex Linux
2951In the GNU system, the following encodings are frequently used for the
2952corresponding languages.
2953
2954@cindex encoding for your language
2955@itemize
2956@item @code{ISO-8859-1} for
2957Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
2958English, Estonian, Faroese, Finnish, French, Galician, German,
2959Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
2960Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
2961Walloon,
2962@item @code{ISO-8859-2} for
2963Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
2964Slovenian,
2965@item @code{ISO-8859-3} for Maltese,
2966@item @code{ISO-8859-5} for Macedonian, Serbian,
2967@item @code{ISO-8859-6} for Arabic,
2968@item @code{ISO-8859-7} for Greek,
2969@item @code{ISO-8859-8} for Hebrew,
2970@item @code{ISO-8859-9} for Turkish,
2971@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
2972@item @code{ISO-8859-14} for Welsh,
2973@item @code{ISO-8859-15} for
2974Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
2975Italian, Portuguese, Spanish, Swedish, Walloon,
2976@item @code{KOI8-R} for Russian,
2977@item @code{KOI8-U} for Ukrainian,
2978@item @code{KOI8-T} for Tajik,
2979@item @code{CP1251} for Bulgarian, Byelorussian,
2980@item @code{GB2312}, @code{GBK}, @code{GB18030}
2981for simplified writing of Chinese,
2982@item @code{BIG5}, @code{BIG5-HKSCS}
2983for traditional writing of Chinese,
2984@item @code{EUC-JP} for Japanese,
2985@item @code{EUC-KR} for Korean,
2986@item @code{TIS-620} for Thai,
2987@item @code{GEORGIAN-PS} for Georgian,
2988@item @code{UTF-8} for any language, including those listed above.
2989@end itemize
2990
2991@cindex quote characters, use in PO files
2992@cindex quotation marks
2993When single quote characters or double quote characters are used in
2994translations for your language, and your locale's encoding is one of the
2995ISO-8859-* charsets, it is best if you create your PO files in UTF-8
2996encoding, instead of your locale's encoding.  This is because in UTF-8
2997the real quote characters can be represented (single quote characters:
2998U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
2999ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
3000real quote characters, whereas users in ISO-8859-* locales will see the
3001vertical apostrophe and the vertical double quote instead (because that's
3002what the character set conversion will transliterate them to).
3003
3004@cindex @code{xmodmap} program, and typing quotation marks
3005To enter such quote characters under X11, you can change your keyboard
3006mapping using the @code{xmodmap} program.  The X11 names of the quote
3007characters are "leftsinglequotemark", "rightsinglequotemark",
3008"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
3009"doublelowquotemark".
3010
3011Note that only recent versions of GNU Emacs support the UTF-8 encoding:
3012Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
3013support the UTF-8 encoding.
3014
3015The character encoding name can be written in either upper or lower case.
3016Usually upper case is preferred.
3017
3018@item Content-Transfer-Encoding
3019Set this to @code{8bit}.
3020
3021@item Plural-Forms
3022This field is optional.  It is only needed if the PO file has plural forms.
3023You can find them by searching for the @samp{msgid_plural} keyword.  The
3024format of the plural forms field is described in @ref{Plural forms}.
3025@end table
3026
3027@node Updating, Editing, Creating, Top
3028@chapter Updating Existing PO Files
3029
3030@menu
3031* msgmerge Invocation::         Invoking the @code{msgmerge} Program
3032@end menu
3033
3034@node msgmerge Invocation,  , Updating, Updating
3035@section Invoking the @code{msgmerge} Program
3036
3037@include msgmerge.texi
3038
3039@node Editing, Manipulating, Updating, Top
3040@chapter Editing PO Files
3041@cindex Editing PO Files
3042
3043@menu
3044* KBabel::                      KDE's PO File Editor
3045* Gtranslator::                 GNOME's PO File Editor
3046* PO Mode::                     Emacs's PO File Editor
3047* Compendium::                  Using Translation Compendia
3048@end menu
3049
3050@node KBabel, Gtranslator, Editing, Editing
3051@section KDE's PO File Editor
3052@cindex KDE PO file editor
3053
3054@node Gtranslator, PO Mode, KBabel, Editing
3055@section GNOME's PO File Editor
3056@cindex GNOME PO file editor
3057
3058@node PO Mode, Compendium, Gtranslator, Editing
3059@section Emacs's PO File Editor
3060@cindex Emacs PO Mode
3061
3062@c FIXME: Rewrite.
3063
3064For those of you being
3065the lucky users of Emacs, PO mode has been specifically created
3066for providing a cozy environment for editing or modifying PO files.
3067While editing a PO file, PO mode allows for the easy browsing of
3068auxiliary and compendium PO files, as well as for following references into
3069the set of C program sources from which PO files have been derived.
3070It has a few special features, among which are the interactive marking
3071of program strings as translatable, and the validation of PO files
3072with easy repositioning to PO file lines showing errors.
3073
3074For the beginning, besides main PO mode commands
3075(@pxref{Main PO Commands}), you should know how to move between entries
3076(@pxref{Entry Positioning}), and how to handle untranslated entries
3077(@pxref{Untranslated Entries}).
3078
3079@menu
3080* Installation::                Completing GNU @code{gettext} Installation
3081* Main PO Commands::            Main Commands
3082* Entry Positioning::           Entry Positioning
3083* Normalizing::                 Normalizing Strings in Entries
3084* Translated Entries::          Translated Entries
3085* Fuzzy Entries::               Fuzzy Entries
3086* Untranslated Entries::        Untranslated Entries
3087* Obsolete Entries::            Obsolete Entries
3088* Modifying Translations::      Modifying Translations
3089* Modifying Comments::          Modifying Comments
3090* Subedit::                     Mode for Editing Translations
3091* C Sources Context::           C Sources Context
3092* Auxiliary::                   Consulting Auxiliary PO Files
3093@end menu
3094
3095@node Installation, Main PO Commands, PO Mode, PO Mode
3096@subsection Completing GNU @code{gettext} Installation
3097
3098@cindex installing @code{gettext}
3099@cindex @code{gettext} installation
3100Once you have received, unpacked, configured and compiled the GNU
3101@code{gettext} distribution, the @samp{make install} command puts in
3102place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
3103@code{msgmerge}, as well as their available message catalogs.  To
3104top off a comfortable installation, you might also want to make the
3105PO mode available to your Emacs users.
3106
3107@emindex @file{.emacs} customizations
3108@emindex installing PO mode
3109During the installation of the PO mode, you might want to modify your
3110file @file{.emacs}, once and for all, so it contains a few lines looking
3111like:
3112
3113@example
3114(setq auto-mode-alist
3115      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
3116(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
3117@end example
3118
3119Later, whenever you edit some @file{.po}
3120file, or any file having the string @samp{.po.} within its name,
3121Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
3122automatically activates PO mode commands for the associated buffer.
3123The string @emph{PO} appears in the mode line for any buffer for
3124which PO mode is active.  Many PO files may be active at once in a
3125single Emacs session.
3126
3127If you are using Emacs version 20 or newer, and have already installed
3128the appropriate international fonts on your system, you may also tell
3129Emacs how to determine automatically the coding system of every PO file.
3130This will often (but not always) cause the necessary fonts to be loaded
3131and used for displaying the translations on your Emacs screen.  For this
3132to happen, add the lines:
3133
3134@example
3135(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
3136                            'po-find-file-coding-system)
3137(autoload 'po-find-file-coding-system "po-mode")
3138@end example
3139
3140@noindent
3141to your @file{.emacs} file.  If, with this, you still see boxes instead
3142of international characters, try a different font set (via Shift Mouse
3143button 1).
3144
3145@node Main PO Commands, Entry Positioning, Installation, PO Mode
3146@subsection Main PO mode Commands
3147
3148@cindex PO mode (Emacs) commands
3149@emindex commands
3150After setting up Emacs with something similar to the lines in
3151@ref{Installation}, PO mode is activated for a window when Emacs finds a
3152PO file in that window.  This puts the window read-only and establishes a
3153po-mode-map, which is a genuine Emacs mode, in a way that is not derived
3154from text mode in any way.  Functions found on @code{po-mode-hook},
3155if any, will be executed.
3156
3157When PO mode is active in a window, the letters @samp{PO} appear
3158in the mode line for that window.  The mode line also displays how
3159many entries of each kind are held in the PO file.  For example,
3160the string @samp{132t+3f+10u+2o} would tell the translator that the
3161PO mode contains 132 translated entries (@pxref{Translated Entries},
31623 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
3163(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
3164Entries}).  Zero-coefficients items are not shown.  So, in this example, if
3165the fuzzy entries were unfuzzied, the untranslated entries were translated
3166and the obsolete entries were deleted, the mode line would merely display
3167@samp{145t} for the counters.
3168
3169The main PO commands are those which do not fit into the other categories of
3170subsequent sections.  These allow for quitting PO mode or for managing windows
3171in special ways.
3172
3173@table @kbd
3174@item _
3175@efindex _@r{, PO Mode command}
3176Undo last modification to the PO file (@code{po-undo}).
3177
3178@item Q
3179@efindex Q@r{, PO Mode command}
3180Quit processing and save the PO file (@code{po-quit}).
3181
3182@item q
3183@efindex q@r{, PO Mode command}
3184Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
3185
3186@item 0
3187@efindex 0@r{, PO Mode command}
3188Temporary leave the PO file window (@code{po-other-window}).
3189
3190@item ?
3191@itemx h
3192@efindex ?@r{, PO Mode command}
3193@efindex h@r{, PO Mode command}
3194Show help about PO mode (@code{po-help}).
3195
3196@item =
3197@efindex =@r{, PO Mode command}
3198Give some PO file statistics (@code{po-statistics}).
3199
3200@item V
3201@efindex V@r{, PO Mode command}
3202Batch validate the format of the whole PO file (@code{po-validate}).
3203
3204@end table
3205
3206@efindex _@r{, PO Mode command}
3207@efindex po-undo@r{, PO Mode command}
3208The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
3209@emph{undo} facility.  @xref{Undo, , Undoing Changes, emacs, The Emacs
3210Editor}.  Each time @kbd{U} is typed, modifications which the translator
3211did to the PO file are undone a little more.  For the purpose of
3212undoing, each PO mode command is atomic.  This is especially true for
3213the @kbd{@key{RET}} command: the whole edition made by using a single
3214use of this command is undone at once, even if the edition itself
3215implied several actions.  However, while in the editing window, one
3216can undo the edition work quite parsimoniously.
3217
3218@efindex Q@r{, PO Mode command}
3219@efindex q@r{, PO Mode command}
3220@efindex po-quit@r{, PO Mode command}
3221@efindex po-confirm-and-quit@r{, PO Mode command}
3222The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
3223(@code{po-confirm-and-quit}) are used when the translator is done with the
3224PO file.  The former is a bit less verbose than the latter.  If the file
3225has been modified, it is saved to disk first.  In both cases, and prior to
3226all this, the commands check if any untranslated messages remain in the
3227PO file and, if so, the translator is asked if she really wants to leave
3228off working with this PO file.  This is the preferred way of getting rid
3229of an Emacs PO file buffer.  Merely killing it through the usual command
3230@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
3231
3232@efindex 0@r{, PO Mode command}
3233@efindex po-other-window@r{, PO Mode command}
3234The command @kbd{0} (@code{po-other-window}) is another, softer way,
3235to leave PO mode, temporarily.  It just moves the cursor to some other
3236Emacs window, and pops one if necessary.  For example, if the translator
3237just got PO mode to show some source context in some other, she might
3238discover some apparent bug in the program source that needs correction.
3239This command allows the translator to change sex, become a programmer,
3240and have the cursor right into the window containing the program she
3241(or rather @emph{he}) wants to modify.  By later getting the cursor back
3242in the PO file window, or by asking Emacs to edit this file once again,
3243PO mode is then recovered.
3244
3245@efindex ?@r{, PO Mode command}
3246@efindex h@r{, PO Mode command}
3247@efindex po-help@r{, PO Mode command}
3248The command @kbd{h} (@code{po-help}) displays a summary of all available PO
3249mode commands.  The translator should then type any character to resume
3250normal PO mode operations.  The command @kbd{?} has the same effect
3251as @kbd{h}.
3252
3253@efindex =@r{, PO Mode command}
3254@efindex po-statistics@r{, PO Mode command}
3255The command @kbd{=} (@code{po-statistics}) computes the total number of
3256entries in the PO file, the ordinal of the current entry (counted from
32571), the number of untranslated entries, the number of obsolete entries,
3258and displays all these numbers.
3259
3260@efindex V@r{, PO Mode command}
3261@efindex po-validate@r{, PO Mode command}
3262The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
3263checking and verbose
3264mode over the current PO file.  This command first offers to save the
3265current PO file on disk.  The @code{msgfmt} tool, from GNU @code{gettext},
3266has the purpose of creating a MO file out of a PO file, and PO mode uses
3267the features of this program for checking the overall format of a PO file,
3268as well as all individual entries.
3269
3270@efindex next-error@r{, stepping through PO file validation results}
3271The program @code{msgfmt} runs asynchronously with Emacs, so the
3272translator regains control immediately while her PO file is being studied.
3273Error output is collected in the Emacs @samp{*compilation*} buffer,
3274displayed in another window.  The regular Emacs command @kbd{C-x`}
3275(@code{next-error}), as well as other usual compile commands, allow the
3276translator to reposition quickly to the offending parts of the PO file.
3277Once the cursor is on the line in error, the translator may decide on
3278any PO mode action which would help correcting the error.
3279
3280@node Entry Positioning, Normalizing, Main PO Commands, PO Mode
3281@subsection Entry Positioning
3282
3283@emindex current entry of a PO file
3284The cursor in a PO file window is almost always part of
3285an entry.  The only exceptions are the special case when the cursor
3286is after the last entry in the file, or when the PO file is
3287empty.  The entry where the cursor is found to be is said to be the
3288current entry.  Many PO mode commands operate on the current entry,
3289so moving the cursor does more than allowing the translator to browse
3290the PO file, this also selects on which entry commands operate.
3291
3292@emindex moving through a PO file
3293Some PO mode commands alter the position of the cursor in a specialized
3294way.  A few of those special purpose positioning are described here,
3295the others are described in following sections (for a complete list try
3296@kbd{C-h m}):
3297
3298@table @kbd
3299
3300@item .
3301@efindex .@r{, PO Mode command}
3302Redisplay the current entry (@code{po-current-entry}).
3303
3304@item n
3305@efindex n@r{, PO Mode command}
3306Select the entry after the current one (@code{po-next-entry}).
3307
3308@item p
3309@efindex p@r{, PO Mode command}
3310Select the entry before the current one (@code{po-previous-entry}).
3311
3312@item <
3313@efindex <@r{, PO Mode command}
3314Select the first entry in the PO file (@code{po-first-entry}).
3315
3316@item >
3317@efindex >@r{, PO Mode command}
3318Select the last entry in the PO file (@code{po-last-entry}).
3319
3320@item m
3321@efindex m@r{, PO Mode command}
3322Record the location of the current entry for later use
3323(@code{po-push-location}).
3324
3325@item r
3326@efindex r@r{, PO Mode command}
3327Return to a previously saved entry location (@code{po-pop-location}).
3328
3329@item x
3330@efindex x@r{, PO Mode command}
3331Exchange the current entry location with the previously saved one
3332(@code{po-exchange-location}).
3333
3334@end table
3335
3336@efindex .@r{, PO Mode command}
3337@efindex po-current-entry@r{, PO Mode command}
3338Any Emacs command able to reposition the cursor may be used
3339to select the current entry in PO mode, including commands which
3340move by characters, lines, paragraphs, screens or pages, and search
3341commands.  However, there is a kind of standard way to display the
3342current entry in PO mode, which usual Emacs commands moving
3343the cursor do not especially try to enforce.  The command @kbd{.}
3344(@code{po-current-entry}) has the sole purpose of redisplaying the
3345current entry properly, after the current entry has been changed by
3346means external to PO mode, or the Emacs screen otherwise altered.
3347
3348It is yet to be decided if PO mode helps the translator, or otherwise
3349irritates her, by forcing a rigid window disposition while she
3350is doing her work.  We originally had quite precise ideas about
3351how windows should behave, but on the other hand, anyone used to
3352Emacs is often happy to keep full control.  Maybe a fixed window
3353disposition might be offered as a PO mode option that the translator
3354might activate or deactivate at will, so it could be offered on an
3355experimental basis.  If nobody feels a real need for using it, or
3356a compulsion for writing it, we should drop this whole idea.
3357The incentive for doing it should come from translators rather than
3358programmers, as opinions from an experienced translator are surely
3359more worth to me than opinions from programmers @emph{thinking} about
3360how @emph{others} should do translation.
3361
3362@efindex n@r{, PO Mode command}
3363@efindex po-next-entry@r{, PO Mode command}
3364@efindex p@r{, PO Mode command}
3365@efindex po-previous-entry@r{, PO Mode command}
3366The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
3367(@code{po-previous-entry}) move the cursor the entry following,
3368or preceding, the current one.  If @kbd{n} is given while the
3369cursor is on the last entry of the PO file, or if @kbd{p}
3370is given while the cursor is on the first entry, no move is done.
3371
3372@efindex <@r{, PO Mode command}
3373@efindex po-first-entry@r{, PO Mode command}
3374@efindex >@r{, PO Mode command}
3375@efindex po-last-entry@r{, PO Mode command}
3376The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
3377(@code{po-last-entry}) move the cursor to the first entry, or last
3378entry, of the PO file.  When the cursor is located past the last
3379entry in a PO file, most PO mode commands will return an error saying
3380@samp{After last entry}.  Moreover, the commands @kbd{<} and @kbd{>}
3381have the special property of being able to work even when the cursor
3382is not into some PO file entry, and one may use them for nicely
3383correcting this situation.  But even these commands will fail on a
3384truly empty PO file.  There are development plans for the PO mode for it
3385to interactively fill an empty PO file from sources.  @xref{Marking}.
3386
3387The translator may decide, before working at the translation of
3388a particular entry, that she needs to browse the remainder of the
3389PO file, maybe for finding the terminology or phraseology used
3390in related entries.  She can of course use the standard Emacs idioms
3391for saving the current cursor location in some register, and use that
3392register for getting back, or else, use the location ring.
3393
3394@efindex m@r{, PO Mode command}
3395@efindex po-push-location@r{, PO Mode command}
3396@efindex r@r{, PO Mode command}
3397@efindex po-pop-location@r{, PO Mode command}
3398PO mode offers another approach, by which cursor locations may be saved
3399onto a special stack.  The command @kbd{m} (@code{po-push-location})
3400merely adds the location of current entry to the stack, pushing
3401the already saved locations under the new one.  The command
3402@kbd{r} (@code{po-pop-location}) consumes the top stack element and
3403repositions the cursor to the entry associated with that top element.
3404This position is then lost, for the next @kbd{r} will move the cursor
3405to the previously saved location, and so on until no locations remain
3406on the stack.
3407
3408If the translator wants the position to be kept on the location stack,
3409maybe for taking a look at the entry associated with the top
3410element, then go elsewhere with the intent of getting back later, she
3411ought to use @kbd{m} immediately after @kbd{r}.
3412
3413@efindex x@r{, PO Mode command}
3414@efindex po-exchange-location@r{, PO Mode command}
3415The command @kbd{x} (@code{po-exchange-location}) simultaneously
3416repositions the cursor to the entry associated with the top element of
3417the stack of saved locations, and replaces that top element with the
3418location of the current entry before the move.  Consequently, repeating
3419the @kbd{x} command toggles alternatively between two entries.
3420For achieving this, the translator will position the cursor on the
3421first entry, use @kbd{m}, then position to the second entry, and
3422merely use @kbd{x} for making the switch.
3423
3424@node Normalizing, Translated Entries, Entry Positioning, PO Mode
3425@subsection Normalizing Strings in Entries
3426@cindex string normalization in entries
3427
3428There are many different ways for encoding a particular string into a
3429PO file entry, because there are so many different ways to split and
3430quote multi-line strings, and even, to represent special characters
3431by backslashed escaped sequences.  Some features of PO mode rely on
3432the ability for PO mode to scan an already existing PO file for a
3433particular string encoded into the @code{msgid} field of some entry.
3434Even if PO mode has internally all the built-in machinery for
3435implementing this recognition easily, doing it fast is technically
3436difficult.  To facilitate a solution to this efficiency problem,
3437we decided on a canonical representation for strings.
3438
3439A conventional representation of strings in a PO file is currently
3440under discussion, and PO mode experiments with a canonical representation.
3441Having both @code{xgettext} and PO mode converging towards a uniform
3442way of representing equivalent strings would be useful, as the internal
3443normalization needed by PO mode could be automatically satisfied
3444when using @code{xgettext} from GNU @code{gettext}.  An explicit
3445PO mode normalization should then be only necessary for PO files
3446imported from elsewhere, or for when the convention itself evolves.
3447
3448So, for achieving normalization of at least the strings of a given
3449PO file needing a canonical representation, the following PO mode
3450command is available:
3451
3452@emindex string normalization in entries
3453@table @kbd
3454@item M-x po-normalize
3455@efindex po-normalize@r{, PO Mode command}
3456Tidy the whole PO file by making entries more uniform.
3457
3458@end table
3459
3460The special command @kbd{M-x po-normalize}, which has no associated
3461keys, revises all entries, ensuring that strings of both original
3462and translated entries use uniform internal quoting in the PO file.
3463It also removes any crumb after the last entry.  This command may be
3464useful for PO files freshly imported from elsewhere, or if we ever
3465improve on the canonical quoting format we use.  This canonical format
3466is not only meant for getting cleaner PO files, but also for greatly
3467speeding up @code{msgid} string lookup for some other PO mode commands.
3468
3469@kbd{M-x po-normalize} presently makes three passes over the entries.
3470The first implements heuristics for converting PO files for GNU
3471@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
3472fields were using K&R style C string syntax for multi-line strings.
3473These heuristics may fail for comments not related to obsolete
3474entries and ending with a backslash; they also depend on subsequent
3475passes for finalizing the proper commenting of continued lines for
3476obsolete entries.  This first pass might disappear once all oldish PO
3477files would have been adjusted.  The second and third pass normalize
3478all @code{msgid} and @code{msgstr} strings respectively.  They also
3479clean out those trailing backslashes used by XView's @code{msgfmt}
3480for continued lines.
3481
3482@cindex importing PO files
3483Having such an explicit normalizing command allows for importing PO
3484files from other sources, but also eases the evolution of the current
3485convention, evolution driven mostly by aesthetic concerns, as of now.
3486It is easy to make suggested adjustments at a later time, as the
3487normalizing command and eventually, other GNU @code{gettext} tools
3488should greatly automate conformance.  A description of the canonical
3489string format is given below, for the particular benefit of those not
3490having Emacs handy, and who would nevertheless want to handcraft
3491their PO files in nice ways.
3492
3493@cindex multi-line strings
3494Right now, in PO mode, strings are single line or multi-line.  A string
3495goes multi-line if and only if it has @emph{embedded} newlines, that
3496is, if it matches @samp{[^\n]\n+[^\n]}.  So, we would have:
3497
3498@example
3499msgstr "\n\nHello, world!\n\n\n"
3500@end example
3501
3502but, replacing the space by a newline, this becomes:
3503
3504@example
3505msgstr ""
3506"\n"
3507"\n"
3508"Hello,\n"
3509"world!\n"
3510"\n"
3511"\n"
3512@end example
3513
3514We are deliberately using a caricatural example, here, to make the
3515point clearer.  Usually, multi-lines are not that bad looking.
3516It is probable that we will implement the following suggestion.
3517We might lump together all initial newlines into the empty string,
3518and also all newlines introducing empty lines (that is, for @w{@var{n}
3519> 1}, the @var{n}-1'th last newlines would go together on a separate
3520string), so making the previous example appear:
3521
3522@example
3523msgstr "\n\n"
3524"Hello,\n"
3525"world!\n"
3526"\n\n"
3527@end example
3528
3529There are a few yet undecided little points about string normalization,
3530to be documented in this manual, once these questions settle.
3531
3532@node Translated Entries, Fuzzy Entries, Normalizing, PO Mode
3533@subsection Translated Entries
3534@cindex translated entries
3535
3536Each PO file entry for which the @code{msgstr} field has been filled with
3537a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
3538is said to be a @dfn{translated} entry.  Only translated entries will
3539later be compiled by GNU @code{msgfmt} and become usable in programs.
3540Other entry types will be excluded; translation will not occur for them.
3541
3542@emindex moving by translated entries
3543Some commands are more specifically related to translated entry processing.
3544
3545@table @kbd
3546@item t
3547@efindex t@r{, PO Mode command}
3548Find the next translated entry (@code{po-next-translated-entry}).
3549
3550@item T
3551@efindex T@r{, PO Mode command}
3552Find the previous translated entry (@code{po-previous-translated-entry}).
3553
3554@end table
3555
3556@efindex t@r{, PO Mode command}
3557@efindex po-next-translated-entry@r{, PO Mode command}
3558@efindex T@r{, PO Mode command}
3559@efindex po-previous-translated-entry@r{, PO Mode command}
3560The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
3561(@code{po-previous-translated-entry}) move forwards or backwards, chasing
3562for an translated entry.  If none is found, the search is extended and
3563wraps around in the PO file buffer.
3564
3565@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
3566Translated entries usually result from the translator having edited in
3567a translation for them, @ref{Modifying Translations}.  However, if the
3568variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having
3569received a new translation first becomes a fuzzy entry, which ought to
3570be later unfuzzied before becoming an official, genuine translated entry.
3571@xref{Fuzzy Entries}.
3572
3573@node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode
3574@subsection Fuzzy Entries
3575@cindex fuzzy entries
3576
3577@cindex attributes of a PO file entry
3578@cindex attribute, fuzzy
3579Each PO file entry may have a set of @dfn{attributes}, which are
3580qualities given a name and explicitly associated with the translation,
3581using a special system comment.  One of these attributes
3582has the name @code{fuzzy}, and entries having this attribute are said
3583to have a fuzzy translation.  They are called fuzzy entries, for short.
3584
3585Fuzzy entries, even if they account for translated entries for
3586most other purposes, usually call for revision by the translator.
3587Those may be produced by applying the program @code{msgmerge} to
3588update an older translated PO files according to a new PO template
3589file, when this tool hypothesises that some new @code{msgid} has
3590been modified only slightly out of an older one, and chooses to pair
3591what it thinks to be the old translation for the new modified entry.
3592The slight alteration in the original string (the @code{msgid} string)
3593should often be reflected in the translated string, and this requires
3594the intervention of the translator.  For this reason, @code{msgmerge}
3595might mark some entries as being fuzzy.
3596
3597@emindex moving by fuzzy entries
3598Also, the translator may decide herself to mark an entry as fuzzy
3599for her own convenience, when she wants to remember that the entry
3600has to be later revisited.  So, some commands are more specifically
3601related to fuzzy entry processing.
3602
3603@table @kbd
3604@item z
3605@efindex z@r{, PO Mode command}
3606@c better append "-entry" all the time. -ke-
3607Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3608
3609@item Z
3610@efindex Z@r{, PO Mode command}
3611Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3612
3613@item @key{TAB}
3614@efindex TAB@r{, PO Mode command}
3615Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3616
3617@end table
3618
3619@efindex z@r{, PO Mode command}
3620@efindex po-next-fuzzy-entry@r{, PO Mode command}
3621@efindex Z@r{, PO Mode command}
3622@efindex po-previous-fuzzy-entry@r{, PO Mode command}
3623The commands @kbd{z} (@code{po-next-fuzzy-entry}) and @kbd{Z}
3624(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
3625a fuzzy entry.  If none is found, the search is extended and wraps
3626around in the PO file buffer.
3627
3628@efindex TAB@r{, PO Mode command}
3629@efindex po-unfuzzy@r{, PO Mode command}
3630@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3631The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3632attribute associated with an entry, usually leaving it translated.
3633Further, if the variable @code{po-auto-select-on-unfuzzy} has not
3634the @code{nil} value, the @kbd{@key{TAB}} command will automatically chase
3635for another interesting entry to work on.  The initial value of
3636@code{po-auto-select-on-unfuzzy} is @code{nil}.
3637
3638The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}.  However,
3639if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3640edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
3641ensure some kind of double check, later.  In this case, the usual paradigm
3642is that an entry becomes fuzzy (if not already) whenever the translator
3643modifies it.  If she is satisfied with the translation, she then uses
3644@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
3645on the same blow.  If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
3646to chase another entry, leaving the entry fuzzy.
3647
3648@efindex DEL@r{, PO Mode command}
3649@efindex po-fade-out-entry@r{, PO Mode command}
3650The translator may also use the @kbd{@key{DEL}} command
3651(@code{po-fade-out-entry}) over any translated entry to mark it as being
3652fuzzy, when she wants to easily leave a trace she wants to later return
3653working at this entry.
3654
3655Also, when time comes to quit working on a PO file buffer with the @kbd{q}
3656command, the translator is asked for confirmation, if fuzzy string
3657still exists.
3658
3659@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode
3660@subsection Untranslated Entries
3661@cindex untranslated entries
3662
3663When @code{xgettext} originally creates a PO file, unless told
3664otherwise, it initializes the @code{msgid} field with the untranslated
3665string, and leaves the @code{msgstr} string to be empty.  Such entries,
3666having an empty translation, are said to be @dfn{untranslated} entries.
3667Later, when the programmer slightly modifies some string right in
3668the program, this change is later reflected in the PO file
3669by the appearance of a new untranslated entry for the modified string.
3670
3671The usual commands moving from entry to entry consider untranslated
3672entries on the same level as active entries.  Untranslated entries
3673are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.
3674
3675@emindex moving by untranslated entries
3676The work of the translator might be (quite naively) seen as the process
3677of seeking for an untranslated entry, editing a translation for
3678it, and repeating these actions until no untranslated entries remain.
3679Some commands are more specifically related to untranslated entry
3680processing.
3681
3682@table @kbd
3683@item u
3684@efindex u@r{, PO Mode command}
3685Find the next untranslated entry (@code{po-next-untranslated-entry}).
3686
3687@item U
3688@efindex U@r{, PO Mode command}
3689Find the previous untranslated entry (@code{po-previous-untransted-entry}).
3690
3691@item k
3692@efindex k@r{, PO Mode command}
3693Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3694
3695@end table
3696
3697@efindex u@r{, PO Mode command}
3698@efindex po-next-untranslated-entry@r{, PO Mode command}
3699@efindex U@r{, PO Mode command}
3700@efindex po-previous-untransted-entry@r{, PO Mode command}
3701The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
3702(@code{po-previous-untransted-entry}) move forwards or backwards,
3703chasing for an untranslated entry.  If none is found, the search is
3704extended and wraps around in the PO file buffer.
3705
3706@efindex k@r{, PO Mode command}
3707@efindex po-kill-msgstr@r{, PO Mode command}
3708An entry can be turned back into an untranslated entry by
3709merely emptying its translation, using the command @kbd{k}
3710(@code{po-kill-msgstr}).  @xref{Modifying Translations}.
3711
3712Also, when time comes to quit working on a PO file buffer
3713with the @kbd{q} command, the translator is asked for confirmation,
3714if some untranslated string still exists.
3715
3716@node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode
3717@subsection Obsolete Entries
3718@cindex obsolete entries
3719
3720By @dfn{obsolete} PO file entries, we mean those entries which are
3721commented out, usually by @code{msgmerge} when it found that the
3722translation is not needed anymore by the package being localized.
3723
3724The usual commands moving from entry to entry consider obsolete
3725entries on the same level as active entries.  Obsolete entries are
3726easily recognizable by the fact that all their lines start with
3727@code{#}, even those lines containing @code{msgid} or @code{msgstr}.
3728
3729Commands exist for emptying the translation or reinitializing it
3730to the original untranslated string.  Commands interfacing with the
3731kill ring may force some previously saved text into the translation.
3732The user may interactively edit the translation.  All these commands
3733may apply to obsolete entries, carefully leaving the entry obsolete
3734after the fact.
3735
3736@emindex moving by obsolete entries
3737Moreover, some commands are more specifically related to obsolete
3738entry processing.
3739
3740@table @kbd
3741@item o
3742@efindex o@r{, PO Mode command}
3743Find the next obsolete entry (@code{po-next-obsolete-entry}).
3744
3745@item O
3746@efindex O@r{, PO Mode command}
3747Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
3748
3749@item @key{DEL}
3750@efindex DEL@r{, PO Mode command}
3751Make an active entry obsolete, or zap out an obsolete entry
3752(@code{po-fade-out-entry}).
3753
3754@end table
3755
3756@efindex o@r{, PO Mode command}
3757@efindex po-next-obsolete-entry@r{, PO Mode command}
3758@efindex O@r{, PO Mode command}
3759@efindex po-previous-obsolete-entry@r{, PO Mode command}
3760The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
3761(@code{po-previous-obsolete-entry}) move forwards or backwards,
3762chasing for an obsolete entry.  If none is found, the search is
3763extended and wraps around in the PO file buffer.
3764
3765PO mode does not provide ways for un-commenting an obsolete entry
3766and making it active, because this would reintroduce an original
3767untranslated string which does not correspond to any marked string
3768in the program sources.  This goes with the philosophy of never
3769introducing useless @code{msgid} values.
3770
3771@efindex DEL@r{, PO Mode command}
3772@efindex po-fade-out-entry@r{, PO Mode command}
3773@emindex obsolete active entry
3774@emindex comment out PO file entry
3775However, it is possible to comment out an active entry, so making
3776it obsolete.  GNU @code{gettext} utilities will later react to the
3777disappearance of a translation by using the untranslated string.
3778The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
3779a little further towards annihilation.  If the entry is active (it is a
3780translated entry), then it is first made fuzzy.  If it is already fuzzy,
3781then the entry is merely commented out, with confirmation.  If the entry
3782is already obsolete, then it is completely deleted from the PO file.
3783It is easy to recycle the translation so deleted into some other PO file
3784entry, usually one which is untranslated.  @xref{Modifying Translations}.
3785
3786Here is a quite interesting problem to solve for later development of
3787PO mode, for those nights you are not sleepy.  The idea would be that
3788PO mode might become bright enough, one of these days, to make good
3789guesses at retrieving the most probable candidate, among all obsolete
3790entries, for initializing the translation of a newly appeared string.
3791I think it might be a quite hard problem to do this algorithmically, as
3792we have to develop good and efficient measures of string similarity.
3793Right now, PO mode completely lets the decision to the translator,
3794when the time comes to find the adequate obsolete translation, it
3795merely tries to provide handy tools for helping her to do so.
3796
3797@node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode
3798@subsection Modifying Translations
3799@cindex editing translations
3800@emindex editing translations
3801
3802PO mode prevents direct modification of the PO file, by the usual
3803means Emacs gives for altering a buffer's contents.  By doing so,
3804it pretends helping the translator to avoid little clerical errors
3805about the overall file format, or the proper quoting of strings,
3806as those errors would be easily made.  Other kinds of errors are
3807still possible, but some may be caught and diagnosed by the batch
3808validation process, which the translator may always trigger by the
3809@kbd{V} command.  For all other errors, the translator has to rely on
3810her own judgment, and also on the linguistic reports submitted to her
3811by the users of the translated package, having the same mother tongue.
3812
3813When the time comes to create a translation, correct an error diagnosed
3814mechanically or reported by a user, the translators have to resort to
3815using the following commands for modifying the translations.
3816
3817@table @kbd
3818@item @key{RET}
3819@efindex RET@r{, PO Mode command}
3820Interactively edit the translation (@code{po-edit-msgstr}).
3821
3822@item @key{LFD}
3823@itemx C-j
3824@efindex LFD@r{, PO Mode command}
3825@efindex C-j@r{, PO Mode command}
3826Reinitialize the translation with the original, untranslated string
3827(@code{po-msgid-to-msgstr}).
3828
3829@item k
3830@efindex k@r{, PO Mode command}
3831Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
3832
3833@item w
3834@efindex w@r{, PO Mode command}
3835Save the translation on the kill ring, without deleting it
3836(@code{po-kill-ring-save-msgstr}).
3837
3838@item y
3839@efindex y@r{, PO Mode command}
3840Replace the translation, taking the new from the kill ring
3841(@code{po-yank-msgstr}).
3842
3843@end table
3844
3845@efindex RET@r{, PO Mode command}
3846@efindex po-edit-msgstr@r{, PO Mode command}
3847The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
3848window meant to edit in a new translation, or to modify an already existing
3849translation.  The new window contains a copy of the translation taken from
3850the current PO file entry, all ready for edition, expunged of all quoting
3851marks, fully modifiable and with the complete extent of Emacs modifying
3852commands.  When the translator is done with her modifications, she may use
3853@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
3854results, or @w{@kbd{C-c C-k}} to abort her modifications.  @xref{Subedit},
3855for more information.
3856
3857@efindex LFD@r{, PO Mode command}
3858@efindex C-j@r{, PO Mode command}
3859@efindex po-msgid-to-msgstr@r{, PO Mode command}
3860The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
3861reinitializes the translation with the original string.  This command is
3862normally used when the translator wants to redo a fresh translation of
3863the original string, disregarding any previous work.
3864
3865@evindex po-auto-edit-with-msgid@r{, PO Mode variable}
3866It is possible to arrange so, whenever editing an untranslated
3867entry, the @kbd{@key{LFD}} command be automatically executed.  If you set
3868@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
3869initialised with the original string, in case none exists already.
3870The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
3871
3872@emindex starting a string translation
3873In fact, whether it is best to start a translation with an empty
3874string, or rather with a copy of the original string, is a matter of
3875taste or habit.  Sometimes, the source language and the
3876target language are so different that is simply best to start writing
3877on an empty page.  At other times, the source and target languages
3878are so close that it would be a waste to retype a number of words
3879already being written in the original string.  A translator may also
3880like having the original string right under her eyes, as she will
3881progressively overwrite the original text with the translation, even
3882if this requires some extra editing work to get rid of the original.
3883
3884@emindex cut and paste for translated strings
3885@efindex k@r{, PO Mode command}
3886@efindex po-kill-msgstr@r{, PO Mode command}
3887@efindex w@r{, PO Mode command}
3888@efindex po-kill-ring-save-msgstr@r{, PO Mode command}
3889The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
3890translation string, so turning the entry into an untranslated
3891one.  But while doing so, its previous contents is put apart in
3892a special place, known as the kill ring.  The command @kbd{w}
3893(@code{po-kill-ring-save-msgstr}) has also the effect of taking a
3894copy of the translation onto the kill ring, but it otherwise leaves
3895the entry alone, and does @emph{not} remove the translation from the
3896entry.  Both commands use exactly the Emacs kill ring, which is shared
3897between buffers, and which is well known already to Emacs lovers.
3898
3899The translator may use @kbd{k} or @kbd{w} many times in the course
3900of her work, as the kill ring may hold several saved translations.
3901From the kill ring, strings may later be reinserted in various
3902Emacs buffers.  In particular, the kill ring may be used for moving
3903translation strings between different entries of a single PO file
3904buffer, or if the translator is handling many such buffers at once,
3905even between PO files.
3906
3907To facilitate exchanges with buffers which are not in PO mode, the
3908translation string put on the kill ring by the @kbd{k} command is fully
3909unquoted before being saved: external quotes are removed, multi-line
3910strings are concatenated, and backslash escaped sequences are turned
3911into their corresponding characters.  In the special case of obsolete
3912entries, the translation is also uncommented prior to saving.
3913
3914@efindex y@r{, PO Mode command}
3915@efindex po-yank-msgstr@r{, PO Mode command}
3916The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
3917translation of the current entry by a string taken from the kill ring.
3918Following Emacs terminology, we then say that the replacement
3919string is @dfn{yanked} into the PO file buffer.
3920@xref{Yanking, , , emacs, The Emacs Editor}.
3921The first time @kbd{y} is used, the translation receives the value of
3922the most recent addition to the kill ring.  If @kbd{y} is typed once
3923again, immediately, without intervening keystrokes, the translation
3924just inserted is taken away and replaced by the second most recent
3925addition to the kill ring.  By repeating @kbd{y} many times in a row,
3926the translator may travel along the kill ring for saved strings,
3927until she finds the string she really wanted.
3928
3929When a string is yanked into a PO file entry, it is fully and
3930automatically requoted for complying with the format PO files should
3931have.  Further, if the entry is obsolete, PO mode then appropriately
3932push the inserted string inside comments.  Once again, translators
3933should not burden themselves with quoting considerations besides, of
3934course, the necessity of the translated string itself respective to
3935the program using it.
3936
3937Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
3938on the kill ring, as almost any PO mode command replacing translation
3939strings (or the translator comments) automatically saves the old string
3940on the kill ring.  The main exceptions to this general rule are the
3941yanking commands themselves.
3942
3943@emindex using obsolete translations to make new entries
3944To better illustrate the operation of killing and yanking, let's
3945use an actual example, taken from a common situation.  When the
3946programmer slightly modifies some string right in the program, his
3947change is later reflected in the PO file by the appearance
3948of a new untranslated entry for the modified string, and the fact
3949that the entry translating the original or unmodified string becomes
3950obsolete.  In many cases, the translator might spare herself some work
3951by retrieving the unmodified translation from the obsolete entry,
3952then initializing the untranslated entry @code{msgstr} field with
3953this retrieved translation.  Once this done, the obsolete entry is
3954not wanted anymore, and may be safely deleted.
3955
3956When the translator finds an untranslated entry and suspects that a
3957slight variant of the translation exists, she immediately uses @kbd{m}
3958to mark the current entry location, then starts chasing obsolete
3959entries with @kbd{o}, hoping to find some translation corresponding
3960to the unmodified string.  Once found, she uses the @kbd{@key{DEL}} command
3961for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
3962the translation, that is, pushes the translation on the kill ring.
3963Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
3964then @emph{yanks} the saved translation right into the @code{msgstr}
3965field.  The translator is then free to use @kbd{@key{RET}} for fine
3966tuning the translation contents, and maybe to later use @kbd{u},
3967then @kbd{m} again, for going on with the next untranslated string.
3968
3969When some sequence of keys has to be typed over and over again, the
3970translator may find it useful to become better acquainted with the Emacs
3971capability of learning these sequences and playing them back under request.
3972@xref{Keyboard Macros, , , emacs, The Emacs Editor}.
3973
3974@node Modifying Comments, Subedit, Modifying Translations, PO Mode
3975@subsection Modifying Comments
3976@cindex editing comments in PO files
3977@emindex editing comments
3978
3979Any translation work done seriously will raise many linguistic
3980difficulties, for which decisions have to be made, and the choices
3981further documented.  These documents may be saved within the
3982PO file in form of translator comments, which the translator
3983is free to create, delete, or modify at will.  These comments may
3984be useful to herself when she returns to this PO file after a while.
3985
3986Comments not having whitespace after the initial @samp{#}, for example,
3987those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
3988comments, they are exclusively created by other @code{gettext} tools.
3989So, the commands below will never alter such system added comments,
3990they are not meant for the translator to modify.  @xref{PO Files}.
3991
3992The following commands are somewhat similar to those modifying translations,
3993so the general indications given for those apply here.  @xref{Modifying
3994Translations}.
3995
3996@table @kbd
3997
3998@item #
3999@efindex #@r{, PO Mode command}
4000Interactively edit the translator comments (@code{po-edit-comment}).
4001
4002@item K
4003@efindex K@r{, PO Mode command}
4004Save the translator comments on the kill ring, and delete it
4005(@code{po-kill-comment}).
4006
4007@item W
4008@efindex W@r{, PO Mode command}
4009Save the translator comments on the kill ring, without deleting it
4010(@code{po-kill-ring-save-comment}).
4011
4012@item Y
4013@efindex Y@r{, PO Mode command}
4014Replace the translator comments, taking the new from the kill ring
4015(@code{po-yank-comment}).
4016
4017@end table
4018
4019These commands parallel PO mode commands for modifying the translation
4020strings, and behave much the same way as they do, except that they handle
4021this part of PO file comments meant for translator usage, rather
4022than the translation strings.  So, if the descriptions given below are
4023slightly succinct, it is because the full details have already been given.
4024@xref{Modifying Translations}.
4025
4026@efindex #@r{, PO Mode command}
4027@efindex po-edit-comment@r{, PO Mode command}
4028The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
4029containing a copy of the translator comments on the current PO file entry.
4030If there are no such comments, PO mode understands that the translator wants
4031to add a comment to the entry, and she is presented with an empty screen.
4032Comment marks (@code{#}) and the space following them are automatically
4033removed before edition, and reinstated after.  For translator comments
4034pertaining to obsolete entries, the uncommenting and recommenting operations
4035are done twice.  Once in the editing window, the keys @w{@kbd{C-c C-c}}
4036allow the translator to tell she is finished with editing the comment.
4037@xref{Subedit}, for further details.
4038
4039@evindex po-subedit-mode-hook@r{, PO Mode variable}
4040Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4041the string has been inserted in the edit buffer.
4042
4043@efindex K@r{, PO Mode command}
4044@efindex po-kill-comment@r{, PO Mode command}
4045@efindex W@r{, PO Mode command}
4046@efindex po-kill-ring-save-comment@r{, PO Mode command}
4047@efindex Y@r{, PO Mode command}
4048@efindex po-yank-comment@r{, PO Mode command}
4049The command @kbd{K} (@code{po-kill-comment}) gets rid of all
4050translator comments, while saving those comments on the kill ring.
4051The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
4052a copy of the translator comments on the kill ring, but leaves
4053them undisturbed in the current entry.  The command @kbd{Y}
4054(@code{po-yank-comment}) completely replaces the translator comments
4055by a string taken at the front of the kill ring.  When this command
4056is immediately repeated, the comments just inserted are withdrawn,
4057and replaced by other strings taken along the kill ring.
4058
4059On the kill ring, all strings have the same nature.  There is no
4060distinction between @emph{translation} strings and @emph{translator
4061comments} strings.  So, for example, let's presume the translator
4062has just finished editing a translation, and wants to create a new
4063translator comment to document why the previous translation was
4064not good, just to remember what was the problem.  Foreseeing that she
4065will do that in her documentation, the translator may want to quote
4066the previous translation in her translator comments.  To do so, she
4067may initialize the translator comments with the previous translation,
4068still at the head of the kill ring.  Because editing already pushed the
4069previous translation on the kill ring, she merely has to type @kbd{M-w}
4070prior to @kbd{#}, and the previous translation will be right there,
4071all ready for being introduced by some explanatory text.
4072
4073On the other hand, presume there are some translator comments already
4074and that the translator wants to add to those comments, instead
4075of wholly replacing them.  Then, she should edit the comment right
4076away with @kbd{#}.  Once inside the editing window, she can use the
4077regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
4078(@code{yank-pop}) to get the previous translation where she likes.
4079
4080@node Subedit, C Sources Context, Modifying Comments, PO Mode
4081@subsection Details of Sub Edition
4082@emindex subedit minor mode
4083
4084The PO subedit minor mode has a few peculiarities worth being described
4085in fuller detail.  It installs a few commands over the usual editing set
4086of Emacs, which are described below.
4087
4088@table @kbd
4089@item C-c C-c
4090@efindex C-c C-c@r{, PO Mode command}
4091Complete edition (@code{po-subedit-exit}).
4092
4093@item C-c C-k
4094@efindex C-c C-k@r{, PO Mode command}
4095Abort edition (@code{po-subedit-abort}).
4096
4097@item C-c C-a
4098@efindex C-c C-a@r{, PO Mode command}
4099Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
4100
4101@end table
4102
4103@emindex exiting PO subedit
4104@efindex C-c C-c@r{, PO Mode command}
4105@efindex po-subedit-exit@r{, PO Mode command}
4106The window's contents represents a translation for a given message,
4107or a translator comment.  The translator may modify this window to
4108her heart's content.  Once this is done, the command @w{@kbd{C-c C-c}}
4109(@code{po-subedit-exit}) may be used to return the edited translation into
4110the PO file, replacing the original translation, even if it moved out of
4111sight or if buffers were switched.
4112
4113@efindex C-c C-k@r{, PO Mode command}
4114@efindex po-subedit-abort@r{, PO Mode command}
4115If the translator becomes unsatisfied with her translation or comment,
4116to the extent she prefers keeping what was existent prior to the
4117@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
4118(@code{po-subedit-abort}) to merely get rid of edition, while preserving
4119the original translation or comment.  Another way would be for her to exit
4120normally with @w{@kbd{C-c C-c}}, then type @code{U} once for undoing the
4121whole effect of last edition.
4122
4123@efindex C-c C-a@r{, PO Mode command}
4124@efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
4125The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
4126allows for glancing through translations
4127already achieved in other languages, directly while editing the current
4128translation.  This may be quite convenient when the translator is fluent
4129at many languages, but of course, only makes sense when such completed
4130auxiliary PO files are already available to her (@pxref{Auxiliary}).
4131
4132Functions found on @code{po-subedit-mode-hook}, if any, are executed after
4133the string has been inserted in the edit buffer.
4134
4135While editing her translation, the translator should pay attention to not
4136inserting unwanted @kbd{@key{RET}} (newline) characters at the end of
4137the translated string if those are not meant to be there, or to removing
4138such characters when they are required.  Since these characters are not
4139visible in the editing buffer, they are easily introduced by mistake.
4140To help her, @kbd{@key{RET}} automatically puts the character @code{<}
4141at the end of the string being edited, but this @code{<} is not really
4142part of the string.  On exiting the editing window with @w{@kbd{C-c C-c}},
4143PO mode automatically removes such @kbd{<} and all whitespace added after
4144it.  If the translator adds characters after the terminating @code{<}, it
4145looses its delimiting property and integrally becomes part of the string.
4146If she removes the delimiting @code{<}, then the edited string is taken
4147@emph{as is}, with all trailing newlines, even if invisible.  Also, if
4148the translated string ought to end itself with a genuine @code{<}, then
4149the delimiting @code{<} may not be removed; so the string should appear,
4150in the editing window, as ending with two @code{<} in a row.
4151
4152@emindex editing multiple entries
4153When a translation (or a comment) is being edited, the translator may move
4154the cursor back into the PO file buffer and freely move to other entries,
4155browsing at will.  If, with an edition pending, the translator wanders in the
4156PO file buffer, she may decide to start modifying another entry.  Each entry
4157being edited has its own subedit buffer.  It is possible to simultaneously
4158edit the translation @emph{and} the comment of a single entry, or to
4159edit entries in different PO files, all at once.  Typing @kbd{@key{RET}}
4160on a field already being edited merely resumes that particular edit.  Yet,
4161the translator should better be comfortable at handling many Emacs windows!
4162
4163@emindex pending subedits
4164Pending subedits may be completed or aborted in any order, regardless
4165of how or when they were started.  When many subedits are pending and the
4166translator asks for quitting the PO file (with the @kbd{q} command), subedits
4167are automatically resumed one at a time, so she may decide for each of them.
4168
4169@node C Sources Context, Auxiliary, Subedit, PO Mode
4170@subsection C Sources Context
4171@emindex consulting program sources
4172@emindex looking at the source to aid translation
4173@emindex use the source, Luke
4174
4175PO mode is particularly powerful when used with PO files
4176created through GNU @code{gettext} utilities, as those utilities
4177insert special comments in the PO files they generate.
4178Some of these special comments relate the PO file entry to
4179exactly where the untranslated string appears in the program sources.
4180
4181When the translator gets to an untranslated entry, she is fairly
4182often faced with an original string which is not as informative as
4183it normally should be, being succinct, cryptic, or otherwise ambiguous.
4184Before choosing how to translate the string, she needs to understand
4185better what the string really means and how tight the translation has
4186to be.  Most of the time, when problems arise, the only way left to make
4187her judgment is looking at the true program sources from where this
4188string originated, searching for surrounding comments the programmer
4189might have put in there, and looking around for helping clues of
4190@emph{any} kind.
4191
4192Surely, when looking at program sources, the translator will receive
4193more help if she is a fluent programmer.  However, even if she is
4194not versed in programming and feels a little lost in C code, the
4195translator should not be shy at taking a look, once in a while.
4196It is most probable that she will still be able to find some of the
4197hints she needs.  She will learn quickly to not feel uncomfortable
4198in program code, paying more attention to programmer's comments,
4199variable and function names (if he dared choosing them well), and
4200overall organization, than to the program code itself.
4201
4202@emindex find source fragment for a PO file entry
4203The following commands are meant to help the translator at getting
4204program source context for a PO file entry.
4205
4206@table @kbd
4207@item s
4208@efindex s@r{, PO Mode command}
4209Resume the display of a program source context, or cycle through them
4210(@code{po-cycle-source-reference}).
4211
4212@item M-s
4213@efindex M-s@r{, PO Mode command}
4214Display of a program source context selected by menu
4215(@code{po-select-source-reference}).
4216
4217@item S
4218@efindex S@r{, PO Mode command}
4219Add a directory to the search path for source files
4220(@code{po-consider-source-path}).
4221
4222@item M-S
4223@efindex M-S@r{, PO Mode command}
4224Delete a directory from the search path for source files
4225(@code{po-ignore-source-path}).
4226
4227@end table
4228
4229@efindex s@r{, PO Mode command}
4230@efindex po-cycle-source-reference@r{, PO Mode command}
4231@efindex M-s@r{, PO Mode command}
4232@efindex po-select-source-reference@r{, PO Mode command}
4233The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
4234(@code{po-select-source-reference}) both open another window displaying
4235some source program file, and already positioned in such a way that
4236it shows an actual use of the string to be translated.  By doing
4237so, the command gives source program context for the string.  But if
4238the entry has no source context references, or if all references
4239are unresolved along the search path for program sources, then the
4240command diagnoses this as an error.
4241
4242Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
4243in the PO file window.  If the translator really wants to
4244get into the program source window, she ought to do it explicitly,
4245maybe by using command @kbd{O}.
4246
4247When @kbd{s} is typed for the first time, or for a PO file entry which
4248is different of the last one used for getting source context, then the
4249command reacts by giving the first context available for this entry,
4250if any.  If some context has already been recently displayed for the
4251current PO file entry, and the translator wandered off to do other
4252things, typing @kbd{s} again will merely resume, in another window,
4253the context last displayed.  In particular, if the translator moved
4254the cursor away from the context in the source file, the command will
4255bring the cursor back to the context.  By using @kbd{s} many times
4256in a row, with no other commands intervening, PO mode will cycle to
4257the next available contexts for this particular entry, getting back
4258to the first context once the last has been shown.
4259
4260The command @kbd{M-s} behaves differently.  Instead of cycling through
4261references, it lets the translator choose a particular reference among
4262many, and displays that reference.  It is best used with completion,
4263if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
4264response to the question, she will be offered a menu of all possible
4265references, as a reminder of which are the acceptable answers.
4266This command is useful only where there are really many contexts
4267available for a single string to translate.
4268
4269@efindex S@r{, PO Mode command}
4270@efindex po-consider-source-path@r{, PO Mode command}
4271@efindex M-S@r{, PO Mode command}
4272@efindex po-ignore-source-path@r{, PO Mode command}
4273Program source files are usually found relative to where the PO
4274file stands.  As a special provision, when this fails, the file is
4275also looked for, but relative to the directory immediately above it.
4276Those two cases take proper care of most PO files.  However, it might
4277happen that a PO file has been moved, or is edited in a different
4278place than its normal location.  When this happens, the translator
4279should tell PO mode in which directory normally sits the genuine PO
4280file.  Many such directories may be specified, and all together, they
4281constitute what is called the @dfn{search path} for program sources.
4282The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
4283enter a new directory at the front of the search path, and the command
4284@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
4285one of the directories she does not want anymore on the search path.
4286
4287@node Auxiliary,  , C Sources Context, PO Mode
4288@subsection Consulting Auxiliary PO Files
4289@emindex consulting translations to other languages
4290
4291PO mode is able to help the knowledgeable translator, being fluent in
4292many languages, at taking advantage of translations already achieved
4293in other languages she just happens to know.  It provides these other
4294language translations as additional context for her own work.  Moreover,
4295it has features to ease the production of translations for many languages
4296at once, for translators preferring to work in this way.
4297
4298@cindex auxiliary PO file
4299@emindex auxiliary PO file
4300An @dfn{auxiliary} PO file is an existing PO file meant for the same
4301package the translator is working on, but targeted to a different mother
4302tongue language.  Commands exist for declaring and handling auxiliary
4303PO files, and also for showing contexts for the entry under work.
4304
4305Here are the auxiliary file commands available in PO mode.
4306
4307@table @kbd
4308@item a
4309@efindex a@r{, PO Mode command}
4310Seek auxiliary files for another translation for the same entry
4311(@code{po-cycle-auxiliary}).
4312
4313@item C-c C-a
4314@efindex C-c C-a@r{, PO Mode command}
4315Switch to a particular auxiliary file (@code{po-select-auxiliary}).
4316
4317@item A
4318@efindex A@r{, PO Mode command}
4319Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
4320
4321@item M-A
4322@efindex M-A@r{, PO Mode command}
4323Remove this PO file from the list of auxiliary files
4324(@code{po-ignore-as-auxiliary}).
4325
4326@end table
4327
4328@efindex A@r{, PO Mode command}
4329@efindex po-consider-as-auxiliary@r{, PO Mode command}
4330@efindex M-A@r{, PO Mode command}
4331@efindex po-ignore-as-auxiliary@r{, PO Mode command}
4332Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
4333PO file to the list of auxiliary files, while command @kbd{M-A}
4334(@code{po-ignore-as-auxiliary} just removes it.
4335
4336@efindex a@r{, PO Mode command}
4337@efindex po-cycle-auxiliary@r{, PO Mode command}
4338The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
4339files, round-robin, searching for a translated entry in some other language
4340having an @code{msgid} field identical as the one for the current entry.
4341The found PO file, if any, takes the place of the current PO file in
4342the display (its window gets on top).  Before doing so, the current PO
4343file is also made into an auxiliary file, if not already.  So, @kbd{a}
4344in this newly displayed PO file will seek another PO file, and so on,
4345so repeating @kbd{a} will eventually yield back the original PO file.
4346
4347@efindex C-c C-a@r{, PO Mode command}
4348@efindex po-select-auxiliary@r{, PO Mode command}
4349The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
4350for her choice of a particular auxiliary file, with completion, and
4351then switches to that selected PO file.  The command also checks if
4352the selected file has an @code{msgid} field identical as the one for
4353the current entry, and if yes, this entry becomes current.  Otherwise,
4354the cursor of the selected file is left undisturbed.
4355
4356For all this to work fully, auxiliary PO files will have to be normalized,
4357in that way that @code{msgid} fields should be written @emph{exactly}
4358the same way.  It is possible to write @code{msgid} fields in various
4359ways for representing the same string, different writing would break the
4360proper behaviour of the auxiliary file commands of PO mode.  This is not
4361expected to be much a problem in practice, as most existing PO files have
4362their @code{msgid} entries written by the same GNU @code{gettext} tools.
4363
4364@efindex normalize@r{, PO Mode command}
4365However, PO files initially created by PO mode itself, while marking
4366strings in source files, are normalised differently.  So are PO
4367files resulting of the @samp{M-x normalize} command.  Until these
4368discrepancies between PO mode and other GNU @code{gettext} tools get
4369fully resolved, the translator should stay aware of normalisation issues.
4370
4371@node Compendium,  , PO Mode, Editing
4372@section Using Translation Compendia
4373@emindex using translation compendia
4374
4375@cindex compendium
4376A @dfn{compendium} is a special PO file containing a set of
4377translations recurring in many different packages.  The translator can
4378use gettext tools to build a new compendium, to add entries to her
4379compendium, and to initialize untranslated entries, or to update
4380already translated entries, from translations kept in the compendium.
4381
4382@menu
4383* Creating Compendia::          Merging translations for later use
4384* Using Compendia::             Using older translations if they fit
4385@end menu
4386
4387@node Creating Compendia, Using Compendia, Compendium, Compendium
4388@subsection Creating Compendia
4389@cindex creating compendia
4390@cindex compendium, creating
4391
4392Basically every PO file consisting of translated entries only can be
4393declared as a valid compendium.  Often the translator wants to have
4394special compendia; let's consider two cases: @cite{concatenating PO
4395files} and @cite{extracting a message subset from a PO file}.
4396
4397@subsubsection Concatenate PO Files
4398
4399@cindex concatenating PO files into a compendium
4400@cindex accumulating translations
4401To concatenate several valid PO files into one compendium file you can
4402use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
4403
4404@example
4405msgcat -o compendium.po file1.po file2.po
4406@end example
4407
4408By default, @code{msgcat} will accumulate divergent translations
4409for the same string.  Those occurrences will be marked as @code{fuzzy}
4410and highly visible decorated; calling @code{msgcat} on
4411@file{file1.po}:
4412
4413@example
4414#: src/hello.c:200
4415#, c-format
4416msgid "Report bugs to <%s>.\n"
4417msgstr "Comunicar `bugs' a <%s>.\n"
4418@end example
4419
4420@noindent
4421and @file{file2.po}:
4422
4423@example
4424#: src/bye.c:100
4425#, c-format
4426msgid "Report bugs to <%s>.\n"
4427msgstr "Comunicar \"bugs\" a <%s>.\n"
4428@end example
4429
4430@noindent
4431will result in:
4432
4433@example
4434#: src/hello.c:200 src/bye.c:100
4435#, fuzzy, c-format
4436msgid "Report bugs to <%s>.\n"
4437msgstr ""
4438"#-#-#-#-#  file1.po  #-#-#-#-#\n"
4439"Comunicar `bugs' a <%s>.\n"
4440"#-#-#-#-#  file2.po  #-#-#-#-#\n"
4441"Comunicar \"bugs\" a <%s>.\n"
4442@end example
4443
4444@noindent
4445The translator will have to resolve this ``conflict'' manually; she
4446has to decide whether the first or the second version is appropriate
4447(or provide a new translation), to delete the ``marker lines'', and
4448finally to remove the @code{fuzzy} mark.
4449
4450If the translator knows in advance the first found translation of a
4451message is always the best translation she can make use to the
4452@samp{--use-first} switch:
4453
4454@example
4455msgcat --use-first -o compendium.po file1.po file2.po
4456@end example
4457
4458A good compendium file must not contain @code{fuzzy} or untranslated
4459entries.  If input files are ``dirty'' you must preprocess the input
4460files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
4461
4462@subsubsection Extract a Message Subset from a PO File
4463@cindex extracting parts of a PO file into a compendium
4464
4465Nobody wants to translate the same messages again and again; thus you
4466may wish to have a compendium file containing @file{getopt.c} messages.
4467
4468To extract a message subset (e.g., all @file{getopt.c} messages) from an
4469existing PO file into one compendium file you can use @samp{msggrep}:
4470
4471@example
4472msggrep --location src/getopt.c -o compendium.po file.po
4473@end example
4474
4475@node Using Compendia,  , Creating Compendia, Compendium
4476@subsection Using Compendia
4477
4478You can use a compendium file to initialize a translation from scratch
4479or to update an already existing translation.
4480
4481@subsubsection Initialize a New Translation File
4482@cindex initialize translations from a compendium
4483
4484Since a PO file with translations does not exist the translator can
4485merely use @file{/dev/null} to fake the ``old'' translation file.
4486
4487@example
4488msgmerge --compendium compendium.po -o file.po /dev/null file.pot
4489@end example
4490
4491@subsubsection Update an Existing Translation File
4492@cindex update translations from a compendium
4493
4494Concatenate the compendium file(s) and the existing PO, merge the
4495result with the POT file and remove the obsolete entries (optional,
4496here done using @samp{sed}):
4497
4498@example
4499msgcat --use-first -o update.po compendium1.po compendium2.po file.po
4500msgmerge update.po file.pot | msgattrib --no-obsolete > file.po
4501@end example
4502
4503@node Manipulating, Binaries, Editing, Top
4504@chapter Manipulating PO Files
4505@cindex manipulating PO files
4506
4507Sometimes it is necessary to manipulate PO files in a way that is better
4508performed automatically than by hand.  GNU @code{gettext} includes a
4509complete set of tools for this purpose.
4510
4511@cindex merging two PO files
4512When merging two packages into a single package, the resulting POT file
4513will be the concatenation of the two packages' POT files.  Thus the
4514maintainer must concatenate the two existing package translations into
4515a single translation catalog, for each language.  This is best performed
4516using @samp{msgcat}.  It is then the translators' duty to deal with any
4517possible conflicts that arose during the merge.
4518
4519@cindex encoding conversion
4520When a translator takes over the translation job from another translator,
4521but she uses a different character encoding in her locale, she will
4522convert the catalog to her character encoding.  This is best done through
4523the @samp{msgconv} program.
4524
4525When a maintainer takes a source file with tagged messages from another
4526package, he should also take the existing translations for this source
4527file (and not let the translators do the same job twice).  One way to do
4528this is through @samp{msggrep}, another is to create a POT file for
4529that source file and use @samp{msgmerge}.
4530
4531@cindex dialect
4532@cindex orthography
4533When a translator wants to adjust some translation catalog for a special
4534dialect or orthography --- for example, German as written in Switzerland
4535versus German as written in Germany --- she needs to apply some text
4536processing to every message in the catalog.  The tool for doing this is
4537@samp{msgfilter}.
4538
4539Another use of @code{msgfilter} is to produce approximately the POT file for
4540which a given PO file was made.  This can be done through a filter command
4541like @samp{msgfilter sed -e d | sed -e '/^# /d'}.  Note that the original
4542POT file may have had different comments and different plural message counts,
4543that's why it's better to use the original POT file if available.
4544
4545@cindex checking of translations
4546When a translator wants to check her translations, for example according
4547to orthography rules or using a non-interactive spell checker, she can do
4548so using the @samp{msgexec} program.
4549
4550@cindex duplicate elimination
4551When third party tools create PO or POT files, sometimes duplicates cannot
4552be avoided.  But the GNU @code{gettext} tools give an error when they
4553encounter duplicate msgids in the same file and in the same domain.
4554To merge duplicates, the @samp{msguniq} program can be used.
4555
4556@samp{msgcomm} is a more general tool for keeping or throwing away
4557duplicates, occurring in different files.
4558
4559@samp{msgcmp} can be used to check whether a translation catalog is
4560completely translated.
4561
4562@cindex attributes, manipulating
4563@samp{msgattrib} can be used to select and extract only the fuzzy
4564or untranslated messages of a translation catalog.
4565
4566@samp{msgen} is useful as a first step for preparing English translation
4567catalogs.  It copies each message's msgid to its msgstr.
4568
4569Finally, for those applications where all these various programs are not
4570sufficient, a library @samp{libgettextpo} is provided that can be used to
4571write other specialized programs that process PO files.
4572
4573@menu
4574* msgcat Invocation::           Invoking the @code{msgcat} Program
4575* msgconv Invocation::          Invoking the @code{msgconv} Program
4576* msggrep Invocation::          Invoking the @code{msggrep} Program
4577* msgfilter Invocation::        Invoking the @code{msgfilter} Program
4578* msguniq Invocation::          Invoking the @code{msguniq} Program
4579* msgcomm Invocation::          Invoking the @code{msgcomm} Program
4580* msgcmp Invocation::           Invoking the @code{msgcmp} Program
4581* msgattrib Invocation::        Invoking the @code{msgattrib} Program
4582* msgen Invocation::            Invoking the @code{msgen} Program
4583* msgexec Invocation::          Invoking the @code{msgexec} Program
4584* Colorizing::                  Highlighting parts of PO files
4585* libgettextpo::                Writing your own programs that process PO files
4586@end menu
4587
4588@node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating
4589@section Invoking the @code{msgcat} Program
4590
4591@include msgcat.texi
4592
4593@node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating
4594@section Invoking the @code{msgconv} Program
4595
4596@include msgconv.texi
4597
4598@node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating
4599@section Invoking the @code{msggrep} Program
4600
4601@include msggrep.texi
4602
4603@node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating
4604@section Invoking the @code{msgfilter} Program
4605
4606@include msgfilter.texi
4607
4608@node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating
4609@section Invoking the @code{msguniq} Program
4610
4611@include msguniq.texi
4612
4613@node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating
4614@section Invoking the @code{msgcomm} Program
4615
4616@include msgcomm.texi
4617
4618@node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating
4619@section Invoking the @code{msgcmp} Program
4620
4621@include msgcmp.texi
4622
4623@node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating
4624@section Invoking the @code{msgattrib} Program
4625
4626@include msgattrib.texi
4627
4628@node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating
4629@section Invoking the @code{msgen} Program
4630
4631@include msgen.texi
4632
4633@node msgexec Invocation, Colorizing, msgen Invocation, Manipulating
4634@section Invoking the @code{msgexec} Program
4635
4636@include msgexec.texi
4637
4638@node Colorizing, libgettextpo, msgexec Invocation, Manipulating
4639@section Highlighting parts of PO files
4640
4641Translators are usually only interested in seeing the untranslated and
4642fuzzy messages of a PO file.  Also, when a message is set fuzzy because
4643the msgid changed, they want to see the differences between the previous
4644msgid and the current one (especially if the msgid is long and only few
4645words in it have changed).  Finally, it's always welcome to highlight the
4646different sections of a message in a PO file (comments, msgid, msgstr, etc.).
4647
4648Such highlighting is possible through the @code{msgcat} options
4649@samp{--color} and @samp{--style}.
4650
4651@menu
4652* The --color option::          Triggering colorized output
4653* The TERM variable::           The environment variable @code{TERM}
4654* The --style option::          The @code{--style} option
4655* Style rules::                 Style rules for PO files
4656* Customizing less::            Customizing @code{less} for viewing PO files
4657@end menu
4658
4659@node The --color option, The TERM variable,  , Colorizing
4660@subsection The @code{--color} option
4661
4662@opindex --color@r{, @code{msgcat} option}
4663The @samp{--color=@var{when}} option specifies under which conditions
4664colorized output should be generated.  The @var{when} part can be one of
4665the following:
4666
4667@table @code
4668@item always
4669@itemx yes
4670The output will be colorized.
4671
4672@item never
4673@itemx no
4674The output will not be colorized.
4675
4676@item auto
4677@itemx tty
4678The output will be colorized if the output device is a tty, i.e.@: when the
4679output goes directly to a text screen or terminal emulator window.
4680
4681@item html
4682The output will be colorized and be in HTML format.
4683@end table
4684
4685@noindent
4686@samp{--color} is equivalent to @samp{--color=yes}.  The default is
4687@samp{--color=auto}.
4688
4689Thus, a command like @samp{msgcat vi.po} will produce colorized output
4690when called by itself in a command window.  Whereas in a pipe, such as
4691@samp{msgcat vi.po | less -R}, it will not produce colorized output.  To
4692get colorized output in this situation nevertheless, use the command
4693@samp{msgcat --color vi.po | less -R}.
4694
4695The @samp{--color=html} option will produce output that can be viewed in
4696a browser.  This can be useful, for example, for Indic languages,
4697because the renderic of Indic scripts in browser is usually better than
4698in terminal emulators.
4699
4700Note that the output produced with the @code{--color} option is @emph{not}
4701a valid PO file in itself.  It contains additional terminal-specific escape
4702sequences or HTML tags.  A PO file reader will give a syntax error when
4703confronted with such content.  Except for the @samp{--color=html} case,
4704you therefore normally don't need to save output produced with the
4705@code{--color} option in a file.
4706
4707@node The TERM variable, The --style option, The --color option, Colorizing
4708@subsection The environment variable @code{TERM}
4709
4710@vindex TERM@r{, environment variable}
4711The environment variable @code{TERM} contains a identifier for the text
4712window's capabilities.  You can get a detailed list of these cababilities
4713by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a
4714reference.
4715
4716When producing text with embedded color directives, @code{msgcat} looks
4717at the @code{TERM} variable.  Text windows today typically support at least
47188 colors.  Often, however, the text window supports 16 or more colors,
4719even though the @code{TERM} variable is set to a identifier denoting only
47208 supported colors.  It can be worth setting the @code{TERM} variable to
4721a different value in these cases:
4722
4723@table @code
4724@item xterm
4725@code{xterm} is in most cases built with support for 16 colors.  It can also
4726be built with support for 88 or 256 colors (but not both).  You can try to
4727set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or
4728@code{xterm-256color}.
4729
4730@item rxvt
4731@code{rxvt} is often built with support for 16 colors.  You can try to set
4732@code{TERM} to @code{rxvt-16color}.
4733
4734@item konsole
4735@code{konsole} too is often built with support for 16 colors.  You can try to
4736set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}.
4737@end table
4738
4739After setting @code{TERM}, you can verify it by invoking
4740@samp{msgcat --color=test} and seeing whether the output looks like a
4741reasonable color map.
4742
4743@node The --style option, Style rules, The TERM variable, Colorizing
4744@subsection The @code{--style} option
4745
4746@opindex --style@r{, @code{msgcat} option}
4747The @samp{--style=@var{style_file}} option specifies the style file to use
4748when colorizing.  It has an effect only when the @code{--color} option is
4749effective.
4750
4751@vindex PO_STYLE@r{, environment variable}
4752If the @code{--style} option is not specified, the environment variable
4753@code{PO_STYLE} is considered.  It is meant to point to the user's
4754preferred style for PO files.
4755
4756The default style file is @file{$prefix/share/gettext/styles/po-default.css},
4757where @code{$prefix} is the installation location.
4758
4759A few style files are predefined:
4760@table @file
4761@item po-vim.css
4762This style imitates the look used by vim 7.
4763
4764@item po-emacs-x.css
4765This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
4766
4767@item po-emacs-xterm.css
4768@itemx po-emacs-xterm16.css
4769@itemx po-emacs-xterm256.css
4770This style imitates the look used by GNU Emacs 22 in a terminal of type
4771@samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or
4772@samp{xterm-256color} (256 colors), respectively.
4773@end table
4774
4775@noindent
4776You can use these styles without specifying a directory.  They are actually
4777located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the
4778installation location.
4779
4780You can also design your own styles.  This is described in the next section.
4781
4782
4783@node Style rules, Customizing less, The --style option, Colorizing
4784@subsection Style rules for PO files
4785
4786The same style file can be used for styling of a PO file, for terminal
4787output and for HTML output.  It is written in CSS (Cascading Style Sheet)
4788syntax.  See @url{http://www.w3.org/TR/css2/cover.html} for a formal
4789definition of CSS.  Many HTML authoring tutorials also contain explanations
4790of CSS.
4791
4792In the case of HTML output, the style file is embedded in the HTML output.
4793In the case of text output, the style file is interpreted by the
4794@code{msgcat} program.  This means, in particular, that when
4795@code{@@import} is used with relative file names, the file names are
4796
4797@itemize @minus
4798@item
4799relative to the resulting HTML file, in the case of HTML output,
4800
4801@item
4802relative to the style sheet containing the @code{@@import}, in the case of
4803text output.  (Actually, @code{@@import}s are not yet supported in this case,
4804due to a limitation in @code{libcroco}.)
4805@end itemize
4806
4807CSS rules are built up from selectors and declarations.  The declarations
4808specify graphical properties; the selectors specify specify when they apply.
4809
4810In PO files, the following simple selectors (based on "CSS classes", see
4811the CSS2 spec, section 5.8.3) are supported.
4812
4813@itemize @bullet
4814@item
4815Selectors that apply to entire messages:
4816
4817@table @code
4818@item .header
4819This matches the header entry of a PO file.
4820
4821@item .translated
4822This matches a translated message.
4823
4824@item .untranslated
4825This matches an untranslated message (i.e.@: a message with empty translation).
4826
4827@item .fuzzy
4828This matches a fuzzy message (i.e.@: a message which has a translation that
4829needs review by the translator).
4830
4831@item .obsolete
4832This matches an obsolete message (i.e.@: a message that was translated but is
4833not needed by the current POT file any more).
4834@end table
4835
4836@item
4837Selectors that apply to parts of a message in PO syntax.  Recall the general
4838structure of a message in PO syntax:
4839
4840@example
4841@var{white-space}
4842#  @var{translator-comments}
4843#. @var{extracted-comments}
4844#: @var{reference}@dots{}
4845#, @var{flag}@dots{}
4846#| msgid @var{previous-untranslated-string}
4847msgid @var{untranslated-string}
4848msgstr @var{translated-string}
4849@end example
4850
4851@table @code
4852@item .comment
4853This matches all comments (translator comments, extracted comments,
4854source file reference comments, flag comments, previous message comments,
4855as well as the entire obsolete messages).
4856
4857@item .translator-comment
4858This matches the translator comments.
4859
4860@item .extracted-comment
4861This matches the extracted comments, i.e.@: the comments placed by the
4862programmer at the attention of the translator.
4863
4864@item .reference-comment
4865This matches the source file reference comments (entire lines).
4866
4867@item .reference
4868This matches the individual source file references inside the source file
4869reference comment lines.
4870
4871@item .flag-comment
4872This matches the flag comment lines (entire lines).
4873
4874@item .flag
4875This matches the individual flags inside flag comment lines.
4876
4877@item .fuzzy-flag
4878This matches the `fuzzy' flag inside flag comment lines.
4879
4880@item .previous-comment
4881This matches the comments containing the previous untranslated string (entire
4882lines).
4883
4884@item .previous
4885This matches the previous untranslated string including the string delimiters,
4886the associated keywords (@code{msgid} etc.) and the spaces between them.
4887
4888@item .msgid
4889This matches the untranslated string including the string delimiters,
4890the associated keywords (@code{msgid} etc.) and the spaces between them.
4891
4892@item .msgstr
4893This matches the translated string including the string delimiters,
4894the associated keywords (@code{msgstr} etc.) and the spaces between them.
4895
4896@item .keyword
4897This matches the keywords (@code{msgid}, @code{msgstr}, etc.).
4898
4899@item .string
4900This matches strings, including the string delimiters (double quotes).
4901@end table
4902
4903@item
4904Selectors that apply to parts of strings:
4905
4906@table @code
4907@item .text
4908This matches the entire contents of a string (excluding the string delimiters,
4909i.e.@: the double quotes).
4910
4911@item .escape-sequence
4912This matches an escape sequence (starting with a backslash).
4913
4914@item .format-directive
4915This matches a format string directive (starting with a @samp{%} sign in the
4916case of most programming languages, with a @samp{@{} in the case of
4917@code{java-format} and @code{csharp-format}, with a @samp{~} in the case of
4918@code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of
4919@code{sh-format}).
4920
4921@item .invalid-format-directive
4922This matches an invalid format string directive.
4923
4924@item .added
4925In an untranslated string, this matches a part of the string that was not
4926present in the previous untranslated string.  (Not yet implemented in this
4927release.)
4928
4929@item .changed
4930In an untranslated string or in a previous untranslated string, this matches
4931a part of the string that is changed or replaced.  (Not yet implemented in
4932this release.)
4933
4934@item .removed
4935In a previous untranslated string, this matches a part of the string that
4936is not present in the current untranslated string.  (Not yet implemented in
4937this release.)
4938@end table
4939@end itemize
4940
4941These selectors can be combined to hierarchical selectors.  For example,
4942
4943@smallexample
4944.msgstr .invalid-format-directive @{ color: red; @}
4945@end smallexample
4946
4947@noindent
4948will highlight the invalid format directives in the translated strings.
4949
4950In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements
4951(CSS2 spec, section 5.12) are not supported.
4952
4953The declarations in HTML mode are not limited; any graphical attribute
4954supported by the browsers can be used.
4955
4956The declarations in text mode are limited to the following properties.  Other
4957properties will be silently ignored.
4958
4959@table @asis
4960@item @code{color} (CSS2 spec, section 14.1)
4961@itemx @code{background-color} (CSS2 spec, section 14.2.1)
4962These properties is supported.  Colors will be adjusted to match the terminal's
4963capabilities.  Note that many terminals support only 8 colors.
4964
4965@item @code{font-weight} (CSS2 spec, section 15.2.3)
4966This property is supported, but most terminals can only render two different
4967weights: @code{normal} and @code{bold}.  Values >= 600 are rendered as
4968@code{bold}.
4969
4970@item @code{font-style} (CSS2 spec, section 15.2.3)
4971This property is supported.  The values @code{italic} and @code{oblique} are
4972rendered the same way.
4973
4974@item @code{text-decoration} (CSS2 spec, section 16.3.1)
4975This property is supported, limited to the values @code{none} and
4976@code{underline}.
4977@end table
4978
4979@node Customizing less,  , Style rules, Colorizing
4980@subsection Customizing @code{less} for viewing PO files
4981
4982The @samp{less} program is a popular text file browser for use in a text
4983screen or terminal emulator.  It also supports text with embedded escape
4984sequences for colors and text decorations.
4985
4986You can use @code{less} to view a PO file like this (assuming an UTF-8
4987environment):
4988
4989@smallexample
4990msgcat --to-code=UTF-8 --color xyz.po | less -R
4991@end smallexample
4992
4993You can simplify this to this simple command:
4994
4995@smallexample
4996less xyz.po
4997@end smallexample
4998
4999@noindent
5000after these three preparations:
5001
5002@enumerate
5003@item
5004Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment
5005variable.  In sh shells:
5006@smallexample
5007$ LESS="$LESS -R -f"
5008$ export LESS
5009@end smallexample
5010
5011@item
5012If your system does not already have the @file{lessopen.sh} and
5013@file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and
5014@code{LESSCLOSE} environment variables, as indicated in the manual page
5015(@samp{man less}).
5016
5017@item
5018Add to @file{lessopen.sh} a piece of script that recognizes PO files
5019through their file extension and invokes @code{msgcat} on them, producing
5020a temporary file.  Like this:
5021
5022@smallexample
5023case "$1" in
5024  *.po)
5025    tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"`
5026    msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
5027    echo "$tmpfile"
5028    exit 0
5029    ;;
5030esac
5031@end smallexample
5032@end enumerate
5033
5034@node libgettextpo,  , Colorizing, Manipulating
5035@section Writing your own programs that process PO files
5036
5037For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
5038is not sufficient, a set of C functions is provided in a library, to make it
5039possible to process PO files in your own programs.  When you use this library,
5040you don't need to write routines to parse the PO file; instead, you retrieve
5041a pointer in memory to each of messages contained in the PO file.  Functions
5042for writing PO files are not provided at this time.
5043
5044The functions are declared in the header file @samp{<gettext-po.h>}, and are
5045defined in a library called @samp{libgettextpo}.
5046
5047@deftp {Data Type} po_file_t
5048This is a pointer type that refers to the contents of a PO file, after it has
5049been read into memory.
5050@end deftp
5051
5052@deftp {Data Type} po_message_iterator_t
5053This is a pointer type that refers to an iterator that produces a sequence of
5054messages.
5055@end deftp
5056
5057@deftp {Data Type} po_message_t
5058This is a pointer type that refers to a message of a PO file, including its
5059translation.
5060@end deftp
5061
5062@deftypefun po_file_t po_file_read (const char *@var{filename})
5063The @code{po_file_read} function reads a PO file into memory.  The file name
5064is given as argument.  The return value is a handle to the PO file's contents,
5065valid until @code{po_file_free} is called on it.  In case of error, the return
5066value is @code{NULL}, and @code{errno} is set.
5067@end deftypefun
5068
5069@deftypefun void po_file_free (po_file_t @var{file})
5070The @code{po_file_free} function frees a PO file's contents from memory,
5071including all messages that are only implicitly accessible through iterators.
5072@end deftypefun
5073
5074@deftypefun {const char * const *} po_file_domains (po_file_t @var{file})
5075The @code{po_file_domains} function returns the domains for which the given
5076PO file has messages.  The return value is a @code{NULL} terminated array
5077which is valid as long as the @var{file} handle is valid.  For PO files which
5078contain no @samp{domain} directive, the return value contains only one domain,
5079namely the default domain @code{"messages"}.
5080@end deftypefun
5081
5082@deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain})
5083The @code{po_message_iterator} returns an iterator that will produce the
5084messages of @var{file} that belong to the given @var{domain}.  If @var{domain}
5085is @code{NULL}, the default domain is used instead.  To list the messages,
5086use the function @code{po_next_message} repeatedly.
5087@end deftypefun
5088
5089@deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator})
5090The @code{po_message_iterator_free} function frees an iterator previously
5091allocated through the @code{po_message_iterator} function.
5092@end deftypefun
5093
5094@deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator})
5095The @code{po_next_message} function returns the next message from
5096@var{iterator} and advances the iterator.  It returns @code{NULL} when the
5097iterator has reached the end of its message list.
5098@end deftypefun
5099
5100The following functions returns details of a @code{po_message_t}.  Recall
5101that the results are valid as long as the @var{file} handle is valid.
5102
5103@deftypefun {const char *} po_message_msgid (po_message_t @var{message})
5104The @code{po_message_msgid} function returns the @code{msgid} (untranslated
5105English string) of a message.  This is guaranteed to be non-@code{NULL}.
5106@end deftypefun
5107
5108@deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message})
5109The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
5110(untranslated English plural string) of a message with plurals, or @code{NULL}
5111for a message without plural.
5112@end deftypefun
5113
5114@deftypefun {const char *} po_message_msgstr (po_message_t @var{message})
5115The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
5116of a message.  For an untranslated message, the return value is an empty
5117string.
5118@end deftypefun
5119
5120@deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index})
5121The @code{po_message_msgstr_plural} function returns the
5122@code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when
5123the @var{index} is out of range or for a message without plural.
5124@end deftypefun
5125
5126Here is an example code how these functions can be used.
5127
5128@example
5129const char *filename = @dots{};
5130po_file_t file = po_file_read (filename);
5131
5132if (file == NULL)
5133  error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
5134@{
5135  const char * const *domains = po_file_domains (file);
5136  const char * const *domainp;
5137
5138  for (domainp = domains; *domainp; domainp++)
5139    @{
5140      const char *domain = *domainp;
5141      po_message_iterator_t iterator = po_message_iterator (file, domain);
5142
5143      for (;;)
5144        @{
5145          po_message_t *message = po_next_message (iterator);
5146
5147          if (message == NULL)
5148            break;
5149          @{
5150            const char *msgid = po_message_msgid (message);
5151            const char *msgstr = po_message_msgstr (message);
5152
5153            @dots{}
5154          @}
5155        @}
5156      po_message_iterator_free (iterator);
5157    @}
5158@}
5159po_file_free (file);
5160@end example
5161
5162@node Binaries, Programmers, Manipulating, Top
5163@chapter Producing Binary MO Files
5164
5165@c FIXME: Rewrite.
5166
5167@menu
5168* msgfmt Invocation::           Invoking the @code{msgfmt} Program
5169* msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
5170* MO Files::                    The Format of GNU MO Files
5171@end menu
5172
5173@node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries
5174@section Invoking the @code{msgfmt} Program
5175
5176@include msgfmt.texi
5177
5178@node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries
5179@section Invoking the @code{msgunfmt} Program
5180
5181@include msgunfmt.texi
5182
5183@node MO Files,  , msgunfmt Invocation, Binaries
5184@section The Format of GNU MO Files
5185@cindex MO file's format
5186@cindex file format, @file{.mo}
5187
5188The format of the generated MO files is best described by a picture,
5189which appears below.
5190
5191@cindex magic signature of MO files
5192The first two words serve the identification of the file.  The magic
5193number will always signal GNU MO files.  The number is stored in the
5194byte order of the generating machine, so the magic number really is
5195two numbers: @code{0x950412de} and @code{0xde120495}.  The second
5196word describes the current revision of the file format.  For now the
5197revision is 0.  This might change in future versions, and ensures
5198that the readers of MO files can distinguish new formats from old
5199ones, so that both can be handled correctly.  The version is kept
5200separate from the magic number, instead of using different magic
5201numbers for different formats, mainly because @file{/etc/magic} is
5202not updated often.  It might be better to have magic separated from
5203internal format version identification.
5204
5205Follow a number of pointers to later tables in the file, allowing
5206for the extension of the prefix part of MO files without having to
5207recompile programs reading them.  This might become useful for later
5208inserting a few flag bits, indication about the charset used, new
5209tables, or other things.
5210
5211Then, at offset @var{O} and offset @var{T} in the picture, two tables
5212of string descriptors can be found.  In both tables, each string
5213descriptor uses two 32 bits integers, one for the string length,
5214another for the offset of the string in the MO file, counting in bytes
5215from the start of the file.  The first table contains descriptors
5216for the original strings, and is sorted so the original strings
5217are in increasing lexicographical order.  The second table contains
5218descriptors for the translated strings, and is parallel to the first
5219table: to find the corresponding translation one has to access the
5220array slot in the second array with the same index.
5221
5222Having the original strings sorted enables the use of simple binary
5223search, for when the MO file does not contain an hashing table, or
5224for when it is not practical to use the hashing table provided in
5225the MO file.  This also has another advantage, as the empty string
5226in a PO file GNU @code{gettext} is usually @emph{translated} into
5227some system information attached to that particular MO file, and the
5228empty string necessarily becomes the first in both the original and
5229translated tables, making the system information very easy to find.
5230
5231@cindex hash table, inside MO files
5232The size @var{S} of the hash table can be zero.  In this case, the
5233hash table itself is not contained in the MO file.  Some people might
5234prefer this because a precomputed hashing table takes disk space, and
5235does not win @emph{that} much speed.  The hash table contains indices
5236to the sorted array of strings in the MO file.  Conflict resolution is
5237done by double hashing.  The precise hashing algorithm used is fairly
5238dependent on GNU @code{gettext} code, and is not documented here.
5239
5240As for the strings themselves, they follow the hash file, and each
5241is terminated with a @key{NUL}, and this @key{NUL} is not counted in
5242the length which appears in the string descriptor.  The @code{msgfmt}
5243program has an option selecting the alignment for MO file strings.
5244With this option, each string is separately aligned so it starts at
5245an offset which is a multiple of the alignment value.  On some RISC
5246machines, a correct alignment will speed things up.
5247
5248@cindex context, in MO files
5249Contexts are stored by storing the concatenation of the context, a
5250@key{EOT} byte, and the original string, instead of the original string.
5251
5252@cindex plural forms, in MO files
5253Plural forms are stored by letting the plural of the original string
5254follow the singular of the original string, separated through a
5255@key{NUL} byte.  The length which appears in the string descriptor
5256includes both.  However, only the singular of the original string
5257takes part in the hash table lookup.  The plural variants of the
5258translation are all stored consecutively, separated through a
5259@key{NUL} byte.  Here also, the length in the string descriptor
5260includes all of them.
5261
5262Nothing prevents a MO file from having embedded @key{NUL}s in strings.
5263However, the program interface currently used already presumes
5264that strings are @key{NUL} terminated, so embedded @key{NUL}s are
5265somewhat useless.  But the MO file format is general enough so other
5266interfaces would be later possible, if for example, we ever want to
5267implement wide characters right in MO files, where @key{NUL} bytes may
5268accidentally appear.  (No, we don't want to have wide characters in MO
5269files.  They would make the file unnecessarily large, and the
5270@samp{wchar_t} type being platform dependent, MO files would be
5271platform dependent as well.)
5272
5273This particular issue has been strongly debated in the GNU
5274@code{gettext} development forum, and it is expectable that MO file
5275format will evolve or change over time.  It is even possible that many
5276formats may later be supported concurrently.  But surely, we have to
5277start somewhere, and the MO file format described here is a good start.
5278Nothing is cast in concrete, and the format may later evolve fairly
5279easily, so we should feel comfortable with the current approach.
5280
5281@example
5282@group
5283        byte
5284             +------------------------------------------+
5285          0  | magic number = 0x950412de                |
5286             |                                          |
5287          4  | file format revision = 0                 |
5288             |                                          |
5289          8  | number of strings                        |  == N
5290             |                                          |
5291         12  | offset of table with original strings    |  == O
5292             |                                          |
5293         16  | offset of table with translation strings |  == T
5294             |                                          |
5295         20  | size of hashing table                    |  == S
5296             |                                          |
5297         24  | offset of hashing table                  |  == H
5298             |                                          |
5299             .                                          .
5300             .    (possibly more entries later)         .
5301             .                                          .
5302             |                                          |
5303          O  | length & offset 0th string  ----------------.
5304      O + 8  | length & offset 1st string  ------------------.
5305              ...                                    ...   | |
5306O + ((N-1)*8)| length & offset (N-1)th string           |  | |
5307             |                                          |  | |
5308          T  | length & offset 0th translation  ---------------.
5309      T + 8  | length & offset 1st translation  -----------------.
5310              ...                                    ...   | | | |
5311T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
5312             |                                          |  | | | |
5313          H  | start hash table                         |  | | | |
5314              ...                                    ...   | | | |
5315  H + S * 4  | end hash table                           |  | | | |
5316             |                                          |  | | | |
5317             | NUL terminated 0th string  <----------------' | | |
5318             |                                          |    | | |
5319             | NUL terminated 1st string  <------------------' | |
5320             |                                          |      | |
5321              ...                                    ...       | |
5322             |                                          |      | |
5323             | NUL terminated 0th translation  <---------------' |
5324             |                                          |        |
5325             | NUL terminated 1st translation  <-----------------'
5326             |                                          |
5327              ...                                    ...
5328             |                                          |
5329             +------------------------------------------+
5330@end group
5331@end example
5332
5333@node Programmers, Translators, Binaries, Top
5334@chapter The Programmer's View
5335
5336@c FIXME: Reorganize whole chapter.
5337
5338One aim of the current message catalog implementation provided by
5339GNU @code{gettext} was to use the system's message catalog handling, if the
5340installer wishes to do so.  So we perhaps should first take a look at
5341the solutions we know about.  The people in the POSIX committee did not
5342manage to agree on one of the semi-official standards which we'll
5343describe below.  In fact they couldn't agree on anything, so they decided
5344only to include an example of an interface.  The major Unix vendors
5345are split in the usage of the two most important specifications: X/Open's
5346catgets vs. Uniforum's gettext interface.  We'll describe them both and
5347later explain our solution of this dilemma.
5348
5349@menu
5350* catgets::                     About @code{catgets}
5351* gettext::                     About @code{gettext}
5352* Comparison::                  Comparing the two interfaces
5353* Using libintl.a::             Using libintl.a in own programs
5354* gettext grok::                Being a @code{gettext} grok
5355* Temp Programmers::            Temporary Notes for the Programmers Chapter
5356@end menu
5357
5358@node catgets, gettext, Programmers, Programmers
5359@section About @code{catgets}
5360@cindex @code{catgets}, X/Open specification
5361
5362The @code{catgets} implementation is defined in the X/Open Portability
5363Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
5364process of creating this standard seemed to be too slow for some of
5365the Unix vendors so they created their implementations on preliminary
5366versions of the standard.  Of course this leads again to problems while
5367writing platform independent programs: even the usage of @code{catgets}
5368does not guarantee a unique interface.
5369
5370Another, personal comment on this that only a bunch of committee members
5371could have made this interface.  They never really tried to program
5372using this interface.  It is a fast, memory-saving implementation, an
5373user can happily live with it.  But programmers hate it (at least I and
5374some others do@dots{})
5375
5376But we must not forget one point: after all the trouble with transferring
5377the rights on Unix(tm) they at last came to X/Open, the very same who
5378published this specification.  This leads me to making the prediction
5379that this interface will be in future Unix standards (e.g.@: Spec1170) and
5380therefore part of all Unix implementation (implementations, which are
5381@emph{allowed} to wear this name).
5382
5383@menu
5384* Interface to catgets::        The interface
5385* Problems with catgets::       Problems with the @code{catgets} interface?!
5386@end menu
5387
5388@node Interface to catgets, Problems with catgets, catgets, catgets
5389@subsection The Interface
5390@cindex interface to @code{catgets}
5391
5392The interface to the @code{catgets} implementation consists of three
5393functions which correspond to those used in file access: @code{catopen}
5394to open the catalog for using, @code{catgets} for accessing the message
5395tables, and @code{catclose} for closing after work is done.  Prototypes
5396for the functions and the needed definitions are in the
5397@code{<nl_types.h>} header file.
5398
5399@cindex @code{catopen}, a @code{catgets} function
5400@code{catopen} is used like in this:
5401
5402@example
5403nl_catd catd = catopen ("catalog_name", 0);
5404@end example
5405
5406The function takes as the argument the name of the catalog.  This usual
5407refers to the name of the program or the package.  The second parameter
5408is not further specified in the standard.  I don't even know whether it
5409is implemented consistently among various systems.  So the common advice
5410is to use @code{0} as the value.  The return value is a handle to the
5411message catalog, equivalent to handles to file returned by @code{open}.
5412
5413@cindex @code{catgets}, a @code{catgets} function
5414This handle is of course used in the @code{catgets} function which can
5415be used like this:
5416
5417@example
5418char *translation = catgets (catd, set_no, msg_id, "original string");
5419@end example
5420
5421The first parameter is this catalog descriptor.  The second parameter
5422specifies the set of messages in this catalog, in which the message
5423described by @code{msg_id} is obtained.  @code{catgets} therefore uses a
5424three-stage addressing:
5425
5426@display
5427catalog name @result{} set number @result{} message ID @result{} translation
5428@end display
5429
5430@c Anybody else loving Haskell??? :-) -- Uli
5431
5432The fourth argument is not used to address the translation.  It is given
5433as a default value in case when one of the addressing stages fail.  One
5434important thing to remember is that although the return type of catgets
5435is @code{char *} the resulting string @emph{must not} be changed.  It
5436should better be @code{const char *}, but the standard is published in
54371988, one year before ANSI C.
5438
5439@noindent
5440@cindex @code{catclose}, a @code{catgets} function
5441The last of these functions is used and behaves as expected:
5442
5443@example
5444catclose (catd);
5445@end example
5446
5447After this no @code{catgets} call using the descriptor is legal anymore.
5448
5449@node Problems with catgets,  , Interface to catgets, catgets
5450@subsection Problems with the @code{catgets} Interface?!
5451@cindex problems with @code{catgets} interface
5452
5453Now that this description seemed to be really easy --- where are the
5454problems we speak of?  In fact the interface could be used in a
5455reasonable way, but constructing the message catalogs is a pain.  The
5456reason for this lies in the third argument of @code{catgets}: the unique
5457message ID.  This has to be a numeric value for all messages in a single
5458set.  Perhaps you could imagine the problems keeping such a list while
5459changing the source code.  Add a new message here, remove one there.  Of
5460course there have been developed a lot of tools helping to organize this
5461chaos but one as the other fails in one aspect or the other.  We don't
5462want to say that the other approach has no problems but they are far
5463more easy to manage.
5464
5465@node gettext, Comparison, catgets, Programmers
5466@section About @code{gettext}
5467@cindex @code{gettext}, a programmer's view
5468
5469The definition of the @code{gettext} interface comes from a Uniforum
5470proposal.  It was submitted there by Sun, who had implemented the
5471@code{gettext} function in SunOS 4, around 1990.  Nowadays, the
5472@code{gettext} interface is specified by the OpenI18N standard.
5473
5474The main point about this solution is that it does not follow the
5475method of normal file handling (open-use-close) and that it does not
5476burden the programmer with so many tasks, especially the unique key handling.
5477Of course here also a unique key is needed, but this key is the message
5478itself (how long or short it is).  See @ref{Comparison} for a more
5479detailed comparison of the two methods.
5480
5481The following section contains a rather detailed description of the
5482interface.  We make it that detailed because this is the interface
5483we chose for the GNU @code{gettext} Library.  Programmers interested
5484in using this library will be interested in this description.
5485
5486@menu
5487* Interface to gettext::        The interface
5488* Ambiguities::                 Solving ambiguities
5489* Locating Catalogs::           Locating message catalog files
5490* Charset conversion::          How to request conversion to Unicode
5491* Contexts::                    Solving ambiguities in GUI programs
5492* Plural forms::                Additional functions for handling plurals
5493* Optimized gettext::           Optimization of the *gettext functions
5494@end menu
5495
5496@node Interface to gettext, Ambiguities, gettext, gettext
5497@subsection The Interface
5498@cindex @code{gettext} interface
5499
5500The minimal functionality an interface must have is a) to select a
5501domain the strings are coming from (a single domain for all programs is
5502not reasonable because its construction and maintenance is difficult,
5503perhaps impossible) and b) to access a string in a selected domain.
5504
5505This is principally the description of the @code{gettext} interface.  It
5506has a global domain which unqualified usages reference.  Of course this
5507domain is selectable by the user.
5508
5509@example
5510char *textdomain (const char *domain_name);
5511@end example
5512
5513This provides the possibility to change or query the current status of
5514the current global domain of the @code{LC_MESSAGE} category.  The
5515argument is a null-terminated string, whose characters must be legal in
5516the use in filenames.  If the @var{domain_name} argument is @code{NULL},
5517the function returns the current value.  If no value has been set
5518before, the name of the default domain is returned: @emph{messages}.
5519Please note that although the return value of @code{textdomain} is of
5520type @code{char *} no changing is allowed.  It is also important to know
5521that no checks of the availability are made.  If the name is not
5522available you will see this by the fact that no translations are provided.
5523
5524@noindent
5525To use a domain set by @code{textdomain} the function
5526
5527@example
5528char *gettext (const char *msgid);
5529@end example
5530
5531@noindent
5532is to be used.  This is the simplest reasonable form one can imagine.
5533The translation of the string @var{msgid} is returned if it is available
5534in the current domain.  If it is not available, the argument itself is
5535returned.  If the argument is @code{NULL} the result is undefined.
5536
5537One thing which should come into mind is that no explicit dependency to
5538the used domain is given.  The current value of the domain is used.
5539If this changes between two
5540executions of the same @code{gettext} call in the program, both calls
5541reference a different message catalog.
5542
5543For the easiest case, which is normally used in internationalized
5544packages, once at the beginning of execution a call to @code{textdomain}
5545is issued, setting the domain to a unique name, normally the package
5546name.  In the following code all strings which have to be translated are
5547filtered through the gettext function.  That's all, the package speaks
5548your language.
5549
5550@node Ambiguities, Locating Catalogs, Interface to gettext, gettext
5551@subsection Solving Ambiguities
5552@cindex several domains
5553@cindex domain ambiguities
5554@cindex large package
5555
5556While this single name domain works well for most applications there
5557might be the need to get translations from more than one domain.  Of
5558course one could switch between different domains with calls to
5559@code{textdomain}, but this is really not convenient nor is it fast.  A
5560possible situation could be one case subject to discussion during this
5561writing:  all
5562error messages of functions in the set of common used functions should
5563go into a separate domain @code{error}.  By this mean we would only need
5564to translate them once.
5565Another case are messages from a library, as these @emph{have} to be
5566independent of the current domain set by the application.
5567
5568@noindent
5569For this reasons there are two more functions to retrieve strings:
5570
5571@example
5572char *dgettext (const char *domain_name, const char *msgid);
5573char *dcgettext (const char *domain_name, const char *msgid,
5574                 int category);
5575@end example
5576
5577Both take an additional argument at the first place, which corresponds
5578to the argument of @code{textdomain}.  The third argument of
5579@code{dcgettext} allows to use another locale category but @code{LC_MESSAGES}.
5580But I really don't know where this can be useful.  If the
5581@var{domain_name} is @code{NULL} or @var{category} has an value beside
5582the known ones, the result is undefined.  It should also be noted that
5583this function is not part of the second known implementation of this
5584function family, the one found in Solaris.
5585
5586A second ambiguity can arise by the fact, that perhaps more than one
5587domain has the same name.  This can be solved by specifying where the
5588needed message catalog files can be found.
5589
5590@example
5591char *bindtextdomain (const char *domain_name,
5592                      const char *dir_name);
5593@end example
5594
5595Calling this function binds the given domain to a file in the specified
5596directory (how this file is determined follows below).  Especially a
5597file in the systems default place is not favored against the specified
5598file anymore (as it would be by solely using @code{textdomain}).  A
5599@code{NULL} pointer for the @var{dir_name} parameter returns the binding
5600associated with @var{domain_name}.  If @var{domain_name} itself is
5601@code{NULL} nothing happens and a @code{NULL} pointer is returned.  Here
5602again as for all the other functions is true that none of the return
5603value must be changed!
5604
5605It is important to remember that relative path names for the
5606@var{dir_name} parameter can be trouble.  Since the path is always
5607computed relative to the current directory different results will be
5608achieved when the program executes a @code{chdir} command.  Relative
5609paths should always be avoided to avoid dependencies and
5610unreliabilities.
5611
5612@node Locating Catalogs, Charset conversion, Ambiguities, gettext
5613@subsection Locating Message Catalog Files
5614@cindex message catalog files location
5615
5616Because many different languages for many different packages have to be
5617stored we need some way to add these information to file message catalog
5618files.  The way usually used in Unix environments is have this encoding
5619in the file name.  This is also done here.  The directory name given in
5620@code{bindtextdomain}s second argument (or the default directory),
5621followed by the name of the locale, the locale category, and the domain name
5622are concatenated:
5623
5624@example
5625@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
5626@end example
5627
5628The default value for @var{dir_name} is system specific.  For the GNU
5629library, and for packages adhering to its conventions, it's:
5630@example
5631/usr/local/share/locale
5632@end example
5633
5634@noindent
5635@var{locale} is the name of the locale category which is designated by
5636@code{LC_@var{category}}.  For @code{gettext} and @code{dgettext} this
5637@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
5638system, e.g.@: mingw, don't have @code{LC_MESSAGES}.  Here we use a more or
5639less arbitrary value for it, namely 1729, the smallest positive integer
5640which can be represented in two different ways as the sum of two cubes.}
5641The name of the locale category is determined through
5642@code{setlocale (LC_@var{category}, NULL)}.
5643@footnote{When the system does not support @code{setlocale} its behavior
5644in setting the locale values is simulated by looking at the environment
5645variables.}
5646When using the function @code{dcgettext}, you can specify the locale category
5647through the third argument.
5648
5649@node Charset conversion, Contexts, Locating Catalogs, gettext
5650@subsection How to specify the output character set @code{gettext} uses
5651@cindex charset conversion at runtime
5652@cindex encoding conversion at runtime
5653
5654@code{gettext} not only looks up a translation in a message catalog.  It
5655also converts the translation on the fly to the desired output character
5656set.  This is useful if the user is working in a different character set
5657than the translator who created the message catalog, because it avoids
5658distributing variants of message catalogs which differ only in the
5659character set.
5660
5661The output character set is, by default, the value of @code{nl_langinfo
5662(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
5663locale.  But programs which store strings in a locale independent way
5664(e.g.@: UTF-8) can request that @code{gettext} and related functions
5665return the translations in that encoding, by use of the
5666@code{bind_textdomain_codeset} function.
5667
5668Note that the @var{msgid} argument to @code{gettext} is not subject to
5669character set conversion.  Also, when @code{gettext} does not find a
5670translation for @var{msgid}, it returns @var{msgid} unchanged --
5671independently of the current output character set.  It is therefore
5672recommended that all @var{msgid}s be US-ASCII strings.
5673
5674@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
5675The @code{bind_textdomain_codeset} function can be used to specify the
5676output character set for message catalogs for domain @var{domainname}.
5677The @var{codeset} argument must be a valid codeset name which can be used
5678for the @code{iconv_open} function, or a null pointer.
5679
5680If the @var{codeset} parameter is the null pointer,
5681@code{bind_textdomain_codeset} returns the currently selected codeset
5682for the domain with the name @var{domainname}.  It returns @code{NULL} if
5683no codeset has yet been selected.
5684
5685The @code{bind_textdomain_codeset} function can be used several times. 
5686If used multiple times with the same @var{domainname} argument, the
5687later call overrides the settings made by the earlier one.
5688
5689The @code{bind_textdomain_codeset} function returns a pointer to a
5690string containing the name of the selected codeset.  The string is
5691allocated internally in the function and must not be changed by the
5692user.  If the system went out of core during the execution of
5693@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
5694global variable @var{errno} is set accordingly.
5695@end deftypefun
5696
5697@node Contexts, Plural forms, Charset conversion, gettext
5698@subsection Using contexts for solving ambiguities
5699@cindex context
5700@cindex GUI programs
5701@cindex translating menu entries
5702@cindex menu entries
5703
5704One place where the @code{gettext} functions, if used normally, have big
5705problems is within programs with graphical user interfaces (GUIs).  The
5706problem is that many of the strings which have to be translated are very
5707short.  They have to appear in pull-down menus which restricts the
5708length.  But strings which are not containing entire sentences or at
5709least large fragments of a sentence may appear in more than one
5710situation in the program but might have different translations.  This is
5711especially true for the one-word strings which are frequently used in
5712GUI programs.
5713
5714As a consequence many people say that the @code{gettext} approach is
5715wrong and instead @code{catgets} should be used which indeed does not
5716have this problem.  But there is a very simple and powerful method to
5717handle this kind of problems with the @code{gettext} functions.
5718
5719Contexts can be added to strings to be translated.  A context dependent
5720translation lookup is when a translation for a given string is searched,
5721that is limited to a given context.  The translation for the same string
5722in a different context can be different.  The different translations of
5723the same string in different contexts can be stored in the in the same
5724MO file, and can be edited by the translator in the same PO file.
5725
5726The @file{gettext.h} include file contains the lookup macros for strings
5727with contexts.  They are implemented as thin macros and inline functions
5728over the functions from @code{<libintl.h>}.
5729
5730@findex pgettext
5731@example
5732const char *pgettext (const char *msgctxt, const char *msgid);
5733@end example
5734
5735In a call of this macro, @var{msgctxt} and @var{msgid} must be string
5736literals.  The macro returns the translation of @var{msgid}, restricted
5737to the context given by @var{msgctxt}.
5738
5739The @var{msgctxt} string is visible in the PO file to the translator.
5740You should try to make it somehow canonical and never changing.  Because
5741every time you change an @var{msgctxt}, the translator will have to review
5742the translation of @var{msgid}.
5743
5744Finding a canonical @var{msgctxt} string that doesn't change over time can
5745be hard.  But you shouldn't use the file name or class name containing the
5746@code{pgettext} call -- because it is a common development task to rename
5747a file or a class, and it shouldn't cause translator work.  Also you shouldn't
5748use a comment in the form of a complete English sentence as @var{msgctxt} --
5749because orthography or grammar changes are often applied to such sentences,
5750and again, it shouldn't force the translator to do a review.
5751
5752The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext}
5753fetches a particular translation of the @var{msgid}.
5754
5755@findex dpgettext
5756@findex dcpgettext
5757@example
5758const char *dpgettext (const char *domain_name,
5759                       const char *msgctxt, const char *msgid);
5760const char *dcpgettext (const char *domain_name,
5761                        const char *msgctxt, const char *msgid,
5762                        int category);
5763@end example
5764
5765These are generalizations of @code{pgettext}.  They behave similarly to
5766@code{dgettext} and @code{dcgettext}, respectively.  The @var{domain_name}
5767argument defines the translation domain.  The @var{category} argument
5768allows to use another locale category than @code{LC_MESSAGES}.
5769
5770As as example consider the following fictional situation.  A GUI program
5771has a menu bar with the following entries:
5772
5773@smallexample
5774+------------+------------+--------------------------------------+
5775| File       | Printer    |                                      |
5776+------------+------------+--------------------------------------+
5777| Open     | | Select   |
5778| New      | | Open     |
5779+----------+ | Connect  |
5780             +----------+
5781@end smallexample
5782
5783To have the strings @code{File}, @code{Printer}, @code{Open},
5784@code{New}, @code{Select}, and @code{Connect} translated there has to be
5785at some point in the code a call to a function of the @code{gettext}
5786family.  But in two places the string passed into the function would be
5787@code{Open}.  The translations might not be the same and therefore we
5788are in the dilemma described above.
5789
5790What distinguishes the two places is the menu path from the menu root to
5791the particular menu entries:
5792
5793@smallexample
5794Menu|File
5795Menu|Printer
5796Menu|File|Open
5797Menu|File|New
5798Menu|Printer|Select
5799Menu|Printer|Open
5800Menu|Printer|Connect
5801@end smallexample
5802
5803The context is thus the menu path without its last part.  So, the calls
5804look like this:
5805
5806@smallexample
5807pgettext ("Menu|", "File")
5808pgettext ("Menu|", "Printer")
5809pgettext ("Menu|File|", "Open")
5810pgettext ("Menu|File|", "New")
5811pgettext ("Menu|Printer|", "Select")
5812pgettext ("Menu|Printer|", "Open")
5813pgettext ("Menu|Printer|", "Connect")
5814@end smallexample
5815
5816Whether or not to use the @samp{|} character at the end of the context is a
5817matter of style.
5818
5819For more complex cases, where the @var{msgctxt} or @var{msgid} are not
5820string literals, more general macros are available:
5821
5822@findex pgettext_expr
5823@findex dpgettext_expr
5824@findex dcpgettext_expr
5825@example
5826const char *pgettext_expr (const char *msgctxt, const char *msgid);
5827const char *dpgettext_expr (const char *domain_name,
5828                            const char *msgctxt, const char *msgid);
5829const char *dcpgettext_expr (const char *domain_name,
5830                             const char *msgctxt, const char *msgid,
5831                             int category);
5832@end example
5833
5834Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions.
5835These macros are more general.  But in the case that both argument expressions
5836are string literals, the macros without the @samp{_expr} suffix are more
5837efficient.
5838
5839@node Plural forms, Optimized gettext, Contexts, gettext
5840@subsection Additional functions for plural forms
5841@cindex plural forms
5842
5843The functions of the @code{gettext} family described so far (and all the
5844@code{catgets} functions as well) have one problem in the real world
5845which have been neglected completely in all existing approaches.  What
5846is meant here is the handling of plural forms.
5847
5848Looking through Unix source code before the time anybody thought about
5849internationalization (and, sadly, even afterwards) one can often find
5850code similar to the following:
5851
5852@smallexample
5853   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
5854@end smallexample
5855
5856@noindent
5857After the first complaints from people internationalizing the code people
5858either completely avoided formulations like this or used strings like
5859@code{"file(s)"}.  Both look unnatural and should be avoided.  First
5860tries to solve the problem correctly looked like this:
5861
5862@smallexample
5863   if (n == 1)
5864     printf ("%d file deleted", n);
5865   else
5866     printf ("%d files deleted", n);
5867@end smallexample
5868
5869But this does not solve the problem.  It helps languages where the
5870plural form of a noun is not simply constructed by adding an
5871@ifhtml
5872���s���
5873@end ifhtml
5874@ifnothtml
5875`s'
5876@end ifnothtml
5877but that is all.  Once again people fell into the trap of believing the
5878rules their language is using are universal.  But the handling of plural
5879forms differs widely between the language families.  For example,
5880Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
5881
5882@quotation
5883In Polish we use e.g.@: plik (file) this way:
5884@example
58851 plik
58862,3,4 pliki
58875-21 pliko'w
588822-24 pliki
588925-31 pliko'w
5890@end example
5891and so on (o' means 8859-2 oacute which should be rather okreska,
5892similar to aogonek).
5893@end quotation
5894
5895There are two things which can differ between languages (and even inside
5896language families);
5897
5898@itemize @bullet
5899@item
5900The form how plural forms are built differs.  This is a problem with
5901languages which have many irregularities.  German, for instance, is a
5902drastic case.  Though English and German are part of the same language
5903family (Germanic), the almost regular forming of plural noun forms
5904(appending an
5905@ifhtml
5906���s���)
5907@end ifhtml
5908@ifnothtml
5909`s')
5910@end ifnothtml
5911is hardly found in German.
5912
5913@item
5914The number of plural forms differ.  This is somewhat surprising for
5915those who only have experiences with Romanic and Germanic languages
5916since here the number is the same (there are two).
5917
5918But other language families have only one form or many forms.  More
5919information on this in an extra section.
5920@end itemize
5921
5922The consequence of this is that application writers should not try to
5923solve the problem in their code.  This would be localization since it is
5924only usable for certain, hardcoded language environments.  Instead the
5925extended @code{gettext} interface should be used.
5926
5927These extra functions are taking instead of the one key string two
5928strings and a numerical argument.  The idea behind this is that using
5929the numerical argument and the first string as a key, the implementation
5930can select using rules specified by the translator the right plural
5931form.  The two string arguments then will be used to provide a return
5932value in case no message catalog is found (similar to the normal
5933@code{gettext} behavior).  In this case the rules for Germanic language
5934is used and it is assumed that the first string argument is the singular
5935form, the second the plural form.
5936
5937This has the consequence that programs without language catalogs can
5938display the correct strings only if the program itself is written using
5939a Germanic language.  This is a limitation but since the GNU C library
5940(as well as the GNU @code{gettext} package) are written as part of the
5941GNU package and the coding standards for the GNU project require program
5942being written in English, this solution nevertheless fulfills its
5943purpose.
5944
5945@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
5946The @code{ngettext} function is similar to the @code{gettext} function
5947as it finds the message catalogs in the same way.  But it takes two
5948extra arguments.  The @var{msgid1} parameter must contain the singular
5949form of the string to be converted.  It is also used as the key for the
5950search in the catalog.  The @var{msgid2} parameter is the plural form.
5951The parameter @var{n} is used to determine the plural form.  If no
5952message catalog is found @var{msgid1} is returned if @code{n == 1},
5953otherwise @code{msgid2}.
5954
5955An example for the use of this function is:
5956
5957@smallexample
5958printf (ngettext ("%d file removed", "%d files removed", n), n);
5959@end smallexample
5960
5961Please note that the numeric value @var{n} has to be passed to the
5962@code{printf} function as well.  It is not sufficient to pass it only to
5963@code{ngettext}.
5964
5965In the English singular case, the number -- always 1 -- can be replaced with
5966"one":
5967
5968@smallexample
5969printf (ngettext ("One file removed", "%d files removed", n), n);
5970@end smallexample
5971
5972@noindent
5973This works because the @samp{printf} function discards excess arguments that
5974are not consumed by the format string.
5975
5976It is also possible to use this function when the strings don't contain a
5977cardinal number:
5978
5979@smallexample
5980puts (ngettext ("Delete the selected file?",
5981                "Delete the selected files?",
5982                n));
5983@end smallexample
5984
5985In this case the number @var{n} is only used to choose the plural form.
5986@end deftypefun
5987
5988@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
5989The @code{dngettext} is similar to the @code{dgettext} function in the
5990way the message catalog is selected.  The difference is that it takes
5991two extra parameter to provide the correct plural form.  These two
5992parameters are handled in the same way @code{ngettext} handles them.
5993@end deftypefun
5994
5995@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
5996The @code{dcngettext} is similar to the @code{dcgettext} function in the
5997way the message catalog is selected.  The difference is that it takes
5998two extra parameter to provide the correct plural form.  These two
5999parameters are handled in the same way @code{ngettext} handles them.
6000@end deftypefun
6001
6002Now, how do these functions solve the problem of the plural forms?
6003Without the input of linguists (which was not available) it was not
6004possible to determine whether there are only a few different forms in
6005which plural forms are formed or whether the number can increase with
6006every new supported language.
6007
6008Therefore the solution implemented is to allow the translator to specify
6009the rules of how to select the plural form.  Since the formula varies
6010with every language this is the only viable solution except for
6011hardcoding the information in the code (which still would require the
6012possibility of extensions to not prevent the use of new languages).
6013
6014@cindex specifying plural form in a PO file
6015@kwindex nplurals@r{, in a PO file header}
6016@kwindex plural@r{, in a PO file header}
6017The information about the plural form selection has to be stored in the
6018header entry of the PO file (the one with the empty @code{msgid} string).
6019The plural form information looks like this:
6020
6021@smallexample
6022Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
6023@end smallexample
6024
6025The @code{nplurals} value must be a decimal number which specifies how
6026many different plural forms exist for this language.  The string
6027following @code{plural} is an expression which is using the C language
6028syntax.  Exceptions are that no negative numbers are allowed, numbers
6029must be decimal, and the only variable allowed is @code{n}.  Spaces are
6030allowed in the expression, but backslash-newlines are not; in the
6031examples below the backslash-newlines are present for formatting purposes
6032only.  This expression will be evaluated whenever one of the functions
6033@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
6034numeric value passed to these functions is then substituted for all uses
6035of the variable @code{n} in the expression.  The resulting value then
6036must be greater or equal to zero and smaller than the value given as the
6037value of @code{nplurals}.
6038
6039@noindent
6040@cindex plural form formulas
6041The following rules are known at this point.  The language with families
6042are listed.  But this does not necessarily mean the information can be
6043generalized for the whole family (as can be easily seen in the table
6044below).@footnote{Additions are welcome.  Send appropriate information to
6045@email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.}
6046
6047@table @asis
6048@item Only one form:
6049Some languages only require one single form.  There is no distinction
6050between the singular and plural form.  An appropriate header entry
6051would look like this:
6052
6053@smallexample
6054Plural-Forms: nplurals=1; plural=0;
6055@end smallexample
6056
6057@noindent
6058Languages with this property include:
6059
6060@table @asis
6061@item Asian family
6062Japanese, Korean, Vietnamese
6063@item Turkic/Altaic family
6064Turkish
6065@end table
6066
6067@item Two forms, singular used for one only
6068This is the form used in most existing programs since it is what English
6069is using.  A header entry would look like this:
6070
6071@smallexample
6072Plural-Forms: nplurals=2; plural=n != 1;
6073@end smallexample
6074
6075(Note: this uses the feature of C expressions that boolean expressions
6076have to value zero or one.)
6077
6078@noindent
6079Languages with this property include:
6080
6081@table @asis
6082@item Germanic family
6083Danish, Dutch, English, Faroese, German, Norwegian, Swedish
6084@item Finno-Ugric family
6085Estonian, Finnish
6086@item Latin/Greek family
6087Greek
6088@item Semitic family
6089Hebrew
6090@item Romanic family
6091Italian, Portuguese, Spanish
6092@item Artificial
6093Esperanto
6094@end table
6095
6096@noindent
6097Another language using the same header entry is:
6098
6099@table @asis
6100@item Finno-Ugric family
6101Hungarian
6102@end table
6103
6104Hungarian does not appear to have a plural if you look at sentences involving
6105cardinal numbers.  For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is
6106``123 alma''.  But when the number is not explicit, the distinction between
6107singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is
6108``az alm@'{a}k''.  Since @code{ngettext} has to support both types of sentences,
6109it is classified here, under ``two forms''.
6110
6111@item Two forms, singular used for zero and one
6112Exceptional case in the language family.  The header entry would be:
6113
6114@smallexample
6115Plural-Forms: nplurals=2; plural=n>1;
6116@end smallexample
6117
6118@noindent
6119Languages with this property include:
6120
6121@table @asis
6122@item Romanic family
6123French, Brazilian Portuguese
6124@end table
6125
6126@item Three forms, special case for zero
6127The header entry would be:
6128
6129@smallexample
6130Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
6131@end smallexample
6132
6133@noindent
6134Languages with this property include:
6135
6136@table @asis
6137@item Baltic family
6138Latvian
6139@end table
6140
6141@item Three forms, special cases for one and two
6142The header entry would be:
6143
6144@smallexample
6145Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
6146@end smallexample
6147
6148@noindent
6149Languages with this property include:
6150
6151@table @asis
6152@item Celtic
6153Gaeilge (Irish)
6154@end table
6155
6156@item Three forms, special case for numbers ending in 00 or [2-9][0-9]
6157The header entry would be:
6158
6159@smallexample
6160Plural-Forms: nplurals=3; \
6161    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
6162@end smallexample
6163
6164@noindent
6165Languages with this property include:
6166
6167@table @asis
6168@item Romanic family
6169Romanian
6170@end table
6171
6172@item Three forms, special case for numbers ending in 1[2-9]
6173The header entry would look like this:
6174
6175@smallexample
6176Plural-Forms: nplurals=3; \
6177    plural=n%10==1 && n%100!=11 ? 0 : \
6178           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
6179@end smallexample
6180
6181@noindent
6182Languages with this property include:
6183
6184@table @asis
6185@item Baltic family
6186Lithuanian
6187@end table
6188
6189@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
6190The header entry would look like this:
6191
6192@smallexample
6193Plural-Forms: nplurals=3; \
6194    plural=n%10==1 && n%100!=11 ? 0 : \
6195           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6196@end smallexample
6197
6198@noindent
6199Languages with this property include:
6200
6201@table @asis
6202@item Slavic family
6203Croatian, Serbian, Russian, Ukrainian
6204@end table
6205
6206@item Three forms, special cases for 1 and 2, 3, 4
6207The header entry would look like this:
6208
6209@smallexample
6210Plural-Forms: nplurals=3; \
6211    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
6212@end smallexample
6213
6214@noindent
6215Languages with this property include:
6216
6217@table @asis
6218@item Slavic family
6219Slovak, Czech
6220@end table
6221
6222@item Three forms, special case for one and some numbers ending in 2, 3, or 4
6223The header entry would look like this:
6224
6225@smallexample
6226Plural-Forms: nplurals=3; \
6227    plural=n==1 ? 0 : \
6228           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
6229@end smallexample
6230
6231@noindent
6232Languages with this property include:
6233
6234@table @asis
6235@item Slavic family
6236Polish
6237@end table
6238
6239@item Four forms, special case for one and all numbers ending in 02, 03, or 04
6240The header entry would look like this:
6241
6242@smallexample
6243Plural-Forms: nplurals=4; \
6244    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
6245@end smallexample
6246
6247@noindent
6248Languages with this property include:
6249
6250@table @asis
6251@item Slavic family
6252Slovenian
6253@end table
6254@end table
6255
6256You might now ask, @code{ngettext} handles only numbers @var{n} of type
6257@samp{unsigned long}.  What about larger integer types?  What about negative
6258numbers?  What about floating-point numbers?
6259
6260About larger integer types, such as @samp{uintmax_t} or 
6261@samp{unsigned long long}: they can be handled by reducing the value to a
6262range that fits in an @samp{unsigned long}.  Simply casting the value to
6263@samp{unsigned long} would not do the right thing, since it would treat
6264@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
6265the like.  Here you can exploit the fact that all mentioned plural form
6266formulas eventually become periodic, with a period that is a divisor of 100
6267(or 1000 or 1000000).  So, when you reduce a large value to another one in
6268the range [1000000, 1999999] that ends in the same 6 decimal digits, you
6269can assume that it will lead to the same plural form selection.  This code
6270does this:
6271
6272@smallexample
6273#include <inttypes.h>
6274uintmax_t nbytes = ...;
6275printf (ngettext ("The file has %"PRIuMAX" byte.",
6276                  "The file has %"PRIuMAX" bytes.",
6277                  (nbytes > ULONG_MAX
6278                   ? (nbytes % 1000000) + 1000000
6279                   : nbytes)),
6280        nbytes);
6281@end smallexample
6282
6283Negative and floating-point values usually represent physical entities for
6284which singular and plural don't clearly apply.  In such cases, there is no
6285need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
6286for all values will do.  For example:
6287
6288@smallexample
6289printf (gettext ("Time elapsed: %.3f seconds"),
6290        num_milliseconds * 0.001);
6291@end smallexample
6292
6293@noindent
6294Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
6295@smallexample
6296Time elapsed: 1.000 seconds
6297@end smallexample
6298@noindent
6299is acceptable in English, and similarly for other languages.
6300
6301@node Optimized gettext,  , Plural forms, gettext
6302@subsection Optimization of the *gettext functions
6303@cindex optimization of @code{gettext} functions
6304
6305At this point of the discussion we should talk about an advantage of the
6306GNU @code{gettext} implementation.  Some readers might have pointed out
6307that an internationalized program might have a poor performance if some
6308string has to be translated in an inner loop.  While this is unavoidable
6309when the string varies from one run of the loop to the other it is
6310simply a waste of time when the string is always the same.  Take the
6311following example:
6312
6313@example
6314@group
6315@{
6316  while (@dots{})
6317    @{
6318      puts (gettext ("Hello world"));
6319    @}
6320@}
6321@end group
6322@end example
6323
6324@noindent
6325When the locale selection does not change between two runs the resulting
6326string is always the same.  One way to use this is:
6327
6328@example
6329@group
6330@{
6331  str = gettext ("Hello world");
6332  while (@dots{})
6333    @{
6334      puts (str);
6335    @}
6336@}
6337@end group
6338@end example
6339
6340@noindent
6341But this solution is not usable in all situation (e.g.@: when the locale
6342selection changes) nor does it lead to legible code.
6343
6344For this reason, GNU @code{gettext} caches previous translation results.
6345When the same translation is requested twice, with no new message
6346catalogs being loaded in between, @code{gettext} will, the second time,
6347find the result through a single cache lookup.
6348
6349@node Comparison, Using libintl.a, gettext, Programmers
6350@section Comparing the Two Interfaces
6351@cindex @code{gettext} vs @code{catgets}
6352@cindex comparison of interfaces
6353
6354@c FIXME: arguments to catgets vs. gettext
6355@c Partly done 950718 -- drepper
6356
6357The following discussion is perhaps a little bit colored.  As said
6358above we implemented GNU @code{gettext} following the Uniforum
6359proposal and this surely has its reasons.  But it should show how we
6360came to this decision.
6361
6362First we take a look at the developing process.  When we write an
6363application using NLS provided by @code{gettext} we proceed as always.
6364Only when we come to a string which might be seen by the users and thus
6365has to be translated we use @code{gettext("@dots{}")} instead of
6366@code{"@dots{}"}.  At the beginning of each source file (or in a central
6367header file) we define
6368
6369@example
6370#define gettext(String) (String)
6371@end example
6372
6373Even this definition can be avoided when the system supports the
6374@code{gettext} function in its C library.  When we compile this code the
6375result is the same as if no NLS code is used.  When  you take a look at
6376the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
6377instead of @code{gettext("@dots{}")}.  This reduces the number of
6378additional characters per translatable string to @emph{3} (in words:
6379three).
6380
6381When now a production version of the program is needed we simply replace
6382the definition
6383
6384@example
6385#define _(String) (String)
6386@end example
6387
6388@noindent
6389by
6390
6391@cindex include file @file{libintl.h}
6392@example
6393#include <libintl.h>
6394#define _(String) gettext (String)
6395@end example
6396
6397@noindent
6398Additionally we run the program @file{xgettext} on all source code file
6399which contain translatable strings and that's it: we have a running
6400program which does not depend on translations to be available, but which
6401can use any that becomes available.
6402
6403@cindex @code{N_}, a convenience macro
6404The same procedure can be done for the @code{gettext_noop} invocations
6405(@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
6406no-op macro.  So you should consider the following code for your project:
6407
6408@example
6409#define gettext_noop(String) String
6410#define N_(String) gettext_noop (String)
6411@end example
6412
6413@code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
6414the @file{po/} directory of GNU @code{gettext} knows by default both of the
6415mentioned short forms so you are invited to follow this proposal for
6416your own ease.
6417
6418Now to @code{catgets}.  The main problem is the work for the
6419programmer.  Every time he comes to a translatable string he has to
6420define a number (or a symbolic constant) which has also be defined in
6421the message catalog file.  He also has to take care for duplicate
6422entries, duplicate message IDs etc.  If he wants to have the same
6423quality in the message catalog as the GNU @code{gettext} program
6424provides he also has to put the descriptive comments for the strings and
6425the location in all source code files in the message catalog.  This is
6426nearly a Mission: Impossible.
6427
6428But there are also some points people might call advantages speaking for
6429@code{catgets}.  If you have a single word in a string and this string
6430is used in different contexts it is likely that in one or the other
6431language the word has different translations.  Example:
6432
6433@example
6434printf ("%s: %d", gettext ("number"), number_of_errors)
6435
6436printf ("you should see %d %s", number_count,
6437        number_count == 1 ? gettext ("number") : gettext ("numbers"))
6438@end example
6439
6440Here we have to translate two times the string @code{"number"}.  Even
6441if you do not speak a language beside English it might be possible to
6442recognize that the two words have a different meaning.  In German the
6443first appearance has to be translated to @code{"Anzahl"} and the second
6444to @code{"Zahl"}.
6445
6446Now you can say that this example is really esoteric.  And you are
6447right!  This is exactly how we felt about this problem and decide that
6448it does not weight that much.  The solution for the above problem could
6449be very easy:
6450
6451@example
6452printf ("%s %d", gettext ("number:"), number_of_errors)
6453
6454printf (number_count == 1 ? gettext ("you should see %d number")
6455                          : gettext ("you should see %d numbers"),
6456        number_count)
6457@end example
6458
6459We believe that we can solve all conflicts with this method.  If it is
6460difficult one can also consider changing one of the conflicting string a
6461little bit.  But it is not impossible to overcome.
6462
6463@code{catgets} allows same original entry to have different translations,
6464but @code{gettext} has another, scalable approach for solving ambiguities
6465of this kind: @xref{Ambiguities}.
6466
6467@node Using libintl.a, gettext grok, Comparison, Programmers
6468@section Using libintl.a in own programs
6469
6470Starting with version 0.9.4 the library @code{libintl.h} should be
6471self-contained.  I.e., you can use it in your own programs without
6472providing additional functions.  The @file{Makefile} will put the header
6473and the library in directories selected using the @code{$(prefix)}.
6474
6475@node gettext grok, Temp Programmers, Using libintl.a, Programmers
6476@section Being a @code{gettext} grok
6477
6478@strong{ NOTE: } This documentation section is outdated and needs to be
6479revised.
6480
6481To fully exploit the functionality of the GNU @code{gettext} library it
6482is surely helpful to read the source code.  But for those who don't want
6483to spend that much time in reading the (sometimes complicated) code here
6484is a list comments:
6485
6486@itemize @bullet
6487@item Changing the language at runtime
6488@cindex language selection at runtime
6489
6490For interactive programs it might be useful to offer a selection of the
6491used language at runtime.  To understand how to do this one need to know
6492how the used language is determined while executing the @code{gettext}
6493function.  The method which is presented here only works correctly
6494with the GNU implementation of the @code{gettext} functions.
6495
6496In the function @code{dcgettext} at every call the current setting of
6497the highest priority environment variable is determined and used.
6498Highest priority means here the following list with decreasing
6499priority:
6500
6501@enumerate
6502@vindex LANGUAGE@r{, environment variable}
6503@item @code{LANGUAGE}
6504@vindex LC_ALL@r{, environment variable}
6505@item @code{LC_ALL}
6506@vindex LC_CTYPE@r{, environment variable}
6507@vindex LC_NUMERIC@r{, environment variable}
6508@vindex LC_TIME@r{, environment variable}
6509@vindex LC_COLLATE@r{, environment variable}
6510@vindex LC_MONETARY@r{, environment variable}
6511@vindex LC_MESSAGES@r{, environment variable}
6512@item @code{LC_xxx}, according to selected locale category
6513@vindex LANG@r{, environment variable}
6514@item @code{LANG}
6515@end enumerate
6516
6517Afterwards the path is constructed using the found value and the
6518translation file is loaded if available.
6519
6520What happens now when the value for, say, @code{LANGUAGE} changes?  According
6521to the process explained above the new value of this variable is found
6522as soon as the @code{dcgettext} function is called.  But this also means
6523the (perhaps) different message catalog file is loaded.  In other
6524words: the used language is changed.
6525
6526But there is one little hook.  The code for gcc-2.7.0 and up provides
6527some optimization.  This optimization normally prevents the calling of
6528the @code{dcgettext} function as long as no new catalog is loaded.  But
6529if @code{dcgettext} is not called the program also cannot find the
6530@code{LANGUAGE} variable be changed (@pxref{Optimized gettext}).  A
6531solution for this is very easy.  Include the following code in the
6532language switching function.
6533
6534@example
6535  /* Change language.  */
6536  setenv ("LANGUAGE", "fr", 1);
6537
6538  /* Make change known.  */
6539  @{
6540    extern int  _nl_msg_cat_cntr;
6541    ++_nl_msg_cat_cntr;
6542  @}
6543@end example
6544
6545@cindex @code{_nl_msg_cat_cntr}
6546The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
6547You don't need to know what this is for.  But it can be used to detect
6548whether a @code{gettext} implementation is GNU gettext and not non-GNU
6549system's native gettext implementation.
6550
6551@end itemize
6552
6553@node Temp Programmers,  , gettext grok, Programmers
6554@section Temporary Notes for the Programmers Chapter
6555
6556@strong{ NOTE: } This documentation section is outdated and needs to be
6557revised.
6558
6559@menu
6560* Temp Implementations::        Temporary - Two Possible Implementations
6561* Temp catgets::                Temporary - About @code{catgets}
6562* Temp WSI::                    Temporary - Why a single implementation
6563* Temp Notes::                  Temporary - Notes
6564@end menu
6565
6566@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
6567@subsection Temporary - Two Possible Implementations
6568
6569There are two competing methods for language independent messages:
6570the X/Open @code{catgets} method, and the Uniforum @code{gettext}
6571method.  The @code{catgets} method indexes messages by integers; the
6572@code{gettext} method indexes them by their English translations.
6573The @code{catgets} method has been around longer and is supported
6574by more vendors.  The @code{gettext} method is supported by Sun,
6575and it has been heard that the COSE multi-vendor initiative is
6576supporting it.  Neither method is a POSIX standard; the POSIX.1
6577committee had a lot of disagreement in this area.
6578
6579Neither one is in the POSIX standard.  There was much disagreement
6580in the POSIX.1 committee about using the @code{gettext} routines
6581vs. @code{catgets} (XPG).  In the end the committee couldn't
6582agree on anything, so no messaging system was included as part
6583of the standard.  I believe the informative annex of the standard
6584includes the XPG3 messaging interfaces, ``@dots{}as an example of
6585a messaging system that has been implemented@dots{}''
6586
6587They were very careful not to say anywhere that you should use one
6588set of interfaces over the other.  For more on this topic please
6589see the Programming for Internationalization FAQ.
6590
6591@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
6592@subsection Temporary - About @code{catgets}
6593
6594There have been a few discussions of late on the use of
6595@code{catgets} as a base.  I think it important to present both
6596sides of the argument and hence am opting to play devil's advocate
6597for a little bit.
6598
6599I'll not deny the fact that @code{catgets} could have been designed
6600a lot better.  It currently has quite a number of limitations and
6601these have already been pointed out.
6602
6603However there is a great deal to be said for consistency and
6604standardization.  A common recurring problem when writing Unix
6605software is the myriad portability problems across Unix platforms.
6606It seems as if every Unix vendor had a look at the operating system
6607and found parts they could improve upon.  Undoubtedly, these
6608modifications are probably innovative and solve real problems.
6609However, software developers have a hard time keeping up with all
6610these changes across so many platforms.
6611
6612And this has prompted the Unix vendors to begin to standardize their
6613systems.  Hence the impetus for Spec1170.  Every major Unix vendor
6614has committed to supporting this standard and every Unix software
6615developer waits with glee the day they can write software to this
6616standard and simply recompile (without having to use autoconf)
6617across different platforms.
6618
6619As I understand it, Spec1170 is roughly based upon version 4 of the
6620X/Open Portability Guidelines (XPG4).  Because @code{catgets} and
6621friends are defined in XPG4, I'm led to believe that @code{catgets}
6622is a part of Spec1170 and hence will become a standardized component
6623of all Unix systems.
6624
6625@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
6626@subsection Temporary - Why a single implementation
6627
6628Now it seems kind of wasteful to me to have two different systems
6629installed for accessing message catalogs.  If we do want to remedy
6630@code{catgets} deficiencies why don't we try to expand @code{catgets}
6631(in a compatible manner) rather than implement an entirely new system.
6632Otherwise, we'll end up with two message catalog access systems installed
6633with an operating system - one set of routines for packages using GNU
6634@code{gettext} for their internationalization, and another set of routines
6635(catgets) for all other software.  Bloated?
6636
6637Supposing another catalog access system is implemented.  Which do
6638we recommend?  At least for Linux, we need to attract as many
6639software developers as possible.  Hence we need to make it as easy
6640for them to port their software as possible.  Which means supporting
6641@code{catgets}.  We will be implementing the @code{libintl} code
6642within our @code{libc}, but does this mean we also have to incorporate
6643another message catalog access scheme within our @code{libc} as well?
6644And what about people who are going to be using the @code{libintl}
6645+ non-@code{catgets} routines.  When they port their software to
6646other platforms, they're now going to have to include the front-end
6647(@code{libintl}) code plus the back-end code (the non-@code{catgets}
6648access routines) with their software instead of just including the
6649@code{libintl} code with their software.
6650
6651Message catalog support is however only the tip of the iceberg.
6652What about the data for the other locale categories?  They also have
6653a number of deficiencies.  Are we going to abandon them as well and
6654develop another duplicate set of routines (should @code{libintl}
6655expand beyond message catalog support)?
6656
6657Like many parts of Unix that can be improved upon, we're stuck with balancing
6658compatibility with the past with useful improvements and innovations for
6659the future.
6660
6661@node Temp Notes,  , Temp WSI, Temp Programmers
6662@subsection Temporary - Notes
6663
6664X/Open agreed very late on the standard form so that many
6665implementations differ from the final form.  Both of my system (old
6666Linux catgets and Ultrix-4) have a strange variation.
6667
6668OK.  After incorporating the last changes I have to spend some time on
6669making the GNU/Linux @code{libc} @code{gettext} functions.  So in future
6670Solaris is not the only system having @code{gettext}.
6671
6672@node Translators, Maintainers, Programmers, Top
6673@chapter The Translator's View
6674
6675@c FIXME: Reorganize whole chapter.
6676
6677@menu
6678* Trans Intro 0::               Introduction 0
6679* Trans Intro 1::               Introduction 1
6680* Discussions::                 Discussions
6681* Organization::                Organization
6682* Information Flow::            Information Flow
6683* Prioritizing messages::       How to find which messages to translate first
6684@end menu
6685
6686@node Trans Intro 0, Trans Intro 1, Translators, Translators
6687@section Introduction 0
6688
6689@strong{ NOTE: } This documentation section is outdated and needs to be
6690revised.
6691
6692Free software is going international!  The Translation Project is a way
6693to get maintainers, translators and users all together, so free software
6694will gradually become able to speak many native languages.
6695
6696The GNU @code{gettext} tool set contains @emph{everything} maintainers
6697need for internationalizing their packages for messages.  It also
6698contains quite useful tools for helping translators at localizing
6699messages to their native language, once a package has already been
6700internationalized.
6701
6702To achieve the Translation Project, we need many interested
6703people who like their own language and write it well, and who are also
6704able to synergize with other translators speaking the same language.
6705If you'd like to volunteer to @emph{work} at translating messages,
6706please send mail to your translating team.
6707
6708Each team has its own mailing list, courtesy of Linux
6709International.  You may reach your translating team at the address
6710@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
6711code for your language.  Language codes are @emph{not} the same as
6712country codes given in @w{ISO 3166}.  The following translating teams
6713exist:
6714
6715@quotation
6716Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
6717Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
6718@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
6719Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
6720@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
6721Swedish @code{sv} and Turkish @code{tr}.
6722@end quotation
6723
6724@noindent
6725For example, you may reach the Chinese translating team by writing to
6726@file{zh@@li.org}.  When you become a member of the translating team
6727for your own language, you may subscribe to its list.  For example,
6728Swedish people can send a message to @w{@file{sv-request@@li.org}},
6729having this message body:
6730
6731@example
6732subscribe
6733@end example
6734
6735Keep in mind that team members should be interested in @emph{working}
6736at translations, or at solving translational difficulties, rather than
6737merely lurking around.  If your team does not exist yet and you want to
6738start one, please write to @w{@file{coordinator@@translationproject.org}};
6739you will then reach the coordinator for all translator teams.
6740
6741A handful of GNU packages have already been adapted and provided
6742with message translations for several languages.  Translation
6743teams have begun to organize, using these packages as a starting
6744point.  But there are many more packages and many languages for
6745which we have no volunteer translators.  If you would like to
6746volunteer to work at translating messages, please send mail to
6747@file{coordinator@@translationproject.org} indicating what language(s)
6748you can work on.
6749
6750@node Trans Intro 1, Discussions, Trans Intro 0, Translators
6751@section Introduction 1
6752
6753@strong{ NOTE: } This documentation section is outdated and needs to be
6754revised.
6755
6756This is now official, GNU is going international!  Here is the
6757announcement submitted for the January 1995 GNU Bulletin:
6758
6759@quotation
6760A handful of GNU packages have already been adapted and provided
6761with message translations for several languages.  Translation
6762teams have begun to organize, using these packages as a starting
6763point.  But there are many more packages and many languages
6764for which we have no volunteer translators.  If you'd like to
6765volunteer to work at translating messages, please send mail to
6766@samp{coordinator@@translationproject.org} indicating what language(s)
6767you can work on.
6768@end quotation
6769
6770This document should answer many questions for those who are curious about
6771the process or would like to contribute.  Please at least skim over it,
6772hoping to cut down a little of the high volume of e-mail generated by this
6773collective effort towards internationalization of free software.
6774
6775Most free programming which is widely shared is done in English, and
6776currently, English is used as the main communicating language between
6777national communities collaborating to free software.  This very document
6778is written in English.  This will not change in the foreseeable future.
6779
6780However, there is a strong appetite from national communities for
6781having more software able to write using national language and habits,
6782and there is an on-going effort to modify free software in such a way
6783that it becomes able to do so.  The experiments driven so far raised
6784an enthusiastic response from pretesters, so we believe that
6785internationalization of free software is dedicated to succeed.
6786
6787For suggestion clarifications, additions or corrections to this
6788document, please e-mail to @file{coordinator@@translationproject.org}.
6789
6790@node Discussions, Organization, Trans Intro 1, Translators
6791@section Discussions
6792
6793@strong{ NOTE: } This documentation section is outdated and needs to be
6794revised.
6795
6796Facing this internationalization effort, a few users expressed their
6797concerns.  Some of these doubts are presented and discussed, here.
6798
6799@itemize @bullet
6800@item Smaller groups
6801
6802Some languages are not spoken by a very large number of people, so people
6803speaking them sometimes consider that there may not be all that much
6804demand such versions of free software packages.  Moreover, many people
6805being @emph{into computers}, in some countries, generally seem to prefer
6806English versions of their software.
6807
6808On the other end, people might enjoy their own language a lot, and be
6809very motivated at providing to themselves the pleasure of having their
6810beloved free software speaking their mother tongue.  They do themselves
6811a personal favor, and do not pay that much attention to the number of
6812people benefiting of their work.
6813
6814@item Misinterpretation
6815
6816Other users are shy to push forward their own language, seeing in this
6817some kind of misplaced propaganda.  Someone thought there must be some
6818users of the language over the networks pestering other people with it.
6819
6820But any spoken language is worth localization, because there are
6821people behind the language for whom the language is important and
6822dear to their hearts.
6823
6824@item Odd translations
6825
6826The biggest problem is to find the right translations so that
6827everybody can understand the messages.  Translations are usually a
6828little odd.  Some people get used to English, to the extent they may
6829find translations into their own language ``rather pushy, obnoxious
6830and sometimes even hilarious.''  As a French speaking man, I have
6831the experience of those instruction manuals for goods, so poorly
6832translated in French in Korea or Taiwan@dots{}
6833
6834The fact is that we sometimes have to create a kind of national
6835computer culture, and this is not easy without the collaboration of
6836many people liking their mother tongue.  This is why translations are
6837better achieved by people knowing and loving their own language, and
6838ready to work together at improving the results they obtain.
6839
6840@item Dependencies over the GPL or LGPL
6841
6842Some people wonder if using GNU @code{gettext} necessarily brings their
6843package under the protective wing of the GNU General Public License or
6844the GNU Library General Public License, when they do not want to make
6845their program free, or want other kinds of freedom.  The simplest
6846answer is ``normally not''.
6847
6848The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the
6849contents of @code{libintl}, is covered by the GNU Library General Public
6850License.  The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the
6851rest of the GNU @code{gettext} package, is covered by the GNU General
6852Public License.
6853
6854The mere marking of localizable strings in a package, or conditional
6855inclusion of a few lines for initialization, is not really including
6856GPL'ed or LGPL'ed code.  However, since the localization routines in
6857@code{libintl} are under the LGPL, the LGPL needs to be considered.
6858It gives the right to distribute the complete unmodified source of
6859@code{libintl} even with non-free programs.  It also gives the right
6860to use @code{libintl} as a shared library, even for non-free programs.
6861But it gives the right to use @code{libintl} as a static library or
6862to incorporate @code{libintl} into another library only to free
6863software.
6864
6865@end itemize
6866
6867@node Organization, Information Flow, Discussions, Translators
6868@section Organization
6869
6870@strong{ NOTE: } This documentation section is outdated and needs to be
6871revised.
6872
6873On a larger scale, the true solution would be to organize some kind of
6874fairly precise set up in which volunteers could participate.  I gave
6875some thought to this idea lately, and realize there will be some
6876touchy points.  I thought of writing to Richard Stallman to launch
6877such a project, but feel it might be good to shake out the ideas
6878between ourselves first.  Most probably that Linux International has
6879some experience in the field already, or would like to orchestrate
6880the volunteer work, maybe.  Food for thought, in any case!
6881
6882I guess we have to setup something early, somehow, that will help
6883many possible contributors of the same language to interlock and avoid
6884work duplication, and further be put in contact for solving together
6885problems particular to their tongue (in most languages, there are many
6886difficulties peculiar to translating technical English).  My Swedish
6887contributor acknowledged these difficulties, and I'm well aware of
6888them for French.
6889
6890This is surely not a technical issue, but we should manage so the
6891effort of locale contributors be maximally useful, despite the national
6892team layer interface between contributors and maintainers.
6893
6894The Translation Project needs some setup for coordinating language
6895coordinators.  Localizing evolving programs will surely
6896become a permanent and continuous activity in the free software community,
6897once well started.
6898The setup should be minimally completed and tested before GNU
6899@code{gettext} becomes an official reality.  The e-mail address
6900@file{coordinator@@translationproject.org} has been set up for receiving
6901offers from volunteers and general e-mail on these topics.  This address
6902reaches the Translation Project coordinator.
6903
6904@menu
6905* Central Coordination::        Central Coordination
6906* National Teams::              National Teams
6907* Mailing Lists::               Mailing Lists
6908@end menu
6909
6910@node Central Coordination, National Teams, Organization, Organization
6911@subsection Central Coordination
6912
6913I also think GNU will need sooner than it thinks, that someone set up
6914a way to organize and coordinate these groups.  Some kind of group
6915of groups.  My opinion is that it would be good that GNU delegates
6916this task to a small group of collaborating volunteers, shortly.
6917Perhaps in @file{gnu.announce} a list of this national committee's
6918can be published.
6919
6920My role as coordinator would simply be to refer to Ulrich any German
6921speaking volunteer interested to localization of free software packages, and
6922maybe helping national groups to initially organize, while maintaining
6923national registries for until national groups are ready to take over.
6924In fact, the coordinator should ease volunteers to get in contact with
6925one another for creating national teams, which should then select
6926one coordinator per language, or country (regionalized language).
6927If well done, the coordination should be useful without being an
6928overwhelming task, the time to put delegations in place.
6929
6930@node National Teams, Mailing Lists, Central Coordination, Organization
6931@subsection National Teams
6932
6933I suggest we look for volunteer coordinators/editors for individual
6934languages.  These people will scan contributions of translation files
6935for various programs, for their own languages, and will ensure high
6936and uniform standards of diction.
6937
6938From my current experience with other people in these days, those who
6939provide localizations are very enthusiastic about the process, and are
6940more interested in the localization process than in the program they
6941localize, and want to do many programs, not just one.  This seems
6942to confirm that having a coordinator/editor for each language is a
6943good idea.
6944
6945We need to choose someone who is good at writing clear and concise
6946prose in the language in question.  That is hard---we can't check
6947it ourselves.  So we need to ask a few people to judge each others'
6948writing and select the one who is best.
6949
6950I announce my prerelease to a few dozen people, and you would not
6951believe all the discussions it generated already.  I shudder to think
6952what will happen when this will be launched, for true, officially,
6953world wide.  Who am I to arbitrate between two Czekolsovak users
6954contradicting each other, for example?
6955
6956I assume that your German is not much better than my French so that
6957I would not be able to judge about these formulations.  What I would
6958suggest is that for each language there is a group for people who
6959maintain the PO files and judge about changes.  I suspect there will
6960be cultural differences between how such groups of people will behave.
6961Some will have relaxed ways, reach consensus easily, and have anyone
6962of the group relate to the maintainers, while others will fight to
6963death, organize heavy administrations up to national standards, and
6964use strict channels.
6965
6966The German team is putting out a good example.  Right now, they are
6967maybe half a dozen people revising translations of each other and
6968discussing the linguistic issues.  I do not even have all the names.
6969Ulrich Drepper is taking care of coordinating the German team.
6970He subscribed to all my pretest lists, so I do not even have to warn
6971him specifically of incoming releases.
6972
6973I'm sure, that is a good idea to get teams for each language working
6974on translations.  That will make the translations better and more
6975consistent.
6976
6977@menu
6978* Sub-Cultures::                Sub-Cultures
6979* Organizational Ideas::        Organizational Ideas
6980@end menu
6981
6982@node Sub-Cultures, Organizational Ideas, National Teams, National Teams
6983@subsubsection Sub-Cultures
6984
6985Taking French for example, there are a few sub-cultures around computers
6986which developed diverging vocabularies.  Picking volunteers here and
6987there without addressing this problem in an organized way, soon in the
6988project, might produce a distasteful mix of internationalized programs,
6989and possibly trigger endless quarrels among those who really care.
6990
6991Keeping some kind of unity in the way French localization of
6992internationalized programs is achieved is a difficult (and delicate) job.
6993Knowing the latin character of French people (:-), if we take this
6994the wrong way, we could end up nowhere, or spoil a lot of energies.
6995Maybe we should begin to address this problem seriously @emph{before}
6996GNU @code{gettext} become officially published.  And I suspect that this
6997means soon!
6998
6999@node Organizational Ideas,  , Sub-Cultures, National Teams
7000@subsubsection Organizational Ideas
7001
7002I expect the next big changes after the official release.  Please note
7003that I use the German translation of the short GPL message.  We need
7004to set a few good examples before the localization goes out for true
7005in the free software community.  Here are a few points to discuss:
7006
7007@itemize @bullet
7008@item
7009Each group should have one FTP server (at least one master).
7010
7011@item
7012The files on the server should reflect the latest version (of
7013course!) and it should also contain a RCS directory with the
7014corresponding archives (I don't have this now).
7015
7016@item
7017There should also be a ChangeLog file (this is more useful than the
7018RCS archive but can be generated automatically from the later by
7019Emacs).
7020
7021@item
7022A @dfn{core group} should judge about questionable changes (for now
7023this group consists solely by me but I ask some others occasionally;
7024this also seems to work).
7025
7026@end itemize
7027
7028@node Mailing Lists,  , National Teams, Organization
7029@subsection Mailing Lists
7030
7031If we get any inquiries about GNU @code{gettext}, send them on to:
7032
7033@example
7034@file{coordinator@@translationproject.org}
7035@end example
7036
7037The @file{*-pretest} lists are quite useful to me, maybe the idea could
7038be generalized to many GNU, and non-GNU packages.  But each maintainer
7039his/her way!
7040
7041Fran@,{c}ois, we have a mechanism in place here at
7042@file{gnu.ai.mit.edu} to track teams, support mailing lists for
7043them and log members.  We have a slight preference that you use it.
7044If this is OK with you, I can get you clued in.
7045
7046Things are changing!  A few years ago, when Daniel Fekete and I
7047asked for a mailing list for GNU localization, nested at the FSF, we
7048were politely invited to organize it anywhere else, and so did we.
7049For communicating with my pretesters, I later made a handful of
7050mailing lists located at iro.umontreal.ca and administrated by
7051@code{majordomo}.  These lists have been @emph{very} dependable
7052so far@dots{}
7053
7054I suspect that the German team will organize itself a mailing list
7055located in Germany, and so forth for other countries.  But before they
7056organize for true, it could surely be useful to offer mailing lists
7057located at the FSF to each national team.  So yes, please explain me
7058how I should proceed to create and handle them.
7059
7060We should create temporary mailing lists, one per country, to help
7061people organize.  Temporary, because once regrouped and structured, it
7062would be fair the volunteers from country bring back @emph{their} list
7063in there and manage it as they want.  My feeling is that, in the long
7064run, each team should run its own list, from within their country.
7065There also should be some central list to which all teams could
7066subscribe as they see fit, as long as each team is represented in it.
7067
7068@node Information Flow, Prioritizing messages, Organization, Translators
7069@section Information Flow
7070
7071@strong{ NOTE: } This documentation section is outdated and needs to be
7072revised.
7073
7074There will surely be some discussion about this messages after the
7075packages are finally released.  If people now send you some proposals
7076for better messages, how do you proceed?  Jim, please note that
7077right now, as I put forward nearly a dozen of localizable programs, I
7078receive both the translations and the coordination concerns about them.
7079
7080If I put one of my things to pretest, Ulrich receives the announcement
7081and passes it on to the German team, who make last minute revisions.
7082Then he submits the translation files to me @emph{as the maintainer}.
7083For free packages I do not maintain, I would not even hear about it.
7084This scheme could be made to work for the whole Translation Project,
7085I think.  For security reasons, maybe Ulrich (national coordinators,
7086in fact) should update central registry kept at the Translation Project
7087(Jim, me, or Len's recruits) once in a while.
7088
7089In December/January, I was aggressively ready to internationalize
7090all of GNU, giving myself the duty of one small GNU package per week
7091or so, taking many weeks or months for bigger packages.  But it does
7092not work this way.  I first did all the things I'm responsible for.
7093I've nothing against some missionary work on other maintainers, but
7094I'm also loosing a lot of energy over it---same debates over again.
7095
7096And when the first localized packages are released we'll get a lot of
7097responses about ugly translations :-).  Surely, and we need to have
7098beforehand a fairly good idea about how to handle the information
7099flow between the national teams and the package maintainers.
7100
7101Please start saving somewhere a quick history of each PO file.  I know
7102for sure that the file format will change, allowing for comments.
7103It would be nice that each file has a kind of log, and references for
7104those who want to submit comments or gripes, or otherwise contribute.
7105I sent a proposal for a fast and flexible format, but it is not
7106receiving acceptance yet by the GNU deciders.  I'll tell you when I
7107have more information about this.
7108
7109@node Prioritizing messages,  , Information Flow, Translators
7110@section Prioritizing messages: How to determine which messages to translate first
7111
7112A translator sometimes has only a limited amount of time per week to
7113spend on a package, and some packages have quite large message catalogs
7114(over 1000 messages).  Therefore she wishes to translate the messages
7115first that are the most visible to the user, or that occur most frequently.
7116This section describes how to determine these "most urgent" messages.
7117It also applies to determine the "next most urgent" messages after the
7118message catalog has already been partially translated.
7119
7120In a first step, she uses the programs like a user would do.  While she
7121does this, the GNU @code{gettext} library logs into a file the not yet
7122translated messages for which a translation was requested from the program.
7123
7124In a second step, she uses the PO mode to translate precisely this set
7125of messages.
7126
7127@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
7128Here a more details.  The GNU @code{libintl} library (but not the
7129corresponding functions in GNU @code{libc}) supports an environment variable
7130@code{GETTEXT_LOG_UNTRANSLATED}.  The GNU @code{libintl} library will
7131log into this file the messages for which @code{gettext()} and related
7132functions couldn't find the translation.  If the file doesn't exist, it
7133will be created as needed.  On systems with GNU @code{libc} a shared library
7134@samp{preloadable_libintl.so} is provided that can be used with the ELF
7135@samp{LD_PRELOAD} mechanism.
7136
7137So, in the first step, the translator uses these commands on systems with
7138GNU @code{libc}:
7139
7140@smallexample
7141$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
7142$ export LD_PRELOAD
7143$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7144$ export GETTEXT_LOG_UNTRANSLATED
7145@end smallexample
7146
7147@noindent
7148and these commands on other systems:
7149
7150@smallexample
7151$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
7152$ export GETTEXT_LOG_UNTRANSLATED
7153@end smallexample
7154
7155Then she uses and peruses the programs.  (It is a good and recommended
7156practice to use the programs for which you provide translations: it
7157gives you the needed context.)  When done, she removes the environment
7158variables:
7159
7160@smallexample
7161$ unset LD_PRELOAD
7162$ unset GETTEXT_LOG_UNTRANSLATED
7163@end smallexample
7164
7165The second step starts with removing duplicates:
7166
7167@smallexample
7168$ msguniq $HOME/gettextlogused > missing.po
7169@end smallexample
7170
7171The result is a PO file, but needs some preprocessing before a PO file editor
7172can be used with it.  First, it is a multi-domain PO file, containing
7173messages from many translation domains.  Second, it lacks all translator
7174comments and source references.  Here is how to get a list of the affected
7175translation domains:
7176
7177@smallexample
7178$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
7179@end smallexample
7180
7181Then the translator can handle the domains one by one.  For simplicity,
7182let's use environment variables to denote the language, domain and source
7183package.
7184
7185@smallexample
7186$ lang=nl             # your language
7187$ domain=coreutils    # the name of the domain to be handled
7188$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
7189@end smallexample
7190
7191She takes the latest copy of @file{$lang.po} from the Translation Project,
7192or from the package (in most cases, @file{$package/po/$lang.po}), or
7193creates a fresh one if she's the first translator (see @ref{Creating}).
7194She then uses the following commands to mark the not urgent messages as
7195"obsolete".  (This doesn't mean that these messages - translated and
7196untranslated ones - will go away.  It simply means that the PO file editor
7197will ignore them in the following editing session.)
7198
7199@smallexample
7200$ msggrep --domain=$domain missing.po | grep -v '^domain' \
7201  > $domain-missing.po
7202$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
7203  > $domain.$lang-urgent.po
7204@end smallexample
7205
7206The she translates @file{$domain.$lang-urgent.po} by use of a PO file editor
7207(@pxref{Editing}).
7208(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
7209preserve obsolete messages, as they should.)
7210Finally she restores the not urgent messages (with their earlier
7211translations, for those which were already translated) through this command:
7212
7213@smallexample
7214$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
7215  > $domain.$lang.po
7216@end smallexample
7217
7218Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
7219
7220@node Maintainers, Installers, Translators, Top
7221@chapter The Maintainer's View
7222@cindex package maintainer's view of @code{gettext}
7223
7224The maintainer of a package has many responsibilities.  One of them
7225is ensuring that the package will install easily on many platforms,
7226and that the magic we described earlier (@pxref{Users}) will work
7227for installers and end users.
7228
7229Of course, there are many possible ways by which GNU @code{gettext}
7230might be integrated in a distribution, and this chapter does not cover
7231them in all generality.  Instead, it details one possible approach which
7232is especially adequate for many free software distributions following GNU
7233standards, or even better, Gnits standards, because GNU @code{gettext}
7234is purposely for helping the internationalization of the whole GNU
7235project, and as many other good free packages as possible.  So, the
7236maintainer's view presented here presumes that the package already has
7237a @file{configure.ac} file and uses GNU Autoconf.
7238
7239Nevertheless, GNU @code{gettext} may surely be useful for free packages
7240not following GNU standards and conventions, but the maintainers of such
7241packages might have to show imagination and initiative in organizing
7242their distributions so @code{gettext} work for them in all situations.
7243There are surely many, out there.
7244
7245Even if @code{gettext} methods are now stabilizing, slight adjustments
7246might be needed between successive @code{gettext} versions, so you
7247should ideally revise this chapter in subsequent releases, looking
7248for changes.
7249
7250@menu
7251* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
7252* Prerequisites::               Prerequisite Works
7253* gettextize Invocation::       Invoking the @code{gettextize} Program
7254* Adjusting Files::             Files You Must Create or Alter
7255* autoconf macros::             Autoconf macros for use in @file{configure.ac}
7256* CVS Issues::                  Integrating with CVS
7257* Release Management::          Creating a Distribution Tarball
7258@end menu
7259
7260@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
7261@section Flat or Non-Flat Directory Structures
7262
7263Some free software packages are distributed as @code{tar} files which unpack
7264in a single directory, these are said to be @dfn{flat} distributions.
7265Other free software packages have a one level hierarchy of subdirectories, using
7266for example a subdirectory named @file{doc/} for the Texinfo manual and
7267man pages, another called @file{lib/} for holding functions meant to
7268replace or complement C libraries, and a subdirectory @file{src/} for
7269holding the proper sources for the package.  These other distributions
7270are said to be @dfn{non-flat}.
7271
7272We cannot say much about flat distributions.  A flat
7273directory structure has the disadvantage of increasing the difficulty
7274of updating to a new version of GNU @code{gettext}.  Also, if you have
7275many PO files, this could somewhat pollute your single directory.
7276Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
7277scripts, @code{sed} scripts and complicated Makefile rules, which don't
7278fit well into an existing flat structure.  For these reasons, we
7279recommend to use non-flat approach in this case as well.
7280
7281Maybe because GNU @code{gettext} itself has a non-flat structure,
7282we have more experience with this approach, and this is what will be
7283described in the remaining of this chapter.  Some maintainers might
7284use this as an opportunity to unflatten their package structure.
7285
7286@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
7287@section Prerequisite Works
7288@cindex converting a package to use @code{gettext}
7289@cindex migration from earlier versions of @code{gettext}
7290@cindex upgrading to new versions of @code{gettext}
7291
7292There are some works which are required for using GNU @code{gettext}
7293in one of your package.  These works have some kind of generality
7294that escape the point by point descriptions used in the remainder
7295of this chapter.  So, we describe them here.
7296
7297@itemize @bullet
7298@item
7299Before attempting to use @code{gettextize} you should install some
7300other packages first.
7301Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
7302@code{gettext} are already installed at your site, and if not, proceed
7303to do this first.  If you get to install these things, beware that
7304GNU @code{m4} must be fully installed before GNU Autoconf is even
7305@emph{configured}.
7306
7307To further ease the task of a package maintainer the @code{automake}
7308package was designed and implemented.  GNU @code{gettext} now uses this
7309tool and the @file{Makefile}s in the @file{intl/} and @file{po/}
7310therefore know about all the goals necessary for using @code{automake}
7311and @file{libintl} in one project.
7312
7313Those four packages are only needed by you, as a maintainer; the
7314installers of your own package and end users do not really need any of
7315GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
7316for successfully installing and running your package, with messages
7317properly translated.  But this is not completely true if you provide
7318internationalized shell scripts within your own package: GNU
7319@code{gettext} shall then be installed at the user site if the end users
7320want to see the translation of shell script messages.
7321
7322@item
7323Your package should use Autoconf and have a @file{configure.ac} or
7324@file{configure.in} file.
7325If it does not, you have to learn how.  The Autoconf documentation
7326is quite well written, it is a good idea that you print it and get
7327familiar with it.
7328
7329@item
7330Your C sources should have already been modified according to
7331instructions given earlier in this manual.  @xref{Sources}.
7332
7333@item
7334Your @file{po/} directory should receive all PO files submitted to you
7335by the translator teams, each having @file{@var{ll}.po} as a name.
7336This is not usually easy to get translation
7337work done before your package gets internationalized and available!
7338Since the cycle has to start somewhere, the easiest for the maintainer
7339is to start with absolutely no PO files, and wait until various
7340translator teams get interested in your package, and submit PO files.
7341
7342@end itemize
7343
7344It is worth adding here a few words about how the maintainer should
7345ideally behave with PO files submissions.  As a maintainer, your role is
7346to authenticate the origin of the submission as being the representative
7347of the appropriate translating teams of the Translation Project (forward
7348the submission to @file{coordinator@@translationproject.org} in case of doubt),
7349to ensure that the PO file format is not severely broken and does not
7350prevent successful installation, and for the rest, to merely put these
7351PO files in @file{po/} for distribution.
7352
7353As a maintainer, you do not have to take on your shoulders the
7354responsibility of checking if the translations are adequate or
7355complete, and should avoid diving into linguistic matters.  Translation
7356teams drive themselves and are fully responsible of their linguistic
7357choices for the Translation Project.  Keep in mind that translator teams are @emph{not}
7358driven by maintainers.  You can help by carefully redirecting all
7359communications and reports from users about linguistic matters to the
7360appropriate translation team, or explain users how to reach or join
7361their team.  The simplest might be to send them the @file{ABOUT-NLS} file.
7362
7363Maintainers should @emph{never ever} apply PO file bug reports
7364themselves, short-cutting translation teams.  If some translator has
7365difficulty to get some of her points through her team, it should not be
7366an option for her to directly negotiate translations with maintainers.
7367Teams ought to settle their problems themselves, if any.  If you, as
7368a maintainer, ever think there is a real problem with a team, please
7369never try to @emph{solve} a team's problem on your own.
7370
7371@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
7372@section Invoking the @code{gettextize} Program
7373
7374@include gettextize.texi
7375
7376@node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers
7377@section Files You Must Create or Alter
7378@cindex @code{gettext} files
7379
7380Besides files which are automatically added through @code{gettextize},
7381there are many files needing revision for properly interacting with
7382GNU @code{gettext}.  If you are closely following GNU standards for
7383Makefile engineering and auto-configuration, the adaptations should
7384be easier to achieve.  Here is a point by point description of the
7385changes needed in each.
7386
7387So, here comes a list of files, each one followed by a description of
7388all alterations it needs.  Many examples are taken out from the GNU
7389@code{gettext} @value{VERSION} distribution itself, or from the GNU
7390@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello}
7391or @uref{http://www.gnu.franken.de/ke/hello/})  You may indeed
7392refer to the source code of the GNU @code{gettext} and GNU @code{hello}
7393packages, as they are intended to be good examples for using GNU
7394gettext functionality.
7395
7396@menu
7397* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
7398* po/LINGUAS::                  @file{LINGUAS} in @file{po/}
7399* po/Makevars::                 @file{Makevars} in @file{po/}
7400* po/Rules-*::                  Extending @file{Makefile} in @file{po/}
7401* configure.ac::                @file{configure.ac} at top level
7402* config.guess::                @file{config.guess}, @file{config.sub} at top level
7403* mkinstalldirs::               @file{mkinstalldirs} at top level
7404* aclocal::                     @file{aclocal.m4} at top level
7405* acconfig::                    @file{acconfig.h} at top level
7406* config.h.in::                 @file{config.h.in} at top level
7407* Makefile::                    @file{Makefile.in} at top level
7408* src/Makefile::                @file{Makefile.in} in @file{src/}
7409* lib/gettext.h::               @file{gettext.h} in @file{lib/}
7410@end menu
7411
7412@node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files
7413@subsection @file{POTFILES.in} in @file{po/}
7414@cindex @file{POTFILES.in} file
7415
7416The @file{po/} directory should receive a file named
7417@file{POTFILES.in}.  This file tells which files, among all program
7418sources, have marked strings needing translation.  Here is an example
7419of such a file:
7420
7421@example
7422@group
7423# List of source files containing translatable strings.
7424# Copyright (C) 1995 Free Software Foundation, Inc.
7425
7426# Common library files
7427lib/error.c
7428lib/getopt.c
7429lib/xmalloc.c
7430
7431# Package source files
7432src/gettext.c
7433src/msgfmt.c
7434src/xgettext.c
7435@end group
7436@end example
7437
7438@noindent
7439Hash-marked comments and white lines are ignored.  All other lines
7440list those source files containing strings marked for translation
7441(@pxref{Mark Keywords}), in a notation relative to the top level
7442of your whole distribution, rather than the location of the
7443@file{POTFILES.in} file itself.
7444
7445When a C file is automatically generated by a tool, like @code{flex} or
7446@code{bison}, that doesn't introduce translatable strings by itself,
7447it is recommended to list in @file{po/POTFILES.in} the real source file
7448(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
7449case of @code{bison}), not the generated C file.
7450
7451@node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files
7452@subsection @file{LINGUAS} in @file{po/}
7453@cindex @file{LINGUAS} file
7454
7455The @file{po/} directory should also receive a file named
7456@file{LINGUAS}.  This file contains the list of available translations.
7457It is a whitespace separated list.  Hash-marked comments and white lines
7458are ignored.  Here is an example file:
7459
7460@example
7461@group
7462# Set of available languages.
7463de fr
7464@end group
7465@end example
7466
7467@noindent
7468This example means that German and French PO files are available, so
7469that these languages are currently supported by your package.  If you
7470want to further restrict, at installation time, the set of installed
7471languages, this should not be done by modifying the @file{LINGUAS} file,
7472but rather by using the @code{LINGUAS} environment variable
7473(@pxref{Installers}).
7474
7475It is recommended that you add the "languages" @samp{en@@quot} and
7476@samp{en@@boldquot} to the @code{LINGUAS} file.  @code{en@@quot} is a
7477variant of English message catalogs (@code{en}) which uses real quotation
7478marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
7479and @samp{'}.  @code{en@@boldquot} is a variant of @code{en@@quot} that
7480additionally outputs quoted pieces of text in a bold font, when used in
7481a terminal emulator which supports the VT100 escape sequences (such as
7482@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
7483
7484These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
7485are constructed automatically, not by translators; to support them, you
7486need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
7487@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
7488in the @file{po/} directory.  You can copy them from GNU gettext's @file{po/}
7489directory; they are also installed by running @code{gettextize}.
7490
7491@node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files
7492@subsection @file{Makevars} in @file{po/}
7493@cindex @file{Makevars} file
7494
7495The @file{po/} directory also has a file named @file{Makevars}.  It
7496contains variables that are specific to your project.  @file{po/Makevars}
7497gets inserted into the @file{po/Makefile} when the latter is created.
7498The variables thus take effect when the POT file is created or updated,
7499and when the message catalogs get installed.
7500
7501The first three variables can be left unmodified if your package has a
7502single message domain and, accordingly, a single @file{po/} directory.
7503Only packages which have multiple @file{po/} directories at different
7504locations need to adjust the three first variables defined in
7505@file{Makevars}.
7506
7507As an alternative to the @code{XGETTEXT_OPTIONS} variables, it is also
7508possible to specify @code{xgettext} options through the
7509@code{AM_XGETTEXT_OPTION} autoconf macro.  See @ref{AM_XGETTEXT_OPTION}.
7510
7511@node po/Rules-*, configure.ac, po/Makevars, Adjusting Files
7512@subsection Extending @file{Makefile} in @file{po/}
7513@cindex @file{Makefile.in.in} extensions
7514
7515All files called @file{Rules-*} in the @file{po/} directory get appended to
7516the @file{po/Makefile} when it is created.  They present an opportunity to
7517add rules for special PO files to the Makefile, without needing to mess
7518with @file{po/Makefile.in.in}.
7519
7520@cindex quotation marks
7521@vindex LANGUAGE@r{, environment variable}
7522GNU gettext comes with a @file{Rules-quot} file, containing rules for
7523building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}.  The
7524effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
7525environment variable to @samp{en@@quot} will get messages with proper
7526looking symmetric Unicode quotation marks instead of abusing the ASCII
7527grave accent and the ASCII apostrophe for indicating quotations.  To
7528enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
7529file.  The effect of @file{en@@boldquot.po} is that people who set
7530@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
7531marks, but also the quoted text will be shown in a bold font on terminals
7532and consoles.  This catalog is useful only for command-line programs, not
7533GUI programs.  To enable it, similarly add @code{en@@boldquot} to the
7534@file{po/LINGUAS} file.
7535
7536Similarly, you can create rules for building message catalogs for the
7537@file{sr@@latin} locale -- Serbian written with the Latin alphabet --
7538from those for the @file{sr} locale -- Serbian written with Cyrillic
7539letters.  See @ref{msgfilter Invocation}.
7540
7541@node configure.ac, config.guess, po/Rules-*, Adjusting Files
7542@subsection @file{configure.ac} at top level
7543
7544@file{configure.ac} or @file{configure.in} - this is the source from which
7545@code{autoconf} generates the @file{configure} script.
7546
7547@enumerate
7548@item Declare the package and version.
7549@cindex package and version declaration in @file{configure.ac}
7550
7551This is done by a set of lines like these:
7552
7553@example
7554PACKAGE=gettext
7555VERSION=@value{VERSION}
7556AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
7557AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
7558AC_SUBST(PACKAGE)
7559AC_SUBST(VERSION)
7560@end example
7561
7562@noindent
7563or, if you are using GNU @code{automake}, by a line like this:
7564
7565@example
7566AM_INIT_AUTOMAKE(gettext, @value{VERSION})
7567@end example
7568
7569@noindent
7570Of course, you replace @samp{gettext} with the name of your package,
7571and @samp{@value{VERSION}} by its version numbers, exactly as they
7572should appear in the packaged @code{tar} file name of your distribution
7573(@file{gettext-@value{VERSION}.tar.gz}, here).
7574
7575@item Check for internationalization support.
7576
7577Here is the main @code{m4} macro for triggering internationalization
7578support.  Just add this line to @file{configure.ac}:
7579
7580@example
7581AM_GNU_GETTEXT
7582@end example
7583
7584@noindent
7585This call is purposely simple, even if it generates a lot of configure
7586time checking and actions.
7587
7588If you have suppressed the @file{intl/} subdirectory by calling
7589@code{gettextize} without @samp{--intl} option, this call should read
7590
7591@example
7592AM_GNU_GETTEXT([external])
7593@end example
7594
7595@item Have output files created.
7596
7597The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac}
7598file, needs to be modified in two ways:
7599
7600@example
7601AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
7602[@var{existing additional actions}])
7603@end example
7604
7605The modification to the first argument to @code{AC_OUTPUT} asks
7606for substitution in the @file{intl/} and @file{po/} directories.
7607Note the @samp{.in} suffix used for @file{po/} only.  This is because
7608the distributed file is really @file{po/Makefile.in.in}.
7609
7610If you have suppressed the @file{intl/} subdirectory by calling
7611@code{gettextize} without @samp{--intl} option, then you don't need to
7612add @code{intl/Makefile} to the @code{AC_OUTPUT} line.
7613
7614@end enumerate
7615
7616If, after doing the recommended modifications, a command like
7617@samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with
7618a trace similar to this:
7619
7620@smallexample
7621configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
7622../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
7623m4/lock.m4:224: gl_LOCK is expanded from...
7624m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
7625m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
7626m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
7627configure.ac:44: the top level
7628configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE
7629@end smallexample
7630
7631@noindent
7632you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the
7633@file{configure.ac} file - after @samp{AC_PROG_CC} but before
7634@samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC}
7635invocation.  This is necessary because of ordering restrictions imposed
7636by GNU autoconf.
7637
7638@node config.guess, mkinstalldirs, configure.ac, Adjusting Files
7639@subsection @file{config.guess}, @file{config.sub} at top level
7640
7641If you haven't suppressed the @file{intl/} subdirectory,
7642you need to add the GNU @file{config.guess} and @file{config.sub} files
7643to your distribution.  They are needed because the @file{intl/} directory
7644has platform dependent support for determining the locale's character
7645encoding and therefore needs to identify the platform.
7646
7647You can obtain the newest version of @file{config.guess} and
7648@file{config.sub} from the CVS of the @samp{config} project at
7649@file{http://savannah.gnu.org/}. The commands to fetch them are
7650@smallexample
7651$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess'
7652$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub'
7653@end smallexample
7654@noindent
7655Less recent versions are also contained in the GNU @code{automake} and
7656GNU @code{libtool} packages.
7657
7658Normally, @file{config.guess} and @file{config.sub} are put at the
7659top level of a distribution.  But it is also possible to put them in a
7660subdirectory, altogether with other configuration support files like
7661@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}.
7662All you need to do, other than moving the files, is to add the following line
7663to your @file{configure.ac}.
7664
7665@example
7666AC_CONFIG_AUX_DIR([@var{subdir}])
7667@end example
7668
7669@node mkinstalldirs, aclocal, config.guess, Adjusting Files
7670@subsection @file{mkinstalldirs} at top level
7671@cindex @file{mkinstalldirs} file
7672
7673With earlier versions of GNU gettext, you needed to add the GNU
7674@file{mkinstalldirs} script to your distribution.  This is not needed any
7675more.  You can remove it if you not also using an automake version older than
7676automake 1.9.
7677
7678@node aclocal, acconfig, mkinstalldirs, Adjusting Files
7679@subsection @file{aclocal.m4} at top level
7680@cindex @file{aclocal.m4} file
7681
7682If you do not have an @file{aclocal.m4} file in your distribution,
7683the simplest is to concatenate the files @file{codeset.m4},
7684@file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4},
7685@file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intlmacosx.m4},
7686@file{intmax.m4}, @file{inttypes_h.m4}, @file{inttypes-pri.m4},
7687@file{lcmessage.m4}, @file{lib-ld.m4}, @file{lib-link.m4},
7688@file{lib-prefix.m4}, @file{lock.m4}, @file{longlong.m4}, @file{nls.m4},
7689@file{po.m4}, @file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4},
7690@file{stdint_h.m4}, @file{uintmax_t.m4}, @file{visibility.m4},
7691@file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4}
7692from GNU @code{gettext}'s
7693@file{m4/} directory into a single file.  If you have suppressed the
7694@file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4},
7695@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
7696@file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated.
7697
7698If you are not using GNU @code{automake} 1.8 or newer, you will need to
7699add a file @file{mkdirp.m4} from a newer automake distribution to the
7700list of files above.
7701
7702If you already have an @file{aclocal.m4} file, then you will have
7703to merge the said macro files into your @file{aclocal.m4}.  Note that if
7704you are upgrading from a previous release of GNU @code{gettext}, you
7705should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
7706etc.), as they usually
7707change a little from one release of GNU @code{gettext} to the next.
7708Their contents may vary as we get more experience with strange systems
7709out there.
7710
7711If you are using GNU @code{automake} 1.5 or newer, it is enough to put
7712these macro files into a subdirectory named @file{m4/} and add the line
7713
7714@example
7715ACLOCAL_AMFLAGS = -I m4
7716@end example
7717
7718@noindent
7719to your top level @file{Makefile.am}.
7720
7721These macros check for the internationalization support functions
7722and related informations.  Hopefully, once stabilized, these macros
7723might be integrated in the standard Autoconf set, because this
7724piece of @code{m4} code will be the same for all projects using GNU
7725@code{gettext}.
7726
7727@node acconfig, config.h.in, aclocal, Adjusting Files
7728@subsection @file{acconfig.h} at top level
7729@cindex @file{acconfig.h} file
7730
7731Earlier GNU @code{gettext} releases required to put definitions for
7732@code{ENABLE_NLS}, @code{HAVE_GETTEXT} and @code{HAVE_LC_MESSAGES},
7733@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
7734@file{acconfig.h} file.  This is not needed any more; you can remove
7735them from your @file{acconfig.h} file unless your package uses them
7736independently from the @file{intl/} directory.
7737
7738@node config.h.in, Makefile, acconfig, Adjusting Files
7739@subsection @file{config.h.in} at top level
7740@cindex @file{config.h.in} file
7741
7742The include file template that holds the C macros to be defined by
7743@code{configure} is usually called @file{config.h.in} and may be
7744maintained either manually or automatically.
7745
7746If @code{gettextize} has created an @file{intl/} directory, this file
7747must be called @file{config.h.in} and must be at the top level.  If,
7748however, you have suppressed the @file{intl/} directory by calling
7749@code{gettextize} without @samp{--intl} option, then you can choose the
7750name of this file and its location freely.
7751
7752If it is maintained automatically, by use of the @samp{autoheader}
7753program, you need to do nothing about it.  This is the case in particular
7754if you are using GNU @code{automake}.
7755
7756If it is maintained manually, and if @code{gettextize} has created an
7757@file{intl/} directory, you should switch to using @samp{autoheader}.
7758The list of C macros to be added for the sake of the @file{intl/}
7759directory is just too long to be maintained manually; it also changes
7760between different versions of GNU @code{gettext}.
7761
7762If it is maintained manually, and if on the other hand you have
7763suppressed the @file{intl/} directory by calling @code{gettextize}
7764without @samp{--intl} option, then you can get away by adding the
7765following lines to @file{config.h.in}:
7766
7767@example
7768/* Define to 1 if translation of program messages to the user's
7769   native language is requested. */
7770#undef ENABLE_NLS
7771@end example
7772
7773@node Makefile, src/Makefile, config.h.in, Adjusting Files
7774@subsection @file{Makefile.in} at top level
7775
7776Here are a few modifications you need to make to your main, top-level
7777@file{Makefile.in} file.
7778
7779@enumerate
7780@item
7781Add the following lines near the beginning of your @file{Makefile.in},
7782so the @samp{dist:} goal will work properly (as explained further down):
7783
7784@example
7785PACKAGE = @@PACKAGE@@
7786VERSION = @@VERSION@@
7787@end example
7788
7789@item
7790Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
7791distributed.
7792
7793@item
7794Wherever you process subdirectories in your @file{Makefile.in}, be sure
7795you also process the subdirectories @samp{intl} and @samp{po}.  Special
7796rules in the @file{Makefiles} take care for the case where no
7797internationalization is wanted.
7798
7799If you are using Makefiles, either generated by automake, or hand-written
7800so they carefully follow the GNU coding standards, the effected goals for
7801which the new subdirectories must be handled include @samp{installdirs},
7802@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
7803
7804Here is an example of a canonical order of processing.  In this
7805example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
7806to be further used in the @samp{dist:} goal.
7807
7808@example
7809SUBDIRS = doc intl lib src po
7810@end example
7811
7812Note that you must arrange for @samp{make} to descend into the
7813@code{intl} directory before descending into other directories containing
7814code which make use of the @code{libintl.h} header file.  For this
7815reason, here we mention @code{intl} before @code{lib} and @code{src}.
7816
7817@item
7818A delicate point is the @samp{dist:} goal, as both
7819@file{intl/Makefile} and @file{po/Makefile} will later assume that the
7820proper directory has been set up from the main @file{Makefile}.  Here is
7821an example at what the @samp{dist:} goal might look like:
7822
7823@example
7824distdir = $(PACKAGE)-$(VERSION)
7825dist: Makefile
7826	rm -fr $(distdir)
7827	mkdir $(distdir)
7828	chmod 777 $(distdir)
7829	for file in $(DISTFILES); do \
7830	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
7831	done
7832	for subdir in $(SUBDIRS); do \
7833	  mkdir $(distdir)/$$subdir || exit 1; \
7834	  chmod 777 $(distdir)/$$subdir; \
7835	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
7836	done
7837	tar chozf $(distdir).tar.gz $(distdir)
7838	rm -fr $(distdir)
7839@end example
7840
7841@end enumerate
7842
7843Note that if you are using GNU @code{automake}, @file{Makefile.in} is
7844automatically generated from @file{Makefile.am}, and all needed changes
7845to @file{Makefile.am} are already made by running @samp{gettextize}.
7846
7847@node src/Makefile, lib/gettext.h, Makefile, Adjusting Files
7848@subsection @file{Makefile.in} in @file{src/}
7849
7850Some of the modifications made in the main @file{Makefile.in} will
7851also be needed in the @file{Makefile.in} from your package sources,
7852which we assume here to be in the @file{src/} subdirectory.  Here are
7853all the modifications needed in @file{src/Makefile.in}:
7854
7855@enumerate
7856@item
7857In view of the @samp{dist:} goal, you should have these lines near the
7858beginning of @file{src/Makefile.in}:
7859
7860@example
7861PACKAGE = @@PACKAGE@@
7862VERSION = @@VERSION@@
7863@end example
7864
7865@item
7866If not done already, you should guarantee that @code{top_srcdir}
7867gets defined.  This will serve for @code{cpp} include files.  Just add
7868the line:
7869
7870@example
7871top_srcdir = @@top_srcdir@@
7872@end example
7873
7874@item
7875You might also want to define @code{subdir} as @samp{src}, later
7876allowing for almost uniform @samp{dist:} goals in all your
7877@file{Makefile.in}.  At list, the @samp{dist:} goal below assume that
7878you used:
7879
7880@example
7881subdir = src
7882@end example
7883
7884@item
7885The @code{main} function of your program will normally call
7886@code{bindtextdomain} (see @pxref{Triggering}), like this:
7887
7888@example
7889bindtextdomain (@var{PACKAGE}, LOCALEDIR);
7890textdomain (@var{PACKAGE});
7891@end example
7892
7893To make LOCALEDIR known to the program, add the following lines to
7894@file{Makefile.in}:
7895
7896@example
7897datadir = @@datadir@@
7898localedir = $(datadir)/locale
7899DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
7900@end example
7901
7902Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
7903@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
7904
7905@item
7906You should ensure that the final linking will use @code{@@LIBINTL@@} or
7907@code{@@LTLIBINTL@@} as a library.  @code{@@LIBINTL@@} is for use without
7908@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.  An
7909easy way to achieve this is to manage that it gets into @code{LIBS}, like
7910this:
7911
7912@example
7913LIBS = @@LIBINTL@@ @@LIBS@@
7914@end example
7915
7916In most packages internationalized with GNU @code{gettext}, one will
7917find a directory @file{lib/} in which a library containing some helper
7918functions will be build.  (You need at least the few functions which the
7919GNU @code{gettext} Library itself needs.)  However some of the functions
7920in the @file{lib/} also give messages to the user which of course should be
7921translated, too.  Taking care of this, the support library (say
7922@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
7923@code{@@LIBS@@} in the above example.  So one has to write this:
7924
7925@example
7926LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
7927@end example
7928
7929@item
7930You should also ensure that directory @file{intl/} will be searched for
7931C preprocessor include files in all circumstances.  So, you have to
7932manage so both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} will
7933be given to the C compiler.
7934
7935@item
7936Your @samp{dist:} goal has to conform with others.  Here is a
7937reasonable definition for it:
7938
7939@example
7940distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
7941dist: Makefile $(DISTFILES)
7942	for file in $(DISTFILES); do \
7943	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
7944	done
7945@end example
7946
7947@end enumerate
7948
7949Note that if you are using GNU @code{automake}, @file{Makefile.in} is
7950automatically generated from @file{Makefile.am}, and the first three
7951changes and the last change are not necessary.  The remaining needed
7952@file{Makefile.am} modifications are the following:
7953
7954@enumerate
7955@item
7956To make LOCALEDIR known to the program, add the following to
7957@file{Makefile.am}:
7958
7959@example
7960<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
7961@end example
7962
7963@noindent
7964for each specific module or compilation unit, or
7965
7966@example
7967AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
7968@end example
7969
7970for all modules and compilation units together.  Furthermore, add this
7971line to define @samp{localedir}:
7972
7973@example
7974localedir = $(datadir)/locale
7975@end example
7976
7977@item
7978To ensure that the final linking will use @code{@@LIBINTL@@} or
7979@code{@@LTLIBINTL@@} as a library, add the following to
7980@file{Makefile.am}:
7981
7982@example
7983<program>_LDADD = @@LIBINTL@@
7984@end example
7985
7986@noindent
7987for each specific program, or
7988
7989@example
7990LDADD = @@LIBINTL@@
7991@end example
7992
7993for all programs together.  Remember that when you use @code{libtool}
7994to link a program, you need to use @@LTLIBINTL@@ instead of @@LIBINTL@@
7995for that program.
7996
7997@item
7998If you have an @file{intl/} directory, whose contents is created by
7999@code{gettextize}, then to ensure that it will be searched for
8000C preprocessor include files in all circumstances, add something like
8001this to @file{Makefile.am}:
8002
8003@example
8004AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
8005@end example
8006
8007@end enumerate
8008
8009@node lib/gettext.h,  , src/Makefile, Adjusting Files
8010@subsection @file{gettext.h} in @file{lib/}
8011@cindex @file{gettext.h} file
8012@cindex turning off NLS support
8013@cindex disabling NLS
8014
8015Internationalization of packages, as provided by GNU @code{gettext}, is
8016optional.  It can be turned off in two situations:
8017
8018@itemize @bullet
8019@item
8020When the installer has specified @samp{./configure --disable-nls}.  This
8021can be useful when small binaries are more important than features, for
8022example when building utilities for boot diskettes.  It can also be useful
8023in order to get some specific C compiler warnings about code quality with
8024some older versions of GCC (older than 3.0).
8025
8026@item
8027When the package does not include the @code{intl/} subdirectory, and the
8028libintl.h header (with its associated libintl library, if any) is not
8029already installed on the system, it is preferable that the package builds
8030without internationalization support, rather than to give a compilation
8031error.
8032@end itemize
8033
8034A C preprocessor macro can be used to detect these two cases.  Usually,
8035when @code{libintl.h} was found and not explicitly disabled, the
8036@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
8037configuration file (usually called @file{config.h}).  In the two negative
8038situations, however, this macro will not be defined, thus it will evaluate
8039to 0 in C preprocessor expressions.
8040
8041@cindex include file @file{libintl.h}
8042@file{gettext.h} is a convenience header file for conditional use of
8043@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro.  If
8044@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
8045defines no-op substitutes for the libintl.h functions.  We recommend
8046the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
8047so that portability to older systems is guaranteed and installers can
8048turn off internationalization if they want to.  In the C code, you will
8049then write
8050
8051@example
8052#include "gettext.h"
8053@end example
8054
8055@noindent
8056instead of
8057
8058@example
8059#include <libintl.h>
8060@end example
8061
8062The location of @code{gettext.h} is usually in a directory containing
8063auxiliary include files.  In many GNU packages, there is a directory
8064@file{lib/} containing helper functions; @file{gettext.h} fits there.
8065In other packages, it can go into the @file{src} directory.
8066
8067Do not install the @code{gettext.h} file in public locations.  Every
8068package that needs it should contain a copy of it on its own.
8069
8070@node autoconf macros, CVS Issues, Adjusting Files, Maintainers
8071@section Autoconf macros for use in @file{configure.ac}
8072@cindex autoconf macros for @code{gettext}
8073
8074GNU @code{gettext} installs macros for use in a package's
8075@file{configure.ac} or @file{configure.in}.
8076@xref{Top, , Introduction, autoconf, The Autoconf Manual}.
8077The primary macro is, of course, @code{AM_GNU_GETTEXT}.
8078
8079@menu
8080* AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
8081* AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8082* AM_GNU_GETTEXT_NEED::         AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8083* AM_GNU_GETTEXT_INTL_SUBDIR::  AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
8084* AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
8085* AM_XGETTEXT_OPTION::          AM_XGETTEXT_OPTION in @file{po.m4}
8086* AM_ICONV::                    AM_ICONV in @file{iconv.m4}
8087@end menu
8088
8089@node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros
8090@subsection AM_GNU_GETTEXT in @file{gettext.m4}
8091
8092@amindex AM_GNU_GETTEXT
8093The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
8094function family in either the C library or a separate @code{libintl}
8095library (shared or static libraries are both supported) or in the package's
8096@file{intl/} directory.  It also invokes @code{AM_PO_SUBDIRS}, thus preparing
8097the @file{po/} directories of the package for building.
8098
8099@code{AM_GNU_GETTEXT} accepts up to three optional arguments.  The general
8100syntax is
8101
8102@example
8103AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}])
8104@end example
8105
8106@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
8107@c it is of no use for packages other than GNU gettext itself.  (Such packages
8108@c are not allowed to install the shared libintl.  But if they use libtool,
8109@c then it is in order to install shared libraries that depend on libintl.)
8110@var{intlsymbol} can be @samp{external} or @samp{no-libtool}.  The default
8111(if it is not specified or empty) is @samp{no-libtool}.  @var{intlsymbol}
8112should be @samp{external} for packages with no @file{intl/} directory.
8113For packages with an @file{intl/} directory, you can either use an
8114@var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external}
8115and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere.
8116The two ways to specify the existence of an @file{intl/} directory are
8117equivalent.  At build time, a static library
8118@code{$(top_builddir)/intl/libintl.a} will then be created.
8119
8120If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
8121gettext implementations (in libc or libintl) without the @code{ngettext()}
8122function will be ignored.  If @var{needsymbol} is specified and is
8123@samp{need-formatstring-macros}, then GNU gettext implementations that don't
8124support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
8125Only one @var{needsymbol} can be specified.  These requirements can also be
8126specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere.  To specify
8127more than one requirement, just specify the strongest one among them, or
8128invoke the @code{AM_GNU_GETTEXT_NEED} macro several times.  The hierarchy
8129among the various alternatives is as follows: @samp{need-formatstring-macros}
8130implies @samp{need-ngettext}.
8131
8132@var{intldir} is used to find the intl libraries.  If empty, the value
8133@samp{$(top_builddir)/intl/} is used.
8134
8135The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
8136available and should be used.  If so, it sets the @code{USE_NLS} variable
8137to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
8138generated configuration file (usually called @file{config.h}); it sets
8139the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
8140for use in a Makefile (@code{LIBINTL} for use without libtool,
8141@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
8142@code{CPPFLAGS} if necessary.  In the negative case, it sets
8143@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
8144to empty and doesn't change @code{CPPFLAGS}.
8145
8146The complexities that @code{AM_GNU_GETTEXT} deals with are the following:
8147
8148@itemize @bullet
8149@item
8150@cindex @code{libintl} library
8151Some operating systems have @code{gettext} in the C library, for example
8152glibc.  Some have it in a separate library @code{libintl}.  GNU @code{libintl}
8153might have been installed as part of the GNU @code{gettext} package.
8154
8155@item
8156GNU @code{libintl}, if installed, is not necessarily already in the search
8157path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
8158the library search path).
8159
8160@item
8161Except for glibc, the operating system's native @code{gettext} cannot
8162exploit the GNU mo files, doesn't have the necessary locale dependency
8163features, and cannot convert messages from the catalog's text encoding
8164to the user's locale encoding.
8165
8166@item
8167GNU @code{libintl}, if installed, is not necessarily already in the
8168run time library search path.  To avoid the need for setting an environment
8169variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
8170run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
8171variables.  This works on most systems, but not on some operating systems
8172with limited shared library support, like SCO.
8173
8174@item
8175GNU @code{libintl} relies on POSIX/XSI @code{iconv}.  The macro checks for
8176linker options needed to use iconv and appends them to the @code{LIBINTL}
8177and @code{LTLIBINTL} variables.
8178@end itemize
8179
8180@node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros
8181@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
8182
8183@amindex AM_GNU_GETTEXT_VERSION
8184The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
8185the GNU gettext infrastructure that is used by the package.
8186
8187The use of this macro is optional; only the @code{autopoint} program makes
8188use of it (@pxref{CVS Issues}).
8189
8190@node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros
8191@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4}
8192
8193@amindex AM_GNU_GETTEXT_NEED
8194The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the
8195GNU gettext implementation.  The syntax is
8196
8197@example
8198AM_GNU_GETTEXT_NEED([@var{needsymbol}])
8199@end example
8200
8201If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations
8202(in libc or libintl) without the @code{ngettext()} function will be ignored.
8203If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext
8204implementations that don't support the ISO C 99 @file{<inttypes.h>}
8205formatstring macros will be ignored.
8206
8207The optional second argument of @code{AM_GNU_GETTEXT} is also taken into
8208account.
8209
8210The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after
8211the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
8212
8213@node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros
8214@subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4}
8215
8216@amindex AM_GNU_GETTEXT_INTL_SUBDIR
8217The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the
8218@code{AM_GNU_GETTEXT} macro, although invoked with the first argument
8219@samp{external}, should also prepare for building the @file{intl/}
8220subdirectory.
8221
8222The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after
8223the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter.
8224
8225The use of this macro requires GNU automake 1.10 or newer and
8226GNU autoconf 2.61 or newer.
8227
8228@node AM_PO_SUBDIRS, AM_XGETTEXT_OPTION, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros
8229@subsection AM_PO_SUBDIRS in @file{po.m4}
8230
8231@amindex AM_PO_SUBDIRS
8232The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
8233package for building.  This macro should be used in internationalized
8234programs written in other programming languages than C, C++, Objective C,
8235for example @code{sh}, @code{Python}, @code{Lisp}.  See @ref{Programming
8236Languages} for a list of programming languages that support localization
8237through PO files.
8238
8239The @code{AM_PO_SUBDIRS} macro determines whether internationalization
8240should be used.  If so, it sets the @code{USE_NLS} variable to @samp{yes},
8241otherwise to @samp{no}.  It also determines the right values for Makefile
8242variables in each @file{po/} directory.
8243
8244@node AM_XGETTEXT_OPTION, AM_ICONV, AM_PO_SUBDIRS, autoconf macros
8245@subsection AM_XGETTEXT_OPTION in @file{po.m4}
8246
8247@amindex AM_XGETTEXT_OPTION
8248The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be
8249used in the invocations of @code{xgettext} in the @file{po/} directories
8250of the package.
8251
8252For example, if you have a source file that defines a function
8253@samp{error_at_line} whose fifth argument is a format string, you can use
8254@example
8255AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])
8256@end example
8257@noindent
8258to instruct @code{xgettext} to mark all translatable strings in @samp{gettext}
8259invocations that occur as fifth argument to this function as @samp{c-format}.
8260
8261See @ref{xgettext Invocation} for the list of options that @code{xgettext}
8262accepts.
8263
8264The use of this macro is an alternative to the use of the
8265@samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}.
8266
8267@node AM_ICONV,  , AM_XGETTEXT_OPTION, autoconf macros
8268@subsection AM_ICONV in @file{iconv.m4}
8269
8270@amindex AM_ICONV
8271The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
8272@code{iconv} function family in either the C library or a separate
8273@code{libiconv} library.  If found, it sets the @code{am_cv_func_iconv}
8274variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
8275generated configuration file (usually called @file{config.h}); it defines
8276@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
8277second argument of @code{iconv()} is of type @samp{const char **} or
8278@samp{char **}; it sets the variables @code{LIBICONV} and
8279@code{LTLIBICONV} to the linker options for use in a Makefile
8280(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
8281libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
8282necessary.  If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
8283empty and doesn't change @code{CPPFLAGS}.
8284
8285The complexities that @code{AM_ICONV} deals with are the following:
8286
8287@itemize @bullet
8288@item
8289@cindex @code{libiconv} library
8290Some operating systems have @code{iconv} in the C library, for example
8291glibc.  Some have it in a separate library @code{libiconv}, for example
8292OSF/1 or FreeBSD.  Regardless of the operating system, GNU @code{libiconv}
8293might have been installed.  In that case, it should be used instead of the
8294operating system's native @code{iconv}.
8295
8296@item
8297GNU @code{libiconv}, if installed, is not necessarily already in the search
8298path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
8299the library search path).
8300
8301@item
8302GNU @code{libiconv} is binary incompatible with some operating system's
8303native @code{iconv}, for example on FreeBSD.  Use of an @file{iconv.h}
8304and @file{libiconv.so} that don't fit together would produce program
8305crashes.
8306
8307@item
8308GNU @code{libiconv}, if installed, is not necessarily already in the
8309run time library search path.  To avoid the need for setting an environment
8310variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
8311run time search path options to the @code{LIBICONV} variable.  This works
8312on most systems, but not on some operating systems with limited shared
8313library support, like SCO.
8314@end itemize
8315
8316@file{iconv.m4} is distributed with the GNU gettext package because
8317@file{gettext.m4} relies on it.
8318
8319@node CVS Issues, Release Management, autoconf macros, Maintainers
8320@section Integrating with CVS
8321
8322Many projects use CVS for distributed development, version control and
8323source backup.  This section gives some advice how to manage the uses
8324of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}.
8325
8326@menu
8327* Distributed CVS::             Avoiding version mismatch in distributed development
8328* Files under CVS::             Files to put under CVS version control
8329* autopoint Invocation::        Invoking the @code{autopoint} Program
8330@end menu
8331
8332@node Distributed CVS, Files under CVS, CVS Issues, CVS Issues
8333@subsection Avoiding version mismatch in distributed development
8334
8335In a project development with multiple developers, using CVS, there
8336should be a single developer who occasionally - when there is desire to
8337upgrade to a new @code{gettext} version - runs @code{gettextize} and
8338performs the changes listed in @ref{Adjusting Files}, and then commits
8339his changes to the CVS.
8340
8341It is highly recommended that all developers on a project use the same
8342version of GNU @code{gettext} in the package.  In other words, if a
8343developer runs @code{gettextize}, he should go the whole way, make the
8344necessary remaining changes and commit his changes to the CVS.
8345Otherwise the following damages will likely occur:
8346
8347@itemize @bullet
8348@item
8349Apparent version mismatch between developers.  Since some @code{gettext}
8350specific portions in @file{configure.ac}, @file{configure.in} and
8351@code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext}
8352version, the use of infrastructure files belonging to different
8353@code{gettext} versions can easily lead to build errors.
8354
8355@item
8356Hidden version mismatch.  Such version mismatch can also lead to
8357malfunctioning of the package, that may be undiscovered by the developers.
8358The worst case of hidden version mismatch is that internationalization
8359of the package doesn't work at all.
8360
8361@item
8362Release risks.  All developers implicitly perform constant testing on
8363a package.  This is important in the days and weeks before a release.
8364If the guy who makes the release tar files uses a different version
8365of GNU @code{gettext} than the other developers, the distribution will
8366be less well tested than if all had been using the same @code{gettext}
8367version.  For example, it is possible that a platform specific bug goes
8368undiscovered due to this constellation.
8369@end itemize
8370
8371@node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues
8372@subsection Files to put under CVS version control
8373
8374There are basically three ways to deal with generated files in the
8375context of a CVS repository, such as @file{configure} generated from
8376@file{configure.ac}, @code{@var{parser}.c} generated from
8377@code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by
8378@code{gettextize} or @code{autopoint}.
8379
8380@enumerate
8381@item
8382All generated files are always committed into the repository.
8383
8384@item
8385All generated files are committed into the repository occasionally,
8386for example each time a release is made.
8387
8388@item
8389Generated files are never committed into the repository.
8390@end enumerate
8391
8392Each of these three approaches has different advantages and drawbacks.
8393
8394@enumerate
8395@item
8396The advantage is that anyone can check out the CVS at any moment and
8397gets a working build.  The drawbacks are:  1a. It requires some frequent
8398"cvs commit" actions by the maintainers.  1b. The repository grows in size
8399quite fast.
8400
8401@item
8402The advantage is that anyone can check out the CVS, and the usual
8403"./configure; make" will work.  The drawbacks are:  2a. The one who
8404checks out the repository needs tools like GNU @code{automake},
8405GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes
8406he even needs particular versions of them.  2b. When a release is made
8407and a commit is made on the generated files, the other developers get
8408conflicts on the generated files after doing "cvs update".  Although
8409these conflicts are easy to resolve, they are annoying.
8410
8411@item
8412The advantage is less work for the maintainers.  The drawback is that
8413anyone who checks out the CVS not only needs tools like GNU @code{automake},
8414GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that
8415he needs to perform a package specific pre-build step before being able
8416to "./configure; make".
8417@end enumerate
8418
8419For the first and second approach, all files modified or brought in
8420by the occasional @code{gettextize} invocation and update should be
8421committed into the CVS.
8422
8423For the third approach, the maintainer can omit from the CVS repository
8424all the files that @code{gettextize} mentions as "copy".  Instead, he
8425adds to the @file{configure.ac} or @file{configure.in} a line of the
8426form
8427
8428@example
8429AM_GNU_GETTEXT_VERSION(@value{VERSION})
8430@end example
8431
8432@noindent
8433and adds to the package's pre-build script an invocation of
8434@samp{autopoint}.  For everyone who checks out the CVS, this
8435@code{autopoint} invocation will copy into the right place the
8436@code{gettext} infrastructure files that have been omitted from the CVS.
8437
8438The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
8439the version of the @code{gettext} infrastructure that the package wants
8440to use.  It is also the minimum version number of the @samp{autopoint}
8441program.  So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
8442developers can have any version >= 0.11.5 installed; the package will work
8443with the 0.11.5 infrastructure in all developers' builds.  When the
8444maintainer then runs gettextize from, say, version 0.12.1 on the package,
8445the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
8446into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
8447use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer
8448installed.
8449
8450@node autopoint Invocation,  , Files under CVS, CVS Issues
8451@subsection Invoking the @code{autopoint} Program
8452
8453@include autopoint.texi
8454
8455@node Release Management,  , CVS Issues, Maintainers
8456@section Creating a Distribution Tarball
8457
8458@cindex release
8459@cindex distribution tarball
8460In projects that use GNU @code{automake}, the usual commands for creating
8461a distribution tarball, @samp{make dist} or @samp{make distcheck},
8462automatically update the PO files as needed.
8463
8464If GNU @code{automake} is not used, the maintainer needs to perform this
8465update before making a release:
8466
8467@example
8468$ ./configure
8469$ (cd po; make update-po)
8470$ make distclean
8471@end example
8472
8473@node Installers, Programming Languages, Maintainers, Top
8474@chapter The Installer's and Distributor's View
8475@cindex package installer's view of @code{gettext}
8476@cindex package distributor's view of @code{gettext}
8477@cindex package build and installation options
8478@cindex setting up @code{gettext} at build time
8479
8480By default, packages fully using GNU @code{gettext}, internally,
8481are installed in such a way that they to allow translation of
8482messages.  At @emph{configuration} time, those packages should
8483automatically detect whether the underlying host system already provides
8484the GNU @code{gettext} functions.  If not,
8485the GNU @code{gettext} library should be automatically prepared
8486and used.  Installers may use special options at configuration
8487time for changing this behavior.  The command @samp{./configure
8488--with-included-gettext} bypasses system @code{gettext} to
8489use the included GNU @code{gettext} instead,
8490while @samp{./configure --disable-nls}
8491produces programs totally unable to translate messages.
8492
8493@vindex LINGUAS@r{, environment variable}
8494Internationalized packages have usually many @file{@var{ll}.po}
8495files.  Unless
8496translations are disabled, all those available are installed together
8497with the package.  However, the environment variable @code{LINGUAS}
8498may be set, prior to configuration, to limit the installed set.
8499@code{LINGUAS} should then contain a space separated list of two-letter
8500codes, stating which languages are allowed.
8501
8502@node Programming Languages, Conclusion, Installers, Top
8503@chapter Other Programming Languages
8504
8505While the presentation of @code{gettext} focuses mostly on C and
8506implicitly applies to C++ as well, its scope is far broader than that:
8507Many programming languages, scripting languages and other textual data
8508like GUI resources or package descriptions can make use of the gettext
8509approach.
8510
8511@menu
8512* Language Implementors::       The Language Implementor's View
8513* Programmers for other Languages::  The Programmer's View
8514* Translators for other Languages::  The Translator's View
8515* Maintainers for other Languages::  The Maintainer's View
8516* List of Programming Languages::  Individual Programming Languages
8517* List of Data Formats::        Internationalizable Data
8518@end menu
8519
8520@node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages
8521@section The Language Implementor's View
8522@cindex programming languages
8523@cindex scripting languages
8524
8525All programming and scripting languages that have the notion of strings
8526are eligible to supporting @code{gettext}.  Supporting @code{gettext}
8527means the following:
8528
8529@enumerate
8530@item
8531You should add to the language a syntax for translatable strings.  In
8532principle, a function call of @code{gettext} would do, but a shorthand
8533syntax helps keeping the legibility of internationalized programs.  For
8534example, in C we use the syntax @code{_("string")}, and in GNU awk we use
8535the shorthand @code{_"string"}.
8536
8537@item
8538You should arrange that evaluation of such a translatable string at
8539runtime calls the @code{gettext} function, or performs equivalent
8540processing.
8541
8542@item
8543Similarly, you should make the functions @code{ngettext},
8544@code{dcgettext}, @code{dcngettext} available from within the language.
8545These functions are less often used, but are nevertheless necessary for
8546particular purposes: @code{ngettext} for correct plural handling, and
8547@code{dcgettext} and @code{dcngettext} for obeying other locale-related
8548environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
8549@code{LC_MONETARY}.  For these latter functions, you need to make the
8550@code{LC_*} constants, available in the C header @code{<locale.h>},
8551referenceable from within the language, usually either as enumeration
8552values or as strings.
8553
8554@item
8555You should allow the programmer to designate a message domain, either by
8556making the @code{textdomain} function available from within the
8557language, or by introducing a magic variable called @code{TEXTDOMAIN}.
8558Similarly, you should allow the programmer to designate where to search
8559for message catalogs, by providing access to the @code{bindtextdomain}
8560function.
8561
8562@item
8563You should either perform a @code{setlocale (LC_ALL, "")} call during
8564the startup of your language runtime, or allow the programmer to do so.
8565Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
8566@code{LC_CTYPE} locale categories are not both set.
8567
8568@item
8569A programmer should have a way to extract translatable strings from a
8570program into a PO file.  The GNU @code{xgettext} program is being
8571extended to support very different programming languages.  Please
8572contact the GNU @code{gettext} maintainers to help them doing this.  If
8573the string extractor is best integrated into your language's parser, GNU
8574@code{xgettext} can function as a front end to your string extractor.
8575
8576@item
8577The language's library should have a string formatting facility where
8578the arguments of a format string are denoted by a positional number or a
8579name.  This is needed because for some languages and some messages with
8580more than one substitutable argument, the translation will need to
8581output the substituted arguments in different order.  @xref{c-format Flag}.
8582
8583@item
8584If the language has more than one implementation, and not all of the
8585implementations use @code{gettext}, but the programs should be portable
8586across implementations, you should provide a no-i18n emulation, that
8587makes the other implementations accept programs written for yours,
8588without actually translating the strings.
8589
8590@item
8591To help the programmer in the task of marking translatable strings,
8592which is sometimes performed using the Emacs PO mode (@pxref{Marking}),
8593you are welcome to
8594contact the GNU @code{gettext} maintainers, so they can add support for
8595your language to @file{po-mode.el}.
8596@end enumerate
8597
8598On the implementation side, three approaches are possible, with
8599different effects on portability and copyright:
8600
8601@itemize @bullet
8602@item
8603You may integrate the GNU @code{gettext}'s @file{intl/} directory in
8604your package, as described in @ref{Maintainers}.  This allows you to
8605have internationalization on all kinds of platforms.  Note that when you
8606then distribute your package, it legally falls under the GNU General
8607Public License, and the GNU project will be glad about your contribution
8608to the Free Software pool.
8609
8610@item
8611You may link against GNU @code{gettext} functions if they are found in
8612the C library.  For example, an autoconf test for @code{gettext()} and
8613@code{ngettext()} will detect this situation.  For the moment, this test
8614will succeed on GNU systems and not on other platforms.  No severe
8615copyright restrictions apply.
8616
8617@item
8618You may emulate or reimplement the GNU @code{gettext} functionality.
8619This has the advantage of full portability and no copyright
8620restrictions, but also the drawback that you have to reimplement the GNU
8621@code{gettext} features (such as the @code{LANGUAGE} environment
8622variable, the locale aliases database, the automatic charset conversion,
8623and plural handling).
8624@end itemize
8625
8626@node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages
8627@section The Programmer's View
8628
8629For the programmer, the general procedure is the same as for the C
8630language.  The Emacs PO mode marking supports other languages, and the GNU
8631@code{xgettext} string extractor recognizes other languages based on the
8632file extension or a command-line option.  In some languages,
8633@code{setlocale} is not needed because it is already performed by the
8634underlying language runtime.
8635
8636@node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages
8637@section The Translator's View
8638
8639The translator works exactly as in the C language case.  The only
8640difference is that when translating format strings, she has to be aware
8641of the language's particular syntax for positional arguments in format
8642strings.
8643
8644@menu
8645* c-format::                    C Format Strings
8646* objc-format::                 Objective C Format Strings
8647* sh-format::                   Shell Format Strings
8648* python-format::               Python Format Strings
8649* lisp-format::                 Lisp Format Strings
8650* elisp-format::                Emacs Lisp Format Strings
8651* librep-format::               librep Format Strings
8652* scheme-format::               Scheme Format Strings
8653* smalltalk-format::            Smalltalk Format Strings
8654* java-format::                 Java Format Strings
8655* csharp-format::               C# Format Strings
8656* awk-format::                  awk Format Strings
8657* object-pascal-format::        Object Pascal Format Strings
8658* ycp-format::                  YCP Format Strings
8659* tcl-format::                  Tcl Format Strings
8660* perl-format::                 Perl Format Strings
8661* php-format::                  PHP Format Strings
8662* gcc-internal-format::         GCC internal Format Strings
8663* qt-format::                   Qt Format Strings
8664* kde-format::                  KDE Format Strings
8665* boost-format::                Boost Format Strings
8666@end menu
8667
8668@node c-format, objc-format, Translators for other Languages, Translators for other Languages
8669@subsection C Format Strings
8670
8671C format strings are described in POSIX (IEEE P1003.1 2001), section
8672XSH 3 fprintf(),
8673@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
8674See also the fprintf() manual page,
8675@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
8676@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
8677
8678Although format strings with positions that reorder arguments, such as
8679
8680@example
8681"Only %2$d bytes free on '%1$s'."
8682@end example
8683
8684@noindent
8685which is semantically equivalent to
8686
8687@example
8688"'%s' has only %d bytes free."
8689@end example
8690
8691@noindent
8692are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
8693on this reordering ability: On the few platforms where @code{printf()},
8694@code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
8695or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
8696activates these replacement functions automatically.
8697
8698@cindex outdigits
8699@cindex Arabic digits
8700As a special feature for Farsi (Persian) and maybe Arabic, translators can
8701insert an @samp{I} flag into numeric format directives.  For example, the
8702translation of @code{"%d"} can be @code{"%Id"}.  The effect of this flag,
8703on systems with GNU @code{libc}, is that in the output, the ASCII digits are
8704replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
8705category.  On other systems, the @code{gettext} function removes this flag,
8706so that it has no effect.
8707
8708Note that the programmer should @emph{not} put this flag into the
8709untranslated string.  (Putting the @samp{I} format directive flag into an
8710@var{msgid} string would lead to undefined behaviour on platforms without
8711glibc when NLS is disabled.)
8712
8713@node objc-format, sh-format, c-format, Translators for other Languages
8714@subsection Objective C Format Strings
8715
8716Objective C format strings are like C format strings.  They support an
8717additional format directive: "$@@", which when executed consumes an argument
8718of type @code{Object *}.
8719
8720@node sh-format, python-format, objc-format, Translators for other Languages
8721@subsection Shell Format Strings
8722
8723Shell format strings, as supported by GNU gettext and the @samp{envsubst}
8724program, are strings with references to shell variables in the form
8725@code{$@var{variable}} or @code{$@{@var{variable}@}}.  References of the form
8726@code{$@{@var{variable}-@var{default}@}},
8727@code{$@{@var{variable}:-@var{default}@}},
8728@code{$@{@var{variable}=@var{default}@}},
8729@code{$@{@var{variable}:=@var{default}@}},
8730@code{$@{@var{variable}+@var{replacement}@}},
8731@code{$@{@var{variable}:+@var{replacement}@}},
8732@code{$@{@var{variable}?@var{ignored}@}},
8733@code{$@{@var{variable}:?@var{ignored}@}},
8734that would be valid inside shell scripts, are not supported.  The
8735@var{variable} names must consist solely of alphanumeric or underscore
8736ASCII characters, not start with a digit and be nonempty; otherwise such
8737a variable reference is ignored.
8738
8739@node python-format, lisp-format, sh-format, Translators for other Languages
8740@subsection Python Format Strings
8741
8742Python format strings are described in
8743@w{Python Library reference} /
8744@w{2. Built-in Types, Exceptions and Functions} /
8745@w{2.2. Built-in Types} /
8746@w{2.2.6. Sequence Types} /
8747@w{2.2.6.2. String Formatting Operations}.
8748@uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}.
8749
8750@node lisp-format, elisp-format, python-format, Translators for other Languages
8751@subsection Lisp Format Strings
8752
8753Lisp format strings are described in the Common Lisp HyperSpec,
8754chapter 22.3 @w{Formatted Output},
8755@uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}.
8756
8757@node elisp-format, librep-format, lisp-format, Translators for other Languages
8758@subsection Emacs Lisp Format Strings
8759
8760Emacs Lisp format strings are documented in the Emacs Lisp reference,
8761section @w{Formatting Strings},
8762@uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
8763Note that as of version 21, XEmacs supports numbered argument specifications
8764in format strings while FSF Emacs doesn't.
8765
8766@node librep-format, scheme-format, elisp-format, Translators for other Languages
8767@subsection librep Format Strings
8768
8769librep format strings are documented in the librep manual, section
8770@w{Formatted Output},
8771@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
8772@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
8773
8774@node scheme-format, smalltalk-format, librep-format, Translators for other Languages
8775@subsection Scheme Format Strings
8776
8777Scheme format strings are documented in the SLIB manual, section
8778@w{Format Specification}.
8779
8780@node smalltalk-format, java-format, scheme-format, Translators for other Languages
8781@subsection Smalltalk Format Strings
8782
8783Smalltalk format strings are described in the GNU Smalltalk documentation,
8784class @code{CharArray}, methods @samp{bindWith:} and
8785@samp{bindWithArguments:}.
8786@uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
8787In summary, a directive starts with @samp{%} and is followed by @samp{%}
8788or a nonzero digit (@samp{1} to @samp{9}).
8789
8790@node java-format, csharp-format, smalltalk-format, Translators for other Languages
8791@subsection Java Format Strings
8792
8793Java format strings are described in the JDK documentation for class
8794@code{java.text.MessageFormat},
8795@uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}.
8796See also the ICU documentation
8797@uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}.
8798
8799@node csharp-format, awk-format, java-format, Translators for other Languages
8800@subsection C# Format Strings
8801
8802C# format strings are described in the .NET documentation for class
8803@code{System.String} and in
8804@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
8805
8806@node awk-format, object-pascal-format, csharp-format, Translators for other Languages
8807@subsection awk Format Strings
8808
8809awk format strings are described in the gawk documentation, section
8810@w{Printf},
8811@uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
8812
8813@node object-pascal-format, ycp-format, awk-format, Translators for other Languages
8814@subsection Object Pascal Format Strings
8815
8816Where is this documented?
8817
8818@node ycp-format, tcl-format, object-pascal-format, Translators for other Languages
8819@subsection YCP Format Strings
8820
8821YCP sformat strings are described in the libycp documentation
8822@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
8823In summary, a directive starts with @samp{%} and is followed by @samp{%}
8824or a nonzero digit (@samp{1} to @samp{9}).
8825
8826@node tcl-format, perl-format, ycp-format, Translators for other Languages
8827@subsection Tcl Format Strings
8828
8829Tcl format strings are described in the @file{format.n} manual page,
8830@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
8831
8832@node perl-format, php-format, tcl-format, Translators for other Languages
8833@subsection Perl Format Strings
8834
8835There are two kinds format strings in Perl: those acceptable to the
8836Perl built-in function @code{printf}, labelled as @samp{perl-format},
8837and those acceptable to the @code{libintl-perl} function @code{__x},
8838labelled as @samp{perl-brace-format}.
8839
8840Perl @code{printf} format strings are described in the @code{sprintf}
8841section of @samp{man perlfunc}.
8842
8843Perl brace format strings are described in the
8844@file{Locale::TextDomain(3pm)} manual page of the CPAN package
8845libintl-perl.  In brief, Perl format uses placeholders put between
8846braces (@samp{@{} and @samp{@}}).  The placeholder must have the syntax
8847of simple identifiers.
8848
8849@node php-format, gcc-internal-format, perl-format, Translators for other Languages
8850@subsection PHP Format Strings
8851
8852PHP format strings are described in the documentation of the PHP function
8853@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
8854@uref{http://www.php.net/manual/en/function.sprintf.php}.
8855
8856@node gcc-internal-format, qt-format, php-format, Translators for other Languages
8857@subsection GCC internal Format Strings
8858
8859These format strings are used inside the GCC sources.  In such a format
8860string, a directive starts with @samp{%}, is optionally followed by a
8861size specifier @samp{l}, an optional flag @samp{+}, another optional flag
8862@samp{#}, and is finished by a specifier: @samp{%} denotes a literal
8863percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
8864@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
8865denote an unsigned integer, @samp{.*s} denotes a string preceded by a
8866width specification, @samp{H} denotes a @samp{location_t *} pointer,
8867@samp{D} denotes a general declaration, @samp{F} denotes a function
8868declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
8869@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
8870denotes a programming language, @samp{O} denotes a binary operator,
8871@samp{P} denotes a function parameter, @samp{Q} denotes an assignment
8872operator, @samp{V} denotes a const/volatile qualifier.
8873
8874@node qt-format, kde-format, gcc-internal-format, Translators for other Languages
8875@subsection Qt Format Strings
8876
8877Qt format strings are described in the documentation of the QString class
8878@uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}.
8879In summary, a directive consists of a @samp{%} followed by a digit. The same
8880directive cannot occur more than once in a format string.
8881
8882@node kde-format, boost-format, qt-format, Translators for other Languages
8883@subsection KDE Format Strings
8884
8885KDE 4 format strings are defined as follows:
8886A directive consists of a @samp{%} followed by a non-zero decimal number.
8887If a @samp{%n} occurs in a format strings, all of @samp{%1}, ..., @samp{%(n-1)}
8888must occur as well, except possibly one of them.
8889
8890@node boost-format,  , kde-format, Translators for other Languages
8891@subsection Boost Format Strings
8892
8893Boost format strings are described in the documentation of the
8894@code{boost::format} class, at
8895@uref{http://www.boost.org/libs/format/doc/format.html}.
8896In summary, a directive has either the same syntax as in a C format string,
8897such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as
8898@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number
8899between percent signs, such as @samp{%1%}.
8900
8901@node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
8902@section The Maintainer's View
8903
8904For the maintainer, the general procedure differs from the C language
8905case in two ways.
8906
8907@itemize @bullet
8908@item
8909For those languages that don't use GNU gettext, the @file{intl/} directory
8910is not needed and can be omitted.  This means that the maintainer calls the
8911@code{gettextize} program without the @samp{--intl} option, and that he
8912invokes the @code{AM_GNU_GETTEXT} autoconf macro via
8913@samp{AM_GNU_GETTEXT([external])}.
8914
8915@item
8916If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
8917variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
8918match the @code{xgettext} options for that particular programming language.
8919If the package uses more than one programming language with @code{gettext}
8920support, it becomes necessary to change the POT file construction rule
8921in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
8922invocation per programming language, each with the options appropriate for
8923that language, and to combine the resulting files using @code{msgcat}.
8924@end itemize
8925
8926@node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages
8927@section Individual Programming Languages
8928
8929@c Here is a list of programming languages, as used for Free Software projects
8930@c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
8931@c are marked with a star.
8932@c   C                       3580     *
8933@c   Perl                    1911     *
8934@c   C++                     1379     *
8935@c   Java                    1200     *
8936@c   PHP                     1051     *
8937@c   Python                   613     *
8938@c   Unix Shell               357     *
8939@c   Tcl                      266     *
8940@c   SQL                      174
8941@c   JavaScript               118
8942@c   Assembly                 108
8943@c   Scheme                    51
8944@c   Ruby                      47
8945@c   Lisp                      45     *
8946@c   Objective C               39     *
8947@c   PL/SQL                    29
8948@c   Fortran                   25
8949@c   Ada                       24
8950@c   Delphi                    22
8951@c   Awk                       19     *
8952@c   Pascal                    19
8953@c   ML                        19
8954@c   Eiffel                    17
8955@c   Emacs-Lisp                14     *
8956@c   Zope                      14
8957@c   ASP                       12
8958@c   Forth                     12
8959@c   Cold Fusion               10
8960@c   Haskell                    9
8961@c   Visual Basic               9
8962@c   C#                         6     *
8963@c   Smalltalk                  6     *
8964@c   Basic                      5
8965@c   Erlang                     5
8966@c   Modula                     5
8967@c   Object Pascal              5     *
8968@c   Rexx                       5
8969@c   Dylan                      4
8970@c   Prolog                     4
8971@c   APL                        3
8972@c   PROGRESS                   2
8973@c   Euler                      1
8974@c   Euphoria                   1
8975@c   Pliant                     1
8976@c   Simula                     1
8977@c   XBasic                     1
8978@c   Logo                       0
8979@c   Other Scripting Engines   49
8980@c   Other                    116
8981
8982@menu
8983* C::                           C, C++, Objective C
8984* sh::                          sh - Shell Script
8985* bash::                        bash - Bourne-Again Shell Script
8986* Python::                      Python
8987* Common Lisp::                 GNU clisp - Common Lisp
8988* clisp C::                     GNU clisp C sources
8989* Emacs Lisp::                  Emacs Lisp
8990* librep::                      librep
8991* Scheme::                      GNU guile - Scheme
8992* Smalltalk::                   GNU Smalltalk
8993* Java::                        Java
8994* C#::                          C#
8995* gawk::                        GNU awk
8996* Pascal::                      Pascal - Free Pascal Compiler
8997* wxWidgets::                   wxWidgets library
8998* YCP::                         YCP - YaST2 scripting language
8999* Tcl::                         Tcl - Tk's scripting language
9000* Perl::                        Perl
9001* PHP::                         PHP Hypertext Preprocessor
9002* Pike::                        Pike
9003* GCC-source::                  GNU Compiler Collection sources
9004@end menu
9005
9006@node C, sh, List of Programming Languages, List of Programming Languages
9007@subsection C, C++, Objective C
9008@cindex C and C-like languages
9009
9010@table @asis
9011@item RPMs
9012gcc, gpp, gobjc, glibc, gettext
9013
9014@item File extension
9015For C: @code{c}, @code{h}.
9016@*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}.
9017@*For Objective C: @code{m}.
9018
9019@item String syntax
9020@code{"abc"}
9021
9022@item gettext shorthand
9023@code{_("abc")}
9024
9025@item gettext/ngettext functions
9026@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
9027@code{dngettext}, @code{dcngettext}
9028
9029@item textdomain
9030@code{textdomain} function
9031
9032@item bindtextdomain
9033@code{bindtextdomain} function
9034
9035@item setlocale
9036Programmer must call @code{setlocale (LC_ALL, "")}
9037
9038@item Prerequisite
9039@code{#include <libintl.h>}
9040@*@code{#include <locale.h>}
9041@*@code{#define _(string) gettext (string)}
9042
9043@item Use or emulate GNU gettext
9044Use
9045
9046@item Extractor
9047@code{xgettext -k_}
9048
9049@item Formatting with positions
9050@code{fprintf "%2$d %1$d"}
9051@*In C++: @code{autosprintf "%2$d %1$d"}
9052(@pxref{Top, , Introduction, autosprintf, GNU autosprintf})
9053
9054@item Portability
9055autoconf (gettext.m4) and #if ENABLE_NLS
9056
9057@item po-mode marking
9058yes
9059@end table
9060
9061The following examples are available in the @file{examples} directory:
9062@code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt}, 
9063@code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets},
9064@code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}.
9065
9066@node sh, bash, C, List of Programming Languages
9067@subsection sh - Shell Script
9068@cindex shell scripts
9069
9070@table @asis
9071@item RPMs
9072bash, gettext
9073
9074@item File extension
9075@code{sh}
9076
9077@item String syntax
9078@code{"abc"}, @code{'abc'}, @code{abc}
9079
9080@item gettext shorthand
9081@code{"`gettext \"abc\"`"}
9082
9083@item gettext/ngettext functions
9084@pindex gettext
9085@pindex ngettext
9086@code{gettext}, @code{ngettext} programs
9087@*@code{eval_gettext}, @code{eval_ngettext} shell functions
9088
9089@item textdomain
9090@vindex TEXTDOMAIN@r{, environment variable}
9091environment variable @code{TEXTDOMAIN}
9092
9093@item bindtextdomain
9094@vindex TEXTDOMAINDIR@r{, environment variable}
9095environment variable @code{TEXTDOMAINDIR}
9096
9097@item setlocale
9098automatic
9099
9100@item Prerequisite
9101@code{. gettext.sh}
9102
9103@item Use or emulate GNU gettext
9104use
9105
9106@item Extractor
9107@code{xgettext}
9108
9109@item Formatting with positions
9110---
9111
9112@item Portability
9113fully portable
9114
9115@item po-mode marking
9116---
9117@end table
9118
9119An example is available in the @file{examples} directory: @code{hello-sh}.
9120
9121@menu
9122* Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
9123* gettext.sh::                  Contents of @code{gettext.sh}
9124* gettext Invocation::          Invoking the @code{gettext} program
9125* ngettext Invocation::         Invoking the @code{ngettext} program
9126* envsubst Invocation::         Invoking the @code{envsubst} program
9127* eval_gettext Invocation::     Invoking the @code{eval_gettext} function
9128* eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
9129@end menu
9130
9131@node Preparing Shell Scripts, gettext.sh, sh, sh
9132@subsubsection Preparing Shell Scripts for Internationalization
9133@cindex preparing shell scripts for translation
9134
9135Preparing a shell script for internationalization is conceptually similar
9136to the steps described in @ref{Sources}.  The concrete steps for shell
9137scripts are as follows.
9138
9139@enumerate
9140@item
9141Insert the line
9142
9143@smallexample
9144. gettext.sh
9145@end smallexample
9146
9147near the top of the script.  @code{gettext.sh} is a shell function library
9148that provides the functions
9149@code{eval_gettext} (see @ref{eval_gettext Invocation}) and
9150@code{eval_ngettext} (see @ref{eval_ngettext Invocation}).
9151You have to ensure that @code{gettext.sh} can be found in the @code{PATH}.
9152
9153@item
9154Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment
9155variables.  Usually @code{TEXTDOMAIN} is the package or program name, and
9156@code{TEXTDOMAINDIR} is the absolute pathname corresponding to
9157@code{$prefix/share/locale}, where @code{$prefix} is the installation location.
9158
9159@smallexample
9160TEXTDOMAIN=@@PACKAGE@@
9161export TEXTDOMAIN
9162TEXTDOMAINDIR=@@LOCALEDIR@@
9163export TEXTDOMAINDIR
9164@end smallexample
9165
9166@item
9167Prepare the strings for translation, as described in @ref{Preparing Strings}.
9168
9169@item
9170Simplify translatable strings so that they don't contain command substitution
9171(@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like
9172@code{$@{@var{variable}-@var{default}@}}), access to positional arguments
9173(like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like
9174@code{$?}). This can always be done through simple local code restructuring.
9175For example,
9176
9177@smallexample
9178echo "Usage: $0 [OPTION] FILE..."
9179@end smallexample
9180
9181becomes
9182
9183@smallexample
9184program_name=$0
9185echo "Usage: $program_name [OPTION] FILE..."
9186@end smallexample
9187
9188Similarly,
9189
9190@smallexample
9191echo "Remaining files: `ls | wc -l`"
9192@end smallexample
9193
9194becomes
9195
9196@smallexample
9197filecount="`ls | wc -l`"
9198echo "Remaining files: $filecount"
9199@end smallexample
9200
9201@item
9202For each translatable string, change the output command @samp{echo} or
9203@samp{$echo} to @samp{gettext} (if the string contains no references to
9204shell variables) or to @samp{eval_gettext} (if it refers to shell variables),
9205followed by a no-argument @samp{echo} command (to account for the terminating
9206newline). Similarly, for cases with plural handling, replace a conditional
9207@samp{echo} command with an invocation of @samp{ngettext} or
9208@samp{eval_ngettext}, followed by a no-argument @samp{echo} command.
9209
9210When doing this, you also need to add an extra backslash before the dollar
9211sign in references to shell variables, so that the @samp{eval_gettext}
9212function receives the translatable string before the variable values are
9213substituted into it. For example,
9214
9215@smallexample
9216echo "Remaining files: $filecount"
9217@end smallexample
9218
9219becomes
9220
9221@smallexample
9222eval_gettext "Remaining files: \$filecount"; echo
9223@end smallexample
9224
9225If the output command is not @samp{echo}, you can make it use @samp{echo}
9226nevertheless, through the use of backquotes. However, note that inside
9227backquotes, backslashes must be doubled to be effective (because the
9228backquoting eats one level of backslashes). For example, assuming that
9229@samp{error} is a shell function that signals an error,
9230
9231@smallexample
9232error "file not found: $filename"
9233@end smallexample
9234
9235is first transformed into
9236
9237@smallexample
9238error "`echo \"file not found: \$filename\"`"
9239@end smallexample
9240
9241which then becomes
9242
9243@smallexample
9244error "`eval_gettext \"file not found: \\\$filename\"`"
9245@end smallexample
9246@end enumerate
9247
9248@node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh
9249@subsubsection Contents of @code{gettext.sh}
9250
9251@code{gettext.sh}, contained in the run-time package of GNU gettext, provides
9252the following:
9253
9254@itemize @bullet
9255@item $echo
9256The variable @code{echo} is set to a command that outputs its first argument
9257and a newline, without interpreting backslashes in the argument string.
9258
9259@item eval_gettext
9260See @ref{eval_gettext Invocation}.
9261
9262@item eval_ngettext
9263See @ref{eval_ngettext Invocation}.
9264@end itemize
9265
9266@node gettext Invocation, ngettext Invocation, gettext.sh, sh
9267@subsubsection Invoking the @code{gettext} program
9268
9269@include rt-gettext.texi
9270
9271@node ngettext Invocation, envsubst Invocation, gettext Invocation, sh
9272@subsubsection Invoking the @code{ngettext} program
9273
9274@include rt-ngettext.texi
9275
9276@node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh
9277@subsubsection Invoking the @code{envsubst} program
9278
9279@include rt-envsubst.texi
9280
9281@node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh
9282@subsubsection Invoking the @code{eval_gettext} function
9283
9284@cindex @code{eval_gettext} function, usage
9285@example
9286eval_gettext @var{msgid}
9287@end example
9288
9289@cindex lookup message translation
9290This function outputs the native language translation of a textual message,
9291performing dollar-substitution on the result.  Note that only shell variables
9292mentioned in @var{msgid} will be dollar-substituted in the result.
9293
9294@node eval_ngettext Invocation,  , eval_gettext Invocation, sh
9295@subsubsection Invoking the @code{eval_ngettext} function
9296
9297@cindex @code{eval_ngettext} function, usage
9298@example
9299eval_ngettext @var{msgid} @var{msgid-plural} @var{count}
9300@end example
9301
9302@cindex lookup plural message translation
9303This function outputs the native language translation of a textual message
9304whose grammatical form depends on a number, performing dollar-substitution
9305on the result.  Note that only shell variables mentioned in @var{msgid} or
9306@var{msgid-plural} will be dollar-substituted in the result.
9307
9308@node bash, Python, sh, List of Programming Languages
9309@subsection bash - Bourne-Again Shell Script
9310@cindex bash
9311
9312GNU @code{bash} 2.0 or newer has a special shorthand for translating a
9313string and substituting variable values in it: @code{$"msgid"}.  But
9314the use of this construct is @strong{discouraged}, due to the security
9315holes it opens and due to its portability problems.
9316
9317The security holes of @code{$"..."} come from the fact that after looking up
9318the translation of the string, @code{bash} processes it like it processes
9319any double-quoted string: dollar and backquote processing, like @samp{eval}
9320does.
9321
9322@enumerate
9323@item
9324In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
9325JOHAB, some double-byte characters have a second byte whose value is
9326@code{0x60}.  For example, the byte sequence @code{\xe0\x60} is a single
9327character in these locales.  Many versions of @code{bash} (all versions
9328up to bash-2.05, and newer versions on platforms without @code{mbsrtowcs()}
9329function) don't know about character boundaries and see a backquote character
9330where there is only a particular Chinese character.  Thus it can start
9331executing part of the translation as a command list.  This situation can occur
9332even without the translator being aware of it: if the translator provides
9333translations in the UTF-8 encoding, it is the @code{gettext()} function which
9334will, during its conversion from the translator's encoding to the user's
9335locale's encoding, produce the dangerous @code{\x60} bytes.
9336
9337@item
9338A translator could - voluntarily or inadvertently - use backquotes
9339@code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations.
9340The enclosed strings would be executed as command lists by the shell.
9341@end enumerate
9342
9343The portability problem is that @code{bash} must be built with
9344internationalization support; this is normally not the case on systems
9345that don't have the @code{gettext()} function in libc.
9346
9347@node Python, Common Lisp, bash, List of Programming Languages
9348@subsection Python
9349@cindex Python
9350
9351@table @asis
9352@item RPMs
9353python
9354
9355@item File extension
9356@code{py}
9357
9358@item String syntax
9359@code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'},
9360@*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"},
9361@*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''},
9362@*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""}
9363
9364@item gettext shorthand
9365@code{_('abc')} etc.
9366
9367@item gettext/ngettext functions
9368@code{gettext.gettext}, @code{gettext.dgettext},
9369@code{gettext.ngettext}, @code{gettext.dngettext},
9370also @code{ugettext}, @code{ungettext}
9371
9372@item textdomain
9373@code{gettext.textdomain} function, or
9374@code{gettext.install(@var{domain})} function
9375
9376@item bindtextdomain
9377@code{gettext.bindtextdomain} function, or
9378@code{gettext.install(@var{domain},@var{localedir})} function
9379
9380@item setlocale
9381not used by the gettext emulation
9382
9383@item Prerequisite
9384@code{import gettext}
9385
9386@item Use or emulate GNU gettext
9387emulate
9388
9389@item Extractor
9390@code{xgettext}
9391
9392@item Formatting with positions
9393@code{'...%(ident)d...' % @{ 'ident': value @}}
9394
9395@item Portability
9396fully portable
9397
9398@item po-mode marking
9399---
9400@end table
9401
9402An example is available in the @file{examples} directory: @code{hello-python}.
9403
9404@node Common Lisp, clisp C, Python, List of Programming Languages
9405@subsection GNU clisp - Common Lisp
9406@cindex Common Lisp
9407@cindex Lisp
9408@cindex clisp
9409
9410@table @asis
9411@item RPMs
9412clisp 2.28 or newer
9413
9414@item File extension
9415@code{lisp}
9416
9417@item String syntax
9418@code{"abc"}
9419
9420@item gettext shorthand
9421@code{(_ "abc")}, @code{(ENGLISH "abc")}
9422
9423@item gettext/ngettext functions
9424@code{i18n:gettext}, @code{i18n:ngettext}
9425
9426@item textdomain
9427@code{i18n:textdomain}
9428
9429@item bindtextdomain
9430@code{i18n:textdomaindir}
9431
9432@item setlocale
9433automatic
9434
9435@item Prerequisite
9436---
9437
9438@item Use or emulate GNU gettext
9439use
9440
9441@item Extractor
9442@code{xgettext -k_ -kENGLISH}
9443
9444@item Formatting with positions
9445@code{format "~1@@*~D ~0@@*~D"}
9446
9447@item Portability
9448On platforms without gettext, no translation.
9449
9450@item po-mode marking
9451---
9452@end table
9453
9454An example is available in the @file{examples} directory: @code{hello-clisp}.
9455
9456@node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages
9457@subsection GNU clisp C sources
9458@cindex clisp C sources
9459
9460@table @asis
9461@item RPMs
9462clisp
9463
9464@item File extension
9465@code{d}
9466
9467@item String syntax
9468@code{"abc"}
9469
9470@item gettext shorthand
9471@code{ENGLISH ? "abc" : ""}
9472@*@code{GETTEXT("abc")}
9473@*@code{GETTEXTL("abc")}
9474
9475@item gettext/ngettext functions
9476@code{clgettext}, @code{clgettextl}
9477
9478@item textdomain
9479---
9480
9481@item bindtextdomain
9482---
9483
9484@item setlocale
9485automatic
9486
9487@item Prerequisite
9488@code{#include "lispbibl.c"}
9489
9490@item Use or emulate GNU gettext
9491use
9492
9493@item Extractor
9494@code{clisp-xgettext}
9495
9496@item Formatting with positions
9497@code{fprintf "%2$d %1$d"}
9498
9499@item Portability
9500On platforms without gettext, no translation.
9501
9502@item po-mode marking
9503---
9504@end table
9505
9506@node Emacs Lisp, librep, clisp C, List of Programming Languages
9507@subsection Emacs Lisp
9508@cindex Emacs Lisp
9509
9510@table @asis
9511@item RPMs
9512emacs, xemacs
9513
9514@item File extension
9515@code{el}
9516
9517@item String syntax
9518@code{"abc"}
9519
9520@item gettext shorthand
9521@code{(_"abc")}
9522
9523@item gettext/ngettext functions
9524@code{gettext}, @code{dgettext} (xemacs only)
9525
9526@item textdomain
9527@code{domain} special form (xemacs only)
9528
9529@item bindtextdomain
9530@code{bind-text-domain} function (xemacs only)
9531
9532@item setlocale
9533automatic
9534
9535@item Prerequisite
9536---
9537
9538@item Use or emulate GNU gettext
9539use
9540
9541@item Extractor
9542@code{xgettext}
9543
9544@item Formatting with positions
9545@code{format "%2$d %1$d"}
9546
9547@item Portability
9548Only XEmacs.  Without @code{I18N3} defined at build time, no translation.
9549
9550@item po-mode marking
9551---
9552@end table
9553
9554@node librep, Scheme, Emacs Lisp, List of Programming Languages
9555@subsection librep
9556@cindex @code{librep} Lisp
9557
9558@table @asis
9559@item RPMs
9560librep 0.15.3 or newer
9561
9562@item File extension
9563@code{jl}
9564
9565@item String syntax
9566@code{"abc"}
9567
9568@item gettext shorthand
9569@code{(_"abc")}
9570
9571@item gettext/ngettext functions
9572@code{gettext}
9573
9574@item textdomain
9575@code{textdomain} function
9576
9577@item bindtextdomain
9578@code{bindtextdomain} function
9579
9580@item setlocale
9581---
9582
9583@item Prerequisite
9584@code{(require 'rep.i18n.gettext)}
9585
9586@item Use or emulate GNU gettext
9587use
9588
9589@item Extractor
9590@code{xgettext}
9591
9592@item Formatting with positions
9593@code{format "%2$d %1$d"}
9594
9595@item Portability
9596On platforms without gettext, no translation.
9597
9598@item po-mode marking
9599---
9600@end table
9601
9602An example is available in the @file{examples} directory: @code{hello-librep}.
9603
9604@node Scheme, Smalltalk, librep, List of Programming Languages
9605@subsection GNU guile - Scheme
9606@cindex Scheme
9607@cindex guile
9608
9609@table @asis
9610@item RPMs
9611guile
9612
9613@item File extension
9614@code{scm}
9615
9616@item String syntax
9617@code{"abc"}
9618
9619@item gettext shorthand
9620@code{(_ "abc")}
9621
9622@item gettext/ngettext functions
9623@code{gettext}, @code{ngettext}
9624
9625@item textdomain
9626@code{textdomain}
9627
9628@item bindtextdomain
9629@code{bindtextdomain}
9630
9631@item setlocale
9632@code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))}
9633
9634@item Prerequisite
9635@code{(use-modules (ice-9 format))}
9636
9637@item Use or emulate GNU gettext
9638use
9639
9640@item Extractor
9641@code{xgettext -k_}
9642
9643@item Formatting with positions
9644@c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))}
9645@c not yet supported
9646---
9647
9648@item Portability
9649On platforms without gettext, no translation.
9650
9651@item po-mode marking
9652---
9653@end table
9654
9655An example is available in the @file{examples} directory: @code{hello-guile}.
9656
9657@node Smalltalk, Java, Scheme, List of Programming Languages
9658@subsection GNU Smalltalk
9659@cindex Smalltalk
9660
9661@table @asis
9662@item RPMs
9663smalltalk
9664
9665@item File extension
9666@code{st}
9667
9668@item String syntax
9669@code{'abc'}
9670
9671@item gettext shorthand
9672@code{NLS ? 'abc'}
9673
9674@item gettext/ngettext functions
9675@code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:}
9676
9677@item textdomain
9678@code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain}
9679object).@*
9680Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: /usr/local/share/locale'}
9681
9682@item bindtextdomain
9683@code{LcMessages>>#domain:localeDirectory:}, see above.
9684
9685@item setlocale
9686Automatic if you use @code{I18N Locale default}.
9687
9688@item Prerequisite
9689@code{PackageLoader fileInPackage: 'I18N'!}
9690
9691@item Use or emulate GNU gettext
9692emulate
9693
9694@item Extractor
9695@code{xgettext}
9696
9697@item Formatting with positions
9698@code{'%1 %2' bindWith: 'Hello' with: 'world'}
9699
9700@item Portability
9701fully portable
9702
9703@item po-mode marking
9704---
9705@end table
9706
9707An example is available in the @file{examples} directory:
9708@code{hello-smalltalk}.
9709
9710@node Java, C#, Smalltalk, List of Programming Languages
9711@subsection Java
9712@cindex Java
9713
9714@table @asis
9715@item RPMs
9716java, java2
9717
9718@item File extension
9719@code{java}
9720
9721@item String syntax
9722"abc"
9723
9724@item gettext shorthand
9725_("abc")
9726
9727@item gettext/ngettext functions
9728@code{GettextResource.gettext}, @code{GettextResource.ngettext},
9729@code{GettextResource.pgettext}, @code{GettextResource.npgettext}
9730
9731@item textdomain
9732---, use @code{ResourceBundle.getResource} instead
9733
9734@item bindtextdomain
9735---, use CLASSPATH instead
9736
9737@item setlocale
9738automatic
9739
9740@item Prerequisite
9741---
9742
9743@item Use or emulate GNU gettext
9744---, uses a Java specific message catalog format
9745
9746@item Extractor
9747@code{xgettext -k_}
9748
9749@item Formatting with positions
9750@code{MessageFormat.format "@{1,number@} @{0,number@}"}
9751
9752@item Portability
9753fully portable
9754
9755@item po-mode marking
9756---
9757@end table
9758
9759Before marking strings as internationalizable, uses of the string
9760concatenation operator need to be converted to @code{MessageFormat}
9761applications.  For example, @code{"file "+filename+" not found"} becomes
9762@code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}.
9763Only after this is done, can the strings be marked and extracted.
9764
9765GNU gettext uses the native Java internationalization mechanism, namely
9766@code{ResourceBundle}s.  There are two formats of @code{ResourceBundle}s:
9767@code{.properties} files and @code{.class} files.  The @code{.properties}
9768format is a text file which the translators can directly edit, like PO
9769files, but which doesn't support plural forms.  Whereas the @code{.class}
9770format is compiled from @code{.java} source code and can support plural
9771forms (provided it is accessed through an appropriate API, see below).
9772
9773To convert a PO file to a @code{.properties} file, the @code{msgcat}
9774program can be used with the option @code{--properties-output}.  To convert
9775a @code{.properties} file back to a PO file, the @code{msgcat} program
9776can be used with the option @code{--properties-input}.  All the tools
9777that manipulate PO files can work with @code{.properties} files as well,
9778if given the @code{--properties-input} and/or @code{--properties-output}
9779option.
9780
9781To convert a PO file to a ResourceBundle class, the @code{msgfmt} program
9782can be used with the option @code{--java} or @code{--java2}.  To convert a
9783ResourceBundle back to a PO file, the @code{msgunfmt} program can be used
9784with the option @code{--java}.
9785
9786Two different programmatic APIs can be used to access ResourceBundles.
9787Note that both APIs work with all kinds of ResourceBundles, whether
9788GNU gettext generated classes, or other @code{.class} or @code{.properties}
9789files.
9790
9791@enumerate
9792@item
9793The @code{java.util.ResourceBundle} API.
9794
9795In particular, its @code{getString} function returns a string translation.
9796Note that a missing translation yields a @code{MissingResourceException}.
9797
9798This has the advantage of being the standard API.  And it does not require
9799any additional libraries, only the @code{msgcat} generated @code{.properties}
9800files or the @code{msgfmt} generated @code{.class} files.  But it cannot do
9801plural handling, even if the resource was generated by @code{msgfmt} from
9802a PO file with plural handling.
9803
9804@item
9805The @code{gnu.gettext.GettextResource} API.
9806
9807Reference documentation in Javadoc 1.1 style format is in the
9808@uref{javadoc2/index.html,javadoc2 directory}.
9809
9810Its @code{gettext} function returns a string translation.  Note that when
9811a translation is missing, the @var{msgid} argument is returned unchanged.
9812
9813This has the advantage of having the @code{ngettext} function for plural
9814handling and the @code{pgettext} and @code{npgettext} for strings constraint
9815to a particular context.
9816
9817@cindex @code{libintl} for Java
9818To use this API, one needs the @code{libintl.jar} file which is part of
9819the GNU gettext package and distributed under the LGPL.
9820@end enumerate
9821
9822Four examples, using the second API, are available in the @file{examples}
9823directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing},
9824@code{hello-java-qtjambi}.
9825
9826Now, to make use of the API and define a shorthand for @samp{getString},
9827there are three idioms that you can choose from:
9828
9829@itemize @bullet
9830@item
9831(This one assumes Java 1.5 or newer.)
9832In a unique class of your project, say @samp{Util}, define a static variable
9833holding the @code{ResourceBundle} instance and the shorthand:
9834
9835@smallexample
9836private static ResourceBundle myResources =
9837  ResourceBundle.getBundle("domain-name");
9838public static String _(String s) @{
9839  return myResources.getString(s);
9840@}
9841@end smallexample
9842
9843All classes containing internationalized strings then contain
9844
9845@smallexample
9846import static Util._;
9847@end smallexample
9848
9849@noindent
9850and the shorthand is used like this:
9851
9852@smallexample
9853System.out.println(_("Operation completed."));
9854@end smallexample
9855
9856@item
9857In a unique class of your project, say @samp{Util}, define a static variable
9858holding the @code{ResourceBundle} instance:
9859
9860@smallexample
9861public static ResourceBundle myResources =
9862  ResourceBundle.getBundle("domain-name");
9863@end smallexample
9864
9865All classes containing internationalized strings then contain
9866
9867@smallexample
9868private static ResourceBundle res = Util.myResources;
9869private static String _(String s) @{ return res.getString(s); @}
9870@end smallexample
9871
9872@noindent
9873and the shorthand is used like this:
9874
9875@smallexample
9876System.out.println(_("Operation completed."));
9877@end smallexample
9878
9879@item
9880You add a class with a very short name, say @samp{S}, containing just the
9881definition of the resource bundle and of the shorthand:
9882
9883@smallexample
9884public class S @{
9885  public static ResourceBundle myResources =
9886    ResourceBundle.getBundle("domain-name");
9887  public static String _(String s) @{
9888    return myResources.getString(s);
9889  @}
9890@}
9891@end smallexample
9892
9893@noindent
9894and the shorthand is used like this:
9895
9896@smallexample
9897System.out.println(S._("Operation completed."));
9898@end smallexample
9899@end itemize
9900
9901Which of the three idioms you choose, will depend on whether your project
9902requires portability to Java versions prior to Java 1.5 and, if so, whether
9903copying two lines of codes into every class is more acceptable in your project
9904than a class with a single-letter name.
9905
9906@node C#, gawk, Java, List of Programming Languages
9907@subsection C#
9908@cindex C#
9909
9910@table @asis
9911@item RPMs
9912pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
9913
9914@item File extension
9915@code{cs}
9916
9917@item String syntax
9918@code{"abc"}, @code{@@"abc"}
9919
9920@item gettext shorthand
9921_("abc")
9922
9923@item gettext/ngettext functions
9924@code{GettextResourceManager.GetString},
9925@code{GettextResourceManager.GetPluralString}
9926@code{GettextResourceManager.GetParticularString}
9927@code{GettextResourceManager.GetParticularPluralString}
9928
9929@item textdomain
9930@code{new GettextResourceManager(domain)}
9931
9932@item bindtextdomain
9933---, compiled message catalogs are located in subdirectories of the directory
9934containing the executable
9935
9936@item setlocale
9937automatic
9938
9939@item Prerequisite
9940---
9941
9942@item Use or emulate GNU gettext
9943---, uses a C# specific message catalog format
9944
9945@item Extractor
9946@code{xgettext -k_}
9947
9948@item Formatting with positions
9949@code{String.Format "@{1@} @{0@}"}
9950
9951@item Portability
9952fully portable
9953
9954@item po-mode marking
9955---
9956@end table
9957
9958Before marking strings as internationalizable, uses of the string
9959concatenation operator need to be converted to @code{String.Format}
9960invocations.  For example, @code{"file "+filename+" not found"} becomes
9961@code{String.Format("file @{0@} not found", filename)}.
9962Only after this is done, can the strings be marked and extracted.
9963
9964GNU gettext uses the native C#/.NET internationalization mechanism, namely
9965the classes @code{ResourceManager} and @code{ResourceSet}.  Applications
9966use the @code{ResourceManager} methods to retrieve the native language
9967translation of strings.  An instance of @code{ResourceSet} is the in-memory
9968representation of a message catalog file.  The @code{ResourceManager} loads
9969and accesses @code{ResourceSet} instances as needed to look up the
9970translations.
9971
9972There are two formats of @code{ResourceSet}s that can be directly loaded by
9973the C# runtime: @code{.resources} files and @code{.dll} files.
9974
9975@itemize @bullet
9976@item
9977The @code{.resources} format is a binary file usually generated through the
9978@code{resgen} or @code{monoresgen} utility, but which doesn't support plural
9979forms.  @code{.resources} files can also be embedded in .NET @code{.exe} files.
9980This only affects whether a file system access is performed to load the message
9981catalog; it doesn't affect the contents of the message catalog.
9982
9983@item
9984On the other hand, the @code{.dll} format is a binary file that is compiled
9985from @code{.cs} source code and can support plural forms (provided it is
9986accessed through the GNU gettext API, see below).
9987@end itemize
9988
9989Note that these .NET @code{.dll} and @code{.exe} files are not tied to a
9990particular platform; their file format and GNU gettext for C# can be used
9991on any platform.
9992
9993To convert a PO file to a @code{.resources} file, the @code{msgfmt} program
9994can be used with the option @samp{--csharp-resources}.  To convert a
9995@code{.resources} file back to a PO file, the @code{msgunfmt} program can be
9996used with the option @samp{--csharp-resources}.  You can also, in some cases,
9997use the @code{resgen} program (from the @code{pnet} package) or the
9998@code{monoresgen} program (from the @code{mono}/@code{mcs} package).  These
9999programs can also convert a @code{.resources} file back to a PO file.  But
10000beware: as of this writing (January 2004), the @code{monoresgen} converter is
10001quite buggy and the @code{resgen} converter ignores the encoding of the PO
10002files.
10003
10004To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be
10005used with the option @code{--csharp}.  The result will be a @code{.dll} file
10006containing a subclass of @code{GettextResourceSet}, which itself is a subclass
10007of @code{ResourceSet}.  To convert a @code{.dll} file containing a
10008@code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt}
10009program can be used with the option @code{--csharp}.
10010
10011The advantages of the @code{.dll} format over the @code{.resources} format
10012are:
10013
10014@enumerate
10015@item
10016Freedom to localize: Users can add their own translations to an application
10017after it has been built and distributed.  Whereas when the programmer uses
10018a @code{ResourceManager} constructor provided by the system, the set of
10019@code{.resources} files for an application must be specified when the
10020application is built and cannot be extended afterwards.
10021@c If this were the only issue with the @code{.resources} format, one could
10022@c use the @code{ResourceManager.CreateFileBasedResourceManager} function.
10023
10024@item
10025Plural handling: A message catalog in @code{.dll} format supports the plural
10026handling function @code{GetPluralString}.  Whereas @code{.resources} files can
10027only contain data and only support lookups that depend on a single string.
10028
10029@item
10030Context handling: A message catalog in @code{.dll} format supports the
10031query-with-context functions @code{GetParticularString} and
10032@code{GetParticularPluralString}.  Whereas @code{.resources} files can
10033only contain data and only support lookups that depend on a single string.
10034
10035@item
10036The @code{GettextResourceManager} that loads the message catalogs in
10037@code{.dll} format also provides for inheritance on a per-message basis.
10038For example, in Austrian (@code{de_AT}) locale, translations from the German
10039(@code{de}) message catalog will be used for messages not found in the
10040Austrian message catalog.  This has the consequence that the Austrian
10041translators need only translate those few messages for which the translation
10042into Austrian differs from the German one.  Whereas when working with
10043@code{.resources} files, each message catalog must provide the translations
10044of all messages by itself.
10045
10046@item
10047The @code{GettextResourceManager} that loads the message catalogs in
10048@code{.dll} format also provides for a fallback: The English @var{msgid} is
10049returned when no translation can be found.  Whereas when working with
10050@code{.resources} files, a language-neutral @code{.resources} file must
10051explicitly be provided as a fallback.
10052@end enumerate
10053
10054On the side of the programmatic APIs, the programmer can use either the
10055standard @code{ResourceManager} API and the GNU @code{GettextResourceManager}
10056API.  The latter is an extension of the former, because
10057@code{GettextResourceManager} is a subclass of @code{ResourceManager}.
10058
10059@enumerate
10060@item
10061The @code{System.Resources.ResourceManager} API.
10062
10063This API works with resources in @code{.resources} format.
10064
10065The creation of the @code{ResourceManager} is done through
10066@smallexample
10067  new ResourceManager(domainname, Assembly.GetExecutingAssembly())
10068@end smallexample
10069@noindent
10070
10071The @code{GetString} function returns a string's translation.  Note that this
10072function returns null when a translation is missing (i.e.@: not even found in
10073the fallback resource file).
10074
10075@item
10076The @code{GNU.Gettext.GettextResourceManager} API.
10077
10078This API works with resources in @code{.dll} format.
10079
10080Reference documentation is in the
10081@uref{csharpdoc/index.html,csharpdoc directory}.
10082
10083The creation of the @code{ResourceManager} is done through
10084@smallexample
10085  new GettextResourceManager(domainname)
10086@end smallexample
10087
10088The @code{GetString} function returns a string's translation.  Note that when
10089a translation is missing, the @var{msgid} argument is returned unchanged.
10090
10091The @code{GetPluralString} function returns a string translation with plural
10092handling, like the @code{ngettext} function in C.
10093
10094The @code{GetParticularString} function returns a string's translation,
10095specific to a particular context, like the @code{pgettext} function in C.
10096Note that when a translation is missing, the @var{msgid} argument is returned
10097unchanged.
10098
10099The @code{GetParticularPluralString} function returns a string translation,
10100specific to a particular context, with plural handling, like the
10101@code{npgettext} function in C.
10102
10103@cindex @code{libintl} for C#
10104To use this API, one needs the @code{GNU.Gettext.dll} file which is part of
10105the GNU gettext package and distributed under the LGPL.
10106@end enumerate
10107
10108You can also mix both approaches: use the
10109@code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use
10110only the @code{ResourceManager} type and only the @code{GetString} method.
10111This is appropriate when you want to profit from the tools for PO files,
10112but don't want to change an existing source code that uses
10113@code{ResourceManager} and don't (yet) need the @code{GetPluralString} method.
10114
10115Two examples, using the second API, are available in the @file{examples}
10116directory: @code{hello-csharp}, @code{hello-csharp-forms}.
10117
10118Now, to make use of the API and define a shorthand for @samp{GetString},
10119there are two idioms that you can choose from:
10120
10121@itemize @bullet
10122@item
10123In a unique class of your project, say @samp{Util}, define a static variable
10124holding the @code{ResourceManager} instance:
10125
10126@smallexample
10127public static GettextResourceManager MyResourceManager =
10128  new GettextResourceManager("domain-name");
10129@end smallexample
10130
10131All classes containing internationalized strings then contain
10132
10133@smallexample
10134private static GettextResourceManager Res = Util.MyResourceManager;
10135private static String _(String s) @{ return Res.GetString(s); @}
10136@end smallexample
10137
10138@noindent
10139and the shorthand is used like this:
10140
10141@smallexample
10142Console.WriteLine(_("Operation completed."));
10143@end smallexample
10144
10145@item
10146You add a class with a very short name, say @samp{S}, containing just the
10147definition of the resource manager and of the shorthand:
10148
10149@smallexample
10150public class S @{
10151  public static GettextResourceManager MyResourceManager =
10152    new GettextResourceManager("domain-name");
10153  public static String _(String s) @{
10154     return MyResourceManager.GetString(s);
10155  @}
10156@}
10157@end smallexample
10158
10159@noindent
10160and the shorthand is used like this:
10161
10162@smallexample
10163Console.WriteLine(S._("Operation completed."));
10164@end smallexample
10165@end itemize
10166
10167Which of the two idioms you choose, will depend on whether copying two lines
10168of codes into every class is more acceptable in your project than a class
10169with a single-letter name.
10170
10171@node gawk, Pascal, C#, List of Programming Languages
10172@subsection GNU awk
10173@cindex awk
10174@cindex gawk
10175
10176@table @asis
10177@item RPMs
10178gawk 3.1 or newer
10179
10180@item File extension
10181@code{awk}
10182
10183@item String syntax
10184@code{"abc"}
10185
10186@item gettext shorthand
10187@code{_"abc"}
10188
10189@item gettext/ngettext functions
10190@code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0
10191
10192@item textdomain
10193@code{TEXTDOMAIN} variable
10194
10195@item bindtextdomain
10196@code{bindtextdomain} function
10197
10198@item setlocale
10199automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0
10200
10201@item Prerequisite
10202---
10203
10204@item Use or emulate GNU gettext
10205use
10206
10207@item Extractor
10208@code{xgettext}
10209
10210@item Formatting with positions
10211@code{printf "%2$d %1$d"} (GNU awk only)
10212
10213@item Portability
10214On platforms without gettext, no translation.  On non-GNU awks, you must
10215define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain}
10216yourself.
10217
10218@item po-mode marking
10219---
10220@end table
10221
10222An example is available in the @file{examples} directory: @code{hello-gawk}.
10223
10224@node Pascal, wxWidgets, gawk, List of Programming Languages
10225@subsection Pascal - Free Pascal Compiler
10226@cindex Pascal
10227@cindex Free Pascal
10228@cindex Object Pascal
10229
10230@table @asis
10231@item RPMs
10232fpk
10233
10234@item File extension
10235@code{pp}, @code{pas}
10236
10237@item String syntax
10238@code{'abc'}
10239
10240@item gettext shorthand
10241automatic
10242
10243@item gettext/ngettext functions
10244---, use @code{ResourceString} data type instead
10245
10246@item textdomain
10247---, use @code{TranslateResourceStrings} function instead
10248
10249@item bindtextdomain
10250---, use @code{TranslateResourceStrings} function instead
10251
10252@item setlocale
10253automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
10254
10255@item Prerequisite
10256@code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;}
10257
10258@item Use or emulate GNU gettext
10259emulate partially
10260
10261@item Extractor
10262@code{ppc386} followed by @code{xgettext} or @code{rstconv}
10263
10264@item Formatting with positions
10265@code{uses sysutils;}@*@code{format "%1:d %0:d"}
10266
10267@item Portability
10268?
10269
10270@item po-mode marking
10271---
10272@end table
10273
10274The Pascal compiler has special support for the @code{ResourceString} data
10275type.  It generates a @code{.rst} file.  This is then converted to a
10276@code{.pot} file by use of @code{xgettext} or @code{rstconv}.  At runtime,
10277a @code{.mo} file corresponding to translations of this @code{.pot} file
10278can be loaded using the @code{TranslateResourceStrings} function in the
10279@code{gettext} unit.
10280
10281An example is available in the @file{examples} directory: @code{hello-pascal}.
10282
10283@node wxWidgets, YCP, Pascal, List of Programming Languages
10284@subsection wxWidgets library
10285@cindex @code{wxWidgets} library
10286
10287@table @asis
10288@item RPMs
10289wxGTK, gettext
10290
10291@item File extension
10292@code{cpp}
10293
10294@item String syntax
10295@code{"abc"}
10296
10297@item gettext shorthand
10298@code{_("abc")}
10299
10300@item gettext/ngettext functions
10301@code{wxLocale::GetString}, @code{wxGetTranslation}
10302
10303@item textdomain
10304@code{wxLocale::AddCatalog}
10305
10306@item bindtextdomain
10307@code{wxLocale::AddCatalogLookupPathPrefix}
10308
10309@item setlocale
10310@code{wxLocale::Init}, @code{wxSetLocale}
10311
10312@item Prerequisite
10313@code{#include <wx/intl.h>}
10314
10315@item Use or emulate GNU gettext
10316emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp}
10317
10318@item Extractor
10319@code{xgettext}
10320
10321@item Formatting with positions
10322wxString::Format supports positions if and only if the system has
10323@code{wprintf()}, @code{vswprintf()} functions and they support positions
10324according to POSIX.
10325
10326@item Portability
10327fully portable
10328
10329@item po-mode marking
10330yes
10331@end table
10332
10333@node YCP, Tcl, wxWidgets, List of Programming Languages
10334@subsection YCP - YaST2 scripting language
10335@cindex YCP
10336@cindex YaST2 scripting language
10337
10338@table @asis
10339@item RPMs
10340libycp, libycp-devel, yast2-core, yast2-core-devel
10341
10342@item File extension
10343@code{ycp}
10344
10345@item String syntax
10346@code{"abc"}
10347
10348@item gettext shorthand
10349@code{_("abc")}
10350
10351@item gettext/ngettext functions
10352@code{_()} with 1 or 3 arguments
10353
10354@item textdomain
10355@code{textdomain} statement
10356
10357@item bindtextdomain
10358---
10359
10360@item setlocale
10361---
10362
10363@item Prerequisite
10364---
10365
10366@item Use or emulate GNU gettext
10367use
10368
10369@item Extractor
10370@code{xgettext}
10371
10372@item Formatting with positions
10373@code{sformat "%2 %1"}
10374
10375@item Portability
10376fully portable
10377
10378@item po-mode marking
10379---
10380@end table
10381
10382An example is available in the @file{examples} directory: @code{hello-ycp}.
10383
10384@node Tcl, Perl, YCP, List of Programming Languages
10385@subsection Tcl - Tk's scripting language
10386@cindex Tcl
10387@cindex Tk's scripting language
10388
10389@table @asis
10390@item RPMs
10391tcl
10392
10393@item File extension
10394@code{tcl}
10395
10396@item String syntax
10397@code{"abc"}
10398
10399@item gettext shorthand
10400@code{[_ "abc"]}
10401
10402@item gettext/ngettext functions
10403@code{::msgcat::mc}
10404
10405@item textdomain
10406---
10407
10408@item bindtextdomain
10409---, use @code{::msgcat::mcload} instead
10410
10411@item setlocale
10412automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
10413
10414@item Prerequisite
10415@code{package require msgcat}
10416@*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}}
10417
10418@item Use or emulate GNU gettext
10419---, uses a Tcl specific message catalog format
10420
10421@item Extractor
10422@code{xgettext -k_}
10423
10424@item Formatting with positions
10425@code{format "%2\$d %1\$d"}
10426
10427@item Portability
10428fully portable
10429
10430@item po-mode marking
10431---
10432@end table
10433
10434Two examples are available in the @file{examples} directory:
10435@code{hello-tcl}, @code{hello-tcl-tk}.
10436
10437Before marking strings as internationalizable, substitutions of variables
10438into the string need to be converted to @code{format} applications.  For
10439example, @code{"file $filename not found"} becomes
10440@code{[format "file %s not found" $filename]}.
10441Only after this is done, can the strings be marked and extracted.
10442After marking, this example becomes
10443@code{[format [_ "file %s not found"] $filename]} or
10444@code{[msgcat::mc "file %s not found" $filename]}.  Note that the
10445@code{msgcat::mc} function implicitly calls @code{format} when more than one
10446argument is given.
10447
10448@node Perl, PHP, Tcl, List of Programming Languages
10449@subsection Perl
10450@cindex Perl
10451
10452@table @asis
10453@item RPMs
10454perl
10455
10456@item File extension
10457@code{pl}, @code{PL}, @code{pm}, @code{cgi}
10458
10459@item String syntax
10460@itemize @bullet
10461
10462@item @code{"abc"}
10463
10464@item @code{'abc'}
10465
10466@item @code{qq (abc)}
10467
10468@item @code{q (abc)}
10469
10470@item @code{qr /abc/}
10471
10472@item @code{qx (/bin/date)}
10473
10474@item @code{/pattern match/}
10475
10476@item @code{?pattern match?}
10477
10478@item @code{s/substitution/operators/}
10479
10480@item @code{$tied_hash@{"message"@}}
10481
10482@item @code{$tied_hash_reference->@{"message"@}}
10483
10484@item etc., issue the command @samp{man perlsyn} for details
10485
10486@end itemize
10487
10488@item gettext shorthand
10489@code{__} (double underscore)
10490
10491@item gettext/ngettext functions
10492@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
10493@code{dngettext}, @code{dcngettext}
10494
10495@item textdomain
10496@code{textdomain} function
10497
10498@item bindtextdomain
10499@code{bindtextdomain} function
10500
10501@item bind_textdomain_codeset 
10502@code{bind_textdomain_codeset} function
10503
10504@item setlocale
10505Use @code{setlocale (LC_ALL, "");}
10506
10507@item Prerequisite
10508@code{use POSIX;}
10509@*@code{use Locale::TextDomain;} (included in the package libintl-perl
10510which is available on the Comprehensive Perl Archive Network CPAN,
10511http://www.cpan.org/).
10512
10513@item Use or emulate GNU gettext
10514platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
10515
10516@item Extractor
10517@code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k}
10518
10519@item Formatting with positions
10520Both kinds of format strings support formatting with positions.
10521@*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer)
10522@*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)}
10523
10524@item Portability
10525The @code{libintl-perl} package is platform independent but is not
10526part of the Perl core.  The programmer is responsible for
10527providing a dummy implementation of the required functions if the 
10528package is not installed on the target system.
10529
10530@item po-mode marking
10531---
10532
10533@item Documentation
10534Included in @code{libintl-perl}, available on CPAN
10535(http://www.cpan.org/).
10536
10537@end table
10538
10539An example is available in the @file{examples} directory: @code{hello-perl}.
10540
10541@cindex marking Perl sources
10542
10543The @code{xgettext} parser backend for Perl differs significantly from
10544the parser backends for other programming languages, just as Perl
10545itself differs significantly from other programming languages.  The
10546Perl parser backend offers many more string marking facilities than
10547the other backends but it also has some Perl specific limitations, the
10548worst probably being its imperfectness.
10549
10550@menu
10551* General Problems::            General Problems Parsing Perl Code
10552* Default Keywords::            Which Keywords Will xgettext Look For?
10553* Special Keywords::            How to Extract Hash Keys
10554* Quote-like Expressions::      What are Strings And Quote-like Expressions?
10555* Interpolation I::             Invalid String Interpolation
10556* Interpolation II::            Valid String Interpolation
10557* Parentheses::                 When To Use Parentheses
10558* Long Lines::                  How To Grok with Long Lines
10559* Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
10560@end menu
10561
10562@node General Problems, Default Keywords,  , Perl
10563@subsubsection General Problems Parsing Perl Code
10564
10565It is often heard that only Perl can parse Perl.  This is not true.
10566Perl cannot be @emph{parsed} at all, it can only be @emph{executed}.
10567Perl has various built-in ambiguities that can only be resolved at runtime.
10568
10569The following example may illustrate one common problem:
10570
10571@example
10572print gettext "Hello World!";
10573@end example
10574
10575Although this example looks like a bullet-proof case of a function
10576invocation, it is not:
10577
10578@example
10579open gettext, ">testfile" or die;
10580print gettext "Hello world!"
10581@end example
10582
10583In this context, the string @code{gettext} looks more like a
10584file handle.  But not necessarily:
10585
10586@example
10587use Locale::Messages qw (:libintl_h);
10588open gettext ">testfile" or die;
10589print gettext "Hello world!";
10590@end example
10591
10592Now, the file is probably syntactically incorrect, provided that the module
10593@code{Locale::Messages} found first in the Perl include path exports a
10594function @code{gettext}.  But what if the module
10595@code{Locale::Messages} really looks like this?
10596
10597@example
10598use vars qw (*gettext);
10599
106001;
10601@end example
10602
10603In this case, the string @code{gettext} will be interpreted as a file
10604handle again, and the above example will create a file @file{testfile}
10605and write the string ``Hello world!'' into it.  Even advanced
10606control flow analysis will not really help:
10607
10608@example
10609if (0.5 < rand) @{
10610   eval "use Sane";
10611@} else @{
10612   eval "use InSane";
10613@}
10614print gettext "Hello world!";
10615@end example
10616
10617If the module @code{Sane} exports a function @code{gettext} that does
10618what we expect, and the module @code{InSane} opens a file for writing
10619and associates the @emph{handle} @code{gettext} with this output
10620stream, we are clueless again about what will happen at runtime.  It is
10621completely unpredictable.  The truth is that Perl has so many ways to
10622fill its symbol table at runtime that it is impossible to interpret a
10623particular piece of code without executing it.
10624
10625Of course, @code{xgettext} will not execute your Perl sources while
10626scanning for translatable strings, but rather use heuristics in order
10627to guess what you meant.
10628
10629Another problem is the ambiguity of the slash and the question mark.
10630Their interpretation depends on the context:
10631
10632@example
10633# A pattern match.
10634print "OK\n" if /foobar/;
10635
10636# A division.
10637print 1 / 2;
10638
10639# Another pattern match.
10640print "OK\n" if ?foobar?;
10641
10642# Conditional.
10643print $x ? "foo" : "bar";
10644@end example
10645
10646The slash may either act as the division operator or introduce a
10647pattern match, whereas the question mark may act as the ternary
10648conditional operator or as a pattern match, too.  Other programming
10649languages like @code{awk} present similar problems, but the consequences of a
10650misinterpretation are particularly nasty with Perl sources.  In @code{awk}
10651for instance, a statement can never exceed one line and the parser
10652can recover from a parsing error at the next newline and interpret
10653the rest of the input stream correctly.  Perl is different, as a
10654pattern match is terminated by the next appearance of the delimiter
10655(the slash or the question mark) in the input stream, regardless of
10656the semantic context.  If a slash is really a division sign but
10657mis-interpreted as a pattern match, the rest of the input file is most
10658probably parsed incorrectly.
10659
10660If you find that @code{xgettext} fails to extract strings from
10661portions of your sources, you should therefore look out for slashes
10662and/or question marks preceding these sections.  You may have come
10663across a bug in @code{xgettext}'s Perl parser (and of course you
10664should report that bug).  In the meantime you should consider to
10665reformulate your code in a manner less challenging to @code{xgettext}.
10666
10667@node Default Keywords, Special Keywords, General Problems, Perl
10668@subsubsection Which keywords will xgettext look for?
10669@cindex Perl default keywords
10670
10671Unless you instruct @code{xgettext} otherwise by invoking it with one
10672of the options @code{--keyword} or @code{-k}, it will recognize the
10673following keywords in your Perl sources:
10674
10675@itemize @bullet
10676
10677@item @code{gettext}
10678
10679@item @code{dgettext}
10680
10681@item @code{dcgettext}
10682
10683@item @code{ngettext:1,2}
10684
10685The first (singular) and the second (plural) argument will be
10686extracted.
10687
10688@item @code{dngettext:1,2}
10689
10690The first (singular) and the second (plural) argument will be
10691extracted.
10692
10693@item @code{dcngettext:1,2}
10694
10695The first (singular) and the second (plural) argument will be
10696extracted.
10697
10698@item @code{gettext_noop}
10699
10700@item @code{%gettext}
10701
10702The keys of lookups into the hash @code{%gettext} will be extracted.
10703
10704@item @code{$gettext}
10705
10706The keys of lookups into the hash reference @code{$gettext} will be extracted.
10707
10708@end itemize
10709
10710@node Special Keywords, Quote-like Expressions, Default Keywords, Perl
10711@subsubsection How to Extract Hash Keys
10712@cindex Perl special keywords for hash-lookups
10713
10714Translating messages at runtime is normally performed by looking up the
10715original string in the translation database and returning the
10716translated version.  The ``natural'' Perl implementation is a hash
10717lookup, and, of course, @code{xgettext} supports such practice.
10718
10719@example
10720print __"Hello world!";
10721print $__@{"Hello world!"@};
10722print $__->@{"Hello world!"@};
10723print $$__@{"Hello world!"@};
10724@end example  
10725
10726The above four lines all do the same thing.  The Perl module 
10727@code{Locale::TextDomain} exports by default a hash @code{%__} that
10728is tied to the function @code{__()}.  It also exports a reference
10729@code{$__} to @code{%__}.
10730
10731If an argument to the @code{xgettext} option @code{--keyword},
10732resp. @code{-k} starts with a percent sign, the rest of the keyword is
10733interpreted as the name of a hash.  If it starts with a dollar
10734sign, the rest of the keyword is interpreted as a reference to a
10735hash.
10736
10737Note that you can omit the quotation marks (single or double) around
10738the hash key (almost) whenever Perl itself allows it:
10739
10740@example
10741print $gettext@{Error@};
10742@end example
10743
10744The exact rule is: You can omit the surrounding quotes, when the hash
10745key is a valid C (!) identifier, i.e.@: when it starts with an
10746underscore or an ASCII letter and is followed by an arbitrary number
10747of underscores, ASCII letters or digits.  Other Unicode characters
10748are @emph{not} allowed, regardless of the @code{use utf8} pragma.
10749
10750@node Quote-like Expressions, Interpolation I, Special Keywords, Perl
10751@subsubsection What are Strings And Quote-like Expressions?
10752@cindex Perl quote-like expressions
10753
10754Perl offers a plethora of different string constructs.  Those that can
10755be used either as arguments to functions or inside braces for hash
10756lookups are generally supported by @code{xgettext}.  
10757
10758@itemize @bullet
10759@item @strong{double-quoted strings}
10760@*
10761@example
10762print gettext "Hello World!";
10763@end example
10764
10765@item @strong{single-quoted strings}
10766@*
10767@example
10768print gettext 'Hello World!';
10769@end example
10770
10771@item @strong{the operator qq}
10772@*
10773@example
10774print gettext qq |Hello World!|;
10775print gettext qq <E-mail: <guido\@@imperia.net>>;
10776@end example
10777
10778The operator @code{qq} is fully supported.  You can use arbitrary
10779delimiters, including the four bracketing delimiters (round, angle,
10780square, curly) that nest.
10781
10782@item @strong{the operator q}
10783@*
10784@example
10785print gettext q |Hello World!|;
10786print gettext q <E-mail: <guido@@imperia.net>>;
10787@end example
10788
10789The operator @code{q} is fully supported.  You can use arbitrary
10790delimiters, including the four bracketing delimiters (round, angle,
10791square, curly) that nest.
10792
10793@item @strong{the operator qx}
10794@*
10795@example
10796print gettext qx ;LANGUAGE=C /bin/date;
10797print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
10798@end example
10799
10800The operator @code{qx} is fully supported.  You can use arbitrary
10801delimiters, including the four bracketing delimiters (round, angle,
10802square, curly) that nest.
10803
10804The example is actually a useless use of @code{gettext}.  It will
10805invoke the @code{gettext} function on the output of the command
10806specified with the @code{qx} operator.  The feature was included
10807in order to make the interface consistent (the parser will extract
10808all strings and quote-like expressions).
10809
10810@item @strong{here documents}
10811@*
10812@example
10813@group
10814print gettext <<'EOF';
10815program not found in $PATH
10816EOF
10817
10818print ngettext <<EOF, <<"EOF";
10819one file deleted
10820EOF
10821several files deleted
10822EOF
10823@end group
10824@end example
10825
10826Here-documents are recognized.  If the delimiter is enclosed in single
10827quotes, the string is not interpolated.  If it is enclosed in double
10828quotes or has no quotes at all, the string is interpolated.
10829
10830Delimiters that start with a digit are not supported!
10831
10832@end itemize
10833
10834@node Interpolation I, Interpolation II, Quote-like Expressions, Perl
10835@subsubsection Invalid Uses Of String Interpolation
10836@cindex Perl invalid string interpolation
10837
10838Perl is capable of interpolating variables into strings.  This offers
10839some nice features in localized programs but can also lead to
10840problems.
10841
10842A common error is a construct like the following:
10843
10844@example
10845print gettext "This is the program $0!\n";
10846@end example
10847
10848Perl will interpolate at runtime the value of the variable @code{$0}
10849into the argument of the @code{gettext()} function.  Hence, this
10850argument is not a string constant but a variable argument (@code{$0}
10851is a global variable that holds the name of the Perl script being
10852executed).  The interpolation is performed by Perl before the string
10853argument is passed to @code{gettext()} and will therefore depend on
10854the name of the script which can only be determined at runtime.
10855Consequently, it is almost impossible that a translation can be looked
10856up at runtime (except if, by accident, the interpolated string is found
10857in the message catalog).
10858
10859The @code{xgettext} program will therefore terminate parsing with a fatal
10860error if it encounters a variable inside of an extracted string.  In
10861general, this will happen for all kinds of string interpolations that
10862cannot be safely performed at compile time.  If you absolutely know
10863what you are doing, you can always circumvent this behavior:
10864
10865@example
10866my $know_what_i_am_doing = "This is program $0!\n";
10867print gettext $know_what_i_am_doing;
10868@end example
10869
10870Since the parser only recognizes strings and quote-like expressions,
10871but not variables or other terms, the above construct will be
10872accepted.  You will have to find another way, however, to let your
10873original string make it into your message catalog.
10874
10875If invoked with the option @code{--extract-all}, resp. @code{-a},
10876variable interpolation will be accepted.  Rationale: You will
10877generally use this option in order to prepare your sources for
10878internationalization.
10879
10880Please see the manual page @samp{man perlop} for details of strings and
10881quote-like expressions that are subject to interpolation and those
10882that are not.  Safe interpolations (that will not lead to a fatal
10883error) are:
10884
10885@itemize @bullet
10886
10887@item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
10888(newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
10889@code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
10890(escape, ESC).
10891
10892@item octal chars, like @code{\033}
10893@*
10894Note that octal escapes in the range of 400-777 are translated into a 
10895UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
10896
10897@item hex chars, like @code{\x1b}
10898
10899@item wide hex chars, like @code{\x@{263a@}}
10900@*
10901Note that this escape is translated into a UTF-8 representation,
10902regardless of the presence of the @code{use utf8} pragma.
10903
10904@item control chars, like @code{\c[} (CTRL-[)
10905
10906@item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
10907@*
10908Note that this escape is translated into a UTF-8 representation,
10909regardless of the presence of the @code{use utf8} pragma.
10910@end itemize
10911
10912The following escapes are considered partially safe:
10913
10914@itemize @bullet
10915
10916@item @code{\l} lowercase next char
10917
10918@item @code{\u} uppercase next char
10919
10920@item @code{\L} lowercase till \E
10921
10922@item @code{\U} uppercase till \E
10923
10924@item @code{\E} end case modification
10925
10926@item @code{\Q} quote non-word characters till \E
10927
10928@end itemize
10929
10930These escapes are only considered safe if the string consists of
10931ASCII characters only.  Translation of characters outside the range
10932defined by ASCII is locale-dependent and can actually only be performed 
10933at runtime; @code{xgettext} doesn't do these locale-dependent translations
10934at extraction time.
10935
10936Except for the modifier @code{\Q}, these translations, albeit valid,
10937are generally useless and only obfuscate your sources.  If a
10938translation can be safely performed at compile time you can just as
10939well write what you mean.
10940
10941@node Interpolation II, Parentheses, Interpolation I, Perl
10942@subsubsection Valid Uses Of String Interpolation
10943@cindex Perl valid string interpolation
10944
10945Perl is often used to generate sources for other programming languages
10946or arbitrary file formats.  Web applications that output HTML code
10947make a prominent example for such usage.
10948
10949You will often come across situations where you want to intersperse
10950code written in the target (programming) language with translatable
10951messages, like in the following HTML example:
10952
10953@example
10954print gettext <<EOF;
10955<h1>My Homepage</h1>
10956<script language="JavaScript"><!--
10957for (i = 0; i < 100; ++i) @{
10958    alert ("Thank you so much for visiting my homepage!");
10959@}
10960//--></script>
10961EOF
10962@end example
10963
10964The parser will extract the entire here document, and it will appear
10965entirely in the resulting PO file, including the JavaScript snippet
10966embedded in the HTML code.  If you exaggerate with constructs like 
10967the above, you will run the risk that the translators of your package 
10968will look out for a less challenging project.  You should consider an 
10969alternative expression here:
10970
10971@example
10972print <<EOF;
10973<h1>$gettext@{"My Homepage"@}</h1>
10974<script language="JavaScript"><!--
10975for (i = 0; i < 100; ++i) @{
10976    alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
10977@}
10978//--></script>
10979EOF
10980@end example
10981
10982Only the translatable portions of the code will be extracted here, and
10983the resulting PO file will begrudgingly improve in terms of readability.
10984
10985You can interpolate hash lookups in all strings or quote-like
10986expressions that are subject to interpolation (see the manual page
10987@samp{man perlop} for details).  Double interpolation is invalid, however:
10988
10989@example
10990# TRANSLATORS: Replace "the earth" with the name of your planet.
10991print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
10992@end example
10993
10994The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in
10995the first place, and checked for invalid variable interpolation.  The
10996dollar sign of hash-dereferencing will therefore terminate the parser 
10997with an ``invalid interpolation'' error.
10998
10999It is valid to interpolate hash lookups in regular expressions:
11000
11001@example
11002if ($var =~ /$gettext@{"the earth"@}/) @{
11003   print gettext "Match!\n";
11004@}
11005s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
11006@end example
11007
11008@node Parentheses, Long Lines, Interpolation II, Perl
11009@subsubsection When To Use Parentheses
11010@cindex Perl parentheses
11011
11012In Perl, parentheses around function arguments are mostly optional.
11013@code{xgettext} will always assume that all
11014recognized keywords (except for hashes and hash references) are names
11015of properly prototyped functions, and will (hopefully) only require
11016parentheses where Perl itself requires them.  All constructs in the
11017following example are therefore ok to use:
11018
11019@example
11020@group
11021print gettext ("Hello World!\n");
11022print gettext "Hello World!\n";
11023print dgettext ($package => "Hello World!\n");
11024print dgettext $package, "Hello World!\n";
11025
11026# The "fat comma" => turns the left-hand side argument into a
11027# single-quoted string!
11028print dgettext smellovision => "Hello World!\n";
11029
11030# The following assignment only works with prototyped functions.
11031# Otherwise, the functions will act as "greedy" list operators and
11032# eat up all following arguments.
11033my $anonymous_hash = @{
11034   planet => gettext "earth",
11035   cakes => ngettext "one cake", "several cakes", $n,
11036   still => $works,
11037@};
11038# The same without fat comma:
11039my $other_hash = @{
11040   'planet', gettext "earth",
11041   'cakes', ngettext "one cake", "several cakes", $n,
11042   'still', $works,
11043@};
11044
11045# Parentheses are only significant for the first argument.
11046print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
11047@end group
11048@end example
11049
11050@node Long Lines, Perl Pitfalls, Parentheses, Perl
11051@subsubsection How To Grok with Long Lines
11052@cindex Perl long lines
11053
11054The necessity of long messages can often lead to a cumbersome or
11055unreadable coding style.  Perl has several options that may prevent
11056you from writing unreadable code, and
11057@code{xgettext} does its best to do likewise.  This is where the dot
11058operator (the string concatenation operator) may come in handy:
11059
11060@example
11061@group
11062print gettext ("This is a very long"
11063               . " message that is still"
11064               . " readable, because"
11065               . " it is split into"
11066               . " multiple lines.\n");
11067@end group
11068@end example
11069
11070Perl is smart enough to concatenate these constant string fragments
11071into one long string at compile time, and so is
11072@code{xgettext}.  You will only find one long message in the resulting
11073POT file.
11074
11075Note that the future Perl 6 will probably use the underscore
11076(@samp{_}) as the string concatenation operator, and the dot 
11077(@samp{.}) for dereferencing.  This new syntax is not yet supported by
11078@code{xgettext}.
11079
11080If embedded newline characters are not an issue, or even desired, you
11081may also insert newline characters inside quoted strings wherever you
11082feel like it:
11083
11084@example
11085@group
11086print gettext ("<em>In HTML output
11087embedded newlines are generally no
11088problem, since adjacent whitespace
11089is always rendered into a single
11090space character.</em>");
11091@end group
11092@end example
11093
11094You may also consider to use here documents:
11095
11096@example
11097@group
11098print gettext <<EOF;
11099<em>In HTML output
11100embedded newlines are generally no
11101problem, since adjacent whitespace
11102is always rendered into a single
11103space character.</em>
11104EOF
11105@end group
11106@end example
11107
11108Please do not forget that the line breaks are real, i.e.@: they
11109translate into newline characters that will consequently show up in
11110the resulting POT file.
11111
11112@node Perl Pitfalls,  , Long Lines, Perl
11113@subsubsection Bugs, Pitfalls, And Things That Do Not Work
11114@cindex Perl pitfalls
11115
11116The foregoing sections should have proven that
11117@code{xgettext} is quite smart in extracting translatable strings from
11118Perl sources.  Yet, some more or less exotic constructs that could be
11119expected to work, actually do not work.  
11120
11121One of the more relevant limitations can be found in the
11122implementation of variable interpolation inside quoted strings.  Only
11123simple hash lookups can be used there:
11124
11125@example
11126print <<EOF;
11127$gettext@{"The dot operator"
11128          . " does not work"
11129          . "here!"@}
11130Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
11131inside quoted strings or quote-like expressions.
11132EOF
11133@end example
11134
11135This is valid Perl code and will actually trigger invocations of the
11136@code{gettext} function at runtime.  Yet, the Perl parser in
11137@code{xgettext} will fail to recognize the strings.  A less obvious
11138example can be found in the interpolation of regular expressions:
11139
11140@example
11141s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
11142@end example
11143
11144The modifier @code{e} will cause the substitution to be interpreted as
11145an evaluable statement.  Consequently, at runtime the function
11146@code{gettext()} is called, but again, the parser fails to extract the
11147string ``Sunday''.  Use a temporary variable as a simple workaround if
11148you really happen to need this feature:
11149
11150@example
11151my $sunday = gettext "Sunday";
11152s/<!--START_OF_WEEK-->/$sunday/;
11153@end example
11154
11155Hash slices would also be handy but are not recognized:
11156
11157@example
11158my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
11159                        'Thursday', 'Friday', 'Saturday'@};
11160# Or even:
11161@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
11162                         Friday Saturday) @};
11163@end example
11164
11165This is perfectly valid usage of the tied hash @code{%gettext} but the
11166strings are not recognized and therefore will not be extracted.
11167
11168Another caveat of the current version is its rudimentary support for
11169non-ASCII characters in identifiers.  You may encounter serious
11170problems if you use identifiers with characters outside the range of
11171'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
11172
11173Maybe some of these missing features will be implemented in future
11174versions, but since you can always make do without them at minimal effort,
11175these todos have very low priority.
11176
11177A nasty problem are brace format strings that already contain braces
11178as part of the normal text, for example the usage strings typically
11179encountered in programs:
11180
11181@example
11182die "usage: $0 @{OPTIONS@} FILENAME...\n";
11183@end example
11184
11185If you want to internationalize this code with Perl brace format strings,
11186you will run into a problem:
11187
11188@example
11189die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0);
11190@end example
11191
11192Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}}
11193is not and should probably be translated. Yet, there is no way to teach
11194the Perl parser in @code{xgettext} to recognize the first one, and leave
11195the other one alone.
11196
11197There are two possible work-arounds for this problem.  If you are
11198sure that your program will run under Perl 5.8.0 or newer (these
11199Perl versions handle positional parameters in @code{printf()}) or
11200if you are sure that the translator will not have to reorder the arguments
11201in her translation -- for example if you have only one brace placeholder
11202in your string, or if it describes a syntax, like in this one --, you can
11203mark the string as @code{no-perl-brace-format} and use @code{printf()}:
11204
11205@example
11206# xgettext: no-perl-brace-format
11207die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0);
11208@end example
11209
11210If you want to use the more portable Perl brace format, you will have to do
11211put placeholders in place of the literal braces:
11212
11213@example
11214die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n",
11215         program => $0, '[' => '@{', ']' => '@}');
11216@end example
11217
11218Perl brace format strings know no escaping mechanism.  No matter how this
11219escaping mechanism looked like, it would either give the programmer a
11220hard time, make translating Perl brace format strings heavy-going, or
11221result in a performance penalty at runtime, when the format directives
11222get executed.  Most of the time you will happily get along with
11223@code{printf()} for this special case.
11224
11225@node PHP, Pike, Perl, List of Programming Languages
11226@subsection PHP Hypertext Preprocessor
11227@cindex PHP
11228
11229@table @asis
11230@item RPMs
11231mod_php4, mod_php4-core, phpdoc
11232
11233@item File extension
11234@code{php}, @code{php3}, @code{php4}
11235
11236@item String syntax
11237@code{"abc"}, @code{'abc'}
11238
11239@item gettext shorthand
11240@code{_("abc")}
11241
11242@item gettext/ngettext functions
11243@code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0
11244also @code{ngettext}, @code{dngettext}, @code{dcngettext}
11245
11246@item textdomain
11247@code{textdomain} function
11248
11249@item bindtextdomain
11250@code{bindtextdomain} function
11251
11252@item setlocale
11253Programmer must call @code{setlocale (LC_ALL, "")}
11254
11255@item Prerequisite
11256---
11257
11258@item Use or emulate GNU gettext
11259use
11260
11261@item Extractor
11262@code{xgettext}
11263
11264@item Formatting with positions
11265@code{printf "%2\$d %1\$d"}
11266
11267@item Portability
11268On platforms without gettext, the functions are not available.
11269
11270@item po-mode marking
11271---
11272@end table
11273
11274An example is available in the @file{examples} directory: @code{hello-php}.
11275
11276@node Pike, GCC-source, PHP, List of Programming Languages
11277@subsection Pike
11278@cindex Pike
11279
11280@table @asis
11281@item RPMs
11282roxen
11283
11284@item File extension
11285@code{pike}
11286
11287@item String syntax
11288@code{"abc"}
11289
11290@item gettext shorthand
11291---
11292
11293@item gettext/ngettext functions
11294@code{gettext}, @code{dgettext}, @code{dcgettext}
11295
11296@item textdomain
11297@code{textdomain} function
11298
11299@item bindtextdomain
11300@code{bindtextdomain} function
11301
11302@item setlocale
11303@code{setlocale} function
11304
11305@item Prerequisite
11306@code{import Locale.Gettext;}
11307
11308@item Use or emulate GNU gettext
11309use
11310
11311@item Extractor
11312---
11313
11314@item Formatting with positions
11315---
11316
11317@item Portability
11318On platforms without gettext, the functions are not available.
11319
11320@item po-mode marking
11321---
11322@end table
11323
11324@node GCC-source,  , Pike, List of Programming Languages
11325@subsection GNU Compiler Collection sources
11326@cindex GCC-source
11327
11328@table @asis
11329@item RPMs
11330gcc
11331
11332@item File extension
11333@code{c}, @code{h}.
11334
11335@item String syntax
11336@code{"abc"}
11337
11338@item gettext shorthand
11339@code{_("abc")}
11340
11341@item gettext/ngettext functions
11342@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
11343@code{dngettext}, @code{dcngettext}
11344
11345@item textdomain
11346@code{textdomain} function
11347
11348@item bindtextdomain
11349@code{bindtextdomain} function
11350
11351@item setlocale
11352Programmer must call @code{setlocale (LC_ALL, "")}
11353
11354@item Prerequisite
11355@code{#include "intl.h"}
11356
11357@item Use or emulate GNU gettext
11358Use
11359
11360@item Extractor
11361@code{xgettext -k_}
11362
11363@item Formatting with positions
11364---
11365
11366@item Portability
11367Uses autoconf macros
11368
11369@item po-mode marking
11370yes
11371@end table
11372
11373@c This is the template for new languages.
11374@ignore
11375
11376@ node
11377@ subsection 
11378
11379@table @asis
11380@item RPMs
11381
11382@item File extension
11383
11384@item String syntax
11385
11386@item gettext shorthand
11387
11388@item gettext/ngettext functions
11389
11390@item textdomain
11391
11392@item bindtextdomain
11393
11394@item setlocale
11395
11396@item Prerequisite
11397
11398@item Use or emulate GNU gettext
11399
11400@item Extractor
11401
11402@item Formatting with positions
11403
11404@item Portability
11405
11406@item po-mode marking
11407@end table
11408
11409@end ignore
11410
11411@node List of Data Formats,  , List of Programming Languages, Programming Languages
11412@section Internationalizable Data
11413
11414Here is a list of other data formats which can be internationalized
11415using GNU gettext.
11416
11417@menu
11418* POT::                         POT - Portable Object Template
11419* RST::                         Resource String Table
11420* Glade::                       Glade - GNOME user interface description
11421@end menu
11422
11423@node POT, RST, List of Data Formats, List of Data Formats
11424@subsection POT - Portable Object Template
11425
11426@table @asis
11427@item RPMs
11428gettext
11429
11430@item File extension
11431@code{pot}, @code{po}
11432
11433@item Extractor
11434@code{xgettext}
11435@end table
11436
11437@node RST, Glade, POT, List of Data Formats
11438@subsection Resource String Table
11439@cindex RST
11440
11441@table @asis
11442@item RPMs
11443fpk
11444
11445@item File extension
11446@code{rst}
11447
11448@item Extractor
11449@code{xgettext}, @code{rstconv}
11450@end table
11451
11452@node Glade,  , RST, List of Data Formats
11453@subsection Glade - GNOME user interface description
11454
11455@table @asis
11456@item RPMs
11457glade, libglade, glade2, libglade2, intltool
11458
11459@item File extension
11460@code{glade}, @code{glade2}
11461
11462@item Extractor
11463@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
11464@end table
11465
11466@c This is the template for new data formats.
11467@ignore
11468
11469@ node
11470@ subsection 
11471
11472@table @asis
11473@item RPMs
11474
11475@item File extension
11476
11477@item Extractor
11478@end table
11479
11480@end ignore
11481
11482@node Conclusion, Language Codes, Programming Languages, Top
11483@chapter Concluding Remarks
11484
11485We would like to conclude this GNU @code{gettext} manual by presenting
11486an history of the Translation Project so far.  We finally give
11487a few pointers for those who want to do further research or readings
11488about Native Language Support matters.
11489
11490@menu
11491* History::                     History of GNU @code{gettext}
11492* References::                  Related Readings
11493@end menu
11494
11495@node History, References, Conclusion, Conclusion
11496@section History of GNU @code{gettext}
11497@cindex history of GNU @code{gettext}
11498
11499Internationalization concerns and algorithms have been informally
11500and casually discussed for years in GNU, sometimes around GNU
11501@code{libc}, maybe around the incoming @code{Hurd}, or otherwise
11502(nobody clearly remembers).  And even then, when the work started for
11503real, this was somewhat independently of these previous discussions.
11504
11505This all began in July 1994, when Patrick D'Cruze had the idea and
11506initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
11507He then asked Jim Meyering, the maintainer, how to get those changes
11508folded into an official release.  That first draft was full of
11509@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
11510nicer ways.  Patrick and Jim shared some tries and experimentations
11511in this area.  Then, feeling that this might eventually have a deeper
11512impact on GNU, Jim wanted to know what standards were, and contacted
11513Richard Stallman, who very quickly and verbally described an overall
11514design for what was meant to become @code{glocale}, at that time.
11515
11516Jim implemented @code{glocale} and got a lot of exhausting feedback
11517from Patrick and Richard, of course, but also from Mitchum DSouza
11518(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
11519MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
11520pulling in various directions, not always compatible, to the extent
11521that after a couple of test releases, @code{glocale} was torn apart.
11522In particular, Paul Eggert -- always keeping an eye on developments
11523in Solaris -- advocated the use of the @code{gettext} API over
11524@code{glocale}'s @code{catgets}-based API.
11525
11526While Jim took some distance and time and became dad for a second
11527time, Roland wanted to get GNU @code{libc} internationalized, and
11528got Ulrich Drepper involved in that project.  Instead of starting
11529from @code{glocale}, Ulrich rewrote something from scratch, but
11530more conforming to the set of guidelines who emerged out of the
11531@code{glocale} effort.  Then, Ulrich got people from the previous
11532forum to involve themselves into this new project, and the switch
11533from @code{glocale} to what was first named @code{msgutils}, renamed
11534@code{nlsutils}, and later @code{gettext}, became officially accepted
11535by Richard in May 1995 or so.
11536
11537Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
11538in April 1995.  The first official release of the package, including
11539PO mode, occurred in July 1995, and was numbered 0.7.  Other people
11540contributed to the effort by providing a discussion forum around
11541Ulrich, writing little pieces of code, or testing.  These are quoted
11542in the @code{THANKS} file which comes with the GNU @code{gettext}
11543distribution.
11544
11545While this was being done, Fran@,{c}ois adapted half a dozen of
11546GNU packages to @code{glocale} first, then later to @code{gettext},
11547putting them in pretest, so providing along the way an effective
11548user environment for fine tuning the evolving tools.  He also took
11549the responsibility of organizing and coordinating the Translation
11550Project.  After nearly a year of informal exchanges between people from
11551many countries, translator teams started to exist in May 1995, through
11552the creation and support by Patrick D'Cruze of twenty unmoderated
11553mailing lists for that many native languages, and two moderated
11554lists: one for reaching all teams at once, the other for reaching
11555all willing maintainers of internationalized free software packages.
11556
11557Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
11558of Greg McGary, as a kind of contribution to Ulrich's package.
11559He also gave a hand with the GNU @code{gettext} Texinfo manual.
11560
11561In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
11562@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
11563
11564In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
11565function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
11566which is the first free C library with full internationalization support.
11567
11568Ulrich being quite busy in his role of General Maintainer of GNU libc,
11569he handed over the GNU @code{gettext} maintenance to Bruno Haible in
115702000.  Bruno added the plural form handling to the tools as well, added
11571support for UTF-8 and CJK locales, and wrote a few new tools for
11572manipulating PO files.
11573
11574@node References,  , History, Conclusion
11575@section Related Readings
11576@cindex related reading
11577@cindex bibliography
11578
11579@strong{ NOTE: } This documentation section is outdated and needs to be
11580revised.
11581
11582Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
11583bibliography on internationalization matters, called
11584@cite{Internationalization Reference List}, which is available as:
11585@example
11586ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
11587@end example
11588
11589Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
11590Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
11591Internationalisation}.  This FAQ discusses writing programs which
11592can handle different language conventions, character sets, etc.;
11593and is applicable to all character set encodings, with particular
11594emphasis on @w{ISO 8859-1}.  It is regularly published in Usenet
11595groups @file{comp.unix.questions}, @file{comp.std.internat},
11596@file{comp.software.international}, @file{comp.lang.c},
11597@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
11598and @file{news.answers}.  The home location of this document is:
11599@example
11600ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
11601@end example
11602
11603Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
11604matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
11605over the responsibility of maintaining it.  It may be found as:
11606@example
11607ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
11608     ...locale-tutorial-0.8.txt.gz
11609@end example
11610@noindent
11611This site is mirrored in:
11612@example
11613ftp://ftp.ibp.fr/pub/linux/sunsite/
11614@end example
11615
11616A French version of the same tutorial should be findable at:
11617@example
11618ftp://ftp.ibp.fr/pub/linux/french/docs/
11619@end example
11620@noindent
11621together with French translations of many Linux-related documents.
11622
11623@node Language Codes, Country Codes, Conclusion, Top
11624@appendix Language Codes
11625@cindex language codes
11626@cindex ISO 639
11627
11628The @w{ISO 639} standard defines two-letter codes for many languages, and
11629three-letter codes for more rarely used languages.
11630All abbreviations for languages used in the Translation Project should
11631come from this standard.
11632
11633@menu
11634* Usual Language Codes::        Two-letter ISO 639 language codes
11635* Rare Language Codes::         Three-letter ISO 639 language codes
11636@end menu
11637
11638@node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes
11639@appendixsec Usual Language Codes
11640
11641For the commonly used languages, the @w{ISO 639-1} standard defines two-letter
11642codes.
11643
11644@table @samp
11645@include iso-639.texi
11646@end table
11647
11648@node Rare Language Codes,  , Usual Language Codes, Language Codes
11649@appendixsec Rare Language Codes
11650
11651For rarely used languages, the @w{ISO 639-2} standard defines three-letter
11652codes.  Here is the current list, reduced to only living languages with at least
11653one million of speakers.
11654
11655@table @samp
11656@include iso-639-2.texi
11657@end table
11658
11659@node Country Codes, Licenses, Language Codes, Top
11660@appendix Country Codes
11661@cindex country codes
11662@cindex ISO 3166
11663
11664The @w{ISO 3166} standard defines two character codes for many countries
11665and territories.  All abbreviations for countries used in the Translation
11666Project should come from this standard.
11667
11668@table @samp
11669@include iso-3166.texi
11670@end table
11671
11672@node Licenses, Program Index, Country Codes, Top
11673@appendix Licenses
11674@cindex Licenses
11675
11676The files of this package are covered by the licenses indicated in each
11677particular file or directory.  Here is a summary:
11678
11679@itemize @bullet
11680@item
11681The @code{libintl} and @code{libasprintf} libraries are covered by the
11682GNU Library General Public License (LGPL).  
11683A copy of the license is included in @ref{GNU LGPL}.
11684
11685@item
11686The executable programs of this package and the @code{libgettextpo} library
11687are covered by the GNU General Public License (GPL).
11688A copy of the license is included in @ref{GNU GPL}.
11689
11690@item
11691This manual is free documentation.  It is dually licensed under the
11692GNU FDL and the GNU GPL.  This means that you can redistribute this
11693manual under either of these two licenses, at your choice.
11694@*
11695This manual is covered by the GNU FDL.  Permission is granted to copy,
11696distribute and/or modify this document under the terms of the
11697GNU Free Documentation License (FDL), either version 1.2 of the
11698License, or (at your option) any later version published by the
11699Free Software Foundation (FSF); with no Invariant Sections, with no
11700Front-Cover Text, and with no Back-Cover Texts.
11701A copy of the license is included in @ref{GNU FDL}.
11702@*
11703This manual is covered by the GNU GPL.  You can redistribute it and/or
11704modify it under the terms of the GNU General Public License (GPL), either
11705version 2 of the License, or (at your option) any later version published
11706by the Free Software Foundation (FSF).
11707A copy of the license is included in @ref{GNU GPL}.
11708@end itemize
11709
11710@menu
11711* GNU GPL::                     GNU General Public License
11712* GNU LGPL::                    GNU Lesser General Public License
11713* GNU FDL::                     GNU Free Documentation License
11714@end menu
11715
11716@page
11717@include gpl.texi
11718@page
11719@include lgpl.texi
11720@page
11721@include fdl.texi
11722
11723@node Program Index, Option Index, Licenses, Top
11724@unnumbered Program Index
11725
11726@printindex pg
11727
11728@node Option Index, Variable Index, Program Index, Top
11729@unnumbered Option Index
11730
11731@printindex op
11732
11733@node Variable Index, PO Mode Index, Option Index, Top
11734@unnumbered Variable Index
11735
11736@printindex vr
11737
11738@node PO Mode Index, Autoconf Macro Index, Variable Index, Top
11739@unnumbered PO Mode Index
11740
11741@printindex em
11742
11743@node Autoconf Macro Index, Index, PO Mode Index, Top
11744@unnumbered Autoconf Macro Index
11745
11746@printindex am
11747
11748@node Index,  , Autoconf Macro Index, Top
11749@unnumbered General Index
11750
11751@printindex cp
11752
11753@iftex
11754@c Table of Contents
11755@contents
11756@end iftex
11757
11758@bye
11759
11760@c Local variables:
11761@c texinfo-column-for-description: 32
11762@c End:
11763