1\input texinfo @c -*-texinfo-*- 2@c %**start of header 3@setfilename gettext.info 4@c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo 5@c for info and html output, but to false in texi2html. 6@ifnottex 7@ifclear texi2html 8@set makeinfo 9@end ifclear 10@end ifnottex 11@c The @documentencoding is needed for makeinfo; texi2html 1.52 12@c doesn't recognize it. 13@ifset makeinfo 14@documentencoding UTF-8 15@end ifset 16@settitle GNU @code{gettext} utilities 17@finalout 18@c Indices: 19@c am = autoconf macro @amindex 20@c cp = concept @cindex 21@c ef = emacs function @efindex 22@c em = emacs mode @emindex 23@c ev = emacs variable @evindex 24@c fn = function @findex 25@c kw = keyword @kwindex 26@c op = option @opindex 27@c pg = program @pindex 28@c vr = variable @vindex 29@c Unused predefined indices: 30@c tp = type @tindex 31@c ky = keystroke @kindex 32@defcodeindex am 33@defcodeindex ef 34@defindex em 35@defcodeindex ev 36@defcodeindex kw 37@defcodeindex op 38@syncodeindex ef em 39@syncodeindex ev em 40@syncodeindex fn cp 41@syncodeindex kw cp 42@ifclear texi2html 43@firstparagraphindent insert 44@end ifclear 45@c %**end of header 46 47@include version.texi 48 49@ifinfo 50@dircategory GNU Gettext Utilities 51@direntry 52* gettext: (gettext). GNU gettext utilities. 53* autopoint: (gettext)autopoint Invocation. Copy gettext infrastructure. 54* envsubst: (gettext)envsubst Invocation. Expand environment variables. 55* gettextize: (gettext)gettextize Invocation. Prepare a package for gettext. 56* msgattrib: (gettext)msgattrib Invocation. Select part of a PO file. 57* msgcat: (gettext)msgcat Invocation. Combine several PO files. 58* msgcmp: (gettext)msgcmp Invocation. Compare a PO file and template. 59* msgcomm: (gettext)msgcomm Invocation. Match two PO files. 60* msgconv: (gettext)msgconv Invocation. Convert PO file to encoding. 61* msgen: (gettext)msgen Invocation. Create an English PO file. 62* msgexec: (gettext)msgexec Invocation. Process a PO file. 63* msgfilter: (gettext)msgfilter Invocation. Pipe a PO file through a filter. 64* msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files. 65* msggrep: (gettext)msggrep Invocation. Select part of a PO file. 66* msginit: (gettext)msginit Invocation. Create a fresh PO file. 67* msgmerge: (gettext)msgmerge Invocation. Update a PO file from template. 68* msgunfmt: (gettext)msgunfmt Invocation. Uncompile MO file into PO file. 69* msguniq: (gettext)msguniq Invocation. Unify duplicates for PO file. 70* ngettext: (gettext)ngettext Invocation. Translate a message with plural. 71* xgettext: (gettext)xgettext Invocation. Extract strings into a PO file. 72* ISO639: (gettext)Language Codes. ISO 639 language codes. 73* ISO3166: (gettext)Country Codes. ISO 3166 country codes. 74@end direntry 75@end ifinfo 76 77@ifinfo 78This file provides documentation for GNU @code{gettext} utilities. 79It also serves as a reference for the free Translation Project. 80 81@copying 82Copyright (C) 1995-1998, 2001-2007 Free Software Foundation, Inc. 83 84This manual is free documentation. It is dually licensed under the 85GNU FDL and the GNU GPL. This means that you can redistribute this 86manual under either of these two licenses, at your choice. 87 88This manual is covered by the GNU FDL. Permission is granted to copy, 89distribute and/or modify this document under the terms of the 90GNU Free Documentation License (FDL), either version 1.2 of the 91License, or (at your option) any later version published by the 92Free Software Foundation (FSF); with no Invariant Sections, with no 93Front-Cover Text, and with no Back-Cover Texts. 94A copy of the license is included in @ref{GNU FDL}. 95 96This manual is covered by the GNU GPL. You can redistribute it and/or 97modify it under the terms of the GNU General Public License (GPL), either 98version 2 of the License, or (at your option) any later version published 99by the Free Software Foundation (FSF). 100A copy of the license is included in @ref{GNU GPL}. 101@end copying 102@end ifinfo 103 104@titlepage 105@title GNU gettext tools, version @value{VERSION} 106@subtitle Native Language Support Library and Tools 107@subtitle Edition @value{EDITION}, @value{UPDATED} 108@author Ulrich Drepper 109@author Jim Meyering 110@author Fran@,{c}ois Pinard 111@author Bruno Haible 112 113@ifnothtml 114@page 115@vskip 0pt plus 1filll 116@c @insertcopying 117Copyright (C) 1995-1998, 2001-2007 Free Software Foundation, Inc. 118 119This manual is free documentation. It is dually licensed under the 120GNU FDL and the GNU GPL. This means that you can redistribute this 121manual under either of these two licenses, at your choice. 122 123This manual is covered by the GNU FDL. Permission is granted to copy, 124distribute and/or modify this document under the terms of the 125GNU Free Documentation License (FDL), either version 1.2 of the 126License, or (at your option) any later version published by the 127Free Software Foundation (FSF); with no Invariant Sections, with no 128Front-Cover Text, and with no Back-Cover Texts. 129A copy of the license is included in @ref{GNU FDL}. 130 131This manual is covered by the GNU GPL. You can redistribute it and/or 132modify it under the terms of the GNU General Public License (GPL), either 133version 2 of the License, or (at your option) any later version published 134by the Free Software Foundation (FSF). 135A copy of the license is included in @ref{GNU GPL}. 136@end ifnothtml 137@end titlepage 138 139@ifnottex 140@c Table of Contents 141@contents 142@end ifnottex 143 144@ifset makeinfo 145@node Top, Introduction, (dir), (dir) 146@top GNU @code{gettext} utilities 147 148This manual documents the GNU gettext tools and the GNU libintl library, 149version @value{VERSION}. 150 151@menu 152* Introduction:: Introduction 153* Users:: The User's View 154* PO Files:: The Format of PO Files 155* Sources:: Preparing Program Sources 156* Template:: Making the PO Template File 157* Creating:: Creating a New PO File 158* Updating:: Updating Existing PO Files 159* Editing:: Editing PO Files 160* Manipulating:: Manipulating PO Files 161* Binaries:: Producing Binary MO Files 162* Programmers:: The Programmer's View 163* Translators:: The Translator's View 164* Maintainers:: The Maintainer's View 165* Installers:: The Installer's and Distributor's View 166* Programming Languages:: Other Programming Languages 167* Conclusion:: Concluding Remarks 168 169* Language Codes:: ISO 639 language codes 170* Country Codes:: ISO 3166 country codes 171* Licenses:: Licenses 172 173* Program Index:: Index of Programs 174* Option Index:: Index of Command-Line Options 175* Variable Index:: Index of Environment Variables 176* PO Mode Index:: Index of Emacs PO Mode Commands 177* Autoconf Macro Index:: Index of Autoconf Macros 178* Index:: General Index 179 180@detailmenu 181 --- The Detailed Node Listing --- 182 183Introduction 184 185* Why:: The Purpose of GNU @code{gettext} 186* Concepts:: I18n, L10n, and Such 187* Aspects:: Aspects in Native Language Support 188* Files:: Files Conveying Translations 189* Overview:: Overview of GNU @code{gettext} 190 191The User's View 192 193* System Installation:: Questions During Operating System Installation 194* Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs 195* Setting the POSIX Locale:: How to Specify the Locale According to POSIX 196* Installing Localizations:: How to Install Additional Translations 197 198Setting the POSIX Locale 199 200* Locale Names:: How a Locale Specification Looks Like 201* Locale Environment Variables:: Which Environment Variable Specfies What 202* The LANGUAGE variable:: How to Specify a Priority List of Languages 203 204Preparing Program Sources 205 206* Importing:: Importing the @code{gettext} declaration 207* Triggering:: Triggering @code{gettext} Operations 208* Preparing Strings:: Preparing Translatable Strings 209* Mark Keywords:: How Marks Appear in Sources 210* Marking:: Marking Translatable Strings 211* c-format Flag:: Telling something about the following string 212* Special cases:: Special Cases of Translatable Strings 213* Bug Report Address:: Letting Users Report Translation Bugs 214* Names:: Marking Proper Names for Translation 215* Libraries:: Preparing Library Sources 216 217Making the PO Template File 218 219* xgettext Invocation:: Invoking the @code{xgettext} Program 220 221Creating a New PO File 222 223* msginit Invocation:: Invoking the @code{msginit} Program 224* Header Entry:: Filling in the Header Entry 225 226Updating Existing PO Files 227 228* msgmerge Invocation:: Invoking the @code{msgmerge} Program 229 230Editing PO Files 231 232* KBabel:: KDE's PO File Editor 233* Gtranslator:: GNOME's PO File Editor 234* PO Mode:: Emacs's PO File Editor 235* Compendium:: Using Translation Compendia 236 237Emacs's PO File Editor 238 239* Installation:: Completing GNU @code{gettext} Installation 240* Main PO Commands:: Main Commands 241* Entry Positioning:: Entry Positioning 242* Normalizing:: Normalizing Strings in Entries 243* Translated Entries:: Translated Entries 244* Fuzzy Entries:: Fuzzy Entries 245* Untranslated Entries:: Untranslated Entries 246* Obsolete Entries:: Obsolete Entries 247* Modifying Translations:: Modifying Translations 248* Modifying Comments:: Modifying Comments 249* Subedit:: Mode for Editing Translations 250* C Sources Context:: C Sources Context 251* Auxiliary:: Consulting Auxiliary PO Files 252 253Using Translation Compendia 254 255* Creating Compendia:: Merging translations for later use 256* Using Compendia:: Using older translations if they fit 257 258Manipulating PO Files 259 260* msgcat Invocation:: Invoking the @code{msgcat} Program 261* msgconv Invocation:: Invoking the @code{msgconv} Program 262* msggrep Invocation:: Invoking the @code{msggrep} Program 263* msgfilter Invocation:: Invoking the @code{msgfilter} Program 264* msguniq Invocation:: Invoking the @code{msguniq} Program 265* msgcomm Invocation:: Invoking the @code{msgcomm} Program 266* msgcmp Invocation:: Invoking the @code{msgcmp} Program 267* msgattrib Invocation:: Invoking the @code{msgattrib} Program 268* msgen Invocation:: Invoking the @code{msgen} Program 269* msgexec Invocation:: Invoking the @code{msgexec} Program 270* Colorizing:: Highlighting parts of PO files 271* libgettextpo:: Writing your own programs that process PO files 272 273Highlighting parts of PO files 274 275* The --color option:: Triggering colorized output 276* The TERM variable:: The environment variable @code{TERM} 277* The --style option:: The @code{--style} option 278* Style rules:: Style rules for PO files 279* Customizing less:: Customizing @code{less} for viewing PO files 280 281Producing Binary MO Files 282 283* msgfmt Invocation:: Invoking the @code{msgfmt} Program 284* msgunfmt Invocation:: Invoking the @code{msgunfmt} Program 285* MO Files:: The Format of GNU MO Files 286 287The Programmer's View 288 289* catgets:: About @code{catgets} 290* gettext:: About @code{gettext} 291* Comparison:: Comparing the two interfaces 292* Using libintl.a:: Using libintl.a in own programs 293* gettext grok:: Being a @code{gettext} grok 294* Temp Programmers:: Temporary Notes for the Programmers Chapter 295 296About @code{catgets} 297 298* Interface to catgets:: The interface 299* Problems with catgets:: Problems with the @code{catgets} interface?! 300 301About @code{gettext} 302 303* Interface to gettext:: The interface 304* Ambiguities:: Solving ambiguities 305* Locating Catalogs:: Locating message catalog files 306* Charset conversion:: How to request conversion to Unicode 307* Contexts:: Solving ambiguities in GUI programs 308* Plural forms:: Additional functions for handling plurals 309* Optimized gettext:: Optimization of the *gettext functions 310 311Temporary Notes for the Programmers Chapter 312 313* Temp Implementations:: Temporary - Two Possible Implementations 314* Temp catgets:: Temporary - About @code{catgets} 315* Temp WSI:: Temporary - Why a single implementation 316* Temp Notes:: Temporary - Notes 317 318The Translator's View 319 320* Trans Intro 0:: Introduction 0 321* Trans Intro 1:: Introduction 1 322* Discussions:: Discussions 323* Organization:: Organization 324* Information Flow:: Information Flow 325* Prioritizing messages:: How to find which messages to translate first 326 327Organization 328 329* Central Coordination:: Central Coordination 330* National Teams:: National Teams 331* Mailing Lists:: Mailing Lists 332 333National Teams 334 335* Sub-Cultures:: Sub-Cultures 336* Organizational Ideas:: Organizational Ideas 337 338The Maintainer's View 339 340* Flat and Non-Flat:: Flat or Non-Flat Directory Structures 341* Prerequisites:: Prerequisite Works 342* gettextize Invocation:: Invoking the @code{gettextize} Program 343* Adjusting Files:: Files You Must Create or Alter 344* autoconf macros:: Autoconf macros for use in @file{configure.ac} 345* CVS Issues:: Integrating with CVS 346* Release Management:: Creating a Distribution Tarball 347 348Files You Must Create or Alter 349 350* po/POTFILES.in:: @file{POTFILES.in} in @file{po/} 351* po/LINGUAS:: @file{LINGUAS} in @file{po/} 352* po/Makevars:: @file{Makevars} in @file{po/} 353* po/Rules-*:: Extending @file{Makefile} in @file{po/} 354* configure.ac:: @file{configure.ac} at top level 355* config.guess:: @file{config.guess}, @file{config.sub} at top level 356* mkinstalldirs:: @file{mkinstalldirs} at top level 357* aclocal:: @file{aclocal.m4} at top level 358* acconfig:: @file{acconfig.h} at top level 359* config.h.in:: @file{config.h.in} at top level 360* Makefile:: @file{Makefile.in} at top level 361* src/Makefile:: @file{Makefile.in} in @file{src/} 362* lib/gettext.h:: @file{gettext.h} in @file{lib/} 363 364Autoconf macros for use in @file{configure.ac} 365 366* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4} 367* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 368* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4} 369* AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 370* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4} 371* AM_ICONV:: AM_ICONV in @file{iconv.m4} 372 373Integrating with CVS 374 375* Distributed CVS:: Avoiding version mismatch in distributed development 376* Files under CVS:: Files to put under CVS version control 377* autopoint Invocation:: Invoking the @code{autopoint} Program 378 379Other Programming Languages 380 381* Language Implementors:: The Language Implementor's View 382* Programmers for other Languages:: The Programmer's View 383* Translators for other Languages:: The Translator's View 384* Maintainers for other Languages:: The Maintainer's View 385* List of Programming Languages:: Individual Programming Languages 386* List of Data Formats:: Internationalizable Data 387 388The Translator's View 389 390* c-format:: C Format Strings 391* objc-format:: Objective C Format Strings 392* sh-format:: Shell Format Strings 393* python-format:: Python Format Strings 394* lisp-format:: Lisp Format Strings 395* elisp-format:: Emacs Lisp Format Strings 396* librep-format:: librep Format Strings 397* scheme-format:: Scheme Format Strings 398* smalltalk-format:: Smalltalk Format Strings 399* java-format:: Java Format Strings 400* csharp-format:: C# Format Strings 401* awk-format:: awk Format Strings 402* object-pascal-format:: Object Pascal Format Strings 403* ycp-format:: YCP Format Strings 404* tcl-format:: Tcl Format Strings 405* perl-format:: Perl Format Strings 406* php-format:: PHP Format Strings 407* gcc-internal-format:: GCC internal Format Strings 408* qt-format:: Qt Format Strings 409* kde-format:: KDE Format Strings 410* boost-format:: Boost Format Strings 411 412Individual Programming Languages 413 414* C:: C, C++, Objective C 415* sh:: sh - Shell Script 416* bash:: bash - Bourne-Again Shell Script 417* Python:: Python 418* Common Lisp:: GNU clisp - Common Lisp 419* clisp C:: GNU clisp C sources 420* Emacs Lisp:: Emacs Lisp 421* librep:: librep 422* Scheme:: GNU guile - Scheme 423* Smalltalk:: GNU Smalltalk 424* Java:: Java 425* C#:: C# 426* gawk:: GNU awk 427* Pascal:: Pascal - Free Pascal Compiler 428* wxWidgets:: wxWidgets library 429* YCP:: YCP - YaST2 scripting language 430* Tcl:: Tcl - Tk's scripting language 431* Perl:: Perl 432* PHP:: PHP Hypertext Preprocessor 433* Pike:: Pike 434* GCC-source:: GNU Compiler Collection sources 435 436sh - Shell Script 437 438* Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization 439* gettext.sh:: Contents of @code{gettext.sh} 440* gettext Invocation:: Invoking the @code{gettext} program 441* ngettext Invocation:: Invoking the @code{ngettext} program 442* envsubst Invocation:: Invoking the @code{envsubst} program 443* eval_gettext Invocation:: Invoking the @code{eval_gettext} function 444* eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function 445 446Perl 447 448* General Problems:: General Problems Parsing Perl Code 449* Default Keywords:: Which Keywords Will xgettext Look For? 450* Special Keywords:: How to Extract Hash Keys 451* Quote-like Expressions:: What are Strings And Quote-like Expressions? 452* Interpolation I:: Invalid String Interpolation 453* Interpolation II:: Valid String Interpolation 454* Parentheses:: When To Use Parentheses 455* Long Lines:: How To Grok with Long Lines 456* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work 457 458Internationalizable Data 459 460* POT:: POT - Portable Object Template 461* RST:: Resource String Table 462* Glade:: Glade - GNOME user interface description 463 464Concluding Remarks 465 466* History:: History of GNU @code{gettext} 467* References:: Related Readings 468 469Language Codes 470 471* Usual Language Codes:: Two-letter ISO 639 language codes 472* Rare Language Codes:: Three-letter ISO 639 language codes 473 474Licenses 475 476* GNU GPL:: GNU General Public License 477* GNU LGPL:: GNU Lesser General Public License 478* GNU FDL:: GNU Free Documentation License 479 480@end detailmenu 481@end menu 482 483@end ifset 484 485@node Introduction, Users, Top, Top 486@chapter Introduction 487 488This chapter explains the goals sought in the creation 489of GNU @code{gettext} and the free Translation Project. 490Then, it explains a few broad concepts around 491Native Language Support, and positions message translation with regard 492to other aspects of national and cultural variance, as they apply 493to programs. It also surveys those files used to convey the 494translations. It explains how the various tools interact in the 495initial generation of these files, and later, how the maintenance 496cycle should usually operate. 497 498@cindex sex 499@cindex he, she, and they 500@cindex she, he, and they 501In this manual, we use @emph{he} when speaking of the programmer or 502maintainer, @emph{she} when speaking of the translator, and @emph{they} 503when speaking of the installers or end users of the translated program. 504This is only a convenience for clarifying the documentation. It is 505@emph{absolutely} not meant to imply that some roles are more appropriate 506to males or females. Besides, as you might guess, GNU @code{gettext} 507is meant to be useful for people using computers, whatever their sex, 508race, religion or nationality! 509 510@cindex bug report address 511Please send suggestions and corrections to: 512 513@example 514@group 515@r{Internet address:} 516 bug-gnu-gettext@@gnu.org 517@end group 518@end example 519 520@noindent 521Please include the manual's edition number and update date in your messages. 522 523@menu 524* Why:: The Purpose of GNU @code{gettext} 525* Concepts:: I18n, L10n, and Such 526* Aspects:: Aspects in Native Language Support 527* Files:: Files Conveying Translations 528* Overview:: Overview of GNU @code{gettext} 529@end menu 530 531@node Why, Concepts, Introduction, Introduction 532@section The Purpose of GNU @code{gettext} 533 534Usually, programs are written and documented in English, and use 535English at execution time to interact with users. This is true 536not only of GNU software, but also of a great deal of proprietary 537and free software. Using a common language is quite handy for 538communication between developers, maintainers and users from all 539countries. On the other hand, most people are less comfortable with 540English than with their own native language, and would prefer to 541use their mother tongue for day to day's work, as far as possible. 542Many would simply @emph{love} to see their computer screen showing 543a lot less of English, and far more of their own language. 544 545@cindex Translation Project 546However, to many people, this dream might appear so far fetched that 547they may believe it is not even worth spending time thinking about 548it. They have no confidence at all that the dream might ever 549become true. Yet some have not lost hope, and have organized themselves. 550The Translation Project is a formalization of this hope into a 551workable structure, which has a good chance to get all of us nearer 552the achievement of a truly multi-lingual set of programs. 553 554GNU @code{gettext} is an important step for the Translation Project, 555as it is an asset on which we may build many other steps. This package 556offers to programmers, translators and even users, a well integrated 557set of tools and documentation. Specifically, the GNU @code{gettext} 558utilities are a set of tools that provides a framework within which 559other free packages may produce multi-lingual messages. These tools 560include 561 562@itemize @bullet 563@item 564A set of conventions about how programs should be written to support 565message catalogs. 566 567@item 568A directory and file naming organization for the message catalogs 569themselves. 570 571@item 572A runtime library supporting the retrieval of translated messages. 573 574@item 575A few stand-alone programs to massage in various ways the sets of 576translatable strings, or already translated strings. 577 578@item 579A library supporting the parsing and creation of files containing 580translated messages. 581 582@item 583A special mode for Emacs@footnote{In this manual, all mentions of Emacs 584refers to either GNU Emacs or to XEmacs, which people sometimes call FSF 585Emacs and Lucid Emacs, respectively.} which helps preparing these sets 586and bringing them up to date. 587@end itemize 588 589GNU @code{gettext} is designed to minimize the impact of 590internationalization on program sources, keeping this impact as small 591and hardly noticeable as possible. Internationalization has better 592chances of succeeding if it is very light weighted, or at least, 593appear to be so, when looking at program sources. 594 595The Translation Project also uses the GNU @code{gettext} distribution 596as a vehicle for documenting its structure and methods. This goes 597beyond the strict technicalities of documenting the GNU @code{gettext} 598proper. By so doing, translators will find in a single place, as 599far as possible, all they need to know for properly doing their 600translating work. Also, this supplemental documentation might also 601help programmers, and even curious users, in understanding how GNU 602@code{gettext} is related to the remainder of the Translation 603Project, and consequently, have a glimpse at the @emph{big picture}. 604 605@node Concepts, Aspects, Why, Introduction 606@section I18n, L10n, and Such 607 608@cindex i18n 609@cindex l10n 610Two long words appear all the time when we discuss support of native 611language in programs, and these words have a precise meaning, worth 612being explained here, once and for all in this document. The words are 613@emph{internationalization} and @emph{localization}. Many people, 614tired of writing these long words over and over again, took the 615habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first 616and last letter of each word, and replacing the run of intermediate 617letters by a number merely telling how many such letters there are. 618But in this manual, in the sake of clarity, we will patiently write 619the names in full, each time@dots{} 620 621@cindex internationalization 622By @dfn{internationalization}, one refers to the operation by which a 623program, or a set of programs turned into a package, is made aware of and 624able to support multiple languages. This is a generalization process, 625by which the programs are untied from calling only English strings or 626other English specific habits, and connected to generic ways of doing 627the same, instead. Program developers may use various techniques to 628internationalize their programs. Some of these have been standardized. 629GNU @code{gettext} offers one of these standards. @xref{Programmers}. 630 631@cindex localization 632By @dfn{localization}, one means the operation by which, in a set 633of programs already internationalized, one gives the program all 634needed information so that it can adapt itself to handle its input 635and output in a fashion which is correct for some native language and 636cultural habits. This is a particularisation process, by which generic 637methods already implemented in an internationalized program are used 638in specific ways. The programming environment puts several functions 639to the programmers disposal which allow this runtime configuration. 640The formal description of specific set of cultural habits for some 641country, together with all associated translations targeted to the 642same native language, is called the @dfn{locale} for this language 643or country. Users achieve localization of programs by setting proper 644values to special environment variables, prior to executing those 645programs, identifying which locale should be used. 646 647In fact, locale message support is only one component of the cultural 648data that makes up a particular locale. There are a whole host of 649routines and functions provided to aid programmers in developing 650internationalized software and which allow them to access the data 651stored in a particular locale. When someone presently refers to a 652particular locale, they are obviously referring to the data stored 653within that particular locale. Similarly, if a programmer is referring 654to ``accessing the locale routines'', they are referring to the 655complete suite of routines that access all of the locale's information. 656 657@cindex NLS 658@cindex Native Language Support 659@cindex Natural Language Support 660One uses the expression @dfn{Native Language Support}, or merely NLS, 661for speaking of the overall activity or feature encompassing both 662internationalization and localization, allowing for multi-lingual 663interactions in a program. In a nutshell, one could say that 664internationalization is the operation by which further localizations 665are made possible. 666 667Also, very roughly said, when it comes to multi-lingual messages, 668internationalization is usually taken care of by programmers, and 669localization is usually taken care of by translators. 670 671@node Aspects, Files, Concepts, Introduction 672@section Aspects in Native Language Support 673 674@cindex translation aspects 675For a totally multi-lingual distribution, there are many things to 676translate beyond output messages. 677 678@itemize @bullet 679@item 680As of today, GNU @code{gettext} offers a complete toolset for 681translating messages output by C programs. Perl scripts and shell 682scripts will also need to be translated. Even if there are today some hooks 683by which this can be done, these hooks are not integrated as well as they 684should be. 685 686@item 687Some programs, like @code{autoconf} or @code{bison}, are able 688to produce other programs (or scripts). Even if the generating 689programs themselves are internationalized, the generated programs they 690produce may need internationalization on their own, and this indirect 691internationalization could be automated right from the generating 692program. In fact, quite usually, generating and generated programs 693could be internationalized independently, as the effort needed is 694fairly orthogonal. 695 696@item 697A few programs include textual tables which might need translation 698themselves, independently of the strings contained in the program 699itself. For example, @w{RFC 1345} gives an English description for each 700character which the @code{recode} program is able to reconstruct at execution. 701Since these descriptions are extracted from the RFC by mechanical means, 702translating them properly would require a prior translation of the RFC 703itself. 704 705@item 706Almost all programs accept options, which are often worded out so to 707be descriptive for the English readers; one might want to consider 708offering translated versions for program options as well. 709 710@item 711Many programs read, interpret, compile, or are somewhat driven by 712input files which are texts containing keywords, identifiers, or 713replies which are inherently translatable. For example, one may want 714@code{gcc} to allow diacriticized characters in identifiers or use 715translated keywords; @samp{rm -i} might accept something else than 716@samp{y} or @samp{n} for replies, etc. Even if the program will 717eventually make most of its output in the foreign languages, one has 718to decide whether the input syntax, option values, etc., are to be 719localized or not. 720 721@item 722The manual accompanying a package, as well as all documentation files 723in the distribution, could surely be translated, too. Translating a 724manual, with the intent of later keeping up with updates, is a major 725undertaking in itself, generally. 726 727@end itemize 728 729As we already stressed, translation is only one aspect of locales. 730Other internationalization aspects are system services and are handled 731in GNU @code{libc}. There 732are many attributes that are needed to define a country's cultural 733conventions. These attributes include beside the country's native 734language, the formatting of the date and time, the representation of 735numbers, the symbols for currency, etc. These local @dfn{rules} are 736termed the country's locale. The locale represents the knowledge 737needed to support the country's native attributes. 738 739@cindex locale categories 740There are a few major areas which may vary between countries and 741hence, define what a locale must describe. The following list helps 742putting multi-lingual messages into the proper context of other tasks 743related to locales. See the GNU @code{libc} manual for details. 744 745@table @emph 746 747@item Characters and Codesets 748@cindex codeset 749@cindex encoding 750@cindex character encoding 751@cindex locale category, LC_CTYPE 752 753The codeset most commonly used through out the USA and most English 754speaking parts of the world is the ASCII codeset. However, there are 755many characters needed by various locales that are not found within 756this codeset. The 8-bit @w{ISO 8859-1} code set has most of the special 757characters needed to handle the major European languages. However, in 758many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it 759doesn't even handle the major European currency. Hence each locale 760will need to specify which codeset they need to use and will need 761to have the appropriate character handling routines to cope with 762the codeset. 763 764@item Currency 765@cindex currency symbols 766@cindex locale category, LC_MONETARY 767 768The symbols used vary from country to country as does the position 769used by the symbol. Software needs to be able to transparently 770display currency figures in the native mode for each locale. 771 772@item Dates 773@cindex date format 774@cindex locale category, LC_TIME 775 776The format of date varies between locales. For example, Christmas day 777in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. 778Other countries might use @w{ISO 8601} dates, etc. 779 780Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm}, 781or otherwise. Some locales require time to be specified in 24-hour 782mode rather than as AM or PM. Further, the nature and yearly extent 783of the Daylight Saving correction vary widely between countries. 784 785@item Numbers 786@cindex number format 787@cindex locale category, LC_NUMERIC 788 789Numbers can be represented differently in different locales. 790For example, the following numbers are all written correctly for 791their respective locales: 792 793@example 79412,345.67 English 79512.345,67 German 796 12345,67 French 7971,2345.67 Asia 798@end example 799 800Some programs could go further and use different unit systems, like 801English units or Metric units, or even take into account variants 802about how numbers are spelled in full. 803 804@item Messages 805@cindex messages 806@cindex locale category, LC_MESSAGES 807 808The most obvious area is the language support within a locale. This is 809where GNU @code{gettext} provides the means for developers and users to 810easily change the language that the software uses to communicate to 811the user. 812 813@end table 814 815@cindex locale categories 816These areas of cultural conventions are called @emph{locale categories}. 817It is an unfortunate term; @emph{locale aspects} or @emph{locale feature 818categories} would be a better term, because each ``locale category'' 819describes an area or task that requires localization. The concrete data 820that describes the cultural conventions for such an area and for a particular 821culture is also called a @emph{locale category}. In this sense, a locale 822is composed of several locale categories: the locale category describing 823the codeset, the locale category describing the formatting of numbers, 824the locale category containing the translated messages, and so on. 825 826@cindex Linux 827Components of locale outside of message handling are standardized in 828the ISO C standard and the POSIX:2001 standard (also known as the SUSV3 829specification). GNU @code{libc} 830fully implements this, and most other modern systems provide a more 831or less reasonable support for at least some of the missing components. 832 833@node Files, Overview, Aspects, Introduction 834@section Files Conveying Translations 835 836@cindex files, @file{.po} and @file{.mo} 837The letters PO in @file{.po} files means Portable Object, to 838distinguish it from @file{.mo} files, where MO stands for Machine 839Object. This paradigm, as well as the PO file format, is inspired 840by the NLS standard developed by Uniforum, and first implemented by 841Sun in their Solaris system. 842 843PO files are meant to be read and edited by humans, and associate each 844original, translatable string of a given package with its translation 845in a particular target language. A single PO file is dedicated to 846a single target language. If a package supports many languages, 847there is one such PO file per language supported, and each package 848has its own set of PO files. These PO files are best created by 849the @code{xgettext} program, and later updated or refreshed through 850the @code{msgmerge} program. Program @code{xgettext} extracts all 851marked messages from a set of C files and initializes a PO file with 852empty translations. Program @code{msgmerge} takes care of adjusting 853PO files between releases of the corresponding sources, commenting 854obsolete entries, initializing new ones, and updating all source 855line references. Files ending with @file{.pot} are kind of base 856translation files found in distributions, in PO file format. 857 858MO files are meant to be read by programs, and are binary in nature. 859A few systems already offer tools for creating and handling MO files 860as part of the Native Language Support coming with the system, but the 861format of these MO files is often different from system to system, 862and non-portable. The tools already provided with these systems don't 863support all the features of GNU @code{gettext}. Therefore GNU 864@code{gettext} uses its own format for MO files. Files ending with 865@file{.gmo} are really MO files, when it is known that these files use 866the GNU format. 867 868@node Overview, , Files, Introduction 869@section Overview of GNU @code{gettext} 870 871@cindex overview of @code{gettext} 872@cindex big picture 873@cindex tutorial of @code{gettext} usage 874The following diagram summarizes the relation between the files 875handled by GNU @code{gettext} and the tools acting on these files. 876It is followed by somewhat detailed explanations, which you should 877read while keeping an eye on the diagram. Having a clear understanding 878of these interrelations will surely help programmers, translators 879and maintainers. 880 881@ifhtml 882@example 883@group 884Original C Sources ���������> Preparation ���������> Marked C Sources ������������ 885 ��� 886 ������������������������������<��������� GNU gettext Library ��� 887������������ make <������������ ��� 888��� ������������������������������<��������������������������������������������������������������������������������������������������������������� 889��� ��� 890��� ������������������<��������� PACKAGE.pot <��������� xgettext <������������ ������������<��������� PO Compendium 891��� ��� ��� ��� 892��� ��� ��������������� ��� 893��� ��������������� ������������> PO editor ������������ 894��� ���������������> msgmerge ������������������> LANG.po ������������>��������������������������� ��� 895��� ��������������� ��� 896��� ��� ��� 897��� ������������������������������������������<������������������������������������������������ ��� 898��� ������������ New LANG.po <��������������������������������������������������������������� 899��� ������������ LANG.gmo <��������� msgfmt <������������ 900��� ��� 901��� ������������> install ���������> /.../LANG/PACKAGE.mo ������������ 902��� ������������> "Hello world!" 903������������������������> install ���������> /.../bin/PROGRAM ������������������������ 904@end group 905@end example 906@end ifhtml 907@ifnothtml 908@example 909@group 910Original C Sources ---> Preparation ---> Marked C Sources ---. 911 | 912 .---------<--- GNU gettext Library | 913.--- make <---+ | 914| `---------<--------------------+---------------' 915| | 916| .-----<--- PACKAGE.pot <--- xgettext <---' .---<--- PO Compendium 917| | | ^ 918| | `---. | 919| `---. +---> PO editor ---. 920| +----> msgmerge ------> LANG.po ---->--------' | 921| .---' | 922| | | 923| `-------------<---------------. | 924| +--- New LANG.po <--------------------' 925| .--- LANG.gmo <--- msgfmt <---' 926| | 927| `---> install ---> /.../LANG/PACKAGE.mo ---. 928| +---> "Hello world!" 929`-------> install ---> /.../bin/PROGRAM -------' 930@end group 931@end example 932@end ifnothtml 933 934@cindex marking translatable strings 935As a programmer, the first step to bringing GNU @code{gettext} 936into your package is identifying, right in the C sources, those strings 937which are meant to be translatable, and those which are untranslatable. 938This tedious job can be done a little more comfortably using emacs PO 939mode, but you can use any means familiar to you for modifying your 940C sources. Beside this some other simple, standard changes are needed to 941properly initialize the translation library. @xref{Sources}, for 942more information about all this. 943 944For newly written software the strings of course can and should be 945marked while writing it. The @code{gettext} approach makes this 946very easy. Simply put the following lines at the beginning of each file 947or in a central header file: 948 949@example 950@group 951#define _(String) (String) 952#define N_(String) String 953#define textdomain(Domain) 954#define bindtextdomain(Package, Directory) 955@end group 956@end example 957 958@noindent 959Doing this allows you to prepare the sources for internationalization. 960Later when you feel ready for the step to use the @code{gettext} library 961simply replace these definitions by the following: 962 963@cindex include file @file{libintl.h} 964@example 965@group 966#include <libintl.h> 967#define _(String) gettext (String) 968#define gettext_noop(String) String 969#define N_(String) gettext_noop (String) 970@end group 971@end example 972 973@cindex link with @file{libintl} 974@cindex Linux 975@noindent 976and link against @file{libintl.a} or @file{libintl.so}. Note that on 977GNU systems, you don't need to link with @code{libintl} because the 978@code{gettext} library functions are already contained in GNU libc. 979That is all you have to change. 980 981@cindex template PO file 982@cindex files, @file{.pot} 983Once the C sources have been modified, the @code{xgettext} program 984is used to find and extract all translatable strings, and create a 985PO template file out of all these. This @file{@var{package}.pot} file 986contains all original program strings. It has sets of pointers to 987exactly where in C sources each string is used. All translations 988are set to empty. The letter @code{t} in @file{.pot} marks this as 989a Template PO file, not yet oriented towards any particular language. 990@xref{xgettext Invocation}, for more details about how one calls the 991@code{xgettext} program. If you are @emph{really} lazy, you might 992be interested at working a lot more right away, and preparing the 993whole distribution setup (@pxref{Maintainers}). By doing so, you 994spare yourself typing the @code{xgettext} command, as @code{make} 995should now generate the proper things automatically for you! 996 997The first time through, there is no @file{@var{lang}.po} yet, so the 998@code{msgmerge} step may be skipped and replaced by a mere copy of 999@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang} 1000represents the target language. See @ref{Creating} for details. 1001 1002Then comes the initial translation of messages. Translation in 1003itself is a whole matter, still exclusively meant for humans, 1004and whose complexity far overwhelms the level of this manual. 1005Nevertheless, a few hints are given in some other chapter of this 1006manual (@pxref{Translators}). You will also find there indications 1007about how to contact translating teams, or becoming part of them, 1008for sharing your translating concerns with others who target the same 1009native language. 1010 1011While adding the translated messages into the @file{@var{lang}.po} 1012PO file, if you are not using one of the dedicated PO file editors 1013(@pxref{Editing}), you are on your own 1014for ensuring that your efforts fully respect the PO file format, and quoting 1015conventions (@pxref{PO Files}). This is surely not an impossible task, 1016as this is the way many people have handled PO files around 1995. 1017On the other hand, by using a PO file editor, most details 1018of PO file format are taken care of for you, but you have to acquire 1019some familiarity with PO file editor itself. 1020 1021If some common translations have already been saved into a compendium 1022PO file, translators may use PO mode for initializing untranslated 1023entries from the compendium, and also save selected translations into 1024the compendium, updating it (@pxref{Compendium}). Compendium files 1025are meant to be exchanged between members of a given translation team. 1026 1027Programs, or packages of programs, are dynamic in nature: users write 1028bug reports and suggestion for improvements, maintainers react by 1029modifying programs in various ways. The fact that a package has 1030already been internationalized should not make maintainers shy 1031of adding new strings, or modifying strings already translated. 1032They just do their job the best they can. For the Translation 1033Project to work smoothly, it is important that maintainers do not 1034carry translation concerns on their already loaded shoulders, and that 1035translators be kept as free as possible of programming concerns. 1036 1037The only concern maintainers should have is carefully marking new 1038strings as translatable, when they should be, and do not otherwise 1039worry about them being translated, as this will come in proper time. 1040Consequently, when programs and their strings are adjusted in various 1041ways by maintainers, and for matters usually unrelated to translation, 1042@code{xgettext} would construct @file{@var{package}.pot} files which are 1043evolving over time, so the translations carried by @file{@var{lang}.po} 1044are slowly fading out of date. 1045 1046@cindex evolution of packages 1047It is important for translators (and even maintainers) to understand 1048that package translation is a continuous process in the lifetime of a 1049package, and not something which is done once and for all at the start. 1050After an initial burst of translation activity for a given package, 1051interventions are needed once in a while, because here and there, 1052translated entries become obsolete, and new untranslated entries 1053appear, needing translation. 1054 1055The @code{msgmerge} program has the purpose of refreshing an already 1056existing @file{@var{lang}.po} file, by comparing it with a newer 1057@file{@var{package}.pot} template file, extracted by @code{xgettext} 1058out of recent C sources. The refreshing operation adjusts all 1059references to C source locations for strings, since these strings 1060move as programs are modified. Also, @code{msgmerge} comments out as 1061obsolete, in @file{@var{lang}.po}, those already translated entries 1062which are no longer used in the program sources (@pxref{Obsolete 1063Entries}). It finally discovers new strings and inserts them in 1064the resulting PO file as untranslated entries (@pxref{Untranslated 1065Entries}). @xref{msgmerge Invocation}, for more information about what 1066@code{msgmerge} really does. 1067 1068Whatever route or means taken, the goal is to obtain an updated 1069@file{@var{lang}.po} file offering translations for all strings. 1070 1071The temporal mobility, or fluidity of PO files, is an integral part of 1072the translation game, and should be well understood, and accepted. 1073People resisting it will have a hard time participating in the 1074Translation Project, or will give a hard time to other participants! In 1075particular, maintainers should relax and include all available official 1076PO files in their distributions, even if these have not recently been 1077updated, without exerting pressure on the translator teams to get the 1078job done. The pressure should rather come 1079from the community of users speaking a particular language, and 1080maintainers should consider themselves fairly relieved of any concern 1081about the adequacy of translation files. On the other hand, translators 1082should reasonably try updating the PO files they are responsible for, 1083while the package is undergoing pretest, prior to an official 1084distribution. 1085 1086Once the PO file is complete and dependable, the @code{msgfmt} program 1087is used for turning the PO file into a machine-oriented format, which 1088may yield efficient retrieval of translations by the programs of the 1089package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt 1090Invocation}, for more information about all modes of execution 1091for the @code{msgfmt} program. 1092 1093Finally, the modified and marked C sources are compiled and linked 1094with the GNU @code{gettext} library, usually through the operation of 1095@code{make}, given a suitable @file{Makefile} exists for the project, 1096and the resulting executable is installed somewhere users will find it. 1097The MO files themselves should also be properly installed. Given the 1098appropriate environment variables are set (@pxref{Setting the POSIX Locale}), 1099the program should localize itself automatically, whenever it executes. 1100 1101The remainder of this manual has the purpose of explaining in depth the various 1102steps outlined above. 1103 1104@node Users, PO Files, Introduction, Top 1105@chapter The User's View 1106 1107Nowadays, when users log into a computer, they usually find that all 1108their programs show messages in their native language -- at least for 1109users of languages with an active free software community, like French or 1110German; to a lesser extent for languages with a smaller participation in 1111free software and the GNU project, like Hindi and Filipino. 1112 1113How does this work? How can the user influence the language that is used 1114by the programs? This chapter will answer it. 1115 1116@menu 1117* System Installation:: Questions During Operating System Installation 1118* Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs 1119* Setting the POSIX Locale:: How to Specify the Locale According to POSIX 1120* Installing Localizations:: How to Install Additional Translations 1121@end menu 1122 1123@node System Installation, Setting the GUI Locale, Users, Users 1124@section Operating System Installation 1125 1126The default language is often already specified during operating system 1127installation. When the operating system is installed, the installer 1128typically asks for the language used for the installation process and, 1129separately, for the language to use in the installed system. Some OS 1130installers only ask for the language once. 1131 1132This determines the system-wide default language for all users. But the 1133installers often give the possibility to install extra localizations for 1134additional languages. For example, the localizations of KDE (the K 1135Desktop Environment) and OpenOffice.org are often bundled separately, 1136as one installable package per language. 1137 1138At this point it is good to consider the intended use of the machine: If 1139it is a machine designated for personal use, additional localizations are 1140probably not necessary. If, however, the machine is in use in an 1141organization or company that has international relationships, one can 1142consider the needs of guest users. If you have a guest from abroad, for 1143a week, what could be his preferred locales? It may be worth installing 1144these additional localizations ahead of time, since they cost only a bit 1145of disk space at this point. 1146 1147The system-wide default language is the locale configuration that is used 1148when a new user account is created. But the user can have his own locale 1149configuration that is different from the one of the other users of the 1150same machine. He can specify it, typically after the first login, as 1151described in the next section. 1152 1153@node Setting the GUI Locale, Setting the POSIX Locale, System Installation, Users 1154@section Setting the Locale Used by GUI Programs 1155 1156The immediately available programs in a user's desktop come from a group 1157of programs called a ``desktop environment''; it usually includes the window 1158manager, a web browser, a text editor, and more. The most common free 1159desktop environments are KDE, GNOME, and Xfce. 1160 1161The locale used by GUI programs of the desktop environment can be specified 1162in a configuration screen called ``control center'', ``language settings'' 1163or ``country settings''. 1164 1165Individual GUI programs that are not part of the desktop environment can 1166have their locale specified either in a settings panel, or through environment 1167variables. 1168 1169For some programs, it is possible to specify the locale through environment 1170variables, possibly even to a different locale than the desktop's locale. 1171This means, instead of starting a program through a menu or from the file 1172system, you can start it from the command-line, after having set some 1173environment variables. The environment variables can be those specified 1174in the next section (@ref{Setting the POSIX Locale}); for some versions of 1175KDE, however, the locale is specified through a variable @code{KDE_LANG}, 1176rather than @code{LANG} or @code{LC_ALL}. 1177 1178@node Setting the POSIX Locale, Installing Localizations, Setting the GUI Locale, Users 1179@section Setting the Locale through Environment Variables 1180 1181As a user, if your language has been installed for this package, in the 1182simplest case, you only have to set the @code{LANG} environment variable 1183to the appropriate @samp{@var{ll}_@var{CC}} combination. For example, 1184let's suppose that you speak German and live in Germany. At the shell 1185prompt, merely execute 1186@w{@samp{setenv LANG de_DE}} (in @code{csh}), 1187@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}) or 1188@w{@samp{export LANG=de_DE}} (in @code{bash}). This can be done from your 1189@file{.login} or @file{.profile} file, once and for all. 1190 1191@menu 1192* Locale Names:: How a Locale Specification Looks Like 1193* Locale Environment Variables:: Which Environment Variable Specfies What 1194* The LANGUAGE variable:: How to Specify a Priority List of Languages 1195@end menu 1196 1197@node Locale Names, Locale Environment Variables, Setting the POSIX Locale, Setting the POSIX Locale 1198@subsection Locale Names 1199 1200A locale name usually has the form @samp{@var{ll}_@var{CC}}. Here 1201@samp{@var{ll}} is an @w{ISO 639} two-letter language code, and 1202@samp{@var{CC}} is an @w{ISO 3166} two-letter country code. For example, 1203for German in Germany, @var{ll} is @code{de}, and @var{CC} is @code{DE}. 1204You find a list of the language codes in appendix @ref{Language Codes} and 1205a list of the country codes in appendix @ref{Country Codes}. 1206 1207You might think that the country code specification is redundant. But in 1208fact, some languages have dialects in different countries. For example, 1209@samp{de_AT} is used for Austria, and @samp{pt_BR} for Brazil. The country 1210code serves to distinguish the dialects. 1211 1212Many locale names have an extended syntax 1213@samp{@var{ll}_@var{CC}.@var{encoding}} that also specifies the character 1214encoding. These are in use because between 2000 and 2005, most users have 1215switched to locales in UTF-8 encoding. For example, the German locale on 1216glibc systems is nowadays @samp{de_DE.UTF-8}. The older name @samp{de_DE} 1217still refers to the German locale as of 2000 that stores characters in 1218ISO-8859-1 encoding -- a text encoding that cannot even accomodate the Euro 1219currency sign. 1220 1221Some locale names use @samp{@var{ll}_@var{CC}.@@@var{variant}} instead of 1222@samp{@var{ll}_@var{CC}}. The @samp{@@@var{variant}} can denote any kind of 1223characteristics that is not already implied by the language @var{ll} and 1224the country @var{CC}. It can denote a particular monetary unit. For example, 1225on glibc systems, @samp{de_DE@@euro} denotes the locale that uses the Euro 1226currency, in contrast to the older locale @samp{de_DE} which implies the use 1227of the currency before 2002. It can also denote a dialect of the language, 1228or the script used to write text (for example, @samp{sr_RS@@latin} uses the 1229Latin script, whereas @samp{sr_RS} uses the Cyrillic script to write Serbian), 1230or the orthography rules, or similar. 1231 1232On other systems, some variations of this scheme are used, such as 1233@samp{@var{ll}}. You can get the list of locales supported by your system 1234for your language by running the command @samp{locale -a | grep '^@var{ll}'}. 1235 1236There is also a special locale, called @samp{C}. 1237@c Don't mention that this locale also has the name "POSIX". When we talk about 1238@c the "POSIX locale", we mean the "locale as specified in the POSIX way", and 1239@c mentioning a locale called "POSIX" would bring total confusion. 1240When it is used, it disables all localization: in this locale, all programs 1241standardized by POSIX use English messages and an unspecified character 1242encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on 1243the operating system). 1244 1245@node Locale Environment Variables, The LANGUAGE variable, Locale Names, Setting the POSIX Locale 1246@subsection Locale Environment Variables 1247@cindex setting up @code{gettext} at run time 1248@cindex selecting message language 1249@cindex language selection 1250 1251A locale is composed of several @emph{locale categories}, see @ref{Aspects}. 1252When a program looks up locale dependent values, it does this according to 1253the following environment variables, in priority order: 1254 1255@enumerate 1256@vindex LANGUAGE@r{, environment variable} 1257@item @code{LANGUAGE} 1258@vindex LC_ALL@r{, environment variable} 1259@item @code{LC_ALL} 1260@vindex LC_CTYPE@r{, environment variable} 1261@vindex LC_NUMERIC@r{, environment variable} 1262@vindex LC_TIME@r{, environment variable} 1263@vindex LC_COLLATE@r{, environment variable} 1264@vindex LC_MONETARY@r{, environment variable} 1265@vindex LC_MESSAGES@r{, environment variable} 1266@item @code{LC_xxx}, according to selected locale category: 1267@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE}, 1268@code{LC_MONETARY}, @code{LC_MESSAGES}, ... 1269@vindex LANG@r{, environment variable} 1270@item @code{LANG} 1271@end enumerate 1272 1273Variables whose value is set but is empty are ignored in this lookup. 1274 1275@code{LANG} is the normal environment variable for specifying a locale. 1276As a user, you normally set this variable (unless some of the other variables 1277have already been set by the system, in @file{/etc/profile} or similar 1278initialization files). 1279 1280@code{LC_CTYPE}, @code{LC_NUMERIC}, @code{LC_TIME}, @code{LC_COLLATE}, 1281@code{LC_MONETARY}, @code{LC_MESSAGES}, and so on, are the environment 1282variables meant to override @code{LANG} and affecting a single locale 1283category only. For example, assume you are a Swedish user in Spain, and you 1284want your programs to handle numbers and dates according to Spanish 1285conventions, and only the messages should be in Swedish. Then you could 1286create a locale named @samp{sv_ES} or @samp{sv_ES.UTF-8} by use of the 1287@code{localedef} program. But it is simpler, and achieves the same effect, 1288to set the @code{LANG} variable to @code{es_ES.UTF-8} and the 1289@code{LC_MESSAGES} variable to @code{sv_SE.UTF-8}; these two locales come 1290already preinstalled with the operating system. 1291 1292@code{LC_ALL} is an environment variable that overrides all of these. 1293It is typically used in scripts that run particular programs. For example, 1294@code{configure} scripts generated by GNU autoconf use @code{LC_ALL} to make 1295sure that the configuration tests don't operate in locale dependent ways. 1296 1297Some systems, unfortunately, set @code{LC_ALL} in @file{/etc/profile} or in 1298similar initialization files. As a user, you therefore have to unset this 1299variable if you want to set @code{LANG} and optionally some of the other 1300@code{LC_xxx} variables. 1301 1302The @code{LANGUAGE} variable is described in the next subsection. 1303 1304@node The LANGUAGE variable, , Locale Environment Variables, Setting the POSIX Locale 1305@subsection Specifying a Priority List of Languages 1306 1307Not all programs have translations for all languages. By default, an 1308English message is shown in place of a nonexistent translation. If you 1309understand other languages, you can set up a priority list of languages. 1310This is done through a different environment variable, called 1311@code{LANGUAGE}. GNU @code{gettext} gives preference to @code{LANGUAGE} 1312over @code{LC_ALL} and @code{LANG} for the purpose of message handling, 1313but you still need to have @code{LANG} (or @code{LC_ALL}) set to the primary 1314language; this is required by other parts of the system libraries. 1315For example, some Swedish users who would rather read translations in 1316German than English for when Swedish is not available, set @code{LANGUAGE} 1317to @samp{sv:de} while leaving @code{LANG} to @samp{sv_SE}. 1318 1319Special advice for Norwegian users: The language code for Norwegian 1320bokm@ringaccent{a}l changed from @samp{no} to @samp{nb} recently (in 2003). 1321During the transition period, while some message catalogs for this language 1322are installed under @samp{nb} and some older ones under @samp{no}, it is 1323recommended for Norwegian users to set @code{LANGUAGE} to @samp{nb:no} so that 1324both newer and older translations are used. 1325 1326In the @code{LANGUAGE} environment variable, but not in the other 1327environment variables, @samp{@var{ll}_@var{CC}} combinations can be 1328abbreviated as @samp{@var{ll}} to denote the language's main dialect. 1329For example, @samp{de} is equivalent to @samp{de_DE} (German as spoken in 1330Germany), and @samp{pt} to @samp{pt_PT} (Portuguese as spoken in Portugal) 1331in this context. 1332 1333Note: The variable @code{LANGUAGE} is ignored if the locale is set to 1334@samp{C}. In other words, you have to first enable localization, by setting 1335@code{LANG} (or @code{LC_ALL}) to a value other than @samp{C}, before you can 1336use a language priority list through the @code{LANGUAGE} variable. 1337 1338@node Installing Localizations, , Setting the POSIX Locale, Users 1339@section Installing Translations for Particular Programs 1340@cindex Translation Matrix 1341@cindex available translations 1342 1343Languages are not equally well supported in all packages using GNU 1344@code{gettext}, and more translations are added over time. Usually, you 1345use the translations that are shipped with the operating system 1346or with particular packages that you install afterwards. But you can also 1347install newer localizations directly. For doing this, you will need an 1348understanding where each localization file is stored on the file system. 1349 1350@cindex @file{ABOUT-NLS} file 1351For programs that participate in the Translation Project, you can start 1352looking for translations here: 1353@url{http://translationproject.org/team/index.html}. 1354A snapshot of this information is also found in the @file{ABOUT-NLS} file 1355that is shipped with GNU gettext. 1356 1357For programs that are part of the KDE project, the starting point is: 1358@url{http://i18n.kde.org/}. 1359 1360For programs that are part of the GNOME project, the starting point is: 1361@url{http://www.gnome.org/i18n/}. 1362 1363For other programs, you may check whether the program's source code package 1364contains some @file{@var{ll}.po} files; often they are kept together in a 1365directory called @file{po/}. Each @file{@var{ll}.po} file contains the 1366message translations for the language whose abbreviation of @var{ll}. 1367 1368@node PO Files, Sources, Users, Top 1369@chapter The Format of PO Files 1370@cindex PO files' format 1371@cindex file format, @file{.po} 1372 1373The GNU @code{gettext} toolset helps programmers and translators 1374at producing, updating and using translation files, mainly those 1375PO files which are textual, editable files. This chapter explains 1376the format of PO files. 1377 1378A PO file is made up of many entries, each entry holding the relation 1379between an original untranslated string and its corresponding 1380translation. All entries in a given PO file usually pertain 1381to a single project, and all translations are expressed in a single 1382target language. One PO file @dfn{entry} has the following schematic 1383structure: 1384 1385@example 1386@var{white-space} 1387# @var{translator-comments} 1388#. @var{extracted-comments} 1389#: @var{reference}@dots{} 1390#, @var{flag}@dots{} 1391#| msgid @var{previous-untranslated-string} 1392msgid @var{untranslated-string} 1393msgstr @var{translated-string} 1394@end example 1395 1396The general structure of a PO file should be well understood by 1397the translator. When using PO mode, very little has to be known 1398about the format details, as PO mode takes care of them for her. 1399 1400A simple entry can look like this: 1401 1402@example 1403#: lib/error.c:116 1404msgid "Unknown system error" 1405msgstr "Error desconegut del sistema" 1406@end example 1407 1408@cindex comments, translator 1409@cindex comments, automatic 1410@cindex comments, extracted 1411Entries begin with some optional white space. Usually, when generated 1412through GNU @code{gettext} tools, there is exactly one blank line 1413between entries. Then comments follow, on lines all starting with the 1414character @code{#}. There are two kinds of comments: those which have 1415some white space immediately following the @code{#} - the @var{translator 1416comments} -, which comments are created and maintained exclusively by the 1417translator, and those which have some non-white character just after the 1418@code{#} - the @var{automatic comments} -, which comments are created and 1419maintained automatically by GNU @code{gettext} tools. Comment lines 1420starting with @code{#.} contain comments given by the programmer, directed 1421at the translator; these comments are called @var{extracted comments} 1422because the @code{xgettext} program extracts them from the program's 1423source code. Comment lines starting with @code{#:} contain references to 1424the program's source code. Comment lines starting with @code{#,} contain 1425flags; more about these below. Comment lines starting with @code{#|} 1426contain the previous untranslated string for which the translator gave 1427a translation. 1428 1429All comments, of either kind, are optional. 1430 1431@kwindex msgid 1432@kwindex msgstr 1433After white space and comments, entries show two strings, namely 1434first the untranslated string as it appears in the original program 1435sources, and then, the translation of this string. The original 1436string is introduced by the keyword @code{msgid}, and the translation, 1437by @code{msgstr}. The two strings, untranslated and translated, 1438are quoted in various ways in the PO file, using @code{"} 1439delimiters and @code{\} escapes, but the translator does not really 1440have to pay attention to the precise quoting format, as PO mode fully 1441takes care of quoting for her. 1442 1443The @code{msgid} strings, as well as automatic comments, are produced 1444and managed by other GNU @code{gettext} tools, and PO mode does not 1445provide means for the translator to alter these. The most she can 1446do is merely deleting them, and only by deleting the whole entry. 1447On the other hand, the @code{msgstr} string, as well as translator 1448comments, are really meant for the translator, and PO mode gives her 1449the full control she needs. 1450 1451The comment lines beginning with @code{#,} are special because they are 1452not completely ignored by the programs as comments generally are. The 1453comma separated list of @var{flag}s is used by the @code{msgfmt} 1454program to give the user some better diagnostic messages. Currently 1455there are two forms of flags defined: 1456 1457@table @code 1458@item fuzzy 1459@kwindex fuzzy@r{ flag} 1460This flag can be generated by the @code{msgmerge} program or it can be 1461inserted by the translator herself. It shows that the @code{msgstr} 1462string might not be a correct translation (anymore). Only the translator 1463can judge if the translation requires further modification, or is 1464acceptable as is. Once satisfied with the translation, she then removes 1465this @code{fuzzy} attribute. The @code{msgmerge} program inserts this 1466when it combined the @code{msgid} and @code{msgstr} entries after fuzzy 1467search only. @xref{Fuzzy Entries}. 1468 1469@item c-format 1470@kwindex c-format@r{ flag} 1471@itemx no-c-format 1472@kwindex no-c-format@r{ flag} 1473These flags should not be added by a human. Instead only the 1474@code{xgettext} program adds them. In an automated PO file processing 1475system as proposed here the user changes would be thrown away again as 1476soon as the @code{xgettext} program generates a new template file. 1477 1478The @code{c-format} flag tells that the untranslated string and the 1479translation are supposed to be C format strings. The @code{no-c-format} 1480flag tells that they are not C format strings, even though the untranslated 1481string happens to look like a C format string (with @samp{%} directives). 1482 1483In case the @code{c-format} flag is given for a string the @code{msgfmt} 1484does some more tests to check to validity of the translation. 1485@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}. 1486 1487@item objc-format 1488@kwindex objc-format@r{ flag} 1489@itemx no-objc-format 1490@kwindex no-objc-format@r{ flag} 1491Likewise for Objective C, see @ref{objc-format}. 1492 1493@item sh-format 1494@kwindex sh-format@r{ flag} 1495@itemx no-sh-format 1496@kwindex no-sh-format@r{ flag} 1497Likewise for Shell, see @ref{sh-format}. 1498 1499@item python-format 1500@kwindex python-format@r{ flag} 1501@itemx no-python-format 1502@kwindex no-python-format@r{ flag} 1503Likewise for Python, see @ref{python-format}. 1504 1505@item lisp-format 1506@kwindex lisp-format@r{ flag} 1507@itemx no-lisp-format 1508@kwindex no-lisp-format@r{ flag} 1509Likewise for Lisp, see @ref{lisp-format}. 1510 1511@item elisp-format 1512@kwindex elisp-format@r{ flag} 1513@itemx no-elisp-format 1514@kwindex no-elisp-format@r{ flag} 1515Likewise for Emacs Lisp, see @ref{elisp-format}. 1516 1517@item librep-format 1518@kwindex librep-format@r{ flag} 1519@itemx no-librep-format 1520@kwindex no-librep-format@r{ flag} 1521Likewise for librep, see @ref{librep-format}. 1522 1523@item scheme-format 1524@kwindex scheme-format@r{ flag} 1525@itemx no-scheme-format 1526@kwindex no-scheme-format@r{ flag} 1527Likewise for Scheme, see @ref{scheme-format}. 1528 1529@item smalltalk-format 1530@kwindex smalltalk-format@r{ flag} 1531@itemx no-smalltalk-format 1532@kwindex no-smalltalk-format@r{ flag} 1533Likewise for Smalltalk, see @ref{smalltalk-format}. 1534 1535@item java-format 1536@kwindex java-format@r{ flag} 1537@itemx no-java-format 1538@kwindex no-java-format@r{ flag} 1539Likewise for Java, see @ref{java-format}. 1540 1541@item csharp-format 1542@kwindex csharp-format@r{ flag} 1543@itemx no-csharp-format 1544@kwindex no-csharp-format@r{ flag} 1545Likewise for C#, see @ref{csharp-format}. 1546 1547@item awk-format 1548@kwindex awk-format@r{ flag} 1549@itemx no-awk-format 1550@kwindex no-awk-format@r{ flag} 1551Likewise for awk, see @ref{awk-format}. 1552 1553@item object-pascal-format 1554@kwindex object-pascal-format@r{ flag} 1555@itemx no-object-pascal-format 1556@kwindex no-object-pascal-format@r{ flag} 1557Likewise for Object Pascal, see @ref{object-pascal-format}. 1558 1559@item ycp-format 1560@kwindex ycp-format@r{ flag} 1561@itemx no-ycp-format 1562@kwindex no-ycp-format@r{ flag} 1563Likewise for YCP, see @ref{ycp-format}. 1564 1565@item tcl-format 1566@kwindex tcl-format@r{ flag} 1567@itemx no-tcl-format 1568@kwindex no-tcl-format@r{ flag} 1569Likewise for Tcl, see @ref{tcl-format}. 1570 1571@item perl-format 1572@kwindex perl-format@r{ flag} 1573@itemx no-perl-format 1574@kwindex no-perl-format@r{ flag} 1575Likewise for Perl, see @ref{perl-format}. 1576 1577@item perl-brace-format 1578@kwindex perl-brace-format@r{ flag} 1579@itemx no-perl-brace-format 1580@kwindex no-perl-brace-format@r{ flag} 1581Likewise for Perl brace, see @ref{perl-format}. 1582 1583@item php-format 1584@kwindex php-format@r{ flag} 1585@itemx no-php-format 1586@kwindex no-php-format@r{ flag} 1587Likewise for PHP, see @ref{php-format}. 1588 1589@item gcc-internal-format 1590@kwindex gcc-internal-format@r{ flag} 1591@itemx no-gcc-internal-format 1592@kwindex no-gcc-internal-format@r{ flag} 1593Likewise for the GCC sources, see @ref{gcc-internal-format}. 1594 1595@item qt-format 1596@kwindex qt-format@r{ flag} 1597@itemx no-qt-format 1598@kwindex no-qt-format@r{ flag} 1599Likewise for Qt, see @ref{qt-format}. 1600 1601@item kde-format 1602@kwindex kde-format@r{ flag} 1603@itemx no-kde-format 1604@kwindex no-kde-format@r{ flag} 1605Likewise for KDE, see @ref{kde-format}. 1606 1607@item boost-format 1608@kwindex boost-format@r{ flag} 1609@itemx no-boost-format 1610@kwindex no-boost-format@r{ flag} 1611Likewise for Boost, see @ref{boost-format}. 1612 1613@end table 1614 1615@kwindex msgctxt 1616@cindex context, in PO files 1617It is also possible to have entries with a context specifier. They look like 1618this: 1619 1620@example 1621@var{white-space} 1622# @var{translator-comments} 1623#. @var{extracted-comments} 1624#: @var{reference}@dots{} 1625#, @var{flag}@dots{} 1626#| msgctxt @var{previous-context} 1627#| msgid @var{previous-untranslated-string} 1628msgctxt @var{context} 1629msgid @var{untranslated-string} 1630msgstr @var{translated-string} 1631@end example 1632 1633The context serves to disambiguate messages with the same 1634@var{untranslated-string}. It is possible to have several entries with 1635the same @var{untranslated-string} in a PO file, provided that they each 1636have a different @var{context}. Note that an empty @var{context} string 1637and an absent @code{msgctxt} line do not mean the same thing. 1638 1639@kwindex msgid_plural 1640@cindex plural forms, in PO files 1641A different kind of entries is used for translations which involve 1642plural forms. 1643 1644@example 1645@var{white-space} 1646# @var{translator-comments} 1647#. @var{extracted-comments} 1648#: @var{reference}@dots{} 1649#, @var{flag}@dots{} 1650#| msgid @var{previous-untranslated-string-singular} 1651#| msgid_plural @var{previous-untranslated-string-plural} 1652msgid @var{untranslated-string-singular} 1653msgid_plural @var{untranslated-string-plural} 1654msgstr[0] @var{translated-string-case-0} 1655... 1656msgstr[N] @var{translated-string-case-n} 1657@end example 1658 1659Such an entry can look like this: 1660 1661@example 1662#: src/msgcmp.c:338 src/po-lex.c:699 1663#, c-format 1664msgid "found %d fatal error" 1665msgid_plural "found %d fatal errors" 1666msgstr[0] "s'ha trobat %d error fatal" 1667msgstr[1] "s'han trobat %d errors fatals" 1668@end example 1669 1670Here also, a @code{msgctxt} context can be specified before @code{msgid}, 1671like above. 1672 1673The @var{previous-untranslated-string} is optionally inserted by the 1674@code{msgmerge} program, at the same time when it marks a message fuzzy. 1675It helps the translator to see which changes were done by the developers 1676on the @var{untranslated-string}. 1677 1678It happens that some lines, usually whitespace or comments, follow the 1679very last entry of a PO file. Such lines are not part of any entry, 1680and will be dropped when the PO file is processed by the tools, or may 1681disturb some PO file editors. 1682 1683The remainder of this section may be safely skipped by those using 1684a PO file editor, yet it may be interesting for everybody to have a better 1685idea of the precise format of a PO file. On the other hand, those 1686wishing to modify PO files by hand should carefully continue reading on. 1687 1688Each of @var{untranslated-string} and @var{translated-string} respects 1689the C syntax for a character string, including the surrounding quotes 1690and embedded backslashed escape sequences. When the time comes 1691to write multi-line strings, one should not use escaped newlines. 1692Instead, a closing quote should follow the last character on the 1693line to be continued, and an opening quote should resume the string 1694at the beginning of the following PO file line. For example: 1695 1696@example 1697msgid "" 1698"Here is an example of how one might continue a very long string\n" 1699"for the common case the string represents multi-line output.\n" 1700@end example 1701 1702@noindent 1703In this example, the empty string is used on the first line, to 1704allow better alignment of the @code{H} from the word @samp{Here} 1705over the @code{f} from the word @samp{for}. In this example, the 1706@code{msgid} keyword is followed by three strings, which are meant 1707to be concatenated. Concatenating the empty string does not change 1708the resulting overall string, but it is a way for us to comply with 1709the necessity of @code{msgid} to be followed by a string on the same 1710line, while keeping the multi-line presentation left-justified, as 1711we find this to be a cleaner disposition. The empty string could have 1712been omitted, but only if the string starting with @samp{Here} was 1713promoted on the first line, right after @code{msgid}.@footnote{This 1714limitation is not imposed by GNU @code{gettext}, but is for compatibility 1715with the @code{msgfmt} implementation on Solaris.} It was not really necessary 1716either to switch between the two last quoted strings immediately after 1717the newline @samp{\n}, the switch could have occurred after @emph{any} 1718other character, we just did it this way because it is neater. 1719 1720@cindex newlines in PO files 1721One should carefully distinguish between end of lines marked as 1722@samp{\n} @emph{inside} quotes, which are part of the represented 1723string, and end of lines in the PO file itself, outside string quotes, 1724which have no incidence on the represented string. 1725 1726@cindex comments in PO files 1727Outside strings, white lines and comments may be used freely. 1728Comments start at the beginning of a line with @samp{#} and extend 1729until the end of the PO file line. Comments written by translators 1730should have the initial @samp{#} immediately followed by some white 1731space. If the @samp{#} is not immediately followed by white space, 1732this comment is most likely generated and managed by specialized GNU 1733tools, and might disappear or be replaced unexpectedly when the PO 1734file is given to @code{msgmerge}. 1735 1736@node Sources, Template, PO Files, Top 1737@chapter Preparing Program Sources 1738@cindex preparing programs for translation 1739 1740@c FIXME: Rewrite (the whole chapter). 1741 1742For the programmer, changes to the C source code fall into three 1743categories. First, you have to make the localization functions 1744known to all modules needing message translation. Second, you should 1745properly trigger the operation of GNU @code{gettext} when the program 1746initializes, usually from the @code{main} function. Last, you should 1747identify, adjust and mark all constant strings in your program 1748needing translation. 1749 1750@menu 1751* Importing:: Importing the @code{gettext} declaration 1752* Triggering:: Triggering @code{gettext} Operations 1753* Preparing Strings:: Preparing Translatable Strings 1754* Mark Keywords:: How Marks Appear in Sources 1755* Marking:: Marking Translatable Strings 1756* c-format Flag:: Telling something about the following string 1757* Special cases:: Special Cases of Translatable Strings 1758* Bug Report Address:: Letting Users Report Translation Bugs 1759* Names:: Marking Proper Names for Translation 1760* Libraries:: Preparing Library Sources 1761@end menu 1762 1763@node Importing, Triggering, Sources, Sources 1764@section Importing the @code{gettext} declaration 1765 1766Presuming that your set of programs, or package, has been adjusted 1767so all needed GNU @code{gettext} files are available, and your 1768@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module 1769having translated C strings should contain the line: 1770 1771@cindex include file @file{libintl.h} 1772@example 1773#include <libintl.h> 1774@end example 1775 1776Similarly, each C module containing @code{printf()}/@code{fprintf()}/... 1777calls with a format string that could be a translated C string (even if 1778the C string comes from a different C module) should contain the line: 1779 1780@example 1781#include <libintl.h> 1782@end example 1783 1784@node Triggering, Preparing Strings, Importing, Sources 1785@section Triggering @code{gettext} Operations 1786 1787@cindex initialization 1788The initialization of locale data should be done with more or less 1789the same code in every program, as demonstrated below: 1790 1791@example 1792@group 1793int 1794main (int argc, char *argv[]) 1795@{ 1796 @dots{} 1797 setlocale (LC_ALL, ""); 1798 bindtextdomain (PACKAGE, LOCALEDIR); 1799 textdomain (PACKAGE); 1800 @dots{} 1801@} 1802@end group 1803@end example 1804 1805@var{PACKAGE} and @var{LOCALEDIR} should be provided either by 1806@file{config.h} or by the Makefile. For now consult the @code{gettext} 1807or @code{hello} sources for more information. 1808 1809@cindex locale category, LC_ALL 1810@cindex locale category, LC_CTYPE 1811The use of @code{LC_ALL} might not be appropriate for you. 1812@code{LC_ALL} includes all locale categories and especially 1813@code{LC_CTYPE}. This latter category is responsible for determining 1814character classes with the @code{isalnum} etc. functions from 1815@file{ctype.h} which could especially for programs, which process some 1816kind of input language, be wrong. For example this would mean that a 1817source code using the @,{c} (c-cedilla character) is runnable in 1818France but not in the U.S. 1819 1820Some systems also have problems with parsing numbers using the 1821@code{scanf} functions if an other but the @code{LC_ALL} locale category is 1822used. The standards say that additional formats but the one known in the 1823@code{"C"} locale might be recognized. But some systems seem to reject 1824numbers in the @code{"C"} locale format. In some situation, it might 1825also be a problem with the notation itself which makes it impossible to 1826recognize whether the number is in the @code{"C"} locale or the local 1827format. This can happen if thousands separator characters are used. 1828Some locales define this character according to the national 1829conventions to @code{'.'} which is the same character used in the 1830@code{"C"} locale to denote the decimal point. 1831 1832So it is sometimes necessary to replace the @code{LC_ALL} line in the 1833code above by a sequence of @code{setlocale} lines 1834 1835@example 1836@group 1837@{ 1838 @dots{} 1839 setlocale (LC_CTYPE, ""); 1840 setlocale (LC_MESSAGES, ""); 1841 @dots{} 1842@} 1843@end group 1844@end example 1845 1846@cindex locale category, LC_CTYPE 1847@cindex locale category, LC_COLLATE 1848@cindex locale category, LC_MONETARY 1849@cindex locale category, LC_NUMERIC 1850@cindex locale category, LC_TIME 1851@cindex locale category, LC_MESSAGES 1852@cindex locale category, LC_RESPONSES 1853@noindent 1854On all POSIX conformant systems the locale categories @code{LC_CTYPE}, 1855@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY}, 1856@code{LC_NUMERIC}, and @code{LC_TIME} are available. On some systems 1857which are only ISO C compliant, @code{LC_MESSAGES} is missing, but 1858a substitute for it is defined in GNU gettext's @code{<libintl.h>} and 1859in GNU gnulib's @code{<locale.h>}. 1860 1861Note that changing the @code{LC_CTYPE} also affects the functions 1862declared in the @code{<ctype.h>} standard header and some functions 1863declared in the @code{<string.h>} and @code{<stdlib.h>} standard headers. 1864If this is not 1865desirable in your application (for example in a compiler's parser), 1866you can use a set of substitute functions which hardwire the C locale, 1867such as found in the modules @samp{c-ctype}, @samp{c-strcase}, 1868@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib 1869source distribution. 1870 1871It is also possible to switch the locale forth and back between the 1872environment dependent locale and the C locale, but this approach is 1873normally avoided because a @code{setlocale} call is expensive, 1874because it is tedious to determine the places where a locale switch 1875is needed in a large program's source, and because switching a locale 1876is not multithread-safe. 1877 1878@node Preparing Strings, Mark Keywords, Triggering, Sources 1879@section Preparing Translatable Strings 1880 1881@cindex marking strings, preparations 1882Before strings can be marked for translations, they sometimes need to 1883be adjusted. Usually preparing a string for translation is done right 1884before marking it, during the marking phase which is described in the 1885next sections. What you have to keep in mind while doing that is the 1886following. 1887 1888@itemize @bullet 1889@item 1890Decent English style. 1891 1892@item 1893Entire sentences. 1894 1895@item 1896Split at paragraphs. 1897 1898@item 1899Use format strings instead of string concatenation. 1900 1901@item 1902Avoid unusual markup and unusual control characters. 1903@end itemize 1904 1905@noindent 1906Let's look at some examples of these guidelines. 1907 1908@cindex style 1909Translatable strings should be in good English style. If slang language 1910with abbreviations and shortcuts is used, often translators will not 1911understand the message and will produce very inappropriate translations. 1912 1913@example 1914"%s: is parameter\n" 1915@end example 1916 1917@noindent 1918This is nearly untranslatable: Is the displayed item @emph{a} parameter or 1919@emph{the} parameter? 1920 1921@example 1922"No match" 1923@end example 1924 1925@noindent 1926The ambiguity in this message makes it unintelligible: Is the program 1927attempting to set something on fire? Does it mean "The given object does 1928not match the template"? Does it mean "The template does not fit for any 1929of the objects"? 1930 1931@cindex ambiguities 1932In both cases, adding more words to the message will help both the 1933translator and the English speaking user. 1934 1935@cindex sentences 1936Translatable strings should be entire sentences. It is often not possible 1937to translate single verbs or adjectives in a substitutable way. 1938 1939@example 1940printf ("File %s is %s protected", filename, rw ? "write" : "read"); 1941@end example 1942 1943@noindent 1944Most translators will not look at the source and will thus only see the 1945string @code{"File %s is %s protected"}, which is unintelligible. Change 1946this to 1947 1948@example 1949printf (rw ? "File %s is write protected" : "File %s is read protected", 1950 filename); 1951@end example 1952 1953@noindent 1954This way the translator will not only understand the message, she will 1955also be able to find the appropriate grammatical construction. A French 1956translator for example translates "write protected" like "protected 1957against writing". 1958 1959Entire sentences are also important because in many languages, the 1960declination of some word in a sentence depends on the gender or the 1961number (singular/plural) of another part of the sentence. There are 1962usually more interdependencies between words than in English. The 1963consequence is that asking a translator to translate two half-sentences 1964and then combining these two half-sentences through dumb string concatenation 1965will not work, for many languages, even though it would work for English. 1966That's why translators need to handle entire sentences. 1967 1968Often sentences don't fit into a single line. If a sentence is output 1969using two subsequent @code{printf} statements, like this 1970 1971@example 1972printf ("Locale charset \"%s\" is different from\n", lcharset); 1973printf ("input file charset \"%s\".\n", fcharset); 1974@end example 1975 1976@noindent 1977the translator would have to translate two half sentences, but nothing 1978in the POT file would tell her that the two half sentences belong together. 1979It is necessary to merge the two @code{printf} statements so that the 1980translator can handle the entire sentence at once and decide at which 1981place to insert a line break in the translation (if at all): 1982 1983@example 1984printf ("Locale charset \"%s\" is different from\n\ 1985input file charset \"%s\".\n", lcharset, fcharset); 1986@end example 1987 1988You may now ask: how about two or more adjacent sentences? Like in this case: 1989 1990@example 1991puts ("Apollo 13 scenario: Stack overflow handling failed."); 1992puts ("On the next stack overflow we will crash!!!"); 1993@end example 1994 1995@noindent 1996Should these two statements merged into a single one? I would recommend to 1997merge them if the two sentences are related to each other, because then it 1998makes it easier for the translator to understand and translate both. On 1999the other hand, if one of the two messages is a stereotypic one, occurring 2000in other places as well, you will do a favour to the translator by not 2001merging the two. (Identical messages occurring in several places are 2002combined by xgettext, so the translator has to handle them once only.) 2003 2004@cindex paragraphs 2005Translatable strings should be limited to one paragraph; don't let a 2006single message be longer than ten lines. The reason is that when the 2007translatable string changes, the translator is faced with the task of 2008updating the entire translated string. Maybe only a single word will 2009have changed in the English string, but the translator doesn't see that 2010(with the current translation tools), therefore she has to proofread 2011the entire message. 2012 2013@cindex help option 2014Many GNU programs have a @samp{--help} output that extends over several 2015screen pages. It is a courtesy towards the translators to split such a 2016message into several ones of five to ten lines each. While doing that, 2017you can also attempt to split the documented options into groups, 2018such as the input options, the output options, and the informative 2019output options. This will help every user to find the option he is 2020looking for. 2021 2022@cindex string concatenation 2023@cindex concatenation of strings 2024Hardcoded string concatenation is sometimes used to construct English 2025strings: 2026 2027@example 2028strcpy (s, "Replace "); 2029strcat (s, object1); 2030strcat (s, " with "); 2031strcat (s, object2); 2032strcat (s, "?"); 2033@end example 2034 2035@noindent 2036In order to present to the translator only entire sentences, and also 2037because in some languages the translator might want to swap the order 2038of @code{object1} and @code{object2}, it is necessary to change this 2039to use a format string: 2040 2041@example 2042sprintf (s, "Replace %s with %s?", object1, object2); 2043@end example 2044 2045@cindex @code{inttypes.h} 2046A similar case is compile time concatenation of strings. The ISO C 99 2047include file @code{<inttypes.h>} contains a macro @code{PRId64} that 2048can be used as a formatting directive for outputting an @samp{int64_t} 2049integer through @code{printf}. It expands to a constant string, usually 2050"d" or "ld" or "lld" or something like this, depending on the platform. 2051Assume you have code like 2052 2053@example 2054printf ("The amount is %0" PRId64 "\n", number); 2055@end example 2056 2057@noindent 2058The @code{gettext} tools and library have special support for these 2059@code{<inttypes.h>} macros. You can therefore simply write 2060 2061@example 2062printf (gettext ("The amount is %0" PRId64 "\n"), number); 2063@end example 2064 2065@noindent 2066The PO file will contain the string "The amount is %0<PRId64>\n". 2067The translators will provide a translation containing "%0<PRId64>" 2068as well, and at runtime the @code{gettext} function's result will 2069contain the appropriate constant string, "d" or "ld" or "lld". 2070 2071This works only for the predefined @code{<inttypes.h>} macros. If 2072you have defined your own similar macros, let's say @samp{MYPRId64}, 2073that are not known to @code{xgettext}, the solution for this problem 2074is to change the code like this: 2075 2076@example 2077char buf1[100]; 2078sprintf (buf1, "%0" MYPRId64, number); 2079printf (gettext ("The amount is %s\n"), buf1); 2080@end example 2081 2082This means, you put the platform dependent code in one statement, and the 2083internationalization code in a different statement. Note that a buffer length 2084of 100 is safe, because all available hardware integer types are limited to 2085128 bits, and to print a 128 bit integer one needs at most 54 characters, 2086regardless whether in decimal, octal or hexadecimal. 2087 2088@cindex Java, string concatenation 2089@cindex C#, string concatenation 2090All this applies to other programming languages as well. For example, in 2091Java and C#, string concatenation is very frequently used, because it is a 2092compiler built-in operator. Like in C, in Java, you would change 2093 2094@example 2095System.out.println("Replace "+object1+" with "+object2+"?"); 2096@end example 2097 2098@noindent 2099into a statement involving a format string: 2100 2101@example 2102System.out.println( 2103 MessageFormat.format("Replace @{0@} with @{1@}?", 2104 new Object[] @{ object1, object2 @})); 2105@end example 2106 2107@noindent 2108Similarly, in C#, you would change 2109 2110@example 2111Console.WriteLine("Replace "+object1+" with "+object2+"?"); 2112@end example 2113 2114@noindent 2115into a statement involving a format string: 2116 2117@example 2118Console.WriteLine( 2119 String.Format("Replace @{0@} with @{1@}?", object1, object2)); 2120@end example 2121 2122@cindex markup 2123@cindex control characters 2124Unusual markup or control characters should not be used in translatable 2125strings. Translators will likely not understand the particular meaning 2126of the markup or control characters. 2127 2128For example, if you have a convention that @samp{|} delimits the 2129left-hand and right-hand part of some GUI elements, translators will 2130often not understand it without specific comments. It might be 2131better to have the translator translate the left-hand and right-hand 2132part separately. 2133 2134Another example is the @samp{argp} convention to use a single @samp{\v} 2135(vertical tab) control character to delimit two sections inside a 2136string. This is flawed. Some translators may convert it to a simple 2137newline, some to blank lines. With some PO file editors it may not be 2138easy to even enter a vertical tab control character. So, you cannot 2139be sure that the translation will contain a @samp{\v} character, at the 2140corresponding position. The solution is, again, to let the translator 2141translate two separate strings and combine at run-time the two translated 2142strings with the @samp{\v} required by the convention. 2143 2144HTML markup, however, is common enough that it's probably ok to use in 2145translatable strings. But please bear in mind that the GNU gettext tools 2146don't verify that the translations are well-formed HTML. 2147 2148@node Mark Keywords, Marking, Preparing Strings, Sources 2149@section How Marks Appear in Sources 2150@cindex marking strings that require translation 2151 2152All strings requiring translation should be marked in the C sources. Marking 2153is done in such a way that each translatable string appears to be 2154the sole argument of some function or preprocessor macro. There are 2155only a few such possible functions or macros meant for translation, 2156and their names are said to be marking keywords. The marking is 2157attached to strings themselves, rather than to what we do with them. 2158This approach has more uses. A blatant example is an error message 2159produced by formatting. The format string needs translation, as 2160well as some strings inserted through some @samp{%s} specification 2161in the format, while the result from @code{sprintf} may have so many 2162different instances that it is impractical to list them all in some 2163@samp{error_string_out()} routine, say. 2164 2165This marking operation has two goals. The first goal of marking 2166is for triggering the retrieval of the translation, at run time. 2167The keyword is possibly resolved into a routine able to dynamically 2168return the proper translation, as far as possible or wanted, for the 2169argument string. Most localizable strings are found in executable 2170positions, that is, attached to variables or given as parameters to 2171functions. But this is not universal usage, and some translatable 2172strings appear in structured initializations. @xref{Special cases}. 2173 2174The second goal of the marking operation is to help @code{xgettext} 2175at properly extracting all translatable strings when it scans a set 2176of program sources and produces PO file templates. 2177 2178The canonical keyword for marking translatable strings is 2179@samp{gettext}, it gave its name to the whole GNU @code{gettext} 2180package. For packages making only light use of the @samp{gettext} 2181keyword, macro or function, it is easily used @emph{as is}. However, 2182for packages using the @code{gettext} interface more heavily, it 2183is usually more convenient to give the main keyword a shorter, less 2184obtrusive name. Indeed, the keyword might appear on a lot of strings 2185all over the package, and programmers usually do not want nor need 2186their program sources to remind them forcefully, all the time, that they 2187are internationalized. Further, a long keyword has the disadvantage 2188of using more horizontal space, forcing more indentation work on 2189sources for those trying to keep them within 79 or 80 columns. 2190 2191@cindex @code{_}, a macro to mark strings for translation 2192Many packages use @samp{_} (a simple underline) as a keyword, 2193and write @samp{_("Translatable string")} instead of @samp{gettext 2194("Translatable string")}. Further, the coding rule, from GNU standards, 2195wanting that there is a space between the keyword and the opening 2196parenthesis is relaxed, in practice, for this particular usage. 2197So, the textual overhead per translatable string is reduced to 2198only three characters: the underline and the two parentheses. 2199However, even if GNU @code{gettext} uses this convention internally, 2200it does not offer it officially. The real, genuine keyword is truly 2201@samp{gettext} indeed. It is fairly easy for those wanting to use 2202@samp{_} instead of @samp{gettext} to declare: 2203 2204@example 2205#include <libintl.h> 2206#define _(String) gettext (String) 2207@end example 2208 2209@noindent 2210instead of merely using @samp{#include <libintl.h>}. 2211 2212The marking keywords @samp{gettext} and @samp{_} take the translatable 2213string as sole argument. It is also possible to define marking functions 2214that take it at another argument position. It is even possible to make 2215the marked argument position depend on the total number of arguments of 2216the function call; this is useful in C++. All this is achieved using 2217@code{xgettext}'s @samp{--keyword} option. 2218 2219Note also that long strings can be split across lines, into multiple 2220adjacent string tokens. Automatic string concatenation is performed 2221at compile time according to ISO C and ISO C++; @code{xgettext} also 2222supports this syntax. 2223 2224Later on, the maintenance is relatively easy. If, as a programmer, 2225you add or modify a string, you will have to ask yourself if the 2226new or altered string requires translation, and include it within 2227@samp{_()} if you think it should be translated. For example, @samp{"%s"} 2228is an example of string @emph{not} requiring translation. But 2229@samp{"%s: %d"} @emph{does} require translation, because in French, unlike 2230in English, it's customary to put a space before a colon. 2231 2232@node Marking, c-format Flag, Mark Keywords, Sources 2233@section Marking Translatable Strings 2234@emindex marking strings for translation 2235 2236In PO mode, one set of features is meant more for the programmer than 2237for the translator, and allows him to interactively mark which strings, 2238in a set of program sources, are translatable, and which are not. 2239Even if it is a fairly easy job for a programmer to find and mark 2240such strings by other means, using any editor of his choice, PO mode 2241makes this work more comfortable. Further, this gives translators 2242who feel a little like programmers, or programmers who feel a little 2243like translators, a tool letting them work at marking translatable 2244strings in the program sources, while simultaneously producing a set of 2245translation in some language, for the package being internationalized. 2246 2247@emindex @code{etags}, using for marking strings 2248The set of program sources, targeted by the PO mode commands describe 2249here, should have an Emacs tags table constructed for your project, 2250prior to using these PO file commands. This is easy to do. In any 2251shell window, change the directory to the root of your project, then 2252execute a command resembling: 2253 2254@example 2255etags src/*.[hc] lib/*.[hc] 2256@end example 2257 2258@noindent 2259presuming here you want to process all @file{.h} and @file{.c} files 2260from the @file{src/} and @file{lib/} directories. This command will 2261explore all said files and create a @file{TAGS} file in your root 2262directory, somewhat summarizing the contents using a special file 2263format Emacs can understand. 2264 2265@emindex @file{TAGS}, and marking translatable strings 2266For packages following the GNU coding standards, there is 2267a make goal @code{tags} or @code{TAGS} which constructs the tag files in 2268all directories and for all files containing source code. 2269 2270Once your @file{TAGS} file is ready, the following commands assist 2271the programmer at marking translatable strings in his set of sources. 2272But these commands are necessarily driven from within a PO file 2273window, and it is likely that you do not even have such a PO file yet. 2274This is not a problem at all, as you may safely open a new, empty PO 2275file, mainly for using these commands. This empty PO file will slowly 2276fill in while you mark strings as translatable in your program sources. 2277 2278@table @kbd 2279@item , 2280@efindex ,@r{, PO Mode command} 2281Search through program sources for a string which looks like a 2282candidate for translation (@code{po-tags-search}). 2283 2284@item M-, 2285@efindex M-,@r{, PO Mode command} 2286Mark the last string found with @samp{_()} (@code{po-mark-translatable}). 2287 2288@item M-. 2289@efindex M-.@r{, PO Mode command} 2290Mark the last string found with a keyword taken from a set of possible 2291keywords. This command with a prefix allows some management of these 2292keywords (@code{po-select-mark-and-mark}). 2293 2294@end table 2295 2296@efindex po-tags-search@r{, PO Mode command} 2297The @kbd{,} (@code{po-tags-search}) command searches for the next 2298occurrence of a string which looks like a possible candidate for 2299translation, and displays the program source in another Emacs window, 2300positioned in such a way that the string is near the top of this other 2301window. If the string is too big to fit whole in this window, it is 2302positioned so only its end is shown. In any case, the cursor 2303is left in the PO file window. If the shown string would be better 2304presented differently in different native languages, you may mark it 2305using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it 2306and skip to the next string by merely repeating the @kbd{,} command. 2307 2308A string is a good candidate for translation if it contains a sequence 2309of three or more letters. A string containing at most two letters in 2310a row will be considered as a candidate if it has more letters than 2311non-letters. The command disregards strings containing no letters, 2312or isolated letters only. It also disregards strings within comments, 2313or strings already marked with some keyword PO mode knows (see below). 2314 2315If you have never told Emacs about some @file{TAGS} file to use, the 2316command will request that you specify one from the minibuffer, the 2317first time you use the command. You may later change your @file{TAGS} 2318file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}}, 2319which will ask you to name the precise @file{TAGS} file you want 2320to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}. 2321 2322Each time you use the @kbd{,} command, the search resumes from where it was 2323left by the previous search, and goes through all program sources, 2324obeying the @file{TAGS} file, until all sources have been processed. 2325However, by giving a prefix argument to the command @w{(@kbd{C-u 2326,})}, you may request that the search be restarted all over again 2327from the first program source; but in this case, strings that you 2328recently marked as translatable will be automatically skipped. 2329 2330Using this @kbd{,} command does not prevent using of other regular 2331Emacs tags commands. For example, regular @code{tags-search} or 2332@code{tags-query-replace} commands may be used without disrupting the 2333independent @kbd{,} search sequence. However, as implemented, the 2334@emph{initial} @kbd{,} command (or the @kbd{,} command is used with a 2335prefix) might also reinitialize the regular Emacs tags searching to the 2336first tags file, this reinitialization might be considered spurious. 2337 2338@efindex po-mark-translatable@r{, PO Mode command} 2339@efindex po-select-mark-and-mark@r{, PO Mode command} 2340The @kbd{M-,} (@code{po-mark-translatable}) command will mark the 2341recently found string with the @samp{_} keyword. The @kbd{M-.} 2342(@code{po-select-mark-and-mark}) command will request that you type 2343one keyword from the minibuffer and use that keyword for marking 2344the string. Both commands will automatically create a new PO file 2345untranslated entry for the string being marked, and make it the 2346current entry (making it easy for you to immediately proceed to its 2347translation, if you feel like doing it right away). It is possible 2348that the modifications made to the program source by @kbd{M-,} or 2349@kbd{M-.} render some source line longer than 80 columns, forcing you 2350to break and re-indent this line differently. You may use the @kbd{O} 2351command from PO mode, or any other window changing command from 2352Emacs, to break out into the program source window, and do any 2353needed adjustments. You will have to use some regular Emacs command 2354to return the cursor to the PO file window, if you want command 2355@kbd{,} for the next string, say. 2356 2357The @kbd{M-.} command has a few built-in speedups, so you do not 2358have to explicitly type all keywords all the time. The first such 2359speedup is that you are presented with a @emph{preferred} keyword, 2360which you may accept by merely typing @kbd{@key{RET}} at the prompt. 2361The second speedup is that you may type any non-ambiguous prefix of the 2362keyword you really mean, and the command will complete it automatically 2363for you. This also means that PO mode has to @emph{know} all 2364your possible keywords, and that it will not accept mistyped keywords. 2365 2366If you reply @kbd{?} to the keyword request, the command gives a 2367list of all known keywords, from which you may choose. When the 2368command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits 2369updating any program source or PO file buffer, and does some simple 2370keyword management instead. In this case, the command asks for a 2371keyword, written in full, which becomes a new allowed keyword for 2372later @kbd{M-.} commands. Moreover, this new keyword automatically 2373becomes the @emph{preferred} keyword for later commands. By typing 2374an already known keyword in response to @w{@kbd{C-u M-.}}, one merely 2375changes the @emph{preferred} keyword and does nothing more. 2376 2377All keywords known for @kbd{M-.} are recognized by the @kbd{,} command 2378when scanning for strings, and strings already marked by any of those 2379known keywords are automatically skipped. If many PO files are opened 2380simultaneously, each one has its own independent set of known keywords. 2381There is no provision in PO mode, currently, for deleting a known 2382keyword, you have to quit the file (maybe using @kbd{q}) and reopen 2383it afresh. When a PO file is newly brought up in an Emacs window, only 2384@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext} 2385is preferred for the @kbd{M-.} command. In fact, this is not useful to 2386prefer @samp{_}, as this one is already built in the @kbd{M-,} command. 2387 2388@node c-format Flag, Special cases, Marking, Sources 2389@section Special Comments preceding Keywords 2390 2391@c FIXME document c-format and no-c-format. 2392 2393@cindex format strings 2394In C programs strings are often used within calls of functions from the 2395@code{printf} family. The special thing about these format strings is 2396that they can contain format specifiers introduced with @kbd{%}. Assume 2397we have the code 2398 2399@example 2400printf (gettext ("String `%s' has %d characters\n"), s, strlen (s)); 2401@end example 2402 2403@noindent 2404A possible German translation for the above string might be: 2405 2406@example 2407"%d Zeichen lang ist die Zeichenkette `%s'" 2408@end example 2409 2410A C programmer, even if he cannot speak German, will recognize that 2411there is something wrong here. The order of the two format specifiers 2412is changed but of course the arguments in the @code{printf} don't have. 2413This will most probably lead to problems because now the length of the 2414string is regarded as the address. 2415 2416To prevent errors at runtime caused by translations the @code{msgfmt} 2417tool can check statically whether the arguments in the original and the 2418translation string match in type and number. If this is not the case 2419and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt} 2420will give an error and refuse to produce a MO file. Thus consequent 2421use of @samp{msgfmt -c} will catch the error, so that it cannot cause 2422cause problems at runtime. 2423 2424@noindent 2425If the word order in the above German translation would be correct one 2426would have to write 2427 2428@example 2429"%2$d Zeichen lang ist die Zeichenkette `%1$s'" 2430@end example 2431 2432@noindent 2433The routines in @code{msgfmt} know about this special notation. 2434 2435Because not all strings in a program must be format strings it is not 2436useful for @code{msgfmt} to test all the strings in the @file{.po} file. 2437This might cause problems because the string might contain what looks 2438like a format specifier, but the string is not used in @code{printf}. 2439 2440Therefore the @code{xgettext} adds a special tag to those messages it 2441thinks might be a format string. There is no absolute rule for this, 2442only a heuristic. In the @file{.po} file the entry is marked using the 2443@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}). 2444 2445@kwindex c-format@r{, and @code{xgettext}} 2446@kwindex no-c-format@r{, and @code{xgettext}} 2447The careful reader now might say that this again can cause problems. 2448The heuristic might guess it wrong. This is true and therefore 2449@code{xgettext} knows about a special kind of comment which lets 2450the programmer take over the decision. If in the same line as or 2451the immediately preceding line to the @code{gettext} keyword 2452the @code{xgettext} program finds a comment containing the words 2453@code{xgettext:c-format}, it will mark the string in any case with 2454the @code{c-format} flag. This kind of comment should be used when 2455@code{xgettext} does not recognize the string as a format string but 2456it really is one and it should be tested. Please note that when the 2457comment is in the same line as the @code{gettext} keyword, it must be 2458before the string to be translated. 2459 2460This situation happens quite often. The @code{printf} function is often 2461called with strings which do not contain a format specifier. Of course 2462one would normally use @code{fputs} but it does happen. In this case 2463@code{xgettext} does not recognize this as a format string but what 2464happens if the translation introduces a valid format specifier? The 2465@code{printf} function will try to access one of the parameters but none 2466exists because the original code does not pass any parameters. 2467 2468@code{xgettext} of course could make a wrong decision the other way 2469round, i.e.@: a string marked as a format string actually is not a format 2470string. In this case the @code{msgfmt} might give too many warnings and 2471would prevent translating the @file{.po} file. The method to prevent 2472this wrong decision is similar to the one used above, only the comment 2473to use must contain the string @code{xgettext:no-c-format}. 2474 2475If a string is marked with @code{c-format} and this is not correct the 2476user can find out who is responsible for the decision. See 2477@ref{xgettext Invocation} to see how the @code{--debug} option can be 2478used for solving this problem. 2479 2480@node Special cases, Bug Report Address, c-format Flag, Sources 2481@section Special Cases of Translatable Strings 2482 2483@cindex marking string initializers 2484The attentive reader might now point out that it is not always possible 2485to mark translatable string with @code{gettext} or something like this. 2486Consider the following case: 2487 2488@example 2489@group 2490@{ 2491 static const char *messages[] = @{ 2492 "some very meaningful message", 2493 "and another one" 2494 @}; 2495 const char *string; 2496 @dots{} 2497 string 2498 = index > 1 ? "a default message" : messages[index]; 2499 2500 fputs (string); 2501 @dots{} 2502@} 2503@end group 2504@end example 2505 2506While it is no problem to mark the string @code{"a default message"} it 2507is not possible to mark the string initializers for @code{messages}. 2508What is to be done? We have to fulfill two tasks. First we have to mark the 2509strings so that the @code{xgettext} program (@pxref{xgettext Invocation}) 2510can find them, and second we have to translate the string at runtime 2511before printing them. 2512 2513The first task can be fulfilled by creating a new keyword, which names a 2514no-op. For the second we have to mark all access points to a string 2515from the array. So one solution can look like this: 2516 2517@example 2518@group 2519#define gettext_noop(String) String 2520 2521@{ 2522 static const char *messages[] = @{ 2523 gettext_noop ("some very meaningful message"), 2524 gettext_noop ("and another one") 2525 @}; 2526 const char *string; 2527 @dots{} 2528 string 2529 = index > 1 ? gettext ("a default message") : gettext (messages[index]); 2530 2531 fputs (string); 2532 @dots{} 2533@} 2534@end group 2535@end example 2536 2537Please convince yourself that the string which is written by 2538@code{fputs} is translated in any case. How to get @code{xgettext} know 2539the additional keyword @code{gettext_noop} is explained in @ref{xgettext 2540Invocation}. 2541 2542The above is of course not the only solution. You could also come along 2543with the following one: 2544 2545@example 2546@group 2547#define gettext_noop(String) String 2548 2549@{ 2550 static const char *messages[] = @{ 2551 gettext_noop ("some very meaningful message", 2552 gettext_noop ("and another one") 2553 @}; 2554 const char *string; 2555 @dots{} 2556 string 2557 = index > 1 ? gettext_noop ("a default message") : messages[index]; 2558 2559 fputs (gettext (string)); 2560 @dots{} 2561@} 2562@end group 2563@end example 2564 2565But this has a drawback. The programmer has to take care that 2566he uses @code{gettext_noop} for the string @code{"a default message"}. 2567A use of @code{gettext} could have in rare cases unpredictable results. 2568 2569One advantage is that you need not make control flow analysis to make 2570sure the output is really translated in any case. But this analysis is 2571generally not very difficult. If it should be in any situation you can 2572use this second method in this situation. 2573 2574@node Bug Report Address, Names, Special cases, Sources 2575@section Letting Users Report Translation Bugs 2576 2577Code sometimes has bugs, but translations sometimes have bugs too. The 2578users need to be able to report them. Reporting translation bugs to the 2579programmer or maintainer of a package is not very useful, since the 2580maintainer must never change a translation, except on behalf of the 2581translator. Hence the translation bugs must be reported to the 2582translators. 2583 2584Here is a way to organize this so that the maintainer does not need to 2585forward translation bug reports, nor even keep a list of the addresses of 2586the translators or their translation teams. 2587 2588Every program has a place where is shows the bug report address. For 2589GNU programs, it is the code which handles the ``--help'' option, 2590typically in a function called ``usage''. In this place, instruct the 2591translator to add her own bug reporting address. For example, if that 2592code has a statement 2593 2594@example 2595@group 2596printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT); 2597@end group 2598@end example 2599 2600you can add some translator instructions like this: 2601 2602@example 2603@group 2604/* TRANSLATORS: The placeholder indicates the bug-reporting address 2605 for this package. Please add _another line_ saying 2606 "Report translation bugs to <...>\n" with the address for translation 2607 bugs (typically your translation team's web or email address). */ 2608printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT); 2609@end group 2610@end example 2611 2612These will be extracted by @samp{xgettext}, leading to a .pot file that 2613contains this: 2614 2615@example 2616@group 2617#. TRANSLATORS: The placeholder indicates the bug-reporting address 2618#. for this package. Please add _another line_ saying 2619#. "Report translation bugs to <...>\n" with the address for translation 2620#. bugs (typically your translation team's web or email address). 2621#: src/hello.c:178 2622#, c-format 2623msgid "Report bugs to <%s>.\n" 2624msgstr "" 2625@end group 2626@end example 2627 2628@node Names, Libraries, Bug Report Address, Sources 2629@section Marking Proper Names for Translation 2630 2631Should names of persons, cities, locations etc. be marked for translation 2632or not? People who only know languages that can be written with Latin 2633letters (English, Spanish, French, German, etc.) are tempted to say ``no'', 2634because names usually do not change when transported between these languages. 2635However, in general when translating from one script to another, names 2636are translated too, usually phonetically or by transliteration. For 2637example, Russian or Greek names are converted to the Latin alphabet when 2638being translated to English, and English or French names are converted 2639to the Katakana script when being translated to Japanese. This is 2640necessary because the speakers of the target language in general cannot 2641read the script the name is originally written in. 2642 2643As a programmer, you should therefore make sure that names are marked 2644for translation, with a special comment telling the translators that it 2645is a proper name and how to pronounce it. Like this: 2646 2647@example 2648@group 2649printf (_("Written by %s.\n"), 2650 /* TRANSLATORS: This is a proper name. See the gettext 2651 manual, section Names. Note this is actually a non-ASCII 2652 name: The first name is (with Unicode escapes) 2653 "Fran\u00e7ois" or (with HTML entities) "François". 2654 Pronunciation is like "fraa-swa pee-nar". */ 2655 _("Francois Pinard")); 2656@end group 2657@end example 2658 2659As a translator, you should use some care when translating names, because 2660it is frustrating if people see their names mutilated or distorted. If 2661your language uses the Latin script, all you need to do is to reproduce 2662the name as perfectly as you can within the usual character set of your 2663language. In this particular case, this means to provide a translation 2664containing the c-cedilla character. If your language uses a different 2665script and the people speaking it don't usually read Latin words, it means 2666transliteration; but you should still give, in parentheses, the original 2667writing of the name -- for the sake of the people that do read the Latin 2668script. Here is an example, using Greek as the target script: 2669 2670@example 2671@group 2672#. This is a proper name. See the gettext 2673#. manual, section Names. Note this is actually a non-ASCII 2674#. name: The first name is (with Unicode escapes) 2675#. "Fran\u00e7ois" or (with HTML entities) "François". 2676#. Pronunciation is like "fraa-swa pee-nar". 2677msgid "Francois Pinard" 2678msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho" 2679 " (Francois Pinard)" 2680@end group 2681@end example 2682 2683Because translation of names is such a sensitive domain, it is a good 2684idea to test your translation before submitting it. 2685 2686The translation project @url{http://sourceforge.net/projects/translation} 2687has set up a POT file and translation domain consisting of program author 2688names, with better facilities for the translator than those presented here. 2689Namely, there the original name is written directly in Unicode (rather 2690than with Unicode escapes or HTML entities), and the pronunciation is 2691denoted using the International Phonetic Alphabet (see 2692@url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}). 2693 2694However, we don't recommend this approach for all POT files in all packages, 2695because this would force translators to use PO files in UTF-8 encoding, 2696which is - in the current state of software (as of 2003) - a major hassle 2697for translators using GNU Emacs or XEmacs with po-mode. 2698 2699@node Libraries, , Names, Sources 2700@section Preparing Library Sources 2701 2702When you are preparing a library, not a program, for the use of 2703@code{gettext}, only a few details are different. Here we assume that 2704the library has a translation domain and a POT file of its own. (If 2705it uses the translation domain and POT file of the main program, then 2706the previous sections apply without changes.) 2707 2708@enumerate 2709@item 2710The library code doesn't call @code{setlocale (LC_ALL, "")}. It's the 2711responsibility of the main program to set the locale. The library's 2712documentation should mention this fact, so that developers of programs 2713using the library are aware of it. 2714 2715@item 2716The library code doesn't call @code{textdomain (PACKAGE)}, because it 2717would interfere with the text domain set by the main program. 2718 2719@item 2720The initialization code for a program was 2721 2722@smallexample 2723 setlocale (LC_ALL, ""); 2724 bindtextdomain (PACKAGE, LOCALEDIR); 2725 textdomain (PACKAGE); 2726@end smallexample 2727 2728@noindent 2729For a library it is reduced to 2730 2731@smallexample 2732 bindtextdomain (PACKAGE, LOCALEDIR); 2733@end smallexample 2734 2735@noindent 2736If your library's API doesn't already have an initialization function, 2737you need to create one, containing at least the @code{bindtextdomain} 2738invocation. However, you usually don't need to export and document this 2739initialization function: It is sufficient that all entry points of the 2740library call the initialization function if it hasn't been called before. 2741The typical idiom used to achieve this is a static boolean variable that 2742indicates whether the initialization function has been called. Like this: 2743 2744@example 2745@group 2746static bool libfoo_initialized; 2747 2748static void 2749libfoo_initialize (void) 2750@{ 2751 bindtextdomain (PACKAGE, LOCALEDIR); 2752 libfoo_initialized = true; 2753@} 2754 2755/* This function is part of the exported API. */ 2756struct foo * 2757create_foo (...) 2758@{ 2759 /* Must ensure the initialization is performed. */ 2760 if (!libfoo_initialized) 2761 libfoo_initialize (); 2762 ... 2763@} 2764 2765/* This function is part of the exported API. The argument must be 2766 non-NULL and have been created through create_foo(). */ 2767int 2768foo_refcount (struct foo *argument) 2769@{ 2770 /* No need to invoke the initialization function here, because 2771 create_foo() must already have been called before. */ 2772 ... 2773@} 2774@end group 2775@end example 2776 2777@item 2778The usual declaration of the @samp{_} macro in each source file was 2779 2780@smallexample 2781#include <libintl.h> 2782#define _(String) gettext (String) 2783@end smallexample 2784 2785@noindent 2786for a program. For a library, which has its own translation domain, 2787it reads like this: 2788 2789@smallexample 2790#include <libintl.h> 2791#define _(String) dgettext (PACKAGE, String) 2792@end smallexample 2793 2794In other words, @code{dgettext} is used instead of @code{gettext}. 2795Similarly, the @code{dngettext} function should be used in place of the 2796@code{ngettext} function. 2797@end enumerate 2798 2799@node Template, Creating, Sources, Top 2800@chapter Making the PO Template File 2801@cindex PO template file 2802 2803After preparing the sources, the programmer creates a PO template file. 2804This section explains how to use @code{xgettext} for this purpose. 2805 2806@code{xgettext} creates a file named @file{@var{domainname}.po}. You 2807should then rename it to @file{@var{domainname}.pot}. (Why doesn't 2808@code{xgettext} create it under the name @file{@var{domainname}.pot} 2809right away? The answer is: for historical reasons. When @code{xgettext} 2810was specified, the distinction between a PO file and PO file template 2811was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.) 2812 2813@c FIXME: Rewrite. 2814 2815@menu 2816* xgettext Invocation:: Invoking the @code{xgettext} Program 2817@end menu 2818 2819@node xgettext Invocation, , Template, Template 2820@section Invoking the @code{xgettext} Program 2821 2822@include xgettext.texi 2823 2824@node Creating, Updating, Template, Top 2825@chapter Creating a New PO File 2826@cindex creating a new PO file 2827 2828When starting a new translation, the translator creates a file called 2829@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template 2830file with modifications in the initial comments (at the beginning of the file) 2831and in the header entry (the first entry, near the beginning of the file). 2832 2833The easiest way to do so is by use of the @samp{msginit} program. 2834For example: 2835 2836@example 2837$ cd @var{PACKAGE}-@var{VERSION} 2838$ cd po 2839$ msginit 2840@end example 2841 2842The alternative way is to do the copy and modifications by hand. 2843To do so, the translator copies @file{@var{package}.pot} to 2844@file{@var{LANG}.po}. Then she modifies the initial comments and 2845the header entry of this file. 2846 2847@menu 2848* msginit Invocation:: Invoking the @code{msginit} Program 2849* Header Entry:: Filling in the Header Entry 2850@end menu 2851 2852@node msginit Invocation, Header Entry, Creating, Creating 2853@section Invoking the @code{msginit} Program 2854 2855@include msginit.texi 2856 2857@node Header Entry, , msginit Invocation, Creating 2858@section Filling in the Header Entry 2859@cindex header entry of a PO file 2860 2861The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and 2862"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible 2863information. This can be done in any text editor; if Emacs is used 2864and it switched to PO mode automatically (because it has recognized 2865the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}. 2866 2867Modifying the header entry can already be done using PO mode: in Emacs, 2868type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the 2869entry. You should fill in the following fields. 2870 2871@table @asis 2872@item Project-Id-Version 2873This is the name and version of the package. Fill it in if it has not 2874already been filled in by @code{xgettext}. 2875 2876@item Report-Msgid-Bugs-To 2877This has already been filled in by @code{xgettext}. It contains an email 2878address or URL where you can report bugs in the untranslated strings: 2879 2880@itemize - 2881@item Strings which are not entire sentences, see the maintainer guidelines 2882in @ref{Preparing Strings}. 2883@item Strings which use unclear terms or require additional context to be 2884understood. 2885@item Strings which make invalid assumptions about notation of date, time or 2886money. 2887@item Pluralisation problems. 2888@item Incorrect English spelling. 2889@item Incorrect formatting. 2890@end itemize 2891 2892@item POT-Creation-Date 2893This has already been filled in by @code{xgettext}. 2894 2895@item PO-Revision-Date 2896You don't need to fill this in. It will be filled by the PO file editor 2897when you save the file. 2898 2899@item Last-Translator 2900Fill in your name and email address (without double quotes). 2901 2902@item Language-Team 2903Fill in the English name of the language, and the email address or 2904homepage URL of the language team you are part of. 2905 2906Before starting a translation, it is a good idea to get in touch with 2907your translation team, not only to make sure you don't do duplicated work, 2908but also to coordinate difficult linguistic issues. 2909 2910@cindex list of translation teams, where to find 2911In the Free Translation Project, each translation team has its own mailing 2912list. The up-to-date list of teams can be found at the Free Translation 2913Project's homepage, @uref{http://translationproject.org/}, in the "Teams" 2914area. 2915 2916@item Content-Type 2917@cindex encoding of PO files 2918@cindex charset of PO files 2919Replace @samp{CHARSET} with the character encoding used for your language, 2920in your locale, or UTF-8. This field is needed for correct operation of the 2921@code{msgmerge} and @code{msgfmt} programs, as well as for users whose 2922locale's character encoding differs from yours (see @ref{Charset conversion}). 2923 2924@cindex @code{locale} program 2925You get the character encoding of your locale by running the shell command 2926@samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968}, 2927which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your 2928locale is not correctly configured. In this case, ask your translation 2929team which charset to use. @samp{ASCII} is not usable for any language 2930except Latin. 2931 2932@cindex encoding list 2933Because the PO files must be portable to operating systems with less advanced 2934internationalization facilities, the character encodings that can be used 2935are limited to those supported by both GNU @code{libc} and GNU 2936@code{libiconv}. These are: 2937@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3}, 2938@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7}, 2939@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14}, 2940@code{ISO-8859-15}, 2941@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T}, 2942@code{CP850}, @code{CP866}, @code{CP874}, 2943@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251}, 2944@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256}, 2945@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW}, 2946@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS}, 2947@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}. 2948 2949@c This data is taken from glibc/localedata/SUPPORTED. 2950@cindex Linux 2951In the GNU system, the following encodings are frequently used for the 2952corresponding languages. 2953 2954@cindex encoding for your language 2955@itemize 2956@item @code{ISO-8859-1} for 2957Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, 2958English, Estonian, Faroese, Finnish, French, Galician, German, 2959Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx, 2960Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek, 2961Walloon, 2962@item @code{ISO-8859-2} for 2963Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, 2964Slovenian, 2965@item @code{ISO-8859-3} for Maltese, 2966@item @code{ISO-8859-5} for Macedonian, Serbian, 2967@item @code{ISO-8859-6} for Arabic, 2968@item @code{ISO-8859-7} for Greek, 2969@item @code{ISO-8859-8} for Hebrew, 2970@item @code{ISO-8859-9} for Turkish, 2971@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori, 2972@item @code{ISO-8859-14} for Welsh, 2973@item @code{ISO-8859-15} for 2974Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, 2975Italian, Portuguese, Spanish, Swedish, Walloon, 2976@item @code{KOI8-R} for Russian, 2977@item @code{KOI8-U} for Ukrainian, 2978@item @code{KOI8-T} for Tajik, 2979@item @code{CP1251} for Bulgarian, Byelorussian, 2980@item @code{GB2312}, @code{GBK}, @code{GB18030} 2981for simplified writing of Chinese, 2982@item @code{BIG5}, @code{BIG5-HKSCS} 2983for traditional writing of Chinese, 2984@item @code{EUC-JP} for Japanese, 2985@item @code{EUC-KR} for Korean, 2986@item @code{TIS-620} for Thai, 2987@item @code{GEORGIAN-PS} for Georgian, 2988@item @code{UTF-8} for any language, including those listed above. 2989@end itemize 2990 2991@cindex quote characters, use in PO files 2992@cindex quotation marks 2993When single quote characters or double quote characters are used in 2994translations for your language, and your locale's encoding is one of the 2995ISO-8859-* charsets, it is best if you create your PO files in UTF-8 2996encoding, instead of your locale's encoding. This is because in UTF-8 2997the real quote characters can be represented (single quote characters: 2998U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of 2999ISO-8859-* charsets has them all. Users in UTF-8 locales will see the 3000real quote characters, whereas users in ISO-8859-* locales will see the 3001vertical apostrophe and the vertical double quote instead (because that's 3002what the character set conversion will transliterate them to). 3003 3004@cindex @code{xmodmap} program, and typing quotation marks 3005To enter such quote characters under X11, you can change your keyboard 3006mapping using the @code{xmodmap} program. The X11 names of the quote 3007characters are "leftsinglequotemark", "rightsinglequotemark", 3008"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark", 3009"doublelowquotemark". 3010 3011Note that only recent versions of GNU Emacs support the UTF-8 encoding: 3012Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't 3013support the UTF-8 encoding. 3014 3015The character encoding name can be written in either upper or lower case. 3016Usually upper case is preferred. 3017 3018@item Content-Transfer-Encoding 3019Set this to @code{8bit}. 3020 3021@item Plural-Forms 3022This field is optional. It is only needed if the PO file has plural forms. 3023You can find them by searching for the @samp{msgid_plural} keyword. The 3024format of the plural forms field is described in @ref{Plural forms}. 3025@end table 3026 3027@node Updating, Editing, Creating, Top 3028@chapter Updating Existing PO Files 3029 3030@menu 3031* msgmerge Invocation:: Invoking the @code{msgmerge} Program 3032@end menu 3033 3034@node msgmerge Invocation, , Updating, Updating 3035@section Invoking the @code{msgmerge} Program 3036 3037@include msgmerge.texi 3038 3039@node Editing, Manipulating, Updating, Top 3040@chapter Editing PO Files 3041@cindex Editing PO Files 3042 3043@menu 3044* KBabel:: KDE's PO File Editor 3045* Gtranslator:: GNOME's PO File Editor 3046* PO Mode:: Emacs's PO File Editor 3047* Compendium:: Using Translation Compendia 3048@end menu 3049 3050@node KBabel, Gtranslator, Editing, Editing 3051@section KDE's PO File Editor 3052@cindex KDE PO file editor 3053 3054@node Gtranslator, PO Mode, KBabel, Editing 3055@section GNOME's PO File Editor 3056@cindex GNOME PO file editor 3057 3058@node PO Mode, Compendium, Gtranslator, Editing 3059@section Emacs's PO File Editor 3060@cindex Emacs PO Mode 3061 3062@c FIXME: Rewrite. 3063 3064For those of you being 3065the lucky users of Emacs, PO mode has been specifically created 3066for providing a cozy environment for editing or modifying PO files. 3067While editing a PO file, PO mode allows for the easy browsing of 3068auxiliary and compendium PO files, as well as for following references into 3069the set of C program sources from which PO files have been derived. 3070It has a few special features, among which are the interactive marking 3071of program strings as translatable, and the validation of PO files 3072with easy repositioning to PO file lines showing errors. 3073 3074For the beginning, besides main PO mode commands 3075(@pxref{Main PO Commands}), you should know how to move between entries 3076(@pxref{Entry Positioning}), and how to handle untranslated entries 3077(@pxref{Untranslated Entries}). 3078 3079@menu 3080* Installation:: Completing GNU @code{gettext} Installation 3081* Main PO Commands:: Main Commands 3082* Entry Positioning:: Entry Positioning 3083* Normalizing:: Normalizing Strings in Entries 3084* Translated Entries:: Translated Entries 3085* Fuzzy Entries:: Fuzzy Entries 3086* Untranslated Entries:: Untranslated Entries 3087* Obsolete Entries:: Obsolete Entries 3088* Modifying Translations:: Modifying Translations 3089* Modifying Comments:: Modifying Comments 3090* Subedit:: Mode for Editing Translations 3091* C Sources Context:: C Sources Context 3092* Auxiliary:: Consulting Auxiliary PO Files 3093@end menu 3094 3095@node Installation, Main PO Commands, PO Mode, PO Mode 3096@subsection Completing GNU @code{gettext} Installation 3097 3098@cindex installing @code{gettext} 3099@cindex @code{gettext} installation 3100Once you have received, unpacked, configured and compiled the GNU 3101@code{gettext} distribution, the @samp{make install} command puts in 3102place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and 3103@code{msgmerge}, as well as their available message catalogs. To 3104top off a comfortable installation, you might also want to make the 3105PO mode available to your Emacs users. 3106 3107@emindex @file{.emacs} customizations 3108@emindex installing PO mode 3109During the installation of the PO mode, you might want to modify your 3110file @file{.emacs}, once and for all, so it contains a few lines looking 3111like: 3112 3113@example 3114(setq auto-mode-alist 3115 (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist)) 3116(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t) 3117@end example 3118 3119Later, whenever you edit some @file{.po} 3120file, or any file having the string @samp{.po.} within its name, 3121Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and 3122automatically activates PO mode commands for the associated buffer. 3123The string @emph{PO} appears in the mode line for any buffer for 3124which PO mode is active. Many PO files may be active at once in a 3125single Emacs session. 3126 3127If you are using Emacs version 20 or newer, and have already installed 3128the appropriate international fonts on your system, you may also tell 3129Emacs how to determine automatically the coding system of every PO file. 3130This will often (but not always) cause the necessary fonts to be loaded 3131and used for displaying the translations on your Emacs screen. For this 3132to happen, add the lines: 3133 3134@example 3135(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\." 3136 'po-find-file-coding-system) 3137(autoload 'po-find-file-coding-system "po-mode") 3138@end example 3139 3140@noindent 3141to your @file{.emacs} file. If, with this, you still see boxes instead 3142of international characters, try a different font set (via Shift Mouse 3143button 1). 3144 3145@node Main PO Commands, Entry Positioning, Installation, PO Mode 3146@subsection Main PO mode Commands 3147 3148@cindex PO mode (Emacs) commands 3149@emindex commands 3150After setting up Emacs with something similar to the lines in 3151@ref{Installation}, PO mode is activated for a window when Emacs finds a 3152PO file in that window. This puts the window read-only and establishes a 3153po-mode-map, which is a genuine Emacs mode, in a way that is not derived 3154from text mode in any way. Functions found on @code{po-mode-hook}, 3155if any, will be executed. 3156 3157When PO mode is active in a window, the letters @samp{PO} appear 3158in the mode line for that window. The mode line also displays how 3159many entries of each kind are held in the PO file. For example, 3160the string @samp{132t+3f+10u+2o} would tell the translator that the 3161PO mode contains 132 translated entries (@pxref{Translated Entries}, 31623 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries 3163(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete 3164Entries}). Zero-coefficients items are not shown. So, in this example, if 3165the fuzzy entries were unfuzzied, the untranslated entries were translated 3166and the obsolete entries were deleted, the mode line would merely display 3167@samp{145t} for the counters. 3168 3169The main PO commands are those which do not fit into the other categories of 3170subsequent sections. These allow for quitting PO mode or for managing windows 3171in special ways. 3172 3173@table @kbd 3174@item _ 3175@efindex _@r{, PO Mode command} 3176Undo last modification to the PO file (@code{po-undo}). 3177 3178@item Q 3179@efindex Q@r{, PO Mode command} 3180Quit processing and save the PO file (@code{po-quit}). 3181 3182@item q 3183@efindex q@r{, PO Mode command} 3184Quit processing, possibly after confirmation (@code{po-confirm-and-quit}). 3185 3186@item 0 3187@efindex 0@r{, PO Mode command} 3188Temporary leave the PO file window (@code{po-other-window}). 3189 3190@item ? 3191@itemx h 3192@efindex ?@r{, PO Mode command} 3193@efindex h@r{, PO Mode command} 3194Show help about PO mode (@code{po-help}). 3195 3196@item = 3197@efindex =@r{, PO Mode command} 3198Give some PO file statistics (@code{po-statistics}). 3199 3200@item V 3201@efindex V@r{, PO Mode command} 3202Batch validate the format of the whole PO file (@code{po-validate}). 3203 3204@end table 3205 3206@efindex _@r{, PO Mode command} 3207@efindex po-undo@r{, PO Mode command} 3208The command @kbd{_} (@code{po-undo}) interfaces to the Emacs 3209@emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs 3210Editor}. Each time @kbd{U} is typed, modifications which the translator 3211did to the PO file are undone a little more. For the purpose of 3212undoing, each PO mode command is atomic. This is especially true for 3213the @kbd{@key{RET}} command: the whole edition made by using a single 3214use of this command is undone at once, even if the edition itself 3215implied several actions. However, while in the editing window, one 3216can undo the edition work quite parsimoniously. 3217 3218@efindex Q@r{, PO Mode command} 3219@efindex q@r{, PO Mode command} 3220@efindex po-quit@r{, PO Mode command} 3221@efindex po-confirm-and-quit@r{, PO Mode command} 3222The commands @kbd{Q} (@code{po-quit}) and @kbd{q} 3223(@code{po-confirm-and-quit}) are used when the translator is done with the 3224PO file. The former is a bit less verbose than the latter. If the file 3225has been modified, it is saved to disk first. In both cases, and prior to 3226all this, the commands check if any untranslated messages remain in the 3227PO file and, if so, the translator is asked if she really wants to leave 3228off working with this PO file. This is the preferred way of getting rid 3229of an Emacs PO file buffer. Merely killing it through the usual command 3230@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed. 3231 3232@efindex 0@r{, PO Mode command} 3233@efindex po-other-window@r{, PO Mode command} 3234The command @kbd{0} (@code{po-other-window}) is another, softer way, 3235to leave PO mode, temporarily. It just moves the cursor to some other 3236Emacs window, and pops one if necessary. For example, if the translator 3237just got PO mode to show some source context in some other, she might 3238discover some apparent bug in the program source that needs correction. 3239This command allows the translator to change sex, become a programmer, 3240and have the cursor right into the window containing the program she 3241(or rather @emph{he}) wants to modify. By later getting the cursor back 3242in the PO file window, or by asking Emacs to edit this file once again, 3243PO mode is then recovered. 3244 3245@efindex ?@r{, PO Mode command} 3246@efindex h@r{, PO Mode command} 3247@efindex po-help@r{, PO Mode command} 3248The command @kbd{h} (@code{po-help}) displays a summary of all available PO 3249mode commands. The translator should then type any character to resume 3250normal PO mode operations. The command @kbd{?} has the same effect 3251as @kbd{h}. 3252 3253@efindex =@r{, PO Mode command} 3254@efindex po-statistics@r{, PO Mode command} 3255The command @kbd{=} (@code{po-statistics}) computes the total number of 3256entries in the PO file, the ordinal of the current entry (counted from 32571), the number of untranslated entries, the number of obsolete entries, 3258and displays all these numbers. 3259 3260@efindex V@r{, PO Mode command} 3261@efindex po-validate@r{, PO Mode command} 3262The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in 3263checking and verbose 3264mode over the current PO file. This command first offers to save the 3265current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext}, 3266has the purpose of creating a MO file out of a PO file, and PO mode uses 3267the features of this program for checking the overall format of a PO file, 3268as well as all individual entries. 3269 3270@efindex next-error@r{, stepping through PO file validation results} 3271The program @code{msgfmt} runs asynchronously with Emacs, so the 3272translator regains control immediately while her PO file is being studied. 3273Error output is collected in the Emacs @samp{*compilation*} buffer, 3274displayed in another window. The regular Emacs command @kbd{C-x`} 3275(@code{next-error}), as well as other usual compile commands, allow the 3276translator to reposition quickly to the offending parts of the PO file. 3277Once the cursor is on the line in error, the translator may decide on 3278any PO mode action which would help correcting the error. 3279 3280@node Entry Positioning, Normalizing, Main PO Commands, PO Mode 3281@subsection Entry Positioning 3282 3283@emindex current entry of a PO file 3284The cursor in a PO file window is almost always part of 3285an entry. The only exceptions are the special case when the cursor 3286is after the last entry in the file, or when the PO file is 3287empty. The entry where the cursor is found to be is said to be the 3288current entry. Many PO mode commands operate on the current entry, 3289so moving the cursor does more than allowing the translator to browse 3290the PO file, this also selects on which entry commands operate. 3291 3292@emindex moving through a PO file 3293Some PO mode commands alter the position of the cursor in a specialized 3294way. A few of those special purpose positioning are described here, 3295the others are described in following sections (for a complete list try 3296@kbd{C-h m}): 3297 3298@table @kbd 3299 3300@item . 3301@efindex .@r{, PO Mode command} 3302Redisplay the current entry (@code{po-current-entry}). 3303 3304@item n 3305@efindex n@r{, PO Mode command} 3306Select the entry after the current one (@code{po-next-entry}). 3307 3308@item p 3309@efindex p@r{, PO Mode command} 3310Select the entry before the current one (@code{po-previous-entry}). 3311 3312@item < 3313@efindex <@r{, PO Mode command} 3314Select the first entry in the PO file (@code{po-first-entry}). 3315 3316@item > 3317@efindex >@r{, PO Mode command} 3318Select the last entry in the PO file (@code{po-last-entry}). 3319 3320@item m 3321@efindex m@r{, PO Mode command} 3322Record the location of the current entry for later use 3323(@code{po-push-location}). 3324 3325@item r 3326@efindex r@r{, PO Mode command} 3327Return to a previously saved entry location (@code{po-pop-location}). 3328 3329@item x 3330@efindex x@r{, PO Mode command} 3331Exchange the current entry location with the previously saved one 3332(@code{po-exchange-location}). 3333 3334@end table 3335 3336@efindex .@r{, PO Mode command} 3337@efindex po-current-entry@r{, PO Mode command} 3338Any Emacs command able to reposition the cursor may be used 3339to select the current entry in PO mode, including commands which 3340move by characters, lines, paragraphs, screens or pages, and search 3341commands. However, there is a kind of standard way to display the 3342current entry in PO mode, which usual Emacs commands moving 3343the cursor do not especially try to enforce. The command @kbd{.} 3344(@code{po-current-entry}) has the sole purpose of redisplaying the 3345current entry properly, after the current entry has been changed by 3346means external to PO mode, or the Emacs screen otherwise altered. 3347 3348It is yet to be decided if PO mode helps the translator, or otherwise 3349irritates her, by forcing a rigid window disposition while she 3350is doing her work. We originally had quite precise ideas about 3351how windows should behave, but on the other hand, anyone used to 3352Emacs is often happy to keep full control. Maybe a fixed window 3353disposition might be offered as a PO mode option that the translator 3354might activate or deactivate at will, so it could be offered on an 3355experimental basis. If nobody feels a real need for using it, or 3356a compulsion for writing it, we should drop this whole idea. 3357The incentive for doing it should come from translators rather than 3358programmers, as opinions from an experienced translator are surely 3359more worth to me than opinions from programmers @emph{thinking} about 3360how @emph{others} should do translation. 3361 3362@efindex n@r{, PO Mode command} 3363@efindex po-next-entry@r{, PO Mode command} 3364@efindex p@r{, PO Mode command} 3365@efindex po-previous-entry@r{, PO Mode command} 3366The commands @kbd{n} (@code{po-next-entry}) and @kbd{p} 3367(@code{po-previous-entry}) move the cursor the entry following, 3368or preceding, the current one. If @kbd{n} is given while the 3369cursor is on the last entry of the PO file, or if @kbd{p} 3370is given while the cursor is on the first entry, no move is done. 3371 3372@efindex <@r{, PO Mode command} 3373@efindex po-first-entry@r{, PO Mode command} 3374@efindex >@r{, PO Mode command} 3375@efindex po-last-entry@r{, PO Mode command} 3376The commands @kbd{<} (@code{po-first-entry}) and @kbd{>} 3377(@code{po-last-entry}) move the cursor to the first entry, or last 3378entry, of the PO file. When the cursor is located past the last 3379entry in a PO file, most PO mode commands will return an error saying 3380@samp{After last entry}. Moreover, the commands @kbd{<} and @kbd{>} 3381have the special property of being able to work even when the cursor 3382is not into some PO file entry, and one may use them for nicely 3383correcting this situation. But even these commands will fail on a 3384truly empty PO file. There are development plans for the PO mode for it 3385to interactively fill an empty PO file from sources. @xref{Marking}. 3386 3387The translator may decide, before working at the translation of 3388a particular entry, that she needs to browse the remainder of the 3389PO file, maybe for finding the terminology or phraseology used 3390in related entries. She can of course use the standard Emacs idioms 3391for saving the current cursor location in some register, and use that 3392register for getting back, or else, use the location ring. 3393 3394@efindex m@r{, PO Mode command} 3395@efindex po-push-location@r{, PO Mode command} 3396@efindex r@r{, PO Mode command} 3397@efindex po-pop-location@r{, PO Mode command} 3398PO mode offers another approach, by which cursor locations may be saved 3399onto a special stack. The command @kbd{m} (@code{po-push-location}) 3400merely adds the location of current entry to the stack, pushing 3401the already saved locations under the new one. The command 3402@kbd{r} (@code{po-pop-location}) consumes the top stack element and 3403repositions the cursor to the entry associated with that top element. 3404This position is then lost, for the next @kbd{r} will move the cursor 3405to the previously saved location, and so on until no locations remain 3406on the stack. 3407 3408If the translator wants the position to be kept on the location stack, 3409maybe for taking a look at the entry associated with the top 3410element, then go elsewhere with the intent of getting back later, she 3411ought to use @kbd{m} immediately after @kbd{r}. 3412 3413@efindex x@r{, PO Mode command} 3414@efindex po-exchange-location@r{, PO Mode command} 3415The command @kbd{x} (@code{po-exchange-location}) simultaneously 3416repositions the cursor to the entry associated with the top element of 3417the stack of saved locations, and replaces that top element with the 3418location of the current entry before the move. Consequently, repeating 3419the @kbd{x} command toggles alternatively between two entries. 3420For achieving this, the translator will position the cursor on the 3421first entry, use @kbd{m}, then position to the second entry, and 3422merely use @kbd{x} for making the switch. 3423 3424@node Normalizing, Translated Entries, Entry Positioning, PO Mode 3425@subsection Normalizing Strings in Entries 3426@cindex string normalization in entries 3427 3428There are many different ways for encoding a particular string into a 3429PO file entry, because there are so many different ways to split and 3430quote multi-line strings, and even, to represent special characters 3431by backslashed escaped sequences. Some features of PO mode rely on 3432the ability for PO mode to scan an already existing PO file for a 3433particular string encoded into the @code{msgid} field of some entry. 3434Even if PO mode has internally all the built-in machinery for 3435implementing this recognition easily, doing it fast is technically 3436difficult. To facilitate a solution to this efficiency problem, 3437we decided on a canonical representation for strings. 3438 3439A conventional representation of strings in a PO file is currently 3440under discussion, and PO mode experiments with a canonical representation. 3441Having both @code{xgettext} and PO mode converging towards a uniform 3442way of representing equivalent strings would be useful, as the internal 3443normalization needed by PO mode could be automatically satisfied 3444when using @code{xgettext} from GNU @code{gettext}. An explicit 3445PO mode normalization should then be only necessary for PO files 3446imported from elsewhere, or for when the convention itself evolves. 3447 3448So, for achieving normalization of at least the strings of a given 3449PO file needing a canonical representation, the following PO mode 3450command is available: 3451 3452@emindex string normalization in entries 3453@table @kbd 3454@item M-x po-normalize 3455@efindex po-normalize@r{, PO Mode command} 3456Tidy the whole PO file by making entries more uniform. 3457 3458@end table 3459 3460The special command @kbd{M-x po-normalize}, which has no associated 3461keys, revises all entries, ensuring that strings of both original 3462and translated entries use uniform internal quoting in the PO file. 3463It also removes any crumb after the last entry. This command may be 3464useful for PO files freshly imported from elsewhere, or if we ever 3465improve on the canonical quoting format we use. This canonical format 3466is not only meant for getting cleaner PO files, but also for greatly 3467speeding up @code{msgid} string lookup for some other PO mode commands. 3468 3469@kbd{M-x po-normalize} presently makes three passes over the entries. 3470The first implements heuristics for converting PO files for GNU 3471@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr} 3472fields were using K&R style C string syntax for multi-line strings. 3473These heuristics may fail for comments not related to obsolete 3474entries and ending with a backslash; they also depend on subsequent 3475passes for finalizing the proper commenting of continued lines for 3476obsolete entries. This first pass might disappear once all oldish PO 3477files would have been adjusted. The second and third pass normalize 3478all @code{msgid} and @code{msgstr} strings respectively. They also 3479clean out those trailing backslashes used by XView's @code{msgfmt} 3480for continued lines. 3481 3482@cindex importing PO files 3483Having such an explicit normalizing command allows for importing PO 3484files from other sources, but also eases the evolution of the current 3485convention, evolution driven mostly by aesthetic concerns, as of now. 3486It is easy to make suggested adjustments at a later time, as the 3487normalizing command and eventually, other GNU @code{gettext} tools 3488should greatly automate conformance. A description of the canonical 3489string format is given below, for the particular benefit of those not 3490having Emacs handy, and who would nevertheless want to handcraft 3491their PO files in nice ways. 3492 3493@cindex multi-line strings 3494Right now, in PO mode, strings are single line or multi-line. A string 3495goes multi-line if and only if it has @emph{embedded} newlines, that 3496is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have: 3497 3498@example 3499msgstr "\n\nHello, world!\n\n\n" 3500@end example 3501 3502but, replacing the space by a newline, this becomes: 3503 3504@example 3505msgstr "" 3506"\n" 3507"\n" 3508"Hello,\n" 3509"world!\n" 3510"\n" 3511"\n" 3512@end example 3513 3514We are deliberately using a caricatural example, here, to make the 3515point clearer. Usually, multi-lines are not that bad looking. 3516It is probable that we will implement the following suggestion. 3517We might lump together all initial newlines into the empty string, 3518and also all newlines introducing empty lines (that is, for @w{@var{n} 3519> 1}, the @var{n}-1'th last newlines would go together on a separate 3520string), so making the previous example appear: 3521 3522@example 3523msgstr "\n\n" 3524"Hello,\n" 3525"world!\n" 3526"\n\n" 3527@end example 3528 3529There are a few yet undecided little points about string normalization, 3530to be documented in this manual, once these questions settle. 3531 3532@node Translated Entries, Fuzzy Entries, Normalizing, PO Mode 3533@subsection Translated Entries 3534@cindex translated entries 3535 3536Each PO file entry for which the @code{msgstr} field has been filled with 3537a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}), 3538is said to be a @dfn{translated} entry. Only translated entries will 3539later be compiled by GNU @code{msgfmt} and become usable in programs. 3540Other entry types will be excluded; translation will not occur for them. 3541 3542@emindex moving by translated entries 3543Some commands are more specifically related to translated entry processing. 3544 3545@table @kbd 3546@item t 3547@efindex t@r{, PO Mode command} 3548Find the next translated entry (@code{po-next-translated-entry}). 3549 3550@item T 3551@efindex T@r{, PO Mode command} 3552Find the previous translated entry (@code{po-previous-translated-entry}). 3553 3554@end table 3555 3556@efindex t@r{, PO Mode command} 3557@efindex po-next-translated-entry@r{, PO Mode command} 3558@efindex T@r{, PO Mode command} 3559@efindex po-previous-translated-entry@r{, PO Mode command} 3560The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T} 3561(@code{po-previous-translated-entry}) move forwards or backwards, chasing 3562for an translated entry. If none is found, the search is extended and 3563wraps around in the PO file buffer. 3564 3565@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable} 3566Translated entries usually result from the translator having edited in 3567a translation for them, @ref{Modifying Translations}. However, if the 3568variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having 3569received a new translation first becomes a fuzzy entry, which ought to 3570be later unfuzzied before becoming an official, genuine translated entry. 3571@xref{Fuzzy Entries}. 3572 3573@node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode 3574@subsection Fuzzy Entries 3575@cindex fuzzy entries 3576 3577@cindex attributes of a PO file entry 3578@cindex attribute, fuzzy 3579Each PO file entry may have a set of @dfn{attributes}, which are 3580qualities given a name and explicitly associated with the translation, 3581using a special system comment. One of these attributes 3582has the name @code{fuzzy}, and entries having this attribute are said 3583to have a fuzzy translation. They are called fuzzy entries, for short. 3584 3585Fuzzy entries, even if they account for translated entries for 3586most other purposes, usually call for revision by the translator. 3587Those may be produced by applying the program @code{msgmerge} to 3588update an older translated PO files according to a new PO template 3589file, when this tool hypothesises that some new @code{msgid} has 3590been modified only slightly out of an older one, and chooses to pair 3591what it thinks to be the old translation for the new modified entry. 3592The slight alteration in the original string (the @code{msgid} string) 3593should often be reflected in the translated string, and this requires 3594the intervention of the translator. For this reason, @code{msgmerge} 3595might mark some entries as being fuzzy. 3596 3597@emindex moving by fuzzy entries 3598Also, the translator may decide herself to mark an entry as fuzzy 3599for her own convenience, when she wants to remember that the entry 3600has to be later revisited. So, some commands are more specifically 3601related to fuzzy entry processing. 3602 3603@table @kbd 3604@item z 3605@efindex z@r{, PO Mode command} 3606@c better append "-entry" all the time. -ke- 3607Find the next fuzzy entry (@code{po-next-fuzzy-entry}). 3608 3609@item Z 3610@efindex Z@r{, PO Mode command} 3611Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}). 3612 3613@item @key{TAB} 3614@efindex TAB@r{, PO Mode command} 3615Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}). 3616 3617@end table 3618 3619@efindex z@r{, PO Mode command} 3620@efindex po-next-fuzzy-entry@r{, PO Mode command} 3621@efindex Z@r{, PO Mode command} 3622@efindex po-previous-fuzzy-entry@r{, PO Mode command} 3623The commands @kbd{z} (@code{po-next-fuzzy-entry}) and @kbd{Z} 3624(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for 3625a fuzzy entry. If none is found, the search is extended and wraps 3626around in the PO file buffer. 3627 3628@efindex TAB@r{, PO Mode command} 3629@efindex po-unfuzzy@r{, PO Mode command} 3630@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable} 3631The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy 3632attribute associated with an entry, usually leaving it translated. 3633Further, if the variable @code{po-auto-select-on-unfuzzy} has not 3634the @code{nil} value, the @kbd{@key{TAB}} command will automatically chase 3635for another interesting entry to work on. The initial value of 3636@code{po-auto-select-on-unfuzzy} is @code{nil}. 3637 3638The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However, 3639if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry 3640edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to 3641ensure some kind of double check, later. In this case, the usual paradigm 3642is that an entry becomes fuzzy (if not already) whenever the translator 3643modifies it. If she is satisfied with the translation, she then uses 3644@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute 3645on the same blow. If she is not satisfied yet, she merely uses @kbd{@key{SPC}} 3646to chase another entry, leaving the entry fuzzy. 3647 3648@efindex DEL@r{, PO Mode command} 3649@efindex po-fade-out-entry@r{, PO Mode command} 3650The translator may also use the @kbd{@key{DEL}} command 3651(@code{po-fade-out-entry}) over any translated entry to mark it as being 3652fuzzy, when she wants to easily leave a trace she wants to later return 3653working at this entry. 3654 3655Also, when time comes to quit working on a PO file buffer with the @kbd{q} 3656command, the translator is asked for confirmation, if fuzzy string 3657still exists. 3658 3659@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode 3660@subsection Untranslated Entries 3661@cindex untranslated entries 3662 3663When @code{xgettext} originally creates a PO file, unless told 3664otherwise, it initializes the @code{msgid} field with the untranslated 3665string, and leaves the @code{msgstr} string to be empty. Such entries, 3666having an empty translation, are said to be @dfn{untranslated} entries. 3667Later, when the programmer slightly modifies some string right in 3668the program, this change is later reflected in the PO file 3669by the appearance of a new untranslated entry for the modified string. 3670 3671The usual commands moving from entry to entry consider untranslated 3672entries on the same level as active entries. Untranslated entries 3673are easily recognizable by the fact they end with @w{@samp{msgstr ""}}. 3674 3675@emindex moving by untranslated entries 3676The work of the translator might be (quite naively) seen as the process 3677of seeking for an untranslated entry, editing a translation for 3678it, and repeating these actions until no untranslated entries remain. 3679Some commands are more specifically related to untranslated entry 3680processing. 3681 3682@table @kbd 3683@item u 3684@efindex u@r{, PO Mode command} 3685Find the next untranslated entry (@code{po-next-untranslated-entry}). 3686 3687@item U 3688@efindex U@r{, PO Mode command} 3689Find the previous untranslated entry (@code{po-previous-untransted-entry}). 3690 3691@item k 3692@efindex k@r{, PO Mode command} 3693Turn the current entry into an untranslated one (@code{po-kill-msgstr}). 3694 3695@end table 3696 3697@efindex u@r{, PO Mode command} 3698@efindex po-next-untranslated-entry@r{, PO Mode command} 3699@efindex U@r{, PO Mode command} 3700@efindex po-previous-untransted-entry@r{, PO Mode command} 3701The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U} 3702(@code{po-previous-untransted-entry}) move forwards or backwards, 3703chasing for an untranslated entry. If none is found, the search is 3704extended and wraps around in the PO file buffer. 3705 3706@efindex k@r{, PO Mode command} 3707@efindex po-kill-msgstr@r{, PO Mode command} 3708An entry can be turned back into an untranslated entry by 3709merely emptying its translation, using the command @kbd{k} 3710(@code{po-kill-msgstr}). @xref{Modifying Translations}. 3711 3712Also, when time comes to quit working on a PO file buffer 3713with the @kbd{q} command, the translator is asked for confirmation, 3714if some untranslated string still exists. 3715 3716@node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode 3717@subsection Obsolete Entries 3718@cindex obsolete entries 3719 3720By @dfn{obsolete} PO file entries, we mean those entries which are 3721commented out, usually by @code{msgmerge} when it found that the 3722translation is not needed anymore by the package being localized. 3723 3724The usual commands moving from entry to entry consider obsolete 3725entries on the same level as active entries. Obsolete entries are 3726easily recognizable by the fact that all their lines start with 3727@code{#}, even those lines containing @code{msgid} or @code{msgstr}. 3728 3729Commands exist for emptying the translation or reinitializing it 3730to the original untranslated string. Commands interfacing with the 3731kill ring may force some previously saved text into the translation. 3732The user may interactively edit the translation. All these commands 3733may apply to obsolete entries, carefully leaving the entry obsolete 3734after the fact. 3735 3736@emindex moving by obsolete entries 3737Moreover, some commands are more specifically related to obsolete 3738entry processing. 3739 3740@table @kbd 3741@item o 3742@efindex o@r{, PO Mode command} 3743Find the next obsolete entry (@code{po-next-obsolete-entry}). 3744 3745@item O 3746@efindex O@r{, PO Mode command} 3747Find the previous obsolete entry (@code{po-previous-obsolete-entry}). 3748 3749@item @key{DEL} 3750@efindex DEL@r{, PO Mode command} 3751Make an active entry obsolete, or zap out an obsolete entry 3752(@code{po-fade-out-entry}). 3753 3754@end table 3755 3756@efindex o@r{, PO Mode command} 3757@efindex po-next-obsolete-entry@r{, PO Mode command} 3758@efindex O@r{, PO Mode command} 3759@efindex po-previous-obsolete-entry@r{, PO Mode command} 3760The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O} 3761(@code{po-previous-obsolete-entry}) move forwards or backwards, 3762chasing for an obsolete entry. If none is found, the search is 3763extended and wraps around in the PO file buffer. 3764 3765PO mode does not provide ways for un-commenting an obsolete entry 3766and making it active, because this would reintroduce an original 3767untranslated string which does not correspond to any marked string 3768in the program sources. This goes with the philosophy of never 3769introducing useless @code{msgid} values. 3770 3771@efindex DEL@r{, PO Mode command} 3772@efindex po-fade-out-entry@r{, PO Mode command} 3773@emindex obsolete active entry 3774@emindex comment out PO file entry 3775However, it is possible to comment out an active entry, so making 3776it obsolete. GNU @code{gettext} utilities will later react to the 3777disappearance of a translation by using the untranslated string. 3778The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry 3779a little further towards annihilation. If the entry is active (it is a 3780translated entry), then it is first made fuzzy. If it is already fuzzy, 3781then the entry is merely commented out, with confirmation. If the entry 3782is already obsolete, then it is completely deleted from the PO file. 3783It is easy to recycle the translation so deleted into some other PO file 3784entry, usually one which is untranslated. @xref{Modifying Translations}. 3785 3786Here is a quite interesting problem to solve for later development of 3787PO mode, for those nights you are not sleepy. The idea would be that 3788PO mode might become bright enough, one of these days, to make good 3789guesses at retrieving the most probable candidate, among all obsolete 3790entries, for initializing the translation of a newly appeared string. 3791I think it might be a quite hard problem to do this algorithmically, as 3792we have to develop good and efficient measures of string similarity. 3793Right now, PO mode completely lets the decision to the translator, 3794when the time comes to find the adequate obsolete translation, it 3795merely tries to provide handy tools for helping her to do so. 3796 3797@node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode 3798@subsection Modifying Translations 3799@cindex editing translations 3800@emindex editing translations 3801 3802PO mode prevents direct modification of the PO file, by the usual 3803means Emacs gives for altering a buffer's contents. By doing so, 3804it pretends helping the translator to avoid little clerical errors 3805about the overall file format, or the proper quoting of strings, 3806as those errors would be easily made. Other kinds of errors are 3807still possible, but some may be caught and diagnosed by the batch 3808validation process, which the translator may always trigger by the 3809@kbd{V} command. For all other errors, the translator has to rely on 3810her own judgment, and also on the linguistic reports submitted to her 3811by the users of the translated package, having the same mother tongue. 3812 3813When the time comes to create a translation, correct an error diagnosed 3814mechanically or reported by a user, the translators have to resort to 3815using the following commands for modifying the translations. 3816 3817@table @kbd 3818@item @key{RET} 3819@efindex RET@r{, PO Mode command} 3820Interactively edit the translation (@code{po-edit-msgstr}). 3821 3822@item @key{LFD} 3823@itemx C-j 3824@efindex LFD@r{, PO Mode command} 3825@efindex C-j@r{, PO Mode command} 3826Reinitialize the translation with the original, untranslated string 3827(@code{po-msgid-to-msgstr}). 3828 3829@item k 3830@efindex k@r{, PO Mode command} 3831Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}). 3832 3833@item w 3834@efindex w@r{, PO Mode command} 3835Save the translation on the kill ring, without deleting it 3836(@code{po-kill-ring-save-msgstr}). 3837 3838@item y 3839@efindex y@r{, PO Mode command} 3840Replace the translation, taking the new from the kill ring 3841(@code{po-yank-msgstr}). 3842 3843@end table 3844 3845@efindex RET@r{, PO Mode command} 3846@efindex po-edit-msgstr@r{, PO Mode command} 3847The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs 3848window meant to edit in a new translation, or to modify an already existing 3849translation. The new window contains a copy of the translation taken from 3850the current PO file entry, all ready for edition, expunged of all quoting 3851marks, fully modifiable and with the complete extent of Emacs modifying 3852commands. When the translator is done with her modifications, she may use 3853@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted 3854results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit}, 3855for more information. 3856 3857@efindex LFD@r{, PO Mode command} 3858@efindex C-j@r{, PO Mode command} 3859@efindex po-msgid-to-msgstr@r{, PO Mode command} 3860The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or 3861reinitializes the translation with the original string. This command is 3862normally used when the translator wants to redo a fresh translation of 3863the original string, disregarding any previous work. 3864 3865@evindex po-auto-edit-with-msgid@r{, PO Mode variable} 3866It is possible to arrange so, whenever editing an untranslated 3867entry, the @kbd{@key{LFD}} command be automatically executed. If you set 3868@code{po-auto-edit-with-msgid} to @code{t}, the translation gets 3869initialised with the original string, in case none exists already. 3870The default value for @code{po-auto-edit-with-msgid} is @code{nil}. 3871 3872@emindex starting a string translation 3873In fact, whether it is best to start a translation with an empty 3874string, or rather with a copy of the original string, is a matter of 3875taste or habit. Sometimes, the source language and the 3876target language are so different that is simply best to start writing 3877on an empty page. At other times, the source and target languages 3878are so close that it would be a waste to retype a number of words 3879already being written in the original string. A translator may also 3880like having the original string right under her eyes, as she will 3881progressively overwrite the original text with the translation, even 3882if this requires some extra editing work to get rid of the original. 3883 3884@emindex cut and paste for translated strings 3885@efindex k@r{, PO Mode command} 3886@efindex po-kill-msgstr@r{, PO Mode command} 3887@efindex w@r{, PO Mode command} 3888@efindex po-kill-ring-save-msgstr@r{, PO Mode command} 3889The command @kbd{k} (@code{po-kill-msgstr}) merely empties the 3890translation string, so turning the entry into an untranslated 3891one. But while doing so, its previous contents is put apart in 3892a special place, known as the kill ring. The command @kbd{w} 3893(@code{po-kill-ring-save-msgstr}) has also the effect of taking a 3894copy of the translation onto the kill ring, but it otherwise leaves 3895the entry alone, and does @emph{not} remove the translation from the 3896entry. Both commands use exactly the Emacs kill ring, which is shared 3897between buffers, and which is well known already to Emacs lovers. 3898 3899The translator may use @kbd{k} or @kbd{w} many times in the course 3900of her work, as the kill ring may hold several saved translations. 3901From the kill ring, strings may later be reinserted in various 3902Emacs buffers. In particular, the kill ring may be used for moving 3903translation strings between different entries of a single PO file 3904buffer, or if the translator is handling many such buffers at once, 3905even between PO files. 3906 3907To facilitate exchanges with buffers which are not in PO mode, the 3908translation string put on the kill ring by the @kbd{k} command is fully 3909unquoted before being saved: external quotes are removed, multi-line 3910strings are concatenated, and backslash escaped sequences are turned 3911into their corresponding characters. In the special case of obsolete 3912entries, the translation is also uncommented prior to saving. 3913 3914@efindex y@r{, PO Mode command} 3915@efindex po-yank-msgstr@r{, PO Mode command} 3916The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the 3917translation of the current entry by a string taken from the kill ring. 3918Following Emacs terminology, we then say that the replacement 3919string is @dfn{yanked} into the PO file buffer. 3920@xref{Yanking, , , emacs, The Emacs Editor}. 3921The first time @kbd{y} is used, the translation receives the value of 3922the most recent addition to the kill ring. If @kbd{y} is typed once 3923again, immediately, without intervening keystrokes, the translation 3924just inserted is taken away and replaced by the second most recent 3925addition to the kill ring. By repeating @kbd{y} many times in a row, 3926the translator may travel along the kill ring for saved strings, 3927until she finds the string she really wanted. 3928 3929When a string is yanked into a PO file entry, it is fully and 3930automatically requoted for complying with the format PO files should 3931have. Further, if the entry is obsolete, PO mode then appropriately 3932push the inserted string inside comments. Once again, translators 3933should not burden themselves with quoting considerations besides, of 3934course, the necessity of the translated string itself respective to 3935the program using it. 3936 3937Note that @kbd{k} or @kbd{w} are not the only commands pushing strings 3938on the kill ring, as almost any PO mode command replacing translation 3939strings (or the translator comments) automatically saves the old string 3940on the kill ring. The main exceptions to this general rule are the 3941yanking commands themselves. 3942 3943@emindex using obsolete translations to make new entries 3944To better illustrate the operation of killing and yanking, let's 3945use an actual example, taken from a common situation. When the 3946programmer slightly modifies some string right in the program, his 3947change is later reflected in the PO file by the appearance 3948of a new untranslated entry for the modified string, and the fact 3949that the entry translating the original or unmodified string becomes 3950obsolete. In many cases, the translator might spare herself some work 3951by retrieving the unmodified translation from the obsolete entry, 3952then initializing the untranslated entry @code{msgstr} field with 3953this retrieved translation. Once this done, the obsolete entry is 3954not wanted anymore, and may be safely deleted. 3955 3956When the translator finds an untranslated entry and suspects that a 3957slight variant of the translation exists, she immediately uses @kbd{m} 3958to mark the current entry location, then starts chasing obsolete 3959entries with @kbd{o}, hoping to find some translation corresponding 3960to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command 3961for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills} 3962the translation, that is, pushes the translation on the kill ring. 3963Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y} 3964then @emph{yanks} the saved translation right into the @code{msgstr} 3965field. The translator is then free to use @kbd{@key{RET}} for fine 3966tuning the translation contents, and maybe to later use @kbd{u}, 3967then @kbd{m} again, for going on with the next untranslated string. 3968 3969When some sequence of keys has to be typed over and over again, the 3970translator may find it useful to become better acquainted with the Emacs 3971capability of learning these sequences and playing them back under request. 3972@xref{Keyboard Macros, , , emacs, The Emacs Editor}. 3973 3974@node Modifying Comments, Subedit, Modifying Translations, PO Mode 3975@subsection Modifying Comments 3976@cindex editing comments in PO files 3977@emindex editing comments 3978 3979Any translation work done seriously will raise many linguistic 3980difficulties, for which decisions have to be made, and the choices 3981further documented. These documents may be saved within the 3982PO file in form of translator comments, which the translator 3983is free to create, delete, or modify at will. These comments may 3984be useful to herself when she returns to this PO file after a while. 3985 3986Comments not having whitespace after the initial @samp{#}, for example, 3987those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator 3988comments, they are exclusively created by other @code{gettext} tools. 3989So, the commands below will never alter such system added comments, 3990they are not meant for the translator to modify. @xref{PO Files}. 3991 3992The following commands are somewhat similar to those modifying translations, 3993so the general indications given for those apply here. @xref{Modifying 3994Translations}. 3995 3996@table @kbd 3997 3998@item # 3999@efindex #@r{, PO Mode command} 4000Interactively edit the translator comments (@code{po-edit-comment}). 4001 4002@item K 4003@efindex K@r{, PO Mode command} 4004Save the translator comments on the kill ring, and delete it 4005(@code{po-kill-comment}). 4006 4007@item W 4008@efindex W@r{, PO Mode command} 4009Save the translator comments on the kill ring, without deleting it 4010(@code{po-kill-ring-save-comment}). 4011 4012@item Y 4013@efindex Y@r{, PO Mode command} 4014Replace the translator comments, taking the new from the kill ring 4015(@code{po-yank-comment}). 4016 4017@end table 4018 4019These commands parallel PO mode commands for modifying the translation 4020strings, and behave much the same way as they do, except that they handle 4021this part of PO file comments meant for translator usage, rather 4022than the translation strings. So, if the descriptions given below are 4023slightly succinct, it is because the full details have already been given. 4024@xref{Modifying Translations}. 4025 4026@efindex #@r{, PO Mode command} 4027@efindex po-edit-comment@r{, PO Mode command} 4028The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window 4029containing a copy of the translator comments on the current PO file entry. 4030If there are no such comments, PO mode understands that the translator wants 4031to add a comment to the entry, and she is presented with an empty screen. 4032Comment marks (@code{#}) and the space following them are automatically 4033removed before edition, and reinstated after. For translator comments 4034pertaining to obsolete entries, the uncommenting and recommenting operations 4035are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}} 4036allow the translator to tell she is finished with editing the comment. 4037@xref{Subedit}, for further details. 4038 4039@evindex po-subedit-mode-hook@r{, PO Mode variable} 4040Functions found on @code{po-subedit-mode-hook}, if any, are executed after 4041the string has been inserted in the edit buffer. 4042 4043@efindex K@r{, PO Mode command} 4044@efindex po-kill-comment@r{, PO Mode command} 4045@efindex W@r{, PO Mode command} 4046@efindex po-kill-ring-save-comment@r{, PO Mode command} 4047@efindex Y@r{, PO Mode command} 4048@efindex po-yank-comment@r{, PO Mode command} 4049The command @kbd{K} (@code{po-kill-comment}) gets rid of all 4050translator comments, while saving those comments on the kill ring. 4051The command @kbd{W} (@code{po-kill-ring-save-comment}) takes 4052a copy of the translator comments on the kill ring, but leaves 4053them undisturbed in the current entry. The command @kbd{Y} 4054(@code{po-yank-comment}) completely replaces the translator comments 4055by a string taken at the front of the kill ring. When this command 4056is immediately repeated, the comments just inserted are withdrawn, 4057and replaced by other strings taken along the kill ring. 4058 4059On the kill ring, all strings have the same nature. There is no 4060distinction between @emph{translation} strings and @emph{translator 4061comments} strings. So, for example, let's presume the translator 4062has just finished editing a translation, and wants to create a new 4063translator comment to document why the previous translation was 4064not good, just to remember what was the problem. Foreseeing that she 4065will do that in her documentation, the translator may want to quote 4066the previous translation in her translator comments. To do so, she 4067may initialize the translator comments with the previous translation, 4068still at the head of the kill ring. Because editing already pushed the 4069previous translation on the kill ring, she merely has to type @kbd{M-w} 4070prior to @kbd{#}, and the previous translation will be right there, 4071all ready for being introduced by some explanatory text. 4072 4073On the other hand, presume there are some translator comments already 4074and that the translator wants to add to those comments, instead 4075of wholly replacing them. Then, she should edit the comment right 4076away with @kbd{#}. Once inside the editing window, she can use the 4077regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y} 4078(@code{yank-pop}) to get the previous translation where she likes. 4079 4080@node Subedit, C Sources Context, Modifying Comments, PO Mode 4081@subsection Details of Sub Edition 4082@emindex subedit minor mode 4083 4084The PO subedit minor mode has a few peculiarities worth being described 4085in fuller detail. It installs a few commands over the usual editing set 4086of Emacs, which are described below. 4087 4088@table @kbd 4089@item C-c C-c 4090@efindex C-c C-c@r{, PO Mode command} 4091Complete edition (@code{po-subedit-exit}). 4092 4093@item C-c C-k 4094@efindex C-c C-k@r{, PO Mode command} 4095Abort edition (@code{po-subedit-abort}). 4096 4097@item C-c C-a 4098@efindex C-c C-a@r{, PO Mode command} 4099Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}). 4100 4101@end table 4102 4103@emindex exiting PO subedit 4104@efindex C-c C-c@r{, PO Mode command} 4105@efindex po-subedit-exit@r{, PO Mode command} 4106The window's contents represents a translation for a given message, 4107or a translator comment. The translator may modify this window to 4108her heart's content. Once this is done, the command @w{@kbd{C-c C-c}} 4109(@code{po-subedit-exit}) may be used to return the edited translation into 4110the PO file, replacing the original translation, even if it moved out of 4111sight or if buffers were switched. 4112 4113@efindex C-c C-k@r{, PO Mode command} 4114@efindex po-subedit-abort@r{, PO Mode command} 4115If the translator becomes unsatisfied with her translation or comment, 4116to the extent she prefers keeping what was existent prior to the 4117@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}} 4118(@code{po-subedit-abort}) to merely get rid of edition, while preserving 4119the original translation or comment. Another way would be for her to exit 4120normally with @w{@kbd{C-c C-c}}, then type @code{U} once for undoing the 4121whole effect of last edition. 4122 4123@efindex C-c C-a@r{, PO Mode command} 4124@efindex po-subedit-cycle-auxiliary@r{, PO Mode command} 4125The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary}) 4126allows for glancing through translations 4127already achieved in other languages, directly while editing the current 4128translation. This may be quite convenient when the translator is fluent 4129at many languages, but of course, only makes sense when such completed 4130auxiliary PO files are already available to her (@pxref{Auxiliary}). 4131 4132Functions found on @code{po-subedit-mode-hook}, if any, are executed after 4133the string has been inserted in the edit buffer. 4134 4135While editing her translation, the translator should pay attention to not 4136inserting unwanted @kbd{@key{RET}} (newline) characters at the end of 4137the translated string if those are not meant to be there, or to removing 4138such characters when they are required. Since these characters are not 4139visible in the editing buffer, they are easily introduced by mistake. 4140To help her, @kbd{@key{RET}} automatically puts the character @code{<} 4141at the end of the string being edited, but this @code{<} is not really 4142part of the string. On exiting the editing window with @w{@kbd{C-c C-c}}, 4143PO mode automatically removes such @kbd{<} and all whitespace added after 4144it. If the translator adds characters after the terminating @code{<}, it 4145looses its delimiting property and integrally becomes part of the string. 4146If she removes the delimiting @code{<}, then the edited string is taken 4147@emph{as is}, with all trailing newlines, even if invisible. Also, if 4148the translated string ought to end itself with a genuine @code{<}, then 4149the delimiting @code{<} may not be removed; so the string should appear, 4150in the editing window, as ending with two @code{<} in a row. 4151 4152@emindex editing multiple entries 4153When a translation (or a comment) is being edited, the translator may move 4154the cursor back into the PO file buffer and freely move to other entries, 4155browsing at will. If, with an edition pending, the translator wanders in the 4156PO file buffer, she may decide to start modifying another entry. Each entry 4157being edited has its own subedit buffer. It is possible to simultaneously 4158edit the translation @emph{and} the comment of a single entry, or to 4159edit entries in different PO files, all at once. Typing @kbd{@key{RET}} 4160on a field already being edited merely resumes that particular edit. Yet, 4161the translator should better be comfortable at handling many Emacs windows! 4162 4163@emindex pending subedits 4164Pending subedits may be completed or aborted in any order, regardless 4165of how or when they were started. When many subedits are pending and the 4166translator asks for quitting the PO file (with the @kbd{q} command), subedits 4167are automatically resumed one at a time, so she may decide for each of them. 4168 4169@node C Sources Context, Auxiliary, Subedit, PO Mode 4170@subsection C Sources Context 4171@emindex consulting program sources 4172@emindex looking at the source to aid translation 4173@emindex use the source, Luke 4174 4175PO mode is particularly powerful when used with PO files 4176created through GNU @code{gettext} utilities, as those utilities 4177insert special comments in the PO files they generate. 4178Some of these special comments relate the PO file entry to 4179exactly where the untranslated string appears in the program sources. 4180 4181When the translator gets to an untranslated entry, she is fairly 4182often faced with an original string which is not as informative as 4183it normally should be, being succinct, cryptic, or otherwise ambiguous. 4184Before choosing how to translate the string, she needs to understand 4185better what the string really means and how tight the translation has 4186to be. Most of the time, when problems arise, the only way left to make 4187her judgment is looking at the true program sources from where this 4188string originated, searching for surrounding comments the programmer 4189might have put in there, and looking around for helping clues of 4190@emph{any} kind. 4191 4192Surely, when looking at program sources, the translator will receive 4193more help if she is a fluent programmer. However, even if she is 4194not versed in programming and feels a little lost in C code, the 4195translator should not be shy at taking a look, once in a while. 4196It is most probable that she will still be able to find some of the 4197hints she needs. She will learn quickly to not feel uncomfortable 4198in program code, paying more attention to programmer's comments, 4199variable and function names (if he dared choosing them well), and 4200overall organization, than to the program code itself. 4201 4202@emindex find source fragment for a PO file entry 4203The following commands are meant to help the translator at getting 4204program source context for a PO file entry. 4205 4206@table @kbd 4207@item s 4208@efindex s@r{, PO Mode command} 4209Resume the display of a program source context, or cycle through them 4210(@code{po-cycle-source-reference}). 4211 4212@item M-s 4213@efindex M-s@r{, PO Mode command} 4214Display of a program source context selected by menu 4215(@code{po-select-source-reference}). 4216 4217@item S 4218@efindex S@r{, PO Mode command} 4219Add a directory to the search path for source files 4220(@code{po-consider-source-path}). 4221 4222@item M-S 4223@efindex M-S@r{, PO Mode command} 4224Delete a directory from the search path for source files 4225(@code{po-ignore-source-path}). 4226 4227@end table 4228 4229@efindex s@r{, PO Mode command} 4230@efindex po-cycle-source-reference@r{, PO Mode command} 4231@efindex M-s@r{, PO Mode command} 4232@efindex po-select-source-reference@r{, PO Mode command} 4233The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s} 4234(@code{po-select-source-reference}) both open another window displaying 4235some source program file, and already positioned in such a way that 4236it shows an actual use of the string to be translated. By doing 4237so, the command gives source program context for the string. But if 4238the entry has no source context references, or if all references 4239are unresolved along the search path for program sources, then the 4240command diagnoses this as an error. 4241 4242Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays 4243in the PO file window. If the translator really wants to 4244get into the program source window, she ought to do it explicitly, 4245maybe by using command @kbd{O}. 4246 4247When @kbd{s} is typed for the first time, or for a PO file entry which 4248is different of the last one used for getting source context, then the 4249command reacts by giving the first context available for this entry, 4250if any. If some context has already been recently displayed for the 4251current PO file entry, and the translator wandered off to do other 4252things, typing @kbd{s} again will merely resume, in another window, 4253the context last displayed. In particular, if the translator moved 4254the cursor away from the context in the source file, the command will 4255bring the cursor back to the context. By using @kbd{s} many times 4256in a row, with no other commands intervening, PO mode will cycle to 4257the next available contexts for this particular entry, getting back 4258to the first context once the last has been shown. 4259 4260The command @kbd{M-s} behaves differently. Instead of cycling through 4261references, it lets the translator choose a particular reference among 4262many, and displays that reference. It is best used with completion, 4263if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in 4264response to the question, she will be offered a menu of all possible 4265references, as a reminder of which are the acceptable answers. 4266This command is useful only where there are really many contexts 4267available for a single string to translate. 4268 4269@efindex S@r{, PO Mode command} 4270@efindex po-consider-source-path@r{, PO Mode command} 4271@efindex M-S@r{, PO Mode command} 4272@efindex po-ignore-source-path@r{, PO Mode command} 4273Program source files are usually found relative to where the PO 4274file stands. As a special provision, when this fails, the file is 4275also looked for, but relative to the directory immediately above it. 4276Those two cases take proper care of most PO files. However, it might 4277happen that a PO file has been moved, or is edited in a different 4278place than its normal location. When this happens, the translator 4279should tell PO mode in which directory normally sits the genuine PO 4280file. Many such directories may be specified, and all together, they 4281constitute what is called the @dfn{search path} for program sources. 4282The command @kbd{S} (@code{po-consider-source-path}) is used to interactively 4283enter a new directory at the front of the search path, and the command 4284@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion, 4285one of the directories she does not want anymore on the search path. 4286 4287@node Auxiliary, , C Sources Context, PO Mode 4288@subsection Consulting Auxiliary PO Files 4289@emindex consulting translations to other languages 4290 4291PO mode is able to help the knowledgeable translator, being fluent in 4292many languages, at taking advantage of translations already achieved 4293in other languages she just happens to know. It provides these other 4294language translations as additional context for her own work. Moreover, 4295it has features to ease the production of translations for many languages 4296at once, for translators preferring to work in this way. 4297 4298@cindex auxiliary PO file 4299@emindex auxiliary PO file 4300An @dfn{auxiliary} PO file is an existing PO file meant for the same 4301package the translator is working on, but targeted to a different mother 4302tongue language. Commands exist for declaring and handling auxiliary 4303PO files, and also for showing contexts for the entry under work. 4304 4305Here are the auxiliary file commands available in PO mode. 4306 4307@table @kbd 4308@item a 4309@efindex a@r{, PO Mode command} 4310Seek auxiliary files for another translation for the same entry 4311(@code{po-cycle-auxiliary}). 4312 4313@item C-c C-a 4314@efindex C-c C-a@r{, PO Mode command} 4315Switch to a particular auxiliary file (@code{po-select-auxiliary}). 4316 4317@item A 4318@efindex A@r{, PO Mode command} 4319Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}). 4320 4321@item M-A 4322@efindex M-A@r{, PO Mode command} 4323Remove this PO file from the list of auxiliary files 4324(@code{po-ignore-as-auxiliary}). 4325 4326@end table 4327 4328@efindex A@r{, PO Mode command} 4329@efindex po-consider-as-auxiliary@r{, PO Mode command} 4330@efindex M-A@r{, PO Mode command} 4331@efindex po-ignore-as-auxiliary@r{, PO Mode command} 4332Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current 4333PO file to the list of auxiliary files, while command @kbd{M-A} 4334(@code{po-ignore-as-auxiliary} just removes it. 4335 4336@efindex a@r{, PO Mode command} 4337@efindex po-cycle-auxiliary@r{, PO Mode command} 4338The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO 4339files, round-robin, searching for a translated entry in some other language 4340having an @code{msgid} field identical as the one for the current entry. 4341The found PO file, if any, takes the place of the current PO file in 4342the display (its window gets on top). Before doing so, the current PO 4343file is also made into an auxiliary file, if not already. So, @kbd{a} 4344in this newly displayed PO file will seek another PO file, and so on, 4345so repeating @kbd{a} will eventually yield back the original PO file. 4346 4347@efindex C-c C-a@r{, PO Mode command} 4348@efindex po-select-auxiliary@r{, PO Mode command} 4349The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator 4350for her choice of a particular auxiliary file, with completion, and 4351then switches to that selected PO file. The command also checks if 4352the selected file has an @code{msgid} field identical as the one for 4353the current entry, and if yes, this entry becomes current. Otherwise, 4354the cursor of the selected file is left undisturbed. 4355 4356For all this to work fully, auxiliary PO files will have to be normalized, 4357in that way that @code{msgid} fields should be written @emph{exactly} 4358the same way. It is possible to write @code{msgid} fields in various 4359ways for representing the same string, different writing would break the 4360proper behaviour of the auxiliary file commands of PO mode. This is not 4361expected to be much a problem in practice, as most existing PO files have 4362their @code{msgid} entries written by the same GNU @code{gettext} tools. 4363 4364@efindex normalize@r{, PO Mode command} 4365However, PO files initially created by PO mode itself, while marking 4366strings in source files, are normalised differently. So are PO 4367files resulting of the @samp{M-x normalize} command. Until these 4368discrepancies between PO mode and other GNU @code{gettext} tools get 4369fully resolved, the translator should stay aware of normalisation issues. 4370 4371@node Compendium, , PO Mode, Editing 4372@section Using Translation Compendia 4373@emindex using translation compendia 4374 4375@cindex compendium 4376A @dfn{compendium} is a special PO file containing a set of 4377translations recurring in many different packages. The translator can 4378use gettext tools to build a new compendium, to add entries to her 4379compendium, and to initialize untranslated entries, or to update 4380already translated entries, from translations kept in the compendium. 4381 4382@menu 4383* Creating Compendia:: Merging translations for later use 4384* Using Compendia:: Using older translations if they fit 4385@end menu 4386 4387@node Creating Compendia, Using Compendia, Compendium, Compendium 4388@subsection Creating Compendia 4389@cindex creating compendia 4390@cindex compendium, creating 4391 4392Basically every PO file consisting of translated entries only can be 4393declared as a valid compendium. Often the translator wants to have 4394special compendia; let's consider two cases: @cite{concatenating PO 4395files} and @cite{extracting a message subset from a PO file}. 4396 4397@subsubsection Concatenate PO Files 4398 4399@cindex concatenating PO files into a compendium 4400@cindex accumulating translations 4401To concatenate several valid PO files into one compendium file you can 4402use @samp{msgcomm} or @samp{msgcat} (the latter preferred): 4403 4404@example 4405msgcat -o compendium.po file1.po file2.po 4406@end example 4407 4408By default, @code{msgcat} will accumulate divergent translations 4409for the same string. Those occurrences will be marked as @code{fuzzy} 4410and highly visible decorated; calling @code{msgcat} on 4411@file{file1.po}: 4412 4413@example 4414#: src/hello.c:200 4415#, c-format 4416msgid "Report bugs to <%s>.\n" 4417msgstr "Comunicar `bugs' a <%s>.\n" 4418@end example 4419 4420@noindent 4421and @file{file2.po}: 4422 4423@example 4424#: src/bye.c:100 4425#, c-format 4426msgid "Report bugs to <%s>.\n" 4427msgstr "Comunicar \"bugs\" a <%s>.\n" 4428@end example 4429 4430@noindent 4431will result in: 4432 4433@example 4434#: src/hello.c:200 src/bye.c:100 4435#, fuzzy, c-format 4436msgid "Report bugs to <%s>.\n" 4437msgstr "" 4438"#-#-#-#-# file1.po #-#-#-#-#\n" 4439"Comunicar `bugs' a <%s>.\n" 4440"#-#-#-#-# file2.po #-#-#-#-#\n" 4441"Comunicar \"bugs\" a <%s>.\n" 4442@end example 4443 4444@noindent 4445The translator will have to resolve this ``conflict'' manually; she 4446has to decide whether the first or the second version is appropriate 4447(or provide a new translation), to delete the ``marker lines'', and 4448finally to remove the @code{fuzzy} mark. 4449 4450If the translator knows in advance the first found translation of a 4451message is always the best translation she can make use to the 4452@samp{--use-first} switch: 4453 4454@example 4455msgcat --use-first -o compendium.po file1.po file2.po 4456@end example 4457 4458A good compendium file must not contain @code{fuzzy} or untranslated 4459entries. If input files are ``dirty'' you must preprocess the input 4460files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}. 4461 4462@subsubsection Extract a Message Subset from a PO File 4463@cindex extracting parts of a PO file into a compendium 4464 4465Nobody wants to translate the same messages again and again; thus you 4466may wish to have a compendium file containing @file{getopt.c} messages. 4467 4468To extract a message subset (e.g., all @file{getopt.c} messages) from an 4469existing PO file into one compendium file you can use @samp{msggrep}: 4470 4471@example 4472msggrep --location src/getopt.c -o compendium.po file.po 4473@end example 4474 4475@node Using Compendia, , Creating Compendia, Compendium 4476@subsection Using Compendia 4477 4478You can use a compendium file to initialize a translation from scratch 4479or to update an already existing translation. 4480 4481@subsubsection Initialize a New Translation File 4482@cindex initialize translations from a compendium 4483 4484Since a PO file with translations does not exist the translator can 4485merely use @file{/dev/null} to fake the ``old'' translation file. 4486 4487@example 4488msgmerge --compendium compendium.po -o file.po /dev/null file.pot 4489@end example 4490 4491@subsubsection Update an Existing Translation File 4492@cindex update translations from a compendium 4493 4494Concatenate the compendium file(s) and the existing PO, merge the 4495result with the POT file and remove the obsolete entries (optional, 4496here done using @samp{sed}): 4497 4498@example 4499msgcat --use-first -o update.po compendium1.po compendium2.po file.po 4500msgmerge update.po file.pot | msgattrib --no-obsolete > file.po 4501@end example 4502 4503@node Manipulating, Binaries, Editing, Top 4504@chapter Manipulating PO Files 4505@cindex manipulating PO files 4506 4507Sometimes it is necessary to manipulate PO files in a way that is better 4508performed automatically than by hand. GNU @code{gettext} includes a 4509complete set of tools for this purpose. 4510 4511@cindex merging two PO files 4512When merging two packages into a single package, the resulting POT file 4513will be the concatenation of the two packages' POT files. Thus the 4514maintainer must concatenate the two existing package translations into 4515a single translation catalog, for each language. This is best performed 4516using @samp{msgcat}. It is then the translators' duty to deal with any 4517possible conflicts that arose during the merge. 4518 4519@cindex encoding conversion 4520When a translator takes over the translation job from another translator, 4521but she uses a different character encoding in her locale, she will 4522convert the catalog to her character encoding. This is best done through 4523the @samp{msgconv} program. 4524 4525When a maintainer takes a source file with tagged messages from another 4526package, he should also take the existing translations for this source 4527file (and not let the translators do the same job twice). One way to do 4528this is through @samp{msggrep}, another is to create a POT file for 4529that source file and use @samp{msgmerge}. 4530 4531@cindex dialect 4532@cindex orthography 4533When a translator wants to adjust some translation catalog for a special 4534dialect or orthography --- for example, German as written in Switzerland 4535versus German as written in Germany --- she needs to apply some text 4536processing to every message in the catalog. The tool for doing this is 4537@samp{msgfilter}. 4538 4539Another use of @code{msgfilter} is to produce approximately the POT file for 4540which a given PO file was made. This can be done through a filter command 4541like @samp{msgfilter sed -e d | sed -e '/^# /d'}. Note that the original 4542POT file may have had different comments and different plural message counts, 4543that's why it's better to use the original POT file if available. 4544 4545@cindex checking of translations 4546When a translator wants to check her translations, for example according 4547to orthography rules or using a non-interactive spell checker, she can do 4548so using the @samp{msgexec} program. 4549 4550@cindex duplicate elimination 4551When third party tools create PO or POT files, sometimes duplicates cannot 4552be avoided. But the GNU @code{gettext} tools give an error when they 4553encounter duplicate msgids in the same file and in the same domain. 4554To merge duplicates, the @samp{msguniq} program can be used. 4555 4556@samp{msgcomm} is a more general tool for keeping or throwing away 4557duplicates, occurring in different files. 4558 4559@samp{msgcmp} can be used to check whether a translation catalog is 4560completely translated. 4561 4562@cindex attributes, manipulating 4563@samp{msgattrib} can be used to select and extract only the fuzzy 4564or untranslated messages of a translation catalog. 4565 4566@samp{msgen} is useful as a first step for preparing English translation 4567catalogs. It copies each message's msgid to its msgstr. 4568 4569Finally, for those applications where all these various programs are not 4570sufficient, a library @samp{libgettextpo} is provided that can be used to 4571write other specialized programs that process PO files. 4572 4573@menu 4574* msgcat Invocation:: Invoking the @code{msgcat} Program 4575* msgconv Invocation:: Invoking the @code{msgconv} Program 4576* msggrep Invocation:: Invoking the @code{msggrep} Program 4577* msgfilter Invocation:: Invoking the @code{msgfilter} Program 4578* msguniq Invocation:: Invoking the @code{msguniq} Program 4579* msgcomm Invocation:: Invoking the @code{msgcomm} Program 4580* msgcmp Invocation:: Invoking the @code{msgcmp} Program 4581* msgattrib Invocation:: Invoking the @code{msgattrib} Program 4582* msgen Invocation:: Invoking the @code{msgen} Program 4583* msgexec Invocation:: Invoking the @code{msgexec} Program 4584* Colorizing:: Highlighting parts of PO files 4585* libgettextpo:: Writing your own programs that process PO files 4586@end menu 4587 4588@node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating 4589@section Invoking the @code{msgcat} Program 4590 4591@include msgcat.texi 4592 4593@node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating 4594@section Invoking the @code{msgconv} Program 4595 4596@include msgconv.texi 4597 4598@node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating 4599@section Invoking the @code{msggrep} Program 4600 4601@include msggrep.texi 4602 4603@node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating 4604@section Invoking the @code{msgfilter} Program 4605 4606@include msgfilter.texi 4607 4608@node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating 4609@section Invoking the @code{msguniq} Program 4610 4611@include msguniq.texi 4612 4613@node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating 4614@section Invoking the @code{msgcomm} Program 4615 4616@include msgcomm.texi 4617 4618@node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating 4619@section Invoking the @code{msgcmp} Program 4620 4621@include msgcmp.texi 4622 4623@node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating 4624@section Invoking the @code{msgattrib} Program 4625 4626@include msgattrib.texi 4627 4628@node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating 4629@section Invoking the @code{msgen} Program 4630 4631@include msgen.texi 4632 4633@node msgexec Invocation, Colorizing, msgen Invocation, Manipulating 4634@section Invoking the @code{msgexec} Program 4635 4636@include msgexec.texi 4637 4638@node Colorizing, libgettextpo, msgexec Invocation, Manipulating 4639@section Highlighting parts of PO files 4640 4641Translators are usually only interested in seeing the untranslated and 4642fuzzy messages of a PO file. Also, when a message is set fuzzy because 4643the msgid changed, they want to see the differences between the previous 4644msgid and the current one (especially if the msgid is long and only few 4645words in it have changed). Finally, it's always welcome to highlight the 4646different sections of a message in a PO file (comments, msgid, msgstr, etc.). 4647 4648Such highlighting is possible through the @code{msgcat} options 4649@samp{--color} and @samp{--style}. 4650 4651@menu 4652* The --color option:: Triggering colorized output 4653* The TERM variable:: The environment variable @code{TERM} 4654* The --style option:: The @code{--style} option 4655* Style rules:: Style rules for PO files 4656* Customizing less:: Customizing @code{less} for viewing PO files 4657@end menu 4658 4659@node The --color option, The TERM variable, , Colorizing 4660@subsection The @code{--color} option 4661 4662@opindex --color@r{, @code{msgcat} option} 4663The @samp{--color=@var{when}} option specifies under which conditions 4664colorized output should be generated. The @var{when} part can be one of 4665the following: 4666 4667@table @code 4668@item always 4669@itemx yes 4670The output will be colorized. 4671 4672@item never 4673@itemx no 4674The output will not be colorized. 4675 4676@item auto 4677@itemx tty 4678The output will be colorized if the output device is a tty, i.e.@: when the 4679output goes directly to a text screen or terminal emulator window. 4680 4681@item html 4682The output will be colorized and be in HTML format. 4683@end table 4684 4685@noindent 4686@samp{--color} is equivalent to @samp{--color=yes}. The default is 4687@samp{--color=auto}. 4688 4689Thus, a command like @samp{msgcat vi.po} will produce colorized output 4690when called by itself in a command window. Whereas in a pipe, such as 4691@samp{msgcat vi.po | less -R}, it will not produce colorized output. To 4692get colorized output in this situation nevertheless, use the command 4693@samp{msgcat --color vi.po | less -R}. 4694 4695The @samp{--color=html} option will produce output that can be viewed in 4696a browser. This can be useful, for example, for Indic languages, 4697because the renderic of Indic scripts in browser is usually better than 4698in terminal emulators. 4699 4700Note that the output produced with the @code{--color} option is @emph{not} 4701a valid PO file in itself. It contains additional terminal-specific escape 4702sequences or HTML tags. A PO file reader will give a syntax error when 4703confronted with such content. Except for the @samp{--color=html} case, 4704you therefore normally don't need to save output produced with the 4705@code{--color} option in a file. 4706 4707@node The TERM variable, The --style option, The --color option, Colorizing 4708@subsection The environment variable @code{TERM} 4709 4710@vindex TERM@r{, environment variable} 4711The environment variable @code{TERM} contains a identifier for the text 4712window's capabilities. You can get a detailed list of these cababilities 4713by using the @samp{infocmp} command, using @samp{man 5 terminfo} as a 4714reference. 4715 4716When producing text with embedded color directives, @code{msgcat} looks 4717at the @code{TERM} variable. Text windows today typically support at least 47188 colors. Often, however, the text window supports 16 or more colors, 4719even though the @code{TERM} variable is set to a identifier denoting only 47208 supported colors. It can be worth setting the @code{TERM} variable to 4721a different value in these cases: 4722 4723@table @code 4724@item xterm 4725@code{xterm} is in most cases built with support for 16 colors. It can also 4726be built with support for 88 or 256 colors (but not both). You can try to 4727set @code{TERM} to either @code{xterm-16color}, @code{xterm-88color}, or 4728@code{xterm-256color}. 4729 4730@item rxvt 4731@code{rxvt} is often built with support for 16 colors. You can try to set 4732@code{TERM} to @code{rxvt-16color}. 4733 4734@item konsole 4735@code{konsole} too is often built with support for 16 colors. You can try to 4736set @code{TERM} to @code{konsole-16color} or @code{xterm-16color}. 4737@end table 4738 4739After setting @code{TERM}, you can verify it by invoking 4740@samp{msgcat --color=test} and seeing whether the output looks like a 4741reasonable color map. 4742 4743@node The --style option, Style rules, The TERM variable, Colorizing 4744@subsection The @code{--style} option 4745 4746@opindex --style@r{, @code{msgcat} option} 4747The @samp{--style=@var{style_file}} option specifies the style file to use 4748when colorizing. It has an effect only when the @code{--color} option is 4749effective. 4750 4751@vindex PO_STYLE@r{, environment variable} 4752If the @code{--style} option is not specified, the environment variable 4753@code{PO_STYLE} is considered. It is meant to point to the user's 4754preferred style for PO files. 4755 4756The default style file is @file{$prefix/share/gettext/styles/po-default.css}, 4757where @code{$prefix} is the installation location. 4758 4759A few style files are predefined: 4760@table @file 4761@item po-vim.css 4762This style imitates the look used by vim 7. 4763 4764@item po-emacs-x.css 4765This style imitates the look used by GNU Emacs 21 and 22 in an X11 window. 4766 4767@item po-emacs-xterm.css 4768@itemx po-emacs-xterm16.css 4769@itemx po-emacs-xterm256.css 4770This style imitates the look used by GNU Emacs 22 in a terminal of type 4771@samp{xterm} (8 colors) or @samp{xterm-16color} (16 colors) or 4772@samp{xterm-256color} (256 colors), respectively. 4773@end table 4774 4775@noindent 4776You can use these styles without specifying a directory. They are actually 4777located in @file{$prefix/share/gettext/styles/}, where @code{$prefix} is the 4778installation location. 4779 4780You can also design your own styles. This is described in the next section. 4781 4782 4783@node Style rules, Customizing less, The --style option, Colorizing 4784@subsection Style rules for PO files 4785 4786The same style file can be used for styling of a PO file, for terminal 4787output and for HTML output. It is written in CSS (Cascading Style Sheet) 4788syntax. See @url{http://www.w3.org/TR/css2/cover.html} for a formal 4789definition of CSS. Many HTML authoring tutorials also contain explanations 4790of CSS. 4791 4792In the case of HTML output, the style file is embedded in the HTML output. 4793In the case of text output, the style file is interpreted by the 4794@code{msgcat} program. This means, in particular, that when 4795@code{@@import} is used with relative file names, the file names are 4796 4797@itemize @minus 4798@item 4799relative to the resulting HTML file, in the case of HTML output, 4800 4801@item 4802relative to the style sheet containing the @code{@@import}, in the case of 4803text output. (Actually, @code{@@import}s are not yet supported in this case, 4804due to a limitation in @code{libcroco}.) 4805@end itemize 4806 4807CSS rules are built up from selectors and declarations. The declarations 4808specify graphical properties; the selectors specify specify when they apply. 4809 4810In PO files, the following simple selectors (based on "CSS classes", see 4811the CSS2 spec, section 5.8.3) are supported. 4812 4813@itemize @bullet 4814@item 4815Selectors that apply to entire messages: 4816 4817@table @code 4818@item .header 4819This matches the header entry of a PO file. 4820 4821@item .translated 4822This matches a translated message. 4823 4824@item .untranslated 4825This matches an untranslated message (i.e.@: a message with empty translation). 4826 4827@item .fuzzy 4828This matches a fuzzy message (i.e.@: a message which has a translation that 4829needs review by the translator). 4830 4831@item .obsolete 4832This matches an obsolete message (i.e.@: a message that was translated but is 4833not needed by the current POT file any more). 4834@end table 4835 4836@item 4837Selectors that apply to parts of a message in PO syntax. Recall the general 4838structure of a message in PO syntax: 4839 4840@example 4841@var{white-space} 4842# @var{translator-comments} 4843#. @var{extracted-comments} 4844#: @var{reference}@dots{} 4845#, @var{flag}@dots{} 4846#| msgid @var{previous-untranslated-string} 4847msgid @var{untranslated-string} 4848msgstr @var{translated-string} 4849@end example 4850 4851@table @code 4852@item .comment 4853This matches all comments (translator comments, extracted comments, 4854source file reference comments, flag comments, previous message comments, 4855as well as the entire obsolete messages). 4856 4857@item .translator-comment 4858This matches the translator comments. 4859 4860@item .extracted-comment 4861This matches the extracted comments, i.e.@: the comments placed by the 4862programmer at the attention of the translator. 4863 4864@item .reference-comment 4865This matches the source file reference comments (entire lines). 4866 4867@item .reference 4868This matches the individual source file references inside the source file 4869reference comment lines. 4870 4871@item .flag-comment 4872This matches the flag comment lines (entire lines). 4873 4874@item .flag 4875This matches the individual flags inside flag comment lines. 4876 4877@item .fuzzy-flag 4878This matches the `fuzzy' flag inside flag comment lines. 4879 4880@item .previous-comment 4881This matches the comments containing the previous untranslated string (entire 4882lines). 4883 4884@item .previous 4885This matches the previous untranslated string including the string delimiters, 4886the associated keywords (@code{msgid} etc.) and the spaces between them. 4887 4888@item .msgid 4889This matches the untranslated string including the string delimiters, 4890the associated keywords (@code{msgid} etc.) and the spaces between them. 4891 4892@item .msgstr 4893This matches the translated string including the string delimiters, 4894the associated keywords (@code{msgstr} etc.) and the spaces between them. 4895 4896@item .keyword 4897This matches the keywords (@code{msgid}, @code{msgstr}, etc.). 4898 4899@item .string 4900This matches strings, including the string delimiters (double quotes). 4901@end table 4902 4903@item 4904Selectors that apply to parts of strings: 4905 4906@table @code 4907@item .text 4908This matches the entire contents of a string (excluding the string delimiters, 4909i.e.@: the double quotes). 4910 4911@item .escape-sequence 4912This matches an escape sequence (starting with a backslash). 4913 4914@item .format-directive 4915This matches a format string directive (starting with a @samp{%} sign in the 4916case of most programming languages, with a @samp{@{} in the case of 4917@code{java-format} and @code{csharp-format}, with a @samp{~} in the case of 4918@code{lisp-format} and @code{scheme-format}, or with @samp{$} in the case of 4919@code{sh-format}). 4920 4921@item .invalid-format-directive 4922This matches an invalid format string directive. 4923 4924@item .added 4925In an untranslated string, this matches a part of the string that was not 4926present in the previous untranslated string. (Not yet implemented in this 4927release.) 4928 4929@item .changed 4930In an untranslated string or in a previous untranslated string, this matches 4931a part of the string that is changed or replaced. (Not yet implemented in 4932this release.) 4933 4934@item .removed 4935In a previous untranslated string, this matches a part of the string that 4936is not present in the current untranslated string. (Not yet implemented in 4937this release.) 4938@end table 4939@end itemize 4940 4941These selectors can be combined to hierarchical selectors. For example, 4942 4943@smallexample 4944.msgstr .invalid-format-directive @{ color: red; @} 4945@end smallexample 4946 4947@noindent 4948will highlight the invalid format directives in the translated strings. 4949 4950In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements 4951(CSS2 spec, section 5.12) are not supported. 4952 4953The declarations in HTML mode are not limited; any graphical attribute 4954supported by the browsers can be used. 4955 4956The declarations in text mode are limited to the following properties. Other 4957properties will be silently ignored. 4958 4959@table @asis 4960@item @code{color} (CSS2 spec, section 14.1) 4961@itemx @code{background-color} (CSS2 spec, section 14.2.1) 4962These properties is supported. Colors will be adjusted to match the terminal's 4963capabilities. Note that many terminals support only 8 colors. 4964 4965@item @code{font-weight} (CSS2 spec, section 15.2.3) 4966This property is supported, but most terminals can only render two different 4967weights: @code{normal} and @code{bold}. Values >= 600 are rendered as 4968@code{bold}. 4969 4970@item @code{font-style} (CSS2 spec, section 15.2.3) 4971This property is supported. The values @code{italic} and @code{oblique} are 4972rendered the same way. 4973 4974@item @code{text-decoration} (CSS2 spec, section 16.3.1) 4975This property is supported, limited to the values @code{none} and 4976@code{underline}. 4977@end table 4978 4979@node Customizing less, , Style rules, Colorizing 4980@subsection Customizing @code{less} for viewing PO files 4981 4982The @samp{less} program is a popular text file browser for use in a text 4983screen or terminal emulator. It also supports text with embedded escape 4984sequences for colors and text decorations. 4985 4986You can use @code{less} to view a PO file like this (assuming an UTF-8 4987environment): 4988 4989@smallexample 4990msgcat --to-code=UTF-8 --color xyz.po | less -R 4991@end smallexample 4992 4993You can simplify this to this simple command: 4994 4995@smallexample 4996less xyz.po 4997@end smallexample 4998 4999@noindent 5000after these three preparations: 5001 5002@enumerate 5003@item 5004Add the options @samp{-R} and @samp{-f} to the @code{LESS} environment 5005variable. In sh shells: 5006@smallexample 5007$ LESS="$LESS -R -f" 5008$ export LESS 5009@end smallexample 5010 5011@item 5012If your system does not already have the @file{lessopen.sh} and 5013@file{lessclose.sh} scripts, create them and set the @code{LESSOPEN} and 5014@code{LESSCLOSE} environment variables, as indicated in the manual page 5015(@samp{man less}). 5016 5017@item 5018Add to @file{lessopen.sh} a piece of script that recognizes PO files 5019through their file extension and invokes @code{msgcat} on them, producing 5020a temporary file. Like this: 5021 5022@smallexample 5023case "$1" in 5024 *.po) 5025 tmpfile=`mktemp "$@{TMPDIR-/tmp@}/less.XXXXXX"` 5026 msgcat --to-code=UTF-8 --color "$1" > "$tmpfile" 5027 echo "$tmpfile" 5028 exit 0 5029 ;; 5030esac 5031@end smallexample 5032@end enumerate 5033 5034@node libgettextpo, , Colorizing, Manipulating 5035@section Writing your own programs that process PO files 5036 5037For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc. 5038is not sufficient, a set of C functions is provided in a library, to make it 5039possible to process PO files in your own programs. When you use this library, 5040you don't need to write routines to parse the PO file; instead, you retrieve 5041a pointer in memory to each of messages contained in the PO file. Functions 5042for writing PO files are not provided at this time. 5043 5044The functions are declared in the header file @samp{<gettext-po.h>}, and are 5045defined in a library called @samp{libgettextpo}. 5046 5047@deftp {Data Type} po_file_t 5048This is a pointer type that refers to the contents of a PO file, after it has 5049been read into memory. 5050@end deftp 5051 5052@deftp {Data Type} po_message_iterator_t 5053This is a pointer type that refers to an iterator that produces a sequence of 5054messages. 5055@end deftp 5056 5057@deftp {Data Type} po_message_t 5058This is a pointer type that refers to a message of a PO file, including its 5059translation. 5060@end deftp 5061 5062@deftypefun po_file_t po_file_read (const char *@var{filename}) 5063The @code{po_file_read} function reads a PO file into memory. The file name 5064is given as argument. The return value is a handle to the PO file's contents, 5065valid until @code{po_file_free} is called on it. In case of error, the return 5066value is @code{NULL}, and @code{errno} is set. 5067@end deftypefun 5068 5069@deftypefun void po_file_free (po_file_t @var{file}) 5070The @code{po_file_free} function frees a PO file's contents from memory, 5071including all messages that are only implicitly accessible through iterators. 5072@end deftypefun 5073 5074@deftypefun {const char * const *} po_file_domains (po_file_t @var{file}) 5075The @code{po_file_domains} function returns the domains for which the given 5076PO file has messages. The return value is a @code{NULL} terminated array 5077which is valid as long as the @var{file} handle is valid. For PO files which 5078contain no @samp{domain} directive, the return value contains only one domain, 5079namely the default domain @code{"messages"}. 5080@end deftypefun 5081 5082@deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain}) 5083The @code{po_message_iterator} returns an iterator that will produce the 5084messages of @var{file} that belong to the given @var{domain}. If @var{domain} 5085is @code{NULL}, the default domain is used instead. To list the messages, 5086use the function @code{po_next_message} repeatedly. 5087@end deftypefun 5088 5089@deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator}) 5090The @code{po_message_iterator_free} function frees an iterator previously 5091allocated through the @code{po_message_iterator} function. 5092@end deftypefun 5093 5094@deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator}) 5095The @code{po_next_message} function returns the next message from 5096@var{iterator} and advances the iterator. It returns @code{NULL} when the 5097iterator has reached the end of its message list. 5098@end deftypefun 5099 5100The following functions returns details of a @code{po_message_t}. Recall 5101that the results are valid as long as the @var{file} handle is valid. 5102 5103@deftypefun {const char *} po_message_msgid (po_message_t @var{message}) 5104The @code{po_message_msgid} function returns the @code{msgid} (untranslated 5105English string) of a message. This is guaranteed to be non-@code{NULL}. 5106@end deftypefun 5107 5108@deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message}) 5109The @code{po_message_msgid_plural} function returns the @code{msgid_plural} 5110(untranslated English plural string) of a message with plurals, or @code{NULL} 5111for a message without plural. 5112@end deftypefun 5113 5114@deftypefun {const char *} po_message_msgstr (po_message_t @var{message}) 5115The @code{po_message_msgstr} function returns the @code{msgstr} (translation) 5116of a message. For an untranslated message, the return value is an empty 5117string. 5118@end deftypefun 5119 5120@deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index}) 5121The @code{po_message_msgstr_plural} function returns the 5122@code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when 5123the @var{index} is out of range or for a message without plural. 5124@end deftypefun 5125 5126Here is an example code how these functions can be used. 5127 5128@example 5129const char *filename = @dots{}; 5130po_file_t file = po_file_read (filename); 5131 5132if (file == NULL) 5133 error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename); 5134@{ 5135 const char * const *domains = po_file_domains (file); 5136 const char * const *domainp; 5137 5138 for (domainp = domains; *domainp; domainp++) 5139 @{ 5140 const char *domain = *domainp; 5141 po_message_iterator_t iterator = po_message_iterator (file, domain); 5142 5143 for (;;) 5144 @{ 5145 po_message_t *message = po_next_message (iterator); 5146 5147 if (message == NULL) 5148 break; 5149 @{ 5150 const char *msgid = po_message_msgid (message); 5151 const char *msgstr = po_message_msgstr (message); 5152 5153 @dots{} 5154 @} 5155 @} 5156 po_message_iterator_free (iterator); 5157 @} 5158@} 5159po_file_free (file); 5160@end example 5161 5162@node Binaries, Programmers, Manipulating, Top 5163@chapter Producing Binary MO Files 5164 5165@c FIXME: Rewrite. 5166 5167@menu 5168* msgfmt Invocation:: Invoking the @code{msgfmt} Program 5169* msgunfmt Invocation:: Invoking the @code{msgunfmt} Program 5170* MO Files:: The Format of GNU MO Files 5171@end menu 5172 5173@node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries 5174@section Invoking the @code{msgfmt} Program 5175 5176@include msgfmt.texi 5177 5178@node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries 5179@section Invoking the @code{msgunfmt} Program 5180 5181@include msgunfmt.texi 5182 5183@node MO Files, , msgunfmt Invocation, Binaries 5184@section The Format of GNU MO Files 5185@cindex MO file's format 5186@cindex file format, @file{.mo} 5187 5188The format of the generated MO files is best described by a picture, 5189which appears below. 5190 5191@cindex magic signature of MO files 5192The first two words serve the identification of the file. The magic 5193number will always signal GNU MO files. The number is stored in the 5194byte order of the generating machine, so the magic number really is 5195two numbers: @code{0x950412de} and @code{0xde120495}. The second 5196word describes the current revision of the file format. For now the 5197revision is 0. This might change in future versions, and ensures 5198that the readers of MO files can distinguish new formats from old 5199ones, so that both can be handled correctly. The version is kept 5200separate from the magic number, instead of using different magic 5201numbers for different formats, mainly because @file{/etc/magic} is 5202not updated often. It might be better to have magic separated from 5203internal format version identification. 5204 5205Follow a number of pointers to later tables in the file, allowing 5206for the extension of the prefix part of MO files without having to 5207recompile programs reading them. This might become useful for later 5208inserting a few flag bits, indication about the charset used, new 5209tables, or other things. 5210 5211Then, at offset @var{O} and offset @var{T} in the picture, two tables 5212of string descriptors can be found. In both tables, each string 5213descriptor uses two 32 bits integers, one for the string length, 5214another for the offset of the string in the MO file, counting in bytes 5215from the start of the file. The first table contains descriptors 5216for the original strings, and is sorted so the original strings 5217are in increasing lexicographical order. The second table contains 5218descriptors for the translated strings, and is parallel to the first 5219table: to find the corresponding translation one has to access the 5220array slot in the second array with the same index. 5221 5222Having the original strings sorted enables the use of simple binary 5223search, for when the MO file does not contain an hashing table, or 5224for when it is not practical to use the hashing table provided in 5225the MO file. This also has another advantage, as the empty string 5226in a PO file GNU @code{gettext} is usually @emph{translated} into 5227some system information attached to that particular MO file, and the 5228empty string necessarily becomes the first in both the original and 5229translated tables, making the system information very easy to find. 5230 5231@cindex hash table, inside MO files 5232The size @var{S} of the hash table can be zero. In this case, the 5233hash table itself is not contained in the MO file. Some people might 5234prefer this because a precomputed hashing table takes disk space, and 5235does not win @emph{that} much speed. The hash table contains indices 5236to the sorted array of strings in the MO file. Conflict resolution is 5237done by double hashing. The precise hashing algorithm used is fairly 5238dependent on GNU @code{gettext} code, and is not documented here. 5239 5240As for the strings themselves, they follow the hash file, and each 5241is terminated with a @key{NUL}, and this @key{NUL} is not counted in 5242the length which appears in the string descriptor. The @code{msgfmt} 5243program has an option selecting the alignment for MO file strings. 5244With this option, each string is separately aligned so it starts at 5245an offset which is a multiple of the alignment value. On some RISC 5246machines, a correct alignment will speed things up. 5247 5248@cindex context, in MO files 5249Contexts are stored by storing the concatenation of the context, a 5250@key{EOT} byte, and the original string, instead of the original string. 5251 5252@cindex plural forms, in MO files 5253Plural forms are stored by letting the plural of the original string 5254follow the singular of the original string, separated through a 5255@key{NUL} byte. The length which appears in the string descriptor 5256includes both. However, only the singular of the original string 5257takes part in the hash table lookup. The plural variants of the 5258translation are all stored consecutively, separated through a 5259@key{NUL} byte. Here also, the length in the string descriptor 5260includes all of them. 5261 5262Nothing prevents a MO file from having embedded @key{NUL}s in strings. 5263However, the program interface currently used already presumes 5264that strings are @key{NUL} terminated, so embedded @key{NUL}s are 5265somewhat useless. But the MO file format is general enough so other 5266interfaces would be later possible, if for example, we ever want to 5267implement wide characters right in MO files, where @key{NUL} bytes may 5268accidentally appear. (No, we don't want to have wide characters in MO 5269files. They would make the file unnecessarily large, and the 5270@samp{wchar_t} type being platform dependent, MO files would be 5271platform dependent as well.) 5272 5273This particular issue has been strongly debated in the GNU 5274@code{gettext} development forum, and it is expectable that MO file 5275format will evolve or change over time. It is even possible that many 5276formats may later be supported concurrently. But surely, we have to 5277start somewhere, and the MO file format described here is a good start. 5278Nothing is cast in concrete, and the format may later evolve fairly 5279easily, so we should feel comfortable with the current approach. 5280 5281@example 5282@group 5283 byte 5284 +------------------------------------------+ 5285 0 | magic number = 0x950412de | 5286 | | 5287 4 | file format revision = 0 | 5288 | | 5289 8 | number of strings | == N 5290 | | 5291 12 | offset of table with original strings | == O 5292 | | 5293 16 | offset of table with translation strings | == T 5294 | | 5295 20 | size of hashing table | == S 5296 | | 5297 24 | offset of hashing table | == H 5298 | | 5299 . . 5300 . (possibly more entries later) . 5301 . . 5302 | | 5303 O | length & offset 0th string ----------------. 5304 O + 8 | length & offset 1st string ------------------. 5305 ... ... | | 5306O + ((N-1)*8)| length & offset (N-1)th string | | | 5307 | | | | 5308 T | length & offset 0th translation ---------------. 5309 T + 8 | length & offset 1st translation -----------------. 5310 ... ... | | | | 5311T + ((N-1)*8)| length & offset (N-1)th translation | | | | | 5312 | | | | | | 5313 H | start hash table | | | | | 5314 ... ... | | | | 5315 H + S * 4 | end hash table | | | | | 5316 | | | | | | 5317 | NUL terminated 0th string <----------------' | | | 5318 | | | | | 5319 | NUL terminated 1st string <------------------' | | 5320 | | | | 5321 ... ... | | 5322 | | | | 5323 | NUL terminated 0th translation <---------------' | 5324 | | | 5325 | NUL terminated 1st translation <-----------------' 5326 | | 5327 ... ... 5328 | | 5329 +------------------------------------------+ 5330@end group 5331@end example 5332 5333@node Programmers, Translators, Binaries, Top 5334@chapter The Programmer's View 5335 5336@c FIXME: Reorganize whole chapter. 5337 5338One aim of the current message catalog implementation provided by 5339GNU @code{gettext} was to use the system's message catalog handling, if the 5340installer wishes to do so. So we perhaps should first take a look at 5341the solutions we know about. The people in the POSIX committee did not 5342manage to agree on one of the semi-official standards which we'll 5343describe below. In fact they couldn't agree on anything, so they decided 5344only to include an example of an interface. The major Unix vendors 5345are split in the usage of the two most important specifications: X/Open's 5346catgets vs. Uniforum's gettext interface. We'll describe them both and 5347later explain our solution of this dilemma. 5348 5349@menu 5350* catgets:: About @code{catgets} 5351* gettext:: About @code{gettext} 5352* Comparison:: Comparing the two interfaces 5353* Using libintl.a:: Using libintl.a in own programs 5354* gettext grok:: Being a @code{gettext} grok 5355* Temp Programmers:: Temporary Notes for the Programmers Chapter 5356@end menu 5357 5358@node catgets, gettext, Programmers, Programmers 5359@section About @code{catgets} 5360@cindex @code{catgets}, X/Open specification 5361 5362The @code{catgets} implementation is defined in the X/Open Portability 5363Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the 5364process of creating this standard seemed to be too slow for some of 5365the Unix vendors so they created their implementations on preliminary 5366versions of the standard. Of course this leads again to problems while 5367writing platform independent programs: even the usage of @code{catgets} 5368does not guarantee a unique interface. 5369 5370Another, personal comment on this that only a bunch of committee members 5371could have made this interface. They never really tried to program 5372using this interface. It is a fast, memory-saving implementation, an 5373user can happily live with it. But programmers hate it (at least I and 5374some others do@dots{}) 5375 5376But we must not forget one point: after all the trouble with transferring 5377the rights on Unix(tm) they at last came to X/Open, the very same who 5378published this specification. This leads me to making the prediction 5379that this interface will be in future Unix standards (e.g.@: Spec1170) and 5380therefore part of all Unix implementation (implementations, which are 5381@emph{allowed} to wear this name). 5382 5383@menu 5384* Interface to catgets:: The interface 5385* Problems with catgets:: Problems with the @code{catgets} interface?! 5386@end menu 5387 5388@node Interface to catgets, Problems with catgets, catgets, catgets 5389@subsection The Interface 5390@cindex interface to @code{catgets} 5391 5392The interface to the @code{catgets} implementation consists of three 5393functions which correspond to those used in file access: @code{catopen} 5394to open the catalog for using, @code{catgets} for accessing the message 5395tables, and @code{catclose} for closing after work is done. Prototypes 5396for the functions and the needed definitions are in the 5397@code{<nl_types.h>} header file. 5398 5399@cindex @code{catopen}, a @code{catgets} function 5400@code{catopen} is used like in this: 5401 5402@example 5403nl_catd catd = catopen ("catalog_name", 0); 5404@end example 5405 5406The function takes as the argument the name of the catalog. This usual 5407refers to the name of the program or the package. The second parameter 5408is not further specified in the standard. I don't even know whether it 5409is implemented consistently among various systems. So the common advice 5410is to use @code{0} as the value. The return value is a handle to the 5411message catalog, equivalent to handles to file returned by @code{open}. 5412 5413@cindex @code{catgets}, a @code{catgets} function 5414This handle is of course used in the @code{catgets} function which can 5415be used like this: 5416 5417@example 5418char *translation = catgets (catd, set_no, msg_id, "original string"); 5419@end example 5420 5421The first parameter is this catalog descriptor. The second parameter 5422specifies the set of messages in this catalog, in which the message 5423described by @code{msg_id} is obtained. @code{catgets} therefore uses a 5424three-stage addressing: 5425 5426@display 5427catalog name @result{} set number @result{} message ID @result{} translation 5428@end display 5429 5430@c Anybody else loving Haskell??? :-) -- Uli 5431 5432The fourth argument is not used to address the translation. It is given 5433as a default value in case when one of the addressing stages fail. One 5434important thing to remember is that although the return type of catgets 5435is @code{char *} the resulting string @emph{must not} be changed. It 5436should better be @code{const char *}, but the standard is published in 54371988, one year before ANSI C. 5438 5439@noindent 5440@cindex @code{catclose}, a @code{catgets} function 5441The last of these functions is used and behaves as expected: 5442 5443@example 5444catclose (catd); 5445@end example 5446 5447After this no @code{catgets} call using the descriptor is legal anymore. 5448 5449@node Problems with catgets, , Interface to catgets, catgets 5450@subsection Problems with the @code{catgets} Interface?! 5451@cindex problems with @code{catgets} interface 5452 5453Now that this description seemed to be really easy --- where are the 5454problems we speak of? In fact the interface could be used in a 5455reasonable way, but constructing the message catalogs is a pain. The 5456reason for this lies in the third argument of @code{catgets}: the unique 5457message ID. This has to be a numeric value for all messages in a single 5458set. Perhaps you could imagine the problems keeping such a list while 5459changing the source code. Add a new message here, remove one there. Of 5460course there have been developed a lot of tools helping to organize this 5461chaos but one as the other fails in one aspect or the other. We don't 5462want to say that the other approach has no problems but they are far 5463more easy to manage. 5464 5465@node gettext, Comparison, catgets, Programmers 5466@section About @code{gettext} 5467@cindex @code{gettext}, a programmer's view 5468 5469The definition of the @code{gettext} interface comes from a Uniforum 5470proposal. It was submitted there by Sun, who had implemented the 5471@code{gettext} function in SunOS 4, around 1990. Nowadays, the 5472@code{gettext} interface is specified by the OpenI18N standard. 5473 5474The main point about this solution is that it does not follow the 5475method of normal file handling (open-use-close) and that it does not 5476burden the programmer with so many tasks, especially the unique key handling. 5477Of course here also a unique key is needed, but this key is the message 5478itself (how long or short it is). See @ref{Comparison} for a more 5479detailed comparison of the two methods. 5480 5481The following section contains a rather detailed description of the 5482interface. We make it that detailed because this is the interface 5483we chose for the GNU @code{gettext} Library. Programmers interested 5484in using this library will be interested in this description. 5485 5486@menu 5487* Interface to gettext:: The interface 5488* Ambiguities:: Solving ambiguities 5489* Locating Catalogs:: Locating message catalog files 5490* Charset conversion:: How to request conversion to Unicode 5491* Contexts:: Solving ambiguities in GUI programs 5492* Plural forms:: Additional functions for handling plurals 5493* Optimized gettext:: Optimization of the *gettext functions 5494@end menu 5495 5496@node Interface to gettext, Ambiguities, gettext, gettext 5497@subsection The Interface 5498@cindex @code{gettext} interface 5499 5500The minimal functionality an interface must have is a) to select a 5501domain the strings are coming from (a single domain for all programs is 5502not reasonable because its construction and maintenance is difficult, 5503perhaps impossible) and b) to access a string in a selected domain. 5504 5505This is principally the description of the @code{gettext} interface. It 5506has a global domain which unqualified usages reference. Of course this 5507domain is selectable by the user. 5508 5509@example 5510char *textdomain (const char *domain_name); 5511@end example 5512 5513This provides the possibility to change or query the current status of 5514the current global domain of the @code{LC_MESSAGE} category. The 5515argument is a null-terminated string, whose characters must be legal in 5516the use in filenames. If the @var{domain_name} argument is @code{NULL}, 5517the function returns the current value. If no value has been set 5518before, the name of the default domain is returned: @emph{messages}. 5519Please note that although the return value of @code{textdomain} is of 5520type @code{char *} no changing is allowed. It is also important to know 5521that no checks of the availability are made. If the name is not 5522available you will see this by the fact that no translations are provided. 5523 5524@noindent 5525To use a domain set by @code{textdomain} the function 5526 5527@example 5528char *gettext (const char *msgid); 5529@end example 5530 5531@noindent 5532is to be used. This is the simplest reasonable form one can imagine. 5533The translation of the string @var{msgid} is returned if it is available 5534in the current domain. If it is not available, the argument itself is 5535returned. If the argument is @code{NULL} the result is undefined. 5536 5537One thing which should come into mind is that no explicit dependency to 5538the used domain is given. The current value of the domain is used. 5539If this changes between two 5540executions of the same @code{gettext} call in the program, both calls 5541reference a different message catalog. 5542 5543For the easiest case, which is normally used in internationalized 5544packages, once at the beginning of execution a call to @code{textdomain} 5545is issued, setting the domain to a unique name, normally the package 5546name. In the following code all strings which have to be translated are 5547filtered through the gettext function. That's all, the package speaks 5548your language. 5549 5550@node Ambiguities, Locating Catalogs, Interface to gettext, gettext 5551@subsection Solving Ambiguities 5552@cindex several domains 5553@cindex domain ambiguities 5554@cindex large package 5555 5556While this single name domain works well for most applications there 5557might be the need to get translations from more than one domain. Of 5558course one could switch between different domains with calls to 5559@code{textdomain}, but this is really not convenient nor is it fast. A 5560possible situation could be one case subject to discussion during this 5561writing: all 5562error messages of functions in the set of common used functions should 5563go into a separate domain @code{error}. By this mean we would only need 5564to translate them once. 5565Another case are messages from a library, as these @emph{have} to be 5566independent of the current domain set by the application. 5567 5568@noindent 5569For this reasons there are two more functions to retrieve strings: 5570 5571@example 5572char *dgettext (const char *domain_name, const char *msgid); 5573char *dcgettext (const char *domain_name, const char *msgid, 5574 int category); 5575@end example 5576 5577Both take an additional argument at the first place, which corresponds 5578to the argument of @code{textdomain}. The third argument of 5579@code{dcgettext} allows to use another locale category but @code{LC_MESSAGES}. 5580But I really don't know where this can be useful. If the 5581@var{domain_name} is @code{NULL} or @var{category} has an value beside 5582the known ones, the result is undefined. It should also be noted that 5583this function is not part of the second known implementation of this 5584function family, the one found in Solaris. 5585 5586A second ambiguity can arise by the fact, that perhaps more than one 5587domain has the same name. This can be solved by specifying where the 5588needed message catalog files can be found. 5589 5590@example 5591char *bindtextdomain (const char *domain_name, 5592 const char *dir_name); 5593@end example 5594 5595Calling this function binds the given domain to a file in the specified 5596directory (how this file is determined follows below). Especially a 5597file in the systems default place is not favored against the specified 5598file anymore (as it would be by solely using @code{textdomain}). A 5599@code{NULL} pointer for the @var{dir_name} parameter returns the binding 5600associated with @var{domain_name}. If @var{domain_name} itself is 5601@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here 5602again as for all the other functions is true that none of the return 5603value must be changed! 5604 5605It is important to remember that relative path names for the 5606@var{dir_name} parameter can be trouble. Since the path is always 5607computed relative to the current directory different results will be 5608achieved when the program executes a @code{chdir} command. Relative 5609paths should always be avoided to avoid dependencies and 5610unreliabilities. 5611 5612@node Locating Catalogs, Charset conversion, Ambiguities, gettext 5613@subsection Locating Message Catalog Files 5614@cindex message catalog files location 5615 5616Because many different languages for many different packages have to be 5617stored we need some way to add these information to file message catalog 5618files. The way usually used in Unix environments is have this encoding 5619in the file name. This is also done here. The directory name given in 5620@code{bindtextdomain}s second argument (or the default directory), 5621followed by the name of the locale, the locale category, and the domain name 5622are concatenated: 5623 5624@example 5625@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo 5626@end example 5627 5628The default value for @var{dir_name} is system specific. For the GNU 5629library, and for packages adhering to its conventions, it's: 5630@example 5631/usr/local/share/locale 5632@end example 5633 5634@noindent 5635@var{locale} is the name of the locale category which is designated by 5636@code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this 5637@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some 5638system, e.g.@: mingw, don't have @code{LC_MESSAGES}. Here we use a more or 5639less arbitrary value for it, namely 1729, the smallest positive integer 5640which can be represented in two different ways as the sum of two cubes.} 5641The name of the locale category is determined through 5642@code{setlocale (LC_@var{category}, NULL)}. 5643@footnote{When the system does not support @code{setlocale} its behavior 5644in setting the locale values is simulated by looking at the environment 5645variables.} 5646When using the function @code{dcgettext}, you can specify the locale category 5647through the third argument. 5648 5649@node Charset conversion, Contexts, Locating Catalogs, gettext 5650@subsection How to specify the output character set @code{gettext} uses 5651@cindex charset conversion at runtime 5652@cindex encoding conversion at runtime 5653 5654@code{gettext} not only looks up a translation in a message catalog. It 5655also converts the translation on the fly to the desired output character 5656set. This is useful if the user is working in a different character set 5657than the translator who created the message catalog, because it avoids 5658distributing variants of message catalogs which differ only in the 5659character set. 5660 5661The output character set is, by default, the value of @code{nl_langinfo 5662(CODESET)}, which depends on the @code{LC_CTYPE} part of the current 5663locale. But programs which store strings in a locale independent way 5664(e.g.@: UTF-8) can request that @code{gettext} and related functions 5665return the translations in that encoding, by use of the 5666@code{bind_textdomain_codeset} function. 5667 5668Note that the @var{msgid} argument to @code{gettext} is not subject to 5669character set conversion. Also, when @code{gettext} does not find a 5670translation for @var{msgid}, it returns @var{msgid} unchanged -- 5671independently of the current output character set. It is therefore 5672recommended that all @var{msgid}s be US-ASCII strings. 5673 5674@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset}) 5675The @code{bind_textdomain_codeset} function can be used to specify the 5676output character set for message catalogs for domain @var{domainname}. 5677The @var{codeset} argument must be a valid codeset name which can be used 5678for the @code{iconv_open} function, or a null pointer. 5679 5680If the @var{codeset} parameter is the null pointer, 5681@code{bind_textdomain_codeset} returns the currently selected codeset 5682for the domain with the name @var{domainname}. It returns @code{NULL} if 5683no codeset has yet been selected. 5684 5685The @code{bind_textdomain_codeset} function can be used several times. 5686If used multiple times with the same @var{domainname} argument, the 5687later call overrides the settings made by the earlier one. 5688 5689The @code{bind_textdomain_codeset} function returns a pointer to a 5690string containing the name of the selected codeset. The string is 5691allocated internally in the function and must not be changed by the 5692user. If the system went out of core during the execution of 5693@code{bind_textdomain_codeset}, the return value is @code{NULL} and the 5694global variable @var{errno} is set accordingly. 5695@end deftypefun 5696 5697@node Contexts, Plural forms, Charset conversion, gettext 5698@subsection Using contexts for solving ambiguities 5699@cindex context 5700@cindex GUI programs 5701@cindex translating menu entries 5702@cindex menu entries 5703 5704One place where the @code{gettext} functions, if used normally, have big 5705problems is within programs with graphical user interfaces (GUIs). The 5706problem is that many of the strings which have to be translated are very 5707short. They have to appear in pull-down menus which restricts the 5708length. But strings which are not containing entire sentences or at 5709least large fragments of a sentence may appear in more than one 5710situation in the program but might have different translations. This is 5711especially true for the one-word strings which are frequently used in 5712GUI programs. 5713 5714As a consequence many people say that the @code{gettext} approach is 5715wrong and instead @code{catgets} should be used which indeed does not 5716have this problem. But there is a very simple and powerful method to 5717handle this kind of problems with the @code{gettext} functions. 5718 5719Contexts can be added to strings to be translated. A context dependent 5720translation lookup is when a translation for a given string is searched, 5721that is limited to a given context. The translation for the same string 5722in a different context can be different. The different translations of 5723the same string in different contexts can be stored in the in the same 5724MO file, and can be edited by the translator in the same PO file. 5725 5726The @file{gettext.h} include file contains the lookup macros for strings 5727with contexts. They are implemented as thin macros and inline functions 5728over the functions from @code{<libintl.h>}. 5729 5730@findex pgettext 5731@example 5732const char *pgettext (const char *msgctxt, const char *msgid); 5733@end example 5734 5735In a call of this macro, @var{msgctxt} and @var{msgid} must be string 5736literals. The macro returns the translation of @var{msgid}, restricted 5737to the context given by @var{msgctxt}. 5738 5739The @var{msgctxt} string is visible in the PO file to the translator. 5740You should try to make it somehow canonical and never changing. Because 5741every time you change an @var{msgctxt}, the translator will have to review 5742the translation of @var{msgid}. 5743 5744Finding a canonical @var{msgctxt} string that doesn't change over time can 5745be hard. But you shouldn't use the file name or class name containing the 5746@code{pgettext} call -- because it is a common development task to rename 5747a file or a class, and it shouldn't cause translator work. Also you shouldn't 5748use a comment in the form of a complete English sentence as @var{msgctxt} -- 5749because orthography or grammar changes are often applied to such sentences, 5750and again, it shouldn't force the translator to do a review. 5751 5752The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext} 5753fetches a particular translation of the @var{msgid}. 5754 5755@findex dpgettext 5756@findex dcpgettext 5757@example 5758const char *dpgettext (const char *domain_name, 5759 const char *msgctxt, const char *msgid); 5760const char *dcpgettext (const char *domain_name, 5761 const char *msgctxt, const char *msgid, 5762 int category); 5763@end example 5764 5765These are generalizations of @code{pgettext}. They behave similarly to 5766@code{dgettext} and @code{dcgettext}, respectively. The @var{domain_name} 5767argument defines the translation domain. The @var{category} argument 5768allows to use another locale category than @code{LC_MESSAGES}. 5769 5770As as example consider the following fictional situation. A GUI program 5771has a menu bar with the following entries: 5772 5773@smallexample 5774+------------+------------+--------------------------------------+ 5775| File | Printer | | 5776+------------+------------+--------------------------------------+ 5777| Open | | Select | 5778| New | | Open | 5779+----------+ | Connect | 5780 +----------+ 5781@end smallexample 5782 5783To have the strings @code{File}, @code{Printer}, @code{Open}, 5784@code{New}, @code{Select}, and @code{Connect} translated there has to be 5785at some point in the code a call to a function of the @code{gettext} 5786family. But in two places the string passed into the function would be 5787@code{Open}. The translations might not be the same and therefore we 5788are in the dilemma described above. 5789 5790What distinguishes the two places is the menu path from the menu root to 5791the particular menu entries: 5792 5793@smallexample 5794Menu|File 5795Menu|Printer 5796Menu|File|Open 5797Menu|File|New 5798Menu|Printer|Select 5799Menu|Printer|Open 5800Menu|Printer|Connect 5801@end smallexample 5802 5803The context is thus the menu path without its last part. So, the calls 5804look like this: 5805 5806@smallexample 5807pgettext ("Menu|", "File") 5808pgettext ("Menu|", "Printer") 5809pgettext ("Menu|File|", "Open") 5810pgettext ("Menu|File|", "New") 5811pgettext ("Menu|Printer|", "Select") 5812pgettext ("Menu|Printer|", "Open") 5813pgettext ("Menu|Printer|", "Connect") 5814@end smallexample 5815 5816Whether or not to use the @samp{|} character at the end of the context is a 5817matter of style. 5818 5819For more complex cases, where the @var{msgctxt} or @var{msgid} are not 5820string literals, more general macros are available: 5821 5822@findex pgettext_expr 5823@findex dpgettext_expr 5824@findex dcpgettext_expr 5825@example 5826const char *pgettext_expr (const char *msgctxt, const char *msgid); 5827const char *dpgettext_expr (const char *domain_name, 5828 const char *msgctxt, const char *msgid); 5829const char *dcpgettext_expr (const char *domain_name, 5830 const char *msgctxt, const char *msgid, 5831 int category); 5832@end example 5833 5834Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions. 5835These macros are more general. But in the case that both argument expressions 5836are string literals, the macros without the @samp{_expr} suffix are more 5837efficient. 5838 5839@node Plural forms, Optimized gettext, Contexts, gettext 5840@subsection Additional functions for plural forms 5841@cindex plural forms 5842 5843The functions of the @code{gettext} family described so far (and all the 5844@code{catgets} functions as well) have one problem in the real world 5845which have been neglected completely in all existing approaches. What 5846is meant here is the handling of plural forms. 5847 5848Looking through Unix source code before the time anybody thought about 5849internationalization (and, sadly, even afterwards) one can often find 5850code similar to the following: 5851 5852@smallexample 5853 printf ("%d file%s deleted", n, n == 1 ? "" : "s"); 5854@end smallexample 5855 5856@noindent 5857After the first complaints from people internationalizing the code people 5858either completely avoided formulations like this or used strings like 5859@code{"file(s)"}. Both look unnatural and should be avoided. First 5860tries to solve the problem correctly looked like this: 5861 5862@smallexample 5863 if (n == 1) 5864 printf ("%d file deleted", n); 5865 else 5866 printf ("%d files deleted", n); 5867@end smallexample 5868 5869But this does not solve the problem. It helps languages where the 5870plural form of a noun is not simply constructed by adding an 5871@ifhtml 5872���s��� 5873@end ifhtml 5874@ifnothtml 5875`s' 5876@end ifnothtml 5877but that is all. Once again people fell into the trap of believing the 5878rules their language is using are universal. But the handling of plural 5879forms differs widely between the language families. For example, 5880Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports: 5881 5882@quotation 5883In Polish we use e.g.@: plik (file) this way: 5884@example 58851 plik 58862,3,4 pliki 58875-21 pliko'w 588822-24 pliki 588925-31 pliko'w 5890@end example 5891and so on (o' means 8859-2 oacute which should be rather okreska, 5892similar to aogonek). 5893@end quotation 5894 5895There are two things which can differ between languages (and even inside 5896language families); 5897 5898@itemize @bullet 5899@item 5900The form how plural forms are built differs. This is a problem with 5901languages which have many irregularities. German, for instance, is a 5902drastic case. Though English and German are part of the same language 5903family (Germanic), the almost regular forming of plural noun forms 5904(appending an 5905@ifhtml 5906���s���) 5907@end ifhtml 5908@ifnothtml 5909`s') 5910@end ifnothtml 5911is hardly found in German. 5912 5913@item 5914The number of plural forms differ. This is somewhat surprising for 5915those who only have experiences with Romanic and Germanic languages 5916since here the number is the same (there are two). 5917 5918But other language families have only one form or many forms. More 5919information on this in an extra section. 5920@end itemize 5921 5922The consequence of this is that application writers should not try to 5923solve the problem in their code. This would be localization since it is 5924only usable for certain, hardcoded language environments. Instead the 5925extended @code{gettext} interface should be used. 5926 5927These extra functions are taking instead of the one key string two 5928strings and a numerical argument. The idea behind this is that using 5929the numerical argument and the first string as a key, the implementation 5930can select using rules specified by the translator the right plural 5931form. The two string arguments then will be used to provide a return 5932value in case no message catalog is found (similar to the normal 5933@code{gettext} behavior). In this case the rules for Germanic language 5934is used and it is assumed that the first string argument is the singular 5935form, the second the plural form. 5936 5937This has the consequence that programs without language catalogs can 5938display the correct strings only if the program itself is written using 5939a Germanic language. This is a limitation but since the GNU C library 5940(as well as the GNU @code{gettext} package) are written as part of the 5941GNU package and the coding standards for the GNU project require program 5942being written in English, this solution nevertheless fulfills its 5943purpose. 5944 5945@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) 5946The @code{ngettext} function is similar to the @code{gettext} function 5947as it finds the message catalogs in the same way. But it takes two 5948extra arguments. The @var{msgid1} parameter must contain the singular 5949form of the string to be converted. It is also used as the key for the 5950search in the catalog. The @var{msgid2} parameter is the plural form. 5951The parameter @var{n} is used to determine the plural form. If no 5952message catalog is found @var{msgid1} is returned if @code{n == 1}, 5953otherwise @code{msgid2}. 5954 5955An example for the use of this function is: 5956 5957@smallexample 5958printf (ngettext ("%d file removed", "%d files removed", n), n); 5959@end smallexample 5960 5961Please note that the numeric value @var{n} has to be passed to the 5962@code{printf} function as well. It is not sufficient to pass it only to 5963@code{ngettext}. 5964 5965In the English singular case, the number -- always 1 -- can be replaced with 5966"one": 5967 5968@smallexample 5969printf (ngettext ("One file removed", "%d files removed", n), n); 5970@end smallexample 5971 5972@noindent 5973This works because the @samp{printf} function discards excess arguments that 5974are not consumed by the format string. 5975 5976It is also possible to use this function when the strings don't contain a 5977cardinal number: 5978 5979@smallexample 5980puts (ngettext ("Delete the selected file?", 5981 "Delete the selected files?", 5982 n)); 5983@end smallexample 5984 5985In this case the number @var{n} is only used to choose the plural form. 5986@end deftypefun 5987 5988@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) 5989The @code{dngettext} is similar to the @code{dgettext} function in the 5990way the message catalog is selected. The difference is that it takes 5991two extra parameter to provide the correct plural form. These two 5992parameters are handled in the same way @code{ngettext} handles them. 5993@end deftypefun 5994 5995@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category}) 5996The @code{dcngettext} is similar to the @code{dcgettext} function in the 5997way the message catalog is selected. The difference is that it takes 5998two extra parameter to provide the correct plural form. These two 5999parameters are handled in the same way @code{ngettext} handles them. 6000@end deftypefun 6001 6002Now, how do these functions solve the problem of the plural forms? 6003Without the input of linguists (which was not available) it was not 6004possible to determine whether there are only a few different forms in 6005which plural forms are formed or whether the number can increase with 6006every new supported language. 6007 6008Therefore the solution implemented is to allow the translator to specify 6009the rules of how to select the plural form. Since the formula varies 6010with every language this is the only viable solution except for 6011hardcoding the information in the code (which still would require the 6012possibility of extensions to not prevent the use of new languages). 6013 6014@cindex specifying plural form in a PO file 6015@kwindex nplurals@r{, in a PO file header} 6016@kwindex plural@r{, in a PO file header} 6017The information about the plural form selection has to be stored in the 6018header entry of the PO file (the one with the empty @code{msgid} string). 6019The plural form information looks like this: 6020 6021@smallexample 6022Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; 6023@end smallexample 6024 6025The @code{nplurals} value must be a decimal number which specifies how 6026many different plural forms exist for this language. The string 6027following @code{plural} is an expression which is using the C language 6028syntax. Exceptions are that no negative numbers are allowed, numbers 6029must be decimal, and the only variable allowed is @code{n}. Spaces are 6030allowed in the expression, but backslash-newlines are not; in the 6031examples below the backslash-newlines are present for formatting purposes 6032only. This expression will be evaluated whenever one of the functions 6033@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The 6034numeric value passed to these functions is then substituted for all uses 6035of the variable @code{n} in the expression. The resulting value then 6036must be greater or equal to zero and smaller than the value given as the 6037value of @code{nplurals}. 6038 6039@noindent 6040@cindex plural form formulas 6041The following rules are known at this point. The language with families 6042are listed. But this does not necessarily mean the information can be 6043generalized for the whole family (as can be easily seen in the table 6044below).@footnote{Additions are welcome. Send appropriate information to 6045@email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.} 6046 6047@table @asis 6048@item Only one form: 6049Some languages only require one single form. There is no distinction 6050between the singular and plural form. An appropriate header entry 6051would look like this: 6052 6053@smallexample 6054Plural-Forms: nplurals=1; plural=0; 6055@end smallexample 6056 6057@noindent 6058Languages with this property include: 6059 6060@table @asis 6061@item Asian family 6062Japanese, Korean, Vietnamese 6063@item Turkic/Altaic family 6064Turkish 6065@end table 6066 6067@item Two forms, singular used for one only 6068This is the form used in most existing programs since it is what English 6069is using. A header entry would look like this: 6070 6071@smallexample 6072Plural-Forms: nplurals=2; plural=n != 1; 6073@end smallexample 6074 6075(Note: this uses the feature of C expressions that boolean expressions 6076have to value zero or one.) 6077 6078@noindent 6079Languages with this property include: 6080 6081@table @asis 6082@item Germanic family 6083Danish, Dutch, English, Faroese, German, Norwegian, Swedish 6084@item Finno-Ugric family 6085Estonian, Finnish 6086@item Latin/Greek family 6087Greek 6088@item Semitic family 6089Hebrew 6090@item Romanic family 6091Italian, Portuguese, Spanish 6092@item Artificial 6093Esperanto 6094@end table 6095 6096@noindent 6097Another language using the same header entry is: 6098 6099@table @asis 6100@item Finno-Ugric family 6101Hungarian 6102@end table 6103 6104Hungarian does not appear to have a plural if you look at sentences involving 6105cardinal numbers. For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is 6106``123 alma''. But when the number is not explicit, the distinction between 6107singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is 6108``az alm@'{a}k''. Since @code{ngettext} has to support both types of sentences, 6109it is classified here, under ``two forms''. 6110 6111@item Two forms, singular used for zero and one 6112Exceptional case in the language family. The header entry would be: 6113 6114@smallexample 6115Plural-Forms: nplurals=2; plural=n>1; 6116@end smallexample 6117 6118@noindent 6119Languages with this property include: 6120 6121@table @asis 6122@item Romanic family 6123French, Brazilian Portuguese 6124@end table 6125 6126@item Three forms, special case for zero 6127The header entry would be: 6128 6129@smallexample 6130Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2; 6131@end smallexample 6132 6133@noindent 6134Languages with this property include: 6135 6136@table @asis 6137@item Baltic family 6138Latvian 6139@end table 6140 6141@item Three forms, special cases for one and two 6142The header entry would be: 6143 6144@smallexample 6145Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2; 6146@end smallexample 6147 6148@noindent 6149Languages with this property include: 6150 6151@table @asis 6152@item Celtic 6153Gaeilge (Irish) 6154@end table 6155 6156@item Three forms, special case for numbers ending in 00 or [2-9][0-9] 6157The header entry would be: 6158 6159@smallexample 6160Plural-Forms: nplurals=3; \ 6161 plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2; 6162@end smallexample 6163 6164@noindent 6165Languages with this property include: 6166 6167@table @asis 6168@item Romanic family 6169Romanian 6170@end table 6171 6172@item Three forms, special case for numbers ending in 1[2-9] 6173The header entry would look like this: 6174 6175@smallexample 6176Plural-Forms: nplurals=3; \ 6177 plural=n%10==1 && n%100!=11 ? 0 : \ 6178 n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2; 6179@end smallexample 6180 6181@noindent 6182Languages with this property include: 6183 6184@table @asis 6185@item Baltic family 6186Lithuanian 6187@end table 6188 6189@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4] 6190The header entry would look like this: 6191 6192@smallexample 6193Plural-Forms: nplurals=3; \ 6194 plural=n%10==1 && n%100!=11 ? 0 : \ 6195 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 6196@end smallexample 6197 6198@noindent 6199Languages with this property include: 6200 6201@table @asis 6202@item Slavic family 6203Croatian, Serbian, Russian, Ukrainian 6204@end table 6205 6206@item Three forms, special cases for 1 and 2, 3, 4 6207The header entry would look like this: 6208 6209@smallexample 6210Plural-Forms: nplurals=3; \ 6211 plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; 6212@end smallexample 6213 6214@noindent 6215Languages with this property include: 6216 6217@table @asis 6218@item Slavic family 6219Slovak, Czech 6220@end table 6221 6222@item Three forms, special case for one and some numbers ending in 2, 3, or 4 6223The header entry would look like this: 6224 6225@smallexample 6226Plural-Forms: nplurals=3; \ 6227 plural=n==1 ? 0 : \ 6228 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 6229@end smallexample 6230 6231@noindent 6232Languages with this property include: 6233 6234@table @asis 6235@item Slavic family 6236Polish 6237@end table 6238 6239@item Four forms, special case for one and all numbers ending in 02, 03, or 04 6240The header entry would look like this: 6241 6242@smallexample 6243Plural-Forms: nplurals=4; \ 6244 plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3; 6245@end smallexample 6246 6247@noindent 6248Languages with this property include: 6249 6250@table @asis 6251@item Slavic family 6252Slovenian 6253@end table 6254@end table 6255 6256You might now ask, @code{ngettext} handles only numbers @var{n} of type 6257@samp{unsigned long}. What about larger integer types? What about negative 6258numbers? What about floating-point numbers? 6259 6260About larger integer types, such as @samp{uintmax_t} or 6261@samp{unsigned long long}: they can be handled by reducing the value to a 6262range that fits in an @samp{unsigned long}. Simply casting the value to 6263@samp{unsigned long} would not do the right thing, since it would treat 6264@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and 6265the like. Here you can exploit the fact that all mentioned plural form 6266formulas eventually become periodic, with a period that is a divisor of 100 6267(or 1000 or 1000000). So, when you reduce a large value to another one in 6268the range [1000000, 1999999] that ends in the same 6 decimal digits, you 6269can assume that it will lead to the same plural form selection. This code 6270does this: 6271 6272@smallexample 6273#include <inttypes.h> 6274uintmax_t nbytes = ...; 6275printf (ngettext ("The file has %"PRIuMAX" byte.", 6276 "The file has %"PRIuMAX" bytes.", 6277 (nbytes > ULONG_MAX 6278 ? (nbytes % 1000000) + 1000000 6279 : nbytes)), 6280 nbytes); 6281@end smallexample 6282 6283Negative and floating-point values usually represent physical entities for 6284which singular and plural don't clearly apply. In such cases, there is no 6285need to use @code{ngettext}; a simple @code{gettext} call with a form suitable 6286for all values will do. For example: 6287 6288@smallexample 6289printf (gettext ("Time elapsed: %.3f seconds"), 6290 num_milliseconds * 0.001); 6291@end smallexample 6292 6293@noindent 6294Even if @var{num_milliseconds} happens to be a multiple of 1000, the output 6295@smallexample 6296Time elapsed: 1.000 seconds 6297@end smallexample 6298@noindent 6299is acceptable in English, and similarly for other languages. 6300 6301@node Optimized gettext, , Plural forms, gettext 6302@subsection Optimization of the *gettext functions 6303@cindex optimization of @code{gettext} functions 6304 6305At this point of the discussion we should talk about an advantage of the 6306GNU @code{gettext} implementation. Some readers might have pointed out 6307that an internationalized program might have a poor performance if some 6308string has to be translated in an inner loop. While this is unavoidable 6309when the string varies from one run of the loop to the other it is 6310simply a waste of time when the string is always the same. Take the 6311following example: 6312 6313@example 6314@group 6315@{ 6316 while (@dots{}) 6317 @{ 6318 puts (gettext ("Hello world")); 6319 @} 6320@} 6321@end group 6322@end example 6323 6324@noindent 6325When the locale selection does not change between two runs the resulting 6326string is always the same. One way to use this is: 6327 6328@example 6329@group 6330@{ 6331 str = gettext ("Hello world"); 6332 while (@dots{}) 6333 @{ 6334 puts (str); 6335 @} 6336@} 6337@end group 6338@end example 6339 6340@noindent 6341But this solution is not usable in all situation (e.g.@: when the locale 6342selection changes) nor does it lead to legible code. 6343 6344For this reason, GNU @code{gettext} caches previous translation results. 6345When the same translation is requested twice, with no new message 6346catalogs being loaded in between, @code{gettext} will, the second time, 6347find the result through a single cache lookup. 6348 6349@node Comparison, Using libintl.a, gettext, Programmers 6350@section Comparing the Two Interfaces 6351@cindex @code{gettext} vs @code{catgets} 6352@cindex comparison of interfaces 6353 6354@c FIXME: arguments to catgets vs. gettext 6355@c Partly done 950718 -- drepper 6356 6357The following discussion is perhaps a little bit colored. As said 6358above we implemented GNU @code{gettext} following the Uniforum 6359proposal and this surely has its reasons. But it should show how we 6360came to this decision. 6361 6362First we take a look at the developing process. When we write an 6363application using NLS provided by @code{gettext} we proceed as always. 6364Only when we come to a string which might be seen by the users and thus 6365has to be translated we use @code{gettext("@dots{}")} instead of 6366@code{"@dots{}"}. At the beginning of each source file (or in a central 6367header file) we define 6368 6369@example 6370#define gettext(String) (String) 6371@end example 6372 6373Even this definition can be avoided when the system supports the 6374@code{gettext} function in its C library. When we compile this code the 6375result is the same as if no NLS code is used. When you take a look at 6376the GNU @code{gettext} code you will see that we use @code{_("@dots{}")} 6377instead of @code{gettext("@dots{}")}. This reduces the number of 6378additional characters per translatable string to @emph{3} (in words: 6379three). 6380 6381When now a production version of the program is needed we simply replace 6382the definition 6383 6384@example 6385#define _(String) (String) 6386@end example 6387 6388@noindent 6389by 6390 6391@cindex include file @file{libintl.h} 6392@example 6393#include <libintl.h> 6394#define _(String) gettext (String) 6395@end example 6396 6397@noindent 6398Additionally we run the program @file{xgettext} on all source code file 6399which contain translatable strings and that's it: we have a running 6400program which does not depend on translations to be available, but which 6401can use any that becomes available. 6402 6403@cindex @code{N_}, a convenience macro 6404The same procedure can be done for the @code{gettext_noop} invocations 6405(@pxref{Special cases}). One usually defines @code{gettext_noop} as a 6406no-op macro. So you should consider the following code for your project: 6407 6408@example 6409#define gettext_noop(String) String 6410#define N_(String) gettext_noop (String) 6411@end example 6412 6413@code{N_} is a short form similar to @code{_}. The @file{Makefile} in 6414the @file{po/} directory of GNU @code{gettext} knows by default both of the 6415mentioned short forms so you are invited to follow this proposal for 6416your own ease. 6417 6418Now to @code{catgets}. The main problem is the work for the 6419programmer. Every time he comes to a translatable string he has to 6420define a number (or a symbolic constant) which has also be defined in 6421the message catalog file. He also has to take care for duplicate 6422entries, duplicate message IDs etc. If he wants to have the same 6423quality in the message catalog as the GNU @code{gettext} program 6424provides he also has to put the descriptive comments for the strings and 6425the location in all source code files in the message catalog. This is 6426nearly a Mission: Impossible. 6427 6428But there are also some points people might call advantages speaking for 6429@code{catgets}. If you have a single word in a string and this string 6430is used in different contexts it is likely that in one or the other 6431language the word has different translations. Example: 6432 6433@example 6434printf ("%s: %d", gettext ("number"), number_of_errors) 6435 6436printf ("you should see %d %s", number_count, 6437 number_count == 1 ? gettext ("number") : gettext ("numbers")) 6438@end example 6439 6440Here we have to translate two times the string @code{"number"}. Even 6441if you do not speak a language beside English it might be possible to 6442recognize that the two words have a different meaning. In German the 6443first appearance has to be translated to @code{"Anzahl"} and the second 6444to @code{"Zahl"}. 6445 6446Now you can say that this example is really esoteric. And you are 6447right! This is exactly how we felt about this problem and decide that 6448it does not weight that much. The solution for the above problem could 6449be very easy: 6450 6451@example 6452printf ("%s %d", gettext ("number:"), number_of_errors) 6453 6454printf (number_count == 1 ? gettext ("you should see %d number") 6455 : gettext ("you should see %d numbers"), 6456 number_count) 6457@end example 6458 6459We believe that we can solve all conflicts with this method. If it is 6460difficult one can also consider changing one of the conflicting string a 6461little bit. But it is not impossible to overcome. 6462 6463@code{catgets} allows same original entry to have different translations, 6464but @code{gettext} has another, scalable approach for solving ambiguities 6465of this kind: @xref{Ambiguities}. 6466 6467@node Using libintl.a, gettext grok, Comparison, Programmers 6468@section Using libintl.a in own programs 6469 6470Starting with version 0.9.4 the library @code{libintl.h} should be 6471self-contained. I.e., you can use it in your own programs without 6472providing additional functions. The @file{Makefile} will put the header 6473and the library in directories selected using the @code{$(prefix)}. 6474 6475@node gettext grok, Temp Programmers, Using libintl.a, Programmers 6476@section Being a @code{gettext} grok 6477 6478@strong{ NOTE: } This documentation section is outdated and needs to be 6479revised. 6480 6481To fully exploit the functionality of the GNU @code{gettext} library it 6482is surely helpful to read the source code. But for those who don't want 6483to spend that much time in reading the (sometimes complicated) code here 6484is a list comments: 6485 6486@itemize @bullet 6487@item Changing the language at runtime 6488@cindex language selection at runtime 6489 6490For interactive programs it might be useful to offer a selection of the 6491used language at runtime. To understand how to do this one need to know 6492how the used language is determined while executing the @code{gettext} 6493function. The method which is presented here only works correctly 6494with the GNU implementation of the @code{gettext} functions. 6495 6496In the function @code{dcgettext} at every call the current setting of 6497the highest priority environment variable is determined and used. 6498Highest priority means here the following list with decreasing 6499priority: 6500 6501@enumerate 6502@vindex LANGUAGE@r{, environment variable} 6503@item @code{LANGUAGE} 6504@vindex LC_ALL@r{, environment variable} 6505@item @code{LC_ALL} 6506@vindex LC_CTYPE@r{, environment variable} 6507@vindex LC_NUMERIC@r{, environment variable} 6508@vindex LC_TIME@r{, environment variable} 6509@vindex LC_COLLATE@r{, environment variable} 6510@vindex LC_MONETARY@r{, environment variable} 6511@vindex LC_MESSAGES@r{, environment variable} 6512@item @code{LC_xxx}, according to selected locale category 6513@vindex LANG@r{, environment variable} 6514@item @code{LANG} 6515@end enumerate 6516 6517Afterwards the path is constructed using the found value and the 6518translation file is loaded if available. 6519 6520What happens now when the value for, say, @code{LANGUAGE} changes? According 6521to the process explained above the new value of this variable is found 6522as soon as the @code{dcgettext} function is called. But this also means 6523the (perhaps) different message catalog file is loaded. In other 6524words: the used language is changed. 6525 6526But there is one little hook. The code for gcc-2.7.0 and up provides 6527some optimization. This optimization normally prevents the calling of 6528the @code{dcgettext} function as long as no new catalog is loaded. But 6529if @code{dcgettext} is not called the program also cannot find the 6530@code{LANGUAGE} variable be changed (@pxref{Optimized gettext}). A 6531solution for this is very easy. Include the following code in the 6532language switching function. 6533 6534@example 6535 /* Change language. */ 6536 setenv ("LANGUAGE", "fr", 1); 6537 6538 /* Make change known. */ 6539 @{ 6540 extern int _nl_msg_cat_cntr; 6541 ++_nl_msg_cat_cntr; 6542 @} 6543@end example 6544 6545@cindex @code{_nl_msg_cat_cntr} 6546The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}. 6547You don't need to know what this is for. But it can be used to detect 6548whether a @code{gettext} implementation is GNU gettext and not non-GNU 6549system's native gettext implementation. 6550 6551@end itemize 6552 6553@node Temp Programmers, , gettext grok, Programmers 6554@section Temporary Notes for the Programmers Chapter 6555 6556@strong{ NOTE: } This documentation section is outdated and needs to be 6557revised. 6558 6559@menu 6560* Temp Implementations:: Temporary - Two Possible Implementations 6561* Temp catgets:: Temporary - About @code{catgets} 6562* Temp WSI:: Temporary - Why a single implementation 6563* Temp Notes:: Temporary - Notes 6564@end menu 6565 6566@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers 6567@subsection Temporary - Two Possible Implementations 6568 6569There are two competing methods for language independent messages: 6570the X/Open @code{catgets} method, and the Uniforum @code{gettext} 6571method. The @code{catgets} method indexes messages by integers; the 6572@code{gettext} method indexes them by their English translations. 6573The @code{catgets} method has been around longer and is supported 6574by more vendors. The @code{gettext} method is supported by Sun, 6575and it has been heard that the COSE multi-vendor initiative is 6576supporting it. Neither method is a POSIX standard; the POSIX.1 6577committee had a lot of disagreement in this area. 6578 6579Neither one is in the POSIX standard. There was much disagreement 6580in the POSIX.1 committee about using the @code{gettext} routines 6581vs. @code{catgets} (XPG). In the end the committee couldn't 6582agree on anything, so no messaging system was included as part 6583of the standard. I believe the informative annex of the standard 6584includes the XPG3 messaging interfaces, ``@dots{}as an example of 6585a messaging system that has been implemented@dots{}'' 6586 6587They were very careful not to say anywhere that you should use one 6588set of interfaces over the other. For more on this topic please 6589see the Programming for Internationalization FAQ. 6590 6591@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers 6592@subsection Temporary - About @code{catgets} 6593 6594There have been a few discussions of late on the use of 6595@code{catgets} as a base. I think it important to present both 6596sides of the argument and hence am opting to play devil's advocate 6597for a little bit. 6598 6599I'll not deny the fact that @code{catgets} could have been designed 6600a lot better. It currently has quite a number of limitations and 6601these have already been pointed out. 6602 6603However there is a great deal to be said for consistency and 6604standardization. A common recurring problem when writing Unix 6605software is the myriad portability problems across Unix platforms. 6606It seems as if every Unix vendor had a look at the operating system 6607and found parts they could improve upon. Undoubtedly, these 6608modifications are probably innovative and solve real problems. 6609However, software developers have a hard time keeping up with all 6610these changes across so many platforms. 6611 6612And this has prompted the Unix vendors to begin to standardize their 6613systems. Hence the impetus for Spec1170. Every major Unix vendor 6614has committed to supporting this standard and every Unix software 6615developer waits with glee the day they can write software to this 6616standard and simply recompile (without having to use autoconf) 6617across different platforms. 6618 6619As I understand it, Spec1170 is roughly based upon version 4 of the 6620X/Open Portability Guidelines (XPG4). Because @code{catgets} and 6621friends are defined in XPG4, I'm led to believe that @code{catgets} 6622is a part of Spec1170 and hence will become a standardized component 6623of all Unix systems. 6624 6625@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers 6626@subsection Temporary - Why a single implementation 6627 6628Now it seems kind of wasteful to me to have two different systems 6629installed for accessing message catalogs. If we do want to remedy 6630@code{catgets} deficiencies why don't we try to expand @code{catgets} 6631(in a compatible manner) rather than implement an entirely new system. 6632Otherwise, we'll end up with two message catalog access systems installed 6633with an operating system - one set of routines for packages using GNU 6634@code{gettext} for their internationalization, and another set of routines 6635(catgets) for all other software. Bloated? 6636 6637Supposing another catalog access system is implemented. Which do 6638we recommend? At least for Linux, we need to attract as many 6639software developers as possible. Hence we need to make it as easy 6640for them to port their software as possible. Which means supporting 6641@code{catgets}. We will be implementing the @code{libintl} code 6642within our @code{libc}, but does this mean we also have to incorporate 6643another message catalog access scheme within our @code{libc} as well? 6644And what about people who are going to be using the @code{libintl} 6645+ non-@code{catgets} routines. When they port their software to 6646other platforms, they're now going to have to include the front-end 6647(@code{libintl}) code plus the back-end code (the non-@code{catgets} 6648access routines) with their software instead of just including the 6649@code{libintl} code with their software. 6650 6651Message catalog support is however only the tip of the iceberg. 6652What about the data for the other locale categories? They also have 6653a number of deficiencies. Are we going to abandon them as well and 6654develop another duplicate set of routines (should @code{libintl} 6655expand beyond message catalog support)? 6656 6657Like many parts of Unix that can be improved upon, we're stuck with balancing 6658compatibility with the past with useful improvements and innovations for 6659the future. 6660 6661@node Temp Notes, , Temp WSI, Temp Programmers 6662@subsection Temporary - Notes 6663 6664X/Open agreed very late on the standard form so that many 6665implementations differ from the final form. Both of my system (old 6666Linux catgets and Ultrix-4) have a strange variation. 6667 6668OK. After incorporating the last changes I have to spend some time on 6669making the GNU/Linux @code{libc} @code{gettext} functions. So in future 6670Solaris is not the only system having @code{gettext}. 6671 6672@node Translators, Maintainers, Programmers, Top 6673@chapter The Translator's View 6674 6675@c FIXME: Reorganize whole chapter. 6676 6677@menu 6678* Trans Intro 0:: Introduction 0 6679* Trans Intro 1:: Introduction 1 6680* Discussions:: Discussions 6681* Organization:: Organization 6682* Information Flow:: Information Flow 6683* Prioritizing messages:: How to find which messages to translate first 6684@end menu 6685 6686@node Trans Intro 0, Trans Intro 1, Translators, Translators 6687@section Introduction 0 6688 6689@strong{ NOTE: } This documentation section is outdated and needs to be 6690revised. 6691 6692Free software is going international! The Translation Project is a way 6693to get maintainers, translators and users all together, so free software 6694will gradually become able to speak many native languages. 6695 6696The GNU @code{gettext} tool set contains @emph{everything} maintainers 6697need for internationalizing their packages for messages. It also 6698contains quite useful tools for helping translators at localizing 6699messages to their native language, once a package has already been 6700internationalized. 6701 6702To achieve the Translation Project, we need many interested 6703people who like their own language and write it well, and who are also 6704able to synergize with other translators speaking the same language. 6705If you'd like to volunteer to @emph{work} at translating messages, 6706please send mail to your translating team. 6707 6708Each team has its own mailing list, courtesy of Linux 6709International. You may reach your translating team at the address 6710@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639} 6711code for your language. Language codes are @emph{not} the same as 6712country codes given in @w{ISO 3166}. The following translating teams 6713exist: 6714 6715@quotation 6716Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl}, 6717Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish 6718@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it}, 6719Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish 6720@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es}, 6721Swedish @code{sv} and Turkish @code{tr}. 6722@end quotation 6723 6724@noindent 6725For example, you may reach the Chinese translating team by writing to 6726@file{zh@@li.org}. When you become a member of the translating team 6727for your own language, you may subscribe to its list. For example, 6728Swedish people can send a message to @w{@file{sv-request@@li.org}}, 6729having this message body: 6730 6731@example 6732subscribe 6733@end example 6734 6735Keep in mind that team members should be interested in @emph{working} 6736at translations, or at solving translational difficulties, rather than 6737merely lurking around. If your team does not exist yet and you want to 6738start one, please write to @w{@file{coordinator@@translationproject.org}}; 6739you will then reach the coordinator for all translator teams. 6740 6741A handful of GNU packages have already been adapted and provided 6742with message translations for several languages. Translation 6743teams have begun to organize, using these packages as a starting 6744point. But there are many more packages and many languages for 6745which we have no volunteer translators. If you would like to 6746volunteer to work at translating messages, please send mail to 6747@file{coordinator@@translationproject.org} indicating what language(s) 6748you can work on. 6749 6750@node Trans Intro 1, Discussions, Trans Intro 0, Translators 6751@section Introduction 1 6752 6753@strong{ NOTE: } This documentation section is outdated and needs to be 6754revised. 6755 6756This is now official, GNU is going international! Here is the 6757announcement submitted for the January 1995 GNU Bulletin: 6758 6759@quotation 6760A handful of GNU packages have already been adapted and provided 6761with message translations for several languages. Translation 6762teams have begun to organize, using these packages as a starting 6763point. But there are many more packages and many languages 6764for which we have no volunteer translators. If you'd like to 6765volunteer to work at translating messages, please send mail to 6766@samp{coordinator@@translationproject.org} indicating what language(s) 6767you can work on. 6768@end quotation 6769 6770This document should answer many questions for those who are curious about 6771the process or would like to contribute. Please at least skim over it, 6772hoping to cut down a little of the high volume of e-mail generated by this 6773collective effort towards internationalization of free software. 6774 6775Most free programming which is widely shared is done in English, and 6776currently, English is used as the main communicating language between 6777national communities collaborating to free software. This very document 6778is written in English. This will not change in the foreseeable future. 6779 6780However, there is a strong appetite from national communities for 6781having more software able to write using national language and habits, 6782and there is an on-going effort to modify free software in such a way 6783that it becomes able to do so. The experiments driven so far raised 6784an enthusiastic response from pretesters, so we believe that 6785internationalization of free software is dedicated to succeed. 6786 6787For suggestion clarifications, additions or corrections to this 6788document, please e-mail to @file{coordinator@@translationproject.org}. 6789 6790@node Discussions, Organization, Trans Intro 1, Translators 6791@section Discussions 6792 6793@strong{ NOTE: } This documentation section is outdated and needs to be 6794revised. 6795 6796Facing this internationalization effort, a few users expressed their 6797concerns. Some of these doubts are presented and discussed, here. 6798 6799@itemize @bullet 6800@item Smaller groups 6801 6802Some languages are not spoken by a very large number of people, so people 6803speaking them sometimes consider that there may not be all that much 6804demand such versions of free software packages. Moreover, many people 6805being @emph{into computers}, in some countries, generally seem to prefer 6806English versions of their software. 6807 6808On the other end, people might enjoy their own language a lot, and be 6809very motivated at providing to themselves the pleasure of having their 6810beloved free software speaking their mother tongue. They do themselves 6811a personal favor, and do not pay that much attention to the number of 6812people benefiting of their work. 6813 6814@item Misinterpretation 6815 6816Other users are shy to push forward their own language, seeing in this 6817some kind of misplaced propaganda. Someone thought there must be some 6818users of the language over the networks pestering other people with it. 6819 6820But any spoken language is worth localization, because there are 6821people behind the language for whom the language is important and 6822dear to their hearts. 6823 6824@item Odd translations 6825 6826The biggest problem is to find the right translations so that 6827everybody can understand the messages. Translations are usually a 6828little odd. Some people get used to English, to the extent they may 6829find translations into their own language ``rather pushy, obnoxious 6830and sometimes even hilarious.'' As a French speaking man, I have 6831the experience of those instruction manuals for goods, so poorly 6832translated in French in Korea or Taiwan@dots{} 6833 6834The fact is that we sometimes have to create a kind of national 6835computer culture, and this is not easy without the collaboration of 6836many people liking their mother tongue. This is why translations are 6837better achieved by people knowing and loving their own language, and 6838ready to work together at improving the results they obtain. 6839 6840@item Dependencies over the GPL or LGPL 6841 6842Some people wonder if using GNU @code{gettext} necessarily brings their 6843package under the protective wing of the GNU General Public License or 6844the GNU Library General Public License, when they do not want to make 6845their program free, or want other kinds of freedom. The simplest 6846answer is ``normally not''. 6847 6848The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the 6849contents of @code{libintl}, is covered by the GNU Library General Public 6850License. The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the 6851rest of the GNU @code{gettext} package, is covered by the GNU General 6852Public License. 6853 6854The mere marking of localizable strings in a package, or conditional 6855inclusion of a few lines for initialization, is not really including 6856GPL'ed or LGPL'ed code. However, since the localization routines in 6857@code{libintl} are under the LGPL, the LGPL needs to be considered. 6858It gives the right to distribute the complete unmodified source of 6859@code{libintl} even with non-free programs. It also gives the right 6860to use @code{libintl} as a shared library, even for non-free programs. 6861But it gives the right to use @code{libintl} as a static library or 6862to incorporate @code{libintl} into another library only to free 6863software. 6864 6865@end itemize 6866 6867@node Organization, Information Flow, Discussions, Translators 6868@section Organization 6869 6870@strong{ NOTE: } This documentation section is outdated and needs to be 6871revised. 6872 6873On a larger scale, the true solution would be to organize some kind of 6874fairly precise set up in which volunteers could participate. I gave 6875some thought to this idea lately, and realize there will be some 6876touchy points. I thought of writing to Richard Stallman to launch 6877such a project, but feel it might be good to shake out the ideas 6878between ourselves first. Most probably that Linux International has 6879some experience in the field already, or would like to orchestrate 6880the volunteer work, maybe. Food for thought, in any case! 6881 6882I guess we have to setup something early, somehow, that will help 6883many possible contributors of the same language to interlock and avoid 6884work duplication, and further be put in contact for solving together 6885problems particular to their tongue (in most languages, there are many 6886difficulties peculiar to translating technical English). My Swedish 6887contributor acknowledged these difficulties, and I'm well aware of 6888them for French. 6889 6890This is surely not a technical issue, but we should manage so the 6891effort of locale contributors be maximally useful, despite the national 6892team layer interface between contributors and maintainers. 6893 6894The Translation Project needs some setup for coordinating language 6895coordinators. Localizing evolving programs will surely 6896become a permanent and continuous activity in the free software community, 6897once well started. 6898The setup should be minimally completed and tested before GNU 6899@code{gettext} becomes an official reality. The e-mail address 6900@file{coordinator@@translationproject.org} has been set up for receiving 6901offers from volunteers and general e-mail on these topics. This address 6902reaches the Translation Project coordinator. 6903 6904@menu 6905* Central Coordination:: Central Coordination 6906* National Teams:: National Teams 6907* Mailing Lists:: Mailing Lists 6908@end menu 6909 6910@node Central Coordination, National Teams, Organization, Organization 6911@subsection Central Coordination 6912 6913I also think GNU will need sooner than it thinks, that someone set up 6914a way to organize and coordinate these groups. Some kind of group 6915of groups. My opinion is that it would be good that GNU delegates 6916this task to a small group of collaborating volunteers, shortly. 6917Perhaps in @file{gnu.announce} a list of this national committee's 6918can be published. 6919 6920My role as coordinator would simply be to refer to Ulrich any German 6921speaking volunteer interested to localization of free software packages, and 6922maybe helping national groups to initially organize, while maintaining 6923national registries for until national groups are ready to take over. 6924In fact, the coordinator should ease volunteers to get in contact with 6925one another for creating national teams, which should then select 6926one coordinator per language, or country (regionalized language). 6927If well done, the coordination should be useful without being an 6928overwhelming task, the time to put delegations in place. 6929 6930@node National Teams, Mailing Lists, Central Coordination, Organization 6931@subsection National Teams 6932 6933I suggest we look for volunteer coordinators/editors for individual 6934languages. These people will scan contributions of translation files 6935for various programs, for their own languages, and will ensure high 6936and uniform standards of diction. 6937 6938From my current experience with other people in these days, those who 6939provide localizations are very enthusiastic about the process, and are 6940more interested in the localization process than in the program they 6941localize, and want to do many programs, not just one. This seems 6942to confirm that having a coordinator/editor for each language is a 6943good idea. 6944 6945We need to choose someone who is good at writing clear and concise 6946prose in the language in question. That is hard---we can't check 6947it ourselves. So we need to ask a few people to judge each others' 6948writing and select the one who is best. 6949 6950I announce my prerelease to a few dozen people, and you would not 6951believe all the discussions it generated already. I shudder to think 6952what will happen when this will be launched, for true, officially, 6953world wide. Who am I to arbitrate between two Czekolsovak users 6954contradicting each other, for example? 6955 6956I assume that your German is not much better than my French so that 6957I would not be able to judge about these formulations. What I would 6958suggest is that for each language there is a group for people who 6959maintain the PO files and judge about changes. I suspect there will 6960be cultural differences between how such groups of people will behave. 6961Some will have relaxed ways, reach consensus easily, and have anyone 6962of the group relate to the maintainers, while others will fight to 6963death, organize heavy administrations up to national standards, and 6964use strict channels. 6965 6966The German team is putting out a good example. Right now, they are 6967maybe half a dozen people revising translations of each other and 6968discussing the linguistic issues. I do not even have all the names. 6969Ulrich Drepper is taking care of coordinating the German team. 6970He subscribed to all my pretest lists, so I do not even have to warn 6971him specifically of incoming releases. 6972 6973I'm sure, that is a good idea to get teams for each language working 6974on translations. That will make the translations better and more 6975consistent. 6976 6977@menu 6978* Sub-Cultures:: Sub-Cultures 6979* Organizational Ideas:: Organizational Ideas 6980@end menu 6981 6982@node Sub-Cultures, Organizational Ideas, National Teams, National Teams 6983@subsubsection Sub-Cultures 6984 6985Taking French for example, there are a few sub-cultures around computers 6986which developed diverging vocabularies. Picking volunteers here and 6987there without addressing this problem in an organized way, soon in the 6988project, might produce a distasteful mix of internationalized programs, 6989and possibly trigger endless quarrels among those who really care. 6990 6991Keeping some kind of unity in the way French localization of 6992internationalized programs is achieved is a difficult (and delicate) job. 6993Knowing the latin character of French people (:-), if we take this 6994the wrong way, we could end up nowhere, or spoil a lot of energies. 6995Maybe we should begin to address this problem seriously @emph{before} 6996GNU @code{gettext} become officially published. And I suspect that this 6997means soon! 6998 6999@node Organizational Ideas, , Sub-Cultures, National Teams 7000@subsubsection Organizational Ideas 7001 7002I expect the next big changes after the official release. Please note 7003that I use the German translation of the short GPL message. We need 7004to set a few good examples before the localization goes out for true 7005in the free software community. Here are a few points to discuss: 7006 7007@itemize @bullet 7008@item 7009Each group should have one FTP server (at least one master). 7010 7011@item 7012The files on the server should reflect the latest version (of 7013course!) and it should also contain a RCS directory with the 7014corresponding archives (I don't have this now). 7015 7016@item 7017There should also be a ChangeLog file (this is more useful than the 7018RCS archive but can be generated automatically from the later by 7019Emacs). 7020 7021@item 7022A @dfn{core group} should judge about questionable changes (for now 7023this group consists solely by me but I ask some others occasionally; 7024this also seems to work). 7025 7026@end itemize 7027 7028@node Mailing Lists, , National Teams, Organization 7029@subsection Mailing Lists 7030 7031If we get any inquiries about GNU @code{gettext}, send them on to: 7032 7033@example 7034@file{coordinator@@translationproject.org} 7035@end example 7036 7037The @file{*-pretest} lists are quite useful to me, maybe the idea could 7038be generalized to many GNU, and non-GNU packages. But each maintainer 7039his/her way! 7040 7041Fran@,{c}ois, we have a mechanism in place here at 7042@file{gnu.ai.mit.edu} to track teams, support mailing lists for 7043them and log members. We have a slight preference that you use it. 7044If this is OK with you, I can get you clued in. 7045 7046Things are changing! A few years ago, when Daniel Fekete and I 7047asked for a mailing list for GNU localization, nested at the FSF, we 7048were politely invited to organize it anywhere else, and so did we. 7049For communicating with my pretesters, I later made a handful of 7050mailing lists located at iro.umontreal.ca and administrated by 7051@code{majordomo}. These lists have been @emph{very} dependable 7052so far@dots{} 7053 7054I suspect that the German team will organize itself a mailing list 7055located in Germany, and so forth for other countries. But before they 7056organize for true, it could surely be useful to offer mailing lists 7057located at the FSF to each national team. So yes, please explain me 7058how I should proceed to create and handle them. 7059 7060We should create temporary mailing lists, one per country, to help 7061people organize. Temporary, because once regrouped and structured, it 7062would be fair the volunteers from country bring back @emph{their} list 7063in there and manage it as they want. My feeling is that, in the long 7064run, each team should run its own list, from within their country. 7065There also should be some central list to which all teams could 7066subscribe as they see fit, as long as each team is represented in it. 7067 7068@node Information Flow, Prioritizing messages, Organization, Translators 7069@section Information Flow 7070 7071@strong{ NOTE: } This documentation section is outdated and needs to be 7072revised. 7073 7074There will surely be some discussion about this messages after the 7075packages are finally released. If people now send you some proposals 7076for better messages, how do you proceed? Jim, please note that 7077right now, as I put forward nearly a dozen of localizable programs, I 7078receive both the translations and the coordination concerns about them. 7079 7080If I put one of my things to pretest, Ulrich receives the announcement 7081and passes it on to the German team, who make last minute revisions. 7082Then he submits the translation files to me @emph{as the maintainer}. 7083For free packages I do not maintain, I would not even hear about it. 7084This scheme could be made to work for the whole Translation Project, 7085I think. For security reasons, maybe Ulrich (national coordinators, 7086in fact) should update central registry kept at the Translation Project 7087(Jim, me, or Len's recruits) once in a while. 7088 7089In December/January, I was aggressively ready to internationalize 7090all of GNU, giving myself the duty of one small GNU package per week 7091or so, taking many weeks or months for bigger packages. But it does 7092not work this way. I first did all the things I'm responsible for. 7093I've nothing against some missionary work on other maintainers, but 7094I'm also loosing a lot of energy over it---same debates over again. 7095 7096And when the first localized packages are released we'll get a lot of 7097responses about ugly translations :-). Surely, and we need to have 7098beforehand a fairly good idea about how to handle the information 7099flow between the national teams and the package maintainers. 7100 7101Please start saving somewhere a quick history of each PO file. I know 7102for sure that the file format will change, allowing for comments. 7103It would be nice that each file has a kind of log, and references for 7104those who want to submit comments or gripes, or otherwise contribute. 7105I sent a proposal for a fast and flexible format, but it is not 7106receiving acceptance yet by the GNU deciders. I'll tell you when I 7107have more information about this. 7108 7109@node Prioritizing messages, , Information Flow, Translators 7110@section Prioritizing messages: How to determine which messages to translate first 7111 7112A translator sometimes has only a limited amount of time per week to 7113spend on a package, and some packages have quite large message catalogs 7114(over 1000 messages). Therefore she wishes to translate the messages 7115first that are the most visible to the user, or that occur most frequently. 7116This section describes how to determine these "most urgent" messages. 7117It also applies to determine the "next most urgent" messages after the 7118message catalog has already been partially translated. 7119 7120In a first step, she uses the programs like a user would do. While she 7121does this, the GNU @code{gettext} library logs into a file the not yet 7122translated messages for which a translation was requested from the program. 7123 7124In a second step, she uses the PO mode to translate precisely this set 7125of messages. 7126 7127@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable} 7128Here a more details. The GNU @code{libintl} library (but not the 7129corresponding functions in GNU @code{libc}) supports an environment variable 7130@code{GETTEXT_LOG_UNTRANSLATED}. The GNU @code{libintl} library will 7131log into this file the messages for which @code{gettext()} and related 7132functions couldn't find the translation. If the file doesn't exist, it 7133will be created as needed. On systems with GNU @code{libc} a shared library 7134@samp{preloadable_libintl.so} is provided that can be used with the ELF 7135@samp{LD_PRELOAD} mechanism. 7136 7137So, in the first step, the translator uses these commands on systems with 7138GNU @code{libc}: 7139 7140@smallexample 7141$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so 7142$ export LD_PRELOAD 7143$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused 7144$ export GETTEXT_LOG_UNTRANSLATED 7145@end smallexample 7146 7147@noindent 7148and these commands on other systems: 7149 7150@smallexample 7151$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused 7152$ export GETTEXT_LOG_UNTRANSLATED 7153@end smallexample 7154 7155Then she uses and peruses the programs. (It is a good and recommended 7156practice to use the programs for which you provide translations: it 7157gives you the needed context.) When done, she removes the environment 7158variables: 7159 7160@smallexample 7161$ unset LD_PRELOAD 7162$ unset GETTEXT_LOG_UNTRANSLATED 7163@end smallexample 7164 7165The second step starts with removing duplicates: 7166 7167@smallexample 7168$ msguniq $HOME/gettextlogused > missing.po 7169@end smallexample 7170 7171The result is a PO file, but needs some preprocessing before a PO file editor 7172can be used with it. First, it is a multi-domain PO file, containing 7173messages from many translation domains. Second, it lacks all translator 7174comments and source references. Here is how to get a list of the affected 7175translation domains: 7176 7177@smallexample 7178$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq 7179@end smallexample 7180 7181Then the translator can handle the domains one by one. For simplicity, 7182let's use environment variables to denote the language, domain and source 7183package. 7184 7185@smallexample 7186$ lang=nl # your language 7187$ domain=coreutils # the name of the domain to be handled 7188$ package=/usr/src/gnu/coreutils-4.5.4 # the package where it comes from 7189@end smallexample 7190 7191She takes the latest copy of @file{$lang.po} from the Translation Project, 7192or from the package (in most cases, @file{$package/po/$lang.po}), or 7193creates a fresh one if she's the first translator (see @ref{Creating}). 7194She then uses the following commands to mark the not urgent messages as 7195"obsolete". (This doesn't mean that these messages - translated and 7196untranslated ones - will go away. It simply means that the PO file editor 7197will ignore them in the following editing session.) 7198 7199@smallexample 7200$ msggrep --domain=$domain missing.po | grep -v '^domain' \ 7201 > $domain-missing.po 7202$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \ 7203 > $domain.$lang-urgent.po 7204@end smallexample 7205 7206The she translates @file{$domain.$lang-urgent.po} by use of a PO file editor 7207(@pxref{Editing}). 7208(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also 7209preserve obsolete messages, as they should.) 7210Finally she restores the not urgent messages (with their earlier 7211translations, for those which were already translated) through this command: 7212 7213@smallexample 7214$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \ 7215 > $domain.$lang.po 7216@end smallexample 7217 7218Then she can submit @file{$domain.$lang.po} and proceed to the next domain. 7219 7220@node Maintainers, Installers, Translators, Top 7221@chapter The Maintainer's View 7222@cindex package maintainer's view of @code{gettext} 7223 7224The maintainer of a package has many responsibilities. One of them 7225is ensuring that the package will install easily on many platforms, 7226and that the magic we described earlier (@pxref{Users}) will work 7227for installers and end users. 7228 7229Of course, there are many possible ways by which GNU @code{gettext} 7230might be integrated in a distribution, and this chapter does not cover 7231them in all generality. Instead, it details one possible approach which 7232is especially adequate for many free software distributions following GNU 7233standards, or even better, Gnits standards, because GNU @code{gettext} 7234is purposely for helping the internationalization of the whole GNU 7235project, and as many other good free packages as possible. So, the 7236maintainer's view presented here presumes that the package already has 7237a @file{configure.ac} file and uses GNU Autoconf. 7238 7239Nevertheless, GNU @code{gettext} may surely be useful for free packages 7240not following GNU standards and conventions, but the maintainers of such 7241packages might have to show imagination and initiative in organizing 7242their distributions so @code{gettext} work for them in all situations. 7243There are surely many, out there. 7244 7245Even if @code{gettext} methods are now stabilizing, slight adjustments 7246might be needed between successive @code{gettext} versions, so you 7247should ideally revise this chapter in subsequent releases, looking 7248for changes. 7249 7250@menu 7251* Flat and Non-Flat:: Flat or Non-Flat Directory Structures 7252* Prerequisites:: Prerequisite Works 7253* gettextize Invocation:: Invoking the @code{gettextize} Program 7254* Adjusting Files:: Files You Must Create or Alter 7255* autoconf macros:: Autoconf macros for use in @file{configure.ac} 7256* CVS Issues:: Integrating with CVS 7257* Release Management:: Creating a Distribution Tarball 7258@end menu 7259 7260@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers 7261@section Flat or Non-Flat Directory Structures 7262 7263Some free software packages are distributed as @code{tar} files which unpack 7264in a single directory, these are said to be @dfn{flat} distributions. 7265Other free software packages have a one level hierarchy of subdirectories, using 7266for example a subdirectory named @file{doc/} for the Texinfo manual and 7267man pages, another called @file{lib/} for holding functions meant to 7268replace or complement C libraries, and a subdirectory @file{src/} for 7269holding the proper sources for the package. These other distributions 7270are said to be @dfn{non-flat}. 7271 7272We cannot say much about flat distributions. A flat 7273directory structure has the disadvantage of increasing the difficulty 7274of updating to a new version of GNU @code{gettext}. Also, if you have 7275many PO files, this could somewhat pollute your single directory. 7276Also, GNU @code{gettext}'s libintl sources consist of C sources, shell 7277scripts, @code{sed} scripts and complicated Makefile rules, which don't 7278fit well into an existing flat structure. For these reasons, we 7279recommend to use non-flat approach in this case as well. 7280 7281Maybe because GNU @code{gettext} itself has a non-flat structure, 7282we have more experience with this approach, and this is what will be 7283described in the remaining of this chapter. Some maintainers might 7284use this as an opportunity to unflatten their package structure. 7285 7286@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers 7287@section Prerequisite Works 7288@cindex converting a package to use @code{gettext} 7289@cindex migration from earlier versions of @code{gettext} 7290@cindex upgrading to new versions of @code{gettext} 7291 7292There are some works which are required for using GNU @code{gettext} 7293in one of your package. These works have some kind of generality 7294that escape the point by point descriptions used in the remainder 7295of this chapter. So, we describe them here. 7296 7297@itemize @bullet 7298@item 7299Before attempting to use @code{gettextize} you should install some 7300other packages first. 7301Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU 7302@code{gettext} are already installed at your site, and if not, proceed 7303to do this first. If you get to install these things, beware that 7304GNU @code{m4} must be fully installed before GNU Autoconf is even 7305@emph{configured}. 7306 7307To further ease the task of a package maintainer the @code{automake} 7308package was designed and implemented. GNU @code{gettext} now uses this 7309tool and the @file{Makefile}s in the @file{intl/} and @file{po/} 7310therefore know about all the goals necessary for using @code{automake} 7311and @file{libintl} in one project. 7312 7313Those four packages are only needed by you, as a maintainer; the 7314installers of your own package and end users do not really need any of 7315GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake} 7316for successfully installing and running your package, with messages 7317properly translated. But this is not completely true if you provide 7318internationalized shell scripts within your own package: GNU 7319@code{gettext} shall then be installed at the user site if the end users 7320want to see the translation of shell script messages. 7321 7322@item 7323Your package should use Autoconf and have a @file{configure.ac} or 7324@file{configure.in} file. 7325If it does not, you have to learn how. The Autoconf documentation 7326is quite well written, it is a good idea that you print it and get 7327familiar with it. 7328 7329@item 7330Your C sources should have already been modified according to 7331instructions given earlier in this manual. @xref{Sources}. 7332 7333@item 7334Your @file{po/} directory should receive all PO files submitted to you 7335by the translator teams, each having @file{@var{ll}.po} as a name. 7336This is not usually easy to get translation 7337work done before your package gets internationalized and available! 7338Since the cycle has to start somewhere, the easiest for the maintainer 7339is to start with absolutely no PO files, and wait until various 7340translator teams get interested in your package, and submit PO files. 7341 7342@end itemize 7343 7344It is worth adding here a few words about how the maintainer should 7345ideally behave with PO files submissions. As a maintainer, your role is 7346to authenticate the origin of the submission as being the representative 7347of the appropriate translating teams of the Translation Project (forward 7348the submission to @file{coordinator@@translationproject.org} in case of doubt), 7349to ensure that the PO file format is not severely broken and does not 7350prevent successful installation, and for the rest, to merely put these 7351PO files in @file{po/} for distribution. 7352 7353As a maintainer, you do not have to take on your shoulders the 7354responsibility of checking if the translations are adequate or 7355complete, and should avoid diving into linguistic matters. Translation 7356teams drive themselves and are fully responsible of their linguistic 7357choices for the Translation Project. Keep in mind that translator teams are @emph{not} 7358driven by maintainers. You can help by carefully redirecting all 7359communications and reports from users about linguistic matters to the 7360appropriate translation team, or explain users how to reach or join 7361their team. The simplest might be to send them the @file{ABOUT-NLS} file. 7362 7363Maintainers should @emph{never ever} apply PO file bug reports 7364themselves, short-cutting translation teams. If some translator has 7365difficulty to get some of her points through her team, it should not be 7366an option for her to directly negotiate translations with maintainers. 7367Teams ought to settle their problems themselves, if any. If you, as 7368a maintainer, ever think there is a real problem with a team, please 7369never try to @emph{solve} a team's problem on your own. 7370 7371@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers 7372@section Invoking the @code{gettextize} Program 7373 7374@include gettextize.texi 7375 7376@node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers 7377@section Files You Must Create or Alter 7378@cindex @code{gettext} files 7379 7380Besides files which are automatically added through @code{gettextize}, 7381there are many files needing revision for properly interacting with 7382GNU @code{gettext}. If you are closely following GNU standards for 7383Makefile engineering and auto-configuration, the adaptations should 7384be easier to achieve. Here is a point by point description of the 7385changes needed in each. 7386 7387So, here comes a list of files, each one followed by a description of 7388all alterations it needs. Many examples are taken out from the GNU 7389@code{gettext} @value{VERSION} distribution itself, or from the GNU 7390@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello} 7391or @uref{http://www.gnu.franken.de/ke/hello/}) You may indeed 7392refer to the source code of the GNU @code{gettext} and GNU @code{hello} 7393packages, as they are intended to be good examples for using GNU 7394gettext functionality. 7395 7396@menu 7397* po/POTFILES.in:: @file{POTFILES.in} in @file{po/} 7398* po/LINGUAS:: @file{LINGUAS} in @file{po/} 7399* po/Makevars:: @file{Makevars} in @file{po/} 7400* po/Rules-*:: Extending @file{Makefile} in @file{po/} 7401* configure.ac:: @file{configure.ac} at top level 7402* config.guess:: @file{config.guess}, @file{config.sub} at top level 7403* mkinstalldirs:: @file{mkinstalldirs} at top level 7404* aclocal:: @file{aclocal.m4} at top level 7405* acconfig:: @file{acconfig.h} at top level 7406* config.h.in:: @file{config.h.in} at top level 7407* Makefile:: @file{Makefile.in} at top level 7408* src/Makefile:: @file{Makefile.in} in @file{src/} 7409* lib/gettext.h:: @file{gettext.h} in @file{lib/} 7410@end menu 7411 7412@node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files 7413@subsection @file{POTFILES.in} in @file{po/} 7414@cindex @file{POTFILES.in} file 7415 7416The @file{po/} directory should receive a file named 7417@file{POTFILES.in}. This file tells which files, among all program 7418sources, have marked strings needing translation. Here is an example 7419of such a file: 7420 7421@example 7422@group 7423# List of source files containing translatable strings. 7424# Copyright (C) 1995 Free Software Foundation, Inc. 7425 7426# Common library files 7427lib/error.c 7428lib/getopt.c 7429lib/xmalloc.c 7430 7431# Package source files 7432src/gettext.c 7433src/msgfmt.c 7434src/xgettext.c 7435@end group 7436@end example 7437 7438@noindent 7439Hash-marked comments and white lines are ignored. All other lines 7440list those source files containing strings marked for translation 7441(@pxref{Mark Keywords}), in a notation relative to the top level 7442of your whole distribution, rather than the location of the 7443@file{POTFILES.in} file itself. 7444 7445When a C file is automatically generated by a tool, like @code{flex} or 7446@code{bison}, that doesn't introduce translatable strings by itself, 7447it is recommended to list in @file{po/POTFILES.in} the real source file 7448(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the 7449case of @code{bison}), not the generated C file. 7450 7451@node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files 7452@subsection @file{LINGUAS} in @file{po/} 7453@cindex @file{LINGUAS} file 7454 7455The @file{po/} directory should also receive a file named 7456@file{LINGUAS}. This file contains the list of available translations. 7457It is a whitespace separated list. Hash-marked comments and white lines 7458are ignored. Here is an example file: 7459 7460@example 7461@group 7462# Set of available languages. 7463de fr 7464@end group 7465@end example 7466 7467@noindent 7468This example means that German and French PO files are available, so 7469that these languages are currently supported by your package. If you 7470want to further restrict, at installation time, the set of installed 7471languages, this should not be done by modifying the @file{LINGUAS} file, 7472but rather by using the @code{LINGUAS} environment variable 7473(@pxref{Installers}). 7474 7475It is recommended that you add the "languages" @samp{en@@quot} and 7476@samp{en@@boldquot} to the @code{LINGUAS} file. @code{en@@quot} is a 7477variant of English message catalogs (@code{en}) which uses real quotation 7478marks instead of the ugly looking asymmetric ASCII substitutes @samp{`} 7479and @samp{'}. @code{en@@boldquot} is a variant of @code{en@@quot} that 7480additionally outputs quoted pieces of text in a bold font, when used in 7481a terminal emulator which supports the VT100 escape sequences (such as 7482@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode). 7483 7484These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot} 7485are constructed automatically, not by translators; to support them, you 7486need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed}, 7487@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin} 7488in the @file{po/} directory. You can copy them from GNU gettext's @file{po/} 7489directory; they are also installed by running @code{gettextize}. 7490 7491@node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files 7492@subsection @file{Makevars} in @file{po/} 7493@cindex @file{Makevars} file 7494 7495The @file{po/} directory also has a file named @file{Makevars}. It 7496contains variables that are specific to your project. @file{po/Makevars} 7497gets inserted into the @file{po/Makefile} when the latter is created. 7498The variables thus take effect when the POT file is created or updated, 7499and when the message catalogs get installed. 7500 7501The first three variables can be left unmodified if your package has a 7502single message domain and, accordingly, a single @file{po/} directory. 7503Only packages which have multiple @file{po/} directories at different 7504locations need to adjust the three first variables defined in 7505@file{Makevars}. 7506 7507As an alternative to the @code{XGETTEXT_OPTIONS} variables, it is also 7508possible to specify @code{xgettext} options through the 7509@code{AM_XGETTEXT_OPTION} autoconf macro. See @ref{AM_XGETTEXT_OPTION}. 7510 7511@node po/Rules-*, configure.ac, po/Makevars, Adjusting Files 7512@subsection Extending @file{Makefile} in @file{po/} 7513@cindex @file{Makefile.in.in} extensions 7514 7515All files called @file{Rules-*} in the @file{po/} directory get appended to 7516the @file{po/Makefile} when it is created. They present an opportunity to 7517add rules for special PO files to the Makefile, without needing to mess 7518with @file{po/Makefile.in.in}. 7519 7520@cindex quotation marks 7521@vindex LANGUAGE@r{, environment variable} 7522GNU gettext comes with a @file{Rules-quot} file, containing rules for 7523building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}. The 7524effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE} 7525environment variable to @samp{en@@quot} will get messages with proper 7526looking symmetric Unicode quotation marks instead of abusing the ASCII 7527grave accent and the ASCII apostrophe for indicating quotations. To 7528enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS} 7529file. The effect of @file{en@@boldquot.po} is that people who set 7530@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation 7531marks, but also the quoted text will be shown in a bold font on terminals 7532and consoles. This catalog is useful only for command-line programs, not 7533GUI programs. To enable it, similarly add @code{en@@boldquot} to the 7534@file{po/LINGUAS} file. 7535 7536Similarly, you can create rules for building message catalogs for the 7537@file{sr@@latin} locale -- Serbian written with the Latin alphabet -- 7538from those for the @file{sr} locale -- Serbian written with Cyrillic 7539letters. See @ref{msgfilter Invocation}. 7540 7541@node configure.ac, config.guess, po/Rules-*, Adjusting Files 7542@subsection @file{configure.ac} at top level 7543 7544@file{configure.ac} or @file{configure.in} - this is the source from which 7545@code{autoconf} generates the @file{configure} script. 7546 7547@enumerate 7548@item Declare the package and version. 7549@cindex package and version declaration in @file{configure.ac} 7550 7551This is done by a set of lines like these: 7552 7553@example 7554PACKAGE=gettext 7555VERSION=@value{VERSION} 7556AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE") 7557AC_DEFINE_UNQUOTED(VERSION, "$VERSION") 7558AC_SUBST(PACKAGE) 7559AC_SUBST(VERSION) 7560@end example 7561 7562@noindent 7563or, if you are using GNU @code{automake}, by a line like this: 7564 7565@example 7566AM_INIT_AUTOMAKE(gettext, @value{VERSION}) 7567@end example 7568 7569@noindent 7570Of course, you replace @samp{gettext} with the name of your package, 7571and @samp{@value{VERSION}} by its version numbers, exactly as they 7572should appear in the packaged @code{tar} file name of your distribution 7573(@file{gettext-@value{VERSION}.tar.gz}, here). 7574 7575@item Check for internationalization support. 7576 7577Here is the main @code{m4} macro for triggering internationalization 7578support. Just add this line to @file{configure.ac}: 7579 7580@example 7581AM_GNU_GETTEXT 7582@end example 7583 7584@noindent 7585This call is purposely simple, even if it generates a lot of configure 7586time checking and actions. 7587 7588If you have suppressed the @file{intl/} subdirectory by calling 7589@code{gettextize} without @samp{--intl} option, this call should read 7590 7591@example 7592AM_GNU_GETTEXT([external]) 7593@end example 7594 7595@item Have output files created. 7596 7597The @code{AC_OUTPUT} directive, at the end of your @file{configure.ac} 7598file, needs to be modified in two ways: 7599 7600@example 7601AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in], 7602[@var{existing additional actions}]) 7603@end example 7604 7605The modification to the first argument to @code{AC_OUTPUT} asks 7606for substitution in the @file{intl/} and @file{po/} directories. 7607Note the @samp{.in} suffix used for @file{po/} only. This is because 7608the distributed file is really @file{po/Makefile.in.in}. 7609 7610If you have suppressed the @file{intl/} subdirectory by calling 7611@code{gettextize} without @samp{--intl} option, then you don't need to 7612add @code{intl/Makefile} to the @code{AC_OUTPUT} line. 7613 7614@end enumerate 7615 7616If, after doing the recommended modifications, a command like 7617@samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with 7618a trace similar to this: 7619 7620@smallexample 7621configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE 7622../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from... 7623m4/lock.m4:224: gl_LOCK is expanded from... 7624m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from... 7625m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from... 7626m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from... 7627configure.ac:44: the top level 7628configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE 7629@end smallexample 7630 7631@noindent 7632you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the 7633@file{configure.ac} file - after @samp{AC_PROG_CC} but before 7634@samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC} 7635invocation. This is necessary because of ordering restrictions imposed 7636by GNU autoconf. 7637 7638@node config.guess, mkinstalldirs, configure.ac, Adjusting Files 7639@subsection @file{config.guess}, @file{config.sub} at top level 7640 7641If you haven't suppressed the @file{intl/} subdirectory, 7642you need to add the GNU @file{config.guess} and @file{config.sub} files 7643to your distribution. They are needed because the @file{intl/} directory 7644has platform dependent support for determining the locale's character 7645encoding and therefore needs to identify the platform. 7646 7647You can obtain the newest version of @file{config.guess} and 7648@file{config.sub} from the CVS of the @samp{config} project at 7649@file{http://savannah.gnu.org/}. The commands to fetch them are 7650@smallexample 7651$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess' 7652$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub' 7653@end smallexample 7654@noindent 7655Less recent versions are also contained in the GNU @code{automake} and 7656GNU @code{libtool} packages. 7657 7658Normally, @file{config.guess} and @file{config.sub} are put at the 7659top level of a distribution. But it is also possible to put them in a 7660subdirectory, altogether with other configuration support files like 7661@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}. 7662All you need to do, other than moving the files, is to add the following line 7663to your @file{configure.ac}. 7664 7665@example 7666AC_CONFIG_AUX_DIR([@var{subdir}]) 7667@end example 7668 7669@node mkinstalldirs, aclocal, config.guess, Adjusting Files 7670@subsection @file{mkinstalldirs} at top level 7671@cindex @file{mkinstalldirs} file 7672 7673With earlier versions of GNU gettext, you needed to add the GNU 7674@file{mkinstalldirs} script to your distribution. This is not needed any 7675more. You can remove it if you not also using an automake version older than 7676automake 1.9. 7677 7678@node aclocal, acconfig, mkinstalldirs, Adjusting Files 7679@subsection @file{aclocal.m4} at top level 7680@cindex @file{aclocal.m4} file 7681 7682If you do not have an @file{aclocal.m4} file in your distribution, 7683the simplest is to concatenate the files @file{codeset.m4}, 7684@file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4}, 7685@file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intlmacosx.m4}, 7686@file{intmax.m4}, @file{inttypes_h.m4}, @file{inttypes-pri.m4}, 7687@file{lcmessage.m4}, @file{lib-ld.m4}, @file{lib-link.m4}, 7688@file{lib-prefix.m4}, @file{lock.m4}, @file{longlong.m4}, @file{nls.m4}, 7689@file{po.m4}, @file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4}, 7690@file{stdint_h.m4}, @file{uintmax_t.m4}, @file{visibility.m4}, 7691@file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4} 7692from GNU @code{gettext}'s 7693@file{m4/} directory into a single file. If you have suppressed the 7694@file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4}, 7695@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, 7696@file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated. 7697 7698If you are not using GNU @code{automake} 1.8 or newer, you will need to 7699add a file @file{mkdirp.m4} from a newer automake distribution to the 7700list of files above. 7701 7702If you already have an @file{aclocal.m4} file, then you will have 7703to merge the said macro files into your @file{aclocal.m4}. Note that if 7704you are upgrading from a previous release of GNU @code{gettext}, you 7705should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT}, 7706etc.), as they usually 7707change a little from one release of GNU @code{gettext} to the next. 7708Their contents may vary as we get more experience with strange systems 7709out there. 7710 7711If you are using GNU @code{automake} 1.5 or newer, it is enough to put 7712these macro files into a subdirectory named @file{m4/} and add the line 7713 7714@example 7715ACLOCAL_AMFLAGS = -I m4 7716@end example 7717 7718@noindent 7719to your top level @file{Makefile.am}. 7720 7721These macros check for the internationalization support functions 7722and related informations. Hopefully, once stabilized, these macros 7723might be integrated in the standard Autoconf set, because this 7724piece of @code{m4} code will be the same for all projects using GNU 7725@code{gettext}. 7726 7727@node acconfig, config.h.in, aclocal, Adjusting Files 7728@subsection @file{acconfig.h} at top level 7729@cindex @file{acconfig.h} file 7730 7731Earlier GNU @code{gettext} releases required to put definitions for 7732@code{ENABLE_NLS}, @code{HAVE_GETTEXT} and @code{HAVE_LC_MESSAGES}, 7733@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an 7734@file{acconfig.h} file. This is not needed any more; you can remove 7735them from your @file{acconfig.h} file unless your package uses them 7736independently from the @file{intl/} directory. 7737 7738@node config.h.in, Makefile, acconfig, Adjusting Files 7739@subsection @file{config.h.in} at top level 7740@cindex @file{config.h.in} file 7741 7742The include file template that holds the C macros to be defined by 7743@code{configure} is usually called @file{config.h.in} and may be 7744maintained either manually or automatically. 7745 7746If @code{gettextize} has created an @file{intl/} directory, this file 7747must be called @file{config.h.in} and must be at the top level. If, 7748however, you have suppressed the @file{intl/} directory by calling 7749@code{gettextize} without @samp{--intl} option, then you can choose the 7750name of this file and its location freely. 7751 7752If it is maintained automatically, by use of the @samp{autoheader} 7753program, you need to do nothing about it. This is the case in particular 7754if you are using GNU @code{automake}. 7755 7756If it is maintained manually, and if @code{gettextize} has created an 7757@file{intl/} directory, you should switch to using @samp{autoheader}. 7758The list of C macros to be added for the sake of the @file{intl/} 7759directory is just too long to be maintained manually; it also changes 7760between different versions of GNU @code{gettext}. 7761 7762If it is maintained manually, and if on the other hand you have 7763suppressed the @file{intl/} directory by calling @code{gettextize} 7764without @samp{--intl} option, then you can get away by adding the 7765following lines to @file{config.h.in}: 7766 7767@example 7768/* Define to 1 if translation of program messages to the user's 7769 native language is requested. */ 7770#undef ENABLE_NLS 7771@end example 7772 7773@node Makefile, src/Makefile, config.h.in, Adjusting Files 7774@subsection @file{Makefile.in} at top level 7775 7776Here are a few modifications you need to make to your main, top-level 7777@file{Makefile.in} file. 7778 7779@enumerate 7780@item 7781Add the following lines near the beginning of your @file{Makefile.in}, 7782so the @samp{dist:} goal will work properly (as explained further down): 7783 7784@example 7785PACKAGE = @@PACKAGE@@ 7786VERSION = @@VERSION@@ 7787@end example 7788 7789@item 7790Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets 7791distributed. 7792 7793@item 7794Wherever you process subdirectories in your @file{Makefile.in}, be sure 7795you also process the subdirectories @samp{intl} and @samp{po}. Special 7796rules in the @file{Makefiles} take care for the case where no 7797internationalization is wanted. 7798 7799If you are using Makefiles, either generated by automake, or hand-written 7800so they carefully follow the GNU coding standards, the effected goals for 7801which the new subdirectories must be handled include @samp{installdirs}, 7802@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}. 7803 7804Here is an example of a canonical order of processing. In this 7805example, we also define @code{SUBDIRS} in @code{Makefile.in} for it 7806to be further used in the @samp{dist:} goal. 7807 7808@example 7809SUBDIRS = doc intl lib src po 7810@end example 7811 7812Note that you must arrange for @samp{make} to descend into the 7813@code{intl} directory before descending into other directories containing 7814code which make use of the @code{libintl.h} header file. For this 7815reason, here we mention @code{intl} before @code{lib} and @code{src}. 7816 7817@item 7818A delicate point is the @samp{dist:} goal, as both 7819@file{intl/Makefile} and @file{po/Makefile} will later assume that the 7820proper directory has been set up from the main @file{Makefile}. Here is 7821an example at what the @samp{dist:} goal might look like: 7822 7823@example 7824distdir = $(PACKAGE)-$(VERSION) 7825dist: Makefile 7826 rm -fr $(distdir) 7827 mkdir $(distdir) 7828 chmod 777 $(distdir) 7829 for file in $(DISTFILES); do \ 7830 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ 7831 done 7832 for subdir in $(SUBDIRS); do \ 7833 mkdir $(distdir)/$$subdir || exit 1; \ 7834 chmod 777 $(distdir)/$$subdir; \ 7835 (cd $$subdir && $(MAKE) $@@) || exit 1; \ 7836 done 7837 tar chozf $(distdir).tar.gz $(distdir) 7838 rm -fr $(distdir) 7839@end example 7840 7841@end enumerate 7842 7843Note that if you are using GNU @code{automake}, @file{Makefile.in} is 7844automatically generated from @file{Makefile.am}, and all needed changes 7845to @file{Makefile.am} are already made by running @samp{gettextize}. 7846 7847@node src/Makefile, lib/gettext.h, Makefile, Adjusting Files 7848@subsection @file{Makefile.in} in @file{src/} 7849 7850Some of the modifications made in the main @file{Makefile.in} will 7851also be needed in the @file{Makefile.in} from your package sources, 7852which we assume here to be in the @file{src/} subdirectory. Here are 7853all the modifications needed in @file{src/Makefile.in}: 7854 7855@enumerate 7856@item 7857In view of the @samp{dist:} goal, you should have these lines near the 7858beginning of @file{src/Makefile.in}: 7859 7860@example 7861PACKAGE = @@PACKAGE@@ 7862VERSION = @@VERSION@@ 7863@end example 7864 7865@item 7866If not done already, you should guarantee that @code{top_srcdir} 7867gets defined. This will serve for @code{cpp} include files. Just add 7868the line: 7869 7870@example 7871top_srcdir = @@top_srcdir@@ 7872@end example 7873 7874@item 7875You might also want to define @code{subdir} as @samp{src}, later 7876allowing for almost uniform @samp{dist:} goals in all your 7877@file{Makefile.in}. At list, the @samp{dist:} goal below assume that 7878you used: 7879 7880@example 7881subdir = src 7882@end example 7883 7884@item 7885The @code{main} function of your program will normally call 7886@code{bindtextdomain} (see @pxref{Triggering}), like this: 7887 7888@example 7889bindtextdomain (@var{PACKAGE}, LOCALEDIR); 7890textdomain (@var{PACKAGE}); 7891@end example 7892 7893To make LOCALEDIR known to the program, add the following lines to 7894@file{Makefile.in}: 7895 7896@example 7897datadir = @@datadir@@ 7898localedir = $(datadir)/locale 7899DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@ 7900@end example 7901 7902Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus 7903@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}. 7904 7905@item 7906You should ensure that the final linking will use @code{@@LIBINTL@@} or 7907@code{@@LTLIBINTL@@} as a library. @code{@@LIBINTL@@} is for use without 7908@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}. An 7909easy way to achieve this is to manage that it gets into @code{LIBS}, like 7910this: 7911 7912@example 7913LIBS = @@LIBINTL@@ @@LIBS@@ 7914@end example 7915 7916In most packages internationalized with GNU @code{gettext}, one will 7917find a directory @file{lib/} in which a library containing some helper 7918functions will be build. (You need at least the few functions which the 7919GNU @code{gettext} Library itself needs.) However some of the functions 7920in the @file{lib/} also give messages to the user which of course should be 7921translated, too. Taking care of this, the support library (say 7922@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and 7923@code{@@LIBS@@} in the above example. So one has to write this: 7924 7925@example 7926LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@ 7927@end example 7928 7929@item 7930You should also ensure that directory @file{intl/} will be searched for 7931C preprocessor include files in all circumstances. So, you have to 7932manage so both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} will 7933be given to the C compiler. 7934 7935@item 7936Your @samp{dist:} goal has to conform with others. Here is a 7937reasonable definition for it: 7938 7939@example 7940distdir = ../$(PACKAGE)-$(VERSION)/$(subdir) 7941dist: Makefile $(DISTFILES) 7942 for file in $(DISTFILES); do \ 7943 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \ 7944 done 7945@end example 7946 7947@end enumerate 7948 7949Note that if you are using GNU @code{automake}, @file{Makefile.in} is 7950automatically generated from @file{Makefile.am}, and the first three 7951changes and the last change are not necessary. The remaining needed 7952@file{Makefile.am} modifications are the following: 7953 7954@enumerate 7955@item 7956To make LOCALEDIR known to the program, add the following to 7957@file{Makefile.am}: 7958 7959@example 7960<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\" 7961@end example 7962 7963@noindent 7964for each specific module or compilation unit, or 7965 7966@example 7967AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\" 7968@end example 7969 7970for all modules and compilation units together. Furthermore, add this 7971line to define @samp{localedir}: 7972 7973@example 7974localedir = $(datadir)/locale 7975@end example 7976 7977@item 7978To ensure that the final linking will use @code{@@LIBINTL@@} or 7979@code{@@LTLIBINTL@@} as a library, add the following to 7980@file{Makefile.am}: 7981 7982@example 7983<program>_LDADD = @@LIBINTL@@ 7984@end example 7985 7986@noindent 7987for each specific program, or 7988 7989@example 7990LDADD = @@LIBINTL@@ 7991@end example 7992 7993for all programs together. Remember that when you use @code{libtool} 7994to link a program, you need to use @@LTLIBINTL@@ instead of @@LIBINTL@@ 7995for that program. 7996 7997@item 7998If you have an @file{intl/} directory, whose contents is created by 7999@code{gettextize}, then to ensure that it will be searched for 8000C preprocessor include files in all circumstances, add something like 8001this to @file{Makefile.am}: 8002 8003@example 8004AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl 8005@end example 8006 8007@end enumerate 8008 8009@node lib/gettext.h, , src/Makefile, Adjusting Files 8010@subsection @file{gettext.h} in @file{lib/} 8011@cindex @file{gettext.h} file 8012@cindex turning off NLS support 8013@cindex disabling NLS 8014 8015Internationalization of packages, as provided by GNU @code{gettext}, is 8016optional. It can be turned off in two situations: 8017 8018@itemize @bullet 8019@item 8020When the installer has specified @samp{./configure --disable-nls}. This 8021can be useful when small binaries are more important than features, for 8022example when building utilities for boot diskettes. It can also be useful 8023in order to get some specific C compiler warnings about code quality with 8024some older versions of GCC (older than 3.0). 8025 8026@item 8027When the package does not include the @code{intl/} subdirectory, and the 8028libintl.h header (with its associated libintl library, if any) is not 8029already installed on the system, it is preferable that the package builds 8030without internationalization support, rather than to give a compilation 8031error. 8032@end itemize 8033 8034A C preprocessor macro can be used to detect these two cases. Usually, 8035when @code{libintl.h} was found and not explicitly disabled, the 8036@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated 8037configuration file (usually called @file{config.h}). In the two negative 8038situations, however, this macro will not be defined, thus it will evaluate 8039to 0 in C preprocessor expressions. 8040 8041@cindex include file @file{libintl.h} 8042@file{gettext.h} is a convenience header file for conditional use of 8043@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro. If 8044@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it 8045defines no-op substitutes for the libintl.h functions. We recommend 8046the use of @code{"gettext.h"} over direct use of @file{<libintl.h>}, 8047so that portability to older systems is guaranteed and installers can 8048turn off internationalization if they want to. In the C code, you will 8049then write 8050 8051@example 8052#include "gettext.h" 8053@end example 8054 8055@noindent 8056instead of 8057 8058@example 8059#include <libintl.h> 8060@end example 8061 8062The location of @code{gettext.h} is usually in a directory containing 8063auxiliary include files. In many GNU packages, there is a directory 8064@file{lib/} containing helper functions; @file{gettext.h} fits there. 8065In other packages, it can go into the @file{src} directory. 8066 8067Do not install the @code{gettext.h} file in public locations. Every 8068package that needs it should contain a copy of it on its own. 8069 8070@node autoconf macros, CVS Issues, Adjusting Files, Maintainers 8071@section Autoconf macros for use in @file{configure.ac} 8072@cindex autoconf macros for @code{gettext} 8073 8074GNU @code{gettext} installs macros for use in a package's 8075@file{configure.ac} or @file{configure.in}. 8076@xref{Top, , Introduction, autoconf, The Autoconf Manual}. 8077The primary macro is, of course, @code{AM_GNU_GETTEXT}. 8078 8079@menu 8080* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4} 8081* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 8082* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4} 8083* AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 8084* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4} 8085* AM_XGETTEXT_OPTION:: AM_XGETTEXT_OPTION in @file{po.m4} 8086* AM_ICONV:: AM_ICONV in @file{iconv.m4} 8087@end menu 8088 8089@node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros 8090@subsection AM_GNU_GETTEXT in @file{gettext.m4} 8091 8092@amindex AM_GNU_GETTEXT 8093The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext 8094function family in either the C library or a separate @code{libintl} 8095library (shared or static libraries are both supported) or in the package's 8096@file{intl/} directory. It also invokes @code{AM_PO_SUBDIRS}, thus preparing 8097the @file{po/} directories of the package for building. 8098 8099@code{AM_GNU_GETTEXT} accepts up to three optional arguments. The general 8100syntax is 8101 8102@example 8103AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}]) 8104@end example 8105 8106@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because 8107@c it is of no use for packages other than GNU gettext itself. (Such packages 8108@c are not allowed to install the shared libintl. But if they use libtool, 8109@c then it is in order to install shared libraries that depend on libintl.) 8110@var{intlsymbol} can be @samp{external} or @samp{no-libtool}. The default 8111(if it is not specified or empty) is @samp{no-libtool}. @var{intlsymbol} 8112should be @samp{external} for packages with no @file{intl/} directory. 8113For packages with an @file{intl/} directory, you can either use an 8114@var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external} 8115and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere. 8116The two ways to specify the existence of an @file{intl/} directory are 8117equivalent. At build time, a static library 8118@code{$(top_builddir)/intl/libintl.a} will then be created. 8119 8120If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU 8121gettext implementations (in libc or libintl) without the @code{ngettext()} 8122function will be ignored. If @var{needsymbol} is specified and is 8123@samp{need-formatstring-macros}, then GNU gettext implementations that don't 8124support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored. 8125Only one @var{needsymbol} can be specified. These requirements can also be 8126specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere. To specify 8127more than one requirement, just specify the strongest one among them, or 8128invoke the @code{AM_GNU_GETTEXT_NEED} macro several times. The hierarchy 8129among the various alternatives is as follows: @samp{need-formatstring-macros} 8130implies @samp{need-ngettext}. 8131 8132@var{intldir} is used to find the intl libraries. If empty, the value 8133@samp{$(top_builddir)/intl/} is used. 8134 8135The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is 8136available and should be used. If so, it sets the @code{USE_NLS} variable 8137to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf 8138generated configuration file (usually called @file{config.h}); it sets 8139the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options 8140for use in a Makefile (@code{LIBINTL} for use without libtool, 8141@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to 8142@code{CPPFLAGS} if necessary. In the negative case, it sets 8143@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL} 8144to empty and doesn't change @code{CPPFLAGS}. 8145 8146The complexities that @code{AM_GNU_GETTEXT} deals with are the following: 8147 8148@itemize @bullet 8149@item 8150@cindex @code{libintl} library 8151Some operating systems have @code{gettext} in the C library, for example 8152glibc. Some have it in a separate library @code{libintl}. GNU @code{libintl} 8153might have been installed as part of the GNU @code{gettext} package. 8154 8155@item 8156GNU @code{libintl}, if installed, is not necessarily already in the search 8157path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for 8158the library search path). 8159 8160@item 8161Except for glibc, the operating system's native @code{gettext} cannot 8162exploit the GNU mo files, doesn't have the necessary locale dependency 8163features, and cannot convert messages from the catalog's text encoding 8164to the user's locale encoding. 8165 8166@item 8167GNU @code{libintl}, if installed, is not necessarily already in the 8168run time library search path. To avoid the need for setting an environment 8169variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate 8170run time search path options to the @code{LIBINTL} and @code{LTLIBINTL} 8171variables. This works on most systems, but not on some operating systems 8172with limited shared library support, like SCO. 8173 8174@item 8175GNU @code{libintl} relies on POSIX/XSI @code{iconv}. The macro checks for 8176linker options needed to use iconv and appends them to the @code{LIBINTL} 8177and @code{LTLIBINTL} variables. 8178@end itemize 8179 8180@node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros 8181@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 8182 8183@amindex AM_GNU_GETTEXT_VERSION 8184The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of 8185the GNU gettext infrastructure that is used by the package. 8186 8187The use of this macro is optional; only the @code{autopoint} program makes 8188use of it (@pxref{CVS Issues}). 8189 8190@node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros 8191@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4} 8192 8193@amindex AM_GNU_GETTEXT_NEED 8194The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the 8195GNU gettext implementation. The syntax is 8196 8197@example 8198AM_GNU_GETTEXT_NEED([@var{needsymbol}]) 8199@end example 8200 8201If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations 8202(in libc or libintl) without the @code{ngettext()} function will be ignored. 8203If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext 8204implementations that don't support the ISO C 99 @file{<inttypes.h>} 8205formatstring macros will be ignored. 8206 8207The optional second argument of @code{AM_GNU_GETTEXT} is also taken into 8208account. 8209 8210The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after 8211the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter. 8212 8213@node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros 8214@subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 8215 8216@amindex AM_GNU_GETTEXT_INTL_SUBDIR 8217The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the 8218@code{AM_GNU_GETTEXT} macro, although invoked with the first argument 8219@samp{external}, should also prepare for building the @file{intl/} 8220subdirectory. 8221 8222The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after 8223the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter. 8224 8225The use of this macro requires GNU automake 1.10 or newer and 8226GNU autoconf 2.61 or newer. 8227 8228@node AM_PO_SUBDIRS, AM_XGETTEXT_OPTION, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros 8229@subsection AM_PO_SUBDIRS in @file{po.m4} 8230 8231@amindex AM_PO_SUBDIRS 8232The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the 8233package for building. This macro should be used in internationalized 8234programs written in other programming languages than C, C++, Objective C, 8235for example @code{sh}, @code{Python}, @code{Lisp}. See @ref{Programming 8236Languages} for a list of programming languages that support localization 8237through PO files. 8238 8239The @code{AM_PO_SUBDIRS} macro determines whether internationalization 8240should be used. If so, it sets the @code{USE_NLS} variable to @samp{yes}, 8241otherwise to @samp{no}. It also determines the right values for Makefile 8242variables in each @file{po/} directory. 8243 8244@node AM_XGETTEXT_OPTION, AM_ICONV, AM_PO_SUBDIRS, autoconf macros 8245@subsection AM_XGETTEXT_OPTION in @file{po.m4} 8246 8247@amindex AM_XGETTEXT_OPTION 8248The @code{AM_XGETTEXT_OPTION} macro registers a command-line option to be 8249used in the invocations of @code{xgettext} in the @file{po/} directories 8250of the package. 8251 8252For example, if you have a source file that defines a function 8253@samp{error_at_line} whose fifth argument is a format string, you can use 8254@example 8255AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format]) 8256@end example 8257@noindent 8258to instruct @code{xgettext} to mark all translatable strings in @samp{gettext} 8259invocations that occur as fifth argument to this function as @samp{c-format}. 8260 8261See @ref{xgettext Invocation} for the list of options that @code{xgettext} 8262accepts. 8263 8264The use of this macro is an alternative to the use of the 8265@samp{XGETTEXT_OPTIONS} variable in @file{po/Makevars}. 8266 8267@node AM_ICONV, , AM_XGETTEXT_OPTION, autoconf macros 8268@subsection AM_ICONV in @file{iconv.m4} 8269 8270@amindex AM_ICONV 8271The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI 8272@code{iconv} function family in either the C library or a separate 8273@code{libiconv} library. If found, it sets the @code{am_cv_func_iconv} 8274variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf 8275generated configuration file (usually called @file{config.h}); it defines 8276@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the 8277second argument of @code{iconv()} is of type @samp{const char **} or 8278@samp{char **}; it sets the variables @code{LIBICONV} and 8279@code{LTLIBICONV} to the linker options for use in a Makefile 8280(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with 8281libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if 8282necessary. If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to 8283empty and doesn't change @code{CPPFLAGS}. 8284 8285The complexities that @code{AM_ICONV} deals with are the following: 8286 8287@itemize @bullet 8288@item 8289@cindex @code{libiconv} library 8290Some operating systems have @code{iconv} in the C library, for example 8291glibc. Some have it in a separate library @code{libiconv}, for example 8292OSF/1 or FreeBSD. Regardless of the operating system, GNU @code{libiconv} 8293might have been installed. In that case, it should be used instead of the 8294operating system's native @code{iconv}. 8295 8296@item 8297GNU @code{libiconv}, if installed, is not necessarily already in the search 8298path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for 8299the library search path). 8300 8301@item 8302GNU @code{libiconv} is binary incompatible with some operating system's 8303native @code{iconv}, for example on FreeBSD. Use of an @file{iconv.h} 8304and @file{libiconv.so} that don't fit together would produce program 8305crashes. 8306 8307@item 8308GNU @code{libiconv}, if installed, is not necessarily already in the 8309run time library search path. To avoid the need for setting an environment 8310variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate 8311run time search path options to the @code{LIBICONV} variable. This works 8312on most systems, but not on some operating systems with limited shared 8313library support, like SCO. 8314@end itemize 8315 8316@file{iconv.m4} is distributed with the GNU gettext package because 8317@file{gettext.m4} relies on it. 8318 8319@node CVS Issues, Release Management, autoconf macros, Maintainers 8320@section Integrating with CVS 8321 8322Many projects use CVS for distributed development, version control and 8323source backup. This section gives some advice how to manage the uses 8324of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}. 8325 8326@menu 8327* Distributed CVS:: Avoiding version mismatch in distributed development 8328* Files under CVS:: Files to put under CVS version control 8329* autopoint Invocation:: Invoking the @code{autopoint} Program 8330@end menu 8331 8332@node Distributed CVS, Files under CVS, CVS Issues, CVS Issues 8333@subsection Avoiding version mismatch in distributed development 8334 8335In a project development with multiple developers, using CVS, there 8336should be a single developer who occasionally - when there is desire to 8337upgrade to a new @code{gettext} version - runs @code{gettextize} and 8338performs the changes listed in @ref{Adjusting Files}, and then commits 8339his changes to the CVS. 8340 8341It is highly recommended that all developers on a project use the same 8342version of GNU @code{gettext} in the package. In other words, if a 8343developer runs @code{gettextize}, he should go the whole way, make the 8344necessary remaining changes and commit his changes to the CVS. 8345Otherwise the following damages will likely occur: 8346 8347@itemize @bullet 8348@item 8349Apparent version mismatch between developers. Since some @code{gettext} 8350specific portions in @file{configure.ac}, @file{configure.in} and 8351@code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext} 8352version, the use of infrastructure files belonging to different 8353@code{gettext} versions can easily lead to build errors. 8354 8355@item 8356Hidden version mismatch. Such version mismatch can also lead to 8357malfunctioning of the package, that may be undiscovered by the developers. 8358The worst case of hidden version mismatch is that internationalization 8359of the package doesn't work at all. 8360 8361@item 8362Release risks. All developers implicitly perform constant testing on 8363a package. This is important in the days and weeks before a release. 8364If the guy who makes the release tar files uses a different version 8365of GNU @code{gettext} than the other developers, the distribution will 8366be less well tested than if all had been using the same @code{gettext} 8367version. For example, it is possible that a platform specific bug goes 8368undiscovered due to this constellation. 8369@end itemize 8370 8371@node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues 8372@subsection Files to put under CVS version control 8373 8374There are basically three ways to deal with generated files in the 8375context of a CVS repository, such as @file{configure} generated from 8376@file{configure.ac}, @code{@var{parser}.c} generated from 8377@code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by 8378@code{gettextize} or @code{autopoint}. 8379 8380@enumerate 8381@item 8382All generated files are always committed into the repository. 8383 8384@item 8385All generated files are committed into the repository occasionally, 8386for example each time a release is made. 8387 8388@item 8389Generated files are never committed into the repository. 8390@end enumerate 8391 8392Each of these three approaches has different advantages and drawbacks. 8393 8394@enumerate 8395@item 8396The advantage is that anyone can check out the CVS at any moment and 8397gets a working build. The drawbacks are: 1a. It requires some frequent 8398"cvs commit" actions by the maintainers. 1b. The repository grows in size 8399quite fast. 8400 8401@item 8402The advantage is that anyone can check out the CVS, and the usual 8403"./configure; make" will work. The drawbacks are: 2a. The one who 8404checks out the repository needs tools like GNU @code{automake}, 8405GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes 8406he even needs particular versions of them. 2b. When a release is made 8407and a commit is made on the generated files, the other developers get 8408conflicts on the generated files after doing "cvs update". Although 8409these conflicts are easy to resolve, they are annoying. 8410 8411@item 8412The advantage is less work for the maintainers. The drawback is that 8413anyone who checks out the CVS not only needs tools like GNU @code{automake}, 8414GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that 8415he needs to perform a package specific pre-build step before being able 8416to "./configure; make". 8417@end enumerate 8418 8419For the first and second approach, all files modified or brought in 8420by the occasional @code{gettextize} invocation and update should be 8421committed into the CVS. 8422 8423For the third approach, the maintainer can omit from the CVS repository 8424all the files that @code{gettextize} mentions as "copy". Instead, he 8425adds to the @file{configure.ac} or @file{configure.in} a line of the 8426form 8427 8428@example 8429AM_GNU_GETTEXT_VERSION(@value{VERSION}) 8430@end example 8431 8432@noindent 8433and adds to the package's pre-build script an invocation of 8434@samp{autopoint}. For everyone who checks out the CVS, this 8435@code{autopoint} invocation will copy into the right place the 8436@code{gettext} infrastructure files that have been omitted from the CVS. 8437 8438The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is 8439the version of the @code{gettext} infrastructure that the package wants 8440to use. It is also the minimum version number of the @samp{autopoint} 8441program. So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the 8442developers can have any version >= 0.11.5 installed; the package will work 8443with the 0.11.5 infrastructure in all developers' builds. When the 8444maintainer then runs gettextize from, say, version 0.12.1 on the package, 8445the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed 8446into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that 8447use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer 8448installed. 8449 8450@node autopoint Invocation, , Files under CVS, CVS Issues 8451@subsection Invoking the @code{autopoint} Program 8452 8453@include autopoint.texi 8454 8455@node Release Management, , CVS Issues, Maintainers 8456@section Creating a Distribution Tarball 8457 8458@cindex release 8459@cindex distribution tarball 8460In projects that use GNU @code{automake}, the usual commands for creating 8461a distribution tarball, @samp{make dist} or @samp{make distcheck}, 8462automatically update the PO files as needed. 8463 8464If GNU @code{automake} is not used, the maintainer needs to perform this 8465update before making a release: 8466 8467@example 8468$ ./configure 8469$ (cd po; make update-po) 8470$ make distclean 8471@end example 8472 8473@node Installers, Programming Languages, Maintainers, Top 8474@chapter The Installer's and Distributor's View 8475@cindex package installer's view of @code{gettext} 8476@cindex package distributor's view of @code{gettext} 8477@cindex package build and installation options 8478@cindex setting up @code{gettext} at build time 8479 8480By default, packages fully using GNU @code{gettext}, internally, 8481are installed in such a way that they to allow translation of 8482messages. At @emph{configuration} time, those packages should 8483automatically detect whether the underlying host system already provides 8484the GNU @code{gettext} functions. If not, 8485the GNU @code{gettext} library should be automatically prepared 8486and used. Installers may use special options at configuration 8487time for changing this behavior. The command @samp{./configure 8488--with-included-gettext} bypasses system @code{gettext} to 8489use the included GNU @code{gettext} instead, 8490while @samp{./configure --disable-nls} 8491produces programs totally unable to translate messages. 8492 8493@vindex LINGUAS@r{, environment variable} 8494Internationalized packages have usually many @file{@var{ll}.po} 8495files. Unless 8496translations are disabled, all those available are installed together 8497with the package. However, the environment variable @code{LINGUAS} 8498may be set, prior to configuration, to limit the installed set. 8499@code{LINGUAS} should then contain a space separated list of two-letter 8500codes, stating which languages are allowed. 8501 8502@node Programming Languages, Conclusion, Installers, Top 8503@chapter Other Programming Languages 8504 8505While the presentation of @code{gettext} focuses mostly on C and 8506implicitly applies to C++ as well, its scope is far broader than that: 8507Many programming languages, scripting languages and other textual data 8508like GUI resources or package descriptions can make use of the gettext 8509approach. 8510 8511@menu 8512* Language Implementors:: The Language Implementor's View 8513* Programmers for other Languages:: The Programmer's View 8514* Translators for other Languages:: The Translator's View 8515* Maintainers for other Languages:: The Maintainer's View 8516* List of Programming Languages:: Individual Programming Languages 8517* List of Data Formats:: Internationalizable Data 8518@end menu 8519 8520@node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages 8521@section The Language Implementor's View 8522@cindex programming languages 8523@cindex scripting languages 8524 8525All programming and scripting languages that have the notion of strings 8526are eligible to supporting @code{gettext}. Supporting @code{gettext} 8527means the following: 8528 8529@enumerate 8530@item 8531You should add to the language a syntax for translatable strings. In 8532principle, a function call of @code{gettext} would do, but a shorthand 8533syntax helps keeping the legibility of internationalized programs. For 8534example, in C we use the syntax @code{_("string")}, and in GNU awk we use 8535the shorthand @code{_"string"}. 8536 8537@item 8538You should arrange that evaluation of such a translatable string at 8539runtime calls the @code{gettext} function, or performs equivalent 8540processing. 8541 8542@item 8543Similarly, you should make the functions @code{ngettext}, 8544@code{dcgettext}, @code{dcngettext} available from within the language. 8545These functions are less often used, but are nevertheless necessary for 8546particular purposes: @code{ngettext} for correct plural handling, and 8547@code{dcgettext} and @code{dcngettext} for obeying other locale-related 8548environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or 8549@code{LC_MONETARY}. For these latter functions, you need to make the 8550@code{LC_*} constants, available in the C header @code{<locale.h>}, 8551referenceable from within the language, usually either as enumeration 8552values or as strings. 8553 8554@item 8555You should allow the programmer to designate a message domain, either by 8556making the @code{textdomain} function available from within the 8557language, or by introducing a magic variable called @code{TEXTDOMAIN}. 8558Similarly, you should allow the programmer to designate where to search 8559for message catalogs, by providing access to the @code{bindtextdomain} 8560function. 8561 8562@item 8563You should either perform a @code{setlocale (LC_ALL, "")} call during 8564the startup of your language runtime, or allow the programmer to do so. 8565Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and 8566@code{LC_CTYPE} locale categories are not both set. 8567 8568@item 8569A programmer should have a way to extract translatable strings from a 8570program into a PO file. The GNU @code{xgettext} program is being 8571extended to support very different programming languages. Please 8572contact the GNU @code{gettext} maintainers to help them doing this. If 8573the string extractor is best integrated into your language's parser, GNU 8574@code{xgettext} can function as a front end to your string extractor. 8575 8576@item 8577The language's library should have a string formatting facility where 8578the arguments of a format string are denoted by a positional number or a 8579name. This is needed because for some languages and some messages with 8580more than one substitutable argument, the translation will need to 8581output the substituted arguments in different order. @xref{c-format Flag}. 8582 8583@item 8584If the language has more than one implementation, and not all of the 8585implementations use @code{gettext}, but the programs should be portable 8586across implementations, you should provide a no-i18n emulation, that 8587makes the other implementations accept programs written for yours, 8588without actually translating the strings. 8589 8590@item 8591To help the programmer in the task of marking translatable strings, 8592which is sometimes performed using the Emacs PO mode (@pxref{Marking}), 8593you are welcome to 8594contact the GNU @code{gettext} maintainers, so they can add support for 8595your language to @file{po-mode.el}. 8596@end enumerate 8597 8598On the implementation side, three approaches are possible, with 8599different effects on portability and copyright: 8600 8601@itemize @bullet 8602@item 8603You may integrate the GNU @code{gettext}'s @file{intl/} directory in 8604your package, as described in @ref{Maintainers}. This allows you to 8605have internationalization on all kinds of platforms. Note that when you 8606then distribute your package, it legally falls under the GNU General 8607Public License, and the GNU project will be glad about your contribution 8608to the Free Software pool. 8609 8610@item 8611You may link against GNU @code{gettext} functions if they are found in 8612the C library. For example, an autoconf test for @code{gettext()} and 8613@code{ngettext()} will detect this situation. For the moment, this test 8614will succeed on GNU systems and not on other platforms. No severe 8615copyright restrictions apply. 8616 8617@item 8618You may emulate or reimplement the GNU @code{gettext} functionality. 8619This has the advantage of full portability and no copyright 8620restrictions, but also the drawback that you have to reimplement the GNU 8621@code{gettext} features (such as the @code{LANGUAGE} environment 8622variable, the locale aliases database, the automatic charset conversion, 8623and plural handling). 8624@end itemize 8625 8626@node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages 8627@section The Programmer's View 8628 8629For the programmer, the general procedure is the same as for the C 8630language. The Emacs PO mode marking supports other languages, and the GNU 8631@code{xgettext} string extractor recognizes other languages based on the 8632file extension or a command-line option. In some languages, 8633@code{setlocale} is not needed because it is already performed by the 8634underlying language runtime. 8635 8636@node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages 8637@section The Translator's View 8638 8639The translator works exactly as in the C language case. The only 8640difference is that when translating format strings, she has to be aware 8641of the language's particular syntax for positional arguments in format 8642strings. 8643 8644@menu 8645* c-format:: C Format Strings 8646* objc-format:: Objective C Format Strings 8647* sh-format:: Shell Format Strings 8648* python-format:: Python Format Strings 8649* lisp-format:: Lisp Format Strings 8650* elisp-format:: Emacs Lisp Format Strings 8651* librep-format:: librep Format Strings 8652* scheme-format:: Scheme Format Strings 8653* smalltalk-format:: Smalltalk Format Strings 8654* java-format:: Java Format Strings 8655* csharp-format:: C# Format Strings 8656* awk-format:: awk Format Strings 8657* object-pascal-format:: Object Pascal Format Strings 8658* ycp-format:: YCP Format Strings 8659* tcl-format:: Tcl Format Strings 8660* perl-format:: Perl Format Strings 8661* php-format:: PHP Format Strings 8662* gcc-internal-format:: GCC internal Format Strings 8663* qt-format:: Qt Format Strings 8664* kde-format:: KDE Format Strings 8665* boost-format:: Boost Format Strings 8666@end menu 8667 8668@node c-format, objc-format, Translators for other Languages, Translators for other Languages 8669@subsection C Format Strings 8670 8671C format strings are described in POSIX (IEEE P1003.1 2001), section 8672XSH 3 fprintf(), 8673@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}. 8674See also the fprintf() manual page, 8675@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php}, 8676@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}. 8677 8678Although format strings with positions that reorder arguments, such as 8679 8680@example 8681"Only %2$d bytes free on '%1$s'." 8682@end example 8683 8684@noindent 8685which is semantically equivalent to 8686 8687@example 8688"'%s' has only %d bytes free." 8689@end example 8690 8691@noindent 8692are a POSIX/XSI feature and not specified by ISO C 99, translators can rely 8693on this reordering ability: On the few platforms where @code{printf()}, 8694@code{fprintf()} etc. don't support this feature natively, @file{libintl.a} 8695or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>} 8696activates these replacement functions automatically. 8697 8698@cindex outdigits 8699@cindex Arabic digits 8700As a special feature for Farsi (Persian) and maybe Arabic, translators can 8701insert an @samp{I} flag into numeric format directives. For example, the 8702translation of @code{"%d"} can be @code{"%Id"}. The effect of this flag, 8703on systems with GNU @code{libc}, is that in the output, the ASCII digits are 8704replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale 8705category. On other systems, the @code{gettext} function removes this flag, 8706so that it has no effect. 8707 8708Note that the programmer should @emph{not} put this flag into the 8709untranslated string. (Putting the @samp{I} format directive flag into an 8710@var{msgid} string would lead to undefined behaviour on platforms without 8711glibc when NLS is disabled.) 8712 8713@node objc-format, sh-format, c-format, Translators for other Languages 8714@subsection Objective C Format Strings 8715 8716Objective C format strings are like C format strings. They support an 8717additional format directive: "$@@", which when executed consumes an argument 8718of type @code{Object *}. 8719 8720@node sh-format, python-format, objc-format, Translators for other Languages 8721@subsection Shell Format Strings 8722 8723Shell format strings, as supported by GNU gettext and the @samp{envsubst} 8724program, are strings with references to shell variables in the form 8725@code{$@var{variable}} or @code{$@{@var{variable}@}}. References of the form 8726@code{$@{@var{variable}-@var{default}@}}, 8727@code{$@{@var{variable}:-@var{default}@}}, 8728@code{$@{@var{variable}=@var{default}@}}, 8729@code{$@{@var{variable}:=@var{default}@}}, 8730@code{$@{@var{variable}+@var{replacement}@}}, 8731@code{$@{@var{variable}:+@var{replacement}@}}, 8732@code{$@{@var{variable}?@var{ignored}@}}, 8733@code{$@{@var{variable}:?@var{ignored}@}}, 8734that would be valid inside shell scripts, are not supported. The 8735@var{variable} names must consist solely of alphanumeric or underscore 8736ASCII characters, not start with a digit and be nonempty; otherwise such 8737a variable reference is ignored. 8738 8739@node python-format, lisp-format, sh-format, Translators for other Languages 8740@subsection Python Format Strings 8741 8742Python format strings are described in 8743@w{Python Library reference} / 8744@w{2. Built-in Types, Exceptions and Functions} / 8745@w{2.2. Built-in Types} / 8746@w{2.2.6. Sequence Types} / 8747@w{2.2.6.2. String Formatting Operations}. 8748@uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}. 8749 8750@node lisp-format, elisp-format, python-format, Translators for other Languages 8751@subsection Lisp Format Strings 8752 8753Lisp format strings are described in the Common Lisp HyperSpec, 8754chapter 22.3 @w{Formatted Output}, 8755@uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}. 8756 8757@node elisp-format, librep-format, lisp-format, Translators for other Languages 8758@subsection Emacs Lisp Format Strings 8759 8760Emacs Lisp format strings are documented in the Emacs Lisp reference, 8761section @w{Formatting Strings}, 8762@uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}. 8763Note that as of version 21, XEmacs supports numbered argument specifications 8764in format strings while FSF Emacs doesn't. 8765 8766@node librep-format, scheme-format, elisp-format, Translators for other Languages 8767@subsection librep Format Strings 8768 8769librep format strings are documented in the librep manual, section 8770@w{Formatted Output}, 8771@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output}, 8772@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}. 8773 8774@node scheme-format, smalltalk-format, librep-format, Translators for other Languages 8775@subsection Scheme Format Strings 8776 8777Scheme format strings are documented in the SLIB manual, section 8778@w{Format Specification}. 8779 8780@node smalltalk-format, java-format, scheme-format, Translators for other Languages 8781@subsection Smalltalk Format Strings 8782 8783Smalltalk format strings are described in the GNU Smalltalk documentation, 8784class @code{CharArray}, methods @samp{bindWith:} and 8785@samp{bindWithArguments:}. 8786@uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}. 8787In summary, a directive starts with @samp{%} and is followed by @samp{%} 8788or a nonzero digit (@samp{1} to @samp{9}). 8789 8790@node java-format, csharp-format, smalltalk-format, Translators for other Languages 8791@subsection Java Format Strings 8792 8793Java format strings are described in the JDK documentation for class 8794@code{java.text.MessageFormat}, 8795@uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}. 8796See also the ICU documentation 8797@uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}. 8798 8799@node csharp-format, awk-format, java-format, Translators for other Languages 8800@subsection C# Format Strings 8801 8802C# format strings are described in the .NET documentation for class 8803@code{System.String} and in 8804@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}. 8805 8806@node awk-format, object-pascal-format, csharp-format, Translators for other Languages 8807@subsection awk Format Strings 8808 8809awk format strings are described in the gawk documentation, section 8810@w{Printf}, 8811@uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}. 8812 8813@node object-pascal-format, ycp-format, awk-format, Translators for other Languages 8814@subsection Object Pascal Format Strings 8815 8816Where is this documented? 8817 8818@node ycp-format, tcl-format, object-pascal-format, Translators for other Languages 8819@subsection YCP Format Strings 8820 8821YCP sformat strings are described in the libycp documentation 8822@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}. 8823In summary, a directive starts with @samp{%} and is followed by @samp{%} 8824or a nonzero digit (@samp{1} to @samp{9}). 8825 8826@node tcl-format, perl-format, ycp-format, Translators for other Languages 8827@subsection Tcl Format Strings 8828 8829Tcl format strings are described in the @file{format.n} manual page, 8830@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}. 8831 8832@node perl-format, php-format, tcl-format, Translators for other Languages 8833@subsection Perl Format Strings 8834 8835There are two kinds format strings in Perl: those acceptable to the 8836Perl built-in function @code{printf}, labelled as @samp{perl-format}, 8837and those acceptable to the @code{libintl-perl} function @code{__x}, 8838labelled as @samp{perl-brace-format}. 8839 8840Perl @code{printf} format strings are described in the @code{sprintf} 8841section of @samp{man perlfunc}. 8842 8843Perl brace format strings are described in the 8844@file{Locale::TextDomain(3pm)} manual page of the CPAN package 8845libintl-perl. In brief, Perl format uses placeholders put between 8846braces (@samp{@{} and @samp{@}}). The placeholder must have the syntax 8847of simple identifiers. 8848 8849@node php-format, gcc-internal-format, perl-format, Translators for other Languages 8850@subsection PHP Format Strings 8851 8852PHP format strings are described in the documentation of the PHP function 8853@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or 8854@uref{http://www.php.net/manual/en/function.sprintf.php}. 8855 8856@node gcc-internal-format, qt-format, php-format, Translators for other Languages 8857@subsection GCC internal Format Strings 8858 8859These format strings are used inside the GCC sources. In such a format 8860string, a directive starts with @samp{%}, is optionally followed by a 8861size specifier @samp{l}, an optional flag @samp{+}, another optional flag 8862@samp{#}, and is finished by a specifier: @samp{%} denotes a literal 8863percent sign, @samp{c} denotes a character, @samp{s} denotes a string, 8864@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x} 8865denote an unsigned integer, @samp{.*s} denotes a string preceded by a 8866width specification, @samp{H} denotes a @samp{location_t *} pointer, 8867@samp{D} denotes a general declaration, @samp{F} denotes a function 8868declaration, @samp{T} denotes a type, @samp{A} denotes a function argument, 8869@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L} 8870denotes a programming language, @samp{O} denotes a binary operator, 8871@samp{P} denotes a function parameter, @samp{Q} denotes an assignment 8872operator, @samp{V} denotes a const/volatile qualifier. 8873 8874@node qt-format, kde-format, gcc-internal-format, Translators for other Languages 8875@subsection Qt Format Strings 8876 8877Qt format strings are described in the documentation of the QString class 8878@uref{file:/usr/lib/qt-4.3.0/doc/html/qstring.html}. 8879In summary, a directive consists of a @samp{%} followed by a digit. The same 8880directive cannot occur more than once in a format string. 8881 8882@node kde-format, boost-format, qt-format, Translators for other Languages 8883@subsection KDE Format Strings 8884 8885KDE 4 format strings are defined as follows: 8886A directive consists of a @samp{%} followed by a non-zero decimal number. 8887If a @samp{%n} occurs in a format strings, all of @samp{%1}, ..., @samp{%(n-1)} 8888must occur as well, except possibly one of them. 8889 8890@node boost-format, , kde-format, Translators for other Languages 8891@subsection Boost Format Strings 8892 8893Boost format strings are described in the documentation of the 8894@code{boost::format} class, at 8895@uref{http://www.boost.org/libs/format/doc/format.html}. 8896In summary, a directive has either the same syntax as in a C format string, 8897such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as 8898@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number 8899between percent signs, such as @samp{%1%}. 8900 8901@node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages 8902@section The Maintainer's View 8903 8904For the maintainer, the general procedure differs from the C language 8905case in two ways. 8906 8907@itemize @bullet 8908@item 8909For those languages that don't use GNU gettext, the @file{intl/} directory 8910is not needed and can be omitted. This means that the maintainer calls the 8911@code{gettextize} program without the @samp{--intl} option, and that he 8912invokes the @code{AM_GNU_GETTEXT} autoconf macro via 8913@samp{AM_GNU_GETTEXT([external])}. 8914 8915@item 8916If only a single programming language is used, the @code{XGETTEXT_OPTIONS} 8917variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to 8918match the @code{xgettext} options for that particular programming language. 8919If the package uses more than one programming language with @code{gettext} 8920support, it becomes necessary to change the POT file construction rule 8921in @file{po/Makefile.in.in}. It is recommended to make one @code{xgettext} 8922invocation per programming language, each with the options appropriate for 8923that language, and to combine the resulting files using @code{msgcat}. 8924@end itemize 8925 8926@node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages 8927@section Individual Programming Languages 8928 8929@c Here is a list of programming languages, as used for Free Software projects 8930@c on SourceForge/Freshmeat, as of February 2002. Those supported by gettext 8931@c are marked with a star. 8932@c C 3580 * 8933@c Perl 1911 * 8934@c C++ 1379 * 8935@c Java 1200 * 8936@c PHP 1051 * 8937@c Python 613 * 8938@c Unix Shell 357 * 8939@c Tcl 266 * 8940@c SQL 174 8941@c JavaScript 118 8942@c Assembly 108 8943@c Scheme 51 8944@c Ruby 47 8945@c Lisp 45 * 8946@c Objective C 39 * 8947@c PL/SQL 29 8948@c Fortran 25 8949@c Ada 24 8950@c Delphi 22 8951@c Awk 19 * 8952@c Pascal 19 8953@c ML 19 8954@c Eiffel 17 8955@c Emacs-Lisp 14 * 8956@c Zope 14 8957@c ASP 12 8958@c Forth 12 8959@c Cold Fusion 10 8960@c Haskell 9 8961@c Visual Basic 9 8962@c C# 6 * 8963@c Smalltalk 6 * 8964@c Basic 5 8965@c Erlang 5 8966@c Modula 5 8967@c Object Pascal 5 * 8968@c Rexx 5 8969@c Dylan 4 8970@c Prolog 4 8971@c APL 3 8972@c PROGRESS 2 8973@c Euler 1 8974@c Euphoria 1 8975@c Pliant 1 8976@c Simula 1 8977@c XBasic 1 8978@c Logo 0 8979@c Other Scripting Engines 49 8980@c Other 116 8981 8982@menu 8983* C:: C, C++, Objective C 8984* sh:: sh - Shell Script 8985* bash:: bash - Bourne-Again Shell Script 8986* Python:: Python 8987* Common Lisp:: GNU clisp - Common Lisp 8988* clisp C:: GNU clisp C sources 8989* Emacs Lisp:: Emacs Lisp 8990* librep:: librep 8991* Scheme:: GNU guile - Scheme 8992* Smalltalk:: GNU Smalltalk 8993* Java:: Java 8994* C#:: C# 8995* gawk:: GNU awk 8996* Pascal:: Pascal - Free Pascal Compiler 8997* wxWidgets:: wxWidgets library 8998* YCP:: YCP - YaST2 scripting language 8999* Tcl:: Tcl - Tk's scripting language 9000* Perl:: Perl 9001* PHP:: PHP Hypertext Preprocessor 9002* Pike:: Pike 9003* GCC-source:: GNU Compiler Collection sources 9004@end menu 9005 9006@node C, sh, List of Programming Languages, List of Programming Languages 9007@subsection C, C++, Objective C 9008@cindex C and C-like languages 9009 9010@table @asis 9011@item RPMs 9012gcc, gpp, gobjc, glibc, gettext 9013 9014@item File extension 9015For C: @code{c}, @code{h}. 9016@*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}. 9017@*For Objective C: @code{m}. 9018 9019@item String syntax 9020@code{"abc"} 9021 9022@item gettext shorthand 9023@code{_("abc")} 9024 9025@item gettext/ngettext functions 9026@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 9027@code{dngettext}, @code{dcngettext} 9028 9029@item textdomain 9030@code{textdomain} function 9031 9032@item bindtextdomain 9033@code{bindtextdomain} function 9034 9035@item setlocale 9036Programmer must call @code{setlocale (LC_ALL, "")} 9037 9038@item Prerequisite 9039@code{#include <libintl.h>} 9040@*@code{#include <locale.h>} 9041@*@code{#define _(string) gettext (string)} 9042 9043@item Use or emulate GNU gettext 9044Use 9045 9046@item Extractor 9047@code{xgettext -k_} 9048 9049@item Formatting with positions 9050@code{fprintf "%2$d %1$d"} 9051@*In C++: @code{autosprintf "%2$d %1$d"} 9052(@pxref{Top, , Introduction, autosprintf, GNU autosprintf}) 9053 9054@item Portability 9055autoconf (gettext.m4) and #if ENABLE_NLS 9056 9057@item po-mode marking 9058yes 9059@end table 9060 9061The following examples are available in the @file{examples} directory: 9062@code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt}, 9063@code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets}, 9064@code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}. 9065 9066@node sh, bash, C, List of Programming Languages 9067@subsection sh - Shell Script 9068@cindex shell scripts 9069 9070@table @asis 9071@item RPMs 9072bash, gettext 9073 9074@item File extension 9075@code{sh} 9076 9077@item String syntax 9078@code{"abc"}, @code{'abc'}, @code{abc} 9079 9080@item gettext shorthand 9081@code{"`gettext \"abc\"`"} 9082 9083@item gettext/ngettext functions 9084@pindex gettext 9085@pindex ngettext 9086@code{gettext}, @code{ngettext} programs 9087@*@code{eval_gettext}, @code{eval_ngettext} shell functions 9088 9089@item textdomain 9090@vindex TEXTDOMAIN@r{, environment variable} 9091environment variable @code{TEXTDOMAIN} 9092 9093@item bindtextdomain 9094@vindex TEXTDOMAINDIR@r{, environment variable} 9095environment variable @code{TEXTDOMAINDIR} 9096 9097@item setlocale 9098automatic 9099 9100@item Prerequisite 9101@code{. gettext.sh} 9102 9103@item Use or emulate GNU gettext 9104use 9105 9106@item Extractor 9107@code{xgettext} 9108 9109@item Formatting with positions 9110--- 9111 9112@item Portability 9113fully portable 9114 9115@item po-mode marking 9116--- 9117@end table 9118 9119An example is available in the @file{examples} directory: @code{hello-sh}. 9120 9121@menu 9122* Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization 9123* gettext.sh:: Contents of @code{gettext.sh} 9124* gettext Invocation:: Invoking the @code{gettext} program 9125* ngettext Invocation:: Invoking the @code{ngettext} program 9126* envsubst Invocation:: Invoking the @code{envsubst} program 9127* eval_gettext Invocation:: Invoking the @code{eval_gettext} function 9128* eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function 9129@end menu 9130 9131@node Preparing Shell Scripts, gettext.sh, sh, sh 9132@subsubsection Preparing Shell Scripts for Internationalization 9133@cindex preparing shell scripts for translation 9134 9135Preparing a shell script for internationalization is conceptually similar 9136to the steps described in @ref{Sources}. The concrete steps for shell 9137scripts are as follows. 9138 9139@enumerate 9140@item 9141Insert the line 9142 9143@smallexample 9144. gettext.sh 9145@end smallexample 9146 9147near the top of the script. @code{gettext.sh} is a shell function library 9148that provides the functions 9149@code{eval_gettext} (see @ref{eval_gettext Invocation}) and 9150@code{eval_ngettext} (see @ref{eval_ngettext Invocation}). 9151You have to ensure that @code{gettext.sh} can be found in the @code{PATH}. 9152 9153@item 9154Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment 9155variables. Usually @code{TEXTDOMAIN} is the package or program name, and 9156@code{TEXTDOMAINDIR} is the absolute pathname corresponding to 9157@code{$prefix/share/locale}, where @code{$prefix} is the installation location. 9158 9159@smallexample 9160TEXTDOMAIN=@@PACKAGE@@ 9161export TEXTDOMAIN 9162TEXTDOMAINDIR=@@LOCALEDIR@@ 9163export TEXTDOMAINDIR 9164@end smallexample 9165 9166@item 9167Prepare the strings for translation, as described in @ref{Preparing Strings}. 9168 9169@item 9170Simplify translatable strings so that they don't contain command substitution 9171(@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like 9172@code{$@{@var{variable}-@var{default}@}}), access to positional arguments 9173(like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like 9174@code{$?}). This can always be done through simple local code restructuring. 9175For example, 9176 9177@smallexample 9178echo "Usage: $0 [OPTION] FILE..." 9179@end smallexample 9180 9181becomes 9182 9183@smallexample 9184program_name=$0 9185echo "Usage: $program_name [OPTION] FILE..." 9186@end smallexample 9187 9188Similarly, 9189 9190@smallexample 9191echo "Remaining files: `ls | wc -l`" 9192@end smallexample 9193 9194becomes 9195 9196@smallexample 9197filecount="`ls | wc -l`" 9198echo "Remaining files: $filecount" 9199@end smallexample 9200 9201@item 9202For each translatable string, change the output command @samp{echo} or 9203@samp{$echo} to @samp{gettext} (if the string contains no references to 9204shell variables) or to @samp{eval_gettext} (if it refers to shell variables), 9205followed by a no-argument @samp{echo} command (to account for the terminating 9206newline). Similarly, for cases with plural handling, replace a conditional 9207@samp{echo} command with an invocation of @samp{ngettext} or 9208@samp{eval_ngettext}, followed by a no-argument @samp{echo} command. 9209 9210When doing this, you also need to add an extra backslash before the dollar 9211sign in references to shell variables, so that the @samp{eval_gettext} 9212function receives the translatable string before the variable values are 9213substituted into it. For example, 9214 9215@smallexample 9216echo "Remaining files: $filecount" 9217@end smallexample 9218 9219becomes 9220 9221@smallexample 9222eval_gettext "Remaining files: \$filecount"; echo 9223@end smallexample 9224 9225If the output command is not @samp{echo}, you can make it use @samp{echo} 9226nevertheless, through the use of backquotes. However, note that inside 9227backquotes, backslashes must be doubled to be effective (because the 9228backquoting eats one level of backslashes). For example, assuming that 9229@samp{error} is a shell function that signals an error, 9230 9231@smallexample 9232error "file not found: $filename" 9233@end smallexample 9234 9235is first transformed into 9236 9237@smallexample 9238error "`echo \"file not found: \$filename\"`" 9239@end smallexample 9240 9241which then becomes 9242 9243@smallexample 9244error "`eval_gettext \"file not found: \\\$filename\"`" 9245@end smallexample 9246@end enumerate 9247 9248@node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh 9249@subsubsection Contents of @code{gettext.sh} 9250 9251@code{gettext.sh}, contained in the run-time package of GNU gettext, provides 9252the following: 9253 9254@itemize @bullet 9255@item $echo 9256The variable @code{echo} is set to a command that outputs its first argument 9257and a newline, without interpreting backslashes in the argument string. 9258 9259@item eval_gettext 9260See @ref{eval_gettext Invocation}. 9261 9262@item eval_ngettext 9263See @ref{eval_ngettext Invocation}. 9264@end itemize 9265 9266@node gettext Invocation, ngettext Invocation, gettext.sh, sh 9267@subsubsection Invoking the @code{gettext} program 9268 9269@include rt-gettext.texi 9270 9271@node ngettext Invocation, envsubst Invocation, gettext Invocation, sh 9272@subsubsection Invoking the @code{ngettext} program 9273 9274@include rt-ngettext.texi 9275 9276@node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh 9277@subsubsection Invoking the @code{envsubst} program 9278 9279@include rt-envsubst.texi 9280 9281@node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh 9282@subsubsection Invoking the @code{eval_gettext} function 9283 9284@cindex @code{eval_gettext} function, usage 9285@example 9286eval_gettext @var{msgid} 9287@end example 9288 9289@cindex lookup message translation 9290This function outputs the native language translation of a textual message, 9291performing dollar-substitution on the result. Note that only shell variables 9292mentioned in @var{msgid} will be dollar-substituted in the result. 9293 9294@node eval_ngettext Invocation, , eval_gettext Invocation, sh 9295@subsubsection Invoking the @code{eval_ngettext} function 9296 9297@cindex @code{eval_ngettext} function, usage 9298@example 9299eval_ngettext @var{msgid} @var{msgid-plural} @var{count} 9300@end example 9301 9302@cindex lookup plural message translation 9303This function outputs the native language translation of a textual message 9304whose grammatical form depends on a number, performing dollar-substitution 9305on the result. Note that only shell variables mentioned in @var{msgid} or 9306@var{msgid-plural} will be dollar-substituted in the result. 9307 9308@node bash, Python, sh, List of Programming Languages 9309@subsection bash - Bourne-Again Shell Script 9310@cindex bash 9311 9312GNU @code{bash} 2.0 or newer has a special shorthand for translating a 9313string and substituting variable values in it: @code{$"msgid"}. But 9314the use of this construct is @strong{discouraged}, due to the security 9315holes it opens and due to its portability problems. 9316 9317The security holes of @code{$"..."} come from the fact that after looking up 9318the translation of the string, @code{bash} processes it like it processes 9319any double-quoted string: dollar and backquote processing, like @samp{eval} 9320does. 9321 9322@enumerate 9323@item 9324In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, 9325JOHAB, some double-byte characters have a second byte whose value is 9326@code{0x60}. For example, the byte sequence @code{\xe0\x60} is a single 9327character in these locales. Many versions of @code{bash} (all versions 9328up to bash-2.05, and newer versions on platforms without @code{mbsrtowcs()} 9329function) don't know about character boundaries and see a backquote character 9330where there is only a particular Chinese character. Thus it can start 9331executing part of the translation as a command list. This situation can occur 9332even without the translator being aware of it: if the translator provides 9333translations in the UTF-8 encoding, it is the @code{gettext()} function which 9334will, during its conversion from the translator's encoding to the user's 9335locale's encoding, produce the dangerous @code{\x60} bytes. 9336 9337@item 9338A translator could - voluntarily or inadvertently - use backquotes 9339@code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations. 9340The enclosed strings would be executed as command lists by the shell. 9341@end enumerate 9342 9343The portability problem is that @code{bash} must be built with 9344internationalization support; this is normally not the case on systems 9345that don't have the @code{gettext()} function in libc. 9346 9347@node Python, Common Lisp, bash, List of Programming Languages 9348@subsection Python 9349@cindex Python 9350 9351@table @asis 9352@item RPMs 9353python 9354 9355@item File extension 9356@code{py} 9357 9358@item String syntax 9359@code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'}, 9360@*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"}, 9361@*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''}, 9362@*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""} 9363 9364@item gettext shorthand 9365@code{_('abc')} etc. 9366 9367@item gettext/ngettext functions 9368@code{gettext.gettext}, @code{gettext.dgettext}, 9369@code{gettext.ngettext}, @code{gettext.dngettext}, 9370also @code{ugettext}, @code{ungettext} 9371 9372@item textdomain 9373@code{gettext.textdomain} function, or 9374@code{gettext.install(@var{domain})} function 9375 9376@item bindtextdomain 9377@code{gettext.bindtextdomain} function, or 9378@code{gettext.install(@var{domain},@var{localedir})} function 9379 9380@item setlocale 9381not used by the gettext emulation 9382 9383@item Prerequisite 9384@code{import gettext} 9385 9386@item Use or emulate GNU gettext 9387emulate 9388 9389@item Extractor 9390@code{xgettext} 9391 9392@item Formatting with positions 9393@code{'...%(ident)d...' % @{ 'ident': value @}} 9394 9395@item Portability 9396fully portable 9397 9398@item po-mode marking 9399--- 9400@end table 9401 9402An example is available in the @file{examples} directory: @code{hello-python}. 9403 9404@node Common Lisp, clisp C, Python, List of Programming Languages 9405@subsection GNU clisp - Common Lisp 9406@cindex Common Lisp 9407@cindex Lisp 9408@cindex clisp 9409 9410@table @asis 9411@item RPMs 9412clisp 2.28 or newer 9413 9414@item File extension 9415@code{lisp} 9416 9417@item String syntax 9418@code{"abc"} 9419 9420@item gettext shorthand 9421@code{(_ "abc")}, @code{(ENGLISH "abc")} 9422 9423@item gettext/ngettext functions 9424@code{i18n:gettext}, @code{i18n:ngettext} 9425 9426@item textdomain 9427@code{i18n:textdomain} 9428 9429@item bindtextdomain 9430@code{i18n:textdomaindir} 9431 9432@item setlocale 9433automatic 9434 9435@item Prerequisite 9436--- 9437 9438@item Use or emulate GNU gettext 9439use 9440 9441@item Extractor 9442@code{xgettext -k_ -kENGLISH} 9443 9444@item Formatting with positions 9445@code{format "~1@@*~D ~0@@*~D"} 9446 9447@item Portability 9448On platforms without gettext, no translation. 9449 9450@item po-mode marking 9451--- 9452@end table 9453 9454An example is available in the @file{examples} directory: @code{hello-clisp}. 9455 9456@node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages 9457@subsection GNU clisp C sources 9458@cindex clisp C sources 9459 9460@table @asis 9461@item RPMs 9462clisp 9463 9464@item File extension 9465@code{d} 9466 9467@item String syntax 9468@code{"abc"} 9469 9470@item gettext shorthand 9471@code{ENGLISH ? "abc" : ""} 9472@*@code{GETTEXT("abc")} 9473@*@code{GETTEXTL("abc")} 9474 9475@item gettext/ngettext functions 9476@code{clgettext}, @code{clgettextl} 9477 9478@item textdomain 9479--- 9480 9481@item bindtextdomain 9482--- 9483 9484@item setlocale 9485automatic 9486 9487@item Prerequisite 9488@code{#include "lispbibl.c"} 9489 9490@item Use or emulate GNU gettext 9491use 9492 9493@item Extractor 9494@code{clisp-xgettext} 9495 9496@item Formatting with positions 9497@code{fprintf "%2$d %1$d"} 9498 9499@item Portability 9500On platforms without gettext, no translation. 9501 9502@item po-mode marking 9503--- 9504@end table 9505 9506@node Emacs Lisp, librep, clisp C, List of Programming Languages 9507@subsection Emacs Lisp 9508@cindex Emacs Lisp 9509 9510@table @asis 9511@item RPMs 9512emacs, xemacs 9513 9514@item File extension 9515@code{el} 9516 9517@item String syntax 9518@code{"abc"} 9519 9520@item gettext shorthand 9521@code{(_"abc")} 9522 9523@item gettext/ngettext functions 9524@code{gettext}, @code{dgettext} (xemacs only) 9525 9526@item textdomain 9527@code{domain} special form (xemacs only) 9528 9529@item bindtextdomain 9530@code{bind-text-domain} function (xemacs only) 9531 9532@item setlocale 9533automatic 9534 9535@item Prerequisite 9536--- 9537 9538@item Use or emulate GNU gettext 9539use 9540 9541@item Extractor 9542@code{xgettext} 9543 9544@item Formatting with positions 9545@code{format "%2$d %1$d"} 9546 9547@item Portability 9548Only XEmacs. Without @code{I18N3} defined at build time, no translation. 9549 9550@item po-mode marking 9551--- 9552@end table 9553 9554@node librep, Scheme, Emacs Lisp, List of Programming Languages 9555@subsection librep 9556@cindex @code{librep} Lisp 9557 9558@table @asis 9559@item RPMs 9560librep 0.15.3 or newer 9561 9562@item File extension 9563@code{jl} 9564 9565@item String syntax 9566@code{"abc"} 9567 9568@item gettext shorthand 9569@code{(_"abc")} 9570 9571@item gettext/ngettext functions 9572@code{gettext} 9573 9574@item textdomain 9575@code{textdomain} function 9576 9577@item bindtextdomain 9578@code{bindtextdomain} function 9579 9580@item setlocale 9581--- 9582 9583@item Prerequisite 9584@code{(require 'rep.i18n.gettext)} 9585 9586@item Use or emulate GNU gettext 9587use 9588 9589@item Extractor 9590@code{xgettext} 9591 9592@item Formatting with positions 9593@code{format "%2$d %1$d"} 9594 9595@item Portability 9596On platforms without gettext, no translation. 9597 9598@item po-mode marking 9599--- 9600@end table 9601 9602An example is available in the @file{examples} directory: @code{hello-librep}. 9603 9604@node Scheme, Smalltalk, librep, List of Programming Languages 9605@subsection GNU guile - Scheme 9606@cindex Scheme 9607@cindex guile 9608 9609@table @asis 9610@item RPMs 9611guile 9612 9613@item File extension 9614@code{scm} 9615 9616@item String syntax 9617@code{"abc"} 9618 9619@item gettext shorthand 9620@code{(_ "abc")} 9621 9622@item gettext/ngettext functions 9623@code{gettext}, @code{ngettext} 9624 9625@item textdomain 9626@code{textdomain} 9627 9628@item bindtextdomain 9629@code{bindtextdomain} 9630 9631@item setlocale 9632@code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))} 9633 9634@item Prerequisite 9635@code{(use-modules (ice-9 format))} 9636 9637@item Use or emulate GNU gettext 9638use 9639 9640@item Extractor 9641@code{xgettext -k_} 9642 9643@item Formatting with positions 9644@c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))} 9645@c not yet supported 9646--- 9647 9648@item Portability 9649On platforms without gettext, no translation. 9650 9651@item po-mode marking 9652--- 9653@end table 9654 9655An example is available in the @file{examples} directory: @code{hello-guile}. 9656 9657@node Smalltalk, Java, Scheme, List of Programming Languages 9658@subsection GNU Smalltalk 9659@cindex Smalltalk 9660 9661@table @asis 9662@item RPMs 9663smalltalk 9664 9665@item File extension 9666@code{st} 9667 9668@item String syntax 9669@code{'abc'} 9670 9671@item gettext shorthand 9672@code{NLS ? 'abc'} 9673 9674@item gettext/ngettext functions 9675@code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:} 9676 9677@item textdomain 9678@code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain} 9679object).@* 9680Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: /usr/local/share/locale'} 9681 9682@item bindtextdomain 9683@code{LcMessages>>#domain:localeDirectory:}, see above. 9684 9685@item setlocale 9686Automatic if you use @code{I18N Locale default}. 9687 9688@item Prerequisite 9689@code{PackageLoader fileInPackage: 'I18N'!} 9690 9691@item Use or emulate GNU gettext 9692emulate 9693 9694@item Extractor 9695@code{xgettext} 9696 9697@item Formatting with positions 9698@code{'%1 %2' bindWith: 'Hello' with: 'world'} 9699 9700@item Portability 9701fully portable 9702 9703@item po-mode marking 9704--- 9705@end table 9706 9707An example is available in the @file{examples} directory: 9708@code{hello-smalltalk}. 9709 9710@node Java, C#, Smalltalk, List of Programming Languages 9711@subsection Java 9712@cindex Java 9713 9714@table @asis 9715@item RPMs 9716java, java2 9717 9718@item File extension 9719@code{java} 9720 9721@item String syntax 9722"abc" 9723 9724@item gettext shorthand 9725_("abc") 9726 9727@item gettext/ngettext functions 9728@code{GettextResource.gettext}, @code{GettextResource.ngettext}, 9729@code{GettextResource.pgettext}, @code{GettextResource.npgettext} 9730 9731@item textdomain 9732---, use @code{ResourceBundle.getResource} instead 9733 9734@item bindtextdomain 9735---, use CLASSPATH instead 9736 9737@item setlocale 9738automatic 9739 9740@item Prerequisite 9741--- 9742 9743@item Use or emulate GNU gettext 9744---, uses a Java specific message catalog format 9745 9746@item Extractor 9747@code{xgettext -k_} 9748 9749@item Formatting with positions 9750@code{MessageFormat.format "@{1,number@} @{0,number@}"} 9751 9752@item Portability 9753fully portable 9754 9755@item po-mode marking 9756--- 9757@end table 9758 9759Before marking strings as internationalizable, uses of the string 9760concatenation operator need to be converted to @code{MessageFormat} 9761applications. For example, @code{"file "+filename+" not found"} becomes 9762@code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}. 9763Only after this is done, can the strings be marked and extracted. 9764 9765GNU gettext uses the native Java internationalization mechanism, namely 9766@code{ResourceBundle}s. There are two formats of @code{ResourceBundle}s: 9767@code{.properties} files and @code{.class} files. The @code{.properties} 9768format is a text file which the translators can directly edit, like PO 9769files, but which doesn't support plural forms. Whereas the @code{.class} 9770format is compiled from @code{.java} source code and can support plural 9771forms (provided it is accessed through an appropriate API, see below). 9772 9773To convert a PO file to a @code{.properties} file, the @code{msgcat} 9774program can be used with the option @code{--properties-output}. To convert 9775a @code{.properties} file back to a PO file, the @code{msgcat} program 9776can be used with the option @code{--properties-input}. All the tools 9777that manipulate PO files can work with @code{.properties} files as well, 9778if given the @code{--properties-input} and/or @code{--properties-output} 9779option. 9780 9781To convert a PO file to a ResourceBundle class, the @code{msgfmt} program 9782can be used with the option @code{--java} or @code{--java2}. To convert a 9783ResourceBundle back to a PO file, the @code{msgunfmt} program can be used 9784with the option @code{--java}. 9785 9786Two different programmatic APIs can be used to access ResourceBundles. 9787Note that both APIs work with all kinds of ResourceBundles, whether 9788GNU gettext generated classes, or other @code{.class} or @code{.properties} 9789files. 9790 9791@enumerate 9792@item 9793The @code{java.util.ResourceBundle} API. 9794 9795In particular, its @code{getString} function returns a string translation. 9796Note that a missing translation yields a @code{MissingResourceException}. 9797 9798This has the advantage of being the standard API. And it does not require 9799any additional libraries, only the @code{msgcat} generated @code{.properties} 9800files or the @code{msgfmt} generated @code{.class} files. But it cannot do 9801plural handling, even if the resource was generated by @code{msgfmt} from 9802a PO file with plural handling. 9803 9804@item 9805The @code{gnu.gettext.GettextResource} API. 9806 9807Reference documentation in Javadoc 1.1 style format is in the 9808@uref{javadoc2/index.html,javadoc2 directory}. 9809 9810Its @code{gettext} function returns a string translation. Note that when 9811a translation is missing, the @var{msgid} argument is returned unchanged. 9812 9813This has the advantage of having the @code{ngettext} function for plural 9814handling and the @code{pgettext} and @code{npgettext} for strings constraint 9815to a particular context. 9816 9817@cindex @code{libintl} for Java 9818To use this API, one needs the @code{libintl.jar} file which is part of 9819the GNU gettext package and distributed under the LGPL. 9820@end enumerate 9821 9822Four examples, using the second API, are available in the @file{examples} 9823directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}, 9824@code{hello-java-qtjambi}. 9825 9826Now, to make use of the API and define a shorthand for @samp{getString}, 9827there are three idioms that you can choose from: 9828 9829@itemize @bullet 9830@item 9831(This one assumes Java 1.5 or newer.) 9832In a unique class of your project, say @samp{Util}, define a static variable 9833holding the @code{ResourceBundle} instance and the shorthand: 9834 9835@smallexample 9836private static ResourceBundle myResources = 9837 ResourceBundle.getBundle("domain-name"); 9838public static String _(String s) @{ 9839 return myResources.getString(s); 9840@} 9841@end smallexample 9842 9843All classes containing internationalized strings then contain 9844 9845@smallexample 9846import static Util._; 9847@end smallexample 9848 9849@noindent 9850and the shorthand is used like this: 9851 9852@smallexample 9853System.out.println(_("Operation completed.")); 9854@end smallexample 9855 9856@item 9857In a unique class of your project, say @samp{Util}, define a static variable 9858holding the @code{ResourceBundle} instance: 9859 9860@smallexample 9861public static ResourceBundle myResources = 9862 ResourceBundle.getBundle("domain-name"); 9863@end smallexample 9864 9865All classes containing internationalized strings then contain 9866 9867@smallexample 9868private static ResourceBundle res = Util.myResources; 9869private static String _(String s) @{ return res.getString(s); @} 9870@end smallexample 9871 9872@noindent 9873and the shorthand is used like this: 9874 9875@smallexample 9876System.out.println(_("Operation completed.")); 9877@end smallexample 9878 9879@item 9880You add a class with a very short name, say @samp{S}, containing just the 9881definition of the resource bundle and of the shorthand: 9882 9883@smallexample 9884public class S @{ 9885 public static ResourceBundle myResources = 9886 ResourceBundle.getBundle("domain-name"); 9887 public static String _(String s) @{ 9888 return myResources.getString(s); 9889 @} 9890@} 9891@end smallexample 9892 9893@noindent 9894and the shorthand is used like this: 9895 9896@smallexample 9897System.out.println(S._("Operation completed.")); 9898@end smallexample 9899@end itemize 9900 9901Which of the three idioms you choose, will depend on whether your project 9902requires portability to Java versions prior to Java 1.5 and, if so, whether 9903copying two lines of codes into every class is more acceptable in your project 9904than a class with a single-letter name. 9905 9906@node C#, gawk, Java, List of Programming Languages 9907@subsection C# 9908@cindex C# 9909 9910@table @asis 9911@item RPMs 9912pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer 9913 9914@item File extension 9915@code{cs} 9916 9917@item String syntax 9918@code{"abc"}, @code{@@"abc"} 9919 9920@item gettext shorthand 9921_("abc") 9922 9923@item gettext/ngettext functions 9924@code{GettextResourceManager.GetString}, 9925@code{GettextResourceManager.GetPluralString} 9926@code{GettextResourceManager.GetParticularString} 9927@code{GettextResourceManager.GetParticularPluralString} 9928 9929@item textdomain 9930@code{new GettextResourceManager(domain)} 9931 9932@item bindtextdomain 9933---, compiled message catalogs are located in subdirectories of the directory 9934containing the executable 9935 9936@item setlocale 9937automatic 9938 9939@item Prerequisite 9940--- 9941 9942@item Use or emulate GNU gettext 9943---, uses a C# specific message catalog format 9944 9945@item Extractor 9946@code{xgettext -k_} 9947 9948@item Formatting with positions 9949@code{String.Format "@{1@} @{0@}"} 9950 9951@item Portability 9952fully portable 9953 9954@item po-mode marking 9955--- 9956@end table 9957 9958Before marking strings as internationalizable, uses of the string 9959concatenation operator need to be converted to @code{String.Format} 9960invocations. For example, @code{"file "+filename+" not found"} becomes 9961@code{String.Format("file @{0@} not found", filename)}. 9962Only after this is done, can the strings be marked and extracted. 9963 9964GNU gettext uses the native C#/.NET internationalization mechanism, namely 9965the classes @code{ResourceManager} and @code{ResourceSet}. Applications 9966use the @code{ResourceManager} methods to retrieve the native language 9967translation of strings. An instance of @code{ResourceSet} is the in-memory 9968representation of a message catalog file. The @code{ResourceManager} loads 9969and accesses @code{ResourceSet} instances as needed to look up the 9970translations. 9971 9972There are two formats of @code{ResourceSet}s that can be directly loaded by 9973the C# runtime: @code{.resources} files and @code{.dll} files. 9974 9975@itemize @bullet 9976@item 9977The @code{.resources} format is a binary file usually generated through the 9978@code{resgen} or @code{monoresgen} utility, but which doesn't support plural 9979forms. @code{.resources} files can also be embedded in .NET @code{.exe} files. 9980This only affects whether a file system access is performed to load the message 9981catalog; it doesn't affect the contents of the message catalog. 9982 9983@item 9984On the other hand, the @code{.dll} format is a binary file that is compiled 9985from @code{.cs} source code and can support plural forms (provided it is 9986accessed through the GNU gettext API, see below). 9987@end itemize 9988 9989Note that these .NET @code{.dll} and @code{.exe} files are not tied to a 9990particular platform; their file format and GNU gettext for C# can be used 9991on any platform. 9992 9993To convert a PO file to a @code{.resources} file, the @code{msgfmt} program 9994can be used with the option @samp{--csharp-resources}. To convert a 9995@code{.resources} file back to a PO file, the @code{msgunfmt} program can be 9996used with the option @samp{--csharp-resources}. You can also, in some cases, 9997use the @code{resgen} program (from the @code{pnet} package) or the 9998@code{monoresgen} program (from the @code{mono}/@code{mcs} package). These 9999programs can also convert a @code{.resources} file back to a PO file. But 10000beware: as of this writing (January 2004), the @code{monoresgen} converter is 10001quite buggy and the @code{resgen} converter ignores the encoding of the PO 10002files. 10003 10004To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be 10005used with the option @code{--csharp}. The result will be a @code{.dll} file 10006containing a subclass of @code{GettextResourceSet}, which itself is a subclass 10007of @code{ResourceSet}. To convert a @code{.dll} file containing a 10008@code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt} 10009program can be used with the option @code{--csharp}. 10010 10011The advantages of the @code{.dll} format over the @code{.resources} format 10012are: 10013 10014@enumerate 10015@item 10016Freedom to localize: Users can add their own translations to an application 10017after it has been built and distributed. Whereas when the programmer uses 10018a @code{ResourceManager} constructor provided by the system, the set of 10019@code{.resources} files for an application must be specified when the 10020application is built and cannot be extended afterwards. 10021@c If this were the only issue with the @code{.resources} format, one could 10022@c use the @code{ResourceManager.CreateFileBasedResourceManager} function. 10023 10024@item 10025Plural handling: A message catalog in @code{.dll} format supports the plural 10026handling function @code{GetPluralString}. Whereas @code{.resources} files can 10027only contain data and only support lookups that depend on a single string. 10028 10029@item 10030Context handling: A message catalog in @code{.dll} format supports the 10031query-with-context functions @code{GetParticularString} and 10032@code{GetParticularPluralString}. Whereas @code{.resources} files can 10033only contain data and only support lookups that depend on a single string. 10034 10035@item 10036The @code{GettextResourceManager} that loads the message catalogs in 10037@code{.dll} format also provides for inheritance on a per-message basis. 10038For example, in Austrian (@code{de_AT}) locale, translations from the German 10039(@code{de}) message catalog will be used for messages not found in the 10040Austrian message catalog. This has the consequence that the Austrian 10041translators need only translate those few messages for which the translation 10042into Austrian differs from the German one. Whereas when working with 10043@code{.resources} files, each message catalog must provide the translations 10044of all messages by itself. 10045 10046@item 10047The @code{GettextResourceManager} that loads the message catalogs in 10048@code{.dll} format also provides for a fallback: The English @var{msgid} is 10049returned when no translation can be found. Whereas when working with 10050@code{.resources} files, a language-neutral @code{.resources} file must 10051explicitly be provided as a fallback. 10052@end enumerate 10053 10054On the side of the programmatic APIs, the programmer can use either the 10055standard @code{ResourceManager} API and the GNU @code{GettextResourceManager} 10056API. The latter is an extension of the former, because 10057@code{GettextResourceManager} is a subclass of @code{ResourceManager}. 10058 10059@enumerate 10060@item 10061The @code{System.Resources.ResourceManager} API. 10062 10063This API works with resources in @code{.resources} format. 10064 10065The creation of the @code{ResourceManager} is done through 10066@smallexample 10067 new ResourceManager(domainname, Assembly.GetExecutingAssembly()) 10068@end smallexample 10069@noindent 10070 10071The @code{GetString} function returns a string's translation. Note that this 10072function returns null when a translation is missing (i.e.@: not even found in 10073the fallback resource file). 10074 10075@item 10076The @code{GNU.Gettext.GettextResourceManager} API. 10077 10078This API works with resources in @code{.dll} format. 10079 10080Reference documentation is in the 10081@uref{csharpdoc/index.html,csharpdoc directory}. 10082 10083The creation of the @code{ResourceManager} is done through 10084@smallexample 10085 new GettextResourceManager(domainname) 10086@end smallexample 10087 10088The @code{GetString} function returns a string's translation. Note that when 10089a translation is missing, the @var{msgid} argument is returned unchanged. 10090 10091The @code{GetPluralString} function returns a string translation with plural 10092handling, like the @code{ngettext} function in C. 10093 10094The @code{GetParticularString} function returns a string's translation, 10095specific to a particular context, like the @code{pgettext} function in C. 10096Note that when a translation is missing, the @var{msgid} argument is returned 10097unchanged. 10098 10099The @code{GetParticularPluralString} function returns a string translation, 10100specific to a particular context, with plural handling, like the 10101@code{npgettext} function in C. 10102 10103@cindex @code{libintl} for C# 10104To use this API, one needs the @code{GNU.Gettext.dll} file which is part of 10105the GNU gettext package and distributed under the LGPL. 10106@end enumerate 10107 10108You can also mix both approaches: use the 10109@code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use 10110only the @code{ResourceManager} type and only the @code{GetString} method. 10111This is appropriate when you want to profit from the tools for PO files, 10112but don't want to change an existing source code that uses 10113@code{ResourceManager} and don't (yet) need the @code{GetPluralString} method. 10114 10115Two examples, using the second API, are available in the @file{examples} 10116directory: @code{hello-csharp}, @code{hello-csharp-forms}. 10117 10118Now, to make use of the API and define a shorthand for @samp{GetString}, 10119there are two idioms that you can choose from: 10120 10121@itemize @bullet 10122@item 10123In a unique class of your project, say @samp{Util}, define a static variable 10124holding the @code{ResourceManager} instance: 10125 10126@smallexample 10127public static GettextResourceManager MyResourceManager = 10128 new GettextResourceManager("domain-name"); 10129@end smallexample 10130 10131All classes containing internationalized strings then contain 10132 10133@smallexample 10134private static GettextResourceManager Res = Util.MyResourceManager; 10135private static String _(String s) @{ return Res.GetString(s); @} 10136@end smallexample 10137 10138@noindent 10139and the shorthand is used like this: 10140 10141@smallexample 10142Console.WriteLine(_("Operation completed.")); 10143@end smallexample 10144 10145@item 10146You add a class with a very short name, say @samp{S}, containing just the 10147definition of the resource manager and of the shorthand: 10148 10149@smallexample 10150public class S @{ 10151 public static GettextResourceManager MyResourceManager = 10152 new GettextResourceManager("domain-name"); 10153 public static String _(String s) @{ 10154 return MyResourceManager.GetString(s); 10155 @} 10156@} 10157@end smallexample 10158 10159@noindent 10160and the shorthand is used like this: 10161 10162@smallexample 10163Console.WriteLine(S._("Operation completed.")); 10164@end smallexample 10165@end itemize 10166 10167Which of the two idioms you choose, will depend on whether copying two lines 10168of codes into every class is more acceptable in your project than a class 10169with a single-letter name. 10170 10171@node gawk, Pascal, C#, List of Programming Languages 10172@subsection GNU awk 10173@cindex awk 10174@cindex gawk 10175 10176@table @asis 10177@item RPMs 10178gawk 3.1 or newer 10179 10180@item File extension 10181@code{awk} 10182 10183@item String syntax 10184@code{"abc"} 10185 10186@item gettext shorthand 10187@code{_"abc"} 10188 10189@item gettext/ngettext functions 10190@code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0 10191 10192@item textdomain 10193@code{TEXTDOMAIN} variable 10194 10195@item bindtextdomain 10196@code{bindtextdomain} function 10197 10198@item setlocale 10199automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0 10200 10201@item Prerequisite 10202--- 10203 10204@item Use or emulate GNU gettext 10205use 10206 10207@item Extractor 10208@code{xgettext} 10209 10210@item Formatting with positions 10211@code{printf "%2$d %1$d"} (GNU awk only) 10212 10213@item Portability 10214On platforms without gettext, no translation. On non-GNU awks, you must 10215define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} 10216yourself. 10217 10218@item po-mode marking 10219--- 10220@end table 10221 10222An example is available in the @file{examples} directory: @code{hello-gawk}. 10223 10224@node Pascal, wxWidgets, gawk, List of Programming Languages 10225@subsection Pascal - Free Pascal Compiler 10226@cindex Pascal 10227@cindex Free Pascal 10228@cindex Object Pascal 10229 10230@table @asis 10231@item RPMs 10232fpk 10233 10234@item File extension 10235@code{pp}, @code{pas} 10236 10237@item String syntax 10238@code{'abc'} 10239 10240@item gettext shorthand 10241automatic 10242 10243@item gettext/ngettext functions 10244---, use @code{ResourceString} data type instead 10245 10246@item textdomain 10247---, use @code{TranslateResourceStrings} function instead 10248 10249@item bindtextdomain 10250---, use @code{TranslateResourceStrings} function instead 10251 10252@item setlocale 10253automatic, but uses only LANG, not LC_MESSAGES or LC_ALL 10254 10255@item Prerequisite 10256@code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;} 10257 10258@item Use or emulate GNU gettext 10259emulate partially 10260 10261@item Extractor 10262@code{ppc386} followed by @code{xgettext} or @code{rstconv} 10263 10264@item Formatting with positions 10265@code{uses sysutils;}@*@code{format "%1:d %0:d"} 10266 10267@item Portability 10268? 10269 10270@item po-mode marking 10271--- 10272@end table 10273 10274The Pascal compiler has special support for the @code{ResourceString} data 10275type. It generates a @code{.rst} file. This is then converted to a 10276@code{.pot} file by use of @code{xgettext} or @code{rstconv}. At runtime, 10277a @code{.mo} file corresponding to translations of this @code{.pot} file 10278can be loaded using the @code{TranslateResourceStrings} function in the 10279@code{gettext} unit. 10280 10281An example is available in the @file{examples} directory: @code{hello-pascal}. 10282 10283@node wxWidgets, YCP, Pascal, List of Programming Languages 10284@subsection wxWidgets library 10285@cindex @code{wxWidgets} library 10286 10287@table @asis 10288@item RPMs 10289wxGTK, gettext 10290 10291@item File extension 10292@code{cpp} 10293 10294@item String syntax 10295@code{"abc"} 10296 10297@item gettext shorthand 10298@code{_("abc")} 10299 10300@item gettext/ngettext functions 10301@code{wxLocale::GetString}, @code{wxGetTranslation} 10302 10303@item textdomain 10304@code{wxLocale::AddCatalog} 10305 10306@item bindtextdomain 10307@code{wxLocale::AddCatalogLookupPathPrefix} 10308 10309@item setlocale 10310@code{wxLocale::Init}, @code{wxSetLocale} 10311 10312@item Prerequisite 10313@code{#include <wx/intl.h>} 10314 10315@item Use or emulate GNU gettext 10316emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp} 10317 10318@item Extractor 10319@code{xgettext} 10320 10321@item Formatting with positions 10322wxString::Format supports positions if and only if the system has 10323@code{wprintf()}, @code{vswprintf()} functions and they support positions 10324according to POSIX. 10325 10326@item Portability 10327fully portable 10328 10329@item po-mode marking 10330yes 10331@end table 10332 10333@node YCP, Tcl, wxWidgets, List of Programming Languages 10334@subsection YCP - YaST2 scripting language 10335@cindex YCP 10336@cindex YaST2 scripting language 10337 10338@table @asis 10339@item RPMs 10340libycp, libycp-devel, yast2-core, yast2-core-devel 10341 10342@item File extension 10343@code{ycp} 10344 10345@item String syntax 10346@code{"abc"} 10347 10348@item gettext shorthand 10349@code{_("abc")} 10350 10351@item gettext/ngettext functions 10352@code{_()} with 1 or 3 arguments 10353 10354@item textdomain 10355@code{textdomain} statement 10356 10357@item bindtextdomain 10358--- 10359 10360@item setlocale 10361--- 10362 10363@item Prerequisite 10364--- 10365 10366@item Use or emulate GNU gettext 10367use 10368 10369@item Extractor 10370@code{xgettext} 10371 10372@item Formatting with positions 10373@code{sformat "%2 %1"} 10374 10375@item Portability 10376fully portable 10377 10378@item po-mode marking 10379--- 10380@end table 10381 10382An example is available in the @file{examples} directory: @code{hello-ycp}. 10383 10384@node Tcl, Perl, YCP, List of Programming Languages 10385@subsection Tcl - Tk's scripting language 10386@cindex Tcl 10387@cindex Tk's scripting language 10388 10389@table @asis 10390@item RPMs 10391tcl 10392 10393@item File extension 10394@code{tcl} 10395 10396@item String syntax 10397@code{"abc"} 10398 10399@item gettext shorthand 10400@code{[_ "abc"]} 10401 10402@item gettext/ngettext functions 10403@code{::msgcat::mc} 10404 10405@item textdomain 10406--- 10407 10408@item bindtextdomain 10409---, use @code{::msgcat::mcload} instead 10410 10411@item setlocale 10412automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL 10413 10414@item Prerequisite 10415@code{package require msgcat} 10416@*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}} 10417 10418@item Use or emulate GNU gettext 10419---, uses a Tcl specific message catalog format 10420 10421@item Extractor 10422@code{xgettext -k_} 10423 10424@item Formatting with positions 10425@code{format "%2\$d %1\$d"} 10426 10427@item Portability 10428fully portable 10429 10430@item po-mode marking 10431--- 10432@end table 10433 10434Two examples are available in the @file{examples} directory: 10435@code{hello-tcl}, @code{hello-tcl-tk}. 10436 10437Before marking strings as internationalizable, substitutions of variables 10438into the string need to be converted to @code{format} applications. For 10439example, @code{"file $filename not found"} becomes 10440@code{[format "file %s not found" $filename]}. 10441Only after this is done, can the strings be marked and extracted. 10442After marking, this example becomes 10443@code{[format [_ "file %s not found"] $filename]} or 10444@code{[msgcat::mc "file %s not found" $filename]}. Note that the 10445@code{msgcat::mc} function implicitly calls @code{format} when more than one 10446argument is given. 10447 10448@node Perl, PHP, Tcl, List of Programming Languages 10449@subsection Perl 10450@cindex Perl 10451 10452@table @asis 10453@item RPMs 10454perl 10455 10456@item File extension 10457@code{pl}, @code{PL}, @code{pm}, @code{cgi} 10458 10459@item String syntax 10460@itemize @bullet 10461 10462@item @code{"abc"} 10463 10464@item @code{'abc'} 10465 10466@item @code{qq (abc)} 10467 10468@item @code{q (abc)} 10469 10470@item @code{qr /abc/} 10471 10472@item @code{qx (/bin/date)} 10473 10474@item @code{/pattern match/} 10475 10476@item @code{?pattern match?} 10477 10478@item @code{s/substitution/operators/} 10479 10480@item @code{$tied_hash@{"message"@}} 10481 10482@item @code{$tied_hash_reference->@{"message"@}} 10483 10484@item etc., issue the command @samp{man perlsyn} for details 10485 10486@end itemize 10487 10488@item gettext shorthand 10489@code{__} (double underscore) 10490 10491@item gettext/ngettext functions 10492@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 10493@code{dngettext}, @code{dcngettext} 10494 10495@item textdomain 10496@code{textdomain} function 10497 10498@item bindtextdomain 10499@code{bindtextdomain} function 10500 10501@item bind_textdomain_codeset 10502@code{bind_textdomain_codeset} function 10503 10504@item setlocale 10505Use @code{setlocale (LC_ALL, "");} 10506 10507@item Prerequisite 10508@code{use POSIX;} 10509@*@code{use Locale::TextDomain;} (included in the package libintl-perl 10510which is available on the Comprehensive Perl Archive Network CPAN, 10511http://www.cpan.org/). 10512 10513@item Use or emulate GNU gettext 10514platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext 10515 10516@item Extractor 10517@code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k} 10518 10519@item Formatting with positions 10520Both kinds of format strings support formatting with positions. 10521@*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer) 10522@*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)} 10523 10524@item Portability 10525The @code{libintl-perl} package is platform independent but is not 10526part of the Perl core. The programmer is responsible for 10527providing a dummy implementation of the required functions if the 10528package is not installed on the target system. 10529 10530@item po-mode marking 10531--- 10532 10533@item Documentation 10534Included in @code{libintl-perl}, available on CPAN 10535(http://www.cpan.org/). 10536 10537@end table 10538 10539An example is available in the @file{examples} directory: @code{hello-perl}. 10540 10541@cindex marking Perl sources 10542 10543The @code{xgettext} parser backend for Perl differs significantly from 10544the parser backends for other programming languages, just as Perl 10545itself differs significantly from other programming languages. The 10546Perl parser backend offers many more string marking facilities than 10547the other backends but it also has some Perl specific limitations, the 10548worst probably being its imperfectness. 10549 10550@menu 10551* General Problems:: General Problems Parsing Perl Code 10552* Default Keywords:: Which Keywords Will xgettext Look For? 10553* Special Keywords:: How to Extract Hash Keys 10554* Quote-like Expressions:: What are Strings And Quote-like Expressions? 10555* Interpolation I:: Invalid String Interpolation 10556* Interpolation II:: Valid String Interpolation 10557* Parentheses:: When To Use Parentheses 10558* Long Lines:: How To Grok with Long Lines 10559* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work 10560@end menu 10561 10562@node General Problems, Default Keywords, , Perl 10563@subsubsection General Problems Parsing Perl Code 10564 10565It is often heard that only Perl can parse Perl. This is not true. 10566Perl cannot be @emph{parsed} at all, it can only be @emph{executed}. 10567Perl has various built-in ambiguities that can only be resolved at runtime. 10568 10569The following example may illustrate one common problem: 10570 10571@example 10572print gettext "Hello World!"; 10573@end example 10574 10575Although this example looks like a bullet-proof case of a function 10576invocation, it is not: 10577 10578@example 10579open gettext, ">testfile" or die; 10580print gettext "Hello world!" 10581@end example 10582 10583In this context, the string @code{gettext} looks more like a 10584file handle. But not necessarily: 10585 10586@example 10587use Locale::Messages qw (:libintl_h); 10588open gettext ">testfile" or die; 10589print gettext "Hello world!"; 10590@end example 10591 10592Now, the file is probably syntactically incorrect, provided that the module 10593@code{Locale::Messages} found first in the Perl include path exports a 10594function @code{gettext}. But what if the module 10595@code{Locale::Messages} really looks like this? 10596 10597@example 10598use vars qw (*gettext); 10599 106001; 10601@end example 10602 10603In this case, the string @code{gettext} will be interpreted as a file 10604handle again, and the above example will create a file @file{testfile} 10605and write the string ``Hello world!'' into it. Even advanced 10606control flow analysis will not really help: 10607 10608@example 10609if (0.5 < rand) @{ 10610 eval "use Sane"; 10611@} else @{ 10612 eval "use InSane"; 10613@} 10614print gettext "Hello world!"; 10615@end example 10616 10617If the module @code{Sane} exports a function @code{gettext} that does 10618what we expect, and the module @code{InSane} opens a file for writing 10619and associates the @emph{handle} @code{gettext} with this output 10620stream, we are clueless again about what will happen at runtime. It is 10621completely unpredictable. The truth is that Perl has so many ways to 10622fill its symbol table at runtime that it is impossible to interpret a 10623particular piece of code without executing it. 10624 10625Of course, @code{xgettext} will not execute your Perl sources while 10626scanning for translatable strings, but rather use heuristics in order 10627to guess what you meant. 10628 10629Another problem is the ambiguity of the slash and the question mark. 10630Their interpretation depends on the context: 10631 10632@example 10633# A pattern match. 10634print "OK\n" if /foobar/; 10635 10636# A division. 10637print 1 / 2; 10638 10639# Another pattern match. 10640print "OK\n" if ?foobar?; 10641 10642# Conditional. 10643print $x ? "foo" : "bar"; 10644@end example 10645 10646The slash may either act as the division operator or introduce a 10647pattern match, whereas the question mark may act as the ternary 10648conditional operator or as a pattern match, too. Other programming 10649languages like @code{awk} present similar problems, but the consequences of a 10650misinterpretation are particularly nasty with Perl sources. In @code{awk} 10651for instance, a statement can never exceed one line and the parser 10652can recover from a parsing error at the next newline and interpret 10653the rest of the input stream correctly. Perl is different, as a 10654pattern match is terminated by the next appearance of the delimiter 10655(the slash or the question mark) in the input stream, regardless of 10656the semantic context. If a slash is really a division sign but 10657mis-interpreted as a pattern match, the rest of the input file is most 10658probably parsed incorrectly. 10659 10660If you find that @code{xgettext} fails to extract strings from 10661portions of your sources, you should therefore look out for slashes 10662and/or question marks preceding these sections. You may have come 10663across a bug in @code{xgettext}'s Perl parser (and of course you 10664should report that bug). In the meantime you should consider to 10665reformulate your code in a manner less challenging to @code{xgettext}. 10666 10667@node Default Keywords, Special Keywords, General Problems, Perl 10668@subsubsection Which keywords will xgettext look for? 10669@cindex Perl default keywords 10670 10671Unless you instruct @code{xgettext} otherwise by invoking it with one 10672of the options @code{--keyword} or @code{-k}, it will recognize the 10673following keywords in your Perl sources: 10674 10675@itemize @bullet 10676 10677@item @code{gettext} 10678 10679@item @code{dgettext} 10680 10681@item @code{dcgettext} 10682 10683@item @code{ngettext:1,2} 10684 10685The first (singular) and the second (plural) argument will be 10686extracted. 10687 10688@item @code{dngettext:1,2} 10689 10690The first (singular) and the second (plural) argument will be 10691extracted. 10692 10693@item @code{dcngettext:1,2} 10694 10695The first (singular) and the second (plural) argument will be 10696extracted. 10697 10698@item @code{gettext_noop} 10699 10700@item @code{%gettext} 10701 10702The keys of lookups into the hash @code{%gettext} will be extracted. 10703 10704@item @code{$gettext} 10705 10706The keys of lookups into the hash reference @code{$gettext} will be extracted. 10707 10708@end itemize 10709 10710@node Special Keywords, Quote-like Expressions, Default Keywords, Perl 10711@subsubsection How to Extract Hash Keys 10712@cindex Perl special keywords for hash-lookups 10713 10714Translating messages at runtime is normally performed by looking up the 10715original string in the translation database and returning the 10716translated version. The ``natural'' Perl implementation is a hash 10717lookup, and, of course, @code{xgettext} supports such practice. 10718 10719@example 10720print __"Hello world!"; 10721print $__@{"Hello world!"@}; 10722print $__->@{"Hello world!"@}; 10723print $$__@{"Hello world!"@}; 10724@end example 10725 10726The above four lines all do the same thing. The Perl module 10727@code{Locale::TextDomain} exports by default a hash @code{%__} that 10728is tied to the function @code{__()}. It also exports a reference 10729@code{$__} to @code{%__}. 10730 10731If an argument to the @code{xgettext} option @code{--keyword}, 10732resp. @code{-k} starts with a percent sign, the rest of the keyword is 10733interpreted as the name of a hash. If it starts with a dollar 10734sign, the rest of the keyword is interpreted as a reference to a 10735hash. 10736 10737Note that you can omit the quotation marks (single or double) around 10738the hash key (almost) whenever Perl itself allows it: 10739 10740@example 10741print $gettext@{Error@}; 10742@end example 10743 10744The exact rule is: You can omit the surrounding quotes, when the hash 10745key is a valid C (!) identifier, i.e.@: when it starts with an 10746underscore or an ASCII letter and is followed by an arbitrary number 10747of underscores, ASCII letters or digits. Other Unicode characters 10748are @emph{not} allowed, regardless of the @code{use utf8} pragma. 10749 10750@node Quote-like Expressions, Interpolation I, Special Keywords, Perl 10751@subsubsection What are Strings And Quote-like Expressions? 10752@cindex Perl quote-like expressions 10753 10754Perl offers a plethora of different string constructs. Those that can 10755be used either as arguments to functions or inside braces for hash 10756lookups are generally supported by @code{xgettext}. 10757 10758@itemize @bullet 10759@item @strong{double-quoted strings} 10760@* 10761@example 10762print gettext "Hello World!"; 10763@end example 10764 10765@item @strong{single-quoted strings} 10766@* 10767@example 10768print gettext 'Hello World!'; 10769@end example 10770 10771@item @strong{the operator qq} 10772@* 10773@example 10774print gettext qq |Hello World!|; 10775print gettext qq <E-mail: <guido\@@imperia.net>>; 10776@end example 10777 10778The operator @code{qq} is fully supported. You can use arbitrary 10779delimiters, including the four bracketing delimiters (round, angle, 10780square, curly) that nest. 10781 10782@item @strong{the operator q} 10783@* 10784@example 10785print gettext q |Hello World!|; 10786print gettext q <E-mail: <guido@@imperia.net>>; 10787@end example 10788 10789The operator @code{q} is fully supported. You can use arbitrary 10790delimiters, including the four bracketing delimiters (round, angle, 10791square, curly) that nest. 10792 10793@item @strong{the operator qx} 10794@* 10795@example 10796print gettext qx ;LANGUAGE=C /bin/date; 10797print gettext qx [/usr/bin/ls | grep '^[A-Z]*']; 10798@end example 10799 10800The operator @code{qx} is fully supported. You can use arbitrary 10801delimiters, including the four bracketing delimiters (round, angle, 10802square, curly) that nest. 10803 10804The example is actually a useless use of @code{gettext}. It will 10805invoke the @code{gettext} function on the output of the command 10806specified with the @code{qx} operator. The feature was included 10807in order to make the interface consistent (the parser will extract 10808all strings and quote-like expressions). 10809 10810@item @strong{here documents} 10811@* 10812@example 10813@group 10814print gettext <<'EOF'; 10815program not found in $PATH 10816EOF 10817 10818print ngettext <<EOF, <<"EOF"; 10819one file deleted 10820EOF 10821several files deleted 10822EOF 10823@end group 10824@end example 10825 10826Here-documents are recognized. If the delimiter is enclosed in single 10827quotes, the string is not interpolated. If it is enclosed in double 10828quotes or has no quotes at all, the string is interpolated. 10829 10830Delimiters that start with a digit are not supported! 10831 10832@end itemize 10833 10834@node Interpolation I, Interpolation II, Quote-like Expressions, Perl 10835@subsubsection Invalid Uses Of String Interpolation 10836@cindex Perl invalid string interpolation 10837 10838Perl is capable of interpolating variables into strings. This offers 10839some nice features in localized programs but can also lead to 10840problems. 10841 10842A common error is a construct like the following: 10843 10844@example 10845print gettext "This is the program $0!\n"; 10846@end example 10847 10848Perl will interpolate at runtime the value of the variable @code{$0} 10849into the argument of the @code{gettext()} function. Hence, this 10850argument is not a string constant but a variable argument (@code{$0} 10851is a global variable that holds the name of the Perl script being 10852executed). The interpolation is performed by Perl before the string 10853argument is passed to @code{gettext()} and will therefore depend on 10854the name of the script which can only be determined at runtime. 10855Consequently, it is almost impossible that a translation can be looked 10856up at runtime (except if, by accident, the interpolated string is found 10857in the message catalog). 10858 10859The @code{xgettext} program will therefore terminate parsing with a fatal 10860error if it encounters a variable inside of an extracted string. In 10861general, this will happen for all kinds of string interpolations that 10862cannot be safely performed at compile time. If you absolutely know 10863what you are doing, you can always circumvent this behavior: 10864 10865@example 10866my $know_what_i_am_doing = "This is program $0!\n"; 10867print gettext $know_what_i_am_doing; 10868@end example 10869 10870Since the parser only recognizes strings and quote-like expressions, 10871but not variables or other terms, the above construct will be 10872accepted. You will have to find another way, however, to let your 10873original string make it into your message catalog. 10874 10875If invoked with the option @code{--extract-all}, resp. @code{-a}, 10876variable interpolation will be accepted. Rationale: You will 10877generally use this option in order to prepare your sources for 10878internationalization. 10879 10880Please see the manual page @samp{man perlop} for details of strings and 10881quote-like expressions that are subject to interpolation and those 10882that are not. Safe interpolations (that will not lead to a fatal 10883error) are: 10884 10885@itemize @bullet 10886 10887@item the escape sequences @code{\t} (tab, HT, TAB), @code{\n} 10888(newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF), 10889@code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e} 10890(escape, ESC). 10891 10892@item octal chars, like @code{\033} 10893@* 10894Note that octal escapes in the range of 400-777 are translated into a 10895UTF-8 representation, regardless of the presence of the @code{use utf8} pragma. 10896 10897@item hex chars, like @code{\x1b} 10898 10899@item wide hex chars, like @code{\x@{263a@}} 10900@* 10901Note that this escape is translated into a UTF-8 representation, 10902regardless of the presence of the @code{use utf8} pragma. 10903 10904@item control chars, like @code{\c[} (CTRL-[) 10905 10906@item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}} 10907@* 10908Note that this escape is translated into a UTF-8 representation, 10909regardless of the presence of the @code{use utf8} pragma. 10910@end itemize 10911 10912The following escapes are considered partially safe: 10913 10914@itemize @bullet 10915 10916@item @code{\l} lowercase next char 10917 10918@item @code{\u} uppercase next char 10919 10920@item @code{\L} lowercase till \E 10921 10922@item @code{\U} uppercase till \E 10923 10924@item @code{\E} end case modification 10925 10926@item @code{\Q} quote non-word characters till \E 10927 10928@end itemize 10929 10930These escapes are only considered safe if the string consists of 10931ASCII characters only. Translation of characters outside the range 10932defined by ASCII is locale-dependent and can actually only be performed 10933at runtime; @code{xgettext} doesn't do these locale-dependent translations 10934at extraction time. 10935 10936Except for the modifier @code{\Q}, these translations, albeit valid, 10937are generally useless and only obfuscate your sources. If a 10938translation can be safely performed at compile time you can just as 10939well write what you mean. 10940 10941@node Interpolation II, Parentheses, Interpolation I, Perl 10942@subsubsection Valid Uses Of String Interpolation 10943@cindex Perl valid string interpolation 10944 10945Perl is often used to generate sources for other programming languages 10946or arbitrary file formats. Web applications that output HTML code 10947make a prominent example for such usage. 10948 10949You will often come across situations where you want to intersperse 10950code written in the target (programming) language with translatable 10951messages, like in the following HTML example: 10952 10953@example 10954print gettext <<EOF; 10955<h1>My Homepage</h1> 10956<script language="JavaScript"><!-- 10957for (i = 0; i < 100; ++i) @{ 10958 alert ("Thank you so much for visiting my homepage!"); 10959@} 10960//--></script> 10961EOF 10962@end example 10963 10964The parser will extract the entire here document, and it will appear 10965entirely in the resulting PO file, including the JavaScript snippet 10966embedded in the HTML code. If you exaggerate with constructs like 10967the above, you will run the risk that the translators of your package 10968will look out for a less challenging project. You should consider an 10969alternative expression here: 10970 10971@example 10972print <<EOF; 10973<h1>$gettext@{"My Homepage"@}</h1> 10974<script language="JavaScript"><!-- 10975for (i = 0; i < 100; ++i) @{ 10976 alert ("$gettext@{'Thank you so much for visiting my homepage!'@}"); 10977@} 10978//--></script> 10979EOF 10980@end example 10981 10982Only the translatable portions of the code will be extracted here, and 10983the resulting PO file will begrudgingly improve in terms of readability. 10984 10985You can interpolate hash lookups in all strings or quote-like 10986expressions that are subject to interpolation (see the manual page 10987@samp{man perlop} for details). Double interpolation is invalid, however: 10988 10989@example 10990# TRANSLATORS: Replace "the earth" with the name of your planet. 10991print gettext qq@{Welcome to $gettext->@{"the earth"@}@}; 10992@end example 10993 10994The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in 10995the first place, and checked for invalid variable interpolation. The 10996dollar sign of hash-dereferencing will therefore terminate the parser 10997with an ``invalid interpolation'' error. 10998 10999It is valid to interpolate hash lookups in regular expressions: 11000 11001@example 11002if ($var =~ /$gettext@{"the earth"@}/) @{ 11003 print gettext "Match!\n"; 11004@} 11005s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g; 11006@end example 11007 11008@node Parentheses, Long Lines, Interpolation II, Perl 11009@subsubsection When To Use Parentheses 11010@cindex Perl parentheses 11011 11012In Perl, parentheses around function arguments are mostly optional. 11013@code{xgettext} will always assume that all 11014recognized keywords (except for hashes and hash references) are names 11015of properly prototyped functions, and will (hopefully) only require 11016parentheses where Perl itself requires them. All constructs in the 11017following example are therefore ok to use: 11018 11019@example 11020@group 11021print gettext ("Hello World!\n"); 11022print gettext "Hello World!\n"; 11023print dgettext ($package => "Hello World!\n"); 11024print dgettext $package, "Hello World!\n"; 11025 11026# The "fat comma" => turns the left-hand side argument into a 11027# single-quoted string! 11028print dgettext smellovision => "Hello World!\n"; 11029 11030# The following assignment only works with prototyped functions. 11031# Otherwise, the functions will act as "greedy" list operators and 11032# eat up all following arguments. 11033my $anonymous_hash = @{ 11034 planet => gettext "earth", 11035 cakes => ngettext "one cake", "several cakes", $n, 11036 still => $works, 11037@}; 11038# The same without fat comma: 11039my $other_hash = @{ 11040 'planet', gettext "earth", 11041 'cakes', ngettext "one cake", "several cakes", $n, 11042 'still', $works, 11043@}; 11044 11045# Parentheses are only significant for the first argument. 11046print dngettext 'package', ("one cake", "several cakes", $n), $discarded; 11047@end group 11048@end example 11049 11050@node Long Lines, Perl Pitfalls, Parentheses, Perl 11051@subsubsection How To Grok with Long Lines 11052@cindex Perl long lines 11053 11054The necessity of long messages can often lead to a cumbersome or 11055unreadable coding style. Perl has several options that may prevent 11056you from writing unreadable code, and 11057@code{xgettext} does its best to do likewise. This is where the dot 11058operator (the string concatenation operator) may come in handy: 11059 11060@example 11061@group 11062print gettext ("This is a very long" 11063 . " message that is still" 11064 . " readable, because" 11065 . " it is split into" 11066 . " multiple lines.\n"); 11067@end group 11068@end example 11069 11070Perl is smart enough to concatenate these constant string fragments 11071into one long string at compile time, and so is 11072@code{xgettext}. You will only find one long message in the resulting 11073POT file. 11074 11075Note that the future Perl 6 will probably use the underscore 11076(@samp{_}) as the string concatenation operator, and the dot 11077(@samp{.}) for dereferencing. This new syntax is not yet supported by 11078@code{xgettext}. 11079 11080If embedded newline characters are not an issue, or even desired, you 11081may also insert newline characters inside quoted strings wherever you 11082feel like it: 11083 11084@example 11085@group 11086print gettext ("<em>In HTML output 11087embedded newlines are generally no 11088problem, since adjacent whitespace 11089is always rendered into a single 11090space character.</em>"); 11091@end group 11092@end example 11093 11094You may also consider to use here documents: 11095 11096@example 11097@group 11098print gettext <<EOF; 11099<em>In HTML output 11100embedded newlines are generally no 11101problem, since adjacent whitespace 11102is always rendered into a single 11103space character.</em> 11104EOF 11105@end group 11106@end example 11107 11108Please do not forget that the line breaks are real, i.e.@: they 11109translate into newline characters that will consequently show up in 11110the resulting POT file. 11111 11112@node Perl Pitfalls, , Long Lines, Perl 11113@subsubsection Bugs, Pitfalls, And Things That Do Not Work 11114@cindex Perl pitfalls 11115 11116The foregoing sections should have proven that 11117@code{xgettext} is quite smart in extracting translatable strings from 11118Perl sources. Yet, some more or less exotic constructs that could be 11119expected to work, actually do not work. 11120 11121One of the more relevant limitations can be found in the 11122implementation of variable interpolation inside quoted strings. Only 11123simple hash lookups can be used there: 11124 11125@example 11126print <<EOF; 11127$gettext@{"The dot operator" 11128 . " does not work" 11129 . "here!"@} 11130Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@} 11131inside quoted strings or quote-like expressions. 11132EOF 11133@end example 11134 11135This is valid Perl code and will actually trigger invocations of the 11136@code{gettext} function at runtime. Yet, the Perl parser in 11137@code{xgettext} will fail to recognize the strings. A less obvious 11138example can be found in the interpolation of regular expressions: 11139 11140@example 11141s/<!--START_OF_WEEK-->/gettext ("Sunday")/e; 11142@end example 11143 11144The modifier @code{e} will cause the substitution to be interpreted as 11145an evaluable statement. Consequently, at runtime the function 11146@code{gettext()} is called, but again, the parser fails to extract the 11147string ``Sunday''. Use a temporary variable as a simple workaround if 11148you really happen to need this feature: 11149 11150@example 11151my $sunday = gettext "Sunday"; 11152s/<!--START_OF_WEEK-->/$sunday/; 11153@end example 11154 11155Hash slices would also be handy but are not recognized: 11156 11157@example 11158my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday', 11159 'Thursday', 'Friday', 'Saturday'@}; 11160# Or even: 11161@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday 11162 Friday Saturday) @}; 11163@end example 11164 11165This is perfectly valid usage of the tied hash @code{%gettext} but the 11166strings are not recognized and therefore will not be extracted. 11167 11168Another caveat of the current version is its rudimentary support for 11169non-ASCII characters in identifiers. You may encounter serious 11170problems if you use identifiers with characters outside the range of 11171'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'. 11172 11173Maybe some of these missing features will be implemented in future 11174versions, but since you can always make do without them at minimal effort, 11175these todos have very low priority. 11176 11177A nasty problem are brace format strings that already contain braces 11178as part of the normal text, for example the usage strings typically 11179encountered in programs: 11180 11181@example 11182die "usage: $0 @{OPTIONS@} FILENAME...\n"; 11183@end example 11184 11185If you want to internationalize this code with Perl brace format strings, 11186you will run into a problem: 11187 11188@example 11189die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0); 11190@end example 11191 11192Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}} 11193is not and should probably be translated. Yet, there is no way to teach 11194the Perl parser in @code{xgettext} to recognize the first one, and leave 11195the other one alone. 11196 11197There are two possible work-arounds for this problem. If you are 11198sure that your program will run under Perl 5.8.0 or newer (these 11199Perl versions handle positional parameters in @code{printf()}) or 11200if you are sure that the translator will not have to reorder the arguments 11201in her translation -- for example if you have only one brace placeholder 11202in your string, or if it describes a syntax, like in this one --, you can 11203mark the string as @code{no-perl-brace-format} and use @code{printf()}: 11204 11205@example 11206# xgettext: no-perl-brace-format 11207die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0); 11208@end example 11209 11210If you want to use the more portable Perl brace format, you will have to do 11211put placeholders in place of the literal braces: 11212 11213@example 11214die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n", 11215 program => $0, '[' => '@{', ']' => '@}'); 11216@end example 11217 11218Perl brace format strings know no escaping mechanism. No matter how this 11219escaping mechanism looked like, it would either give the programmer a 11220hard time, make translating Perl brace format strings heavy-going, or 11221result in a performance penalty at runtime, when the format directives 11222get executed. Most of the time you will happily get along with 11223@code{printf()} for this special case. 11224 11225@node PHP, Pike, Perl, List of Programming Languages 11226@subsection PHP Hypertext Preprocessor 11227@cindex PHP 11228 11229@table @asis 11230@item RPMs 11231mod_php4, mod_php4-core, phpdoc 11232 11233@item File extension 11234@code{php}, @code{php3}, @code{php4} 11235 11236@item String syntax 11237@code{"abc"}, @code{'abc'} 11238 11239@item gettext shorthand 11240@code{_("abc")} 11241 11242@item gettext/ngettext functions 11243@code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0 11244also @code{ngettext}, @code{dngettext}, @code{dcngettext} 11245 11246@item textdomain 11247@code{textdomain} function 11248 11249@item bindtextdomain 11250@code{bindtextdomain} function 11251 11252@item setlocale 11253Programmer must call @code{setlocale (LC_ALL, "")} 11254 11255@item Prerequisite 11256--- 11257 11258@item Use or emulate GNU gettext 11259use 11260 11261@item Extractor 11262@code{xgettext} 11263 11264@item Formatting with positions 11265@code{printf "%2\$d %1\$d"} 11266 11267@item Portability 11268On platforms without gettext, the functions are not available. 11269 11270@item po-mode marking 11271--- 11272@end table 11273 11274An example is available in the @file{examples} directory: @code{hello-php}. 11275 11276@node Pike, GCC-source, PHP, List of Programming Languages 11277@subsection Pike 11278@cindex Pike 11279 11280@table @asis 11281@item RPMs 11282roxen 11283 11284@item File extension 11285@code{pike} 11286 11287@item String syntax 11288@code{"abc"} 11289 11290@item gettext shorthand 11291--- 11292 11293@item gettext/ngettext functions 11294@code{gettext}, @code{dgettext}, @code{dcgettext} 11295 11296@item textdomain 11297@code{textdomain} function 11298 11299@item bindtextdomain 11300@code{bindtextdomain} function 11301 11302@item setlocale 11303@code{setlocale} function 11304 11305@item Prerequisite 11306@code{import Locale.Gettext;} 11307 11308@item Use or emulate GNU gettext 11309use 11310 11311@item Extractor 11312--- 11313 11314@item Formatting with positions 11315--- 11316 11317@item Portability 11318On platforms without gettext, the functions are not available. 11319 11320@item po-mode marking 11321--- 11322@end table 11323 11324@node GCC-source, , Pike, List of Programming Languages 11325@subsection GNU Compiler Collection sources 11326@cindex GCC-source 11327 11328@table @asis 11329@item RPMs 11330gcc 11331 11332@item File extension 11333@code{c}, @code{h}. 11334 11335@item String syntax 11336@code{"abc"} 11337 11338@item gettext shorthand 11339@code{_("abc")} 11340 11341@item gettext/ngettext functions 11342@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 11343@code{dngettext}, @code{dcngettext} 11344 11345@item textdomain 11346@code{textdomain} function 11347 11348@item bindtextdomain 11349@code{bindtextdomain} function 11350 11351@item setlocale 11352Programmer must call @code{setlocale (LC_ALL, "")} 11353 11354@item Prerequisite 11355@code{#include "intl.h"} 11356 11357@item Use or emulate GNU gettext 11358Use 11359 11360@item Extractor 11361@code{xgettext -k_} 11362 11363@item Formatting with positions 11364--- 11365 11366@item Portability 11367Uses autoconf macros 11368 11369@item po-mode marking 11370yes 11371@end table 11372 11373@c This is the template for new languages. 11374@ignore 11375 11376@ node 11377@ subsection 11378 11379@table @asis 11380@item RPMs 11381 11382@item File extension 11383 11384@item String syntax 11385 11386@item gettext shorthand 11387 11388@item gettext/ngettext functions 11389 11390@item textdomain 11391 11392@item bindtextdomain 11393 11394@item setlocale 11395 11396@item Prerequisite 11397 11398@item Use or emulate GNU gettext 11399 11400@item Extractor 11401 11402@item Formatting with positions 11403 11404@item Portability 11405 11406@item po-mode marking 11407@end table 11408 11409@end ignore 11410 11411@node List of Data Formats, , List of Programming Languages, Programming Languages 11412@section Internationalizable Data 11413 11414Here is a list of other data formats which can be internationalized 11415using GNU gettext. 11416 11417@menu 11418* POT:: POT - Portable Object Template 11419* RST:: Resource String Table 11420* Glade:: Glade - GNOME user interface description 11421@end menu 11422 11423@node POT, RST, List of Data Formats, List of Data Formats 11424@subsection POT - Portable Object Template 11425 11426@table @asis 11427@item RPMs 11428gettext 11429 11430@item File extension 11431@code{pot}, @code{po} 11432 11433@item Extractor 11434@code{xgettext} 11435@end table 11436 11437@node RST, Glade, POT, List of Data Formats 11438@subsection Resource String Table 11439@cindex RST 11440 11441@table @asis 11442@item RPMs 11443fpk 11444 11445@item File extension 11446@code{rst} 11447 11448@item Extractor 11449@code{xgettext}, @code{rstconv} 11450@end table 11451 11452@node Glade, , RST, List of Data Formats 11453@subsection Glade - GNOME user interface description 11454 11455@table @asis 11456@item RPMs 11457glade, libglade, glade2, libglade2, intltool 11458 11459@item File extension 11460@code{glade}, @code{glade2} 11461 11462@item Extractor 11463@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract} 11464@end table 11465 11466@c This is the template for new data formats. 11467@ignore 11468 11469@ node 11470@ subsection 11471 11472@table @asis 11473@item RPMs 11474 11475@item File extension 11476 11477@item Extractor 11478@end table 11479 11480@end ignore 11481 11482@node Conclusion, Language Codes, Programming Languages, Top 11483@chapter Concluding Remarks 11484 11485We would like to conclude this GNU @code{gettext} manual by presenting 11486an history of the Translation Project so far. We finally give 11487a few pointers for those who want to do further research or readings 11488about Native Language Support matters. 11489 11490@menu 11491* History:: History of GNU @code{gettext} 11492* References:: Related Readings 11493@end menu 11494 11495@node History, References, Conclusion, Conclusion 11496@section History of GNU @code{gettext} 11497@cindex history of GNU @code{gettext} 11498 11499Internationalization concerns and algorithms have been informally 11500and casually discussed for years in GNU, sometimes around GNU 11501@code{libc}, maybe around the incoming @code{Hurd}, or otherwise 11502(nobody clearly remembers). And even then, when the work started for 11503real, this was somewhat independently of these previous discussions. 11504 11505This all began in July 1994, when Patrick D'Cruze had the idea and 11506initiative of internationalizing version 3.9.2 of GNU @code{fileutils}. 11507He then asked Jim Meyering, the maintainer, how to get those changes 11508folded into an official release. That first draft was full of 11509@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find 11510nicer ways. Patrick and Jim shared some tries and experimentations 11511in this area. Then, feeling that this might eventually have a deeper 11512impact on GNU, Jim wanted to know what standards were, and contacted 11513Richard Stallman, who very quickly and verbally described an overall 11514design for what was meant to become @code{glocale}, at that time. 11515 11516Jim implemented @code{glocale} and got a lot of exhausting feedback 11517from Patrick and Richard, of course, but also from Mitchum DSouza 11518(who wrote a @code{catgets}-like package), Roland McGrath, maybe David 11519MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and 11520pulling in various directions, not always compatible, to the extent 11521that after a couple of test releases, @code{glocale} was torn apart. 11522In particular, Paul Eggert -- always keeping an eye on developments 11523in Solaris -- advocated the use of the @code{gettext} API over 11524@code{glocale}'s @code{catgets}-based API. 11525 11526While Jim took some distance and time and became dad for a second 11527time, Roland wanted to get GNU @code{libc} internationalized, and 11528got Ulrich Drepper involved in that project. Instead of starting 11529from @code{glocale}, Ulrich rewrote something from scratch, but 11530more conforming to the set of guidelines who emerged out of the 11531@code{glocale} effort. Then, Ulrich got people from the previous 11532forum to involve themselves into this new project, and the switch 11533from @code{glocale} to what was first named @code{msgutils}, renamed 11534@code{nlsutils}, and later @code{gettext}, became officially accepted 11535by Richard in May 1995 or so. 11536 11537Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext} 11538in April 1995. The first official release of the package, including 11539PO mode, occurred in July 1995, and was numbered 0.7. Other people 11540contributed to the effort by providing a discussion forum around 11541Ulrich, writing little pieces of code, or testing. These are quoted 11542in the @code{THANKS} file which comes with the GNU @code{gettext} 11543distribution. 11544 11545While this was being done, Fran@,{c}ois adapted half a dozen of 11546GNU packages to @code{glocale} first, then later to @code{gettext}, 11547putting them in pretest, so providing along the way an effective 11548user environment for fine tuning the evolving tools. He also took 11549the responsibility of organizing and coordinating the Translation 11550Project. After nearly a year of informal exchanges between people from 11551many countries, translator teams started to exist in May 1995, through 11552the creation and support by Patrick D'Cruze of twenty unmoderated 11553mailing lists for that many native languages, and two moderated 11554lists: one for reaching all teams at once, the other for reaching 11555all willing maintainers of internationalized free software packages. 11556 11557Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration 11558of Greg McGary, as a kind of contribution to Ulrich's package. 11559He also gave a hand with the GNU @code{gettext} Texinfo manual. 11560 11561In 1997, Ulrich Drepper released the GNU libc 2.0, which included the 11562@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions. 11563 11564In 2000, Ulrich Drepper added plural form handling (the @code{ngettext} 11565function) to GNU libc. Later, in 2001, he released GNU libc 2.2.x, 11566which is the first free C library with full internationalization support. 11567 11568Ulrich being quite busy in his role of General Maintainer of GNU libc, 11569he handed over the GNU @code{gettext} maintenance to Bruno Haible in 115702000. Bruno added the plural form handling to the tools as well, added 11571support for UTF-8 and CJK locales, and wrote a few new tools for 11572manipulating PO files. 11573 11574@node References, , History, Conclusion 11575@section Related Readings 11576@cindex related reading 11577@cindex bibliography 11578 11579@strong{ NOTE: } This documentation section is outdated and needs to be 11580revised. 11581 11582Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting 11583bibliography on internationalization matters, called 11584@cite{Internationalization Reference List}, which is available as: 11585@example 11586ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt 11587@end example 11588 11589Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a 11590Frequently Asked Questions (FAQ) list, entitled @cite{Programming for 11591Internationalisation}. This FAQ discusses writing programs which 11592can handle different language conventions, character sets, etc.; 11593and is applicable to all character set encodings, with particular 11594emphasis on @w{ISO 8859-1}. It is regularly published in Usenet 11595groups @file{comp.unix.questions}, @file{comp.std.internat}, 11596@file{comp.software.international}, @file{comp.lang.c}, 11597@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers} 11598and @file{news.answers}. The home location of this document is: 11599@example 11600ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming 11601@end example 11602 11603Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS 11604matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took 11605over the responsibility of maintaining it. It may be found as: 11606@example 11607ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/... 11608 ...locale-tutorial-0.8.txt.gz 11609@end example 11610@noindent 11611This site is mirrored in: 11612@example 11613ftp://ftp.ibp.fr/pub/linux/sunsite/ 11614@end example 11615 11616A French version of the same tutorial should be findable at: 11617@example 11618ftp://ftp.ibp.fr/pub/linux/french/docs/ 11619@end example 11620@noindent 11621together with French translations of many Linux-related documents. 11622 11623@node Language Codes, Country Codes, Conclusion, Top 11624@appendix Language Codes 11625@cindex language codes 11626@cindex ISO 639 11627 11628The @w{ISO 639} standard defines two-letter codes for many languages, and 11629three-letter codes for more rarely used languages. 11630All abbreviations for languages used in the Translation Project should 11631come from this standard. 11632 11633@menu 11634* Usual Language Codes:: Two-letter ISO 639 language codes 11635* Rare Language Codes:: Three-letter ISO 639 language codes 11636@end menu 11637 11638@node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes 11639@appendixsec Usual Language Codes 11640 11641For the commonly used languages, the @w{ISO 639-1} standard defines two-letter 11642codes. 11643 11644@table @samp 11645@include iso-639.texi 11646@end table 11647 11648@node Rare Language Codes, , Usual Language Codes, Language Codes 11649@appendixsec Rare Language Codes 11650 11651For rarely used languages, the @w{ISO 639-2} standard defines three-letter 11652codes. Here is the current list, reduced to only living languages with at least 11653one million of speakers. 11654 11655@table @samp 11656@include iso-639-2.texi 11657@end table 11658 11659@node Country Codes, Licenses, Language Codes, Top 11660@appendix Country Codes 11661@cindex country codes 11662@cindex ISO 3166 11663 11664The @w{ISO 3166} standard defines two character codes for many countries 11665and territories. All abbreviations for countries used in the Translation 11666Project should come from this standard. 11667 11668@table @samp 11669@include iso-3166.texi 11670@end table 11671 11672@node Licenses, Program Index, Country Codes, Top 11673@appendix Licenses 11674@cindex Licenses 11675 11676The files of this package are covered by the licenses indicated in each 11677particular file or directory. Here is a summary: 11678 11679@itemize @bullet 11680@item 11681The @code{libintl} and @code{libasprintf} libraries are covered by the 11682GNU Library General Public License (LGPL). 11683A copy of the license is included in @ref{GNU LGPL}. 11684 11685@item 11686The executable programs of this package and the @code{libgettextpo} library 11687are covered by the GNU General Public License (GPL). 11688A copy of the license is included in @ref{GNU GPL}. 11689 11690@item 11691This manual is free documentation. It is dually licensed under the 11692GNU FDL and the GNU GPL. This means that you can redistribute this 11693manual under either of these two licenses, at your choice. 11694@* 11695This manual is covered by the GNU FDL. Permission is granted to copy, 11696distribute and/or modify this document under the terms of the 11697GNU Free Documentation License (FDL), either version 1.2 of the 11698License, or (at your option) any later version published by the 11699Free Software Foundation (FSF); with no Invariant Sections, with no 11700Front-Cover Text, and with no Back-Cover Texts. 11701A copy of the license is included in @ref{GNU FDL}. 11702@* 11703This manual is covered by the GNU GPL. You can redistribute it and/or 11704modify it under the terms of the GNU General Public License (GPL), either 11705version 2 of the License, or (at your option) any later version published 11706by the Free Software Foundation (FSF). 11707A copy of the license is included in @ref{GNU GPL}. 11708@end itemize 11709 11710@menu 11711* GNU GPL:: GNU General Public License 11712* GNU LGPL:: GNU Lesser General Public License 11713* GNU FDL:: GNU Free Documentation License 11714@end menu 11715 11716@page 11717@include gpl.texi 11718@page 11719@include lgpl.texi 11720@page 11721@include fdl.texi 11722 11723@node Program Index, Option Index, Licenses, Top 11724@unnumbered Program Index 11725 11726@printindex pg 11727 11728@node Option Index, Variable Index, Program Index, Top 11729@unnumbered Option Index 11730 11731@printindex op 11732 11733@node Variable Index, PO Mode Index, Option Index, Top 11734@unnumbered Variable Index 11735 11736@printindex vr 11737 11738@node PO Mode Index, Autoconf Macro Index, Variable Index, Top 11739@unnumbered PO Mode Index 11740 11741@printindex em 11742 11743@node Autoconf Macro Index, Index, PO Mode Index, Top 11744@unnumbered Autoconf Macro Index 11745 11746@printindex am 11747 11748@node Index, , Autoconf Macro Index, Top 11749@unnumbered General Index 11750 11751@printindex cp 11752 11753@iftex 11754@c Table of Contents 11755@contents 11756@end iftex 11757 11758@bye 11759 11760@c Local variables: 11761@c texinfo-column-for-description: 32 11762@c End: 11763