1<HTML>
2<HEAD>
3<!-- This HTML file has been created by texi2html 1.52b
4     from gettext.texi on 29 December 2011 -->
5
6<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7<TITLE>GNU gettext utilities - 3  The Format of PO Files</TITLE>
8</HEAD>
9<BODY>
10Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
11<P><HR><P>
12
13
14<H1><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3  The Format of PO Files</A></H1>
15<P>
16<A NAME="IDX55"></A>
17<A NAME="IDX56"></A>
18
19</P>
20<P>
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
produce, update and use translation files, mainly PO files, which are
textual, editable files.  This chapter explains the format of PO files.
25
26</P>
27<P>
28A PO file is made up of many entries, each entry holding the relation
29between an original untranslated string and its corresponding
30translation.  All entries in a given PO file usually pertain
31to a single project, and all translations are expressed in a single
32target language.  One PO file <EM>entry</EM> has the following schematic
33structure:
34
35</P>
36
<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>
47
48<P>
49The general structure of a PO file should be well understood by
50the translator.  When using PO mode, very little has to be known
51about the format details, as PO mode takes care of them for her.
52
53</P>
54<P>
55A simple entry can look like this:
56
57</P>
58
<PRE>
#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"
</PRE>
64
65<P>
66<A NAME="IDX57"></A>
67<A NAME="IDX58"></A>
68<A NAME="IDX59"></A>
Entries begin with some optional white space.  Usually, when generated
through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character <CODE>#</CODE>.  There are two kinds of comments.  Those which have
some white space immediately following the <CODE>#</CODE> are the
<VAR>translator comments</VAR>, created and maintained exclusively by the
translator.  Those which have some non-white character just after the
<CODE>#</CODE> are the <VAR>automatic comments</VAR>, created and
maintained automatically by GNU <CODE>gettext</CODE> tools.  Comment lines
starting with <CODE>#.</CODE> contain comments given by the programmer, directed
at the translator; these comments are called <VAR>extracted comments</VAR>
because the <CODE>xgettext</CODE> program extracts them from the program's
source code.  Comment lines starting with <CODE>#:</CODE> contain references to
the program's source code.  Comment lines starting with <CODE>#,</CODE> contain
flags; more about these below.  Comment lines starting with <CODE>#|</CODE>
contain the previous untranslated string for which the translator gave
a translation.
86
87</P>
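<P>
The prefix rules above can be sketched in a few lines of Python.  This is
only an illustration, not part of the GNU <CODE>gettext</CODE> tools: the
function name <CODE>classify_comment</CODE> and the returned labels are
hypothetical, and treating a bare <CODE>#</CODE> line as a translator
comment is an assumption.
</P>

```python
def classify_comment(line):
    """Return the kind of a PO comment line, based on the character after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line")
    marker = line[1:2]  # empty string when the line is just "#"
    # Assumption: a bare "#" is treated like a translator comment.
    if marker == "" or marker.isspace():
        return "translator"
    # Automatic comments are told apart by the character right after '#'.
    kinds = {".": "extracted", ":": "reference", ",": "flag", "|": "previous"}
    return kinds.get(marker, "other-automatic")


if __name__ == "__main__":
    for sample in ("# checked twice", "#. shown in a dialog", "#: lib/error.c:116"):
        print(sample, "->", classify_comment(sample))
```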
88<P>
89All comments, of either kind, are optional.
90
91</P>
92<P>
93<A NAME="IDX60"></A>
94<A NAME="IDX61"></A>
95After white space and comments, entries show two strings, namely
96first the untranslated string as it appears in the original program
97sources, and then, the translation of this string.  The original
98string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
99by <CODE>msgstr</CODE>.  The two strings, untranslated and translated,
100are quoted in various ways in the PO file, using <CODE>"</CODE>
101delimiters and <CODE>\</CODE> escapes, but the translator does not really
102have to pay attention to the precise quoting format, as PO mode fully
103takes care of quoting for her.
104
105</P>
106<P>
107The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
108and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these.  The most she can
do is delete them, and only by deleting the whole entry.
111On the other hand, the <CODE>msgstr</CODE> string, as well as translator
112comments, are really meant for the translator, and PO mode gives her
113the full control she needs.
114
115</P>
116<P>
The comment lines beginning with <CODE>#,</CODE> are special because they are
not completely ignored by the programs as comments generally are.  The
comma-separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>
program to give the user better diagnostic messages.  Currently
there are two forms of flags defined:
122
123</P>
124<DL COMPACT>
125
126<DT><CODE>fuzzy</CODE>
127<DD>
128<A NAME="IDX62"></A>
129This flag can be generated by the <CODE>msgmerge</CODE> program or it can be
130inserted by the translator herself.  It shows that the <CODE>msgstr</CODE>
131string might not be a correct translation (anymore).  Only the translator
132can judge if the translation requires further modification, or is
acceptable as is.  Once satisfied with the translation, she then removes
this <CODE>fuzzy</CODE> attribute.  The <CODE>msgmerge</CODE> program inserts this
flag when it has paired a <CODE>msgid</CODE> with a <CODE>msgstr</CODE> only by
fuzzy search.  See section <A HREF="gettext_8.html#SEC64">8.3.6  Fuzzy Entries</A>.
137
138<DT><CODE>c-format</CODE>
139<DD>
140<A NAME="IDX63"></A>
141<DT><CODE>no-c-format</CODE>
142<DD>
143<A NAME="IDX64"></A>
These flags should not be added by a human.  Instead only the
<CODE>xgettext</CODE> program adds them.  In an automated PO file processing
system as proposed here, any changes made by the user would be thrown away
again as soon as the <CODE>xgettext</CODE> program generates a new template file.

The <CODE>c-format</CODE> flag indicates that the untranslated string and the
translation are supposed to be C format strings.  The <CODE>no-c-format</CODE>
flag indicates that they are not C format strings, even though the untranslated
string happens to look like a C format string (with <SAMP>&lsquo;%&rsquo;</SAMP> directives).

If the <CODE>c-format</CODE> flag is given for a string, the <CODE>msgfmt</CODE>
program performs additional tests to check the validity of the translation.
156See section <A HREF="gettext_10.html#SEC157">10.1  Invoking the <CODE>msgfmt</CODE> Program</A>, section <A HREF="gettext_4.html#SEC22">4.6  Special Comments preceding Keywords</A> and section <A HREF="gettext_15.html#SEC248">15.3.1  C Format Strings</A>.
157
158<DT><CODE>objc-format</CODE>
159<DD>
160<A NAME="IDX65"></A>
161<DT><CODE>no-objc-format</CODE>
162<DD>
163<A NAME="IDX66"></A>
164Likewise for Objective C, see section <A HREF="gettext_15.html#SEC249">15.3.2  Objective C Format Strings</A>.
165
166<DT><CODE>sh-format</CODE>
167<DD>
168<A NAME="IDX67"></A>
169<DT><CODE>no-sh-format</CODE>
170<DD>
171<A NAME="IDX68"></A>
172Likewise for Shell, see section <A HREF="gettext_15.html#SEC250">15.3.3  Shell Format Strings</A>.
173
174<DT><CODE>python-format</CODE>
175<DD>
176<A NAME="IDX69"></A>
177<DT><CODE>no-python-format</CODE>
178<DD>
179<A NAME="IDX70"></A>
180Likewise for Python, see section <A HREF="gettext_15.html#SEC251">15.3.4  Python Format Strings</A>.
181
182<DT><CODE>lisp-format</CODE>
183<DD>
184<A NAME="IDX71"></A>
185<DT><CODE>no-lisp-format</CODE>
186<DD>
187<A NAME="IDX72"></A>
188Likewise for Lisp, see section <A HREF="gettext_15.html#SEC252">15.3.5  Lisp Format Strings</A>.
189
190<DT><CODE>elisp-format</CODE>
191<DD>
192<A NAME="IDX73"></A>
193<DT><CODE>no-elisp-format</CODE>
194<DD>
195<A NAME="IDX74"></A>
196Likewise for Emacs Lisp, see section <A HREF="gettext_15.html#SEC253">15.3.6  Emacs Lisp Format Strings</A>.
197
198<DT><CODE>librep-format</CODE>
199<DD>
200<A NAME="IDX75"></A>
201<DT><CODE>no-librep-format</CODE>
202<DD>
203<A NAME="IDX76"></A>
204Likewise for librep, see section <A HREF="gettext_15.html#SEC254">15.3.7  librep Format Strings</A>.
205
206<DT><CODE>scheme-format</CODE>
207<DD>
208<A NAME="IDX77"></A>
209<DT><CODE>no-scheme-format</CODE>
210<DD>
211<A NAME="IDX78"></A>
212Likewise for Scheme, see section <A HREF="gettext_15.html#SEC255">15.3.8  Scheme Format Strings</A>.
213
214<DT><CODE>smalltalk-format</CODE>
215<DD>
216<A NAME="IDX79"></A>
217<DT><CODE>no-smalltalk-format</CODE>
218<DD>
219<A NAME="IDX80"></A>
220Likewise for Smalltalk, see section <A HREF="gettext_15.html#SEC256">15.3.9  Smalltalk Format Strings</A>.
221
222<DT><CODE>java-format</CODE>
223<DD>
224<A NAME="IDX81"></A>
225<DT><CODE>no-java-format</CODE>
226<DD>
227<A NAME="IDX82"></A>
228Likewise for Java, see section <A HREF="gettext_15.html#SEC257">15.3.10  Java Format Strings</A>.
229
230<DT><CODE>csharp-format</CODE>
231<DD>
232<A NAME="IDX83"></A>
233<DT><CODE>no-csharp-format</CODE>
234<DD>
235<A NAME="IDX84"></A>
236Likewise for C#, see section <A HREF="gettext_15.html#SEC258">15.3.11  C# Format Strings</A>.
237
238<DT><CODE>awk-format</CODE>
239<DD>
240<A NAME="IDX85"></A>
241<DT><CODE>no-awk-format</CODE>
242<DD>
243<A NAME="IDX86"></A>
244Likewise for awk, see section <A HREF="gettext_15.html#SEC259">15.3.12  awk Format Strings</A>.
245
246<DT><CODE>object-pascal-format</CODE>
247<DD>
248<A NAME="IDX87"></A>
249<DT><CODE>no-object-pascal-format</CODE>
250<DD>
251<A NAME="IDX88"></A>
252Likewise for Object Pascal, see section <A HREF="gettext_15.html#SEC260">15.3.13  Object Pascal Format Strings</A>.
253
254<DT><CODE>ycp-format</CODE>
255<DD>
256<A NAME="IDX89"></A>
257<DT><CODE>no-ycp-format</CODE>
258<DD>
259<A NAME="IDX90"></A>
260Likewise for YCP, see section <A HREF="gettext_15.html#SEC261">15.3.14  YCP Format Strings</A>.
261
262<DT><CODE>tcl-format</CODE>
263<DD>
264<A NAME="IDX91"></A>
265<DT><CODE>no-tcl-format</CODE>
266<DD>
267<A NAME="IDX92"></A>
268Likewise for Tcl, see section <A HREF="gettext_15.html#SEC262">15.3.15  Tcl Format Strings</A>.
269
270<DT><CODE>perl-format</CODE>
271<DD>
272<A NAME="IDX93"></A>
273<DT><CODE>no-perl-format</CODE>
274<DD>
275<A NAME="IDX94"></A>
276Likewise for Perl, see section <A HREF="gettext_15.html#SEC263">15.3.16  Perl Format Strings</A>.
277
278<DT><CODE>perl-brace-format</CODE>
279<DD>
280<A NAME="IDX95"></A>
281<DT><CODE>no-perl-brace-format</CODE>
282<DD>
283<A NAME="IDX96"></A>
284Likewise for Perl brace, see section <A HREF="gettext_15.html#SEC263">15.3.16  Perl Format Strings</A>.
285
286<DT><CODE>php-format</CODE>
287<DD>
288<A NAME="IDX97"></A>
289<DT><CODE>no-php-format</CODE>
290<DD>
291<A NAME="IDX98"></A>
292Likewise for PHP, see section <A HREF="gettext_15.html#SEC264">15.3.17  PHP Format Strings</A>.
293
294<DT><CODE>gcc-internal-format</CODE>
295<DD>
296<A NAME="IDX99"></A>
297<DT><CODE>no-gcc-internal-format</CODE>
298<DD>
299<A NAME="IDX100"></A>
300Likewise for the GCC sources, see section <A HREF="gettext_15.html#SEC265">15.3.18  GCC internal Format Strings</A>.
301
302<DT><CODE>qt-format</CODE>
303<DD>
304<A NAME="IDX101"></A>
305<DT><CODE>no-qt-format</CODE>
306<DD>
307<A NAME="IDX102"></A>
308Likewise for Qt, see section <A HREF="gettext_15.html#SEC266">15.3.19  Qt Format Strings</A>.
309
310<DT><CODE>kde-format</CODE>
311<DD>
312<A NAME="IDX103"></A>
313<DT><CODE>no-kde-format</CODE>
314<DD>
315<A NAME="IDX104"></A>
316Likewise for KDE, see section <A HREF="gettext_15.html#SEC267">15.3.20  KDE Format Strings</A>.
317
318<DT><CODE>boost-format</CODE>
319<DD>
320<A NAME="IDX105"></A>
321<DT><CODE>no-boost-format</CODE>
322<DD>
323<A NAME="IDX106"></A>
324Likewise for Boost, see section <A HREF="gettext_15.html#SEC268">15.3.21  Boost Format Strings</A>.
325
326</DL>
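<P>
As a rough illustration of how a <CODE>#,</CODE> line breaks down into
individual flags, here is a minimal Python sketch.  The helper name
<CODE>parse_flags</CODE> is hypothetical and is not something the GNU
<CODE>gettext</CODE> tools provide; it simply splits the comma-separated
list described above.
</P>

```python
def parse_flags(line):
    """Split a '#,' comment line into a list of flag names."""
    if not line.startswith("#,"):
        raise ValueError("not a flag line")
    # Flags are separated by commas; surrounding white space is not significant here.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


if __name__ == "__main__":
    flags = parse_flags("#, fuzzy, c-format")
    print(flags)
    # A translation marked fuzzy still needs review by the translator.
    print("needs review:", "fuzzy" in flags)
```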
327
328<P>
329<A NAME="IDX107"></A>
330<A NAME="IDX108"></A>
331It is also possible to have entries with a context specifier. They look like
332this:
333
334</P>
335
<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgctxt <VAR>previous-context</VAR>
#| msgid <VAR>previous-untranslated-string</VAR>
msgctxt <VAR>context</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>
348
349<P>
350The context serves to disambiguate messages with the same
351<VAR>untranslated-string</VAR>.  It is possible to have several entries with
352the same <VAR>untranslated-string</VAR> in a PO file, provided that they each
353have a different <VAR>context</VAR>.  Note that an empty <VAR>context</VAR> string
354and an absent <CODE>msgctxt</CODE> line do not mean the same thing.
355
356</P>
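<P>
One way to picture the role of the context is as part of the lookup key
for an entry.  The Python sketch below is an assumption made for
illustration only: the <CODE>entry_key</CODE> helper, the sample contexts
and the placeholder translations are all hypothetical.  It uses
<CODE>None</CODE> for an absent <CODE>msgctxt</CODE> line so that it stays
distinct from an empty context string.
</P>

```python
def entry_key(msgid, msgctxt=None):
    """Build a lookup key; None means 'no msgctxt line', '' means 'empty context'."""
    return (msgctxt, msgid)


# Three entries sharing the same untranslated string but differing in context.
catalog = {
    entry_key("Open"): "translation without context",
    entry_key("Open", ""): "translation with empty context",
    entry_key("Open", "menu"): "translation for the menu context",
}

if __name__ == "__main__":
    print(catalog[entry_key("Open", "menu")])
    # The absent and the empty context are different keys.
    print(entry_key("Open") == entry_key("Open", ""))
```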
357<P>
358<A NAME="IDX109"></A>
359<A NAME="IDX110"></A>
A different kind of entry is used for translations which involve
plural forms.
362
363</P>
364
<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string-singular</VAR>
#| msgid_plural <VAR>previous-untranslated-string-plural</VAR>
msgid <VAR>untranslated-string-singular</VAR>
msgid_plural <VAR>untranslated-string-plural</VAR>
msgstr[0] <VAR>translated-string-case-0</VAR>
...
msgstr[N] <VAR>translated-string-case-n</VAR>
</PRE>
379
380<P>
381Such an entry can look like this:
382
383</P>
384
<PRE>
#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
</PRE>
393
394<P>
395Here also, a <CODE>msgctxt</CODE> context can be specified before <CODE>msgid</CODE>,
396like above.
397
398</P>
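<P>
To make the role of the <CODE>msgstr[0]</CODE> and <CODE>msgstr[1]</CODE>
lines concrete, the following Python sketch picks one of the two
translations from the example entry for a given count.  The rule that maps
a count to an index is not described in this chapter (in a real PO file it
is declared in the header entry), so the two-form rule used here is an
assumption, and the names <CODE>plural_index</CODE> and
<CODE>format_plural</CODE> are hypothetical.
</P>

```python
# The two translated cases from the example entry, indexed like msgstr[0] and msgstr[1].
translations = [
    "s'ha trobat %d error fatal",
    "s'han trobat %d errors fatals",
]


def plural_index(n):
    """Assumed two-form rule: case 0 for exactly one item, case 1 otherwise."""
    return 0 if n == 1 else 1


def format_plural(n):
    """Select the translated case for n and fill in the %d directive."""
    return translations[plural_index(n)] % n


if __name__ == "__main__":
    for count in (1, 3):
        print(format_plural(count))
```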
399<P>
The <VAR>previous-untranslated-string</VAR> is optionally inserted by the
<CODE>msgmerge</CODE> program, at the same time as it marks a message fuzzy.
It helps the translator see which changes the developers made
to the <VAR>untranslated-string</VAR>.
404
405</P>
406<P>
Some lines, usually white space or comments, may follow the
very last entry of a PO file.  Such lines are not part of any entry;
they will be dropped when the PO file is processed by the tools, and they
may confuse some PO file editors.
411
412</P>
413<P>
The remainder of this section may be safely skipped by those using
a PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file.  On the other hand, those
wishing to modify PO files by hand should read on carefully.
418
419</P>
420<P>
421Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
422the C syntax for a character string, including the surrounding quotes
423and embedded backslashed escape sequences.  When the time comes
424to write multi-line strings, one should not use escaped newlines.
425Instead, a closing quote should follow the last character on the
426line to be continued, and an opening quote should resume the string
427at the beginning of the following PO file line.  For example:
428
429</P>
430
<PRE>
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
</PRE>
436
437<P>
In this example, the empty string is used on the first line, to
allow better alignment of the <CODE>H</CODE> from the word <SAMP>&lsquo;Here&rsquo;</SAMP>
over the <CODE>f</CODE> from the word <SAMP>&lsquo;for&rsquo;</SAMP>.  The
<CODE>msgid</CODE> keyword is followed by three strings, which are meant
to be concatenated.  Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the requirement that <CODE>msgid</CODE> be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner layout.  The empty string could have
been omitted, but only if the string starting with <SAMP>&lsquo;Here&rsquo;</SAMP> were
moved to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>&lsquo;\n&rsquo;</SAMP>; the switch could have occurred after <EM>any</EM>
other character.  We just did it this way because it is neater.
452
453</P>
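<P>
The concatenation of adjacent quoted strings can be sketched in Python as
follows.  This is a simplified illustration under stated assumptions: the
helper <CODE>join_po_strings</CODE> is hypothetical, and it borrows
Python's own string-literal rules (through <CODE>ast.literal_eval</CODE>)
as an approximation of the C escape syntax, which agrees for common
escapes such as <CODE>\n</CODE> but not in every case.
</P>

```python
import ast


def join_po_strings(lines):
    """Concatenate the quoted string pieces that follow a keyword such as msgid."""
    pieces = []
    for line in lines:
        text = line.strip()
        # Drop a leading keyword (msgid, msgstr, ...) when the line has one.
        if not text.startswith('"'):
            text = text.split(None, 1)[1]
        # Approximation: decode quotes and backslash escapes with Python's rules.
        pieces.append(ast.literal_eval(text))
    return "".join(pieces)


if __name__ == "__main__":
    entry = [
        'msgid ""',
        '"first line\\n"',
        '"second line\\n"',
    ]
    # Line breaks of the PO file vanish; only the escaped newlines survive.
    print(repr(join_po_strings(entry)))
```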
454<P>
455<A NAME="IDX111"></A>
One should carefully distinguish between end of lines marked as
<SAMP>&lsquo;\n&rsquo;</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
460
461</P>
462<P>
463<A NAME="IDX112"></A>
464Outside strings, white lines and comments may be used freely.
465Comments start at the beginning of a line with <SAMP>&lsquo;#&rsquo;</SAMP> and extend
466until the end of the PO file line.  Comments written by translators
467should have the initial <SAMP>&lsquo;#&rsquo;</SAMP> immediately followed by some white
468space.  If the <SAMP>&lsquo;#&rsquo;</SAMP> is not immediately followed by white space,
469this comment is most likely generated and managed by specialized GNU
470tools, and might disappear or be replaced unexpectedly when the PO
471file is given to <CODE>msgmerge</CODE>.
472
473</P>
476</BODY>
477</HTML>
478