# Time-stamp: "2004-01-11 18:35:34 AST"

=head1 NAME

Locale::Maketext - framework for localization

=head1 SYNOPSIS

  package MyProgram;
  use strict;
  use MyProgram::L10N;
   # ...which inherits from Locale::Maketext
  my $lh = MyProgram::L10N->get_handle() || die "What language?";
  ...
  # And then any messages your program emits, like:
  warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
  ...

=head1 DESCRIPTION

It is a common feature of applications (whether run directly,
or via the Web) for them to be "localized" -- i.e., for them
to present an English interface to an English-speaker, a German
interface to a German-speaker, and so on for all languages they're
programmed with.  Locale::Maketext
is a framework for software localization; it provides you with the
tools for organizing and accessing the bits of text and text-processing
code that you need for producing localized applications.

In order to make sense of Maketext and how all its
components fit together, you should probably
go read L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>, and
I<then> read the following documentation.

You may also want to read over the source for C<File::Findgrep>
and its constituent modules -- they are a complete (if small)
example application that uses Maketext.

=head1 QUICK OVERVIEW

The basic design of Locale::Maketext is object-oriented, and
Locale::Maketext is an abstract base class, from which you
derive a "project class".
The project class (with a name like "TkBocciBall::Localize",
which you then use in your module) is in turn the base class
for all the "language classes" for your project
(with names "TkBocciBall::Localize::it",
"TkBocciBall::Localize::en",
"TkBocciBall::Localize::fr", etc.).
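
To make that hierarchy concrete, here is a minimal sketch that puts a
project class and two language classes in a single file. The class names
are the illustrative ones used in this overview, and the one-entry
lexicons are my own assumption; a real project would normally keep each
class in its own .pm file.

```perl
use strict;
use warnings;

# Project class: hypothetical name, borrowed from the overview text.
package TkBocciBall::Localize;
use parent 'Locale::Maketext';

# Language classes: each derives from the project class and
# carries its own %Lexicon as class data.
package TkBocciBall::Localize::en;
use parent -norequire, 'TkBocciBall::Localize';
our %Lexicon = ( 'You won!' => 'You won!' );

package TkBocciBall::Localize::fr;
use parent -norequire, 'TkBocciBall::Localize';
our %Lexicon = ( 'You won!' => "Tu as gagn\x{E9}!" );

package main;

binmode STDOUT, ':encoding(UTF-8)';

# Ask for French explicitly, instead of relying on the environment.
my $lh = TkBocciBall::Localize->get_handle('fr')
  or die "Couldn't make a language handle";

my $text = $lh->maketext('You won!');
print ref($lh), " says: $text\n";
```

Passing a language-tag explicitly keeps the sketch independent of
whatever locale the machine happens to be set to.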

A language class is
a class containing a lexicon of phrases as class data,
and possibly also some methods that are of use in interpreting
phrases in the lexicon, or otherwise dealing with text in that
language.

An object belonging to a language class is called a "language
handle"; it's typically a flyweight object.

The normal course of action is to call:

  use TkBocciBall::Localize;  # the localization project class
  $lh = TkBocciBall::Localize->get_handle();
   # Depending on the user's locale, etc., this will
   # make a language handle from among the classes available,
   # and any defaults that you declare.
  die "Couldn't make a language handle??" unless $lh;

From then on, you use the C<maketext> method to access
entries in whatever lexicon(s) belong to the language handle
you got.  So, this:

  print $lh->maketext("You won!"), "\n";

...emits the right text for this language.  If the object
in C<$lh> belongs to class "TkBocciBall::Localize::fr" and
%TkBocciBall::Localize::fr::Lexicon contains C<("You won!"
=E<gt> "Tu as gagnE<eacute>!")>, then the above
code happily tells the user "Tu as gagnE<eacute>!".

=head1 METHODS

Locale::Maketext offers a variety of methods, which fall
into three categories:

=over

=item *

Methods to do with constructing language handles.

=item *

C<maketext> and other methods to do with accessing %Lexicon data
for a given language handle.

=item *

Methods that you may find it handy to use, from routines of
yours that you put in %Lexicon entries.

=back

These are covered in the following sections.

=head2 Construction Methods

These are to do with constructing a language handle:

=over

=item *

$lh = YourProjClass->get_handle( ...langtags...
) || die "lg-handle?";

This tries loading classes based on the language-tags you give (like
C<("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")>), and for the first class
that succeeds, returns YourProjClass::I<language>->new().

If it runs thru the entire given list of language-tags and finds no classes
for those exact terms, it then tries "superordinate" language classes.
So if no "en-US" class (i.e., YourProjClass::en_us)
was found, nor classes for anything else in that list, we then try
its superordinate, "en" (i.e., YourProjClass::en), and so on thru
the other language-tags in the given list: "es".
(The other language-tags in our example list
happen to have no superordinates.)

If none of those language-tags leads to loadable classes, we then
try classes derived from YourProjClass->fallback_languages() and
then if nothing comes of that, we use classes named by
YourProjClass->fallback_language_classes().  Then in the (probably
quite unlikely) event that that fails, we just return undef.

=item *

$lh = YourProjClass->get_handleB<()> || die "lg-handle?";

When C<get_handle> is called with an empty parameter list, magic happens:

If C<get_handle> senses that it's running in a program that was
invoked as a CGI, then it tries to get language-tags out of the
environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
those were the languages passed as parameters to C<get_handle>.

Otherwise (i.e., if not a CGI), this tries various OS-specific ways
to get the language-tags for the current locale/language, and then
pretends that those were the value(s) passed to C<get_handle>.

Currently this OS-specific stuff consists of looking in the environment
variables "LANG" and "LANGUAGE"; and on MSWin machines (where those
variables are typically unused), this also tries using
the module Win32::Locale to get a language-tag for whatever language/locale
is currently selected in the "Regional Settings" (or "International"?)
Control Panel.  I welcome further
suggestions for making this do the Right Thing under other operating
systems that support localization.

If you're using localization in an application that keeps a configuration
file, you might consider something like this in your project class:

  sub get_handle_via_config {
    my $class = $_[0];
    # %Config_settings is whatever your program loaded from its config file
    my $preferred_language = $Config_settings{'language'};
    my $lh;
    if($preferred_language) {
      $lh = $class->get_handle($preferred_language)
       || die "No language handle for \"$preferred_language\" or the like";
    } else {
      # Config file missing, maybe?
      $lh = $class->get_handle()
       || die "Can't get a language handle";
    }
    return $lh;
  }

=item *

$lh = YourProjClass::langname->new();

This constructs a language handle.  You usually B<don't> call this
directly, but instead let C<get_handle> find a language class to C<use>
and to then call ->new on.

=item *

$lh->init();

This is called by ->new to initialize newly-constructed language handles.
If you define an init method in your class, remember that it's usually
considered a good idea to call $lh->SUPER::init in it (presumably at the
beginning), so that all classes get a chance to initialize a new object
however they see fit.

=item *

YourProjClass->fallback_languages()

C<get_handle> appends the return value of this to the end of
whatever list of languages you pass C<get_handle>.
Unless
you override this method, your project class
will inherit Locale::Maketext's C<fallback_languages>, which
currently returns C<('i-default', 'en', 'en-US')>.
("i-default" is defined in RFC 2277).

This method (by having it return the name
of a language-tag that has an existing language class)
can be used for making sure that
C<get_handle> will always manage to construct a language
handle (assuming your language classes are in an appropriate
@INC directory).  Or you can use the next method:

=item *

YourProjClass->fallback_language_classes()

C<get_handle> appends the return value of this to the end
of the list of classes it will try using.  Unless
you override this method, your project class
will inherit Locale::Maketext's C<fallback_language_classes>,
which currently returns an empty list, C<()>.
By setting this to some value (namely, the name of a loadable
language class), you can be sure that
C<get_handle> will always manage to construct a language
handle.

=back

=head2 The "maketext" Method

This is the most important method in Locale::Maketext:

$text = $lh->maketext(I<key>, ...parameters for this phrase...);

This looks in the %Lexicon of the language handle
$lh and all its superclasses, looking
for an entry whose key is the string I<key>.  Assuming such
an entry is found, various things then happen, depending on the
value found:

If the value is a scalarref, the scalar is dereferenced and returned
(and any parameters are ignored).
If the value is a coderef, we return &$value($lh, ...parameters...).
If the value is a string that I<doesn't> look like it's in Bracket Notation,
we return it (after replacing it with a scalarref, in its %Lexicon).
If the value I<does> look like it's in Bracket Notation, then we compile
it into a sub, replace the string in the %Lexicon with the new coderef,
and then we return &$new_sub($lh, ...parameters...).

Bracket Notation is discussed in a later section.  Note
that trying to compile a string from Bracket Notation can throw
an exception if the string is not syntactically valid (say, by not
balancing brackets right.)

Also, calling &$coderef($lh, ...parameters...) can throw any sort of
exception (if, say, code in that sub tries to divide by zero).  But
a very common exception occurs when you have Bracket
Notation text that says to call a method "foo", but there is no such
method.  (E.g., "You have [quaB<tn>,_1,ball]." will throw an exception
on trying to call $lh->quaB<tn>($_[1],'ball') -- you presumably meant
"quant".)  C<maketext> catches these exceptions, but only to make the
error message more readable, at which point it rethrows the exception.

An exception I<may> be thrown if I<key> is not found in any
of $lh's %Lexicon hashes.  What happens if a key is not found
is discussed in a later section, "Controlling Lookup Failure".

Note that you might find it useful in some cases to override
the C<maketext> method with an "after method", if you want to
translate encodings, or even scripts:

  package YrProj::zh_cn; # Chinese with PRC-style glyphs
  use base ('YrProj::zh_tw');  # Taiwan-style
  sub maketext {
    my $self = shift(@_);
    # Call the inherited maketext; calling $self->maketext here
    # would just recurse forever.
    my $value = $self->SUPER::maketext(@_);
    return Chineeze::taiwan2mainland($value);
  }

Or you may want to override it with something that traps
any exceptions, if that's critical to your program:

  sub maketext {
    my($lh, @stuff) = @_;
    my $out;
    eval { $out = $lh->SUPER::maketext(@stuff) };
    return $out unless $@;
    ...otherwise deal with the exception...
  }

Other than those two situations, I don't imagine that
it's useful to override the C<maketext> method.  (If
you run into a situation where it is useful, I'd be
interested in hearing about it.)

=over

=item $lh->fail_with I<or> $lh->fail_with(I<PARAM>)

=item $lh->failure_handler_auto

These two methods are discussed in the section "Controlling
Lookup Failure".

=back

=head2 Utility Methods

These are methods that you may find it handy to use, generally
from %Lexicon routines of yours (whether expressed as
Bracket Notation or not).

=over

=item $language->quant($number, $singular)

=item $language->quant($number, $singular, $plural)

=item $language->quant($number, $singular, $plural, $negative)

This is generally meant to be called from inside Bracket Notation
(which is discussed later), as in

     "Your search matched [quant,_1,document]!"

It's for I<quantifying> a noun (i.e., saying how much of it there is,
while giving the correct form of it).  The behavior of this method is
handy for English and a few other Western European languages, and you
should override it for languages where it's not suitable.  You can feel
free to read the source, but the current implementation is basically
as this pseudocode describes:

     if $number is 0 and there's a $negative,
        return $negative;
     elsif $number is 1,
        return "1 $singular";
     elsif there's a $plural,
        return "$number $plural";
     else
        return "$number " . $singular . "s";
     #
     # ...except that we actually call numf to
     #  stringify $number before returning it.

So for English (with Bracket Notation)
C<"...[quant,_1,file]..."> is fine (for 0 it returns "0 files",
for 1 it returns "1 file", and for more it returns "2 files", etc.)

But for "directory", you'd want C<"[quant,_1,directory,directories]">
so that our elementary C<quant> method doesn't think that the
plural of "directory" is "directorys".  And you might find that the
output may sound better if you specify a negative form, as in:

     "[quant,_1,file,files,No files] matched your query.\n"

Remember to keep in mind verb agreement (or adjectives too, in
other languages), as in:

     "[quant,_1,document] were matched.\n"

Because if _1 is one, you get "1 document B<were> matched".
An acceptable hack here is to do something like this:

     "[quant,_1,document was, documents were] matched.\n"

=item $language->numf($number)

This returns the given number formatted nicely according to
this language's conventions.  Maketext's default method is
mostly to just take the normal string form of the number
(applying sprintf "%G" for only very large numbers), and then
to add commas as necessary.  (Except that
we apply C<tr/,./.,/> if $language->{'numf_comma'} is true;
that's a bit of a hack that's useful for languages that express
two million as "2.000.000" and not as "2,000,000").

If you want anything fancier, consider overriding this with something
that uses L<Number::Format|Number::Format>, or does something else
entirely.

Note that numf is called by quant for stringifying all quantifying
numbers.

=item $language->sprintf($format, @items)

This is just a wrapper around Perl's normal C<sprintf> function.
It's provided so that you can use "sprintf" in Bracket Notation:

     "Couldn't access datanode [sprintf,%10x=~[%s~],_1,_2]!\n"

returning...

     Couldn't access datanode      Stuff=[thangamabob]!

=item $language->language_tag()

Currently this just takes the last bit of C<ref($language)>, turns
underscores to dashes, and returns it.
So if $language is
an object of class Hee::HOO::Haw::en_us, $language->language_tag()
returns "en-us".  (Yes, the usual representation for that language
tag is "en-US", but case is I<never> considered meaningful in
language-tag comparison.)

You may override this as you like; Maketext doesn't use it for
anything.

=item $language->encoding()

Currently this isn't used for anything, but it's provided
(with default value of
C<(ref($language) && $language-E<gt>{'encoding'}) or "iso-8859-1">
) as a sort of suggestion that it may be useful/necessary to
associate encodings with your language handles (whether on a
per-class or even per-handle basis.)

=back

=head2 Language Handle Attributes and Internals

A language handle is a flyweight object -- i.e., it doesn't (necessarily)
carry any data of interest, other than just being a member of
whatever class it belongs to.

A language handle is implemented as a blessed hash.  Subclasses of yours
can store whatever data you want in the hash.  Currently the only hash
entry used by any crucial Maketext method is "fail", so feel free to
use anything else as you like.

B<Remember: Don't be afraid to read the Maketext source if there's
any point on which this documentation is unclear.>  This documentation
is vastly longer than the module source itself.

=head1 LANGUAGE CLASS HIERARCHIES

These are Locale::Maketext's assumptions about the class
hierarchy formed by all your language classes:

=over

=item *

You must have a project base class, which you load, and
which you then use as the first argument in
the call to YourProjClass->get_handle(...).  It should derive
(whether directly or indirectly) from Locale::Maketext.
It B<doesn't matter> how you name this class, altho assuming this
is the localization component of your Super Mega Program,
good names for your project class might be
SuperMegaProgram::Localization, SuperMegaProgram::L10N,
SuperMegaProgram::I18N, SuperMegaProgram::International,
or even SuperMegaProgram::Languages or SuperMegaProgram::Messages.

=item *

Language classes are what YourProjClass->get_handle will try to load.
It will look for them by taking each language-tag (B<skipping> it
if it doesn't look like a language-tag or locale-tag!), turning it to
all lowercase, turning any dashes to underscores, and appending it
to YourProjClass . "::".  So this:

  $lh = YourProjClass->get_handle(
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
  );

will try loading the classes
YourProjClass::en_us (note lowercase!), YourProjClass::fr,
YourProjClass::kon,
YourProjClass::i_klingon
and YourProjClass::i_klingon_romanized.  (And it'll stop at the
first one that actually loads.)

=item *

I assume that each language class derives (directly or indirectly)
from your project class, and also defines its @ISA, its %Lexicon,
or both.  But I anticipate no dire consequences if these assumptions
do not hold.

=item *

Language classes may derive from other language classes (altho they
should have "use I<Thatclassname>" or "use base qw(I<...classes...>)").
They may derive from the project
class.  They may derive from some other class altogether.  Or via
multiple inheritance, they may derive from any mixture of these.

=item *

I foresee no problems with having multiple inheritance in
your hierarchy of language classes.  (As usual, however, Perl will
complain bitterly if you have a cycle in the hierarchy: i.e., if
any class is its own ancestor.)

=back

=head1 ENTRIES IN EACH LEXICON

A typical %Lexicon entry is meant to signify a phrase,
taking some number (0 or more) of parameters.  An entry
is meant to be accessed via
a string I<key> in $lh->maketext(I<key>, ...parameters...),
which should return a string that is generally meant to
be used for "output" to the user -- regardless of whether
this actually means printing to STDOUT, writing to a file,
or putting into a GUI widget.

While the key must be a string value (since that's a basic
restriction that Perl places on hash keys), the value in
the lexicon can currently be of several types:
a defined scalar, scalarref, or coderef.  The use of these is
explained above, in the section 'The "maketext" Method', and
Bracket Notation for strings is discussed in the next section.

While you can use arbitrary unique IDs for lexicon keys
(like "_min_larger_max_error"), it is often
useful if an entry's key is itself a valid value, like
this example error message:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",

Compare this code that uses an arbitrary ID...

  die $lh->maketext( "_min_larger_max_error", $min, $max )
   if $min > $max;

...to this code that uses a key-as-value:

  die $lh->maketext(
   "Minimum ([_1]) is larger than maximum ([_2])!\n",
   $min, $max
  ) if $min > $max;

The second is, in short, more readable.  In particular, it's obvious
that the number of parameters you're feeding to that phrase (two) is
the number of parameters that it I<wants> to be fed.  (Since you see
a _1 and a _2 being used in the key there.)

Also, once a project is otherwise
complete and you start to localize it, you can scrape together
all the various keys you use, and pass them to a translator; and then
the translator's work will go faster if what he's presented is this:

 "Minimum ([_1]) is larger than maximum ([_2])!\n"
  => "",   # fill in something here, Jacques!

rather than this more cryptic mess:

 "_min_larger_max_error"
  => "",   # fill in something here, Jacques

I think that using valid lexicon values as keys makes the completed
lexicon entries more readable:

 "Minimum ([_1]) is larger than maximum ([_2])!\n"
  => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",

Also, having valid values as keys becomes very useful if you set
up an _AUTO lexicon.  _AUTO lexicons are discussed in a later
section.

I almost always use keys that are themselves
valid lexicon values.  One notable exception is when the value is
quite long.  For example, to get the screenful of data that
a command-line program might return when given an unknown switch,
I often just use a key "_USAGE_MESSAGE".  At that point I then go
and immediately define that lexicon entry in the
ProjectClass::L10N::en lexicon (since English is always my "project
language"):

  '_USAGE_MESSAGE' => <<'EOSTUFF',
  ...long long message...
  EOSTUFF

and then I can use it as:

  getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');

Incidentally,
note that each class's C<%Lexicon> inherits-and-extends
the lexicons in its superclasses.  This is not because these are
special hashes I<per se>, but because you access them via the
C<maketext> method, which looks for entries across all the
C<%Lexicon>'s in a language class I<and> all its ancestor classes.
(This is because the idea of "class data" isn't directly implemented
in Perl, but is instead left to individual class-systems to implement
as they see fit.)

Note that you may have things stored in a lexicon
besides just phrases for output:  for example, if your program
takes input from the keyboard, asking a "(Y/N)" question,
you probably need to know what the equivalent of "Y[es]/N[o]" is
in whatever language.  You probably also need to know what
the equivalents of the answers "y" and "n" are.  You can
store that information in the lexicon (say, under the keys
"~answer_y" and "~answer_n", and the long forms as
"~answer_yes" and "~answer_no", where "~" is just an ad-hoc
character meant to indicate to programmers/translators that
these are not phrases for output).

Or instead of storing this in the language class's lexicon,
you can (and, in some cases, really should) represent the same bit
of knowledge as code in a method in the language class.  (That
leaves a tidy distinction between the lexicon as the things we
know how to I<say>, and the rest of the things in the language class
as things that we know how to I<do>.)  Consider
this example of a processor for responses to French "oui/non"
questions:

  sub y_or_n {
    return undef unless defined $_[1] and length $_[1];
    my $answer = lc $_[1];  # smash case
    return 1 if $answer eq 'o' or $answer eq 'oui';
    return 0 if $answer eq 'n' or $answer eq 'non';
    return undef;
  }

...which you'd then call in a construct like this:

  my $response;
  until(defined $response) {
    print $lh->maketext("Open the pod bay door (y/n)? ");
    $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
  }
  if($response) { $pod_bay_door->open()         }
  else          { $pod_bay_door->leave_closed() }

Other data worth storing in a lexicon might be things like
filenames for language-targeted resources:

  ...
  "_main_splash_png"
    => "/styles/en_us/main_splash.png",
  "_main_splash_imagemap"
    => "/styles/en_us/main_splash.incl",
  "_general_graphics_path"
    => "/styles/en_us/",
  "_alert_sound"
    => "/styles/en_us/hey_there.wav",
  "_forward_icon"
   => "left_arrow.png",
  "_backward_icon"
   => "right_arrow.png",
  # In some other languages, left equals
  #  BACKwards, and right is FOREwards.
  ...

You might want to do the same thing for expressing key bindings
or the like (since hardwiring "q" as the binding for the function
that quits a screen/menu/program is useful only if your language
happens to associate "q" with "quit"!)

=head1 BRACKET NOTATION

Bracket Notation is a crucial feature of Locale::Maketext.  I mean
Bracket Notation to provide a replacement for sprintf formatting.
Everything you do with Bracket Notation could be done with a sub block,
but bracket notation is meant to be much more concise.

Bracket Notation is like a miniature "template" system (in the sense
of L<Text::Template|Text::Template>, not in the sense of C++ templates),
where normal text is passed thru basically as is, but text in special
regions is specially interpreted.  In Bracket Notation, you use brackets
("[...]" -- not "{...}"!) to note sections that are specially interpreted.

For example, here all the areas that are taken literally are underlined with
a "^", and all the in-bracket special regions are underlined with an X:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",
   ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^

When that string is compiled from bracket notation into a real Perl sub,
it's basically turned into:

  sub {
    my $lh = $_[0];
    my @params = @_;
    return join '',
      "Minimum (",
      ...some code here...
      ") is larger than maximum (",
      ...some code here...
      ")!\n",
  }
  # to be called by $lh->maketext(KEY, params...)

In other words, text outside bracket groups is turned into string
literals.  Text in brackets is rather more complex, and currently follows
these rules:

=over

=item *

Bracket groups that are empty, or which consist only of whitespace,
are ignored.  (Examples: "[]", "[    ]", or a [ and a ] with returns
and/or tabs and/or spaces between them.)

Otherwise, each group is taken to be a comma-separated group of items,
and each item is interpreted as follows:

=item *

An item that is "_I<digits>" or "_-I<digits>" is interpreted as
$_[I<value>].  I.e., "_1" becomes $_[1], and "_-3" is interpreted
as $_[-3] (in which case @_ should have at least three elements in it).
Note that $_[0] is the language handle, and is typically not named
directly.

=item *

An item "_*" is interpreted to mean "all of @_ except $_[0]".
I.e., C<@_[1..$#_]>.  Note that this is an empty list in the case
of calls like $lh->maketext(I<key>) where there are no
parameters (except $_[0], the language handle).

=item *

Otherwise, each item is interpreted as a string literal.

=back

The group as a whole is interpreted as follows:

=over

=item *

If the first item in a bracket group looks like a method name,
then that group is interpreted like this:

  $lh->that_method_name(
    ...rest of items in this group...
  ),

=item *

If the first item in a bracket group is "*", it's taken as shorthand
for the very commonly called "quant" method.  Similarly, if the first
item in a bracket group is "#", it's taken to be shorthand for
"numf".

=item *

If the first item in a bracket group is empty-string, or "_*"
or "_I<digits>" or "_-I<digits>", then that group is interpreted
as just the interpolation of all its items:

  join('',
    ...rest of items in this group...
  ),

Examples:  "[_1]" and "[,_1]", which are synonymous; and
"C<[,ID-(,_4,-,_2,)]>", which compiles as
C<join "", "ID-(", $_[4], "-", $_[2], ")">.

=item *

Otherwise this bracket group is invalid.  For example, in the group
"[!@#,whatever]", the first item C<"!@#"> is neither empty-string,
"_I<number>", "_-I<number>", "_*", nor a valid method name; and so
Locale::Maketext will throw an exception if you try compiling an
expression containing this bracket group.

=back

Note, incidentally, that items in each group are comma-separated,
not C</\s*,\s*/>-separated.  That is, you might expect that this
bracket group:

  "Hoohah [foo, _1 , bar ,baz]!"

would compile to this:

  sub {
    my $lh = $_[0];
    return join '',
      "Hoohah ",
      $lh->foo( $_[1], "bar", "baz"),
      "!",
  }

But it actually compiles as this:

  sub {
    my $lh = $_[0];
    return join '',
      "Hoohah ",
      $lh->foo(" _1 ", " bar ", "baz"),  #!!!
      "!",
  }

In the notation discussed so far, the characters "[" and "]" are given
special meaning, for opening and closing bracket groups, and "," has
a special meaning inside bracket groups, where it separates items in the
group.  This raises the question of how you'd express a literal "[" or
"]" in a Bracket Notation string, and how you'd express a literal
comma inside a bracket group.  For this purpose I've adopted "~" (tilde)
as an escape character:  "~[" means a literal '[' character anywhere
in Bracket Notation (i.e., regardless of whether you're in a bracket
group or not), and ditto for "~]" meaning a literal ']', and "~," meaning
a literal comma.  (Altho "," means a literal comma outside of
bracket groups -- it's only inside bracket groups that commas are special.)

And on the off chance you need a literal tilde in a bracket expression,
you get it with "~~".
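
The escapes just described can be tried out with a small sketch like
the following. The package names and lexicon keys are made up for the
occasion, and I'm assuming an "en" language class is all you need to
experiment with.

```perl
use strict;
use warnings;

# Hypothetical project and language classes, for experimenting only.
package EscapeDemo::L10N;
use parent 'Locale::Maketext';

package EscapeDemo::L10N::en;
use parent -norequire, 'EscapeDemo::L10N';
our %Lexicon = (
  # "~[" and "~]" are literal brackets; "[_1]" is still a bracket group.
  'bracketed' => 'Value: ~[[_1]~]',
  # "~," keeps the comma (and the space after it) inside one item.
  'comma'     => '[_1,~, ,_2]',
  # "~~" is one literal tilde.
  'tilde'     => 'Home is ~~[_1]',
);

package main;

my $lh = EscapeDemo::L10N->get_handle('en')
  or die "Couldn't make a language handle";

my $bracketed = $lh->maketext('bracketed', 'x');
my $comma     = $lh->maketext('comma', 'a', 'b');
my $tilde     = $lh->maketext('tilde', 'alice');

print "$_\n" for $bracketed, $comma, $tilde;
```

Single-quoted Perl strings are used for the lexicon values so that
nothing but Bracket Notation itself is interpreting the "~" and "["
characters.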

Currently, an unescaped "~" before a character
other than a bracket or a comma is taken to mean just a "~" and that
character.  I.e., "~X" means the same as "~~X" -- i.e., one literal tilde,
and then one literal "X".  However, by using "~X", you are assuming that
no future version of Maketext will use "~X" as a magic escape sequence.
In practice this is not a great problem, since first off you can just
write "~~X" and not worry about it; second off, I doubt I'll add lots
of new magic characters to bracket notation; and third off, you
aren't likely to want literal "~" characters in your messages anyway,
since it's not a character with wide use in natural language text.

Brackets must be balanced -- every openbracket must have
one matching closebracket, and vice versa.  So these are all B<invalid>:

  "I ate [quant,_1,rhubarb pie."
  "I ate [quant,_1,rhubarb pie[."
  "I ate quant,_1,rhubarb pie]."
  "I ate quant,_1,rhubarb pie[."

Currently, bracket groups do not nest.  That is, you B<cannot> say:

  "Foo [bar,baz,[quux,quuux]]\n";

If you need a notation that's that powerful, use normal Perl:

  %Lexicon = (
    ...
    "some_key" => sub {
      my $lh = $_[0];
      join '',
        "Foo ",
        $lh->bar('baz', $lh->quux('quuux')),
        "\n",
    },
    ...
  );

Or write the "bar" method so you don't need to pass it the
output from calling quux.

I do not anticipate that you will need (or particularly want)
to nest bracket groups, but you are welcome to email me with
convincing (real-life) arguments to the contrary.
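
As a recap of how bracket groups are dispatched, here is a small
sketch that exercises a method-name group and the "*" and "#"
shorthands. The package names, keys, and phrases are assumptions of
mine; only C<quant> and C<numf> come from Locale::Maketext itself.

```perl
use strict;
use warnings;

# Hypothetical project and language classes.
package PieDemo::L10N;
use parent 'Locale::Maketext';

package PieDemo::L10N::en;
use parent -norequire, 'PieDemo::L10N';
our %Lexicon = (
  'pies'   => 'I ate [quant,_1,rhubarb pie].',  # method named outright
  'star'   => 'I ate [*,_1,rhubarb pie].',      # "*" is shorthand for quant
  'crumbs' => 'That makes [#,_1] crumbs.',      # "#" is shorthand for numf
);

package main;

my $lh = PieDemo::L10N->get_handle('en')
  or die "Couldn't make a language handle";

my $one    = $lh->maketext('pies', 1);
my $three  = $lh->maketext('star', 3);
my $crumbs = $lh->maketext('crumbs', 1234567);

print "$_\n" for $one, $three, $crumbs;
```

The plural in the second message comes from the default English-style
C<quant>, which simply appends "s" when no plural form is given.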

=head1 AUTO LEXICONS

If maketext goes to look in an individual %Lexicon for an entry
for I<key> (where I<key> does not start with an underscore), and
sees none, B<but does see> an entry of "_AUTO" => I<some_true_value>,
then we actually define $Lexicon{I<key>} = I<key> right then and there,
and then use that value as if it had been there all
along.  This happens before we even look in any superclass %Lexicons!

(This is meant to be somewhat like the AUTOLOAD mechanism in
Perl's function call system -- or, looked at another way,
like the L<AutoLoader|AutoLoader> module.)

I can picture all sorts of circumstances where you just
do not want lookup to be able to fail (since failing
normally means that maketext throws a C<die>, altho
see the next section for greater control over that).  But
here's one circumstance where _AUTO lexicons are meant to
be I<especially> useful:

As you're writing an application, you decide as you go what messages
you need to emit.  Normally you'd go to write this:

  if(-e $filename) {
    go_process_file($filename)
  } else {
    print "Couldn't find file \"$filename\"!\n";
  }

but since you anticipate localizing this, you write:

  use ThisProject::I18N;
  my $lh = ThisProject::I18N->get_handle();
   # For the moment, assume that things are set up so
   # that we load class ThisProject::I18N::en
   # and that that's the class that $lh belongs to.
  ...
  if(-e $filename) {
    go_process_file($filename)
  } else {
    print $lh->maketext(
      "Couldn't find file \"[_1]\"!\n", $filename
    );
  }

Now, right after you've just written the above lines, you'd
normally have to go open the file
ThisProject/I18N/en.pm, and immediately add an entry:

  "Couldn't find file \"[_1]\"!\n"
  => "Couldn't find file \"[_1]\"!\n",

But I consider that somewhat of a distraction from the work
of getting the main code working -- to say nothing of the fact
that I often have to play with the program a few times before
I can decide exactly what wording I want in the messages (which
in this case would require me to go changing three lines of code:
the call to maketext with that key, and then the two lines in
ThisProject/I18N/en.pm).

However, if you set "_AUTO => 1" in the %Lexicon in
ThisProject/I18N/en.pm (assuming that English (en) is
the language that all your programmers will be using for this
project's internal message keys), then you don't ever have to
go adding lines like this

  "Couldn't find file \"[_1]\"!\n"
  => "Couldn't find file \"[_1]\"!\n",

to ThisProject/I18N/en.pm, because if _AUTO is true there,
then just looking for an entry with the key "Couldn't find
file \"[_1]\"!\n" in that lexicon will cause it to be added,
with that value!

Note that the reason that keys that start with "_"
are immune to _AUTO isn't anything generally magical about
the underscore character -- I just wanted a way to have most
lexicon keys be autoable, except for possibly a few, and I
arbitrarily decided to use a leading underscore as a signal
to distinguish those few.
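
The behavior described in this section can be observed with a small
script. This is a sketch under assumptions: the two classes are defined
inline here instead of living in ThisProject/I18N.pm and
ThisProject/I18N/en.pm, and the file name C<notes.txt> and the key
C<_secret> are made up for the demonstration.

```perl
use strict;
use warnings;

# Inline stand-ins for ThisProject/I18N.pm and ThisProject/I18N/en.pm.
{
    package ThisProject::I18N;
    use parent 'Locale::Maketext';
}
{
    package ThisProject::I18N::en;
    use parent -norequire, 'ThisProject::I18N';
    our %Lexicon = ( _AUTO => 1 );    # nothing but the _AUTO flag
}

my $lh = ThisProject::I18N->get_handle('en') or die "No language handle";

my $key = qq{Couldn't find file "[_1]"!\n};

# The key is not in the lexicon until maketext is asked for it.
my $before  = exists $ThisProject::I18N::en::Lexicon{$key} ? 1 : 0;
my $message = $lh->maketext($key, 'notes.txt');
my $after   = exists $ThisProject::I18N::en::Lexicon{$key} ? 1 : 0;

# Keys with a leading underscore are never auto-made, so this lookup
# fails, and with no "fail" attribute set, maketext dies.
my $died = eval { $lh->maketext('_secret'); 1 } ? 0 : 1;

print $message;
print "in lexicon before: $before, after: $after; _secret died: $died\n";
```

Because the entry is added to the class's %Lexicon, later handles of the
same language class find it without going through _AUTO again.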

=head1 CONTROLLING LOOKUP FAILURE

If you call $lh->maketext(I<key>, ...parameters...),
and there's no entry I<key> in $lh's class's %Lexicon, nor
in the superclass %Lexicon hash, I<and> if we can't auto-make
I<key> (because either it starts with a "_", or because none
of its lexicons have C<_AUTO =E<gt> 1,>), then we have
failed to find a normal way to maketext I<key>. What then
happens in these failure conditions depends on the $lh object's
"fail" attribute.

If the language handle has no "fail" attribute, maketext
will simply throw an exception (i.e., it calls C<die>, mentioning
the I<key> whose lookup failed, and naming the line number where
the calling $lh->maketext(I<key>,...) was).

If the language handle has a "fail" attribute whose value is a
coderef, then $lh->maketext(I<key>,...params...) gives up and calls:

  return &{$that_subref}($lh, $key, @params);

Otherwise, the "fail" attribute's value should be a string denoting
a method name, so that $lh->maketext(I<key>,...params...) can
give up with:

  return $lh->$that_method_name($key, @params);

The "fail" attribute can be accessed with the C<fail_with> method:

  # Set to a coderef:
  $lh->fail_with( \&failure_handler );

  # Set to a method name:
  $lh->fail_with( 'failure_method' );

  # Set to nothing (i.e., so failure throws a plain exception)
  $lh->fail_with( undef );

  # Simply read:
  $handler = $lh->fail_with();

Now, as to what you may want to do with these handlers: Maybe you'd
want to log what key failed for what class, and then die. Maybe
you don't like C<die> and instead you want to send the error message
to STDOUT (or wherever) and then merely C<exit()>.

Or maybe you don't want to C<die> at all! Maybe you could use a
handler like this:

  # Make all lookups fall back onto an English value,
  #  but after we log it for later fingerpointing.
  my $lh_backup = ThisProject->get_handle('en');
  open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
  sub lex_fail {
    my($failing_lh, $key, @params) = @_;
    print LEX_FAIL_LOG scalar(localtime), "\t",
       ref($failing_lh), "\t", $key, "\n";
    return $lh_backup->maketext($key,@params);
  }

Some users have expressed that they think this whole mechanism of
having a "fail" attribute at all seems a rather pointless complication.
But I want Locale::Maketext to be usable for software projects of I<any>
scale and type; and different software projects have different ideas
of what the right thing is to do in failure conditions. I could simply
say that failure always throws an exception, and that if you want to be
careful, you'll just have to wrap every call to $lh->maketext in an
S<eval { }>. However, I want programmers to reserve the right (via
the "fail" attribute) to treat lookup failure as something other than
an exception of the same level of severity as a config file being
unreadable, or some essential resource being inaccessible.

One possibly useful value for the "fail" attribute is the method name
"failure_handler_auto". This is a method defined in class
Locale::Maketext itself. You set it with:

  $lh->fail_with('failure_handler_auto');

Then when you call $lh->maketext(I<key>, ...parameters...) and
there's no I<key> in any of those lexicons, maketext gives up with

  return $lh->failure_handler_auto($key, @params);

But failure_handler_auto, instead of dying or anything, compiles
$key, caching it in $lh->{'failure_lex'}{$key} = $compiled,
and then calls the compiled value, and returns that. (I.e., if
$key looks like bracket notation, $compiled is a sub, and we return
&{$compiled}(@params); but if $key is just a plain string, we just
return that.)
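
The three failure modes described so far (no handler, a coderef handler,
and the C<failure_handler_auto> method name) can be compared side by
side. This is a sketch under assumptions: C<Demo::Fail::L10N> and its
C<fr> language class are hypothetical and defined inline, the lexicon
deliberately has no _AUTO entry, and the coderef handler collects failed
keys in an array instead of writing a log file.

```perl
use strict;
use warnings;

# Hypothetical project and language classes; note there is no _AUTO.
{
    package Demo::Fail::L10N;
    use parent 'Locale::Maketext';
}
{
    package Demo::Fail::L10N::fr;
    use parent -norequire, 'Demo::Fail::L10N';
    our %Lexicon = ( 'Hello' => 'Bonjour' );
}

my $lh = Demo::Fail::L10N->get_handle('fr') or die "No language handle";

# 1. No "fail" attribute: a missing key makes maketext die.
my $died = eval { $lh->maketext('Saved [_1] items', 3); 1 } ? 0 : 1;

# 2. A coderef handler: called with the handle, the key, and the params.
my @failed_keys;
$lh->fail_with(sub {
    my ($failing_lh, $key, @params) = @_;
    push @failed_keys, $key;        # record it for later fingerpointing
    return "[missing: $key]";
});
my $marked = $lh->maketext('Goodbye');

# 3. The method name "failure_handler_auto": compile the key itself and
#    remember it in $lh->{'failure_lex'}.
$lh->fail_with('failure_handler_auto');
my $auto     = $lh->maketext('Saved [_1] items', 3);
my @recorded = sort keys %{ $lh->{'failure_lex'} };

print "died: $died\n$marked\n$auto\nrecorded: @recorded\n";
```

The coderef form is the more flexible of the two, since it can log,
fall back to another handle, or rethrow as the project sees fit.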

The effect of using "failure_handler_auto"
is like an AUTO lexicon, except that 1) it compiles $key even if
it starts with "_", and 2) you have a record in the new hashref
$lh->{'failure_lex'} of all the keys that have failed for
this object. This should avoid your program dying -- as long
as your keys aren't actually invalid as bracket code, and as
long as they don't try calling methods that don't exist.

"failure_handler_auto" may not be exactly what you want, but I
hope it at least shows you that maketext failure can be mitigated
in any number of very flexible ways. If you can formalize exactly
what you want, you should be able to express that as a failure
handler. You can even make it default for every object of a given
class, by setting it in that class's init:

  sub init {
    my $lh = $_[0];  # a newborn handle
    $lh->SUPER::init();
    $lh->fail_with('my_clever_failure_handler');
    return;
  }
  sub my_clever_failure_handler {
    ...your clever things here...
  }

=head1 HOW TO USE MAKETEXT

Here is a brief checklist on how to use Maketext to localize
applications:

=over

=item *

Decide what system you'll use for lexicon keys. If you insist,
you can use opaque IDs (if you're nostalgic for C<catgets>),
but I have better suggestions in the
section "Entries in Each Lexicon", above. Assuming you opt for
meaningful keys that double as values (like "Minimum ([_1]) is
larger than maximum ([_2])!\n"), you'll have to settle on what
language those should be in. For the sake of argument, I'll
call this English, specifically American English, "en-US".

=item *

Create a class for your localization project. This is
the name of the class that you'll use in the idiom:

  use Projname::L10N;
  my $lh = Projname::L10N->get_handle(...)
    || die "Language?";

Assuming you call your class Projname::L10N, create a class
consisting minimally of:

  package Projname::L10N;
  use base qw(Locale::Maketext);
  ...any methods you might want all your languages to share...

  # And, assuming you want the base class to be an _AUTO lexicon,
  # as is discussed a few sections up:
  %Lexicon = ( '_AUTO' => 1 );

  1;

=item *

Create a class for the language your internal keys are in. Name
the class after the language-tag for that language, in lowercase,
with dashes changed to underscores. Assuming your project's first
language is US English, you should call this Projname::L10N::en_us.
It should consist minimally of:

  package Projname::L10N::en_us;
  use base qw(Projname::L10N);
  %Lexicon = (
    '_AUTO' => 1,
  );
  1;

(For the rest of this section, I'll assume that this "first
language class" of Projname::L10N::en_us has an
_AUTO lexicon.)

=item *

Go and write your program. Everywhere in your program where
you would say:

  print "Foobar $thing stuff\n";

instead do it through maketext, using no variable interpolation in
the key:

  print $lh->maketext("Foobar [_1] stuff\n", $thing);

If you get tired of constantly saying C<print $lh-E<gt>maketext>,
consider making a functional wrapper for it, like so:

  use Projname::L10N;
  use vars qw($lh);
  $lh = Projname::L10N->get_handle(...) || die "Language?";
  sub pmt (@) { print( $lh->maketext(@_)) }
   # "pmt" is short for "Print MakeText"
  $Carp::Verbose = 1;
   # so if maketext fails, we see what made the call to pmt

Besides whole phrases meant for output, anything language-dependent
should be put into the class Projname::L10N::en_us,
whether as methods, or as lexicon entries -- this is discussed
in the section "Entries in Each Lexicon", above.

=item *

Once the program is otherwise done, and once its localization for
the first language works right (via the data and methods in
Projname::L10N::en_us), you can get together the data for translation.
If your first language lexicon isn't an _AUTO lexicon, then you already
have all the messages explicitly in the lexicon (or else you'd be
getting exceptions thrown when you call $lh->maketext to get
messages that aren't in there). But if you were (advisedly) lazy and are
using an _AUTO lexicon, then you've got to make a list of all the phrases
that you've so far been letting _AUTO generate for you. There are very
many ways to assemble such a list. The most straightforward is to simply
grep the source for every occurrence of "maketext" (or calls
to wrappers around it, like the above C<pmt> function), and to log the
following phrase.

=item *

You may at this point want to consider whether your base class
(Projname::L10N) that all lexicons inherit from (Projname::L10N::en,
Projname::L10N::es, etc.) should be an _AUTO lexicon. It may be true
that in theory, all needed messages will be in each language class;
but in the presumably unlikely or "impossible" case of lookup failure,
you should consider whether your program should throw an exception,
emit text in English (or whatever your project's first language is),
or some more complex solution as described in the section
"Controlling Lookup Failure", above.

=item *

Submit all messages/phrases/etc. to translators.

(You may, in fact, want to start with localizing to I<one> other language
at first, if you're not sure that you've properly abstracted the
language-dependent parts of your code.)

Translators may request clarification of the situation in which a
particular phrase is found.
For example, in English we are entirely happy
saying "I<n> files found", regardless of whether we mean "I looked for files,
and found I<n> of them" or the rather distinct situation of "I looked for
something else (like lines in files), and along the way I saw I<n>
files." This may involve rethinking things that you thought quite clear:
should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is
there already a conventionalized way to express that menu option, separate
from the target language's normal word for "to edit"?

In all cases where the very common phenomenon of quantification
(saying "I<N> files", for B<any> value of N)
is involved, each translator should make clear what dependencies the
number causes in the sentence. In many cases, dependency is
limited to words adjacent to the number, in places where you might
expect them ("I found the-?PLURAL I<N>
empty-?PLURAL directory-?PLURAL"), but in some cases there are
unexpected dependencies ("I found-?PLURAL ..."!) as well as long-distance
dependencies ("The I<N> directory-?PLURAL could not be deleted-?PLURAL"!).

Remind the translators to consider the case where N is 0:
"0 files found" isn't exactly natural-sounding in any language, but it
may be unacceptable in many -- or it may condition special
kinds of agreement (similar to English "I didN'T find ANY files").

Remember to ask your translators about numeral formatting in their
language, so that you can override the C<numf> method as
appropriate. Typical variables in number formatting are: what to
use as a decimal point (comma? period?); what to use as a thousands
separator (space? nonbreaking space? comma? period? small
middot? prime?
apostrophe?); and even whether the so-called "thousands
separator" is actually for every third digit -- I've heard reports of
two hundred thousand being expressible as "2,00,000" for some Indian
(Subcontinental) languages, besides the less surprising "S<200 000>",
"200.000", "200,000", and "200'000". Also, using a set of numeral
glyphs other than the usual ASCII "0"-"9" might be appreciated, as via
C<tr/0-9/\x{0966}-\x{096F}/> for getting digits in Devanagari script
(for Hindi, Konkani, others).

The basic C<quant> method that Locale::Maketext provides should be
good for many languages. For some languages, it might be useful
to modify it (or its constituent C<numerate> method)
to take a plural form in the two-argument call to C<quant>
(as in "[quant,_1,files]") if
it's all-around easier to infer the singular form from the plural, than
to infer the plural form from the singular.

But for other languages (as is discussed at length
in L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>), simple
C<quant>/C<numerate> is not enough. For the particularly problematic
Slavic languages, what you may need is a method which you provide
with the number, the citation form of the noun to quantify, and
the case and gender that the sentence's syntax projects onto that
noun slot. The method would then be responsible for determining
what grammatical number that numeral projects onto its noun phrase,
and what case and gender it may override the normal case and gender
with; and then it would look up the noun in a lexicon providing
all needed inflected forms.

=item *

You may also wish to discuss with the translators the question of
how to relate different subforms of the same language tag,
considering how this interacts with C<get_handle>'s treatment of
these.
For example, if a user accepts interfaces in "en, fr", and
you have interfaces available in "en-US" and "fr", what should
they get? You may wish to resolve this by establishing that "en"
and "en-US" are effectively synonymous, by having one class
zero-derive from the other.

For some languages this issue may never come up (Danish is rarely
expressed as "da-DK", but instead is just "da"). And for other
languages, the whole concept of a "generic" form may verge on
being uselessly vague, particularly for interfaces involving voice
media in forms of Arabic or Chinese.

=item *

Once you've localized your program/site/etc. for all desired
languages, be sure to show the result (whether live, or via
screenshots) to the translators. Once they approve, make every
effort to have it then checked by at least one other speaker of
that language. This holds true even when (or especially when) the
translation is done by one of your own programmers. Some
kinds of systems may be harder to find testers for than others,
depending on the amount of domain-specific jargon and concepts
involved -- it's easier to find people who can tell you whether
they approve of your translation for "delete this message" in an
email-via-Web interface, than to find people who can give you
an informed opinion on your translation for "attribute value"
in an XML query tool's interface.

=back

=head1 SEE ALSO

I recommend reading all of these:

L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13> -- my I<The Perl
Journal> article about Maketext. It explains many important concepts
underlying Locale::Maketext's design, and offers some insight into why
Maketext is better than the plain old approach of just having
message catalogs that are just databases of sprintf formats.

L<File::Findgrep|File::Findgrep> is a sample application/module
that uses Locale::Maketext to localize its messages. For a larger
internationalized system, see also L<Apache::MP3>.

L<I18N::LangTags|I18N::LangTags>.

L<Win32::Locale|Win32::Locale>.

RFC 3066, I<Tags for the Identification of Languages>,
is at http://sunsite.dk/RFC/rfc/rfc3066.html

RFC 2277, I<IETF Policy on Character Sets and Languages>
is at http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is
just things of interest to protocol designers, but it explains
some basic concepts, like the distinction between locales and
language-tags.

The manual for GNU C<gettext>. The gettext dist is available in
C<ftp://prep.ai.mit.edu/pub/gnu/> -- get
a recent gettext tarball and look in its "doc/" directory; there's
an easily browsable HTML version in there. The
gettext documentation asks lots of questions worth thinking
about, even if some of their answers are sometimes wonky,
particularly where they start talking about pluralization.

The Locale/Maketext.pm source. Observe that the module is much
shorter than its documentation!

=head1 COPYRIGHT AND DISCLAIMER

Copyright (c) 1999-2004 Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

=head1 AUTHOR

Sean M. Burke C<sburke@cpan.org>

=cut