1\input texinfo @c -*- texinfo -*- 2@c %**start of header 3@setfilename gperf.info 4@settitle Perfect Hash Function Generator 5@c @setchapternewpage odd 6@c %**end of header 7 8@c some day we should @include version.texi instead of defining 9@c these values at hand. 10@set UPDATED 31 March 2007 11@set EDITION 3.0.3 12@set VERSION 3.0.3 13@c --------------------- 14 15@c remove the black boxes generated in the GPL appendix. 16@finalout 17 18@c Merge functions into the concept index 19@syncodeindex fn cp 20@c @synindex pg cp 21 22@dircategory Programming Tools 23@direntry 24* Gperf: (gperf). Perfect Hash Function Generator. 25@end direntry 26 27@ifinfo 28This file documents the features of the GNU Perfect Hash Function 29Generator @value{VERSION}. 30 31Copyright @copyright{} 1989-2006 Free Software Foundation, Inc. 32 33Permission is granted to make and distribute verbatim copies of this 34manual provided the copyright notice and this permission notice are 35preserved on all copies. 36 37@ignore 38Permission is granted to process this file through TeX and print the 39results, provided the printed document carries a copying permission 40notice identical to this one except for the removal of this paragraph 41(this paragraph not being relevant to the printed manual). 42 43@end ignore 44 45Permission is granted to copy and distribute modified versions of this 46manual under the conditions for verbatim copying, provided also that the 47section entitled ``GNU General Public License'' is included exactly as 48in the original, and provided that the entire resulting derived work is 49distributed under the terms of a permission notice identical to this 50one. 51 52Permission is granted to copy and distribute translations of this manual 53into another language, under the above conditions for modified versions, 54except that the section entitled ``GNU General Public License'' and this 55permission notice may be included in translations approved by the Free 56Software Foundation instead of in the original English. 57 58@end ifinfo 59 60@titlepage 61@title User's Guide to @code{gperf} @value{VERSION} 62@subtitle The GNU Perfect Hash Function Generator 63@subtitle Edition @value{EDITION}, @value{UPDATED} 64@author Douglas C. Schmidt 65@author Bruno Haible 66 67@page 68@vskip 0pt plus 1filll 69Copyright @copyright{} 1989-2007 Free Software Foundation, Inc. 70 71 72Permission is granted to make and distribute verbatim copies of 73this manual provided the copyright notice and this permission notice 74are preserved on all copies. 75 76Permission is granted to copy and distribute modified versions of this 77manual under the conditions for verbatim copying, provided also that the 78section entitled ``GNU General Public License'' is included 79exactly as in the original, and provided that the entire resulting 80derived work is distributed under the terms of a permission notice 81identical to this one. 82 83Permission is granted to copy and distribute translations of this manual 84into another language, under the above conditions for modified versions, 85except that the section entitled ``GNU General Public License'' may be 86included in a translation approved by the author instead of in the 87original English. 88@end titlepage 89 90@ifinfo 91@node Top, Copying, (dir), (dir) 92@top Introduction 93 94This manual documents the GNU @code{gperf} perfect hash function generator 95utility, focusing on its features and how to use them, and how to report 96bugs. 97 98@menu 99* Copying:: GNU @code{gperf} General Public License says 100 how you can copy and share @code{gperf}. 101* Contributors:: People who have contributed to @code{gperf}. 102* Motivation:: The purpose of @code{gperf}. 103* Search Structures:: Static search structures and GNU @code{gperf} 104* Description:: High-level discussion of how GPERF functions. 105* Options:: A description of options to the program. 106* Bugs:: Known bugs and limitations with GPERF. 107* Projects:: Things still left to do. 108* Bibliography:: Material Referenced in this Report. 109 110* Concept Index:: 111 112@detailmenu --- The Detailed Node Listing --- 113 114High-Level Description of GNU @code{gperf} 115 116* Input Format:: Input Format to @code{gperf} 117* Output Format:: Output Format for Generated C Code with @code{gperf} 118* Binary Strings:: Use of NUL bytes 119 120Input Format to @code{gperf} 121 122* Declarations:: Declarations. 123* Keywords:: Format for Keyword Entries. 124* Functions:: Including Additional C Functions. 125* Controls for GNU indent:: Where to place directives for GNU @code{indent}. 126 127Declarations 128 129* User-supplied Struct:: Specifying keywords with attributes. 130* Gperf Declarations:: Embedding command line options in the input. 131* C Code Inclusion:: Including C declarations and definitions. 132 133Invoking @code{gperf} 134 135* Input Details:: Options that affect Interpretation of the Input File 136* Output Language:: Specifying the Language for the Output Code 137* Output Details:: Fine tuning Details in the Output Code 138* Algorithmic Details:: Changing the Algorithms employed by @code{gperf} 139* Verbosity:: Informative Output 140 141@end detailmenu 142@end menu 143 144@end ifinfo 145 146@node Copying, Contributors, Top, Top 147@unnumbered GNU GENERAL PUBLIC LICENSE 148@include gpl.texinfo 149 150@node Contributors, Motivation, Copying, Top 151@unnumbered Contributors to GNU @code{gperf} Utility 152 153@itemize @bullet 154@item 155@cindex Bugs 156The GNU @code{gperf} perfect hash function generator utility was 157written in GNU C++ by Douglas C. Schmidt. The general 158idea for the perfect hash function generator was inspired by Keith 159Bostic's algorithm written in C, and distributed to net.sources around 1601984. The current program is a heavily modified, enhanced, and extended 161implementation of Keith's basic idea, created at the University of 162California, Irvine. Bugs, patches, and suggestions should be reported 163to @code{<bug-gnu-gperf@@gnu.org>}. 164 165@item 166Special thanks is extended to Michael Tiemann and Doug Lea, for 167providing a useful compiler, and for giving me a forum to exhibit my 168creation. 169 170In addition, Adam de Boor and Nels Olson provided many tips and insights 171that greatly helped improve the quality and functionality of @code{gperf}. 172 173@item 174Bruno Haible enhanced and optimized the search algorithm. He also rewrote 175the input routines and the output routines for better reliability, and 176added a testsuite. 177@end itemize 178 179@node Motivation, Search Structures, Contributors, Top 180@chapter Introduction 181 182@code{gperf} is a perfect hash function generator written in C++. It 183transforms an @var{n} element user-specified keyword set @var{W} into a 184perfect hash function @var{F}. @var{F} uniquely maps keywords in 185@var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}. If @var{k} 186= @var{n-1} then @var{F} is a @emph{minimal} perfect hash function. 187@code{gperf} generates a 0..@var{k} element static lookup table and a 188pair of C functions. These functions determine whether a given 189character string @var{s} occurs in @var{W}, using at most one probe into 190the lookup table. 191 192@code{gperf} currently generates the reserved keyword recognizer for 193lexical analyzers in several production and research compilers and 194language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal, 195GNU Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is 196available from @code{http://ftp.gnu.org/pub/gnu/gperf/}. 197A paper describing @code{gperf}'s design and implementation in greater 198detail is available in the Second USENIX C++ Conference proceedings 199or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}. 200 201@node Search Structures, Description, Motivation, Top 202@chapter Static search structures and GNU @code{gperf} 203@cindex Static search structure 204 205A @dfn{static search structure} is an Abstract Data Type with certain 206fundamental operations, e.g., @emph{initialize}, @emph{insert}, 207and @emph{retrieve}. Conceptually, all insertions occur before any 208retrievals. In practice, @code{gperf} generates a @emph{static} array 209containing search set keywords and any associated attributes specified 210by the user. Thus, there is essentially no execution-time cost for the 211insertions. It is a useful data structure for representing @emph{static 212search sets}. Static search sets occur frequently in software system 213applications. Typical static search sets include compiler reserved 214words, assembler instruction opcodes, and built-in shell interpreter 215commands. Search set members, called @dfn{keywords}, are inserted into 216the structure only once, usually during program initialization, and are 217not generally modified at run-time. 218 219Numerous static search structure implementations exist, e.g., 220arrays, linked lists, binary search trees, digital search tries, and 221hash tables. Different approaches offer trade-offs between space 222utilization and search time efficiency. For example, an @var{n} element 223sorted array is space efficient, though the average-case time 224complexity for retrieval operations using binary search is 225proportional to log @var{n}. Conversely, hash table implementations 226often locate a table entry in constant time, but typically impose 227additional memory overhead and exhibit poor worst case performance. 228 229@cindex Minimal perfect hash functions 230@emph{Minimal perfect hash functions} provide an optimal solution for a 231particular class of static search sets. A minimal perfect hash 232function is defined by two properties: 233 234@itemize @bullet 235@item 236It allows keyword recognition in a static search set using at most 237@emph{one} probe into the hash table. This represents the ``perfect'' 238property. 239@item 240The actual memory allocated to store the keywords is precisely large 241enough for the keyword set, and @emph{no larger}. This is the 242``minimal'' property. 243@end itemize 244 245For most applications it is far easier to generate @emph{perfect} hash 246functions than @emph{minimal perfect} hash functions. Moreover, 247non-minimal perfect hash functions frequently execute faster than 248minimal ones in practice. This phenomena occurs since searching a 249sparse keyword table increases the probability of locating a ``null'' 250entry, thereby reducing string comparisons. @code{gperf}'s default 251behavior generates @emph{near-minimal} perfect hash functions for 252keyword sets. However, @code{gperf} provides many options that permit 253user control over the degree of minimality and perfection. 254 255Static search sets often exhibit relative stability over time. For 256example, Ada's 63 reserved words have remained constant for nearly a 257decade. It is therefore frequently worthwhile to expend concerted 258effort building an optimal search structure @emph{once}, if it 259subsequently receives heavy use multiple times. @code{gperf} removes 260the drudgery associated with constructing time- and space-efficient 261search structures by hand. It has proven a useful and practical tool 262for serious programming projects. Output from @code{gperf} is currently 263used in several production and research compilers, including GNU C, GNU 264C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are 265not yet part of the official GNU distribution. Each compiler utilizes 266@code{gperf} to automatically generate static search structures that 267efficiently identify their respective reserved keywords. 268 269@node Description, Options, Search Structures, Top 270@chapter High-Level Description of GNU @code{gperf} 271 272@menu 273* Input Format:: Input Format to @code{gperf} 274* Output Format:: Output Format for Generated C Code with @code{gperf} 275* Binary Strings:: Use of NUL bytes 276@end menu 277 278The perfect hash function generator @code{gperf} reads a set of 279``keywords'' from an input file (or from the standard input by 280default). It attempts to derive a perfect hashing function that 281recognizes a member of the @dfn{static keyword set} with at most a 282single probe into the lookup table. If @code{gperf} succeeds in 283generating such a function it produces a pair of C source code routines 284that perform hashing and table lookup recognition. All generated C code 285is directed to the standard output. Command-line options described 286below allow you to modify the input and output format to @code{gperf}. 287 288By default, @code{gperf} attempts to produce time-efficient code, with 289less emphasis on efficient space utilization. However, several options 290exist that permit trading-off execution time for storage space and vice 291versa. In particular, expanding the generated table size produces a 292sparse search structure, generally yielding faster searches. 293Conversely, you can direct @code{gperf} to utilize a C @code{switch} 294statement scheme that minimizes data space storage size. Furthermore, 295using a C @code{switch} may actually speed up the keyword retrieval time 296somewhat. Actual results depend on your C compiler, of course. 297 298In general, @code{gperf} assigns values to the bytes it is using 299for hashing until some set of values gives each keyword a unique value. 300A helpful heuristic is that the larger the hash value range, the easier 301it is for @code{gperf} to find and generate a perfect hash function. 302Experimentation is the key to getting the most from @code{gperf}. 303 304@node Input Format, Output Format, Description, Description 305@section Input Format to @code{gperf} 306@cindex Format 307@cindex Declaration section 308@cindex Keywords section 309@cindex Functions section 310You can control the input file format by varying certain command-line 311arguments, in particular the @samp{-t} option. The input's appearance 312is similar to GNU utilities @code{flex} and @code{bison} (or UNIX 313utilities @code{lex} and @code{yacc}). Here's an outline of the general 314format: 315 316@example 317@group 318declarations 319%% 320keywords 321%% 322functions 323@end group 324@end example 325 326@emph{Unlike} @code{flex} or @code{bison}, the declarations section and 327the functions section are optional. The following sections describe the 328input format for each section. 329 330@menu 331* Declarations:: Declarations. 332* Keywords:: Format for Keyword Entries. 333* Functions:: Including Additional C Functions. 334* Controls for GNU indent:: Where to place directives for GNU @code{indent}. 335@end menu 336 337It is possible to omit the declaration section entirely, if the @samp{-t} 338option is not given. In this case the input file begins directly with the 339first keyword line, e.g.: 340 341@example 342@group 343january 344february 345march 346april 347... 348@end group 349@end example 350 351@node Declarations, Keywords, Input Format, Input Format 352@subsection Declarations 353 354The keyword input file optionally contains a section for including 355arbitrary C declarations and definitions, @code{gperf} declarations that 356act like command-line options, as well as for providing a user-supplied 357@code{struct}. 358 359@menu 360* User-supplied Struct:: Specifying keywords with attributes. 361* Gperf Declarations:: Embedding command line options in the input. 362* C Code Inclusion:: Including C declarations and definitions. 363@end menu 364 365@node User-supplied Struct, Gperf Declarations, Declarations, Declarations 366@subsubsection User-supplied @code{struct} 367 368If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration) 369@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last 370component in the declaration section from the input file. The first 371field in this struct must be of type @code{char *} or @code{const char *} 372if the @samp{-P} option is not given, or of type @code{int} if the option 373@samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled. 374This first field must be called @samp{name}, although it is possible to modify 375its name with the @samp{-K} option (or, equivalently, the 376@samp{%define slot-name} declaration) described below. 377 378Here is a simple example, using months of the year and their attributes as 379input: 380 381@example 382@group 383struct month @{ char *name; int number; int days; int leap_days; @}; 384%% 385january, 1, 31, 31 386february, 2, 28, 29 387march, 3, 31, 31 388april, 4, 30, 30 389may, 5, 31, 31 390june, 6, 30, 30 391july, 7, 31, 31 392august, 8, 31, 31 393september, 9, 30, 30 394october, 10, 31, 31 395november, 11, 30, 30 396december, 12, 31, 31 397@end group 398@end example 399 400@cindex @samp{%%} 401Separating the @code{struct} declaration from the list of keywords and 402other fields are a pair of consecutive percent signs, @samp{%%}, 403appearing left justified in the first column, as in the UNIX utility 404@code{lex}. 405 406If the @code{struct} has already been declared in an include file, it can 407be mentioned in an abbreviated form, like this: 408 409@example 410@group 411struct month; 412%% 413january, 1, 31, 31 414... 415@end group 416@end example 417 418@node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations 419@subsubsection Gperf Declarations 420 421The declaration section can contain @code{gperf} declarations. They 422influence the way @code{gperf} works, like command line options do. 423In fact, every such declaration is equivalent to a command line option. 424There are three forms of declarations: 425 426@enumerate 427@item 428Declarations without argument, like @samp{%compare-lengths}. 429 430@item 431Declarations with an argument, like @samp{%switch=@var{count}}. 432 433@item 434Declarations of names of entities in the output file, like 435@samp{%define lookup-function-name @var{name}}. 436@end enumerate 437 438When a declaration is given both in the input file and as a command line 439option, the command-line option's value prevails. 440 441The following @code{gperf} declarations are available. 442 443@table @samp 444@item %delimiters=@var{delimiter-list} 445@cindex @samp{%delimiters} 446Allows you to provide a string containing delimiters used to 447separate keywords from their attributes. The default is ",". This 448option is essential if you want to use keywords that have embedded 449commas or newlines. 450 451@item %struct-type 452@cindex @samp{%struct-type} 453Allows you to include a @code{struct} type declaration for generated 454code; see above for an example. 455 456@item %ignore-case 457@cindex @samp{%ignore-case} 458Consider upper and lower case ASCII characters as equivalent. The string 459comparison will use a case insignificant character comparison. Note that 460locale dependent case mappings are ignored. 461 462@item %language=@var{language-name} 463@cindex @samp{%language} 464Instructs @code{gperf} to generate code in the language specified by the 465option's argument. Languages handled are currently: 466 467@table @samp 468@item KR-C 469Old-style K&R C. This language is understood by old-style C compilers and 470ANSI C compilers, but ANSI C compilers may flag warnings (or even errors) 471because of lacking @samp{const}. 472 473@item C 474Common C. This language is understood by ANSI C compilers, and also by 475old-style C compilers, provided that you @code{#define const} to empty 476for compilers which don't know about this keyword. 477 478@item ANSI-C 479ANSI C. This language is understood by ANSI C compilers and C++ compilers. 480 481@item C++ 482C++. This language is understood by C++ compilers. 483@end table 484 485The default is C. 486 487@item %define slot-name @var{name} 488@cindex @samp{%define slot-name} 489This declaration is only useful when option @samp{-t} (or, equivalently, the 490@samp{%struct-type} declaration) has been given. 491By default, the program assumes the structure component identifier for 492the keyword is @samp{name}. This option allows an arbitrary choice of 493identifier for this component, although it still must occur as the first 494field in your supplied @code{struct}. 495 496@item %define initializer-suffix @var{initializers} 497@cindex @samp{%define initializer-suffix} 498This declaration is only useful when option @samp{-t} (or, equivalently, the 499@samp{%struct-type} declaration) has been given. 500It permits to specify initializers for the structure members following 501@var{slot-name} in empty hash table entries. The list of initializers 502should start with a comma. By default, the emitted code will 503zero-initialize structure members following @var{slot-name}. 504 505@item %define hash-function-name @var{name} 506@cindex @samp{%define hash-function-name} 507Allows you to specify the name for the generated hash function. Default 508name is @samp{hash}. This option permits the use of two hash tables in 509the same file. 510 511@item %define lookup-function-name @var{name} 512@cindex @samp{%define lookup-function-name} 513Allows you to specify the name for the generated lookup function. 514Default name is @samp{in_word_set}. This option permits multiple 515generated hash functions to be used in the same application. 516 517@item %define class-name @var{name} 518@cindex @samp{%define class-name} 519This option is only useful when option @samp{-L C++} (or, equivalently, 520the @samp{%language=C++} declaration) has been given. It 521allows you to specify the name of generated C++ class. Default name is 522@code{Perfect_Hash}. 523 524@item %7bit 525@cindex @samp{%7bit} 526This option specifies that all strings that will be passed as arguments 527to the generated hash function and the generated lookup function will 528solely consist of 7-bit ASCII characters (bytes in the range 0..127). 529(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do 530@emph{not} guarantee that a byte is in this range. Only an explicit 531test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) 532 533@item %compare-lengths 534@cindex @samp{%compare-lengths} 535Compare keyword lengths before trying a string comparison. This option 536is mandatory for binary comparisons (@pxref{Binary Strings}). It also might 537cut down on the number of string comparisons made during the lookup, since 538keywords with different lengths are never compared via @code{strcmp}. 539However, using @samp{%compare-lengths} might greatly increase the size of the 540generated C code if the lookup table range is large (which implies that 541the switch option @samp{-S} or @samp{%switch} is not enabled), since the length 542table contains as many elements as there are entries in the lookup table. 543 544@item %compare-strncmp 545@cindex @samp{%compare-strncmp} 546Generates C code that uses the @code{strncmp} function to perform 547string comparisons. The default action is to use @code{strcmp}. 548 549@item %readonly-tables 550@cindex @samp{%readonly-tables} 551Makes the contents of all generated lookup tables constant, i.e., 552``readonly''. Many compilers can generate more efficient code for this 553by putting the tables in readonly memory. 554 555@item %enum 556@cindex @samp{%enum} 557Define constant values using an enum local to the lookup function rather 558than with #defines. This also means that different lookup functions can 559reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}. 560 561@item %includes 562@cindex @samp{%includes} 563Include the necessary system include file, @code{<string.h>}, at the 564beginning of the code. By default, this is not done; the user must 565include this header file himself to allow compilation of the code. 566 567@item %global-table 568@cindex @samp{%global-table} 569Generate the static table of keywords as a static global variable, 570rather than hiding it inside of the lookup function (which is the 571default behavior). 572 573@item %pic 574@cindex @samp{%pic} 575Optimize the generated table for inclusion in shared libraries. This 576reduces the startup time of programs using a shared library containing 577the generated code. If the @samp{%struct-type} declaration (or, 578equivalently, the option @samp{-t}) is also given, the first field of the 579user-defined struct must be of type @samp{int}, not @samp{char *}, because 580it will contain offsets into the string pool instead of actual strings. 581To convert such an offset to a string, you can use the expression 582@samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool 583name can be changed through the @samp{%define string-pool-name} declaration. 584 585@item %define string-pool-name @var{name} 586@cindex @samp{%define string-pool-name} 587Allows you to specify the name of the generated string pool created by 588the declaration @samp{%pic} (or, equivalently, the option @samp{-P}). 589The default name is @samp{stringpool}. This declaration permits the use of 590two hash tables in the same file, with @samp{%pic} and even when the 591@samp{%global-table} declaration (or, equivalently, the option @samp{-G}) 592is given. 593 594@item %null-strings 595@cindex @samp{%null-strings} 596Use NULL strings instead of empty strings for empty keyword table entries. 597This reduces the startup time of programs using a shared library containing 598the generated code (but not as much as the declaration @samp{%pic}), at the 599expense of one more test-and-branch instruction at run time. 600 601@item %define word-array-name @var{name} 602@cindex @samp{%define word-array-name} 603Allows you to specify the name for the generated array containing the 604hash table. Default name is @samp{wordlist}. This option permits the 605use of two hash tables in the same file, even when the option @samp{-G} 606(or, equivalently, the @samp{%global-table} declaration) is given. 607 608@item %define length-table-name @var{name} 609@cindex @samp{%define length-table-name} 610Allows you to specify the name for the generated array containing the 611length table. Default name is @samp{lengthtable}. This option permits the 612use of two length tables in the same file, even when the option @samp{-G} 613(or, equivalently, the @samp{%global-table} declaration) is given. 614 615@item %switch=@var{count} 616@cindex @samp{%switch} 617Causes the generated C code to use a @code{switch} statement scheme, 618rather than an array lookup table. This can lead to a reduction in both 619time and space requirements for some input files. The argument to this 620option determines how many @code{switch} statements are generated. A 621value of 1 generates 1 @code{switch} containing all the elements, a 622value of 2 generates 2 tables with 1/2 the elements in each 623@code{switch}, etc. This is useful since many C compilers cannot 624correctly generate code for large @code{switch} statements. This option 625was inspired in part by Keith Bostic's original C program. 626 627@item %omit-struct-type 628@cindex @samp{%omit-struct-type} 629Prevents the transfer of the type declaration to the output file. Use 630this option if the type is already defined elsewhere. 631@end table 632 633@node C Code Inclusion, , Gperf Declarations, Declarations 634@subsubsection C Code Inclusion 635 636@cindex @samp{%@{} 637@cindex @samp{%@}} 638Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it 639is possible to directly include C source text and comments verbatim into 640the generated output file. This is accomplished by enclosing the region 641inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs. Here is 642an input fragment based on the previous example that illustrates this 643feature: 644 645@example 646@group 647%@{ 648#include <assert.h> 649/* This section of code is inserted directly into the output. */ 650int return_month_days (struct month *months, int is_leap_year); 651%@} 652struct month @{ char *name; int number; int days; int leap_days; @}; 653%% 654january, 1, 31, 31 655february, 2, 28, 29 656march, 3, 31, 31 657... 658@end group 659@end example 660 661@node Keywords, Functions, Declarations, Input Format 662@subsection Format for Keyword Entries 663 664The second input file format section contains lines of keywords and any 665associated attributes you might supply. A line beginning with @samp{#} 666in the first column is considered a comment. Everything following the 667@samp{#} is ignored, up to and including the following newline. A line 668beginning with @samp{%} in the first column is an option declaration and 669must not occur within the keywords section. 670 671The first field of each non-comment line is always the keyword itself. It 672can be given in two ways: as a simple name, i.e., without surrounding 673string quotation marks, or as a string enclosed in double-quotes, in 674C syntax, possibly with backslash escapes like @code{\"} or @code{\234} 675or @code{\xa8}. In either case, it must start right at the beginning 676of the line, without leading whitespace. 677In this context, a ``field'' is considered to extend up to, but 678not include, the first blank, comma, or newline. Here is a simple 679example taken from a partial list of C reserved words: 680 681@example 682@group 683# These are a few C reserved words, see the c.gperf file 684# for a complete list of ANSI C reserved words. 685unsigned 686sizeof 687switch 688signed 689if 690default 691for 692while 693return 694@end group 695@end example 696 697Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker 698may be elided if the declaration section is empty. 699 700Additional fields may optionally follow the leading keyword. Fields 701should be separated by commas, and terminate at the end of line. What 702these fields mean is entirely up to you; they are used to initialize the 703elements of the user-defined @code{struct} provided by you in the 704declaration section. If the @samp{-t} option (or, equivalently, the 705@samp{%struct-type} declaration) is @emph{not} enabled 706these fields are simply ignored. All previous examples except the last 707one contain keyword attributes. 708 709@node Functions, Controls for GNU indent, Keywords, Input Format 710@subsection Including Additional C Functions 711 712The optional third section also corresponds closely with conventions 713found in @code{flex} and @code{bison}. All text in this section, 714starting at the final @samp{%%} and extending to the end of the input 715file, is included verbatim into the generated output file. Naturally, 716it is your responsibility to ensure that the code contained in this 717section is valid C. 718 719@node Controls for GNU indent, , Functions, Input Format 720@subsection Where to place directives for GNU @code{indent}. 721 722If you want to invoke GNU @code{indent} on a @code{gperf} input file, 723you will see that GNU @code{indent} doesn't understand the @samp{%%}, 724@samp{%@{} and @samp{%@}} directives that control @code{gperf}'s 725interpretation of the input file. Therefore you have to insert some 726directives for GNU @code{indent}. More precisely, assuming the most 727general input file structure 728 729@example 730@group 731declarations part 1 732%@{ 733verbatim code 734%@} 735declarations part 2 736%% 737keywords 738%% 739functions 740@end group 741@end example 742 743@noindent 744you would insert @samp{*INDENT-OFF*} and @samp{*INDENT-ON*} comments 745as follows: 746 747@example 748@group 749/* *INDENT-OFF* */ 750declarations part 1 751%@{ 752/* *INDENT-ON* */ 753verbatim code 754/* *INDENT-OFF* */ 755%@} 756declarations part 2 757%% 758keywords 759%% 760/* *INDENT-ON* */ 761functions 762@end group 763@end example 764 765@node Output Format, Binary Strings, Input Format, Description 766@section Output Format for Generated C Code with @code{gperf} 767@cindex hash table 768 769Several options control how the generated C code appears on the standard 770output. Two C functions are generated. They are called @code{hash} and 771@code{in_word_set}, although you may modify their names with a command-line 772option. Both functions require two arguments, a string, @code{char *} 773@var{str}, and a length parameter, @code{int} @var{len}. Their default 774function prototypes are as follows: 775 776@deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len}) 777By default, the generated @code{hash} function returns an integer value 778created by adding @var{len} to several user-specified @var{str} byte 779positions indexed into an @dfn{associated values} table stored in a 780local static array. The associated values table is constructed 781internally by @code{gperf} and later output as a static local C array 782called @samp{hash_table}. The relevant selected positions (i.e. indices 783into @var{str}) are specified via the @samp{-k} option when running 784@code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}). 785@end deftypefun 786 787@deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len}) 788If @var{str} is in the keyword set, returns a pointer to that 789keyword. More exactly, if the option @samp{-t} (or, equivalently, the 790@samp{%struct-type} declaration) was given, it returns 791a pointer to the matching keyword's structure. Otherwise it returns 792@code{NULL}. 793@end deftypefun 794 795If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp} 796declaration) is not used, @var{str} must be a NUL terminated 797string of exactly length @var{len}. If @samp{-c} (or, equivalently, the 798@samp{%compare-strncmp} declaration) is used, @var{str} must 799simply be an array of @var{len} bytes and does not need to be NUL 800terminated. 801 802The code generated for these two functions is affected by the following 803options: 804 805@table @samp 806@item -t 807@itemx --struct-type 808Make use of the user-defined @code{struct}. 809 810@item -S @var{total-switch-statements} 811@itemx --switch=@var{total-switch-statements} 812@cindex @code{switch} 813Generate 1 or more C @code{switch} statement rather than use a large, 814(and potentially sparse) static array. Although the exact time and 815space savings of this approach vary according to your C compiler's 816degree of optimization, this method often results in smaller and faster 817code. 818@end table 819 820If the @samp{-t} and @samp{-S} options (or, equivalently, the 821@samp{%struct-type} and @samp{%switch} declarations) are omitted, the default 822action 823is to generate a @code{char *} array containing the keywords, together with 824additional empty strings used for padding the array. By experimenting 825with the various input and output options, and timing the resulting C 826code, you can determine the best option choices for different keyword 827set characteristics. 828 829@node Binary Strings, , Output Format, Description 830@section Use of NUL bytes 831@cindex NUL 832 833By default, the code generated by @code{gperf} operates on zero 834terminated strings, the usual representation of strings in C. This means 835that the keywords in the input file must not contain NUL bytes, 836and the @var{str} argument passed to @code{hash} or @code{in_word_set} 837must be NUL terminated and have exactly length @var{len}. 838 839If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp} 840declaration) is used, then the @var{str} argument does not need 841to be NUL terminated. The code generated by @code{gperf} will only 842access the first @var{len}, not @var{len+1}, bytes starting at @var{str}. 843However, the keywords in the input file still must not contain NUL 844bytes. 845 846If option @samp{-l} (or, equivalently, the @samp{%compare-lengths} 847declaration) is used, then the hash table performs binary 848comparison. The keywords in the input file may contain NUL bytes, 849written in string syntax as @code{\000} or @code{\x00}, and the code 850generated by @code{gperf} will treat NUL like any other byte. 851Also, in this case the @samp{-c} option (or, equivalently, the 852@samp{%compare-strncmp} declaration) is ignored. 853 854@node Options, Bugs, Description, Top 855@chapter Invoking @code{gperf} 856 857There are @emph{many} options to @code{gperf}. They were added to make 858the program more convenient for use with real applications. ``On-line'' 859help is readily available via the @samp{--help} option. Here is the 860complete list of options. 861 862@menu 863* Output File:: Specifying the Location of the Output File 864* Input Details:: Options that affect Interpretation of the Input File 865* Output Language:: Specifying the Language for the Output Code 866* Output Details:: Fine tuning Details in the Output Code 867* Algorithmic Details:: Changing the Algorithms employed by @code{gperf} 868* Verbosity:: Informative Output 869@end menu 870 871@node Output File, Input Details, Options, Options 872@section Specifying the Location of the Output File 873 874@table @samp 875@item --output-file=@var{file} 876Allows you to specify the name of the file to which the output is written to. 877@end table 878 879The results are written to standard output if no output file is specified 880or if it is @samp{-}. 881 882@node Input Details, Output Language, Output File, Options 883@section Options that affect Interpretation of the Input File 884 885These options are also available as declarations in the input file 886(@pxref{Gperf Declarations}). 887 888@table @samp 889@item -e @var{keyword-delimiter-list} 890@itemx --delimiters=@var{keyword-delimiter-list} 891@cindex Delimiters 892Allows you to provide a string containing delimiters used to 893separate keywords from their attributes. The default is ",". This 894option is essential if you want to use keywords that have embedded 895commas or newlines. One useful trick is to use -e'TAB', where TAB is 896the literal tab character. 897 898@item -t 899@itemx --struct-type 900Allows you to include a @code{struct} type declaration for generated 901code. Any text before a pair of consecutive @samp{%%} is considered 902part of the type declaration. Keywords and additional fields may follow 903this, one group of fields per line. A set of examples for generating 904perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2, 905Modula 3 and JavaScript reserved words are distributed with this release. 906 907@item --ignore-case 908Consider upper and lower case ASCII characters as equivalent. The string 909comparison will use a case insignificant character comparison. Note that 910locale dependent case mappings are ignored. This option is therefore not 911suitable if a properly internationalized or locale aware case mapping 912should be used. (For example, in a Turkish locale, the upper case equivalent 913of the lowercase ASCII letter @samp{i} is the non-ASCII character 914@samp{capital i with dot above}.) For this case, it is better to apply 915an uppercase or lowercase conversion on the string before passing it to 916the @code{gperf} generated function. 917@end table 918 919@node Output Language, Output Details, Input Details, Options 920@section Options to specify the Language for the Output Code 921 922These options are also available as declarations in the input file 923(@pxref{Gperf Declarations}). 924 925@table @samp 926@item -L @var{generated-language-name} 927@itemx --language=@var{generated-language-name} 928Instructs @code{gperf} to generate code in the language specified by the 929option's argument. Languages handled are currently: 930 931@table @samp 932@item KR-C 933Old-style K&R C. This language is understood by old-style C compilers and 934ANSI C compilers, but ANSI C compilers may flag warnings (or even errors) 935because of lacking @samp{const}. 936 937@item C 938Common C. This language is understood by ANSI C compilers, and also by 939old-style C compilers, provided that you @code{#define const} to empty 940for compilers which don't know about this keyword. 941 942@item ANSI-C 943ANSI C. This language is understood by ANSI C compilers and C++ compilers. 944 945@item C++ 946C++. This language is understood by C++ compilers. 947@end table 948 949The default is C. 950 951@item -a 952This option is supported for compatibility with previous releases of 953@code{gperf}. It does not do anything. 954 955@item -g 956This option is supported for compatibility with previous releases of 957@code{gperf}. It does not do anything. 958@end table 959 960@node Output Details, Algorithmic Details, Output Language, Options 961@section Options for fine tuning Details in the Output Code 962 963Most of these options are also available as declarations in the input file 964(@pxref{Gperf Declarations}). 965 966@table @samp 967@item -K @var{slot-name} 968@itemx --slot-name=@var{slot-name} 969@cindex Slot name 970This option is only useful when option @samp{-t} (or, equivalently, the 971@samp{%struct-type} declaration) has been given. 972By default, the program assumes the structure component identifier for 973the keyword is @samp{name}. This option allows an arbitrary choice of 974identifier for this component, although it still must occur as the first 975field in your supplied @code{struct}. 976 977@item -F @var{initializers} 978@itemx --initializer-suffix=@var{initializers} 979@cindex Initializers 980This option is only useful when option @samp{-t} (or, equivalently, the 981@samp{%struct-type} declaration) has been given. 982It permits to specify initializers for the structure members following 983@var{slot-name} in empty hash table entries. The list of initializers 984should start with a comma. By default, the emitted code will 985zero-initialize structure members following @var{slot-name}. 986 987@item -H @var{hash-function-name} 988@itemx --hash-function-name=@var{hash-function-name} 989Allows you to specify the name for the generated hash function. Default 990name is @samp{hash}. This option permits the use of two hash tables in 991the same file. 992 993@item -N @var{lookup-function-name} 994@itemx --lookup-function-name=@var{lookup-function-name} 995Allows you to specify the name for the generated lookup function. 996Default name is @samp{in_word_set}. This option permits multiple 997generated hash functions to be used in the same application. 998 999@item -Z @var{class-name} 1000@itemx --class-name=@var{class-name} 1001@cindex Class name 1002This option is only useful when option @samp{-L C++} (or, equivalently, 1003the @samp{%language=C++} declaration) has been given. It 1004allows you to specify the name of generated C++ class. Default name is 1005@code{Perfect_Hash}. 1006 1007@item -7 1008@itemx --seven-bit 1009This option specifies that all strings that will be passed as arguments 1010to the generated hash function and the generated lookup function will 1011solely consist of 7-bit ASCII characters (bytes in the range 0..127). 1012(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do 1013@emph{not} guarantee that a byte is in this range. Only an explicit 1014test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the 1015default in versions of @code{gperf} earlier than 2.7; now the default is 1016to support 8-bit and multibyte characters. 1017 1018@item -l 1019@itemx --compare-lengths 1020Compare keyword lengths before trying a string comparison. This option 1021is mandatory for binary comparisons (@pxref{Binary Strings}). It also might 1022cut down on the number of string comparisons made during the lookup, since 1023keywords with different lengths are never compared via @code{strcmp}. 1024However, using @samp{-l} might greatly increase the size of the 1025generated C code if the lookup table range is large (which implies that 1026the switch option @samp{-S} or @samp{%switch} is not enabled), since the length 1027table contains as many elements as there are entries in the lookup table. 1028 1029@item -c 1030@itemx --compare-strncmp 1031Generates C code that uses the @code{strncmp} function to perform 1032string comparisons. The default action is to use @code{strcmp}. 1033 1034@item -C 1035@itemx --readonly-tables 1036Makes the contents of all generated lookup tables constant, i.e., 1037``readonly''. Many compilers can generate more efficient code for this 1038by putting the tables in readonly memory. 1039 1040@item -E 1041@itemx --enum 1042Define constant values using an enum local to the lookup function rather 1043than with #defines. This also means that different lookup functions can 1044reside in the same file. Thanks to James Clark @code{<jjc@@ai.mit.edu>}. 1045 1046@item -I 1047@itemx --includes 1048Include the necessary system include file, @code{<string.h>}, at the 1049beginning of the code. By default, this is not done; the user must 1050include this header file himself to allow compilation of the code. 1051 1052@item -G 1053@itemx --global-table 1054Generate the static table of keywords as a static global variable, 1055rather than hiding it inside of the lookup function (which is the 1056default behavior). 1057 1058@item -P 1059@itemx --pic 1060Optimize the generated table for inclusion in shared libraries. This 1061reduces the startup time of programs using a shared library containing 1062the generated code. If the option @samp{-t} (or, equivalently, the 1063@samp{%struct-type} declaration) is also given, the first field of the 1064user-defined struct must be of type @samp{int}, not @samp{char *}, because 1065it will contain offsets into the string pool instead of actual strings. 1066To convert such an offset to a string, you can use the expression 1067@samp{stringpool + @var{o}}, where @var{o} is the offset. The string pool 1068name can be changed through the option @samp{--string-pool-name}. 1069 1070@item -Q @var{string-pool-name} 1071@itemx --string-pool-name=@var{string-pool-name} 1072Allows you to specify the name of the generated string pool created by 1073option @samp{-P}. The default name is @samp{stringpool}. This option 1074permits the use of two hash tables in the same file, with @samp{-P} and 1075even when the option @samp{-G} (or, equivalently, the @samp{%global-table} 1076declaration) is given. 1077 1078@item --null-strings 1079Use NULL strings instead of empty strings for empty keyword table entries. 1080This reduces the startup time of programs using a shared library containing 1081the generated code (but not as much as option @samp{-P}), at the expense 1082of one more test-and-branch instruction at run time. 1083 1084@item -W @var{hash-table-array-name} 1085@itemx --word-array-name=@var{hash-table-array-name} 1086@cindex Array name 1087Allows you to specify the name for the generated array containing the 1088hash table. Default name is @samp{wordlist}. This option permits the 1089use of two hash tables in the same file, even when the option @samp{-G} 1090(or, equivalently, the @samp{%global-table} declaration) is given. 1091 1092@itemx --length-table-name=@var{length-table-array-name} 1093@cindex Array name 1094Allows you to specify the name for the generated array containing the 1095length table. Default name is @samp{lengthtable}. This option permits the 1096use of two length tables in the same file, even when the option @samp{-G} 1097(or, equivalently, the @samp{%global-table} declaration) is given. 1098 1099@item -S @var{total-switch-statements} 1100@itemx --switch=@var{total-switch-statements} 1101@cindex @code{switch} 1102Causes the generated C code to use a @code{switch} statement scheme, 1103rather than an array lookup table. This can lead to a reduction in both 1104time and space requirements for some input files. The argument to this 1105option determines how many @code{switch} statements are generated. A 1106value of 1 generates 1 @code{switch} containing all the elements, a 1107value of 2 generates 2 tables with 1/2 the elements in each 1108@code{switch}, etc. This is useful since many C compilers cannot 1109correctly generate code for large @code{switch} statements. This option 1110was inspired in part by Keith Bostic's original C program. 1111 1112@item -T 1113@itemx --omit-struct-type 1114Prevents the transfer of the type declaration to the output file. Use 1115this option if the type is already defined elsewhere. 1116 1117@item -p 1118This option is supported for compatibility with previous releases of 1119@code{gperf}. It does not do anything. 1120@end table 1121 1122@node Algorithmic Details, Verbosity, Output Details, Options 1123@section Options for changing the Algorithms employed by @code{gperf} 1124 1125@table @samp 1126@item -k @var{selected-byte-positions} 1127@itemx --key-positions=@var{selected-byte-positions} 1128Allows selection of the byte positions used in the keywords' 1129hash function. The allowable choices range between 1-255, inclusive. 1130The positions are separated by commas, e.g., @samp{-k 9,4,13,14}; 1131ranges may be used, e.g., @samp{-k 2-7}; and positions may occur 1132in any order. Furthermore, the wildcard '*' causes the generated 1133hash function to consider @strong{all} byte positions in each keyword, 1134whereas '$' instructs the hash function to use the ``final byte'' 1135of a keyword (this is the only way to use a byte position greater than 1136255, incidentally). 1137 1138For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash 1139function that considers positions 1,2,4,6,7,8,9,10, plus the last 1140byte in each keyword (which may be at a different position for each 1141keyword, obviously). Keywords 1142with length less than the indicated byte positions work properly, since 1143selected byte positions exceeding the keyword length are simply not 1144referenced in the hash function. 1145 1146This option is not normally needed since version 2.8 of @code{gperf}; 1147the default byte positions are computed depending on the keyword set, 1148through a search that minimizes the number of byte positions. 1149 1150@item -D 1151@itemx --duplicates 1152@cindex Duplicates 1153Handle keywords whose selected byte sets hash to duplicate values. 1154Duplicate hash values can occur if a set of keywords has the same names, but 1155possesses different attributes, or if the selected byte positions are not well 1156chosen. With the -D option @code{gperf} treats all these keywords as 1157part of an equivalence class and generates a perfect hash function with 1158multiple comparisons for duplicate keywords. It is up to you to completely 1159disambiguate the keywords by modifying the generated C code. However, 1160@code{gperf} helps you out by organizing the output. 1161 1162Using this option usually means that the generated hash function is no 1163longer perfect. On the other hand, it permits @code{gperf} to work on 1164keyword sets that it otherwise could not handle. 1165 1166@item -m @var{iterations} 1167@itemx --multiple-iterations=@var{iterations} 1168Perform multiple choices of the @samp{-i} and @samp{-j} values, and 1169choose the best results. This increases the running time by a factor of 1170@var{iterations} but does a good job minimizing the generated table size. 1171 1172@item -i @var{initial-value} 1173@itemx --initial-asso=@var{initial-value} 1174Provides an initial @var{value} for the associate values array. Default 1175is 0. Increasing the initial value helps inflate the final table size, 1176possibly leading to more time efficient keyword lookups. Note that this 1177option is not particularly useful when @samp{-S} (or, equivalently, 1178@samp{%switch}) is used. Also, 1179@samp{-i} is overridden when the @samp{-r} option is used. 1180 1181@item -j @var{jump-value} 1182@itemx --jump=@var{jump-value} 1183@cindex Jump value 1184Affects the ``jump value'', i.e., how far to advance the associated 1185byte value upon collisions. @var{Jump-value} is rounded up to an 1186odd number, the default is 5. If the @var{jump-value} is 0 @code{gperf} 1187jumps by random amounts. 1188 1189@item -n 1190@itemx --no-strlen 1191Instructs the generator not to include the length of a keyword when 1192computing its hash value. This may save a few assembly instructions in 1193the generated lookup table. 1194 1195@item -r 1196@itemx --random 1197Utilizes randomness to initialize the associated values table. This 1198frequently generates solutions faster than using deterministic 1199initialization (which starts all associated values at 0). Furthermore, 1200using the randomization option generally increases the size of the 1201table. 1202 1203@item -s @var{size-multiple} 1204@itemx --size-multiple=@var{size-multiple} 1205Affects the size of the generated hash table. The numeric argument for 1206this option indicates ``how many times larger or smaller'' the maximum 1207associated value range should be, in relationship to the number of keywords. 1208It can be written as an integer, a floating-point number or a fraction. 1209For example, a value of 3 means ``allow the maximum associated value to be 1210about 3 times larger than the number of input keywords''. 1211Conversely, a value of 1/3 means ``allow the maximum associated value to 1212be about 3 times smaller than the number of input keywords''. Values 1213smaller than 1 are useful for limiting the overall size of the generated hash 1214table, though the option @samp{-m} is better at this purpose. 1215 1216If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is 1217@emph{not} enabled, the maximum 1218associated value influences the static array table size, and a larger 1219table should decrease the time required for an unsuccessful search, at 1220the expense of extra table space. 1221 1222The default value is 1, thus the default maximum associated value about 1223the same size as the number of keywords (for efficiency, the maximum 1224associated value is always rounded up to a power of 2). The actual 1225table size may vary somewhat, since this technique is essentially a 1226heuristic. 1227@end table 1228 1229@node Verbosity, , Algorithmic Details, Options 1230@section Informative Output 1231 1232@table @samp 1233@item -h 1234@itemx --help 1235Prints a short summary on the meaning of each program option. Aborts 1236further program execution. 1237 1238@item -v 1239@itemx --version 1240Prints out the current version number. 1241 1242@item -d 1243@itemx --debug 1244Enables the debugging option. This produces verbose diagnostics to 1245``standard error'' when @code{gperf} is executing. It is useful both for 1246maintaining the program and for determining whether a given set of 1247options is actually speeding up the search for a solution. Some useful 1248information is dumped at the end of the program when the @samp{-d} 1249option is enabled. 1250@end table 1251 1252@node Bugs, Projects, Options, Top 1253@chapter Known Bugs and Limitations with @code{gperf} 1254 1255The following are some limitations with the current release of 1256@code{gperf}: 1257 1258@itemize @bullet 1259@item 1260The @code{gperf} utility is tuned to execute quickly, and works quickly 1261for small to medium size data sets (around 1000 keywords). It is 1262extremely useful for maintaining perfect hash functions for compiler 1263keyword sets. Several recent enhancements now enable @code{gperf} to 1264work efficiently on much larger keyword sets (over 15,000 keywords). 1265When processing large keyword sets it helps greatly to have over 8 megs 1266of RAM. 1267 1268@item 1269The size of the generate static keyword array can get @emph{extremely} 1270large if the input keyword file is large or if the keywords are quite 1271similar. This tends to slow down the compilation of the generated C 1272code, and @emph{greatly} inflates the object code size. If this 1273situation occurs, consider using the @samp{-S} option to reduce data 1274size, potentially increasing keyword recognition time a negligible 1275amount. Since many C compilers cannot correctly generate code for 1276large switch statements it is important to qualify the @var{-S} option 1277with an appropriate numerical argument that controls the number of 1278switch statements generated. 1279 1280@item 1281The maximum number of selected byte positions has an 1282arbitrary limit of 255. This restriction should be removed, and if 1283anyone considers this a problem write me and let me know so I can remove 1284the constraint. 1285@end itemize 1286 1287@node Projects, Bibliography, Bugs, Top 1288@chapter Things Still Left to Do 1289 1290It should be ``relatively'' easy to replace the current perfect hash 1291function algorithm with a more exhaustive approach; the perfect hash 1292module is essential independent from other program modules. Additional 1293worthwhile improvements include: 1294 1295@itemize @bullet 1296@item 1297Another useful extension involves modifying the program to generate 1298``minimal'' perfect hash functions (under certain circumstances, the 1299current version can be rather extravagant in the generated table size). 1300This is mostly of theoretical interest, since a sparse table 1301often produces faster lookups, and use of the @samp{-S} @code{switch} 1302option can minimize the data size, at the expense of slightly longer 1303lookups (note that the gcc compiler generally produces good code for 1304@code{switch} statements, reducing the need for more complex schemes). 1305 1306@item 1307In addition to improving the algorithm, it would also be useful to 1308generate an Ada package as the code output, in addition to the current 1309C and C++ routines. 1310@end itemize 1311 1312@page 1313 1314@node Bibliography, Concept Index, Projects, Top 1315@chapter Bibliography 1316 1317[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect 1318Hashing Functions} Information Sciences 39(1986), 187-195. 1319 1320[2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash 1321Functions Method''} Communications of the ACM, 23, 12(December 1980), 729. 1322 1323[3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple} 1324Communications of the ACM, 23, 1(January 1980), 17-19. 1325 1326[4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal 1327Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27. 1328 1329[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M. 1330@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58. 1331 1332[6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal 1333Perfect Hashing Functions} Communications of the ACM, 24, 12(December 13341981), 829-833. 1335 1336[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect 1337Hash Functions Method} Communications of the ACM, 23, 12(December 1980), 1338728-729. 1339 1340[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect 1341Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532 1342 1343[9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator} 1344Second USENIX C++ Conference Proceedings, April 1990. 1345 1346[10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator} 1347C++ Report, SIGS 10 10 (November/December 1998). 1348 1349[11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions 1350for Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53. 1351 1352[12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe 1353Retrieving Method for Static Sets} Communications of the ACM, 20 135411(November 1977), 841-850. 1355 1356[13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation, 13571988. 1358 1359[14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986. 1360 1361[15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software 1362Foundation, 1989. 1363 1364@node Concept Index, , Bibliography, Top 1365@unnumbered Concept Index 1366 1367@printindex cp 1368 1369@contents 1370@bye 1371