1\input texinfo @c -*- texinfo -*-
2@c %**start of header
3@setfilename gperf.info
4@settitle Perfect Hash Function Generator
5@c @setchapternewpage odd
6@c %**end of header
7
8@c some day we should @include version.texi instead of defining
9@c these values at hand.
10@set UPDATED 31 March 2007
11@set EDITION 3.0.3
12@set VERSION 3.0.3
13@c ---------------------
14
15@c remove the black boxes generated in the GPL appendix.
16@finalout
17
18@c Merge functions into the concept index
19@syncodeindex fn cp
20@c @synindex pg cp
21
22@dircategory Programming Tools
23@direntry
24* Gperf: (gperf).                Perfect Hash Function Generator.
25@end direntry
26
27@ifinfo
28This file documents the features of the GNU Perfect Hash Function
29Generator @value{VERSION}.
30
31Copyright @copyright{} 1989-2006 Free Software Foundation, Inc.
32
33Permission is granted to make and distribute verbatim copies of this
34manual provided the copyright notice and this permission notice are
35preserved on all copies.
36
37@ignore
38Permission is granted to process this file through TeX and print the
39results, provided the printed document carries a copying permission
40notice identical to this one except for the removal of this paragraph
41(this paragraph not being relevant to the printed manual).
42
43@end ignore
44
45Permission is granted to copy and distribute modified versions of this
46manual under the conditions for verbatim copying, provided also that the
47section entitled ``GNU General Public License'' is included exactly as
48in the original, and provided that the entire resulting derived work is
49distributed under the terms of a permission notice identical to this
50one.
51
52Permission is granted to copy and distribute translations of this manual
53into another language, under the above conditions for modified versions,
54except that the section entitled ``GNU General Public License'' and this
55permission notice may be included in translations approved by the Free
56Software Foundation instead of in the original English.
57
58@end ifinfo
59
60@titlepage
61@title User's Guide to @code{gperf} @value{VERSION}
62@subtitle The GNU Perfect Hash Function Generator
63@subtitle Edition @value{EDITION}, @value{UPDATED}
64@author Douglas C. Schmidt
65@author Bruno Haible
66
67@page
68@vskip 0pt plus 1filll
69Copyright @copyright{} 1989-2007 Free Software Foundation, Inc.
70
71
72Permission is granted to make and distribute verbatim copies of
73this manual provided the copyright notice and this permission notice
74are preserved on all copies.
75
76Permission is granted to copy and distribute modified versions of this
77manual under the conditions for verbatim copying, provided also that the
78section entitled ``GNU General Public License'' is included
79exactly as in the original, and provided that the entire resulting
80derived work is distributed under the terms of a permission notice
81identical to this one.
82
83Permission is granted to copy and distribute translations of this manual
84into another language, under the above conditions for modified versions,
85except that the section entitled ``GNU General Public License'' may be
86included in a translation approved by the author instead of in the
87original English.
88@end titlepage
89
90@ifinfo
91@node Top, Copying, (dir), (dir)
92@top Introduction
93
94This manual documents the GNU @code{gperf} perfect hash function generator
95utility, focusing on its features and how to use them, and how to report
96bugs.
97
98@menu
99* Copying::                     GNU @code{gperf} General Public License says
100                                how you can copy and share @code{gperf}.
101* Contributors::                People who have contributed to @code{gperf}.
102* Motivation::                  The purpose of @code{gperf}.
103* Search Structures::           Static search structures and GNU @code{gperf}
104* Description::                 High-level discussion of how GPERF functions.
105* Options::                     A description of options to the program.
106* Bugs::                        Known bugs and limitations with GPERF.
107* Projects::                    Things still left to do.
108* Bibliography::                Material Referenced in this Report.
109
110* Concept Index::               
111
112@detailmenu --- The Detailed Node Listing ---
113
114High-Level Description of GNU @code{gperf}
115
116* Input Format::                Input Format to @code{gperf}
117* Output Format::               Output Format for Generated C Code with @code{gperf}
118* Binary Strings::              Use of NUL bytes
119
120Input Format to @code{gperf}
121
122* Declarations::                Declarations.
123* Keywords::                    Format for Keyword Entries.
124* Functions::                   Including Additional C Functions.
125* Controls for GNU indent::     Where to place directives for GNU @code{indent}.
126
127Declarations
128
129* User-supplied Struct::        Specifying keywords with attributes.
130* Gperf Declarations::          Embedding command line options in the input.
131* C Code Inclusion::            Including C declarations and definitions.
132
133Invoking @code{gperf}
134
135* Input Details::               Options that affect Interpretation of the Input File
136* Output Language::             Specifying the Language for the Output Code
137* Output Details::              Fine tuning Details in the Output Code
138* Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
139* Verbosity::                   Informative Output
140
141@end detailmenu
142@end menu
143
144@end ifinfo
145
146@node Copying, Contributors, Top, Top
147@unnumbered GNU GENERAL PUBLIC LICENSE
148@include gpl.texinfo
149
150@node Contributors, Motivation, Copying, Top
151@unnumbered Contributors to GNU @code{gperf} Utility
152
153@itemize @bullet
154@item
155@cindex Bugs 
156The GNU @code{gperf} perfect hash function generator utility was
157written in GNU C++ by Douglas C. Schmidt.  The general
158idea for the perfect hash function generator was inspired by Keith
159Bostic's algorithm written in C, and distributed to net.sources around
1601984.  The current program is a heavily modified, enhanced, and extended
161implementation of Keith's basic idea, created at the University of
162California, Irvine.  Bugs, patches, and suggestions should be reported
163to @code{<bug-gnu-gperf@@gnu.org>}.
164
165@item
166Special thanks is extended to Michael Tiemann and Doug Lea, for
167providing a useful compiler, and for giving me a forum to exhibit my
168creation.
169
170In addition, Adam de Boor and Nels Olson provided many tips and insights
171that greatly helped improve the quality and functionality of @code{gperf}.
172
173@item
174Bruno Haible enhanced and optimized the search algorithm.  He also rewrote
175the input routines and the output routines for better reliability, and
176added a testsuite.
177@end itemize
178
179@node Motivation, Search Structures, Contributors, Top
180@chapter Introduction
181
182@code{gperf} is a perfect hash function generator written in C++.  It
183transforms an @var{n} element user-specified keyword set @var{W} into a
184perfect hash function @var{F}.  @var{F} uniquely maps keywords in
185@var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}.  If @var{k}
186= @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.
187@code{gperf} generates a 0..@var{k} element static lookup table and a
188pair of C functions.  These functions determine whether a given
189character string @var{s} occurs in @var{W}, using at most one probe into
190the lookup table.
191
192@code{gperf} currently generates the reserved keyword recognizer for
193lexical analyzers in several production and research compilers and
194language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
195GNU Modula 3, and GNU indent.  Complete C++ source code for @code{gperf} is
196available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.
197A paper describing @code{gperf}'s design and implementation in greater
198detail is available in the Second USENIX C++ Conference proceedings
199or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
200
201@node Search Structures, Description, Motivation, Top
202@chapter Static search structures and GNU @code{gperf}
203@cindex Static search structure
204
205A @dfn{static search structure} is an Abstract Data Type with certain
206fundamental operations, e.g., @emph{initialize}, @emph{insert},
207and @emph{retrieve}.  Conceptually, all insertions occur before any
208retrievals.  In practice, @code{gperf} generates a @emph{static} array
209containing search set keywords and any associated attributes specified
210by the user.  Thus, there is essentially no execution-time cost for the
211insertions.  It is a useful data structure for representing @emph{static
212search sets}.  Static search sets occur frequently in software system
213applications.  Typical static search sets include compiler reserved
214words, assembler instruction opcodes, and built-in shell interpreter
215commands.  Search set members, called @dfn{keywords}, are inserted into
216the structure only once, usually during program initialization, and are
217not generally modified at run-time.
218
219Numerous static search structure implementations exist, e.g.,
220arrays, linked lists, binary search trees, digital search tries, and
221hash tables.  Different approaches offer trade-offs between space
222utilization and search time efficiency.  For example, an @var{n} element
223sorted array is space efficient, though the average-case time
224complexity for retrieval operations using binary search is
225proportional to log @var{n}.  Conversely, hash table implementations
226often locate a table entry in constant time, but typically impose
227additional memory overhead and exhibit poor worst case performance.
228
229@cindex Minimal perfect hash functions
230@emph{Minimal perfect hash functions} provide an optimal solution for a
231particular class of static search sets.  A minimal perfect hash
232function is defined by two properties:
233
234@itemize @bullet
235@item 
236It allows keyword recognition in a static search set using at most
237@emph{one} probe into the hash table.  This represents the ``perfect''
238property.
239@item 
240The actual memory allocated to store the keywords is precisely large
241enough for the keyword set, and @emph{no larger}.  This is the
242``minimal'' property.
243@end itemize
244
245For most applications it is far easier to generate @emph{perfect} hash
246functions than @emph{minimal perfect} hash functions.  Moreover,
247non-minimal perfect hash functions frequently execute faster than
248minimal ones in practice.  This phenomena occurs since searching a
249sparse keyword table increases the probability of locating a ``null''
250entry, thereby reducing string comparisons.  @code{gperf}'s default
251behavior generates @emph{near-minimal} perfect hash functions for
252keyword sets.  However, @code{gperf} provides many options that permit
253user control over the degree of minimality and perfection.
254
255Static search sets often exhibit relative stability over time.  For
256example, Ada's 63 reserved words have remained constant for nearly a
257decade.  It is therefore frequently worthwhile to expend concerted
258effort building an optimal search structure @emph{once}, if it
259subsequently receives heavy use multiple times.  @code{gperf} removes
260the drudgery associated with constructing time- and space-efficient
261search structures by hand.  It has proven a useful and practical tool
262for serious programming projects.  Output from @code{gperf} is currently
263used in several production and research compilers, including GNU C, GNU
264C++, GNU Java, GNU Pascal, and GNU Modula 3.  The latter two compilers are
265not yet part of the official GNU distribution.  Each compiler utilizes
266@code{gperf} to automatically generate static search structures that
267efficiently identify their respective reserved keywords.
268
269@node Description, Options, Search Structures, Top
270@chapter High-Level Description of GNU @code{gperf}
271
272@menu
273* Input Format::                Input Format to @code{gperf}
274* Output Format::               Output Format for Generated C Code with @code{gperf}
275* Binary Strings::              Use of NUL bytes
276@end menu
277
278The perfect hash function generator @code{gperf} reads a set of
279``keywords'' from an input file (or from the standard input by
280default).  It attempts to derive a perfect hashing function that
281recognizes a member of the @dfn{static keyword set} with at most a
282single probe into the lookup table.  If @code{gperf} succeeds in
283generating such a function it produces a pair of C source code routines
284that perform hashing and table lookup recognition.  All generated C code
285is directed to the standard output.  Command-line options described
286below allow you to modify the input and output format to @code{gperf}.
287
288By default, @code{gperf} attempts to produce time-efficient code, with
289less emphasis on efficient space utilization.  However, several options
290exist that permit trading-off execution time for storage space and vice
291versa.  In particular, expanding the generated table size produces a
292sparse search structure, generally yielding faster searches.
293Conversely, you can direct @code{gperf} to utilize a C @code{switch}
294statement scheme that minimizes data space storage size.  Furthermore,
295using a C @code{switch} may actually speed up the keyword retrieval time
296somewhat.  Actual results depend on your C compiler, of course.
297
298In general, @code{gperf} assigns values to the bytes it is using
299for hashing until some set of values gives each keyword a unique value.
300A helpful heuristic is that the larger the hash value range, the easier
301it is for @code{gperf} to find and generate a perfect hash function.
302Experimentation is the key to getting the most from @code{gperf}.
303
304@node Input Format, Output Format, Description, Description
305@section Input Format to @code{gperf}
306@cindex Format
307@cindex Declaration section
308@cindex Keywords section
309@cindex Functions section
310You can control the input file format by varying certain command-line
311arguments, in particular the @samp{-t} option.  The input's appearance
312is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
313utilities @code{lex} and @code{yacc}).  Here's an outline of the general
314format:
315
316@example
317@group
318declarations
319%%
320keywords
321%%
322functions
323@end group
324@end example
325
326@emph{Unlike} @code{flex} or @code{bison}, the declarations section and
327the functions section are optional.  The following sections describe the
328input format for each section.
329
330@menu
331* Declarations::                Declarations.
332* Keywords::                    Format for Keyword Entries.
333* Functions::                   Including Additional C Functions.
334* Controls for GNU indent::     Where to place directives for GNU @code{indent}.
335@end menu
336
337It is possible to omit the declaration section entirely, if the @samp{-t}
338option is not given.  In this case the input file begins directly with the
339first keyword line, e.g.:
340
341@example
342@group
343january
344february
345march
346april
347...
348@end group
349@end example
350
351@node Declarations, Keywords, Input Format, Input Format
352@subsection Declarations
353
354The keyword input file optionally contains a section for including
355arbitrary C declarations and definitions, @code{gperf} declarations that
356act like command-line options, as well as for providing a user-supplied
357@code{struct}.
358
359@menu
360* User-supplied Struct::        Specifying keywords with attributes.
361* Gperf Declarations::          Embedding command line options in the input.
362* C Code Inclusion::            Including C declarations and definitions.
363@end menu
364
365@node User-supplied Struct, Gperf Declarations, Declarations, Declarations
366@subsubsection User-supplied @code{struct}
367
368If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration)
369@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
370component in the declaration section from the input file.  The first
371field in this struct must be of type @code{char *} or @code{const char *}
372if the @samp{-P} option is not given, or of type @code{int} if the option
373@samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled.
374This first field must be called @samp{name}, although it is possible to modify
375its name with the @samp{-K} option (or, equivalently, the
376@samp{%define slot-name} declaration) described below.
377
378Here is a simple example, using months of the year and their attributes as
379input:
380
381@example
382@group
383struct month @{ char *name; int number; int days; int leap_days; @};
384%%
385january,   1, 31, 31
386february,  2, 28, 29
387march,     3, 31, 31
388april,     4, 30, 30
389may,       5, 31, 31
390june,      6, 30, 30
391july,      7, 31, 31
392august,    8, 31, 31
393september, 9, 30, 30
394october,  10, 31, 31
395november, 11, 30, 30
396december, 12, 31, 31
397@end group
398@end example
399
400@cindex @samp{%%}
401Separating the @code{struct} declaration from the list of keywords and
402other fields are a pair of consecutive percent signs, @samp{%%},
403appearing left justified in the first column, as in the UNIX utility
404@code{lex}.
405
406If the @code{struct} has already been declared in an include file, it can
407be mentioned in an abbreviated form, like this:
408
409@example
410@group
411struct month;
412%%
413january,   1, 31, 31
414...
415@end group
416@end example
417
418@node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations
419@subsubsection Gperf Declarations
420
421The declaration section can contain @code{gperf} declarations.  They
422influence the way @code{gperf} works, like command line options do.
423In fact, every such declaration is equivalent to a command line option.
424There are three forms of declarations:
425
426@enumerate
427@item
428Declarations without argument, like @samp{%compare-lengths}.
429
430@item
431Declarations with an argument, like @samp{%switch=@var{count}}.
432
433@item
434Declarations of names of entities in the output file, like
435@samp{%define lookup-function-name @var{name}}.
436@end enumerate
437
438When a declaration is given both in the input file and as a command line
439option, the command-line option's value prevails.
440
441The following @code{gperf} declarations are available.
442
443@table @samp
444@item %delimiters=@var{delimiter-list}
445@cindex @samp{%delimiters}
446Allows you to provide a string containing delimiters used to
447separate keywords from their attributes.  The default is ",".  This
448option is essential if you want to use keywords that have embedded
449commas or newlines.
450
451@item %struct-type
452@cindex @samp{%struct-type}
453Allows you to include a @code{struct} type declaration for generated
454code; see above for an example.
455
456@item %ignore-case
457@cindex @samp{%ignore-case}
458Consider upper and lower case ASCII characters as equivalent.  The string
459comparison will use a case insignificant character comparison.  Note that
460locale dependent case mappings are ignored.
461
462@item %language=@var{language-name}
463@cindex @samp{%language}
464Instructs @code{gperf} to generate code in the language specified by the
465option's argument.  Languages handled are currently:
466
467@table @samp
468@item KR-C
469Old-style K&R C.  This language is understood by old-style C compilers and
470ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
471because of lacking @samp{const}.
472
473@item C
474Common C.  This language is understood by ANSI C compilers, and also by
475old-style C compilers, provided that you @code{#define const} to empty
476for compilers which don't know about this keyword.
477
478@item ANSI-C
479ANSI C.  This language is understood by ANSI C compilers and C++ compilers.
480
481@item C++
482C++.  This language is understood by C++ compilers.
483@end table
484
485The default is C.
486
487@item %define slot-name @var{name}
488@cindex @samp{%define slot-name}
489This declaration is only useful when option @samp{-t} (or, equivalently, the
490@samp{%struct-type} declaration) has been given.
491By default, the program assumes the structure component identifier for
492the keyword is @samp{name}.  This option allows an arbitrary choice of
493identifier for this component, although it still must occur as the first
494field in your supplied @code{struct}.
495
496@item %define initializer-suffix @var{initializers}
497@cindex @samp{%define initializer-suffix}
498This declaration is only useful when option @samp{-t} (or, equivalently, the
499@samp{%struct-type} declaration) has been given.
500It permits to specify initializers for the structure members following
501@var{slot-name} in empty hash table entries.  The list of initializers
502should start with a comma.  By default, the emitted code will
503zero-initialize structure members following @var{slot-name}.
504
505@item %define hash-function-name @var{name}
506@cindex @samp{%define hash-function-name}
507Allows you to specify the name for the generated hash function.  Default
508name is @samp{hash}.  This option permits the use of two hash tables in
509the same file.
510
511@item %define lookup-function-name @var{name}
512@cindex @samp{%define lookup-function-name}
513Allows you to specify the name for the generated lookup function.
514Default name is @samp{in_word_set}.  This option permits multiple
515generated hash functions to be used in the same application.
516
517@item %define class-name @var{name}
518@cindex @samp{%define class-name}
519This option is only useful when option @samp{-L C++} (or, equivalently,
520the @samp{%language=C++} declaration) has been given.  It
521allows you to specify the name of generated C++ class.  Default name is
522@code{Perfect_Hash}.
523
524@item %7bit
525@cindex @samp{%7bit}
526This option specifies that all strings that will be passed as arguments
527to the generated hash function and the generated lookup function will
528solely consist of 7-bit ASCII characters (bytes in the range 0..127).
529(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
530@emph{not} guarantee that a byte is in this range.  Only an explicit
531test like @samp{c >= 'A' && c <= 'Z'} guarantees this.)
532
533@item %compare-lengths
534@cindex @samp{%compare-lengths}
535Compare keyword lengths before trying a string comparison.  This option
536is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
537cut down on the number of string comparisons made during the lookup, since
538keywords with different lengths are never compared via @code{strcmp}.
539However, using @samp{%compare-lengths} might greatly increase the size of the
540generated C code if the lookup table range is large (which implies that
541the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
542table contains as many elements as there are entries in the lookup table.
543
544@item %compare-strncmp
545@cindex @samp{%compare-strncmp}
546Generates C code that uses the @code{strncmp} function to perform
547string comparisons.  The default action is to use @code{strcmp}.
548
549@item %readonly-tables
550@cindex @samp{%readonly-tables}
551Makes the contents of all generated lookup tables constant, i.e.,
552``readonly''.  Many compilers can generate more efficient code for this
553by putting the tables in readonly memory.
554
555@item %enum
556@cindex @samp{%enum}
557Define constant values using an enum local to the lookup function rather
558than with #defines.  This also means that different lookup functions can
559reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
560
561@item %includes
562@cindex @samp{%includes}
563Include the necessary system include file, @code{<string.h>}, at the
564beginning of the code.  By default, this is not done; the user must
565include this header file himself to allow compilation of the code.
566
567@item %global-table
568@cindex @samp{%global-table}
569Generate the static table of keywords as a static global variable,
570rather than hiding it inside of the lookup function (which is the
571default behavior).
572
573@item %pic
574@cindex @samp{%pic}
575Optimize the generated table for inclusion in shared libraries.  This
576reduces the startup time of programs using a shared library containing
577the generated code.  If the @samp{%struct-type} declaration (or,
578equivalently, the option @samp{-t}) is also given, the first field of the
579user-defined struct must be of type @samp{int}, not @samp{char *}, because
580it will contain offsets into the string pool instead of actual strings.
581To convert such an offset to a string, you can use the expression
582@samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
583name can be changed through the @samp{%define string-pool-name} declaration.
584
585@item %define string-pool-name @var{name}
586@cindex @samp{%define string-pool-name}
587Allows you to specify the name of the generated string pool created by
588the declaration @samp{%pic} (or, equivalently, the option @samp{-P}).
589The default name is @samp{stringpool}.  This declaration permits the use of
590two hash tables in the same file, with @samp{%pic} and even when the
591@samp{%global-table} declaration (or, equivalently, the option @samp{-G})
592is given.
593
594@item %null-strings
595@cindex @samp{%null-strings}
596Use NULL strings instead of empty strings for empty keyword table entries.
597This reduces the startup time of programs using a shared library containing
598the generated code (but not as much as the declaration @samp{%pic}), at the
599expense of one more test-and-branch instruction at run time.
600
601@item %define word-array-name @var{name}
602@cindex @samp{%define word-array-name}
603Allows you to specify the name for the generated array containing the
604hash table.  Default name is @samp{wordlist}.  This option permits the
605use of two hash tables in the same file, even when the option @samp{-G}
606(or, equivalently, the @samp{%global-table} declaration) is given.
607
608@item %define length-table-name @var{name}
609@cindex @samp{%define length-table-name}
610Allows you to specify the name for the generated array containing the
611length table.  Default name is @samp{lengthtable}.  This option permits the
612use of two length tables in the same file, even when the option @samp{-G}
613(or, equivalently, the @samp{%global-table} declaration) is given.
614
615@item %switch=@var{count}
616@cindex @samp{%switch}
617Causes the generated C code to use a @code{switch} statement scheme,
618rather than an array lookup table.  This can lead to a reduction in both
619time and space requirements for some input files.  The argument to this
620option determines how many @code{switch} statements are generated.  A
621value of 1 generates 1 @code{switch} containing all the elements, a
622value of 2 generates 2 tables with 1/2 the elements in each
623@code{switch}, etc.  This is useful since many C compilers cannot
624correctly generate code for large @code{switch} statements.  This option
625was inspired in part by Keith Bostic's original C program.
626
627@item %omit-struct-type
628@cindex @samp{%omit-struct-type}
629Prevents the transfer of the type declaration to the output file.  Use
630this option if the type is already defined elsewhere.
631@end table
632
633@node C Code Inclusion,  , Gperf Declarations, Declarations
634@subsubsection C Code Inclusion
635
636@cindex @samp{%@{}
637@cindex @samp{%@}}
638Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it
639is possible to directly include C source text and comments verbatim into
640the generated output file.  This is accomplished by enclosing the region
641inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs.  Here is
642an input fragment based on the previous example that illustrates this
643feature:
644
645@example
646@group
647%@{
648#include <assert.h>
649/* This section of code is inserted directly into the output. */
650int return_month_days (struct month *months, int is_leap_year);
651%@}
652struct month @{ char *name; int number; int days; int leap_days; @};
653%%
654january,   1, 31, 31
655february,  2, 28, 29
656march,     3, 31, 31
657...
658@end group
659@end example
660
661@node Keywords, Functions, Declarations, Input Format
662@subsection Format for Keyword Entries
663
664The second input file format section contains lines of keywords and any
665associated attributes you might supply.  A line beginning with @samp{#}
666in the first column is considered a comment.  Everything following the
667@samp{#} is ignored, up to and including the following newline.  A line
668beginning with @samp{%} in the first column is an option declaration and
669must not occur within the keywords section.
670
671The first field of each non-comment line is always the keyword itself.  It
672can be given in two ways: as a simple name, i.e., without surrounding
673string quotation marks, or as a string enclosed in double-quotes, in
674C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
675or @code{\xa8}.  In either case, it must start right at the beginning
676of the line, without leading whitespace.
677In this context, a ``field'' is considered to extend up to, but
678not include, the first blank, comma, or newline.  Here is a simple
679example taken from a partial list of C reserved words:
680
681@example
682@group
683# These are a few C reserved words, see the c.gperf file 
684# for a complete list of ANSI C reserved words.
685unsigned
686sizeof
687switch
688signed
689if
690default
691for
692while
693return
694@end group
695@end example
696
697Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker
698may be elided if the declaration section is empty.
699
700Additional fields may optionally follow the leading keyword.  Fields
701should be separated by commas, and terminate at the end of line.  What
702these fields mean is entirely up to you; they are used to initialize the
703elements of the user-defined @code{struct} provided by you in the
704declaration section.  If the @samp{-t} option (or, equivalently, the
705@samp{%struct-type} declaration) is @emph{not} enabled
706these fields are simply ignored.  All previous examples except the last
707one contain keyword attributes.
708
709@node Functions, Controls for GNU indent, Keywords, Input Format
710@subsection Including Additional C Functions
711
712The optional third section also corresponds closely with conventions
713found in @code{flex} and @code{bison}.  All text in this section,
714starting at the final @samp{%%} and extending to the end of the input
715file, is included verbatim into the generated output file.  Naturally,
716it is your responsibility to ensure that the code contained in this
717section is valid C.
718
719@node Controls for GNU indent,  , Functions, Input Format
720@subsection Where to place directives for GNU @code{indent}.
721
722If you want to invoke GNU @code{indent} on a @code{gperf} input file,
723you will see that GNU @code{indent} doesn't understand the @samp{%%},
724@samp{%@{} and @samp{%@}} directives that control @code{gperf}'s
725interpretation of the input file.  Therefore you have to insert some
726directives for GNU @code{indent}.  More precisely, assuming the most
727general input file structure
728
729@example
730@group
731declarations part 1
732%@{
733verbatim code
734%@}
735declarations part 2
736%%
737keywords
738%%
739functions
740@end group
741@end example
742
743@noindent
744you would insert @samp{*INDENT-OFF*} and @samp{*INDENT-ON*} comments
745as follows:
746
747@example
748@group
749/* *INDENT-OFF* */
750declarations part 1
751%@{
752/* *INDENT-ON* */
753verbatim code
754/* *INDENT-OFF* */
755%@}
756declarations part 2
757%%
758keywords
759%%
760/* *INDENT-ON* */
761functions
762@end group
763@end example
764
765@node Output Format, Binary Strings, Input Format, Description
766@section Output Format for Generated C Code with @code{gperf}
767@cindex hash table
768
769Several options control how the generated C code appears on the standard 
770output.  Two C functions are generated.  They are called @code{hash} and 
771@code{in_word_set}, although you may modify their names with a command-line 
772option.  Both functions require two arguments, a string, @code{char *} 
773@var{str}, and a length parameter, @code{int} @var{len}.  Their default 
774function prototypes are as follows:
775
776@deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
777By default, the generated @code{hash} function returns an integer value
778created by adding @var{len} to several user-specified @var{str} byte
779positions indexed into an @dfn{associated values} table stored in a
780local static array.  The associated values table is constructed
781internally by @code{gperf} and later output as a static local C array
782called @samp{hash_table}.  The relevant selected positions (i.e. indices
783into @var{str}) are specified via the @samp{-k} option when running
784@code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}).
785@end deftypefun
786
787@deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
788If @var{str} is in the keyword set, returns a pointer to that
789keyword.  More exactly, if the option @samp{-t} (or, equivalently, the
790@samp{%struct-type} declaration) was given, it returns
791a pointer to the matching keyword's structure.  Otherwise it returns
792@code{NULL}.
793@end deftypefun
794
795If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
796declaration) is not used, @var{str} must be a NUL terminated
797string of exactly length @var{len}.  If @samp{-c} (or, equivalently, the
798@samp{%compare-strncmp} declaration) is used, @var{str} must
799simply be an array of @var{len} bytes and does not need to be NUL
800terminated.
801
802The code generated for these two functions is affected by the following
803options:
804
805@table @samp
806@item -t
807@itemx --struct-type
808Make use of the user-defined @code{struct}.
809
810@item -S @var{total-switch-statements}
811@itemx --switch=@var{total-switch-statements}
812@cindex @code{switch}
813Generate 1 or more C @code{switch} statement rather than use a large,
814(and potentially sparse) static array.  Although the exact time and
815space savings of this approach vary according to your C compiler's
816degree of optimization, this method often results in smaller and faster
817code.
818@end table
819
820If the @samp{-t} and @samp{-S} options (or, equivalently, the
821@samp{%struct-type} and @samp{%switch} declarations) are omitted, the default
822action
823is to generate a @code{char *} array containing the keywords, together with
824additional empty strings used for padding the array.  By experimenting
825with the various input and output options, and timing the resulting C
826code, you can determine the best option choices for different keyword
827set characteristics.
828
829@node Binary Strings,  , Output Format, Description
830@section Use of NUL bytes
831@cindex NUL
832
833By default, the code generated by @code{gperf} operates on zero
834terminated strings, the usual representation of strings in C.  This means
835that the keywords in the input file must not contain NUL bytes,
836and the @var{str} argument passed to @code{hash} or @code{in_word_set}
837must be NUL terminated and have exactly length @var{len}.
838
839If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
840declaration) is used, then the @var{str} argument does not need
841to be NUL terminated.  The code generated by @code{gperf} will only
842access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
843However, the keywords in the input file still must not contain NUL
844bytes.
845
846If option @samp{-l} (or, equivalently, the @samp{%compare-lengths}
847declaration) is used, then the hash table performs binary
848comparison.  The keywords in the input file may contain NUL bytes,
849written in string syntax as @code{\000} or @code{\x00}, and the code
850generated by @code{gperf} will treat NUL like any other byte.
851Also, in this case the @samp{-c} option (or, equivalently, the
852@samp{%compare-strncmp} declaration) is ignored.
853
854@node Options, Bugs, Description, Top
855@chapter Invoking @code{gperf}
856
857There are @emph{many} options to @code{gperf}.  They were added to make
858the program more convenient for use with real applications.  ``On-line''
859help is readily available via the @samp{--help} option.  Here is the
860complete list of options.
861
862@menu
863* Output File::                 Specifying the Location of the Output File
864* Input Details::               Options that affect Interpretation of the Input File
865* Output Language::             Specifying the Language for the Output Code
866* Output Details::              Fine tuning Details in the Output Code
867* Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
868* Verbosity::                   Informative Output
869@end menu
870
871@node Output File, Input Details, Options, Options
872@section Specifying the Location of the Output File
873
874@table @samp
875@item --output-file=@var{file}
876Allows you to specify the name of the file to which the output is written to.
877@end table
878
879The results are written to standard output if no output file is specified
880or if it is @samp{-}.
881
882@node Input Details, Output Language, Output File, Options
883@section Options that affect Interpretation of the Input File
884
885These options are also available as declarations in the input file
886(@pxref{Gperf Declarations}).
887
888@table @samp
889@item -e @var{keyword-delimiter-list}
890@itemx --delimiters=@var{keyword-delimiter-list}
891@cindex Delimiters
892Allows you to provide a string containing delimiters used to
893separate keywords from their attributes.  The default is ",".  This
894option is essential if you want to use keywords that have embedded
895commas or newlines.  One useful trick is to use -e'TAB', where TAB is
896the literal tab character.
897
898@item -t
899@itemx --struct-type
900Allows you to include a @code{struct} type declaration for generated
901code.  Any text before a pair of consecutive @samp{%%} is considered
902part of the type declaration.  Keywords and additional fields may follow
903this, one group of fields per line.  A set of examples for generating
904perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
905Modula 3 and JavaScript reserved words are distributed with this release.
906
907@item --ignore-case
908Consider upper and lower case ASCII characters as equivalent.  The string
909comparison will use a case insignificant character comparison.  Note that
910locale dependent case mappings are ignored.  This option is therefore not
911suitable if a properly internationalized or locale aware case mapping
912should be used.  (For example, in a Turkish locale, the upper case equivalent
913of the lowercase ASCII letter @samp{i} is the non-ASCII character
914@samp{capital i with dot above}.)  For this case, it is better to apply
915an uppercase or lowercase conversion on the string before passing it to
916the @code{gperf} generated function.
917@end table
918
919@node Output Language, Output Details, Input Details, Options
920@section Options to specify the Language for the Output Code
921
922These options are also available as declarations in the input file
923(@pxref{Gperf Declarations}).
924
925@table @samp
926@item -L @var{generated-language-name}
927@itemx --language=@var{generated-language-name}
928Instructs @code{gperf} to generate code in the language specified by the
929option's argument.  Languages handled are currently:
930
931@table @samp
932@item KR-C
933Old-style K&R C.  This language is understood by old-style C compilers and
934ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
935because of lacking @samp{const}.
936
937@item C
938Common C.  This language is understood by ANSI C compilers, and also by
939old-style C compilers, provided that you @code{#define const} to empty
940for compilers which don't know about this keyword.
941
942@item ANSI-C
943ANSI C.  This language is understood by ANSI C compilers and C++ compilers.
944
945@item C++
946C++.  This language is understood by C++ compilers.
947@end table
948
949The default is C.
950
951@item -a
952This option is supported for compatibility with previous releases of
953@code{gperf}.  It does not do anything.
954
955@item -g
956This option is supported for compatibility with previous releases of
957@code{gperf}.  It does not do anything.
958@end table
959
960@node Output Details, Algorithmic Details, Output Language, Options
961@section Options for fine tuning Details in the Output Code
962
963Most of these options are also available as declarations in the input file
964(@pxref{Gperf Declarations}).
965
966@table @samp
967@item -K @var{slot-name}
968@itemx --slot-name=@var{slot-name}
969@cindex Slot name
970This option is only useful when option @samp{-t} (or, equivalently, the
971@samp{%struct-type} declaration) has been given.
972By default, the program assumes the structure component identifier for
973the keyword is @samp{name}.  This option allows an arbitrary choice of
974identifier for this component, although it still must occur as the first
975field in your supplied @code{struct}.
976
977@item -F @var{initializers}
978@itemx --initializer-suffix=@var{initializers}
979@cindex Initializers
980This option is only useful when option @samp{-t} (or, equivalently, the
981@samp{%struct-type} declaration) has been given.
982It permits to specify initializers for the structure members following
983@var{slot-name} in empty hash table entries.  The list of initializers
984should start with a comma.  By default, the emitted code will
985zero-initialize structure members following @var{slot-name}.
986
987@item -H @var{hash-function-name}
988@itemx --hash-function-name=@var{hash-function-name}
989Allows you to specify the name for the generated hash function.  Default
990name is @samp{hash}.  This option permits the use of two hash tables in
991the same file.
992
993@item -N @var{lookup-function-name}
994@itemx --lookup-function-name=@var{lookup-function-name}
995Allows you to specify the name for the generated lookup function.
996Default name is @samp{in_word_set}.  This option permits multiple
997generated hash functions to be used in the same application.
998
999@item -Z @var{class-name}
1000@itemx --class-name=@var{class-name}
1001@cindex Class name
1002This option is only useful when option @samp{-L C++} (or, equivalently,
1003the @samp{%language=C++} declaration) has been given.  It
1004allows you to specify the name of generated C++ class.  Default name is
1005@code{Perfect_Hash}.
1006
1007@item -7
1008@itemx --seven-bit
1009This option specifies that all strings that will be passed as arguments
1010to the generated hash function and the generated lookup function will
1011solely consist of 7-bit ASCII characters (bytes in the range 0..127).
1012(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
1013@emph{not} guarantee that a byte is in this range.  Only an explicit
1014test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
1015default in versions of @code{gperf} earlier than 2.7; now the default is
1016to support 8-bit and multibyte characters.
1017
1018@item -l
1019@itemx --compare-lengths
1020Compare keyword lengths before trying a string comparison.  This option
1021is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
1022cut down on the number of string comparisons made during the lookup, since
1023keywords with different lengths are never compared via @code{strcmp}.
1024However, using @samp{-l} might greatly increase the size of the
1025generated C code if the lookup table range is large (which implies that
1026the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
1027table contains as many elements as there are entries in the lookup table.
1028
1029@item -c
1030@itemx --compare-strncmp
1031Generates C code that uses the @code{strncmp} function to perform
1032string comparisons.  The default action is to use @code{strcmp}.
1033
1034@item -C
1035@itemx --readonly-tables
1036Makes the contents of all generated lookup tables constant, i.e.,
1037``readonly''.  Many compilers can generate more efficient code for this
1038by putting the tables in readonly memory.
1039
1040@item -E
1041@itemx  --enum
1042Define constant values using an enum local to the lookup function rather
1043than with #defines.  This also means that different lookup functions can
1044reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
1045
1046@item -I
1047@itemx --includes
1048Include the necessary system include file, @code{<string.h>}, at the
1049beginning of the code.  By default, this is not done; the user must
1050include this header file himself to allow compilation of the code.
1051
1052@item -G
1053@itemx --global-table
1054Generate the static table of keywords as a static global variable,
1055rather than hiding it inside of the lookup function (which is the
1056default behavior).
1057
1058@item -P
1059@itemx --pic
1060Optimize the generated table for inclusion in shared libraries.  This
1061reduces the startup time of programs using a shared library containing
1062the generated code.  If the option @samp{-t} (or, equivalently, the
1063@samp{%struct-type} declaration) is also given, the first field of the
1064user-defined struct must be of type @samp{int}, not @samp{char *}, because
1065it will contain offsets into the string pool instead of actual strings.
1066To convert such an offset to a string, you can use the expression
1067@samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
1068name can be changed through the option @samp{--string-pool-name}.
1069
1070@item -Q @var{string-pool-name}
1071@itemx --string-pool-name=@var{string-pool-name}
1072Allows you to specify the name of the generated string pool created by
1073option @samp{-P}.  The default name is @samp{stringpool}.  This option
1074permits the use of two hash tables in the same file, with @samp{-P} and
1075even when the option @samp{-G} (or, equivalently, the @samp{%global-table}
1076declaration) is given.
1077
1078@item --null-strings
1079Use NULL strings instead of empty strings for empty keyword table entries.
1080This reduces the startup time of programs using a shared library containing
1081the generated code (but not as much as option @samp{-P}), at the expense
1082of one more test-and-branch instruction at run time.
1083
1084@item -W @var{hash-table-array-name}
1085@itemx --word-array-name=@var{hash-table-array-name}
1086@cindex Array name
1087Allows you to specify the name for the generated array containing the
1088hash table.  Default name is @samp{wordlist}.  This option permits the
1089use of two hash tables in the same file, even when the option @samp{-G}
1090(or, equivalently, the @samp{%global-table} declaration) is given.
1091
1092@itemx --length-table-name=@var{length-table-array-name}
1093@cindex Array name
1094Allows you to specify the name for the generated array containing the
1095length table.  Default name is @samp{lengthtable}.  This option permits the
1096use of two length tables in the same file, even when the option @samp{-G}
1097(or, equivalently, the @samp{%global-table} declaration) is given.
1098
1099@item -S @var{total-switch-statements}
1100@itemx --switch=@var{total-switch-statements}
1101@cindex @code{switch}
1102Causes the generated C code to use a @code{switch} statement scheme,
1103rather than an array lookup table.  This can lead to a reduction in both
1104time and space requirements for some input files.  The argument to this
1105option determines how many @code{switch} statements are generated.  A
1106value of 1 generates 1 @code{switch} containing all the elements, a
1107value of 2 generates 2 tables with 1/2 the elements in each
1108@code{switch}, etc.  This is useful since many C compilers cannot
1109correctly generate code for large @code{switch} statements.  This option
1110was inspired in part by Keith Bostic's original C program.
1111
1112@item -T
1113@itemx --omit-struct-type
1114Prevents the transfer of the type declaration to the output file.  Use
1115this option if the type is already defined elsewhere.
1116
1117@item -p
1118This option is supported for compatibility with previous releases of
1119@code{gperf}.  It does not do anything.
1120@end table
1121
1122@node Algorithmic Details, Verbosity, Output Details, Options
1123@section Options for changing the Algorithms employed by @code{gperf}
1124
1125@table @samp
1126@item -k @var{selected-byte-positions}
1127@itemx --key-positions=@var{selected-byte-positions}
1128Allows selection of the byte positions used in the keywords'
1129hash function.  The allowable choices range between 1-255, inclusive.
1130The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
1131ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
1132in any order.  Furthermore, the wildcard '*' causes the generated
1133hash function to consider @strong{all} byte positions in each keyword,
1134whereas '$' instructs the hash function to use the ``final byte''
1135of a keyword (this is the only way to use a byte position greater than
1136255, incidentally).
1137
1138For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
1139function that considers positions 1,2,4,6,7,8,9,10, plus the last
1140byte in each keyword (which may be at a different position for each
1141keyword, obviously).  Keywords
1142with length less than the indicated byte positions work properly, since
1143selected byte positions exceeding the keyword length are simply not
1144referenced in the hash function.
1145
1146This option is not normally needed since version 2.8 of @code{gperf};
1147the default byte positions are computed depending on the keyword set,
1148through a search that minimizes the number of byte positions.
1149
1150@item -D
1151@itemx --duplicates
1152@cindex Duplicates
1153Handle keywords whose selected byte sets hash to duplicate values.
1154Duplicate hash values can occur if a set of keywords has the same names, but
1155possesses different attributes, or if the selected byte positions are not well
1156chosen.  With the -D option @code{gperf} treats all these keywords as
1157part of an equivalence class and generates a perfect hash function with
1158multiple comparisons for duplicate keywords.  It is up to you to completely
1159disambiguate the keywords by modifying the generated C code.  However,
1160@code{gperf} helps you out by organizing the output.
1161
1162Using this option usually means that the generated hash function is no
1163longer perfect.  On the other hand, it permits @code{gperf} to work on
1164keyword sets that it otherwise could not handle.
1165
1166@item -m @var{iterations}
1167@itemx --multiple-iterations=@var{iterations}
1168Perform multiple choices of the @samp{-i} and @samp{-j} values, and
1169choose the best results.  This increases the running time by a factor of
1170@var{iterations} but does a good job minimizing the generated table size.
1171
1172@item -i @var{initial-value}
1173@itemx --initial-asso=@var{initial-value}
1174Provides an initial @var{value} for the associate values array.  Default
1175is 0.  Increasing the initial value helps inflate the final table size,
1176possibly leading to more time efficient keyword lookups.  Note that this
1177option is not particularly useful when @samp{-S} (or, equivalently,
1178@samp{%switch}) is used.  Also,
1179@samp{-i} is overridden when the @samp{-r} option is used.
1180
1181@item -j @var{jump-value}
1182@itemx --jump=@var{jump-value}
1183@cindex Jump value
1184Affects the ``jump value'', i.e., how far to advance the associated
1185byte value upon collisions.  @var{Jump-value} is rounded up to an
1186odd number, the default is 5.  If the @var{jump-value} is 0 @code{gperf}
1187jumps by random amounts.
1188
1189@item -n
1190@itemx --no-strlen
1191Instructs the generator not to include the length of a keyword when
1192computing its hash value.  This may save a few assembly instructions in
1193the generated lookup table.
1194
1195@item -r
1196@itemx --random
1197Utilizes randomness to initialize the associated values table.  This
1198frequently generates solutions faster than using deterministic
1199initialization (which starts all associated values at 0).  Furthermore,
1200using the randomization option generally increases the size of the
1201table.
1202
1203@item -s @var{size-multiple}
1204@itemx --size-multiple=@var{size-multiple}
1205Affects the size of the generated hash table.  The numeric argument for
1206this option indicates ``how many times larger or smaller'' the maximum
1207associated value range should be, in relationship to the number of keywords.
1208It can be written as an integer, a floating-point number or a fraction.
1209For example, a value of 3 means ``allow the maximum associated value to be
1210about 3 times larger than the number of input keywords''.
1211Conversely, a value of 1/3 means ``allow the maximum associated value to
1212be about 3 times smaller than the number of input keywords''.  Values
1213smaller than 1 are useful for limiting the overall size of the generated hash
1214table, though the option @samp{-m} is better at this purpose.
1215
1216If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is
1217@emph{not} enabled, the maximum
1218associated value influences the static array table size, and a larger
1219table should decrease the time required for an unsuccessful search, at
1220the expense of extra table space.
1221
1222The default value is 1, thus the default maximum associated value about
1223the same size as the number of keywords (for efficiency, the maximum
1224associated value is always rounded up to a power of 2).  The actual
1225table size may vary somewhat, since this technique is essentially a
1226heuristic.
1227@end table
1228
1229@node Verbosity,  , Algorithmic Details, Options
1230@section Informative Output
1231
1232@table @samp
1233@item -h
1234@itemx --help
1235Prints a short summary on the meaning of each program option.  Aborts
1236further program execution.
1237
1238@item -v
1239@itemx --version
1240Prints out the current version number.
1241
1242@item -d
1243@itemx --debug
1244Enables the debugging option.  This produces verbose diagnostics to
1245``standard error'' when @code{gperf} is executing.  It is useful both for
1246maintaining the program and for determining whether a given set of
1247options is actually speeding up the search for a solution.  Some useful
1248information is dumped at the end of the program when the @samp{-d}
1249option is enabled.
1250@end table
1251
1252@node Bugs, Projects, Options, Top
1253@chapter Known Bugs and Limitations with @code{gperf}
1254
1255The following are some limitations with the current release of
1256@code{gperf}:
1257
1258@itemize @bullet
1259@item
1260The @code{gperf} utility is tuned to execute quickly, and works quickly
1261for small to medium size data sets (around 1000 keywords).  It is
1262extremely useful for maintaining perfect hash functions for compiler
1263keyword sets.  Several recent enhancements now enable @code{gperf} to
1264work efficiently on much larger keyword sets (over 15,000 keywords).
1265When processing large keyword sets it helps greatly to have over 8 megs
1266of RAM.
1267
1268@item 
1269The size of the generate static keyword array can get @emph{extremely}
1270large if the input keyword file is large or if the keywords are quite
1271similar.  This tends to slow down the compilation of the generated C
1272code, and @emph{greatly} inflates the object code size.  If this
1273situation occurs, consider using the @samp{-S} option to reduce data
1274size, potentially increasing keyword recognition time a negligible
1275amount.  Since many C compilers cannot correctly generate code for
1276large switch statements it is important to qualify the @var{-S} option
1277with an appropriate numerical argument that controls the number of
1278switch statements generated.
1279
1280@item 
1281The maximum number of selected byte positions has an
1282arbitrary limit of 255.  This restriction should be removed, and if
1283anyone considers this a problem write me and let me know so I can remove
1284the constraint.
1285@end itemize
1286
1287@node Projects, Bibliography, Bugs, Top
1288@chapter Things Still Left to Do
1289
1290It should be ``relatively'' easy to replace the current perfect hash
1291function algorithm with a more exhaustive approach; the perfect hash
1292module is essential independent from other program modules.  Additional
1293worthwhile improvements include:
1294
1295@itemize @bullet
1296@item 
1297Another useful extension involves modifying the program to generate
1298``minimal'' perfect hash functions (under certain circumstances, the
1299current version can be rather extravagant in the generated table size).
1300This is mostly of theoretical interest, since a sparse table
1301often produces faster lookups, and use of the @samp{-S} @code{switch}
1302option can minimize the data size, at the expense of slightly longer
1303lookups (note that the gcc compiler generally produces good code for
1304@code{switch} statements, reducing the need for more complex schemes).
1305
1306@item
1307In addition to improving the algorithm, it would also be useful to
1308generate an Ada package as the code output, in addition to the current
1309C and C++ routines.
1310@end itemize
1311
1312@page
1313
1314@node Bibliography, Concept Index, Projects, Top
1315@chapter Bibliography
1316
1317[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
1318Hashing Functions} Information Sciences 39(1986), 187-195.
1319
1320[2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash
1321Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.
1322
1323[3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}
1324Communications of the ACM, 23, 1(January 1980), 17-19.
1325
1326[4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal
1327Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.
1328
1329[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
1330@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.
1331
1332[6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal
1333Perfect Hashing Functions} Communications of the ACM, 24, 12(December
13341981), 829-833.
1335
1336[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect
1337Hash Functions Method} Communications of the ACM, 23, 12(December 1980),
1338728-729.
1339
1340[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect
1341Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532
1342
1343[9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1344Second USENIX C++ Conference Proceedings, April 1990.
1345
1346[10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1347C++ Report, SIGS 10 10 (November/December 1998).
1348
1349[11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions
1350for Reserved Word Lists}  SIGPLAN Notices, 20, 12(September 1985), 47-53.
1351
1352[12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe
1353Retrieving Method for Static Sets} Communications of the ACM, 20
135411(November 1977), 841-850.
1355
1356[13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,
13571988.
1358
1359[14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.
1360
1361[15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software
1362Foundation, 1989.
1363
1364@node Concept Index,  , Bibliography, Top
1365@unnumbered Concept Index
1366
1367@printindex cp
1368
1369@contents
1370@bye
1371