1\input texinfo @c -*- texinfo -*-
2@c %**start of header
3@setfilename gperf.info
4@c The @ifset makeinfo ... @end ifset conditional evaluates to true in makeinfo
5@c for info and html output, but to false in texi2html.
6@ifnottex
7@ifclear texi2html
8@set makeinfo
9@end ifclear
10@end ifnottex
11@settitle Perfect Hash Function Generator
12@ifclear texi2html
13@firstparagraphindent insert
14@end ifclear
15@c @setchapternewpage odd
16@c %**end of header
17
18@c some day we should @include version.texi instead of defining
19@c these values at hand.
20@set UPDATED 1 February 2009
21@set EDITION 3.0.4
22@set VERSION 3.0.4
23@c ---------------------
24
25@c remove the black boxes generated in the GPL appendix.
26@finalout
27
28@c Merge functions into the concept index
29@syncodeindex fn cp
30@c @synindex pg cp
31
32@ifinfo
33@dircategory Programming Tools
34@direntry
35* Gperf: (gperf).                Perfect Hash Function Generator.
36@end direntry
37@end ifinfo
38
39@ifinfo
40This file documents the features of the GNU Perfect Hash Function
41Generator @value{VERSION}.
42
43Copyright @copyright{} 1989-2009 Free Software Foundation, Inc.
44
45Permission is granted to make and distribute verbatim copies of this
46manual provided the copyright notice and this permission notice are
47preserved on all copies.
48
49@ignore
50Permission is granted to process this file through TeX and print the
51results, provided the printed document carries a copying permission
52notice identical to this one except for the removal of this paragraph
53(this paragraph not being relevant to the printed manual).
54
55@end ignore
56
57Permission is granted to copy and distribute modified versions of this
58manual under the conditions for verbatim copying, provided also that the
59section entitled ``GNU General Public License'' is included exactly as
60in the original, and provided that the entire resulting derived work is
61distributed under the terms of a permission notice identical to this
62one.
63
64Permission is granted to copy and distribute translations of this manual
65into another language, under the above conditions for modified versions,
66except that the section entitled ``GNU General Public License'' and this
67permission notice may be included in translations approved by the Free
68Software Foundation instead of in the original English.
69
70@end ifinfo
71
72@titlepage
73@title User's Guide to @code{gperf} @value{VERSION}
74@subtitle The GNU Perfect Hash Function Generator
75@subtitle Edition @value{EDITION}, @value{UPDATED}
76@author Douglas C. Schmidt
77@author Bruno Haible
78
79@page
80@vskip 0pt plus 1filll
81Copyright @copyright{} 1989-2009 Free Software Foundation, Inc.
82
83
84Permission is granted to make and distribute verbatim copies of
85this manual provided the copyright notice and this permission notice
86are preserved on all copies.
87
88Permission is granted to copy and distribute modified versions of this
89manual under the conditions for verbatim copying, provided also that the
90section entitled ``GNU General Public License'' is included
91exactly as in the original, and provided that the entire resulting
92derived work is distributed under the terms of a permission notice
93identical to this one.
94
95Permission is granted to copy and distribute translations of this manual
96into another language, under the above conditions for modified versions,
97except that the section entitled ``GNU General Public License'' may be
98included in a translation approved by the author instead of in the
99original English.
100@end titlepage
101
102@ifnottex
103@c Table of Contents
104@contents
105@end ifnottex
106
107@ifset makeinfo
108@node Top, Copying, (dir), (dir)
109@top Introduction
110
111This manual documents the GNU @code{gperf} perfect hash function generator
112utility, focusing on its features and how to use them, and how to report
113bugs.
114
115@menu
116* Copying::                     GNU General Public License says how you can
117                                copy and share @code{gperf}.
118* Contributors::                People who have contributed to @code{gperf}.
119* Motivation::                  The purpose of @code{gperf}.
120* Search Structures::           Static search structures and GNU @code{gperf}
121* Description::                 High-level discussion of how GPERF functions.
122* Options::                     A description of options to the program.
123* Bugs::                        Known bugs and limitations with GPERF.
124* Projects::                    Things still left to do.
125* Bibliography::                Material Referenced in this Report.
126
127* Concept Index::               
128
129@detailmenu --- The Detailed Node Listing ---
130
131High-Level Description of GNU @code{gperf}
132
133* Input Format::                Input Format to @code{gperf}
134* Output Format::               Output Format for Generated C Code with @code{gperf}
135* Binary Strings::              Use of NUL bytes
136* Output Copyright::            The Copyright of the Output.
137
138Input Format to @code{gperf}
139
140* Declarations::                Declarations.
141* Keywords::                    Format for Keyword Entries.
142* Functions::                   Including Additional C Functions.
143* Controls for GNU indent::     Where to place directives for GNU @code{indent}.
144
145Declarations
146
147* User-supplied Struct::        Specifying keywords with attributes.
148* Gperf Declarations::          Embedding command line options in the input.
149* C Code Inclusion::            Including C declarations and definitions.
150
151Invoking @code{gperf}
152
153* Input Details::               Options that affect Interpretation of the Input File
154* Output Language::             Specifying the Language for the Output Code
155* Output Details::              Fine tuning Details in the Output Code
156* Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
157* Verbosity::                   Informative Output
158
159@end detailmenu
160@end menu
161
162@end ifset
163
164@node Copying, Contributors, Top, Top
165@unnumbered GNU GENERAL PUBLIC LICENSE
166@include gpl-3.0.texi
167
168@node Contributors, Motivation, Copying, Top
169@unnumbered Contributors to GNU @code{gperf} Utility
170
171@itemize @bullet
172@item
173@cindex Bugs 
174The GNU @code{gperf} perfect hash function generator utility was
175written in GNU C++ by Douglas C. Schmidt.  The general
176idea for the perfect hash function generator was inspired by Keith
177Bostic's algorithm written in C, and distributed to net.sources around
1781984.  The current program is a heavily modified, enhanced, and extended
179implementation of Keith's basic idea, created at the University of
180California, Irvine.  Bugs, patches, and suggestions should be reported
181to @code{<bug-gnu-gperf@@gnu.org>}.
182
183@item
184Special thanks is extended to Michael Tiemann and Doug Lea, for
185providing a useful compiler, and for giving me a forum to exhibit my
186creation.
187
188In addition, Adam de Boor and Nels Olson provided many tips and insights
189that greatly helped improve the quality and functionality of @code{gperf}.
190
191@item
192Bruno Haible enhanced and optimized the search algorithm.  He also rewrote
193the input routines and the output routines for better reliability, and
194added a testsuite.
195@end itemize
196
197@node Motivation, Search Structures, Contributors, Top
198@chapter Introduction
199
200@code{gperf} is a perfect hash function generator written in C++.  It
201transforms an @var{n} element user-specified keyword set @var{W} into a
202perfect hash function @var{F}.  @var{F} uniquely maps keywords in
203@var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}.  If @var{k}
204= @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.
205@code{gperf} generates a 0..@var{k} element static lookup table and a
206pair of C functions.  These functions determine whether a given
207character string @var{s} occurs in @var{W}, using at most one probe into
208the lookup table.
209
210@code{gperf} currently generates the reserved keyword recognizer for
211lexical analyzers in several production and research compilers and
212language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
213GNU Modula 3, and GNU indent.  Complete C++ source code for @code{gperf} is
214available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.
215A paper describing @code{gperf}'s design and implementation in greater
216detail is available in the Second USENIX C++ Conference proceedings
217or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
218
219@node Search Structures, Description, Motivation, Top
220@chapter Static search structures and GNU @code{gperf}
221@cindex Static search structure
222
223A @dfn{static search structure} is an Abstract Data Type with certain
224fundamental operations, e.g., @emph{initialize}, @emph{insert},
225and @emph{retrieve}.  Conceptually, all insertions occur before any
226retrievals.  In practice, @code{gperf} generates a @emph{static} array
227containing search set keywords and any associated attributes specified
228by the user.  Thus, there is essentially no execution-time cost for the
229insertions.  It is a useful data structure for representing @emph{static
230search sets}.  Static search sets occur frequently in software system
231applications.  Typical static search sets include compiler reserved
232words, assembler instruction opcodes, and built-in shell interpreter
233commands.  Search set members, called @dfn{keywords}, are inserted into
234the structure only once, usually during program initialization, and are
235not generally modified at run-time.
236
237Numerous static search structure implementations exist, e.g.,
238arrays, linked lists, binary search trees, digital search tries, and
239hash tables.  Different approaches offer trade-offs between space
240utilization and search time efficiency.  For example, an @var{n} element
241sorted array is space efficient, though the average-case time
242complexity for retrieval operations using binary search is
243proportional to log @var{n}.  Conversely, hash table implementations
244often locate a table entry in constant time, but typically impose
245additional memory overhead and exhibit poor worst case performance.
246
247@cindex Minimal perfect hash functions
248@emph{Minimal perfect hash functions} provide an optimal solution for a
249particular class of static search sets.  A minimal perfect hash
250function is defined by two properties:
251
252@itemize @bullet
253@item 
254It allows keyword recognition in a static search set using at most
255@emph{one} probe into the hash table.  This represents the ``perfect''
256property.
257@item 
258The actual memory allocated to store the keywords is precisely large
259enough for the keyword set, and @emph{no larger}.  This is the
260``minimal'' property.
261@end itemize
262
263For most applications it is far easier to generate @emph{perfect} hash
264functions than @emph{minimal perfect} hash functions.  Moreover,
265non-minimal perfect hash functions frequently execute faster than
266minimal ones in practice.  This phenomena occurs since searching a
267sparse keyword table increases the probability of locating a ``null''
268entry, thereby reducing string comparisons.  @code{gperf}'s default
269behavior generates @emph{near-minimal} perfect hash functions for
270keyword sets.  However, @code{gperf} provides many options that permit
271user control over the degree of minimality and perfection.
272
273Static search sets often exhibit relative stability over time.  For
274example, Ada's 63 reserved words have remained constant for nearly a
275decade.  It is therefore frequently worthwhile to expend concerted
276effort building an optimal search structure @emph{once}, if it
277subsequently receives heavy use multiple times.  @code{gperf} removes
278the drudgery associated with constructing time- and space-efficient
279search structures by hand.  It has proven a useful and practical tool
280for serious programming projects.  Output from @code{gperf} is currently
281used in several production and research compilers, including GNU C, GNU
282C++, GNU Java, GNU Pascal, and GNU Modula 3.  The latter two compilers are
283not yet part of the official GNU distribution.  Each compiler utilizes
284@code{gperf} to automatically generate static search structures that
285efficiently identify their respective reserved keywords.
286
287@node Description, Options, Search Structures, Top
288@chapter High-Level Description of GNU @code{gperf}
289
290@menu
291* Input Format::                Input Format to @code{gperf}
292* Output Format::               Output Format for Generated C Code with @code{gperf}
293* Binary Strings::              Use of NUL bytes
294* Output Copyright::            The Copyright of the Output.
295@end menu
296
297The perfect hash function generator @code{gperf} reads a set of
298``keywords'' from an input file (or from the standard input by
299default).  It attempts to derive a perfect hashing function that
300recognizes a member of the @dfn{static keyword set} with at most a
301single probe into the lookup table.  If @code{gperf} succeeds in
302generating such a function it produces a pair of C source code routines
303that perform hashing and table lookup recognition.  All generated C code
304is directed to the standard output.  Command-line options described
305below allow you to modify the input and output format to @code{gperf}.
306
307By default, @code{gperf} attempts to produce time-efficient code, with
308less emphasis on efficient space utilization.  However, several options
309exist that permit trading-off execution time for storage space and vice
310versa.  In particular, expanding the generated table size produces a
311sparse search structure, generally yielding faster searches.
312Conversely, you can direct @code{gperf} to utilize a C @code{switch}
313statement scheme that minimizes data space storage size.  Furthermore,
314using a C @code{switch} may actually speed up the keyword retrieval time
315somewhat.  Actual results depend on your C compiler, of course.
316
317In general, @code{gperf} assigns values to the bytes it is using
318for hashing until some set of values gives each keyword a unique value.
319A helpful heuristic is that the larger the hash value range, the easier
320it is for @code{gperf} to find and generate a perfect hash function.
321Experimentation is the key to getting the most from @code{gperf}.
322
323@node Input Format, Output Format, Description, Description
324@section Input Format to @code{gperf}
325@cindex Format
326@cindex Declaration section
327@cindex Keywords section
328@cindex Functions section
329You can control the input file format by varying certain command-line
330arguments, in particular the @samp{-t} option.  The input's appearance
331is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
332utilities @code{lex} and @code{yacc}).  Here's an outline of the general
333format:
334
335@example
336@group
337declarations
338%%
339keywords
340%%
341functions
342@end group
343@end example
344
345@emph{Unlike} @code{flex} or @code{bison}, the declarations section and
346the functions section are optional.  The following sections describe the
347input format for each section.
348
349@menu
350* Declarations::                Declarations.
351* Keywords::                    Format for Keyword Entries.
352* Functions::                   Including Additional C Functions.
353* Controls for GNU indent::     Where to place directives for GNU @code{indent}.
354@end menu
355
356It is possible to omit the declaration section entirely, if the @samp{-t}
357option is not given.  In this case the input file begins directly with the
358first keyword line, e.g.:
359
360@example
361@group
362january
363february
364march
365april
366...
367@end group
368@end example
369
370@node Declarations, Keywords, Input Format, Input Format
371@subsection Declarations
372
373The keyword input file optionally contains a section for including
374arbitrary C declarations and definitions, @code{gperf} declarations that
375act like command-line options, as well as for providing a user-supplied
376@code{struct}.
377
378@menu
379* User-supplied Struct::        Specifying keywords with attributes.
380* Gperf Declarations::          Embedding command line options in the input.
381* C Code Inclusion::            Including C declarations and definitions.
382@end menu
383
384@node User-supplied Struct, Gperf Declarations, Declarations, Declarations
385@subsubsection User-supplied @code{struct}
386
387If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration)
388@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
389component in the declaration section from the input file.  The first
390field in this struct must be of type @code{char *} or @code{const char *}
391if the @samp{-P} option is not given, or of type @code{int} if the option
392@samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled.
393This first field must be called @samp{name}, although it is possible to modify
394its name with the @samp{-K} option (or, equivalently, the
395@samp{%define slot-name} declaration) described below.
396
397Here is a simple example, using months of the year and their attributes as
398input:
399
400@example
401@group
402struct month @{ char *name; int number; int days; int leap_days; @};
403%%
404january,   1, 31, 31
405february,  2, 28, 29
406march,     3, 31, 31
407april,     4, 30, 30
408may,       5, 31, 31
409june,      6, 30, 30
410july,      7, 31, 31
411august,    8, 31, 31
412september, 9, 30, 30
413october,  10, 31, 31
414november, 11, 30, 30
415december, 12, 31, 31
416@end group
417@end example
418
419@cindex @samp{%%}
420Separating the @code{struct} declaration from the list of keywords and
421other fields are a pair of consecutive percent signs, @samp{%%},
422appearing left justified in the first column, as in the UNIX utility
423@code{lex}.
424
425If the @code{struct} has already been declared in an include file, it can
426be mentioned in an abbreviated form, like this:
427
428@example
429@group
430struct month;
431%%
432january,   1, 31, 31
433...
434@end group
435@end example
436
437@node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations
438@subsubsection Gperf Declarations
439
440The declaration section can contain @code{gperf} declarations.  They
441influence the way @code{gperf} works, like command line options do.
442In fact, every such declaration is equivalent to a command line option.
443There are three forms of declarations:
444
445@enumerate
446@item
447Declarations without argument, like @samp{%compare-lengths}.
448
449@item
450Declarations with an argument, like @samp{%switch=@var{count}}.
451
452@item
453Declarations of names of entities in the output file, like
454@samp{%define lookup-function-name @var{name}}.
455@end enumerate
456
457When a declaration is given both in the input file and as a command line
458option, the command-line option's value prevails.
459
460The following @code{gperf} declarations are available.
461
462@table @samp
463@item %delimiters=@var{delimiter-list}
464@cindex @samp{%delimiters}
465Allows you to provide a string containing delimiters used to
466separate keywords from their attributes.  The default is ",".  This
467option is essential if you want to use keywords that have embedded
468commas or newlines.
469
470@item %struct-type
471@cindex @samp{%struct-type}
472Allows you to include a @code{struct} type declaration for generated
473code; see above for an example.
474
475@item %ignore-case
476@cindex @samp{%ignore-case}
477Consider upper and lower case ASCII characters as equivalent.  The string
478comparison will use a case insignificant character comparison.  Note that
479locale dependent case mappings are ignored.
480
481@item %language=@var{language-name}
482@cindex @samp{%language}
483Instructs @code{gperf} to generate code in the language specified by the
484option's argument.  Languages handled are currently:
485
486@table @samp
487@item KR-C
488Old-style K&R C.  This language is understood by old-style C compilers and
489ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
490because of lacking @samp{const}.
491
492@item C
493Common C.  This language is understood by ANSI C compilers, and also by
494old-style C compilers, provided that you @code{#define const} to empty
495for compilers which don't know about this keyword.
496
497@item ANSI-C
498ANSI C.  This language is understood by ANSI C (C89, ISO C90) compilers,
499ISO C99 compilers, and C++ compilers.
500
501@item C++
502C++.  This language is understood by C++ compilers.
503@end table
504
505The default is C.
506
507@item %define slot-name @var{name}
508@cindex @samp{%define slot-name}
509This declaration is only useful when option @samp{-t} (or, equivalently, the
510@samp{%struct-type} declaration) has been given.
511By default, the program assumes the structure component identifier for
512the keyword is @samp{name}.  This option allows an arbitrary choice of
513identifier for this component, although it still must occur as the first
514field in your supplied @code{struct}.
515
516@item %define initializer-suffix @var{initializers}
517@cindex @samp{%define initializer-suffix}
518This declaration is only useful when option @samp{-t} (or, equivalently, the
519@samp{%struct-type} declaration) has been given.
520It permits to specify initializers for the structure members following
521@var{slot-name} in empty hash table entries.  The list of initializers
522should start with a comma.  By default, the emitted code will
523zero-initialize structure members following @var{slot-name}.
524
525@item %define hash-function-name @var{name}
526@cindex @samp{%define hash-function-name}
527Allows you to specify the name for the generated hash function.  Default
528name is @samp{hash}.  This option permits the use of two hash tables in
529the same file.
530
531@item %define lookup-function-name @var{name}
532@cindex @samp{%define lookup-function-name}
533Allows you to specify the name for the generated lookup function.
534Default name is @samp{in_word_set}.  This option permits multiple
535generated hash functions to be used in the same application.
536
537@item %define class-name @var{name}
538@cindex @samp{%define class-name}
539This option is only useful when option @samp{-L C++} (or, equivalently,
540the @samp{%language=C++} declaration) has been given.  It
541allows you to specify the name of generated C++ class.  Default name is
542@code{Perfect_Hash}.
543
544@item %7bit
545@cindex @samp{%7bit}
546This option specifies that all strings that will be passed as arguments
547to the generated hash function and the generated lookup function will
548solely consist of 7-bit ASCII characters (bytes in the range 0..127).
549(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
550@emph{not} guarantee that a byte is in this range.  Only an explicit
551test like @samp{c >= 'A' && c <= 'Z'} guarantees this.)
552
553@item %compare-lengths
554@cindex @samp{%compare-lengths}
555Compare keyword lengths before trying a string comparison.  This option
556is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
557cut down on the number of string comparisons made during the lookup, since
558keywords with different lengths are never compared via @code{strcmp}.
559However, using @samp{%compare-lengths} might greatly increase the size of the
560generated C code if the lookup table range is large (which implies that
561the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
562table contains as many elements as there are entries in the lookup table.
563
564@item %compare-strncmp
565@cindex @samp{%compare-strncmp}
566Generates C code that uses the @code{strncmp} function to perform
567string comparisons.  The default action is to use @code{strcmp}.
568
569@item %readonly-tables
570@cindex @samp{%readonly-tables}
571Makes the contents of all generated lookup tables constant, i.e.,
572``readonly''.  Many compilers can generate more efficient code for this
573by putting the tables in readonly memory.
574
575@item %enum
576@cindex @samp{%enum}
577Define constant values using an enum local to the lookup function rather
578than with #defines.  This also means that different lookup functions can
579reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
580
581@item %includes
582@cindex @samp{%includes}
583Include the necessary system include file, @code{<string.h>}, at the
584beginning of the code.  By default, this is not done; the user must
585include this header file himself to allow compilation of the code.
586
587@item %global-table
588@cindex @samp{%global-table}
589Generate the static table of keywords as a static global variable,
590rather than hiding it inside of the lookup function (which is the
591default behavior).
592
593@item %pic
594@cindex @samp{%pic}
595Optimize the generated table for inclusion in shared libraries.  This
596reduces the startup time of programs using a shared library containing
597the generated code.  If the @samp{%struct-type} declaration (or,
598equivalently, the option @samp{-t}) is also given, the first field of the
599user-defined struct must be of type @samp{int}, not @samp{char *}, because
600it will contain offsets into the string pool instead of actual strings.
601To convert such an offset to a string, you can use the expression
602@samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
603name can be changed through the @samp{%define string-pool-name} declaration.
604
605@item %define string-pool-name @var{name}
606@cindex @samp{%define string-pool-name}
607Allows you to specify the name of the generated string pool created by
608the declaration @samp{%pic} (or, equivalently, the option @samp{-P}).
609The default name is @samp{stringpool}.  This declaration permits the use of
610two hash tables in the same file, with @samp{%pic} and even when the
611@samp{%global-table} declaration (or, equivalently, the option @samp{-G})
612is given.
613
614@item %null-strings
615@cindex @samp{%null-strings}
616Use NULL strings instead of empty strings for empty keyword table entries.
617This reduces the startup time of programs using a shared library containing
618the generated code (but not as much as the declaration @samp{%pic}), at the
619expense of one more test-and-branch instruction at run time.
620
621@item %define word-array-name @var{name}
622@cindex @samp{%define word-array-name}
623Allows you to specify the name for the generated array containing the
624hash table.  Default name is @samp{wordlist}.  This option permits the
625use of two hash tables in the same file, even when the option @samp{-G}
626(or, equivalently, the @samp{%global-table} declaration) is given.
627
628@item %define length-table-name @var{name}
629@cindex @samp{%define length-table-name}
630Allows you to specify the name for the generated array containing the
631length table.  Default name is @samp{lengthtable}.  This option permits the
632use of two length tables in the same file, even when the option @samp{-G}
633(or, equivalently, the @samp{%global-table} declaration) is given.
634
635@item %switch=@var{count}
636@cindex @samp{%switch}
637Causes the generated C code to use a @code{switch} statement scheme,
638rather than an array lookup table.  This can lead to a reduction in both
639time and space requirements for some input files.  The argument to this
640option determines how many @code{switch} statements are generated.  A
641value of 1 generates 1 @code{switch} containing all the elements, a
642value of 2 generates 2 tables with 1/2 the elements in each
643@code{switch}, etc.  This is useful since many C compilers cannot
644correctly generate code for large @code{switch} statements.  This option
645was inspired in part by Keith Bostic's original C program.
646
647@item %omit-struct-type
648@cindex @samp{%omit-struct-type}
649Prevents the transfer of the type declaration to the output file.  Use
650this option if the type is already defined elsewhere.
651@end table
652
653@node C Code Inclusion,  , Gperf Declarations, Declarations
654@subsubsection C Code Inclusion
655
656@cindex @samp{%@{}
657@cindex @samp{%@}}
658Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it
659is possible to directly include C source text and comments verbatim into
660the generated output file.  This is accomplished by enclosing the region
661inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs.  Here is
662an input fragment based on the previous example that illustrates this
663feature:
664
665@example
666@group
667%@{
668#include <assert.h>
669/* This section of code is inserted directly into the output. */
670int return_month_days (struct month *months, int is_leap_year);
671%@}
672struct month @{ char *name; int number; int days; int leap_days; @};
673%%
674january,   1, 31, 31
675february,  2, 28, 29
676march,     3, 31, 31
677...
678@end group
679@end example
680
681@node Keywords, Functions, Declarations, Input Format
682@subsection Format for Keyword Entries
683
684The second input file format section contains lines of keywords and any
685associated attributes you might supply.  A line beginning with @samp{#}
686in the first column is considered a comment.  Everything following the
687@samp{#} is ignored, up to and including the following newline.  A line
688beginning with @samp{%} in the first column is an option declaration and
689must not occur within the keywords section.
690
691The first field of each non-comment line is always the keyword itself.  It
692can be given in two ways: as a simple name, i.e., without surrounding
693string quotation marks, or as a string enclosed in double-quotes, in
694C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
695or @code{\xa8}.  In either case, it must start right at the beginning
696of the line, without leading whitespace.
697In this context, a ``field'' is considered to extend up to, but
698not include, the first blank, comma, or newline.  Here is a simple
699example taken from a partial list of C reserved words:
700
701@example
702@group
703# These are a few C reserved words, see the c.gperf file 
704# for a complete list of ANSI C reserved words.
705unsigned
706sizeof
707switch
708signed
709if
710default
711for
712while
713return
714@end group
715@end example
716
717Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker
718may be elided if the declaration section is empty.
719
720Additional fields may optionally follow the leading keyword.  Fields
721should be separated by commas, and terminate at the end of line.  What
722these fields mean is entirely up to you; they are used to initialize the
723elements of the user-defined @code{struct} provided by you in the
724declaration section.  If the @samp{-t} option (or, equivalently, the
725@samp{%struct-type} declaration) is @emph{not} enabled
726these fields are simply ignored.  All previous examples except the last
727one contain keyword attributes.
728
729@node Functions, Controls for GNU indent, Keywords, Input Format
730@subsection Including Additional C Functions
731
732The optional third section also corresponds closely with conventions
733found in @code{flex} and @code{bison}.  All text in this section,
734starting at the final @samp{%%} and extending to the end of the input
735file, is included verbatim into the generated output file.  Naturally,
736it is your responsibility to ensure that the code contained in this
737section is valid C.
738
739@node Controls for GNU indent,  , Functions, Input Format
740@subsection Where to place directives for GNU @code{indent}.
741
742If you want to invoke GNU @code{indent} on a @code{gperf} input file,
743you will see that GNU @code{indent} doesn't understand the @samp{%%},
744@samp{%@{} and @samp{%@}} directives that control @code{gperf}'s
745interpretation of the input file.  Therefore you have to insert some
746directives for GNU @code{indent}.  More precisely, assuming the most
747general input file structure
748
749@example
750@group
751declarations part 1
752%@{
753verbatim code
754%@}
755declarations part 2
756%%
757keywords
758%%
759functions
760@end group
761@end example
762
763@noindent
764you would insert @samp{*INDENT-OFF*} and @samp{*INDENT-ON*} comments
765as follows:
766
767@example
768@group
769/* *INDENT-OFF* */
770declarations part 1
771%@{
772/* *INDENT-ON* */
773verbatim code
774/* *INDENT-OFF* */
775%@}
776declarations part 2
777%%
778keywords
779%%
780/* *INDENT-ON* */
781functions
782@end group
783@end example
784
785@node Output Format, Binary Strings, Input Format, Description
786@section Output Format for Generated C Code with @code{gperf}
787@cindex hash table
788
789Several options control how the generated C code appears on the standard 
790output.  Two C functions are generated.  They are called @code{hash} and 
791@code{in_word_set}, although you may modify their names with a command-line 
792option.  Both functions require two arguments, a string, @code{char *} 
793@var{str}, and a length parameter, @code{int} @var{len}.  Their default 
794function prototypes are as follows:
795
796@deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
797By default, the generated @code{hash} function returns an integer value
798created by adding @var{len} to several user-specified @var{str} byte
799positions indexed into an @dfn{associated values} table stored in a
800local static array.  The associated values table is constructed
801internally by @code{gperf} and later output as a static local C array
802called @samp{hash_table}.  The relevant selected positions (i.e. indices
803into @var{str}) are specified via the @samp{-k} option when running
804@code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}).
805@end deftypefun
806
807@deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
808If @var{str} is in the keyword set, returns a pointer to that
809keyword.  More exactly, if the option @samp{-t} (or, equivalently, the
810@samp{%struct-type} declaration) was given, it returns
811a pointer to the matching keyword's structure.  Otherwise it returns
812@code{NULL}.
813@end deftypefun
814
815If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
816declaration) is not used, @var{str} must be a NUL terminated
817string of exactly length @var{len}.  If @samp{-c} (or, equivalently, the
818@samp{%compare-strncmp} declaration) is used, @var{str} must
819simply be an array of @var{len} bytes and does not need to be NUL
820terminated.
821
822The code generated for these two functions is affected by the following
823options:
824
825@table @samp
826@item -t
827@itemx --struct-type
828Make use of the user-defined @code{struct}.
829
830@item -S @var{total-switch-statements}
831@itemx --switch=@var{total-switch-statements}
832@cindex @code{switch}
833Generate 1 or more C @code{switch} statement rather than use a large,
834(and potentially sparse) static array.  Although the exact time and
835space savings of this approach vary according to your C compiler's
836degree of optimization, this method often results in smaller and faster
837code.
838@end table
839
840If the @samp{-t} and @samp{-S} options (or, equivalently, the
841@samp{%struct-type} and @samp{%switch} declarations) are omitted, the default
842action
843is to generate a @code{char *} array containing the keywords, together with
844additional empty strings used for padding the array.  By experimenting
845with the various input and output options, and timing the resulting C
846code, you can determine the best option choices for different keyword
847set characteristics.
848
849@node Binary Strings, Output Copyright, Output Format, Description
850@section Use of NUL bytes
851@cindex NUL
852
853By default, the code generated by @code{gperf} operates on zero
854terminated strings, the usual representation of strings in C.  This means
855that the keywords in the input file must not contain NUL bytes,
856and the @var{str} argument passed to @code{hash} or @code{in_word_set}
857must be NUL terminated and have exactly length @var{len}.
858
859If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
860declaration) is used, then the @var{str} argument does not need
861to be NUL terminated.  The code generated by @code{gperf} will only
862access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
863However, the keywords in the input file still must not contain NUL
864bytes.
865
866If option @samp{-l} (or, equivalently, the @samp{%compare-lengths}
867declaration) is used, then the hash table performs binary
868comparison.  The keywords in the input file may contain NUL bytes,
869written in string syntax as @code{\000} or @code{\x00}, and the code
870generated by @code{gperf} will treat NUL like any other byte.
871Also, in this case the @samp{-c} option (or, equivalently, the
872@samp{%compare-strncmp} declaration) is ignored.
873
874@node Output Copyright,  , Binary Strings, Description
875@section The Copyright of the Output
876@cindex Copyright
877
878@code{gperf} is under GPL, but that does not cause the output produced
879by @code{gperf} to be under GPL.  The reason is that the output contains
880only small pieces of text that come directly from @code{gperf}'s source
881code -- only about 7 lines long, too small for being significant --, and
882therefore the output is not a ``work based on @code{gperf}'' (in the
883sense of the GPL version 3).
884
885On the other hand, the output produced by @code{gperf} contains
886essentially all of the input file.  Therefore the output is a
887``derivative work'' of the input (in the sense of U.S.@: copyright law);
888and its copyright status depends on the copyright of the input.  For most
889software licenses, the result is that the the output is under the same
890license, with the same copyright holder, as the input that was passed to
891@code{gperf}.
892
893@node Options, Bugs, Description, Top
894@chapter Invoking @code{gperf}
895
896There are @emph{many} options to @code{gperf}.  They were added to make
897the program more convenient for use with real applications.  ``On-line''
898help is readily available via the @samp{--help} option.  Here is the
899complete list of options.
900
901@menu
902* Output File::                 Specifying the Location of the Output File
903* Input Details::               Options that affect Interpretation of the Input File
904* Output Language::             Specifying the Language for the Output Code
905* Output Details::              Fine tuning Details in the Output Code
906* Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
907* Verbosity::                   Informative Output
908@end menu
909
910@node Output File, Input Details, Options, Options
911@section Specifying the Location of the Output File
912
913@table @samp
914@item --output-file=@var{file}
915Allows you to specify the name of the file to which the output is written to.
916@end table
917
918The results are written to standard output if no output file is specified
919or if it is @samp{-}.
920
921@node Input Details, Output Language, Output File, Options
922@section Options that affect Interpretation of the Input File
923
924These options are also available as declarations in the input file
925(@pxref{Gperf Declarations}).
926
927@table @samp
928@item -e @var{keyword-delimiter-list}
929@itemx --delimiters=@var{keyword-delimiter-list}
930@cindex Delimiters
931Allows you to provide a string containing delimiters used to
932separate keywords from their attributes.  The default is ",".  This
933option is essential if you want to use keywords that have embedded
934commas or newlines.  One useful trick is to use -e'TAB', where TAB is
935the literal tab character.
936
937@item -t
938@itemx --struct-type
939Allows you to include a @code{struct} type declaration for generated
940code.  Any text before a pair of consecutive @samp{%%} is considered
941part of the type declaration.  Keywords and additional fields may follow
942this, one group of fields per line.  A set of examples for generating
943perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
944Modula 3 and JavaScript reserved words are distributed with this release.
945
946@item --ignore-case
947Consider upper and lower case ASCII characters as equivalent.  The string
948comparison will use a case insignificant character comparison.  Note that
949locale dependent case mappings are ignored.  This option is therefore not
950suitable if a properly internationalized or locale aware case mapping
951should be used.  (For example, in a Turkish locale, the upper case equivalent
952of the lowercase ASCII letter @samp{i} is the non-ASCII character
953@samp{capital i with dot above}.)  For this case, it is better to apply
954an uppercase or lowercase conversion on the string before passing it to
955the @code{gperf} generated function.
956@end table
957
958@node Output Language, Output Details, Input Details, Options
959@section Options to specify the Language for the Output Code
960
961These options are also available as declarations in the input file
962(@pxref{Gperf Declarations}).
963
964@table @samp
965@item -L @var{generated-language-name}
966@itemx --language=@var{generated-language-name}
967Instructs @code{gperf} to generate code in the language specified by the
968option's argument.  Languages handled are currently:
969
970@table @samp
971@item KR-C
972Old-style K&R C.  This language is understood by old-style C compilers and
973ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
974because of lacking @samp{const}.
975
976@item C
977Common C.  This language is understood by ANSI C compilers, and also by
978old-style C compilers, provided that you @code{#define const} to empty
979for compilers which don't know about this keyword.
980
981@item ANSI-C
982ANSI C.  This language is understood by ANSI C compilers and C++ compilers.
983
984@item C++
985C++.  This language is understood by C++ compilers.
986@end table
987
988The default is C.
989
990@item -a
991This option is supported for compatibility with previous releases of
992@code{gperf}.  It does not do anything.
993
994@item -g
995This option is supported for compatibility with previous releases of
996@code{gperf}.  It does not do anything.
997@end table
998
999@node Output Details, Algorithmic Details, Output Language, Options
1000@section Options for fine tuning Details in the Output Code
1001
1002Most of these options are also available as declarations in the input file
1003(@pxref{Gperf Declarations}).
1004
1005@table @samp
1006@item -K @var{slot-name}
1007@itemx --slot-name=@var{slot-name}
1008@cindex Slot name
1009This option is only useful when option @samp{-t} (or, equivalently, the
1010@samp{%struct-type} declaration) has been given.
1011By default, the program assumes the structure component identifier for
1012the keyword is @samp{name}.  This option allows an arbitrary choice of
1013identifier for this component, although it still must occur as the first
1014field in your supplied @code{struct}.
1015
1016@item -F @var{initializers}
1017@itemx --initializer-suffix=@var{initializers}
1018@cindex Initializers
1019This option is only useful when option @samp{-t} (or, equivalently, the
1020@samp{%struct-type} declaration) has been given.
1021It permits to specify initializers for the structure members following
1022@var{slot-name} in empty hash table entries.  The list of initializers
1023should start with a comma.  By default, the emitted code will
1024zero-initialize structure members following @var{slot-name}.
1025
1026@item -H @var{hash-function-name}
1027@itemx --hash-function-name=@var{hash-function-name}
1028Allows you to specify the name for the generated hash function.  Default
1029name is @samp{hash}.  This option permits the use of two hash tables in
1030the same file.
1031
1032@item -N @var{lookup-function-name}
1033@itemx --lookup-function-name=@var{lookup-function-name}
1034Allows you to specify the name for the generated lookup function.
1035Default name is @samp{in_word_set}.  This option permits multiple
1036generated hash functions to be used in the same application.
1037
1038@item -Z @var{class-name}
1039@itemx --class-name=@var{class-name}
1040@cindex Class name
1041This option is only useful when option @samp{-L C++} (or, equivalently,
1042the @samp{%language=C++} declaration) has been given.  It
1043allows you to specify the name of generated C++ class.  Default name is
1044@code{Perfect_Hash}.
1045
1046@item -7
1047@itemx --seven-bit
1048This option specifies that all strings that will be passed as arguments
1049to the generated hash function and the generated lookup function will
1050solely consist of 7-bit ASCII characters (bytes in the range 0..127).
1051(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
1052@emph{not} guarantee that a byte is in this range.  Only an explicit
1053test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
1054default in versions of @code{gperf} earlier than 2.7; now the default is
1055to support 8-bit and multibyte characters.
1056
1057@item -l
1058@itemx --compare-lengths
1059Compare keyword lengths before trying a string comparison.  This option
1060is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
1061cut down on the number of string comparisons made during the lookup, since
1062keywords with different lengths are never compared via @code{strcmp}.
1063However, using @samp{-l} might greatly increase the size of the
1064generated C code if the lookup table range is large (which implies that
1065the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
1066table contains as many elements as there are entries in the lookup table.
1067
1068@item -c
1069@itemx --compare-strncmp
1070Generates C code that uses the @code{strncmp} function to perform
1071string comparisons.  The default action is to use @code{strcmp}.
1072
1073@item -C
1074@itemx --readonly-tables
1075Makes the contents of all generated lookup tables constant, i.e.,
1076``readonly''.  Many compilers can generate more efficient code for this
1077by putting the tables in readonly memory.
1078
1079@item -E
1080@itemx  --enum
1081Define constant values using an enum local to the lookup function rather
1082than with #defines.  This also means that different lookup functions can
1083reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
1084
1085@item -I
1086@itemx --includes
1087Include the necessary system include file, @code{<string.h>}, at the
1088beginning of the code.  By default, this is not done; the user must
1089include this header file himself to allow compilation of the code.
1090
1091@item -G
1092@itemx --global-table
1093Generate the static table of keywords as a static global variable,
1094rather than hiding it inside of the lookup function (which is the
1095default behavior).
1096
1097@item -P
1098@itemx --pic
1099Optimize the generated table for inclusion in shared libraries.  This
1100reduces the startup time of programs using a shared library containing
1101the generated code.  If the option @samp{-t} (or, equivalently, the
1102@samp{%struct-type} declaration) is also given, the first field of the
1103user-defined struct must be of type @samp{int}, not @samp{char *}, because
1104it will contain offsets into the string pool instead of actual strings.
1105To convert such an offset to a string, you can use the expression
1106@samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
1107name can be changed through the option @samp{--string-pool-name}.
1108
1109@item -Q @var{string-pool-name}
1110@itemx --string-pool-name=@var{string-pool-name}
1111Allows you to specify the name of the generated string pool created by
1112option @samp{-P}.  The default name is @samp{stringpool}.  This option
1113permits the use of two hash tables in the same file, with @samp{-P} and
1114even when the option @samp{-G} (or, equivalently, the @samp{%global-table}
1115declaration) is given.
1116
1117@item --null-strings
1118Use NULL strings instead of empty strings for empty keyword table entries.
1119This reduces the startup time of programs using a shared library containing
1120the generated code (but not as much as option @samp{-P}), at the expense
1121of one more test-and-branch instruction at run time.
1122
1123@item -W @var{hash-table-array-name}
1124@itemx --word-array-name=@var{hash-table-array-name}
1125@cindex Array name
1126Allows you to specify the name for the generated array containing the
1127hash table.  Default name is @samp{wordlist}.  This option permits the
1128use of two hash tables in the same file, even when the option @samp{-G}
1129(or, equivalently, the @samp{%global-table} declaration) is given.
1130
1131@itemx --length-table-name=@var{length-table-array-name}
1132@cindex Array name
1133Allows you to specify the name for the generated array containing the
1134length table.  Default name is @samp{lengthtable}.  This option permits the
1135use of two length tables in the same file, even when the option @samp{-G}
1136(or, equivalently, the @samp{%global-table} declaration) is given.
1137
1138@item -S @var{total-switch-statements}
1139@itemx --switch=@var{total-switch-statements}
1140@cindex @code{switch}
1141Causes the generated C code to use a @code{switch} statement scheme,
1142rather than an array lookup table.  This can lead to a reduction in both
1143time and space requirements for some input files.  The argument to this
1144option determines how many @code{switch} statements are generated.  A
1145value of 1 generates 1 @code{switch} containing all the elements, a
1146value of 2 generates 2 tables with 1/2 the elements in each
1147@code{switch}, etc.  This is useful since many C compilers cannot
1148correctly generate code for large @code{switch} statements.  This option
1149was inspired in part by Keith Bostic's original C program.
1150
1151@item -T
1152@itemx --omit-struct-type
1153Prevents the transfer of the type declaration to the output file.  Use
1154this option if the type is already defined elsewhere.
1155
1156@item -p
1157This option is supported for compatibility with previous releases of
1158@code{gperf}.  It does not do anything.
1159@end table
1160
1161@node Algorithmic Details, Verbosity, Output Details, Options
1162@section Options for changing the Algorithms employed by @code{gperf}
1163
1164@table @samp
1165@item -k @var{selected-byte-positions}
1166@itemx --key-positions=@var{selected-byte-positions}
1167Allows selection of the byte positions used in the keywords'
1168hash function.  The allowable choices range between 1-255, inclusive.
1169The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
1170ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
1171in any order.  Furthermore, the wildcard '*' causes the generated
1172hash function to consider @strong{all} byte positions in each keyword,
1173whereas '$' instructs the hash function to use the ``final byte''
1174of a keyword (this is the only way to use a byte position greater than
1175255, incidentally).
1176
1177For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
1178function that considers positions 1,2,4,6,7,8,9,10, plus the last
1179byte in each keyword (which may be at a different position for each
1180keyword, obviously).  Keywords
1181with length less than the indicated byte positions work properly, since
1182selected byte positions exceeding the keyword length are simply not
1183referenced in the hash function.
1184
1185This option is not normally needed since version 2.8 of @code{gperf};
1186the default byte positions are computed depending on the keyword set,
1187through a search that minimizes the number of byte positions.
1188
1189@item -D
1190@itemx --duplicates
1191@cindex Duplicates
1192Handle keywords whose selected byte sets hash to duplicate values.
1193Duplicate hash values can occur if a set of keywords has the same names, but
1194possesses different attributes, or if the selected byte positions are not well
1195chosen.  With the -D option @code{gperf} treats all these keywords as
1196part of an equivalence class and generates a perfect hash function with
1197multiple comparisons for duplicate keywords.  It is up to you to completely
1198disambiguate the keywords by modifying the generated C code.  However,
1199@code{gperf} helps you out by organizing the output.
1200
1201Using this option usually means that the generated hash function is no
1202longer perfect.  On the other hand, it permits @code{gperf} to work on
1203keyword sets that it otherwise could not handle.
1204
1205@item -m @var{iterations}
1206@itemx --multiple-iterations=@var{iterations}
1207Perform multiple choices of the @samp{-i} and @samp{-j} values, and
1208choose the best results.  This increases the running time by a factor of
1209@var{iterations} but does a good job minimizing the generated table size.
1210
1211@item -i @var{initial-value}
1212@itemx --initial-asso=@var{initial-value}
1213Provides an initial @var{value} for the associate values array.  Default
1214is 0.  Increasing the initial value helps inflate the final table size,
1215possibly leading to more time efficient keyword lookups.  Note that this
1216option is not particularly useful when @samp{-S} (or, equivalently,
1217@samp{%switch}) is used.  Also,
1218@samp{-i} is overridden when the @samp{-r} option is used.
1219
1220@item -j @var{jump-value}
1221@itemx --jump=@var{jump-value}
1222@cindex Jump value
1223Affects the ``jump value'', i.e., how far to advance the associated
1224byte value upon collisions.  @var{Jump-value} is rounded up to an
1225odd number, the default is 5.  If the @var{jump-value} is 0 @code{gperf}
1226jumps by random amounts.
1227
1228@item -n
1229@itemx --no-strlen
1230Instructs the generator not to include the length of a keyword when
1231computing its hash value.  This may save a few assembly instructions in
1232the generated lookup table.
1233
1234@item -r
1235@itemx --random
1236Utilizes randomness to initialize the associated values table.  This
1237frequently generates solutions faster than using deterministic
1238initialization (which starts all associated values at 0).  Furthermore,
1239using the randomization option generally increases the size of the
1240table.
1241
1242@item -s @var{size-multiple}
1243@itemx --size-multiple=@var{size-multiple}
1244Affects the size of the generated hash table.  The numeric argument for
1245this option indicates ``how many times larger or smaller'' the maximum
1246associated value range should be, in relationship to the number of keywords.
1247It can be written as an integer, a floating-point number or a fraction.
1248For example, a value of 3 means ``allow the maximum associated value to be
1249about 3 times larger than the number of input keywords''.
1250Conversely, a value of 1/3 means ``allow the maximum associated value to
1251be about 3 times smaller than the number of input keywords''.  Values
1252smaller than 1 are useful for limiting the overall size of the generated hash
1253table, though the option @samp{-m} is better at this purpose.
1254
1255If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is
1256@emph{not} enabled, the maximum
1257associated value influences the static array table size, and a larger
1258table should decrease the time required for an unsuccessful search, at
1259the expense of extra table space.
1260
1261The default value is 1, thus the default maximum associated value about
1262the same size as the number of keywords (for efficiency, the maximum
1263associated value is always rounded up to a power of 2).  The actual
1264table size may vary somewhat, since this technique is essentially a
1265heuristic.
1266@end table
1267
1268@node Verbosity,  , Algorithmic Details, Options
1269@section Informative Output
1270
1271@table @samp
1272@item -h
1273@itemx --help
1274Prints a short summary on the meaning of each program option.  Aborts
1275further program execution.
1276
1277@item -v
1278@itemx --version
1279Prints out the current version number.
1280
1281@item -d
1282@itemx --debug
1283Enables the debugging option.  This produces verbose diagnostics to
1284``standard error'' when @code{gperf} is executing.  It is useful both for
1285maintaining the program and for determining whether a given set of
1286options is actually speeding up the search for a solution.  Some useful
1287information is dumped at the end of the program when the @samp{-d}
1288option is enabled.
1289@end table
1290
1291@node Bugs, Projects, Options, Top
1292@chapter Known Bugs and Limitations with @code{gperf}
1293
1294The following are some limitations with the current release of
1295@code{gperf}:
1296
1297@itemize @bullet
1298@item
1299The @code{gperf} utility is tuned to execute quickly, and works quickly
1300for small to medium size data sets (around 1000 keywords).  It is
1301extremely useful for maintaining perfect hash functions for compiler
1302keyword sets.  Several recent enhancements now enable @code{gperf} to
1303work efficiently on much larger keyword sets (over 15,000 keywords).
1304When processing large keyword sets it helps greatly to have over 8 megs
1305of RAM.
1306
1307@item 
1308The size of the generate static keyword array can get @emph{extremely}
1309large if the input keyword file is large or if the keywords are quite
1310similar.  This tends to slow down the compilation of the generated C
1311code, and @emph{greatly} inflates the object code size.  If this
1312situation occurs, consider using the @samp{-S} option to reduce data
1313size, potentially increasing keyword recognition time a negligible
1314amount.  Since many C compilers cannot correctly generate code for
1315large switch statements it is important to qualify the @var{-S} option
1316with an appropriate numerical argument that controls the number of
1317switch statements generated.
1318
1319@item 
1320The maximum number of selected byte positions has an
1321arbitrary limit of 255.  This restriction should be removed, and if
1322anyone considers this a problem write me and let me know so I can remove
1323the constraint.
1324@end itemize
1325
1326@node Projects, Bibliography, Bugs, Top
1327@chapter Things Still Left to Do
1328
1329It should be ``relatively'' easy to replace the current perfect hash
1330function algorithm with a more exhaustive approach; the perfect hash
1331module is essential independent from other program modules.  Additional
1332worthwhile improvements include:
1333
1334@itemize @bullet
1335@item 
1336Another useful extension involves modifying the program to generate
1337``minimal'' perfect hash functions (under certain circumstances, the
1338current version can be rather extravagant in the generated table size).
1339This is mostly of theoretical interest, since a sparse table
1340often produces faster lookups, and use of the @samp{-S} @code{switch}
1341option can minimize the data size, at the expense of slightly longer
1342lookups (note that the gcc compiler generally produces good code for
1343@code{switch} statements, reducing the need for more complex schemes).
1344
1345@item
1346In addition to improving the algorithm, it would also be useful to
1347generate an Ada package as the code output, in addition to the current
1348C and C++ routines.
1349@end itemize
1350
1351@page
1352
1353@node Bibliography, Concept Index, Projects, Top
1354@chapter Bibliography
1355
1356@noindent
1357[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
1358Hashing Functions} Information Sciences 39(1986), 187-195.
1359
1360@noindent
1361[2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash
1362Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.
1363
1364@noindent
1365[3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}
1366Communications of the ACM, 23, 1(January 1980), 17-19.
1367
1368@noindent
1369[4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal
1370Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.
1371
1372@noindent
1373[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
1374@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.
1375
1376@noindent
1377[6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal
1378Perfect Hashing Functions} Communications of the ACM, 24, 12(December
13791981), 829-833.
1380
1381@noindent
1382[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect
1383Hash Functions Method} Communications of the ACM, 23, 12(December 1980),
1384728-729.
1385
1386@noindent
1387[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect
1388Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532
1389
1390@noindent
1391[9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1392Second USENIX C++ Conference Proceedings, April 1990.
1393
1394@noindent
1395[10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1396C++ Report, SIGS 10 10 (November/December 1998).
1397
1398@noindent
1399[11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions
1400for Reserved Word Lists}  SIGPLAN Notices, 20, 12(September 1985), 47-53.
1401
1402@noindent
1403[12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe
1404Retrieving Method for Static Sets} Communications of the ACM, 20
140511(November 1977), 841-850.
1406
1407@noindent
1408[13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,
14091988.
1410
1411@noindent
1412[14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.
1413
1414@noindent
1415[15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software
1416Foundation, 1989.
1417
1418@node Concept Index,  , Bibliography, Top
1419@unnumbered Concept Index
1420
1421@printindex cp
1422
1423@iftex
1424@c Table of Contents
1425@contents
1426@end iftex
1427
1428@bye
1429