1<HTML>
2<HEAD>
3<!-- This HTML file has been created by texi2html 1.52b
4     from gperf.texi on 19 March 2013 -->
5
6<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7<TITLE>Perfect Hash Function Generator - 4  High-Level Description of GNU gperf</TITLE>
8</HEAD>
9<BODY>
10Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
11<P><HR><P>
12
13
14<H1><A NAME="SEC5" HREF="gperf_toc.html#TOC5">4  High-Level Description of GNU <CODE>gperf</CODE></A></H1>
15
16<P>
The perfect hash function generator <CODE>gperf</CODE> reads a set of
&ldquo;keywords&rdquo; from an input file (or from the standard input by
19default).  It attempts to derive a perfect hashing function that
20recognizes a member of the <EM>static keyword set</EM> with at most a
21single probe into the lookup table.  If <CODE>gperf</CODE> succeeds in
22generating such a function it produces a pair of C source code routines
23that perform hashing and table lookup recognition.  All generated C code
24is directed to the standard output.  Command-line options described
25below allow you to modify the input and output format to <CODE>gperf</CODE>.
26
27</P>
28<P>
29By default, <CODE>gperf</CODE> attempts to produce time-efficient code, with
30less emphasis on efficient space utilization.  However, several options
exist that permit trading off execution time for storage space and vice
32versa.  In particular, expanding the generated table size produces a
33sparse search structure, generally yielding faster searches.
34Conversely, you can direct <CODE>gperf</CODE> to utilize a C <CODE>switch</CODE>
35statement scheme that minimizes data space storage size.  Furthermore,
36using a C <CODE>switch</CODE> may actually speed up the keyword retrieval time
37somewhat.  Actual results depend on your C compiler, of course.
38
39</P>
40<P>
41In general, <CODE>gperf</CODE> assigns values to the bytes it is using
42for hashing until some set of values gives each keyword a unique value.
43A helpful heuristic is that the larger the hash value range, the easier
44it is for <CODE>gperf</CODE> to find and generate a perfect hash function.
45Experimentation is the key to getting the most from <CODE>gperf</CODE>.
46
47</P>
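<P>
The search for byte values that give every keyword a distinct hash can be
sketched by hand for a tiny keyword set.  The C fragment below is an
illustrative sketch, not output of <CODE>gperf</CODE>: the names
<CODE>toy_asso_value</CODE>, <CODE>toy_hash</CODE>, <CODE>toy_wordlist</CODE> and
<CODE>toy_lookup</CODE> are hypothetical, and the values were picked manually
so that the keyword length plus the value assigned to the first byte is
unique for each of the five keywords.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_HASH_VALUE 7

/* Hand-picked value for the first byte of a keyword (an assumption of
   this sketch).  Bytes that start no keyword map past the table. */
static unsigned int toy_asso_value(unsigned char c)
{
    switch (c) {
    case 'i': case 'f': case 'w': case 'r':
        return 0;
    case 's':
        return 1;
    default:
        return TOY_MAX_HASH_VALUE + 1;
    }
}

/* if -> 2, for -> 3, while -> 5, return -> 6, switch -> 7 */
unsigned int toy_hash(const char *str, unsigned int len)
{
    return len + toy_asso_value((unsigned char) str[0]);
}

/* Lookup table indexed by hash value; unused slots hold "". */
static const char *const toy_wordlist[TOY_MAX_HASH_VALUE + 1] = {
    "", "", "if", "for", "", "while", "return", "switch"
};

/* Single probe: hash, index the table, compare once.
   str must be NUL terminated and have exactly length len. */
const char *toy_lookup(const char *str, unsigned int len)
{
    unsigned int key;

    assert(str != NULL);
    key = toy_hash(str, len);
    if (key <= TOY_MAX_HASH_VALUE) {
        const char *s = toy_wordlist[key];
        if (*str == *s && strcmp(str + 1, s + 1) == 0)
            return s;
    }
    return NULL;
}
```

<P>
With these values, <CODE>toy_lookup ("while", 5)</CODE> finds its keyword in
slot 5, whereas <CODE>toy_lookup ("sizeof", 6)</CODE> lands in the slot of
<CODE>switch</CODE>, fails the string comparison, and yields <CODE>NULL</CODE>.
</P>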
48
49
50<H2><A NAME="SEC6" HREF="gperf_toc.html#TOC6">4.1  Input Format to <CODE>gperf</CODE></A></H2>
51<P>
52<A NAME="IDX4"></A>
53<A NAME="IDX5"></A>
54<A NAME="IDX6"></A>
55<A NAME="IDX7"></A>
56You can control the input file format by varying certain command-line
arguments, in particular the <SAMP>&lsquo;-t&rsquo;</SAMP> option.  The input format
is similar to that of the GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE> (or the UNIX
utilities <CODE>lex</CODE> and <CODE>yacc</CODE>).  Here's an outline of the general
60format:
61
62</P>
63
64<PRE>
65declarations
66%%
67keywords
68%%
69functions
70</PRE>
71
72<P>
73<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, the declarations section and
74the functions section are optional.  The following sections describe the
75input format for each section.
76
77</P>
78
79<P>
80It is possible to omit the declaration section entirely, if the <SAMP>&lsquo;-t&rsquo;</SAMP>
81option is not given.  In this case the input file begins directly with the
82first keyword line, e.g.:
83
84</P>
85
86<PRE>
87january
88february
89march
90april
91...
92</PRE>
93
94
95
96<H3><A NAME="SEC7" HREF="gperf_toc.html#TOC7">4.1.1  Declarations</A></H3>
97
98<P>
99The keyword input file optionally contains a section for including
100arbitrary C declarations and definitions, <CODE>gperf</CODE> declarations that
101act like command-line options, as well as for providing a user-supplied
102<CODE>struct</CODE>.
103
104</P>
105
106
107
108<H4><A NAME="SEC8" HREF="gperf_toc.html#TOC8">4.1.1.1  User-supplied <CODE>struct</CODE></A></H4>
109
110<P>
111If the <SAMP>&lsquo;-t&rsquo;</SAMP> option (or, equivalently, the <SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration)
112<EM>is</EM> enabled, you <EM>must</EM> provide a C <CODE>struct</CODE> as the last
113component in the declaration section from the input file.  The first
114field in this struct must be of type <CODE>char *</CODE> or <CODE>const char *</CODE>
115if the <SAMP>&lsquo;-P&rsquo;</SAMP> option is not given, or of type <CODE>int</CODE> if the option
116<SAMP>&lsquo;-P&rsquo;</SAMP> (or, equivalently, the <SAMP>&lsquo;%pic&rsquo;</SAMP> declaration) is enabled.
117This first field must be called <SAMP>&lsquo;name&rsquo;</SAMP>, although it is possible to modify
118its name with the <SAMP>&lsquo;-K&rsquo;</SAMP> option (or, equivalently, the
119<SAMP>&lsquo;%define slot-name&rsquo;</SAMP> declaration) described below.
120
121</P>
122<P>
123Here is a simple example, using months of the year and their attributes as
124input:
125
126</P>
127
128<PRE>
129struct month { char *name; int number; int days; int leap_days; };
130%%
131january,   1, 31, 31
132february,  2, 28, 29
133march,     3, 31, 31
134april,     4, 30, 30
135may,       5, 31, 31
136june,      6, 30, 30
137july,      7, 31, 31
138august,    8, 31, 31
139september, 9, 30, 30
140october,  10, 31, 31
141november, 11, 30, 30
142december, 12, 31, 31
143</PRE>
144
145<P>
146<A NAME="IDX8"></A>
A pair of consecutive percent signs, <SAMP>&lsquo;%%&rsquo;</SAMP>, appearing
left-justified in the first column, separates the <CODE>struct</CODE>
declaration from the list of keywords and other fields, as in the UNIX
utility <CODE>lex</CODE>.
151
152</P>
153<P>
154If the <CODE>struct</CODE> has already been declared in an include file, it can
155be mentioned in an abbreviated form, like this:
156
157</P>
158
159<PRE>
160struct month;
161%%
162january,   1, 31, 31
163...
164</PRE>
165
166
167
168<H4><A NAME="SEC9" HREF="gperf_toc.html#TOC9">4.1.1.2  Gperf Declarations</A></H4>
169
170<P>
171The declaration section can contain <CODE>gperf</CODE> declarations.  They
172influence the way <CODE>gperf</CODE> works, like command line options do.
173In fact, every such declaration is equivalent to a command line option.
174There are three forms of declarations:
175
176</P>
177
178<OL>
179<LI>
180
181Declarations without argument, like <SAMP>&lsquo;%compare-lengths&rsquo;</SAMP>.
182
183<LI>
184
185Declarations with an argument, like <SAMP>&lsquo;%switch=<VAR>count</VAR>&rsquo;</SAMP>.
186
187<LI>
188
189Declarations of names of entities in the output file, like
190<SAMP>&lsquo;%define lookup-function-name <VAR>name</VAR>&rsquo;</SAMP>.
191</OL>
192
193<P>
194When a declaration is given both in the input file and as a command line
195option, the command-line option's value prevails.
196
197</P>
198<P>
199The following <CODE>gperf</CODE> declarations are available.
200
201</P>
202<DL COMPACT>
203
204<DT><SAMP>&lsquo;%delimiters=<VAR>delimiter-list</VAR>&rsquo;</SAMP>
205<DD>
206<A NAME="IDX9"></A>
207Allows you to provide a string containing delimiters used to
208separate keywords from their attributes.  The default is ",".  This
209option is essential if you want to use keywords that have embedded
210commas or newlines.
211
212<DT><SAMP>&lsquo;%struct-type&rsquo;</SAMP>
213<DD>
214<A NAME="IDX10"></A>
215Allows you to include a <CODE>struct</CODE> type declaration for generated
216code; see above for an example.
217
218<DT><SAMP>&lsquo;%ignore-case&rsquo;</SAMP>
219<DD>
220<A NAME="IDX11"></A>
221Consider upper and lower case ASCII characters as equivalent.  The string
comparison will use a case-insensitive character comparison.  Note that
locale-dependent case mappings are ignored.
224
225<DT><SAMP>&lsquo;%language=<VAR>language-name</VAR>&rsquo;</SAMP>
226<DD>
227<A NAME="IDX12"></A>
228Instructs <CODE>gperf</CODE> to generate code in the language specified by the
229option's argument.  Languages handled are currently:
230
231<DL COMPACT>
232
233<DT><SAMP>&lsquo;KR-C&rsquo;</SAMP>
234<DD>
235Old-style K&#38;R C.  This language is understood by old-style C compilers and
236ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
because <SAMP>&lsquo;const&rsquo;</SAMP> is missing.
238
239<DT><SAMP>&lsquo;C&rsquo;</SAMP>
240<DD>
241Common C.  This language is understood by ANSI C compilers, and also by
242old-style C compilers, provided that you <CODE>#define const</CODE> to empty
243for compilers which don't know about this keyword.
244
245<DT><SAMP>&lsquo;ANSI-C&rsquo;</SAMP>
246<DD>
247ANSI C.  This language is understood by ANSI C (C89, ISO C90) compilers,
248ISO C99 compilers, and C++ compilers.
249
250<DT><SAMP>&lsquo;C++&rsquo;</SAMP>
251<DD>
252C++.  This language is understood by C++ compilers.
253</DL>
254
255The default is C.
256
257<DT><SAMP>&lsquo;%define slot-name <VAR>name</VAR>&rsquo;</SAMP>
258<DD>
259<A NAME="IDX13"></A>
260This declaration is only useful when option <SAMP>&lsquo;-t&rsquo;</SAMP> (or, equivalently, the
261<SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration) has been given.
262By default, the program assumes the structure component identifier for
263the keyword is <SAMP>&lsquo;name&rsquo;</SAMP>.  This option allows an arbitrary choice of
264identifier for this component, although it still must occur as the first
265field in your supplied <CODE>struct</CODE>.
266
267<DT><SAMP>&lsquo;%define initializer-suffix <VAR>initializers</VAR>&rsquo;</SAMP>
268<DD>
269<A NAME="IDX14"></A>
270This declaration is only useful when option <SAMP>&lsquo;-t&rsquo;</SAMP> (or, equivalently, the
271<SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration) has been given.
It lets you specify initializers for the structure members following
273<VAR>slot-name</VAR> in empty hash table entries.  The list of initializers
274should start with a comma.  By default, the emitted code will
275zero-initialize structure members following <VAR>slot-name</VAR>.
276
277<DT><SAMP>&lsquo;%define hash-function-name <VAR>name</VAR>&rsquo;</SAMP>
278<DD>
279<A NAME="IDX15"></A>
280Allows you to specify the name for the generated hash function.  Default
281name is <SAMP>&lsquo;hash&rsquo;</SAMP>.  This option permits the use of two hash tables in
282the same file.
283
284<DT><SAMP>&lsquo;%define lookup-function-name <VAR>name</VAR>&rsquo;</SAMP>
285<DD>
286<A NAME="IDX16"></A>
287Allows you to specify the name for the generated lookup function.
288Default name is <SAMP>&lsquo;in_word_set&rsquo;</SAMP>.  This option permits multiple
289generated hash functions to be used in the same application.
290
291<DT><SAMP>&lsquo;%define class-name <VAR>name</VAR>&rsquo;</SAMP>
292<DD>
293<A NAME="IDX17"></A>
294This option is only useful when option <SAMP>&lsquo;-L C++&rsquo;</SAMP> (or, equivalently,
295the <SAMP>&lsquo;%language=C++&rsquo;</SAMP> declaration) has been given.  It
allows you to specify the name of the generated C++ class.  Default name is
297<CODE>Perfect_Hash</CODE>.
298
299<DT><SAMP>&lsquo;%7bit&rsquo;</SAMP>
300<DD>
301<A NAME="IDX18"></A>
302This option specifies that all strings that will be passed as arguments
303to the generated hash function and the generated lookup function will
304solely consist of 7-bit ASCII characters (bytes in the range 0..127).
305(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
306<EM>not</EM> guarantee that a byte is in this range.  Only an explicit
307test like <SAMP>&lsquo;c &#62;= 'A' &#38;&#38; c &#60;= 'Z'&rsquo;</SAMP> guarantees this.)
308
309<DT><SAMP>&lsquo;%compare-lengths&rsquo;</SAMP>
310<DD>
311<A NAME="IDX19"></A>
312Compare keyword lengths before trying a string comparison.  This option
313is mandatory for binary comparisons (see section <A HREF="gperf_5.html#SEC15">4.3  Use of NUL bytes</A>).  It also might
314cut down on the number of string comparisons made during the lookup, since
315keywords with different lengths are never compared via <CODE>strcmp</CODE>.
316However, using <SAMP>&lsquo;%compare-lengths&rsquo;</SAMP> might greatly increase the size of the
317generated C code if the lookup table range is large (which implies that
318the switch option <SAMP>&lsquo;-S&rsquo;</SAMP> or <SAMP>&lsquo;%switch&rsquo;</SAMP> is not enabled), since the length
319table contains as many elements as there are entries in the lookup table.
320
321<DT><SAMP>&lsquo;%compare-strncmp&rsquo;</SAMP>
322<DD>
323<A NAME="IDX20"></A>
324Generates C code that uses the <CODE>strncmp</CODE> function to perform
325string comparisons.  The default action is to use <CODE>strcmp</CODE>.
326
327<DT><SAMP>&lsquo;%readonly-tables&rsquo;</SAMP>
328<DD>
329<A NAME="IDX21"></A>
Makes the contents of all generated lookup tables constant, i.e.,
&ldquo;readonly&rdquo;.  Many compilers can generate more efficient code for this
332by putting the tables in readonly memory.
333
334<DT><SAMP>&lsquo;%enum&rsquo;</SAMP>
335<DD>
336<A NAME="IDX22"></A>
337Define constant values using an enum local to the lookup function rather
338than with #defines.  This also means that different lookup functions can
339reside in the same file.  Thanks to James Clark <CODE>&#60;jjc@ai.mit.edu&#62;</CODE>.
340
341<DT><SAMP>&lsquo;%includes&rsquo;</SAMP>
342<DD>
343<A NAME="IDX23"></A>
344Include the necessary system include file, <CODE>&#60;string.h&#62;</CODE>, at the
beginning of the code.  By default, this is not done; you must
include this header file yourself to allow compilation of the code.
347
348<DT><SAMP>&lsquo;%global-table&rsquo;</SAMP>
349<DD>
350<A NAME="IDX24"></A>
351Generate the static table of keywords as a static global variable,
352rather than hiding it inside of the lookup function (which is the
353default behavior).
354
355<DT><SAMP>&lsquo;%pic&rsquo;</SAMP>
356<DD>
357<A NAME="IDX25"></A>
358Optimize the generated table for inclusion in shared libraries.  This
359reduces the startup time of programs using a shared library containing
360the generated code.  If the <SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration (or,
361equivalently, the option <SAMP>&lsquo;-t&rsquo;</SAMP>) is also given, the first field of the
362user-defined struct must be of type <SAMP>&lsquo;int&rsquo;</SAMP>, not <SAMP>&lsquo;char *&rsquo;</SAMP>, because
363it will contain offsets into the string pool instead of actual strings.
364To convert such an offset to a string, you can use the expression
365<SAMP>&lsquo;stringpool + <VAR>o</VAR>&rsquo;</SAMP>, where <VAR>o</VAR> is the offset.  The string pool
366name can be changed through the <SAMP>&lsquo;%define string-pool-name&rsquo;</SAMP> declaration.
367
368<DT><SAMP>&lsquo;%define string-pool-name <VAR>name</VAR>&rsquo;</SAMP>
369<DD>
370<A NAME="IDX26"></A>
371Allows you to specify the name of the generated string pool created by
372the declaration <SAMP>&lsquo;%pic&rsquo;</SAMP> (or, equivalently, the option <SAMP>&lsquo;-P&rsquo;</SAMP>).
373The default name is <SAMP>&lsquo;stringpool&rsquo;</SAMP>.  This declaration permits the use of
374two hash tables in the same file, with <SAMP>&lsquo;%pic&rsquo;</SAMP> and even when the
375<SAMP>&lsquo;%global-table&rsquo;</SAMP> declaration (or, equivalently, the option <SAMP>&lsquo;-G&rsquo;</SAMP>)
376is given.
377
378<DT><SAMP>&lsquo;%null-strings&rsquo;</SAMP>
379<DD>
380<A NAME="IDX27"></A>
381Use NULL strings instead of empty strings for empty keyword table entries.
382This reduces the startup time of programs using a shared library containing
383the generated code (but not as much as the declaration <SAMP>&lsquo;%pic&rsquo;</SAMP>), at the
384expense of one more test-and-branch instruction at run time.
385
386<DT><SAMP>&lsquo;%define word-array-name <VAR>name</VAR>&rsquo;</SAMP>
387<DD>
388<A NAME="IDX28"></A>
389Allows you to specify the name for the generated array containing the
390hash table.  Default name is <SAMP>&lsquo;wordlist&rsquo;</SAMP>.  This option permits the
391use of two hash tables in the same file, even when the option <SAMP>&lsquo;-G&rsquo;</SAMP>
392(or, equivalently, the <SAMP>&lsquo;%global-table&rsquo;</SAMP> declaration) is given.
393
394<DT><SAMP>&lsquo;%define length-table-name <VAR>name</VAR>&rsquo;</SAMP>
395<DD>
396<A NAME="IDX29"></A>
397Allows you to specify the name for the generated array containing the
398length table.  Default name is <SAMP>&lsquo;lengthtable&rsquo;</SAMP>.  This option permits the
399use of two length tables in the same file, even when the option <SAMP>&lsquo;-G&rsquo;</SAMP>
400(or, equivalently, the <SAMP>&lsquo;%global-table&rsquo;</SAMP> declaration) is given.
401
402<DT><SAMP>&lsquo;%switch=<VAR>count</VAR>&rsquo;</SAMP>
403<DD>
404<A NAME="IDX30"></A>
405Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
406rather than an array lookup table.  This can lead to a reduction in both
407time and space requirements for some input files.  The argument to this
408option determines how many <CODE>switch</CODE> statements are generated.  A
409value of 1 generates 1 <CODE>switch</CODE> containing all the elements, a
value of 2 generates 2 <CODE>switch</CODE> statements with half the elements in
each, etc.  This is useful since many C compilers cannot
412correctly generate code for large <CODE>switch</CODE> statements.  This option
413was inspired in part by Keith Bostic's original C program.
414
415<DT><SAMP>&lsquo;%omit-struct-type&rsquo;</SAMP>
416<DD>
417<A NAME="IDX31"></A>
418Prevents the transfer of the type declaration to the output file.  Use
419this option if the type is already defined elsewhere.
420</DL>
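<P>
The offset scheme described under <SAMP>&lsquo;%pic&rsquo;</SAMP> above can be
illustrated with a hand-written string pool.  This is a simplified sketch
under stated assumptions, not the layout <CODE>gperf</CODE> emits: the pool
contents, the offsets, the <CODE>months</CODE> table and the helper
<CODE>month_name</CODE> are hypothetical and were written by hand.  It shows
only why the first struct field is an <CODE>int</CODE> and how
<SAMP>&lsquo;stringpool + <VAR>o</VAR>&rsquo;</SAMP> recovers the string.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All keywords stored back to back, each followed by a NUL byte. */
static const char stringpool[] = "january\0february\0march";

/* With offsets, the first field is an int rather than a char *. */
struct month { int name; int number; int days; int leap_days; };

static const struct month months[] = {
    {  0, 1, 31, 31 },  /* "january"  starts at offset 0      */
    {  8, 2, 28, 29 },  /* "february" starts at offset 7 + 1  */
    { 17, 3, 31, 31 }   /* "march"    starts at offset 8 + 9  */
};

/* Convert the stored offset back into a string. */
const char *month_name(const struct month *m)
{
    assert(m != NULL);
    return stringpool + m->name;
}
```

<P>
Because the table holds integers instead of pointers, it contains nothing
that has to be relocated when a shared library is loaded, which is where
the startup saving comes from.
</P>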
421
422
423
424<H4><A NAME="SEC10" HREF="gperf_toc.html#TOC10">4.1.1.3  C Code Inclusion</A></H4>
425
426<P>
427<A NAME="IDX32"></A>
428<A NAME="IDX33"></A>
429Using a syntax similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE>, it
430is possible to directly include C source text and comments verbatim into
431the generated output file.  This is accomplished by enclosing the region
432inside left-justified surrounding <SAMP>&lsquo;%{&rsquo;</SAMP>, <SAMP>&lsquo;%}&rsquo;</SAMP> pairs.  Here is
433an input fragment based on the previous example that illustrates this
434feature:
435
436</P>
437
438<PRE>
439%{
440#include &#60;assert.h&#62;
441/* This section of code is inserted directly into the output. */
442int return_month_days (struct month *months, int is_leap_year);
443%}
444struct month { char *name; int number; int days; int leap_days; };
445%%
446january,   1, 31, 31
447february,  2, 28, 29
448march,     3, 31, 31
449...
450</PRE>
451
452
453
454<H3><A NAME="SEC11" HREF="gperf_toc.html#TOC11">4.1.2  Format for Keyword Entries</A></H3>
455
456<P>
457The second input file format section contains lines of keywords and any
458associated attributes you might supply.  A line beginning with <SAMP>&lsquo;#&rsquo;</SAMP>
459in the first column is considered a comment.  Everything following the
460<SAMP>&lsquo;#&rsquo;</SAMP> is ignored, up to and including the following newline.  A line
461beginning with <SAMP>&lsquo;%&rsquo;</SAMP> in the first column is an option declaration and
462must not occur within the keywords section.
463
464</P>
465<P>
466The first field of each non-comment line is always the keyword itself.  It
467can be given in two ways: as a simple name, i.e., without surrounding
468string quotation marks, or as a string enclosed in double-quotes, in
469C syntax, possibly with backslash escapes like <CODE>\"</CODE> or <CODE>\234</CODE>
470or <CODE>\xa8</CODE>.  In either case, it must start right at the beginning
471of the line, without leading whitespace.
In this context, a &ldquo;field&rdquo; is considered to extend up to, but
473not include, the first blank, comma, or newline.  Here is a simple
474example taken from a partial list of C reserved words:
475
476</P>
477
478<PRE>
479# These are a few C reserved words, see the c.gperf file 
480# for a complete list of ANSI C reserved words.
481unsigned
482sizeof
483switch
484signed
485if
486default
487for
488while
489return
490</PRE>
491
492<P>
493Note that unlike <CODE>flex</CODE> or <CODE>bison</CODE> the first <SAMP>&lsquo;%%&rsquo;</SAMP> marker
494may be elided if the declaration section is empty.
495
496</P>
497<P>
498Additional fields may optionally follow the leading keyword.  Fields
499should be separated by commas, and terminate at the end of line.  What
500these fields mean is entirely up to you; they are used to initialize the
501elements of the user-defined <CODE>struct</CODE> provided by you in the
502declaration section.  If the <SAMP>&lsquo;-t&rsquo;</SAMP> option (or, equivalently, the
503<SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration) is <EM>not</EM> enabled
504these fields are simply ignored.  All previous examples except the last
505one contain keyword attributes.
506
507</P>
508
509
510<H3><A NAME="SEC12" HREF="gperf_toc.html#TOC12">4.1.3  Including Additional C Functions</A></H3>
511
512<P>
513The optional third section also corresponds closely with conventions
514found in <CODE>flex</CODE> and <CODE>bison</CODE>.  All text in this section,
515starting at the final <SAMP>&lsquo;%%&rsquo;</SAMP> and extending to the end of the input
516file, is included verbatim into the generated output file.  Naturally,
517it is your responsibility to ensure that the code contained in this
518section is valid C.
519
520</P>
521
522
523<H3><A NAME="SEC13" HREF="gperf_toc.html#TOC13">4.1.4  Where to place directives for GNU <CODE>indent</CODE>.</A></H3>
524
525<P>
526If you want to invoke GNU <CODE>indent</CODE> on a <CODE>gperf</CODE> input file,
527you will see that GNU <CODE>indent</CODE> doesn't understand the <SAMP>&lsquo;%%&rsquo;</SAMP>,
528<SAMP>&lsquo;%{&rsquo;</SAMP> and <SAMP>&lsquo;%}&rsquo;</SAMP> directives that control <CODE>gperf</CODE>'s
529interpretation of the input file.  Therefore you have to insert some
530directives for GNU <CODE>indent</CODE>.  More precisely, assuming the most
531general input file structure
532
533</P>
534
535<PRE>
536declarations part 1
537%{
538verbatim code
539%}
540declarations part 2
541%%
542keywords
543%%
544functions
545</PRE>
546
547<P>
548you would insert <SAMP>&lsquo;*INDENT-OFF*&rsquo;</SAMP> and <SAMP>&lsquo;*INDENT-ON*&rsquo;</SAMP> comments
549as follows:
550
551</P>
552
553<PRE>
554/* *INDENT-OFF* */
555declarations part 1
556%{
557/* *INDENT-ON* */
558verbatim code
559/* *INDENT-OFF* */
560%}
561declarations part 2
562%%
563keywords
564%%
565/* *INDENT-ON* */
566functions
567</PRE>
568
569
570
571<H2><A NAME="SEC14" HREF="gperf_toc.html#TOC14">4.2  Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
572<P>
573<A NAME="IDX34"></A>
574
575</P>
576<P>
577Several options control how the generated C code appears on the standard 
578output.  Two C functions are generated.  They are called <CODE>hash</CODE> and 
579<CODE>in_word_set</CODE>, although you may modify their names with a command-line 
option.  Both functions require two arguments, a string, <CODE>const char *</CODE> 
<VAR>str</VAR>, and a length parameter, <CODE>unsigned int</CODE> <VAR>len</VAR>.  Their default 
582function prototypes are as follows:
583
584</P>
585<P>
586<DL>
587<DT><U>Function:</U> unsigned int <B>hash</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
588<DD><A NAME="IDX35"></A>
589By default, the generated <CODE>hash</CODE> function returns an integer value
590created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> byte
591positions indexed into an <EM>associated values</EM> table stored in a
592local static array.  The associated values table is constructed
593internally by <CODE>gperf</CODE> and later output as a static local C array
594called <SAMP>&lsquo;hash_table&rsquo;</SAMP>.  The relevant selected positions (i.e. indices
595into <VAR>str</VAR>) are specified via the <SAMP>&lsquo;-k&rsquo;</SAMP> option when running
596<CODE>gperf</CODE>, as detailed in the <EM>Options</EM> section below (see section <A HREF="gperf_6.html#SEC17">5  Invoking <CODE>gperf</CODE></A>).
597</DL>
598
599</P>
600<P>
601<DL>
602<DT><U>Function:</U>  <B>in_word_set</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
603<DD><A NAME="IDX36"></A>
604If <VAR>str</VAR> is in the keyword set, returns a pointer to that
605keyword.  More exactly, if the option <SAMP>&lsquo;-t&rsquo;</SAMP> (or, equivalently, the
606<SAMP>&lsquo;%struct-type&rsquo;</SAMP> declaration) was given, it returns
607a pointer to the matching keyword's structure.  Otherwise it returns
608<CODE>NULL</CODE>.
609</DL>
610
611</P>
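<P>
As a sketch of how calling code might consume the pointer returned by
<CODE>in_word_set</CODE> when a user-supplied <CODE>struct</CODE> is in use,
consider the fragment below.  The <CODE>in_word_set</CODE> defined here is a
hand-written stand-in that scans a small table linearly; it only mimics
the calling convention and is not what <CODE>gperf</CODE> generates.  The
table <CODE>month_table</CODE>, the helper <CODE>days_in_month</CODE> and the
<CODE>const</CODE> qualifiers are assumptions of this sketch, and the exact
return type of the generated function may differ.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct month { const char *name; int number; int days; int leap_days; };

/* Hand-written table; gperf would emit a hashed table instead. */
static const struct month month_table[] = {
    { "january",  1, 31, 31 },
    { "february", 2, 28, 29 },
    { "march",    3, 31, 31 }
};

/* Stand-in with the same arguments as the generated lookup function.
   It scans linearly; the generated code probes a single slot. */
const struct month *in_word_set(const char *str, unsigned int len)
{
    size_t i;

    assert(str != NULL);
    for (i = 0; i < sizeof month_table / sizeof month_table[0]; i++) {
        const char *s = month_table[i].name;
        if (strlen(s) == len && strcmp(str, s) == 0)
            return &month_table[i];
    }
    return NULL;
}

/* Caller: number of days in the named month, or -1 if it is unknown. */
int days_in_month(const char *name, int is_leap_year)
{
    const struct month *m = in_word_set(name, (unsigned int) strlen(name));

    if (m == NULL)
        return -1;
    return is_leap_year ? m->leap_days : m->days;
}
```

<P>
The caller passes the string together with its length and must check for
<CODE>NULL</CODE> before touching any field of the result.
</P>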
612<P>
613If the option <SAMP>&lsquo;-c&rsquo;</SAMP> (or, equivalently, the <SAMP>&lsquo;%compare-strncmp&rsquo;</SAMP>
614declaration) is not used, <VAR>str</VAR> must be a NUL terminated
615string of exactly length <VAR>len</VAR>.  If <SAMP>&lsquo;-c&rsquo;</SAMP> (or, equivalently, the
616<SAMP>&lsquo;%compare-strncmp&rsquo;</SAMP> declaration) is used, <VAR>str</VAR> must
617simply be an array of <VAR>len</VAR> bytes and does not need to be NUL
618terminated.
619
620</P>
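<P>
The difference between the two comparison styles matters when the
candidate is a token inside a larger buffer.  The following sketch
contrasts them; the helper names <CODE>matches_with_strcmp</CODE> and
<CODE>matches_with_strncmp</CODE> are hypothetical, and the bodies are written
in the spirit of the two modes rather than copied from generated code.
</P>

```c
#include <assert.h>
#include <string.h>

/* Default style: only correct if str is NUL terminated right after
   the keyword, because strcmp keeps reading until it sees a NUL. */
int matches_with_strcmp(const char *str, const char *keyword)
{
    assert(str != NULL && keyword != NULL);
    return strcmp(str, keyword) == 0;
}

/* strncmp style: reads at most len bytes of str, so str may point
   into the middle of a buffer that continues after the token. */
int matches_with_strncmp(const char *str, unsigned int len,
                         const char *keyword)
{
    assert(str != NULL && keyword != NULL);
    return strlen(keyword) == len && strncmp(str, keyword, len) == 0;
}
```

<P>
For the buffer <CODE>"while(x)"</CODE> with a length of 5, the
<CODE>strncmp</CODE> variant recognizes <CODE>while</CODE>, whereas the
<CODE>strcmp</CODE> variant does not, since it also compares the bytes that
follow the token.
</P>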
621<P>
622The code generated for these two functions is affected by the following
623options:
624
625</P>
626<DL COMPACT>
627
628<DT><SAMP>&lsquo;-t&rsquo;</SAMP>
629<DD>
630<DT><SAMP>&lsquo;--struct-type&rsquo;</SAMP>
631<DD>
632Make use of the user-defined <CODE>struct</CODE>.
633
634<DT><SAMP>&lsquo;-S <VAR>total-switch-statements</VAR>&rsquo;</SAMP>
635<DD>
636<DT><SAMP>&lsquo;--switch=<VAR>total-switch-statements</VAR>&rsquo;</SAMP>
637<DD>
638<A NAME="IDX37"></A>
Generate one or more C <CODE>switch</CODE> statements rather than use a large
(and potentially sparse) static array.  Although the exact time and
641space savings of this approach vary according to your C compiler's
642degree of optimization, this method often results in smaller and faster
643code.
644</DL>
645
646<P>
647If the <SAMP>&lsquo;-t&rsquo;</SAMP> and <SAMP>&lsquo;-S&rsquo;</SAMP> options (or, equivalently, the
648<SAMP>&lsquo;%struct-type&rsquo;</SAMP> and <SAMP>&lsquo;%switch&rsquo;</SAMP> declarations) are omitted, the default
649action
650is to generate a <CODE>char *</CODE> array containing the keywords, together with
651additional empty strings used for padding the array.  By experimenting
652with the various input and output options, and timing the resulting C
653code, you can determine the best option choices for different keyword
654set characteristics.
655
656</P>
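<P>
To make the contrast with the array scheme concrete, here is a minimal
sketch of a <CODE>switch</CODE>-based lookup.  It is hand-written, not
<CODE>gperf</CODE> output: the function name <CODE>switch_lookup</CODE> is
hypothetical, and the hash is simply the keyword length, which happens to
be unique for the four keywords chosen here.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* str must be NUL terminated and have exactly length len. */
const char *switch_lookup(const char *str, unsigned int len)
{
    const char *s;

    assert(str != NULL);
    switch (len) {              /* toy hash: the length itself */
    case 2: s = "if";     break;
    case 3: s = "for";    break;
    case 5: s = "while";  break;
    case 6: s = "return"; break;
    default:
        return NULL;            /* no keyword has this hash value */
    }
    /* One comparison against the single candidate for this hash. */
    return (*str == *s && strcmp(str + 1, s + 1) == 0) ? s : NULL;
}
```

<P>
No padded array is stored at all: hash values without a keyword fall
through to the <CODE>default</CODE> label, which is why this scheme can save
data space when the hash range is sparse.
</P>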
657
658
659<H2><A NAME="SEC15" HREF="gperf_toc.html#TOC15">4.3  Use of NUL bytes</A></H2>
660<P>
661<A NAME="IDX38"></A>
662
663</P>
664<P>
665By default, the code generated by <CODE>gperf</CODE> operates on zero
666terminated strings, the usual representation of strings in C.  This means
667that the keywords in the input file must not contain NUL bytes,
668and the <VAR>str</VAR> argument passed to <CODE>hash</CODE> or <CODE>in_word_set</CODE>
669must be NUL terminated and have exactly length <VAR>len</VAR>.
670
671</P>
672<P>
673If option <SAMP>&lsquo;-c&rsquo;</SAMP> (or, equivalently, the <SAMP>&lsquo;%compare-strncmp&rsquo;</SAMP>
674declaration) is used, then the <VAR>str</VAR> argument does not need
675to be NUL terminated.  The code generated by <CODE>gperf</CODE> will only
676access the first <VAR>len</VAR>, not <VAR>len+1</VAR>, bytes starting at <VAR>str</VAR>.
677However, the keywords in the input file still must not contain NUL
678bytes.
679
680</P>
681<P>
682If option <SAMP>&lsquo;-l&rsquo;</SAMP> (or, equivalently, the <SAMP>&lsquo;%compare-lengths&rsquo;</SAMP>
683declaration) is used, then the hash table performs binary
684comparison.  The keywords in the input file may contain NUL bytes,
685written in string syntax as <CODE>\000</CODE> or <CODE>\x00</CODE>, and the code
686generated by <CODE>gperf</CODE> will treat NUL like any other byte.
687Also, in this case the <SAMP>&lsquo;-c&rsquo;</SAMP> option (or, equivalently, the
688<SAMP>&lsquo;%compare-strncmp&rsquo;</SAMP> declaration) is ignored.
689
690</P>
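<P>
A short sketch may help show why a length comparison is needed for
keywords with embedded NUL bytes.  The two keywords, the table contents
and the helper <CODE>entry_matches</CODE> below are hypothetical and
hand-written; the table names merely reuse the default names mentioned
earlier.  The slot index is passed in directly instead of being computed
by a hash function.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two keywords; the first one contains an embedded NUL byte. */
static const char *const wordlist[] = { "a\000b", "a" };

/* Length of each keyword, kept in a table parallel to wordlist. */
static const unsigned char lengthtable[] = { 3, 1 };

/* Binary comparison of str against the keyword in slot key:
   lengths must agree, then all len bytes are compared with memcmp,
   so a NUL byte is treated like any other byte. */
int entry_matches(unsigned int key, const char *str, unsigned int len)
{
    assert(key < sizeof lengthtable / sizeof lengthtable[0]);
    assert(str != NULL);
    return len == lengthtable[key] && memcmp(str, wordlist[key], len) == 0;
}
```

<P>
Note that <CODE>strcmp (wordlist[0], wordlist[1])</CODE> reports the two
entries as equal, because it stops at the first NUL byte; the length
check followed by <CODE>memcmp</CODE> keeps them apart.
</P>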
691
692
693<H2><A NAME="SEC16" HREF="gperf_toc.html#TOC16">4.4  The Copyright of the Output</A></H2>
694<P>
695<A NAME="IDX39"></A>
696
697</P>
698<P>
699<CODE>gperf</CODE> is under GPL, but that does not cause the output produced
700by <CODE>gperf</CODE> to be under GPL.  The reason is that the output contains
701only small pieces of text that come directly from <CODE>gperf</CODE>'s source
code (only about 7 lines long, too small to be significant), and
therefore the output is not a &ldquo;work based on <CODE>gperf</CODE>&rdquo; (in the
704sense of the GPL version 3).
705
706</P>
707<P>
708On the other hand, the output produced by <CODE>gperf</CODE> contains
essentially all of the input file.  Therefore the output is a
&ldquo;derivative work&rdquo; of the input (in the sense of U.S. copyright law);
and its copyright status depends on the copyright of the input.  For most
software licenses, the result is that the output is under the same
713license, with the same copyright holder, as the input that was passed to
714<CODE>gperf</CODE>.
715
716</P>
717<P><HR><P>
718Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
719</BODY>
720</HTML>
721