<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gperf.texi on 19 March 2013 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>Perfect Hash Function Generator - 4 High-Level Description of GNU gperf</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC5" HREF="gperf_toc.html#TOC5">4 High-Level Description of GNU <CODE>gperf</CODE></A></H1>

<P>
The perfect hash function generator <CODE>gperf</CODE> reads a set of
“keywords” from an input file (or from the standard input by
default). It attempts to derive a perfect hashing function that
recognizes a member of the <EM>static keyword set</EM> with at most a
single probe into the lookup table. If <CODE>gperf</CODE> succeeds in
generating such a function it produces a pair of C source code routines
that perform hashing and table lookup recognition. All generated C code
is directed to the standard output. Command-line options described
below allow you to modify the input and output format to <CODE>gperf</CODE>.

</P>
<P>
By default, <CODE>gperf</CODE> attempts to produce time-efficient code, with
less emphasis on efficient space utilization. However, several options
exist that permit trading off execution time for storage space and vice
versa. In particular, expanding the generated table size produces a
sparse search structure, generally yielding faster searches.
Conversely, you can direct <CODE>gperf</CODE> to utilize a C <CODE>switch</CODE>
statement scheme that minimizes data space storage size. Furthermore,
using a C <CODE>switch</CODE> may actually speed up the keyword retrieval time
somewhat. Actual results depend on your C compiler, of course.

</P>
<P>
In general, <CODE>gperf</CODE> assigns values to the bytes it is using
for hashing until some set of values gives each keyword a unique value.
A helpful heuristic is that the larger the hash value range, the easier
it is for <CODE>gperf</CODE> to find and generate a perfect hash function.
Experimentation is the key to getting the most from <CODE>gperf</CODE>.

</P>


<H2><A NAME="SEC6" HREF="gperf_toc.html#TOC6">4.1 Input Format to <CODE>gperf</CODE></A></H2>
<P>
<A NAME="IDX4"></A>
<A NAME="IDX5"></A>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
You can control the input file format by varying certain command-line
arguments, in particular the <SAMP>‘-t’</SAMP> option. The input's appearance
is similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE> (or UNIX
utilities <CODE>lex</CODE> and <CODE>yacc</CODE>). Here's an outline of the general
format:

</P>

<PRE>
declarations
%%
keywords
%%
functions
</PRE>

<P>
<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, the declarations section and
the functions section are optional. The following sections describe the
input format for each section.

</P>

<P>
It is possible to omit the declaration section entirely, if the <SAMP>‘-t’</SAMP>
option is not given. In this case the input file begins directly with the
first keyword line, e.g.:

</P>

<PRE>
january
february
march
april
...
</PRE>



<H3><A NAME="SEC7" HREF="gperf_toc.html#TOC7">4.1.1 Declarations</A></H3>

<P>
The keyword input file optionally contains a section for including
arbitrary C declarations and definitions, <CODE>gperf</CODE> declarations that
act like command-line options, as well as for providing a user-supplied
<CODE>struct</CODE>.

</P>



<H4><A NAME="SEC8" HREF="gperf_toc.html#TOC8">4.1.1.1 User-supplied <CODE>struct</CODE></A></H4>

<P>
If the <SAMP>‘-t’</SAMP> option (or, equivalently, the <SAMP>‘%struct-type’</SAMP> declaration)
<EM>is</EM> enabled, you <EM>must</EM> provide a C <CODE>struct</CODE> as the last
component in the declaration section from the input file. The first
field in this struct must be of type <CODE>char *</CODE> or <CODE>const char *</CODE>
if the <SAMP>‘-P’</SAMP> option is not given, or of type <CODE>int</CODE> if the option
<SAMP>‘-P’</SAMP> (or, equivalently, the <SAMP>‘%pic’</SAMP> declaration) is enabled.
This first field must be called <SAMP>‘name’</SAMP>, although it is possible to modify
its name with the <SAMP>‘-K’</SAMP> option (or, equivalently, the
<SAMP>‘%define slot-name’</SAMP> declaration) described below.

</P>
<P>
Here is a simple example, using months of the year and their attributes as
input:

</P>

<PRE>
struct month { char *name; int number; int days; int leap_days; };
%%
january, 1, 31, 31
february, 2, 28, 29
march, 3, 31, 31
april, 4, 30, 30
may, 5, 31, 31
june, 6, 30, 30
july, 7, 31, 31
august, 8, 31, 31
september, 9, 30, 30
october, 10, 31, 31
november, 11, 30, 30
december, 12, 31, 31
</PRE>

<P>
<A NAME="IDX8"></A>
A pair of consecutive percent signs, <SAMP>‘%%’</SAMP>, appearing left
justified in the first column, separates the <CODE>struct</CODE> declaration
from the list of keywords and other fields, as in the UNIX utility
<CODE>lex</CODE>.

</P>
<P>
If the <CODE>struct</CODE> has already been declared in an include file, it can
be mentioned in an abbreviated form, like this:

</P>

<PRE>
struct month;
%%
january, 1, 31, 31
...
</PRE>



<H4><A NAME="SEC9" HREF="gperf_toc.html#TOC9">4.1.1.2 Gperf Declarations</A></H4>

<P>
The declaration section can contain <CODE>gperf</CODE> declarations. They
influence the way <CODE>gperf</CODE> works, as command-line options do.
In fact, every such declaration is equivalent to a command-line option.
There are three forms of declarations:

</P>

<OL>
<LI>

Declarations without argument, like <SAMP>‘%compare-lengths’</SAMP>.

<LI>

Declarations with an argument, like <SAMP>‘%switch=<VAR>count</VAR>’</SAMP>.

<LI>

Declarations of names of entities in the output file, like
<SAMP>‘%define lookup-function-name <VAR>name</VAR>’</SAMP>.
</OL>

<P>
When a declaration is given both in the input file and as a command-line
option, the command-line option's value prevails.

</P>
<P>
The following <CODE>gperf</CODE> declarations are available.

</P>
<DL COMPACT>

<DT><SAMP>‘%delimiters=<VAR>delimiter-list</VAR>’</SAMP>
<DD>
<A NAME="IDX9"></A>
Allows you to provide a string containing delimiters used to
separate keywords from their attributes. The default is ",". This
option is essential if you want to use keywords that have embedded
commas or newlines.

<DT><SAMP>‘%struct-type’</SAMP>
<DD>
<A NAME="IDX10"></A>
Allows you to include a <CODE>struct</CODE> type declaration for generated
code; see above for an example.

<DT><SAMP>‘%ignore-case’</SAMP>
<DD>
<A NAME="IDX11"></A>
Consider upper and lower case ASCII characters as equivalent. The string
comparison will use a case-insensitive character comparison. Note that
locale-dependent case mappings are ignored.

<DT><SAMP>‘%language=<VAR>language-name</VAR>’</SAMP>
<DD>
<A NAME="IDX12"></A>
Instructs <CODE>gperf</CODE> to generate code in the language specified by the
option's argument.
Languages handled are currently:

<DL COMPACT>

<DT><SAMP>‘KR-C’</SAMP>
<DD>
Old-style K&amp;R C. This language is understood by old-style C compilers and
ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
because of lacking <SAMP>‘const’</SAMP>.

<DT><SAMP>‘C’</SAMP>
<DD>
Common C. This language is understood by ANSI C compilers, and also by
old-style C compilers, provided that you <CODE>#define const</CODE> to empty
for compilers which don't know about this keyword.

<DT><SAMP>‘ANSI-C’</SAMP>
<DD>
ANSI C. This language is understood by ANSI C (C89, ISO C90) compilers,
ISO C99 compilers, and C++ compilers.

<DT><SAMP>‘C++’</SAMP>
<DD>
C++. This language is understood by C++ compilers.
</DL>

The default is C.

<DT><SAMP>‘%define slot-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX13"></A>
This declaration is only useful when option <SAMP>‘-t’</SAMP> (or, equivalently, the
<SAMP>‘%struct-type’</SAMP> declaration) has been given.
By default, the program assumes the structure component identifier for
the keyword is <SAMP>‘name’</SAMP>. This option allows an arbitrary choice of
identifier for this component, although it still must occur as the first
field in your supplied <CODE>struct</CODE>.

<DT><SAMP>‘%define initializer-suffix <VAR>initializers</VAR>’</SAMP>
<DD>
<A NAME="IDX14"></A>
This declaration is only useful when option <SAMP>‘-t’</SAMP> (or, equivalently, the
<SAMP>‘%struct-type’</SAMP> declaration) has been given.
It lets you specify initializers for the structure members following
<VAR>slot-name</VAR> in empty hash table entries. The list of initializers
should start with a comma. By default, the emitted code will
zero-initialize structure members following <VAR>slot-name</VAR>.

<DT><SAMP>‘%define hash-function-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX15"></A>
Allows you to specify the name for the generated hash function. Default
name is <SAMP>‘hash’</SAMP>. This option permits the use of two hash tables in
the same file.

<DT><SAMP>‘%define lookup-function-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX16"></A>
Allows you to specify the name for the generated lookup function.
Default name is <SAMP>‘in_word_set’</SAMP>. This option permits multiple
generated hash functions to be used in the same application.

<DT><SAMP>‘%define class-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX17"></A>
This option is only useful when option <SAMP>‘-L C++’</SAMP> (or, equivalently,
the <SAMP>‘%language=C++’</SAMP> declaration) has been given. It
allows you to specify the name of the generated C++ class. Default name is
<CODE>Perfect_Hash</CODE>.

<DT><SAMP>‘%7bit’</SAMP>
<DD>
<A NAME="IDX18"></A>
This option specifies that all strings that will be passed as arguments
to the generated hash function and the generated lookup function will
solely consist of 7-bit ASCII characters (bytes in the range 0..127).
(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
<EM>not</EM> guarantee that a byte is in this range. Only an explicit
test like <SAMP>‘c &gt;= 'A' &amp;&amp; c &lt;= 'Z'’</SAMP> guarantees this.)

<DT><SAMP>‘%compare-lengths’</SAMP>
<DD>
<A NAME="IDX19"></A>
Compare keyword lengths before trying a string comparison. This option
is mandatory for binary comparisons (see section <A HREF="gperf_5.html#SEC15">4.3 Use of NUL bytes</A>). It also might
cut down on the number of string comparisons made during the lookup, since
keywords with different lengths are never compared via <CODE>strcmp</CODE>.
However, using <SAMP>‘%compare-lengths’</SAMP> might greatly increase the size of the
generated C code if the lookup table range is large (which implies that
the switch option <SAMP>‘-S’</SAMP> or <SAMP>‘%switch’</SAMP> is not enabled), since the length
table contains as many elements as there are entries in the lookup table.

<DT><SAMP>‘%compare-strncmp’</SAMP>
<DD>
<A NAME="IDX20"></A>
Generates C code that uses the <CODE>strncmp</CODE> function to perform
string comparisons. The default action is to use <CODE>strcmp</CODE>.

<DT><SAMP>‘%readonly-tables’</SAMP>
<DD>
<A NAME="IDX21"></A>
Makes the contents of all generated lookup tables constant, i.e.,
“readonly”. Many compilers can generate more efficient code for this
by putting the tables in readonly memory.

<DT><SAMP>‘%enum’</SAMP>
<DD>
<A NAME="IDX22"></A>
Define constant values using an enum local to the lookup function rather
than with #defines. This also means that different lookup functions can
reside in the same file. Thanks to James Clark <CODE>&lt;jjc@ai.mit.edu&gt;</CODE>.

<DT><SAMP>‘%includes’</SAMP>
<DD>
<A NAME="IDX23"></A>
Include the necessary system include file, <CODE>&lt;string.h&gt;</CODE>, at the
beginning of the code. By default, this is not done; the user must
include this header file explicitly to allow compilation of the code.

<DT><SAMP>‘%global-table’</SAMP>
<DD>
<A NAME="IDX24"></A>
Generate the static table of keywords as a static global variable,
rather than hiding it inside of the lookup function (which is the
default behavior).

<DT><SAMP>‘%pic’</SAMP>
<DD>
<A NAME="IDX25"></A>
Optimize the generated table for inclusion in shared libraries. This
reduces the startup time of programs using a shared library containing
the generated code.
If the <SAMP>‘%struct-type’</SAMP> declaration (or,
equivalently, the option <SAMP>‘-t’</SAMP>) is also given, the first field of the
user-defined struct must be of type <SAMP>‘int’</SAMP>, not <SAMP>‘char *’</SAMP>, because
it will contain offsets into the string pool instead of actual strings.
To convert such an offset to a string, you can use the expression
<SAMP>‘stringpool + <VAR>o</VAR>’</SAMP>, where <VAR>o</VAR> is the offset. The string pool
name can be changed through the <SAMP>‘%define string-pool-name’</SAMP> declaration.

<DT><SAMP>‘%define string-pool-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX26"></A>
Allows you to specify the name of the generated string pool created by
the declaration <SAMP>‘%pic’</SAMP> (or, equivalently, the option <SAMP>‘-P’</SAMP>).
The default name is <SAMP>‘stringpool’</SAMP>. This declaration permits the use of
two hash tables in the same file, with <SAMP>‘%pic’</SAMP> and even when the
<SAMP>‘%global-table’</SAMP> declaration (or, equivalently, the option <SAMP>‘-G’</SAMP>)
is given.

<DT><SAMP>‘%null-strings’</SAMP>
<DD>
<A NAME="IDX27"></A>
Use NULL strings instead of empty strings for empty keyword table entries.
This reduces the startup time of programs using a shared library containing
the generated code (but not as much as the declaration <SAMP>‘%pic’</SAMP>), at the
expense of one more test-and-branch instruction at run time.

<DT><SAMP>‘%define word-array-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX28"></A>
Allows you to specify the name for the generated array containing the
hash table. Default name is <SAMP>‘wordlist’</SAMP>. This option permits the
use of two hash tables in the same file, even when the option <SAMP>‘-G’</SAMP>
(or, equivalently, the <SAMP>‘%global-table’</SAMP> declaration) is given.

<DT><SAMP>‘%define length-table-name <VAR>name</VAR>’</SAMP>
<DD>
<A NAME="IDX29"></A>
Allows you to specify the name for the generated array containing the
length table. Default name is <SAMP>‘lengthtable’</SAMP>. This option permits the
use of two length tables in the same file, even when the option <SAMP>‘-G’</SAMP>
(or, equivalently, the <SAMP>‘%global-table’</SAMP> declaration) is given.

<DT><SAMP>‘%switch=<VAR>count</VAR>’</SAMP>
<DD>
<A NAME="IDX30"></A>
Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
rather than an array lookup table. This can lead to a reduction in both
time and space requirements for some input files. The argument to this
option determines how many <CODE>switch</CODE> statements are generated. A
value of 1 generates 1 <CODE>switch</CODE> containing all the elements, a
value of 2 generates 2 <CODE>switch</CODE> statements with half the elements
in each, etc. This is useful since many C compilers cannot
correctly generate code for large <CODE>switch</CODE> statements. This option
was inspired in part by Keith Bostic's original C program.

<DT><SAMP>‘%omit-struct-type’</SAMP>
<DD>
<A NAME="IDX31"></A>
Prevents the transfer of the type declaration to the output file. Use
this option if the type is already defined elsewhere.
</DL>



<H4><A NAME="SEC10" HREF="gperf_toc.html#TOC10">4.1.1.3 C Code Inclusion</A></H4>

<P>
<A NAME="IDX32"></A>
<A NAME="IDX33"></A>
Using a syntax similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE>, it
is possible to directly include C source text and comments verbatim into
the generated output file. This is accomplished by enclosing the region
inside left-justified surrounding <SAMP>‘%{’</SAMP>, <SAMP>‘%}’</SAMP> pairs.
Here is
an input fragment based on the previous example that illustrates this
feature:

</P>

<PRE>
%{
#include &lt;assert.h&gt;
/* This section of code is inserted directly into the output. */
int return_month_days (struct month *months, int is_leap_year);
%}
struct month { char *name; int number; int days; int leap_days; };
%%
january, 1, 31, 31
february, 2, 28, 29
march, 3, 31, 31
...
</PRE>



<H3><A NAME="SEC11" HREF="gperf_toc.html#TOC11">4.1.2 Format for Keyword Entries</A></H3>

<P>
The second input file format section contains lines of keywords and any
associated attributes you might supply. A line beginning with <SAMP>‘#’</SAMP>
in the first column is considered a comment. Everything following the
<SAMP>‘#’</SAMP> is ignored, up to and including the following newline. A line
beginning with <SAMP>‘%’</SAMP> in the first column is an option declaration and
must not occur within the keywords section.

</P>
<P>
The first field of each non-comment line is always the keyword itself. It
can be given in two ways: as a simple name, i.e., without surrounding
string quotation marks, or as a string enclosed in double-quotes, in
C syntax, possibly with backslash escapes like <CODE>\"</CODE> or <CODE>\234</CODE>
or <CODE>\xa8</CODE>. In either case, it must start right at the beginning
of the line, without leading whitespace.
In this context, a “field” is considered to extend up to, but
not include, the first blank, comma, or newline. Here is a simple
example taken from a partial list of C reserved words:

</P>

<PRE>
# These are a few C reserved words, see the c.gperf file
# for a complete list of ANSI C reserved words.
unsigned
sizeof
switch
signed
if
default
for
while
return
</PRE>

<P>
Note that unlike <CODE>flex</CODE> or <CODE>bison</CODE> the first <SAMP>‘%%’</SAMP> marker
may be elided if the declaration section is empty.

</P>
<P>
Additional fields may optionally follow the leading keyword. Fields
should be separated by commas, and terminate at the end of line. What
these fields mean is entirely up to you; they are used to initialize the
elements of the user-defined <CODE>struct</CODE> provided by you in the
declaration section. If the <SAMP>‘-t’</SAMP> option (or, equivalently, the
<SAMP>‘%struct-type’</SAMP> declaration) is <EM>not</EM> enabled
these fields are simply ignored. All previous examples except the last
one contain keyword attributes.

</P>


<H3><A NAME="SEC12" HREF="gperf_toc.html#TOC12">4.1.3 Including Additional C Functions</A></H3>

<P>
The optional third section also corresponds closely with conventions
found in <CODE>flex</CODE> and <CODE>bison</CODE>. All text in this section,
starting at the final <SAMP>‘%%’</SAMP> and extending to the end of the input
file, is included verbatim into the generated output file. Naturally,
it is your responsibility to ensure that the code contained in this
section is valid C.

</P>


<H3><A NAME="SEC13" HREF="gperf_toc.html#TOC13">4.1.4 Where to place directives for GNU <CODE>indent</CODE>.</A></H3>

<P>
If you want to invoke GNU <CODE>indent</CODE> on a <CODE>gperf</CODE> input file,
you will see that GNU <CODE>indent</CODE> doesn't understand the <SAMP>‘%%’</SAMP>,
<SAMP>‘%{’</SAMP> and <SAMP>‘%}’</SAMP> directives that control <CODE>gperf</CODE>'s
interpretation of the input file. Therefore you have to insert some
directives for GNU <CODE>indent</CODE>.
More precisely, assuming the most
general input file structure

</P>

<PRE>
declarations part 1
%{
verbatim code
%}
declarations part 2
%%
keywords
%%
functions
</PRE>

<P>
you would insert <SAMP>‘*INDENT-OFF*’</SAMP> and <SAMP>‘*INDENT-ON*’</SAMP> comments
as follows:

</P>

<PRE>
/* *INDENT-OFF* */
declarations part 1
%{
/* *INDENT-ON* */
verbatim code
/* *INDENT-OFF* */
%}
declarations part 2
%%
keywords
%%
/* *INDENT-ON* */
functions
</PRE>



<H2><A NAME="SEC14" HREF="gperf_toc.html#TOC14">4.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
<P>
<A NAME="IDX34"></A>

</P>
<P>
Several options control how the generated C code appears on the standard
output. Two C functions are generated. They are called <CODE>hash</CODE> and
<CODE>in_word_set</CODE>, although you may modify their names with a command-line
option. Both functions require two arguments, a string, <CODE>const char *</CODE>
<VAR>str</VAR>, and a length parameter, <CODE>unsigned int</CODE> <VAR>len</VAR>. Their default
function prototypes are as follows:

</P>
<P>
<DL>
<DT><U>Function:</U> unsigned int <B>hash</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
<DD><A NAME="IDX35"></A>
By default, the generated <CODE>hash</CODE> function returns an integer value
created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> byte
positions indexed into an <EM>associated values</EM> table stored in a
local static array. The associated values table is constructed
internally by <CODE>gperf</CODE> and later output as a static local C array
called <SAMP>‘hash_table’</SAMP>. The relevant selected positions (i.e.
indices
into <VAR>str</VAR>) are specified via the <SAMP>‘-k’</SAMP> option when running
<CODE>gperf</CODE>, as detailed in the <EM>Options</EM> section below (see section <A HREF="gperf_6.html#SEC17">5 Invoking <CODE>gperf</CODE></A>).
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> <B>in_word_set</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
<DD><A NAME="IDX36"></A>
If <VAR>str</VAR> is in the keyword set, returns a pointer to that
keyword. More exactly, if the option <SAMP>‘-t’</SAMP> (or, equivalently, the
<SAMP>‘%struct-type’</SAMP> declaration) was given, it returns
a pointer to the matching keyword's structure. Otherwise it returns
<CODE>NULL</CODE>.
</DL>

</P>
<P>
If the option <SAMP>‘-c’</SAMP> (or, equivalently, the <SAMP>‘%compare-strncmp’</SAMP>
declaration) is not used, <VAR>str</VAR> must be a NUL terminated
string of exactly length <VAR>len</VAR>. If <SAMP>‘-c’</SAMP> (or, equivalently, the
<SAMP>‘%compare-strncmp’</SAMP> declaration) is used, <VAR>str</VAR> must
simply be an array of <VAR>len</VAR> bytes and does not need to be NUL
terminated.

</P>
<P>
The code generated for these two functions is affected by the following
options:

</P>
<DL COMPACT>

<DT><SAMP>‘-t’</SAMP>
<DD>
<DT><SAMP>‘--struct-type’</SAMP>
<DD>
Make use of the user-defined <CODE>struct</CODE>.

<DT><SAMP>‘-S <VAR>total-switch-statements</VAR>’</SAMP>
<DD>
<DT><SAMP>‘--switch=<VAR>total-switch-statements</VAR>’</SAMP>
<DD>
<A NAME="IDX37"></A>
Generate one or more C <CODE>switch</CODE> statements rather than use a large
(and potentially sparse) static array. Although the exact time and
space savings of this approach vary according to your C compiler's
degree of optimization, this method often results in smaller and faster
code.
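<P>
As a rough illustration of the <CODE>switch</CODE>-based scheme, the following
hand-written C sketch looks up four keywords without a keyword array. It is
not <CODE>gperf</CODE> output: the names <CODE>toy_hash</CODE> and
<CODE>toy_in_word_set</CODE> and the length-plus-first-byte hash are
assumptions made for this example, and <CODE>gperf</CODE> selects its own
byte positions and associated values.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generated hash function: keyword length plus
   the value of the first byte.  Not what gperf would actually emit.  */
static unsigned int
toy_hash (const char *str, unsigned int len)
{
  return len + (unsigned char) str[0];
}

/* Switch-based lookup over four keywords.  No string table with empty
   padding slots is needed.  STR must be NUL terminated and have exactly
   length LEN.  Returns the matching keyword, or NULL.  */
static const char *
toy_in_word_set (const char *str, unsigned int len)
{
  const char *candidate;

  switch (toy_hash (str, len))
    {
    case 2 + 'i': candidate = "if"; break;
    case 3 + 'f': candidate = "for"; break;
    case 5 + 'w': candidate = "while"; break;
    case 6 + 'r': candidate = "return"; break;
    default: return NULL;
    }
  /* Single probe: one string comparison confirms or rejects the match.  */
  return strcmp (str, candidate) == 0 ? candidate : NULL;
}
```

<P>
With these definitions, <CODE>toy_in_word_set ("for", 3)</CODE> returns a
pointer to the keyword, while <CODE>toy_in_word_set ("foo", 3)</CODE> returns
<CODE>NULL</CODE>: both strings select the same <CODE>case</CODE>, but the
second fails the string comparison.
</P>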
</DL>

<P>
If the <SAMP>‘-t’</SAMP> and <SAMP>‘-S’</SAMP> options (or, equivalently, the
<SAMP>‘%struct-type’</SAMP> and <SAMP>‘%switch’</SAMP> declarations) are omitted, the default
action
is to generate a <CODE>char *</CODE> array containing the keywords, together with
additional empty strings used for padding the array. By experimenting
with the various input and output options, and timing the resulting C
code, you can determine the best option choices for different keyword
set characteristics.

</P>


<H2><A NAME="SEC15" HREF="gperf_toc.html#TOC15">4.3 Use of NUL bytes</A></H2>
<P>
<A NAME="IDX38"></A>

</P>
<P>
By default, the code generated by <CODE>gperf</CODE> operates on zero
terminated strings, the usual representation of strings in C. This means
that the keywords in the input file must not contain NUL bytes,
and the <VAR>str</VAR> argument passed to <CODE>hash</CODE> or <CODE>in_word_set</CODE>
must be NUL terminated and have exactly length <VAR>len</VAR>.

</P>
<P>
If option <SAMP>‘-c’</SAMP> (or, equivalently, the <SAMP>‘%compare-strncmp’</SAMP>
declaration) is used, then the <VAR>str</VAR> argument does not need
to be NUL terminated. The code generated by <CODE>gperf</CODE> will only
access the first <VAR>len</VAR>, not <VAR>len+1</VAR>, bytes starting at <VAR>str</VAR>.
However, the keywords in the input file still must not contain NUL
bytes.

</P>
<P>
If option <SAMP>‘-l’</SAMP> (or, equivalently, the <SAMP>‘%compare-lengths’</SAMP>
declaration) is used, then the hash table performs binary
comparison. The keywords in the input file may contain NUL bytes,
written in string syntax as <CODE>\000</CODE> or <CODE>\x00</CODE>, and the code
generated by <CODE>gperf</CODE> will treat NUL like any other byte.
Also, in this case the <SAMP>‘-c’</SAMP> option (or, equivalently, the
<SAMP>‘%compare-strncmp’</SAMP> declaration) is ignored.

</P>


<H2><A NAME="SEC16" HREF="gperf_toc.html#TOC16">4.4 The Copyright of the Output</A></H2>
<P>
<A NAME="IDX39"></A>

</P>
<P>
<CODE>gperf</CODE> is under GPL, but that does not cause the output produced
by <CODE>gperf</CODE> to be under GPL. The reason is that the output contains
only small pieces of text that come directly from <CODE>gperf</CODE>'s source
code (only about 7 lines long, too small to be significant), and
therefore the output is not a “work based on <CODE>gperf</CODE>” (in the
sense of the GPL version 3).

</P>
<P>
On the other hand, the output produced by <CODE>gperf</CODE> contains
essentially all of the input file. Therefore the output is a
“derivative work” of the input (in the sense of U.S. copyright law);
and its copyright status depends on the copyright of the input. For most
software licenses, the result is that the output is under the same
license, with the same copyright holder, as the input that was passed to
<CODE>gperf</CODE>.

</P>
</BODY>
</HTML>