ChangeLog revision 18214
1Sat Nov 11 13:43:17 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 2 3 * Modified options by changing macro TOTAL_POSITIONS to GET_CHARSET_SIZE 4 and SET_CHARSET_SIZE. These two routines now either return 5 the total charset size *or* the length of the largest keyword 6 if the user specifies the -k'*' (ALLCHARS) option. This change 7 cleans up client code. 8 9Fri Nov 10 15:21:19 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 10 11 * Made sure to explicitly initialize perfect.fewest_collisions to 12 0. 13 14Wed Nov 1 21:39:54 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 15 16 * Upgraded the version number to 2.0, reflecting all the 17 major recent changes! 18 19 * Rearranged code so that fewer source lines are greater than 80 columns. 20 21 * Cleaned up some loose ends noticed by Nels Olson. 22 1. Removed `if (collisions <= perfect.fewest_collisions)' 23 from affects_prev () since it was superfluous. 24 2. Removed the fields best_char_value and best_asso_value 25 from PERFECT. There were also unnecessary. 26 3. Fixed a braino in boolarray.c's bool_array_reset () 27 function. Since iteration numbers can never be zero 28 the `if (bool_array.iteration_number++ == 0)' must be 29 `if (++bool_array.iteration_number == 0).' 30 4. Broke the -h help options string into 3 pieces. This 31 should silence broken C compilers for a while! 32 5. Modified `report_error ()' so that it correctly handled 33 "%%". 34 35Tue Oct 31 23:01:11 1989 Doug Schmidt (schmidt at siam.ics.uci.edu) 36 37 * It is important to note that -D no longer enables -S. 38 There is a good reason for this change, which will become 39 manifested in the next release... (suspense!). 40 41 * Made some subtle changes to print_switch so that if finally 42 seems to work correctly. Needs more stress testing, however... 43 44Mon Oct 30 15:06:59 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 45 46 * Made a major change to the keylist.c's print_switch () function. 47 The user can now specify the number of switch statements to generate 48 via an argument to the -S option, i.e., -S1 means `generate 1 49 switch statement with all keywords in it,' -S2 means generate 50 2 switch statements with 1/2 the elements in each one, etc. 51 Hopefully this will fix the problem with C compilers not being 52 able to generate code for giant switch statements (but don't 53 hold your breath!) 54 55 * Changed keylist.c's length () function to keyword_list_length (). 56 Also put an extern decl for this function and max_key_length () 57 into the keylist.h file (don't know why it wasn't already there...). 58 59Sun Oct 29 08:55:29 1989 Doug Schmidt (schmidt at siam.ics.uci.edu) 60 61 * Added a feature to main.c that prints out the starting wall-clock 62 time before the program begins and prints out the ending wall-clock 63 time when the program is finished. 64 65 * Added the GATHER_STATISTICS code in hashtable.c so we can 66 keep track of how well double hashing is doing. Eventually, 67 GATHER_STATISTICS will be added so that all instrumentation 68 code can be conditionally compiled in. 69 70 * Modified xmalloc.c's buffered_malloc () routine so that it 71 rounded the SIZE byte requests up to the correct alignment 72 size for the target machine. 73 74Sat Oct 28 11:17:36 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 75 76 * Fixed a stupid bug in keylist.c's print_switch () routine. This 77 was necessary to make sure the generated switch statement worked 78 correctly when *both* `natural,' i.e., static links and dynamic 79 links, i.e., unresolved duplicates, hash to the same value. 80 81 * Modified LIST_NODE so that the char *char_set field is redeclared 82 as char char_set[1] and placed at the bottom of the struct. This 83 enables us to play the old `variable-length string in the struct 84 trick' (third time I fell for it this week, chief!). The 85 main purpose of this is to remove n calls to buffered_malloc, 86 where n is the number of keywords. 87 88 * Added a new function to xmalloc.c called buffered_malloc. This 89 reduces the number of calls to malloc by grabbing large chunks 90 of memory and doling them out in small pieces. Almost all uses 91 of xmalloc were replaced with calls to buffered_malloc (). 92 93 * Modified boolarray.c's bool_array_destroy () function so that 94 it now frees the bool_array.storage_array when it is no longer 95 needed. Since this array is generally very large it makes sense 96 to return the memory to the freelist when it is no longer in use. 97 98 * Changed the interface to hash_table_init. This function is 99 now passed a pointer to a power-of-two sized buffer that serves 100 as storage for the hash table. Although this weakens information 101 hiding a little bit it greatly reduces dynamic memory fragmentation, 102 since we can now obtain the memory via a call to alloca, rather 103 than malloc. This change modified keylist.c's read_keys () calling 104 interface. 105 106 * Since alloca is now being used more aggressively a conditional 107 compilation section was added to the init_all () routine in main.c. 108 Taken from GNU GCC, this code gets rid of any avoidable limit 109 on stack size so that alloca does not fail. It is only used 110 if the -DRLIMIT_STACK symbol is defined when gperf is compiled. 111 112 * Moved the `destructor' for bool_array from destroy_all () in 113 main.c into perfect_generate () in perfect.c. This enables 114 us to get free up the large dynamic memory as soon as it is no 115 longer necessary. Also moved the hash_table destructor from 116 destroy_all into read_keys () from keylist.c. This accomplishes 117 the same purpose, i.e., we can free up the space immediately. 118 119 * Added warnings in option.c so that user's would be informed 120 that -r superceeds -i on the command-line. 121 122 * Rewrote affects_prev () from perfect.c. First, the code structure 123 was cleaned up considerably (removing the need for a dreaded 124 goto!). Secondly, a major change occurred so that affects_prev () 125 returns FALSE (success) when fewest_hits gets down to whatever 126 it was after inserting the previous key (instead of waiting for 127 it to reach 0). In other words, it stops trying if it can 128 resolve the new collisions added by a key, even if there are 129 still other old, unresolved collisions. This modification was 130 suggested by Nels Olson and seems to *greatly* increase the 131 speed of gperf for large keyfiles. Thanks Nels! 132 133 * In a similar vein, inside the change () routine from perfect.c 134 the variable `perfect.fewest_collisions is no longer initialized 135 with the length of the keyword list. Instead it starts out at 136 0 and is incremented by 1 every time change () is called. 137 The rationale for this behavior is that there are times when a 138 collision causes the number of duplicates (collisions) to 139 increase by a large amount when it would presumably just have 140 gone up by 1 if none of the asso_values were changed. That is, 141 at the beginning of change(), you could initialize fewest_hits 142 to 1+(previous value of fewest_hits) instead of to the number of 143 keys. Thanks again, Nels. 144 145 * Replaced alloca with xmalloc in perfect.c's change () function. 146 This should eliminate some overhead at the expense of a little 147 extra memory that is never reclaimed. 148 149 * Renamed perfect.c's merge_sets () to compute_disjoint_union () 150 to reflect the change in behavior. 151 152Fri Oct 27 10:12:27 1989 Doug Schmidt (schmidt at crimee.ics.uci.edu) 153 154 * Added the -e option so users can supply a string containing 155 the characters used to separate keywords from their attributes. 156 The default behavior is ",\n". 157 158 * Removed the char *uniq_set field from LIST_NODE and modified 159 uses of uniq_set in perfect.c and keylist.c. Due to changes 160 to perfect.c's merge_set () described below this field was 161 no longer necessary, and its removal makes the program 162 smaller and potentially faster. 163 164 * Added lots of changes/fixes suggested by Nels Olson 165 (umls.UUCP!olson@mis.ucsf.edu). In particular: 166 1. Changed BOOL_ARRAY so that it would dynamically create 167 an array of unsigned shorts rather than ints if the 168 LO_CAL symbol was defined during program compilation. 169 This cuts the amount of dynamic memory usage in half, 170 which is important for large keyfile input. 171 2. Added some additional debugging statements that print extra 172 info to stderr when the -d option is enabled. 173 3. Fixed a really stupid bug in the print_switch () from keylist.c. 174 A right paren was placed at the wrong location. 175 4. Fixed a subtle problem with printing case values when keylinks 176 appear. The logic failed to account for the fact that there 177 can be keylinks *and* regular node info also! 178 5. Finally split the huge help string into two parts. This keeps 179 breaking compilers with static limits on the length of tokens... 180 6. Modified the -j option so that -j 0 means `try random values 181 when searching for a way to resolve collisions.' 182 7. Added a field `num_done' to the PERFECT struct. This is used 183 to report information collected when trying to resolve 184 hash collisions. 185 8. Modified the merge_sets algorithm to perform a disjoint 186 union of two multisets. This ensures that subsequent 187 processing in perfect.c function affect_prev () doesn't 188 waste time trying to change an associated value that is 189 shared between two conflicting keywords. 190 9. Modified affects_prev so that it doesn't try random jump 191 values unless the -j 0 option is enabled. 192 10. Fixed a silly bug in perfect.c change (). 193 This problem caused gperf to seg fault when 194 the -k* option was given and the keyfile file had long 195 keywords. 196 11. Changed the behavior of keylist.c's read_keys () routine 197 so that it would honor -D unequivocally, i.e., it doesn't 198 try to turn off dup handling if the user requests it, even 199 if there are no immediate links in the keyfile input. 200 201Mon Oct 16 19:58:08 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 202 203 * Fixed a number of small bugs kindly brought to my attention by 204 Adam de Boor (bsw!adam@uunet.UU.NET). Thanks Adam! In particular, 205 changed the behavior for the -a (ANSI) option so that the 206 generated prototypes use int rather than size_t for the LEN 207 parameter. It was too ugly having to #include <stddef.h> all 208 over the place... 209 210 * Added a majorly neat hack to Bool_Array, suggested by rfg. 211 The basic idea was to throw away the Ullman array technique. 212 The Ullman array was used to remove the need to reinitialize all 213 the Bool_Array elements to zero everytime we needed to determine 214 whether there were duplicate hash values in the keyword list. 215 The current trick uses an `iteration number' scheme, which takes 216 about 1/3 the space and reduces the overall program running a 217 time by about 20 percent for large input! The hack works as 218 follows: 219 220 1. Dynamically allocate 1 boolean array of size k. 221 2. Initialize the boolean array to zeros, and consider the first 222 iteration to be iteration 1. 223 2. Then on all subsequent iterations we `reset' the bool array by 224 kicking the iteration count by 1. 225 3. When it comes time to check whether a hash value is currently 226 in the boolean array we simply check its index location. If 227 the value stored there is *not* equal to the current iteration 228 number then the item is clearly *not* in the set. In that 229 case we assign the iteration number to that array's index 230 location for future reference. Otherwise, if the item at 231 the index location *is* equal to the iteration number we've 232 found a duplicate. No muss, no fuss! 233 234Thu Oct 12 18:08:43 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 235 236 * Updated the version number to 1.9. 237 238 * Added support for the -C option. This makes the contents of 239 all generated tables `readonly'. 240 241 * Changed the handling of generated switches so that there is 242 only one call to str[n]?cmp. This *greatly* reduces the size of 243 the generated assembly code on all compilers I've seen. 244 245 * Fixed a subtle bug that occurred when the -l and -S option 246 was given. Code produced looked something like: 247 248 if (len != key_len || !strcmp (s1, resword->name)) return resword; 249 250 which doesn't make any sense. Clearly, this should be: 251 252 if (len == key_len && !strcmp (s1, resword->name)) return resword; 253 254Sat Sep 30 12:55:24 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 255 256 * Fixed a stupid bug in Key_List::print_hash_function that manifested 257 itself if the `-k$' option was given (i.e., only use the key[length] 258 character in the hash function). 259 260Mon Jul 24 17:09:46 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 261 262 * Fixed a bug in PRINT_MIN_MAX that resulted in MAX_INT being printed 263 for the MIN_KEY_LEN if there was only 1 keyword in the input file 264 (yeah, that's a pretty unlikely occurrence, I agree!). 265 266 * Fixed PRINT_HASH_FUNCTION and PRINT_LOOKUP_FUNCTION in keylist.c 267 so that the generated functions take an unsigned int length argument. 268 If -a is enabled the prototype is (const char *str, size_t len). 269 270Fri Jul 21 13:06:15 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 271 272 * Fixed a horrible typo in PRINT_KEYWORD_TABLE in keylist.cc 273 that prevented links from being printed correctly. 274 275Sun Jul 9 17:53:28 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 276 277 * Changed the ./tests subdirectory Makefile so that it 278 uses $(CC) instead of gcc. 279 280Sun Jul 2 12:14:04 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 281 282 * Moved comment handling from keylist.c to readline.c. This 283 simplifies the code and reduces the number of malloc calls. 284 285 * Fixed a number of subtle bugs that occurred when -S was 286 combined with various and sundry options. 287 288 * Added the -G option, that makes the generated keyword table 289 a global static variable, rather than hiding it inside 290 the lookup function. This allows other functions to directly 291 access the contents in this table. 292 293Sat Jul 1 10:12:21 1989 Doug Schmidt (schmidt at crimee.ics.uci.edu) 294 295 * Added the "#" feature, that allows comments inside the keyword 296 list from the input file. 297 298 * Also added the -H option (user can give the name of the hash 299 function) and the -T option (prevents the transfer of the type decl 300 to the output file, which is useful if the type is already defined 301 elsewhere). 302 303Fri Jun 30 18:22:35 1989 Doug Schmidt (schmidt at crimee.ics.uci.edu) 304 305 * Added Adam de Boor's changes. Created an UNSET_OPTION macro. 306 307Sat Jun 17 10:56:00 1989 Doug Schmidt (schmidt at glacier.ics.uci.edu) 308 309 * Modified option.h and option.c so that all mixed operations 310 between integers and enumerals are cast correctly to int. 311 This prevents errors in some brain-damaged C compilers. 312 313Fri Jun 16 14:13:15 1989 Doug Schmidt (schmidt at crimee.ics.uci.edu) 314 315 * Modified the -f (FAST) option. This now takes an argument. 316 The argument corresponds to the number of iterations used 317 to resolve collisions. -f 0 uses the length of the 318 keyword list (which is what -f did before). This makes 319 life much easier when dealing with large keyword files. 320 321Wed Jun 7 23:07:13 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 322 323 * Updated to version 1.8 in preparation to release to Doug Lea 324 and FSF. 325 326 * Added the -c (comparison) option. Enabling this 327 will use the strncmp function for string comparisons. 328 The default is to use strcmp. 329 330Tue Jun 6 16:32:09 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 331 332 * Fixed another stupid typo in xmalloc.c (XMALLOC). I accidentally 333 left the ANSI-fied prototype in place. This obviously 334 fails on old-style C compilers. 335 336 * Fixed stupid typos in PRINT_SWITCH from the keylist.c. This 337 caused the -D option to produce incorrect output when used 338 in conjunction with -p and -t. 339 340 * Replaced the use of STRCMP with STRNCMP for the generated 341 C output code. 342 343Fri Jun 2 23:16:01 1989 Doug Schmidt (schmidt at trinite.ics.uci.edu) 344 345 * Added a new function (XMALLOC) and file (xmalloc.c). All 346 calls to MALLOC were replaced by calls to XMALLOC. This 347 will complain when virtual memory runs out (don't laugh, 348 this has happened!) 349 350Thu Jun 1 21:10:10 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 351 352 * Fixed a typo in options.c that prevented the -f option 353 from being given on the command-line. 354 355Wed May 3 17:48:02 1989 Doug Schmidt (schmidt at zola.ics.uci.edu) 356 357 * Updated to version 1.7. This reflects the recent major changes 358 and the new C port. 359 360 * Fixed a typo in perfect.c perfect_destroy that prevented the actual 361 maximum hash table size from being printed. 362 363 * Added support for the -f option. This generates the perfect 364 hash function ``fast.'' It reduces the execution time of 365 gperf, at the cost of minimizing the range of hash values. 366 367Tue May 2 16:23:29 1989 Doug Schmidt (schmidt at crimee.ics.uci.edu) 368 369 * Enabled the diagnostics dump if the debugging option is enabled. 370 371 * Removed all calls to FREE (silly to do this at program termination). 372 373 * Ported gperf to C. From now on both K&R C and GNU G++ versions 374 will be supported. 375 376