1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 3 4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 5<head> 6 <meta name="generator" content= 7 "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" /> 8 9 <title>Hash-Based Containers</title> 10 <meta http-equiv="Content-Type" content= 11 "text/html; charset=us-ascii" /> 12 </head> 13 14<body> 15 <div id="page"> 16 <h1>Hash Table Design</h1> 17 18 <h2><a name="overview" id="overview">Overview</a></h2> 19 20 <p>The collision-chaining hash-based container has the 21 following declaration.</p> 22 <pre> 23<b>template</b>< 24 <b>typename</b> Key, 25 <b>typename</b> Mapped, 26 <b>typename</b> Hash_Fn = std::hash<Key>, 27 <b>typename</b> Eq_Fn = std::equal_to<Key>, 28 <b>typename</b> Comb_Hash_Fn = <a href= 29"direct_mask_range_hashing.html">direct_mask_range_hashing</a><> 30 <b>typename</b> Resize_Policy = <i>default explained below.</i> 31 <b>bool</b> Store_Hash = <b>false</b>, 32 <b>typename</b> Allocator = std::allocator<<b>char</b>> > 33<b>class</b> <a href= 34"cc_hash_table.html">cc_hash_table</a>; 35</pre> 36 37 <p>The parameters have the following meaning:</p> 38 39 <ol> 40 <li><tt>Key</tt> is the key type.</li> 41 42 <li><tt>Mapped</tt> is the mapped-policy, and is explained in 43 <a href="tutorial.html#assoc_ms">Tutorial::Associative 44 Containers::Associative Containers Others than Maps</a>.</li> 45 46 <li><tt>Hash_Fn</tt> is a key hashing functor.</li> 47 48 <li><tt>Eq_Fn</tt> is a key equivalence functor.</li> 49 50 <li><tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; 51 it describes how to translate hash values into positions 52 within the table. This is described in <a href= 53 "#hash_policies">Hash Policies</a>.</li> 54 55 <li><tt>Resize_Policy</tt> describes how a container object 56 should change its internal size. This is described in 57 <a href="#resize_policies">Resize Policies</a>.</li> 58 59 <li><tt>Store_Hash</tt> indicates whether the hash value 60 should be stored with each entry. This is described in 61 <a href="#policy_interaction">Policy Interaction</a>.</li> 62 63 <li><tt>Allocator</tt> is an allocator 64 type.</li> 65 </ol> 66 67 <p>The probing hash-based container has the following 68 declaration.</p> 69 <pre> 70<b>template</b>< 71 <b>typename</b> Key, 72 <b>typename</b> Mapped, 73 <b>typename</b> Hash_Fn = std::hash<Key>, 74 <b>typename</b> Eq_Fn = std::equal_to<Key>, 75 <b>typename</b> Comb_Probe_Fn = <a href= 76"direct_mask_range_hashing.html">direct_mask_range_hashing</a><> 77 <b>typename</b> Probe_Fn = <i>default explained below.</i> 78 <b>typename</b> Resize_Policy = <i>default explained below.</i> 79 <b>bool</b> Store_Hash = <b>false</b>, 80 <b>typename</b> Allocator = std::allocator<<b>char</b>> > 81<b>class</b> <a href= 82"gp_hash_table.html">gp_hash_table</a>; 83</pre> 84 85 <p>The parameters are identical to those of the 86 collision-chaining container, except for the following.</p> 87 88 <ol> 89 <li><tt>Comb_Probe_Fn</tt> describes how to transform a probe 90 sequence into a sequence of positions within the table.</li> 91 92 <li><tt>Probe_Fn</tt> describes a probe sequence policy.</li> 93 </ol> 94 95 <p>Some of the default template values depend on the values of 96 other parameters, and are explained in <a href= 97 "#policy_interaction">Policy Interaction</a>.</p> 98 99 <h2><a name="hash_policies" id="hash_policies">Hash 100 Policies</a></h2> 101 102 <h3><a name="general_terms" id="general_terms">General 103 Terms</a></h3> 104 105 <p>Following is an explanation of some functions which hashing 106 involves. Figure <a href= 107 "#hash_ranged_hash_range_hashing_fns">Hash functions, 108 ranged-hash functions, and range-hashing functions</a>) 109 illustrates the discussion.</p> 110 111 <h6 class="c1"><a name="hash_ranged_hash_range_hashing_fns" id= 112 "hash_ranged_hash_range_hashing_fns"><img src= 113 "hash_ranged_hash_range_hashing_fns.png" alt= 114 "no image" /></a></h6> 115 116 <h6 class="c1">Hash functions, ranged-hash functions, and 117 range-hashing functions.</h6> 118 119 <p>Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the 120 strings of 3 characters). A hash-table algorithm needs to map 121 elements of <i>U</i> "uniformly" into the range <i>[0,..., m - 122 1]</i> (where <i>m</i> is a non-negative integral value, and 123 is, in general, time varying). <i>I.e.</i>, the algorithm needs 124 a <i>ranged-hash</i> function</p> 125 126 <p><i>f : U × Z<sub>+</sub> → Z<sub>+</sub></i> 127 ,</p> 128 129 <p>such that for any <i>u</i> in <i>U</i> ,</p> 130 131 <p><i>0 ≤ f(u, m) ≤ m - 1</i> ,</p> 132 133 <p>and which has "good uniformity" properties [<a href= 134 "references.html#knuth98sorting">knuth98sorting</a>]. One 135 common solution is to use the composition of the hash 136 function</p> 137 138 <p><i>h : U → Z<sub>+</sub></i> ,</p> 139 140 <p>which maps elements of <i>U</i> into the non-negative 141 integrals, and</p> 142 143 <p class="c2">g : Z<sub>+</sub> × Z<sub>+</sub> → 144 Z<sub>+</sub>,</p> 145 146 <p>which maps a non-negative hash value, and a non-negative 147 range upper-bound into a non-negative integral in the range 148 between 0 (inclusive) and the range upper bound (exclusive), 149 <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,</p> 150 151 <p><i>0 ≤ g(r, m) ≤ m - 1</i> .</p> 152 153 <p>The resulting ranged-hash function, is</p> 154 155 <p><i><a name="ranged_hash_composed_of_hash_and_range_hashing" 156 id="ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = 157 g(h(u), m)</a></i> (1) .</p> 158 159 <p>From the above, it is obvious that given <i>g</i> and 160 <i>h</i>, <i>f</i> can always be composed (however the converse 161 is not true). The STL's hash-based containers allow specifying 162 a hash function, and use a hard-wired range-hashing function; 163 the ranged-hash function is implicitly composed.</p> 164 165 <p>The above describes the case where a key is to be mapped 166 into a <i>single position</i> within a hash table, <i>e.g.</i>, 167 in a collision-chaining table. In other cases, a key is to be 168 mapped into a <i>sequence of positions</i> within a table, 169 <i>e.g.</i>, in a probing table. Similar terms apply in this 170 case: the table requires a <i>ranged probe</i> function, 171 mapping a key into a sequence of positions withing the table. 172 This is typically achieved by composing a <i>hash function</i> 173 mapping the key into a non-negative integral type, a 174 <i>probe</i> function transforming the hash value into a 175 sequence of hash values, and a <i>range-hashing</i> function 176 transforming the sequence of hash values into a sequence of 177 positions.</p> 178 179 <h3><a name="range_hashing_fns" id= 180 "range_hashing_fns">Range-Hashing Functions</a></h3> 181 182 <p>Some common choices for range-hashing functions are the 183 division, multiplication, and middle-square methods [<a href= 184 "references.html#knuth98sorting">knuth98sorting</a>], defined 185 as</p> 186 187 <p><i><a name="division_method" id="division_method">g(r, m) = 188 r mod m</a></i> (2) ,</p> 189 190 <p><i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</i> ,</p> 191 192 <p>and</p> 193 194 <p><i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉</i> 195 ,</p> 196 197 <p>respectively, for some positive integrals <i>u</i> and 198 <i>v</i> (typically powers of 2), and some <i>a</i>. Each of 199 these range-hashing functions works best for some different 200 setting.</p> 201 202 <p>The division method <a href="#division_method">(2)</a> is a 203 very common choice. However, even this single method can be 204 implemented in two very different ways. It is possible to 205 implement <a href="#division_method">(2)</a> using the low 206 level <i>%</i> (modulo) operation (for any <i>m</i>), or the 207 low level <i>&</i> (bit-mask) operation (for the case where 208 <i>m</i> is a power of 2), <i>i.e.</i>,</p> 209 210 <p><i><a name="division_method_prime_mod" id= 211 "division_method_prime_mod">g(r, m) = r % m</a></i> (3) ,</p> 212 213 <p>and</p> 214 215 <p><i><a name="division_method_bit_mask" id= 216 "division_method_bit_mask">g(r, m) = r & m - 1, (m = 217 2<sup>k</sup>)</a></i> for some <i>k)</i> (4),</p> 218 219 <p>respectively.</p> 220 221 <p>The <i>%</i> (modulo) implementation <a href= 222 "#division_method_prime_mod">(3)</a> has the advantage that for 223 <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is 224 affected by all the bits of <i>r</i> (minimizing the chance of 225 collision). It has the disadvantage of using the costly modulo 226 operation. This method is hard-wired into SGI's implementation 227 [<a href="references.html#sgi_stl">sgi_stl</a>].</p> 228 229 <p>The <i>&</i> (bit-mask) implementation <a href= 230 "#division_method_bit_mask">(4)</a> has the advantage of 231 relying on the fast bit-wise and operation. It has the 232 disadvantage that for <i>g(r, m)</i> is affected only by the 233 low order bits of <i>r</i>. This method is hard-wired into 234 Dinkumware's implementation [<a href= 235 "references.html#dinkumware_stl">dinkumware_stl</a>].</p> 236 237 <h3><a name="hash_policies_ranged_hash_policies" id= 238 "hash_policies_ranged_hash_policies">Ranged-Hash 239 Functions</a></h3> 240 241 <p>In cases it is beneficial to allow the 242 client to directly specify a ranged-hash hash function. It is 243 true, that the writer of the ranged-hash function cannot rely 244 on the values of <i>m</i> having specific numerical properties 245 suitable for hashing (in the sense used in [<a href= 246 "references.html#knuth98sorting">knuth98sorting</a>]), since 247 the values of <i>m</i> are determined by a resize policy with 248 possibly orthogonal considerations.</p> 249 250 <p>There are two cases where a ranged-hash function can be 251 superior. The firs is when using perfect hashing [<a href= 252 "references.html#knuth98sorting">knuth98sorting</a>]; the 253 second is when the values of <i>m</i> can be used to estimate 254 the "general" number of distinct values required. This is 255 described in the following.</p> 256 257 <p>Let</p> 258 259 <p class="c2">s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>]</p> 260 261 <p>be a string of <i>t</i> characters, each of which is from 262 domain <i>S</i>. Consider the following ranged-hash 263 function:</p> 264 265 <p><a name="total_string_dna_hash" id= 266 "total_string_dna_hash"><i>f<sub>1</sub>(s, m) = ∑ <sub>i = 267 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod 268 <i>m</i></a> (5) ,</p> 269 270 <p>where <i>a</i> is some non-negative integral value. This is 271 the standard string-hashing function used in SGI's 272 implementation (with <i>a = 5</i>) [<a href= 273 "references.html#sgi_stl">sgi_stl</a>]. Its advantage is that 274 it takes into account all of the characters of the string.</p> 275 276 <p>Now assume that <i>s</i> is the string representation of a 277 of a long DNA sequence (and so <i>S = {'A', 'C', 'G', 278 'T'}</i>). In this case, scanning the entire string might be 279 prohibitively expensive. A possible alternative might be to use 280 only the first <i>k</i> characters of the string, where</p> 281 282 <p>|S|<sup>k</sup> ≥ m ,</p> 283 284 <p><i>i.e.</i>, using the hash function</p> 285 286 <p><a name="only_k_string_dna_hash" id= 287 "only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i 288 = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod 289 <i>m</i></a> , (6)</p> 290 291 <p>requiring scanning over only</p> 292 293 <p><i>k =</i> log<i><sub>4</sub>( m )</i></p> 294 295 <p>characters.</p> 296 297 <p>Other more elaborate hash-functions might scan <i>k</i> 298 characters starting at a random position (determined at each 299 resize), or scanning <i>k</i> random positions (determined at 300 each resize), <i>i.e.</i>, using</p> 301 302 <p><i>f<sub>3</sub>(s, m) = ∑ <sub>i = 303 r</sub>0</i><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub> 304 a<sup>i</sup> mod <i>m</i> ,</p> 305 306 <p>or</p> 307 308 <p><i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 309 1</sup> s<sub>r</sub>i</i> a<sup>r<sub>i</sub></sup> mod 310 <i>m</i> ,</p> 311 312 <p>respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> 313 each in the (inclusive) range <i>[0,...,t-1]</i>.</p> 314 315 <p>It should be noted that the above functions cannot be 316 decomposed as <a href= 317 "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a> .</p> 318 319 <h3><a name="pb_ds_imp" id="pb_ds_imp">Implementation</a></h3> 320 321 <p>This sub-subsection describes the implementation of the 322 above in <tt>pb_ds</tt>. It first explains range-hashing 323 functions in collision-chaining tables, then ranged-hash 324 functions in collision-chaining tables, then probing-based 325 tables, and, finally, lists the relevant classes in 326 <tt>pb_ds</tt>.</p> 327 328 <h4>Range-Hashing and Ranged-Hashes in Collision-Chaining 329 Tables</h4> 330 331 <p><a href= 332 "cc_hash_table.html"><tt>cc_hash_table</tt></a> is 333 parametrized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a 334 hash functor and a combining hash functor, respectively.</p> 335 336 <p>In general, <tt>Comb_Hash_Fn</tt> is considered a 337 range-hashing functor. <a href= 338 "cc_hash_table.html"><tt>cc_hash_table</tt></a> 339 synthesizes a ranged-hash function from <tt>Hash_Fn</tt> and 340 <tt>Comb_Hash_Fn</tt> (see <a href= 341 "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a> 342 above). Figure <a href="#hash_range_hashing_seq_diagram">Insert 343 hash sequence diagram</a> shows an <tt>insert</tt> sequence 344 diagram for this case. The user inserts an element (point A), 345 the container transforms the key into a non-negative integral 346 using the hash functor (points B and C), and transforms the 347 result into a position using the combining functor (points D 348 and E).</p> 349 350 <h6 class="c1"><a name="hash_range_hashing_seq_diagram" id= 351 "hash_range_hashing_seq_diagram"><img src= 352 "hash_range_hashing_seq_diagram.png" alt="no image" /></a></h6> 353 354 <h6 class="c1">Insert hash sequence diagram.</h6> 355 356 <p>If <a href= 357 "cc_hash_table.html"><tt>cc_hash_table</tt></a>'s 358 hash-functor, <tt>Hash_Fn</tt> is instantiated by <a href= 359 "null_hash_fn.html"><tt>null_hash_fn</tt></a> (see <a href= 360 "concepts.html#concepts_null_policies">Interface::Concepts::Null 361 Policy Classes</a>), then <tt>Comb_Hash_Fn</tt> is taken to be 362 a ranged-hash function. Figure <a href= 363 "#hash_range_hashing_seq_diagram2">Insert hash sequence diagram 364 with a null hash policy</a> shows an <tt>insert</tt> sequence 365 diagram. The user inserts an element (point A), the container 366 transforms the key into a position using the combining functor 367 (points B and C).</p> 368 369 <h6 class="c1"><a name="hash_range_hashing_seq_diagram2" id= 370 "hash_range_hashing_seq_diagram2"><img src= 371 "hash_range_hashing_seq_diagram2.png" alt= 372 "no image" /></a></h6> 373 374 <h6 class="c1">Insert hash sequence diagram with a null hash 375 policy.</h6> 376 377 <h4>Probing Tables</h4> 378 379 <p><a href= 380 "gp_hash_table.html"></a><tt>gp_hash_table</tt> is 381 parametrized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and 382 <tt>Comb_Probe_Fn</tt>. As before, if <tt>Hash_Fn</tt> and 383 <tt>Probe_Fn</tt> are, respectively, <a href= 384 "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href= 385 "null_probe_fn.html"><tt>null_probe_fn</tt></a>, then 386 <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, 387 <tt>Hash_Fn</tt> is a hash functor, <tt>Probe_Fn</tt> is a 388 functor for offsets from a hash value, and 389 <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a 390 sequence of positions within the table.</p> 391 392 <h4>Pre-Defined Policies</h4> 393 394 <p><tt>pb_ds</tt> contains some pre-defined classes 395 implementing range-hashing and probing functions:</p> 396 397 <ol> 398 <li><a href= 399 "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a> 400 and <a href= 401 "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a> 402 are range-hashing functions based on a bit-mask and a modulo 403 operation, respectively.</li> 404 405 <li><a href= 406 "linear_probe_fn.html"><tt>linear_probe_fn</tt></a>, and 407 <a href= 408 "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are 409 a linear probe and a quadratic probe function, 410 respectively.</li> 411 </ol>Figure <a href="#hash_policy_cd">Hash policy class 412 diagram</a> shows a class diagram. 413 414 <h6 class="c1"><a name="hash_policy_cd" id= 415 "hash_policy_cd"><img src="hash_policy_cd.png" alt= 416 "no image" /></a></h6> 417 418 <h6 class="c1">Hash policy class diagram.</h6> 419 420 <h2><a name="resize_policies" id="resize_policies">Resize 421 Policies</a></h2> 422 423 <h3><a name="general" id="general">General Terms</a></h3> 424 425 <p>Hash-tables, as opposed to trees, do not naturally grow or 426 shrink. It is necessary to specify policies to determine how 427 and when a hash table should change its size. Usually, resize 428 policies can be decomposed into orthogonal policies:</p> 429 430 <ol> 431 <li>A <i>size policy</i> indicating <i>how</i> a hash table 432 should grow (<i>e.g.,</i> it should multiply by powers of 433 2).</li> 434 435 <li>A <i>trigger policy</i> indicating <i>when</i> a hash 436 table should grow (<i>e.g.,</i> a load factor is 437 exceeded).</li> 438 </ol> 439 440 <h3><a name="size_policies" id="size_policies">Size 441 Policies</a></h3> 442 443 <p>Size policies determine how a hash table changes size. These 444 policies are simple, and there are relatively few sensible 445 options. An exponential-size policy (with the initial size and 446 growth factors both powers of 2) works well with a mask-based 447 range-hashing function (see <a href= 448 "#hash_policies">Range-Hashing Policies</a>), and is the 449 hard-wired policy used by Dinkumware [<a href= 450 "references.html#dinkumware_stl">dinkumware_stl</a>]. A 451 prime-list based policy works well with a modulo-prime range 452 hashing function (see <a href="#hash_policies">Range-Hashing 453 Policies</a>), and is the hard-wired policy used by SGI's 454 implementation [<a href= 455 "references.html#sgi_stl">sgi_stl</a>].</p> 456 457 <h3><a name="trigger_policies" id="trigger_policies">Trigger 458 Policies</a></h3> 459 460 <p>Trigger policies determine when a hash table changes size. 461 Following is a description of two policies: <i>load-check</i> 462 policies, and collision-check policies.</p> 463 464 <p>Load-check policies are straightforward. The user specifies 465 two factors, <i>α<sub>min</sub></i> and 466 <i>α<sub>max</sub></i>, and the hash table maintains the 467 invariant that</p> 468 469 <p><i><a name="load_factor_min_max" id= 470 "load_factor_min_max">α<sub>min</sub> ≤ (number of 471 stored elements) / (hash-table size) ≤ 472 α<sub>max</sub></a></i> (1) .</p> 473 474 <p>Collision-check policies work in the opposite direction of 475 load-check policies. They focus on keeping the number of 476 collisions moderate and hoping that the size of the table will 477 not grow very large, instead of keeping a moderate load-factor 478 and hoping that the number of collisions will be small. A 479 maximal collision-check policy resizes when the longest 480 probe-sequence grows too large.</p> 481 482 <p>Consider Figure <a href="#balls_and_bins">Balls and 483 bins</a>. Let the size of the hash table be denoted by 484 <i>m</i>, the length of a probe sequence be denoted by 485 <i>k</i>, and some load factor be denoted by α. We would 486 like to calculate the minimal length of <i>k</i>, such that if 487 there were <i>α m</i> elements in the hash table, a probe 488 sequence of length <i>k</i> would be found with probability at 489 most <i>1/m</i>.</p> 490 491 <h6 class="c1"><a name="balls_and_bins" id= 492 "balls_and_bins"><img src="balls_and_bins.png" alt= 493 "no image" /></a></h6> 494 495 <h6 class="c1">Balls and bins.</h6> 496 497 <p>Denote the probability that a probe sequence of length 498 <i>k</i> appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the 499 length of the probe sequence of bin <i>i</i> by 500 <i>l<sub>i</sub></i>, and assume uniform distribution. Then</p> 501 502 <p><a name="prob_of_p1" id= 503 "prob_of_p1"><i>p<sub>1</sub></i></a> = (3)</p> 504 505 <p class="c2"><b>P</b>(l<sub>1</sub> ≥ k) =</p> 506 507 <p><i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1 508 ) ≤</i> (a)</p> 509 510 <p><i>e ^ ( - ( α ( k / α - 1 )<sup>2</sup> ) /2 511 )</i> ,</p> 512 513 <p>where (a) follows from the Chernoff bound [<a href= 514 "references.html#motwani95random">motwani95random</a>]. To 515 calculate the probability that <i>some</i> bin contains a probe 516 sequence greater than <i>k</i>, we note that the 517 <i>l<sub>i</sub></i> are negatively-dependent [<a href= 518 "references.html#dubhashi98neg">dubhashi98neg</a>]. Let 519 <i><b>I</b>(.)</i> denote the indicator function. Then</p> 520 521 <p><a name="at_least_k_i_n_some_bin" id= 522 "at_least_k_i_n_some_bin"><i><b>P</b>( exists<sub>i</sub> 523 l<sub>i</sub> ≥ k ) =</i> (3)</a></p> 524 525 <p class="c2"><b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup> 526 <b>I</b>(l<sub>i</sub> ≥ k) ≥ 1 ) =</p> 527 528 <p><i><b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup> <b>I</b> ( 529 l<sub>i</sub> ≥ k ) ≥ m p<sub>1</sub> ( 1 + 1 / (m 530 p<sub>1</sub>) - 1 ) ) ≤</i> (a)</p> 531 532 <p class="c2">e ^ ( ( - m p<sub>1</sub> ( 1 / (m p<sub>1</sub>) 533 - 1 ) <sup>2</sup> ) / 2 ) ,</p> 534 535 <p>where (a) follows from the fact that the Chernoff bound can 536 be applied to negatively-dependent variables [<a href= 537 "references.html#dubhashi98neg">dubhashi98neg</a>]. Inserting 538 <a href="#prob_of_p1">(2)</a> into <a href= 539 "#at_least_k_i_n_some_bin">(3)</a>, and equating with 540 <i>1/m</i>, we obtain</p> 541 542 <p><i>k ~ √ ( 2 α</i> ln <i>2 m</i> ln<i>(m) ) 543 )</i> .</p> 544 545 <h3><a name="imp_pb_ds" id="imp_pb_ds">Implementation</a></h3> 546 547 <p>This sub-subsection describes the implementation of the 548 above in <tt>pb_ds</tt>. It first describes resize policies and 549 their decomposition into trigger and size policies, then 550 describes pre-defined classes, and finally discusses controlled 551 access the policies' internals.</p> 552 553 <h4>Resize Policies and Their Decomposition</h4> 554 555 <p>Each hash-based container is parametrized by a 556 <tt>Resize_Policy</tt> parameter; the container derives 557 <tt><b>public</b></tt>ly from <tt>Resize_Policy</tt>. For 558 example:</p> 559 <pre> 560<a href="cc_hash_table.html">cc_hash_table</a>< 561 <b>typename</b> Key, 562 <b>typename</b> Mapped, 563 ... 564 <b>typename</b> Resize_Policy 565 ...> : 566 <b>public</b> Resize_Policy 567</pre> 568 569 <p>As a container object is modified, it continuously notifies 570 its <tt>Resize_Policy</tt> base of internal changes 571 (<i>e.g.</i>, collisions encountered and elements being 572 inserted). It queries its <tt>Resize_Policy</tt> base whether 573 it needs to be resized, and if so, to what size.</p> 574 575 <p>Figure <a href="#insert_resize_sequence_diagram1">Insert 576 resize sequence diagram</a> shows a (possible) sequence diagram 577 of an insert operation. The user inserts an element; the hash 578 table notifies its resize policy that a search has started 579 (point A); in this case, a single collision is encountered - 580 the table notifies its resize policy of this (point B); the 581 container finally notifies its resize policy that the search 582 has ended (point C); it then queries its resize policy whether 583 a resize is needed, and if so, what is the new size (points D 584 to G); following the resize, it notifies the policy that a 585 resize has completed (point H); finally, the element is 586 inserted, and the policy notified (point I).</p> 587 588 <h6 class="c1"><a name="insert_resize_sequence_diagram1" id= 589 "insert_resize_sequence_diagram1"><img src= 590 "insert_resize_sequence_diagram1.png" alt= 591 "no image" /></a></h6> 592 593 <h6 class="c1">Insert resize sequence diagram.</h6> 594 595 <p>In practice, a resize policy can be usually orthogonally 596 decomposed to a size policy and a trigger policy. Consequently, 597 the library contains a single class for instantiating a resize 598 policy: <a href= 599 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> 600 is parametrized by <tt>Size_Policy</tt> and 601 <tt>Trigger_Policy</tt>, derives <tt><b>public</b></tt>ly from 602 both, and acts as a standard delegate [<a href= 603 "references.html#gamma95designpatterns">gamma95designpatterns</a>] 604 to these policies.</p> 605 606 <p>Figures <a href="#insert_resize_sequence_diagram2">Standard 607 resize policy trigger sequence diagram</a> and <a href= 608 "#insert_resize_sequence_diagram3">Standard resize policy size 609 sequence diagram</a> show sequence diagrams illustrating the 610 interaction between the standard resize policy and its trigger 611 and size policies, respectively.</p> 612 613 <h6 class="c1"><a name="insert_resize_sequence_diagram2" id= 614 "insert_resize_sequence_diagram2"><img src= 615 "insert_resize_sequence_diagram2.png" alt= 616 "no image" /></a></h6> 617 618 <h6 class="c1">Standard resize policy trigger sequence 619 diagram.</h6> 620 621 <h6 class="c1"><a name="insert_resize_sequence_diagram3" id= 622 "insert_resize_sequence_diagram3"><img src= 623 "insert_resize_sequence_diagram3.png" alt= 624 "no image" /></a></h6> 625 626 <h6 class="c1">Standard resize policy size sequence 627 diagram.</h6> 628 629 <h4>Pre-Defined Policies</h4> 630 631 <p>The library includes the following 632 instantiations of size and trigger policies:</p> 633 634 <ol> 635 <li><a href= 636 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> 637 implements a load check trigger policy.</li> 638 639 <li><a href= 640 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a> 641 implements a collision check trigger policy.</li> 642 643 <li><a href= 644 "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> 645 implements an exponential-size policy (which should be used 646 with mask range hashing).</li> 647 648 <li><a href= 649 "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> 650 implementing a size policy based on a sequence of primes 651 [<a href="references.html#sgi_stl">sgi_stl</a>] (which should 652 be used with mod range hashing</li> 653 </ol> 654 655 <p>Figure <a href="#resize_policy_cd">Resize policy class 656 diagram</a> gives an overall picture of the resize-related 657 classes. <a href= 658 "basic_hash_table.html"><tt>basic_hash_table</tt></a> 659 is parametrized by <tt>Resize_Policy</tt>, which it subclasses 660 publicly. This class is currently instantiated only by <a href= 661 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>. 662 <a href= 663 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> 664 itself is parametrized by <tt>Trigger_Policy</tt> and 665 <tt>Size_Policy</tt>. Currently, <tt>Trigger_Policy</tt> is 666 instantiated by <a href= 667 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>, 668 or <a href= 669 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; 670 <tt>Size_Policy</tt> is instantiated by <a href= 671 "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>, 672 or <a href= 673 "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.</p> 674 675 <h6 class="c1"><a name="resize_policy_cd" id= 676 "resize_policy_cd"><img src="resize_policy_cd.png" alt= 677 "no image" /></a></h6> 678 679 <h6 class="c1">Resize policy class diagram.</h6> 680 681 <h4>Controlled Access to Policies' Internals</h4> 682 683 <p>There are cases where (controlled) access to resize 684 policies' internals is beneficial. <i>E.g.</i>, it is sometimes 685 useful to query a hash-table for the table's actual size (as 686 opposed to its <tt>size()</tt> - the number of values it 687 currently holds); it is sometimes useful to set a table's 688 initial size, externally resize it, or change load factors.</p> 689 690 <p>Clearly, supporting such methods both decreases the 691 encapsulation of hash-based containers, and increases the 692 diversity between different associative-containers' interfaces. 693 Conversely, omitting such methods can decrease containers' 694 flexibility.</p> 695 696 <p>In order to avoid, to the extent possible, the above 697 conflict, the hash-based containers themselves do not address 698 any of these questions; this is deferred to the resize policies, 699 which are easier to change or replace. Thus, for example, 700 neither <a href= 701 "cc_hash_table.html"><tt>cc_hash_table</tt></a> nor 702 <a href= 703 "gp_hash_table.html"><tt>gp_hash_table</tt></a> 704 contain methods for querying the actual size of the table; this 705 is deferred to <a href= 706 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.</p> 707 708 <p>Furthermore, the policies themselves are parametrized by 709 template arguments that determine the methods they support 710 ([<a href= 711 "references.html#alexandrescu01modern">alexandrescu01modern</a>] 712 shows techniques for doing so). <a href= 713 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> 714 is parametrized by <tt>External_Size_Access</tt> that 715 determines whether it supports methods for querying the actual 716 size of the table or resizing it. <a href= 717 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> 718 is parametrized by <tt>External_Load_Access</tt> that 719 determines whether it supports methods for querying or 720 modifying the loads. <a href= 721 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a> 722 is parametrized by <tt>External_Load_Access</tt> that 723 determines whether it supports methods for querying the 724 load.</p> 725 726 <p>Some operations, for example, resizing a container at 727 run time, or changing the load factors of a load-check trigger 728 policy, require the container itself to resize. As mentioned 729 above, the hash-based containers themselves do not contain 730 these types of methods, only their resize policies. 731 Consequently, there must be some mechanism for a resize policy 732 to manipulate the hash-based container. As the hash-based 733 container is a subclass of the resize policy, this is done 734 through virtual methods. Each hash-based container has a 735 <tt><b>private</b></tt> <tt><b>virtual</b></tt> method:</p> 736 <pre> 737<b>virtual void</b> 738 do_resize 739 (size_type new_size); 740</pre> 741 742 <p>which resizes the container. Implementations of 743 <tt>Resize_Policy</tt> can export public methods for resizing 744 the container externally; these methods internally call 745 <tt>do_resize</tt> to resize the table.</p> 746 747 <h2><a name="policy_interaction" id="policy_interaction">Policy 748 Interaction</a></h2> 749 750 <p>Hash-tables are unfortunately especially susceptible to 751 choice of policies. One of the more complicated aspects of this 752 is that poor combinations of good policies can form a poor 753 container. Following are some considerations.</p> 754 755 <h3><a name="policy_interaction_probe_size_trigger" id= 756 "policy_interaction_probe_size_trigger">Probe Policies, Size 757 Policies, and Trigger Policies</a></h3> 758 759 <p>Some combinations do not work well for probing containers. 760 For example, combining a quadratic probe policy with an 761 exponential size policy can yield a poor container: when an 762 element is inserted, a trigger policy might decide that there 763 is no need to resize, as the table still contains unused 764 entries; the probe sequence, however, might never reach any of 765 the unused entries.</p> 766 767 <p>Unfortunately, <tt>pb_ds</tt> cannot detect such problems at 768 compilation (they are halting reducible). It therefore defines 769 an exception class <a href= 770 "insert_error.html"><tt>insert_error</tt></a> to throw an 771 exception in this case.</p> 772 773 <h3><a name="policy_interaction_hash_trigger" id= 774 "policy_interaction_hash_trigger">Hash Policies and Trigger 775 Policies</a></h3> 776 777 <p>Some trigger policies are especially susceptible to poor 778 hash functions. Suppose, as an extreme case, that the hash 779 function transforms each key to the same hash value. After some 780 inserts, a collision detecting policy will always indicate that 781 the container needs to grow.</p> 782 783 <p>The library, therefore, by design, limits each operation to 784 one resize. For each <tt>insert</tt>, for example, it queries 785 only once whether a resize is needed.</p> 786 787 <h3><a name="policy_interaction_eq_sth_hash" id= 788 "policy_interaction_eq_sth_hash">Equivalence Functors, Storing 789 Hash Values, and Hash Functions</a></h3> 790 791 <p><a href= 792 "cc_hash_table.html"><tt>cc_hash_table</tt></a> and 793 <a href= 794 "gp_hash_table.html"><tt>gp_hash_table</tt></a> are 795 parametrized by an equivalence functor and by a 796 <tt>Store_Hash</tt> parameter. If the latter parameter is 797 <tt><b>true</b></tt>, then the container stores with each entry 798 a hash value, and uses this value in case of collisions to 799 determine whether to apply a hash value. This can lower the 800 cost of collision for some types, but increase the cost of 801 collisions for other types.</p> 802 803 <p>If a ranged-hash function or ranged probe function is 804 directly supplied, however, then it makes no sense to store the 805 hash value with each entry. <tt>pb_ds</tt>'s container will 806 fail at compilation, by design, if this is attempted.</p> 807 808 <h3><a name="policy_interaction_size_load_check" id= 809 "policy_interaction_size_load_check">Size Policies and 810 Load-Check Trigger Policies</a></h3> 811 812 <p>Assume a size policy issues an increasing sequence of sizes 813 <i>a, a q, a q<sup>1</sup>, a q<sup>2</sup>, ...</i> For 814 example, an exponential size policy might issue the sequence of 815 sizes <i>8, 16, 32, 64, ...</i></p> 816 817 <p>If a load-check trigger policy is used, with loads 818 <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>, 819 respectively, then it is a good idea to have:</p> 820 821 <ol> 822 <li><i>α<sub>max</sub> ~ 1 / q</i></li> 823 824 <li><i>α<sub>min</sub> < 1 / (2 q)</i></li> 825 </ol> 826 827 <p>This will ensure that the amortized hash cost of each 828 modifying operation is at most approximately 3.</p> 829 830 <p><i>α<sub>min</sub> ~ α<sub>max</sub></i> is, in 831 any case, a bad choice, and <i>α<sub>min</sub> > 832 α<sub>max</sub></i> is horrendous.</p> 833 </div> 834</body> 835</html> 836