197403Sobrien<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> 297403Sobrien<html> 3132720Skan <head> 497403Sobrien <title>Hash-Based Containers</title> 597403Sobrien <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1"> 697403Sobrien <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"> 797403Sobrien </head> 897403Sobrien 997403Sobrien<body bgcolor = "white"> 1097403Sobrien 1197403Sobrien<h1>Hash-Based Containers</h1> 1297403Sobrien 1397403Sobrien<p> 1497403Sobrien This section describes hash-based containers. It is organized 1597403Sobrienas follows. 1697403Sobrien</p> 1797403Sobrien 18169691Skan<ol> 1997403Sobrien <li> 2097403Sobrien <a href = "#overview">Overview</a> is an overview. 2197403Sobrien </li> 2297403Sobrien <li> 2397403Sobrien <a href = "#hash_policies">Hash Policies</a> discusses 2497403Sobrien hash policies. 2597403Sobrien </li> 2697403Sobrien <li> 2797403Sobrien <a href = "#resize_policies">Resize Policies</a> discusses 2897403Sobrien resize policies. 2997403Sobrien </li> 3097403Sobrien <li> 3197403Sobrien <a href = "#policy_interaction">Policy Interaction</a> discusses 3297403Sobrien interaction between policies. 3397403Sobrien </li> 34132720Skan</ol> 35132720Skan 3697403Sobrien 3797403Sobrien 3897403Sobrien<h2><a name = "overview">Overview</a></h2> 3997403Sobrien 4097403Sobrien 4197403Sobrien<p> 42 Figure 43<a href = "#hash_cd">Hash-based containers</a> 44 shows the container-hierarchy; the hash-based containers are circled. 45<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> 46is a collision-chaining hash-based container; 47<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> 48is a (general) probing hash-based container. 49</p> 50 51<h6 align = "center"> 52<a name = "hash_cd"> 53<img src = "hash_cd.jpg" width = "70%" alt = "no image"> 54</h6> 55<h6 align = "center"> 56</a> 57Hash-based containers. 58</h6> 59 60<p> 61 The collision-chaining hash-based container has the following declaration. 62</p> 63<pre> 64<b>template</b>< 65 <b>typename</b> Key, 66 <b>typename</b> Data, 67 <b>class</b> Hash_Fn = std::hash<Key>, 68 <b>class</b> Eq_Fn = std::equal_to<Key>, 69 <b>class</b> Comb_Hash_Fn = 70 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> 71 <b>class</b> Resize_Policy = <i>default explained below.</i> 72 <b>bool</b> Store_Hash = <b>false</b>, 73 <b>class</b> Allocator = 74 std::allocator<<b>char</b>> > 75<b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>; 76</pre> 77 78<p> 79 The parameters have the following meaning: 80</p> 81<ol> 82 <li> <tt>Key</tt> is the key type. 83 </li> 84 <li> <tt>Data</tt> is the data-policy, and is explained in 85<a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>. 86 </li> 87 <li> <tt>Hash_Fn</tt> is a key hashing functor.</li> 88 <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li> 89 <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it 90describes how to translate hash values into positions within the table. 91This is described in 92<a name = "#hash_policies">Hash Policies</a>.</li> 93 </li> 94 <li> <tt>Resize_Policy</tt> describes how a container object should 95change its internal size. This is described in 96<a name = #resize_policies">Resize Policies</a>.</li> 97 <li> <tt>Store_Hash</tt> indicates whether the hash value should 98be stored with each entry. This is described in 99<a name = "#policy_interaction">Policy Interaction</a>.</li> 100 <li> <tt>Allocator</tt> is (surprisingly) an allocator type. 101 </li> 102</ol> 103 104<p> 105 The probing hash-based container has the following declaration. 106</p> 107<pre> 108<b>template</b>< 109 <b>typename</b> Key, 110 <b>typename</b> Data, 111 <b>class</b> Hash_Fn = 112 std::hash< 113 Key>, 114 <b>class</b> Eq_Fn = 115 std::equal_to< 116 Key>, 117 <b>class</b> Comb_Probe_Fn = 118 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> 119 <b>class</b> Probe_Fn = <i>default explained below.</i> 120 <b>class</b> Resize_Policy = <i>default explained below.</i> 121 <b>bool</b> Store_Hash = <b>false</b>, 122 <b>class</b> Allocator = 123 std::allocator<<b>char</b>> > 124<b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>; 125</pre> 126 127<p> 128 The parameters are identical to those of the collision-chaining container, except 129for the following. 130</p> 131<ol> 132 <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into 133a sequence of positions within the table. 134 </li> 135 <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li> 136</ol> 137 138 139<p> 140 Some of the default template values depend on the values of other parameters, 141and are explained in 142<a name = "#policy_interaction">Policy Interaction</a>. 143</p> 144 145<h2><a name = "hash_policies">Hash Policies</h2></a> 146<p> 147 This subsection describes hash policies. It is organized as follows: 148</p> 149<ol> 150 <li> <a href = "#general_terms">General Terms</a> describes 151 some general terms. 152 </li> 153 <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a> 154 describes range-hasing functions.</li> 155 <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> 156 describes ranged-hash functions. </li> 157 <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> 158 describes the implementation of these concepts in <tt>pb_assoc</tt>. 159 </li> 160</ol> 161 162 163<h3><a name="general_terms">General Terms</a></h3> 164 165<p> 166 There 167are actually three functions involved in transforming a key into a hash-table's 168position (see Figure 169<a href = "#hash_ranged_hash_range_hashing_fns"> 170Hash runctions, ranged-hash functions, and range-hashing functions 171</a>): 172</p> 173<ol> 174 <li> 175 A <i>ranged-hash</i> function, which maps keys into an interval of the 176 non-negative integrals. This is the function actually required by the 177 hash-table algorithm. 178 </li> 179 <li> 180 A hash function, which maps keys into non-negative integral types. This is 181 typically specified by the writer of the key class. 182 </li> 183 <li> 184 A <i>range-hashing</i> function, which maps non-negative integral types into an 185 interval of non-negative integral types. 186 </li> 187</ol> 188 189<h6 align = "center"> 190<a name = "hash_ranged_hash_range_hashing_fns"> 191<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image"> 192</h6> 193<h6 align = "center"> 194</a> 195Hash runctions, ranged-hash functions, and range-hashing functions. 196</h6> 197 198<p> 199 Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3 200 characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly" 201 into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral 202 value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i> 203 function 204</p> 205<p> 206 <i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>, 207</p> 208<p> 209 such that for any <i>u</i> in <i>U</i> 210, 211</p> 212<p> 213 <i>0 ≤ f(u, m) ≤ m - 1 </i>, 214</p> 215<p> 216 and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>]. 217 One common solution is to use the composition of the hash function 218</p> 219<p> 220 <i>h : U → Z<sub>+</sub> </i>, 221</p> 222<p> 223 which maps elements of <i>U</i> into the non-negative integrals, and 224</p> 225<p> 226 <i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i> 227</p> 228<p> 229 which maps a non-negative hash value, and a non-negative range upper-bound into 230 a non-negative integral in the range between 0 (inclusive) and the range upper 231 bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>, 232</p> 233<p> 234 <i>0 ≤ g(r, m) ≤ m - 1 </i>. 235</p> 236<p> 237 The resulting ranged-hash function, is 238</p> 239<p> 240 <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a> 241 </i>(1) . 242</p> 243 244<p> 245 From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can 246 always be composed (however the converse is not true). The STL's hash-based 247 containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed. 248</p> 249 250 251<p> 252 The above describes the case where a key is to be mapped into a <i>single 253position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table. 254In other cases, a key is to be mapped into a <i>sequence of poisitions</i> 255within a table, <i>e.g.</i>, in a probing table. 256</p> 257<p> 258 Similar terms apply in this case: the table requires a <i>ranged probe</i> 259function, mapping a key into a sequence of positions withing the table. This is 260typically acheived by composing a <i>hash function</i> mapping the key 261into a non-negative integral type, a <i>probe</i> function transforming the 262hash value into a sequence of hash values, and a <i>range-hashing</i> function 263transforming the sequence of hash values into a sequence of positions. 264</p> 265 266 267<h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3> 268 269<p> 270 Some common choices for range-hashing functions are the division, 271 multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>], 272 defined as 273</p> 274<p> 275 <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) , 276</p> 277<p> 278 <i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>, 279</p> 280<p> 281 and 282</p> 283<p> 284 <i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>, 285</p> 286<p> 287respectively, for some positive integrals <i>u</i> and <i>v</i> (typically 288powers of 2), and some <i>a</i>. Each of these range-hashing functions works 289best for some different setting. 290</p> 291<p> 292 The division method <a href="#division_method">(2)</a> is a very common 293 choice. However, even this single method can be implemented in two very 294 different ways. It is possible to implement <a href="#division_method">(2)</a> 295 using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low 296 level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of 297 2), <i>i.e.</i>, 298</p> 299<p> 300 <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) , 301</p> 302<p> 303 and 304</p> 305<p> 306 <a name="eqn:division_method_bit_mask"> 307 <i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup> 308 </i> 309 for some<i> k) </i></a>(4) , 310</p> 311<p> 312 respectively. 313</p> 314<p> 315 The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a> 316 has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> 317 is affected by all the bits of <i>r</i> (minimizing the chance of collision). 318 It has the disadvantage of using the costly modulo operation. This method is 319 hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>]. 320</p> 321 322<p> 323 The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a> 324 has the advantage of relying on the fast bitwise and operation. It has the 325 disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>. 326 This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]. 327</p> 328 329 330 331 332<h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3> 333 334<p> 335 In some less frequent cases it is beneficial to allow the client to 336directly specify a ranged-hash hash function. It is true, that the writer of 337the ranged-hash function cannot rely on the values of <i>m</i> having specific 338numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]), 339since the values of <i>m</i> are determined by a resize policy with possibly 340orthogonal considerations. 341</p> 342 343<p> 344 There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing 345[<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second 346is when the values of <i>m</i> can be used to estimate the 347"general" number of distinct values required. This is described in the following. 348</p> 349 350<p> 351 Let 352</p> 353 354<p> 355 <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i> 356</p> 357 358<p> 359 be a string of <i>t</i> characters, each of which is from domain <i>S</i>. 360Consider the following ranged-hash function: 361</p> 362 363<p> 364 <a name="eqn:total_string_dna_hash"> 365 <i> 366 f<sub>1</sub>(s, m) = 367 ∑ <sub>i = 368 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i> 369 </a> (5) , 370</p> 371 372<p> 373 where <i>a</i> is some non-negative integral value. This is the standard 374string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>]. 375Its advantage is that it takes into account all of the characters of the 376string. 377</p> 378 379<p> 380 Now assume that <i>s</i> is the string representation of a of a long DNA 381sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the 382entire string might be prohibitively expensive. A possible alternative might be 383to use only the first <i>k</i> characters of the string, where 384</p> 385 386<p> 387 k <sup>|S|</sup> ≥ m , 388</p> 389<p> 390 <i>i.e.</i>, using the hash function 391</p> 392<p> 393 <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k 394 - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6) 395</p> 396<p> 397 requiring scanning over only 398</p> 399<p> 400 <i>k = </i>log<i><sub>4</sub>( m ) </i> 401</p> 402<p> 403 characters. 404</p> 405<p> 406 Other more elaborate hash-functions might scan <i>k</i> characters starting at 407 a random position (determined at each resize), or scanning <i>k</i> random 408 positions (determined at each resize), <i>i.e.</i>, using 409</p> 410<p> 411 <i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup> 412 s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>, 413</p> 414<p> 415 or 416</p> 417<p> 418 <i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub> 419 a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>, 420</p> 421<p> 422<p> 423 respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the 424 (inclusive) range <i>[0,...,t-1]</i>. 425</p> 426 427 428<h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3> 429 430<p> 431<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is 432parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor 433and a combining hash functor, respectively. 434</p> 435 436<p> 437 For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, 438one of the 439<a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>, 440then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor. 441The container will synthesize a ranged-hash functor from both. For example, Figure 442<a href = "#hash_range_hashing_seq_diagram"> 443Insert hash sequence diagram 444</a> 445shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), 446the container transforms the key into a non-negative integral using the hash 447functor (points B and C), and transforms the result into a position 448using the combining functor (points D and E). 449</p> 450 451<h6 align = "center"> 452<a name = "hash_range_hashing_seq_diagram"> 453<img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image"> 454</a> 455</h6> 456<h6 align = "center"> 457Insert hash sequence diagram. 458</h6> 459 460 461<p> 462 <tt>pb_assoc</tt> contains the following range-hashing policies: 463</p> 464 465<ol> 466 <li> 467<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a> 468and 469<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a> 470are range-hashing functions based on a bit-mask and a modulo operation, respectively. 471 </li> 472</ol> 473 474 475<p> 476 If <tt>Comb_Hash_Fn</tt> is instantiated by 477<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, 478and a combining-hash functor, the container treats 479the combining hash functor as a ranged-hash function. For example, Figure 480<a href = "#hash_range_hashing_seq_diagram2"> 481Insert hash sequence diagram with a null combination policy 482</a> 483shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), 484the container transforms the key into a position 485using the combining functor (points B and C). 486</p> 487 488 489<h6 align = "center"> 490<a name = "hash_range_hashing_seq_diagram2"> 491<img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image"> 492</a> 493</h6> 494<h6 align = "center"> 495Insert hash sequence diagram with a null combination policy. 496</h6> 497 498<p> 499 Similarly, 500<a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt> 501is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and 502<tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt> 503and <tt>Comb_Probe_Fn</tt> are, respectively, 504<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and 505<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>, 506then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt> 507is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value, 508and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions 509within the table. 510</p> 511 512<p> 513 <tt>pb_assoc</tt> contains the following probe policies: 514</p> 515 516<ol> 517 <li> 518<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe 519function. 520 </li> 521 <li> 522<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is 523a quadratic probe function. 524 </li> 525</ol> 526 527 528 529 530 531 532 533 534 535<h2><a name = "resize_policies">Resize Policies</a></h2> 536 537<p> 538 This subsection describes resize policies. It is organized as follows: 539</p> 540 541<ol> 542 <li> <a href = "#general">General Terms</a> describes general 543 terms. 544 </li> 545 <li> <a href = "#size_policies">Size Policies</a> describes size 546 policies. 547 </li> 548 <li> <a href = "#trigger_policies">Trigger Policies</a> describes trigger 549 policies. 550 </li> <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a> 551 describes the implementation of these concepts in <tt>pb_assoc</tt>. 552 </li> 553</ol> 554 555 556<h3><a name = "general">General Terms</a></h3> 557 558<p> 559 Hash-tables, as opposed to trees, do not naturally grow or shrink. It 560is necessary to specify policies to determine how and when a hash table should change 561its size. 562</p> 563 564<p> 565 In general, resize policies can be decomposed into (probably orthogonal) 566policies: 567</p> 568<ol> 569 <li> A <i>size policy</i> indicating <i>how</i> a hash table should 570grow (<i>e.g.,</i> it should multiply by powers of 2). 571 </li> 572 <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should 573grow (<i>e.g.,</i> a load factor is exceeded). 574 </li> 575</ol> 576 577 578 579<h3><a name = "size_policies">Size Policies</a></h3> 580 581<p> 582 Size policies determine how a hash table 583changes size. These policies are simple, and there are relatively 584few sensible options. An exponential-size policy (with the initial 585size and growth factors both powers of 2) works well with a 586mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), 587and is the 588hard-wired policy used by Dinkumware 589[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A 590prime-list based policy works well with a modulo-prime range 591hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), 592and is the 593hard-wired policy used by SGI's implementation 594[<a href = "references.html#sgi_stl">sgi_stl</a>]. 595</p> 596 597 598 599 600<h3><a name = "trigger_policies">Trigger Policies</a></h3> 601 602<p> 603 Trigger policies determine when a hash table changes size. 604Following is a description of two polcies: <i>load-check</i> 605policies, and a collision-check policies. 606</p> 607 608<p> 609 Load-check policies are straightforward. The user 610specifies two factors, <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>, and 611the hash table maintains the invariant that 612</p> 613<p> 614 <i> 615 <a name = "eqn:load_factor_min_max"> 616 α<sub>min</sub> 617 ≤ 618 (number of stored elements) / (hash-table size) 619 ≤ 620 α<sub>max</sub> 621 </a> 622 </i> 623 (1) 624 . 625</p> 626 627<p> 628 Collision-check policies work in the opposite direction of 629load-check policies. They focus on keeping the number of 630collisions moderate and hoping 631that the size of the table will not grow very large, 632instead of keeping a moderate load-factor and 633hoping that the number of collisions will be small. 634A 635maximal collision-check policy resizes when the shortest 636probe-sequence grows too large. 637</p> 638 639 640<p> 641 Consider Figure 642<a href = "#balls_and_bins">Balls and bins</a>. 643 Let the size of the hash table be denoted by <i>m</i>, the 644length of a probe sequence be denoted by <i>k</i>, and some load 645factor be denoted by α. We would like to calculate the 646minimal length of <i>k</i>, such that if there were <i>α m</i> elements 647in the hash table, a probe sequence of length <i>k</i> would be found 648with probability at most <i>1/m</i>. 649</p> 650 651<h6 align = "center"> 652<a name = "balls_and_bins"> 653<img src = "balls_and_bins.jpg" width = "70%" alt = "no image"> 654</a> 655</h6> 656<h6 align = "center"> 657Balls and bins. 658</h6> 659 660 661<p> 662 Denote the probability that a probe sequence of length <i>k</i> 663appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence 664of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution. 665Then 666</p> 667 <p> 668 <a name = "eqn:prob_of_p1"> 669 <i>p<sub>1</sub> 670 = </i>(3) 671 </a> 672 </p> 673 <p> 674 <i> 675 <b>P</b>(l<sub>1</sub> ≥ k) 676 = 677 </i> 678 </p> 679 <p> 680 <i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1 ) 681 ≤ </i>(a) 682 </p> 683 <p> 684 <i> 685 e 686 ^ 687 ( 688 - 689 ( 690 α ( k / α - 1 )<sup>2</sup> 691 ) 692 /2 693 ) 694 </i> 695 , 696</p> 697<p> 698 where (a) follows from the Chernoff bound 699[<a href = "references.html#motwani95random">motwani95random</a>]. 700To 701calculate the probability that <i>some</i> bin contains a probe 702sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are 703negatively-dependent 704[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. 705Let <i><b>I</b>(.)</i> 706denote the indicator function. Then 707 <p> 708 <a name = "eqn:at_least_k_i_n_some_bin"> 709 <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> ≥ k ) 710 = </i>(3) 711 </a> 712 </p> 713 <p> 714 <i> 715 <b>P</b> 716 ( 717 ∑ <sub>i = 1</sub><sup>m</sup> 718 <b>I</b>(l<sub>i</sub> ≥ k) ≥ 1 719 ) 720 = 721 </i> 722 </p> 723 <p> 724 <i> 725 <b>P</b> 726 ( 727 ∑ <sub>i = 1</sub><sup>m</sup> 728 <b>I</b> 729 ( 730 l<sub>i</sub> ≥ k 731 ) 732 ≥ 733 m p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 ) 734 ) 735 ≤ </i>(a) 736 </p> 737 <p> 738 <i> 739 e 740 ^ 741 ( 742 ( 743 - 744 m p<sub>1</sub> 745 ( 746 1 / (m p<sub>1</sub>) - 1 747 ) 748 <sup>2</sup> 749 ) 750 / 751 2 752 ) 753 , 754 </i> 755 </p> 756<p> 757where (a) follows from the fact that the Chernoff bound can be 758applied to negatively-dependent variables 759[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. 760Inserting <a href = "#prob_of_p1">(2)</a> into 761<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>, 762we obtain 763</p> 764<p> 765 <i> 766 k 767 ~ 768 √ 769 ( 770 2 α </i>ln<i> 2 m </i>ln<i>(m) ) 771 ) 772 </i> 773 . 774</p> 775 776 777 778 779 780 781 782 783 784<h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3> 785 786<p> 787 The resize policies in the previous subsection are conceptually straightforward. The design 788of hash-based containers' size-related interface is complicated by some factors. 789</p> 790<ol> 791 <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no 792distinction between the number of entries the container holds and the number of entries it is using. This, 793of course, is not the case for hash-based containers. Moreover, even describing the 794"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container 795holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries. 796 </li> 797 <li> 798 The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy 799maintains an invariant concerning the load factor of a container object. This is sometimes too rigid: 800 <ol> 801 <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container). 802 </li> 803 <li> 804 In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy. 805 </li> 806 </ol> 807 </li> 808 <li> 809 Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a> 810discusses the previous concepts.) 811 </li> 812</ol> 813 814<p> 815 Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers. 816</p> 817 818 819 820<p> 821 This library attempts to address these types of problems by delegating all size-related functionality to 822policy classes. Hash-based containers 823are parameterized by a resize-policy class (among others), and derive publicly from 824the resize-policy class 825[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] 826 <i>E.g.</i>, a collision-chaining 827hash table is defined as follows: 828</p> 829<pre> 830cc_ht_map< 831 <b>class</b> Key, 832 <b>class</b> Data, 833 ... 834 <b>class</b> Resize_Policy 835 ...> : 836 <b>public</b> Resize_Policy 837</pre> 838 839<p> 840 The containers themselves lack any functionality or public interface for manipulating sizes. A container 841object merely forwards events to its resize policy object and queries it for needed actions. 842</p> 843 844<p> 845 Figure 846<a href = "#insert_resize_sequence_diagram1"> 847Insert resize sequence diagram 848</a> 849shows a (possible) sequence diagram of an insert operation. 850The user inserts an element; the hash table 851notifies its resize policy that a search has started (point A); 852in this case, a single collision is encountered - the table 853notifies its resize policy of this (point B); the container 854finally notifies its resize policy that the search has ended (point C); 855it then queries its resize policy whether a resize is needed, and if so, 856what is the new size (points D to G); following the resize, it notifies 857the policy that a resize has completed (point H); finally, the element 858is inserted, and the policy notified (point I). 859</p> 860 861<h6 align = "center"> 862<a name = "insert_resize_sequence_diagram1"> 863<img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image"> 864</a> 865</h6> 866<h6 align = "center"> 867Insert resize sequence diagram. 868</h6> 869 870<p> 871 This addresses, to some extent, the problems mentioned above: 872</p> 873<ol> 874 <li> 875 Different instantiations of range-hashing policies can be met with different instantiations of 876 resize policies. 877 </li> 878 <li> 879 Questions on size-related interface are avoided, since the containers have no size-related methods. Thus 880 a container has no method for querying its actual size. It merely continuously forwards enough information to 881 its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design the appropriate method. Also, a container has no methods for setting its size. It merely queries its 882resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and 883supports a <tt><b>protected virtual</b></tt> function for external resize. 884 </li> 885</ol> 886 887<p> 888 The library contains a single class for instantiating a resize policy, 889<tt>pb_assoc</tt> contains 890a standard resize policy, 891<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly). 892In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports 893queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility). 894([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for 895changing between alternative interfaces at compile time.) 896</p> 897 898<p> 899As noted before, 900 size and trigger policies are usually orthogonal. 901<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> 902is parameterized by size and trigger policies. For example, 903a collision-chaining hash table 904is typically be defined as follows: 905</p> 906<pre> 907cc_ht_map< 908 key, 909 data, 910 ... 911 hash_standard_resize_policy< 912 some_trigger_policy, 913 some_size_policy, 914 ...> > 915</pre> 916 917<p> 918 The sole function of 919<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> 920 is to 921act as a standard delegator 922[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these 923policies. 924 925<p> 926 Figures 927<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a> 928 and 929<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a> 930 show sequence diagrams illustrating the interaction between 931 the standard resize policy and its trigger and size policies, respectively. 932</p> 933 934<h6 align = "center"> 935<a name = "insert_resize_sequence_diagram2"> 936<img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image"> 937</a> 938</h6> 939<h6 align = "center"> 940Standard resize policy trigger sequence diagram. 941</h6> 942 943<h6 align = "center"> 944<a name = "insert_resize_sequence_diagram3"> 945<img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image"> 946</a> 947</h6> 948<h6 align = "center"> 949Standard resize policy size sequence diagram. 950</h6> 951 952<p> 953 The library (currently) supports the following instantiations of size 954and trigger policies: 955</p> 956 957<ol> 958 <li> 959 <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements 960 a load check trigger policy. 961 </li> 962 <li> 963 <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a> 964 implements a collision check trigger policy. 965 </li> 966 <li> 967<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens 968an exponential-size policy (which should be used with mask range hashing). 969 </li> 970 <li> 971<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing 972a size policy based on a sequence of primes 973[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing 974 </li> 975</ol> 976 977<p> 978 The trigger policies also support interfaces for changing their specifics which depend on compile time constants. 979</p> 980 981 982<p> 983 Figure 984<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture 985of the resize-related classes. 986<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized 987by <tt>Resize_Policy</tt>, from which it subclasses publicly 988[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]. 989This class is currently instantiated only by 990<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>. 991<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself 992is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>. 993Currently, <tt>Trigger_Policy</tt> is instantiated by 994<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>, 995or 996<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by 997<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>, 998or 999<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>. 1000</p> 1001 1002 1003<h6 align = "center"> 1004<a name = "resize_policy_cd"> 1005<img src = "resize_policy_cd.jpg" width = "70%" alt = "no image"> 1006</a> 1007</h6> 1008<h6 align = "center"> 1009Resize policy class diagram. 1010</h6> 1011 1012 1013 1014 1015<h2><a name = "#policy_interaction">Policy Interaction</a></h2> 1016 1017<p> 1018 Hash-tables are unfortunately susceptible to choice of policies. One 1019of the more complicated aspects of this is that poor combinations of good policies 1020can alter performance drastically. Following are some considerations. 1021</p> 1022 1023 1024 1025 1026 1027<h3>Range-Hashing Policies and Resize Policies</h3> 1028 1029<p> 1030</p> 1031 1032 1033<h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3> 1034 1035 1036<p> 1037<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> 1038and 1039<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> 1040are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt> 1041parameter. If the latter parameter is <tt><b>true</b></tt>, then 1042the container stores with each entry a hash value, and uses 1043this value in case of collisions to determine whether to apply a hash value. 1044This can lower the cost of collision for some types, but increase the cost of collisions for other types. 1045</p> 1046 1047<p> 1048 If a ranged-hash function or ranged probe function is directly supplied, however, 1049then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted. 1050</p> 1051 1052 1053 1054</body> 1055 1056</html> 1057