1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3
4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5<head>
6  <meta name="generator" content=
7  "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
8
9  <title>Hash-Based Containers</title>
10  <meta http-equiv="Content-Type" content=
11  "text/html; charset=us-ascii" />
12  </head>
13
14<body>
15  <div id="page">
16    <h1>Hash Table Design</h1>
17
18    <h2><a name="overview" id="overview">Overview</a></h2>
19
20    <p>The collision-chaining hash-based container has the
21    following declaration.</p>
22    <pre>
23<b>template</b>&lt;
24    <b>typename</b> Key,
25    <b>typename</b> Mapped,
26    <b>typename</b> Hash_Fn = std::hash&lt;Key&gt;,
27    <b>typename</b> Eq_Fn = std::equal_to&lt;Key&gt;,
28    <b>typename</b> Comb_Hash_Fn = <a href=
29"direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
30    <b>typename</b> Resize_Policy = <i>default explained below.</i>
31     <b>bool</b> Store_Hash = <b>false</b>,
32     <b>typename</b> Allocator = std::allocator&lt;<b>char</b>&gt; &gt;
33<b>class</b> <a href=
34"cc_hash_table.html">cc_hash_table</a>;
35</pre>
36
37    <p>The parameters have the following meaning:</p>
38
39    <ol>
40      <li><tt>Key</tt> is the key type.</li>
41
42      <li><tt>Mapped</tt> is the mapped-policy, and is explained in
43      <a href="tutorial.html#assoc_ms">Tutorial::Associative
44      Containers::Associative Containers Others than Maps</a>.</li>
45
46      <li><tt>Hash_Fn</tt> is a key hashing functor.</li>
47
48      <li><tt>Eq_Fn</tt> is a key equivalence functor.</li>
49
50      <li><tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>;
51      it describes how to translate hash values into positions
52      within the table. This is described in <a href=
53      "#hash_policies">Hash Policies</a>.</li>
54
55      <li><tt>Resize_Policy</tt> describes how a container object
56      should change its internal size. This is described in
57      <a href="#resize_policies">Resize Policies</a>.</li>
58
59      <li><tt>Store_Hash</tt> indicates whether the hash value
60      should be stored with each entry. This is described in
61      <a href="#policy_interaction">Policy Interaction</a>.</li>
62
63      <li><tt>Allocator</tt> is an allocator
64      type.</li>
65    </ol>
66
67    <p>The probing hash-based container has the following
68    declaration.</p>
69    <pre>
70<b>template</b>&lt;
71    <b>typename</b> Key,
72    <b>typename</b> Mapped,
73    <b>typename</b> Hash_Fn = std::hash&lt;Key&gt;,
74    <b>typename</b> Eq_Fn = std::equal_to&lt;Key&gt;,
75    <b>typename</b> Comb_Probe_Fn = <a href=
76"direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
77    <b>typename</b> Probe_Fn = <i>default explained below.</i>
78    <b>typename</b> Resize_Policy = <i>default explained below.</i>
79    <b>bool</b> Store_Hash = <b>false</b>,
80    <b>typename</b> Allocator =  std::allocator&lt;<b>char</b>&gt; &gt;
81<b>class</b> <a href=
82"gp_hash_table.html">gp_hash_table</a>;
83</pre>
84
85    <p>The parameters are identical to those of the
86    collision-chaining container, except for the following.</p>
87
88    <ol>
89      <li><tt>Comb_Probe_Fn</tt> describes how to transform a probe
90      sequence into a sequence of positions within the table.</li>
91
92      <li><tt>Probe_Fn</tt> describes a probe sequence policy.</li>
93    </ol>
94
95    <p>Some of the default template values depend on the values of
96    other parameters, and are explained in <a href=
97    "#policy_interaction">Policy Interaction</a>.</p>
98
99    <h2><a name="hash_policies" id="hash_policies">Hash
100    Policies</a></h2>
101
102    <h3><a name="general_terms" id="general_terms">General
103    Terms</a></h3>
104
105    <p>Following is an explanation of some functions which hashing
106    involves. Figure <a href=
107    "#hash_ranged_hash_range_hashing_fns">Hash functions,
108    ranged-hash functions, and range-hashing functions</a>)
109    illustrates the discussion.</p>
110
111    <h6 class="c1"><a name="hash_ranged_hash_range_hashing_fns" id=
112    "hash_ranged_hash_range_hashing_fns"><img src=
113    "hash_ranged_hash_range_hashing_fns.png" alt=
114    "no image" /></a></h6>
115
116    <h6 class="c1">Hash functions, ranged-hash functions, and
117    range-hashing functions.</h6>
118
119    <p>Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the
120    strings of 3 characters). A hash-table algorithm needs to map
121    elements of <i>U</i> "uniformly" into the range <i>[0,..., m -
122    1]</i> (where <i>m</i> is a non-negative integral value, and
123    is, in general, time varying). <i>I.e.</i>, the algorithm needs
124    a <i>ranged-hash</i> function</p>
125
126    <p><i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub></i>
127    ,</p>
128
129    <p>such that for any <i>u</i> in <i>U</i> ,</p>
130
131    <p><i>0 &le; f(u, m) &le; m - 1</i> ,</p>
132
133    <p>and which has "good uniformity" properties [<a href=
134    "references.html#knuth98sorting">knuth98sorting</a>]. One
135    common solution is to use the composition of the hash
136    function</p>
137
138    <p><i>h : U &rarr; Z<sub>+</sub></i> ,</p>
139
140    <p>which maps elements of <i>U</i> into the non-negative
141    integrals, and</p>
142
143    <p class="c2">g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr;
144    Z<sub>+</sub>,</p>
145
146    <p>which maps a non-negative hash value, and a non-negative
147    range upper-bound into a non-negative integral in the range
148    between 0 (inclusive) and the range upper bound (exclusive),
149    <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,</p>
150
151    <p><i>0 &le; g(r, m) &le; m - 1</i> .</p>
152
153    <p>The resulting ranged-hash function, is</p>
154
155    <p><i><a name="ranged_hash_composed_of_hash_and_range_hashing"
156    id="ranged_hash_composed_of_hash_and_range_hashing">f(u , m) =
157    g(h(u), m)</a></i> (1) .</p>
158
159    <p>From the above, it is obvious that given <i>g</i> and
160    <i>h</i>, <i>f</i> can always be composed (however the converse
161    is not true). The STL's hash-based containers allow specifying
162    a hash function, and use a hard-wired range-hashing function;
163    the ranged-hash function is implicitly composed.</p>
164
165    <p>The above describes the case where a key is to be mapped
166    into a <i>single position</i> within a hash table, <i>e.g.</i>,
167    in a collision-chaining table. In other cases, a key is to be
168    mapped into a <i>sequence of positions</i> within a table,
169    <i>e.g.</i>, in a probing table. Similar terms apply in this
170    case: the table requires a <i>ranged probe</i> function,
171    mapping a key into a sequence of positions withing the table.
172    This is typically achieved by composing a <i>hash function</i>
173    mapping the key into a non-negative integral type, a
174    <i>probe</i> function transforming the hash value into a
175    sequence of hash values, and a <i>range-hashing</i> function
176    transforming the sequence of hash values into a sequence of
177    positions.</p>
178
179    <h3><a name="range_hashing_fns" id=
180    "range_hashing_fns">Range-Hashing Functions</a></h3>
181
182    <p>Some common choices for range-hashing functions are the
183    division, multiplication, and middle-square methods [<a href=
184    "references.html#knuth98sorting">knuth98sorting</a>], defined
185    as</p>
186
187    <p><i><a name="division_method" id="division_method">g(r, m) =
188    r mod m</a></i> (2) ,</p>
189
190    <p><i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil;</i> ,</p>
191
192    <p>and</p>
193
194    <p><i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil;</i>
195    ,</p>
196
197    <p>respectively, for some positive integrals <i>u</i> and
198    <i>v</i> (typically powers of 2), and some <i>a</i>. Each of
199    these range-hashing functions works best for some different
200    setting.</p>
201
202    <p>The division method <a href="#division_method">(2)</a> is a
203    very common choice. However, even this single method can be
204    implemented in two very different ways. It is possible to
205    implement <a href="#division_method">(2)</a> using the low
206    level <i>%</i> (modulo) operation (for any <i>m</i>), or the
207    low level <i>&amp;</i> (bit-mask) operation (for the case where
208    <i>m</i> is a power of 2), <i>i.e.</i>,</p>
209
210    <p><i><a name="division_method_prime_mod" id=
211    "division_method_prime_mod">g(r, m) = r % m</a></i> (3) ,</p>
212
213    <p>and</p>
214
215    <p><i><a name="division_method_bit_mask" id=
216    "division_method_bit_mask">g(r, m) = r &amp; m - 1, (m =
217    2<sup>k</sup>)</a></i> for some <i>k)</i> (4),</p>
218
219    <p>respectively.</p>
220
221    <p>The <i>%</i> (modulo) implementation <a href=
222    "#division_method_prime_mod">(3)</a> has the advantage that for
223    <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is
224    affected by all the bits of <i>r</i> (minimizing the chance of
225    collision). It has the disadvantage of using the costly modulo
226    operation. This method is hard-wired into SGI's implementation
227    [<a href="references.html#sgi_stl">sgi_stl</a>].</p>
228
229    <p>The <i>&amp;</i> (bit-mask) implementation <a href=
230    "#division_method_bit_mask">(4)</a> has the advantage of
231    relying on the fast bit-wise and operation. It has the
232    disadvantage that for <i>g(r, m)</i> is affected only by the
233    low order bits of <i>r</i>. This method is hard-wired into
234    Dinkumware's implementation [<a href=
235    "references.html#dinkumware_stl">dinkumware_stl</a>].</p>
236
237    <h3><a name="hash_policies_ranged_hash_policies" id=
238    "hash_policies_ranged_hash_policies">Ranged-Hash
239    Functions</a></h3>
240
241    <p>In cases it is beneficial to allow the
242    client to directly specify a ranged-hash hash function. It is
243    true, that the writer of the ranged-hash function cannot rely
244    on the values of <i>m</i> having specific numerical properties
245    suitable for hashing (in the sense used in [<a href=
246    "references.html#knuth98sorting">knuth98sorting</a>]), since
247    the values of <i>m</i> are determined by a resize policy with
248    possibly orthogonal considerations.</p>
249
250    <p>There are two cases where a ranged-hash function can be
251    superior. The firs is when using perfect hashing [<a href=
252    "references.html#knuth98sorting">knuth98sorting</a>]; the
253    second is when the values of <i>m</i> can be used to estimate
254    the "general" number of distinct values required. This is
255    described in the following.</p>
256
257    <p>Let</p>
258
259    <p class="c2">s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>]</p>
260
261    <p>be a string of <i>t</i> characters, each of which is from
262    domain <i>S</i>. Consider the following ranged-hash
263    function:</p>
264
265    <p><a name="total_string_dna_hash" id=
266    "total_string_dna_hash"><i>f<sub>1</sub>(s, m) = &sum; <sub>i =
267    0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
268    <i>m</i></a> (5) ,</p>
269
270    <p>where <i>a</i> is some non-negative integral value. This is
271    the standard string-hashing function used in SGI's
272    implementation (with <i>a = 5</i>) [<a href=
273    "references.html#sgi_stl">sgi_stl</a>]. Its advantage is that
274    it takes into account all of the characters of the string.</p>
275
276    <p>Now assume that <i>s</i> is the string representation of a
277    of a long DNA sequence (and so <i>S = {'A', 'C', 'G',
278    'T'}</i>). In this case, scanning the entire string might be
279    prohibitively expensive. A possible alternative might be to use
280    only the first <i>k</i> characters of the string, where</p>
281
282    <p>|S|<sup>k</sup> &ge; m ,</p>
283
284    <p><i>i.e.</i>, using the hash function</p>
285
286    <p><a name="only_k_string_dna_hash" id=
287    "only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i
288    = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
289    <i>m</i></a> , (6)</p>
290
291    <p>requiring scanning over only</p>
292
293    <p><i>k =</i> log<i><sub>4</sub>( m )</i></p>
294
295    <p>characters.</p>
296
297    <p>Other more elaborate hash-functions might scan <i>k</i>
298    characters starting at a random position (determined at each
299    resize), or scanning <i>k</i> random positions (determined at
300    each resize), <i>i.e.</i>, using</p>
301
302    <p><i>f<sub>3</sub>(s, m) = &sum; <sub>i =
303    r</sub>0</i><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub>
304    a<sup>i</sup> mod <i>m</i> ,</p>
305
306    <p>or</p>
307
308    <p><i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k -
309    1</sup> s<sub>r</sub>i</i> a<sup>r<sub>i</sub></sup> mod
310    <i>m</i> ,</p>
311
312    <p>respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i>
313    each in the (inclusive) range <i>[0,...,t-1]</i>.</p>
314
315    <p>It should be noted that the above functions cannot be
316    decomposed as <a href=
317    "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a> .</p>
318
319    <h3><a name="pb_ds_imp" id="pb_ds_imp">Implementation</a></h3>
320
321    <p>This sub-subsection describes the implementation of the
322    above in <tt>pb_ds</tt>. It first explains range-hashing
323    functions in collision-chaining tables, then ranged-hash
324    functions in collision-chaining tables, then probing-based
325    tables, and, finally, lists the relevant classes in
326    <tt>pb_ds</tt>.</p>
327
328    <h4>Range-Hashing and Ranged-Hashes in Collision-Chaining
329    Tables</h4>
330
331    <p><a href=
332    "cc_hash_table.html"><tt>cc_hash_table</tt></a> is
333    parametrized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a
334    hash functor and a combining hash functor, respectively.</p>
335
336    <p>In general, <tt>Comb_Hash_Fn</tt> is considered a
337    range-hashing functor. <a href=
338    "cc_hash_table.html"><tt>cc_hash_table</tt></a>
339    synthesizes a ranged-hash function from <tt>Hash_Fn</tt> and
340    <tt>Comb_Hash_Fn</tt> (see <a href=
341    "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a>
342    above). Figure <a href="#hash_range_hashing_seq_diagram">Insert
343    hash sequence diagram</a> shows an <tt>insert</tt> sequence
344    diagram for this case. The user inserts an element (point A),
345    the container transforms the key into a non-negative integral
346    using the hash functor (points B and C), and transforms the
347    result into a position using the combining functor (points D
348    and E).</p>
349
350    <h6 class="c1"><a name="hash_range_hashing_seq_diagram" id=
351    "hash_range_hashing_seq_diagram"><img src=
352    "hash_range_hashing_seq_diagram.png" alt="no image" /></a></h6>
353
354    <h6 class="c1">Insert hash sequence diagram.</h6>
355
356    <p>If <a href=
357    "cc_hash_table.html"><tt>cc_hash_table</tt></a>'s
358    hash-functor, <tt>Hash_Fn</tt> is instantiated by <a href=
359    "null_hash_fn.html"><tt>null_hash_fn</tt></a> (see <a href=
360    "concepts.html#concepts_null_policies">Interface::Concepts::Null
361    Policy Classes</a>), then <tt>Comb_Hash_Fn</tt> is taken to be
362    a ranged-hash function. Figure <a href=
363    "#hash_range_hashing_seq_diagram2">Insert hash sequence diagram
364    with a null hash policy</a> shows an <tt>insert</tt> sequence
365    diagram. The user inserts an element (point A), the container
366    transforms the key into a position using the combining functor
367    (points B and C).</p>
368
369    <h6 class="c1"><a name="hash_range_hashing_seq_diagram2" id=
370    "hash_range_hashing_seq_diagram2"><img src=
371    "hash_range_hashing_seq_diagram2.png" alt=
372    "no image" /></a></h6>
373
374    <h6 class="c1">Insert hash sequence diagram with a null hash
375    policy.</h6>
376
377    <h4>Probing Tables</h4>
378
379    <p><a href=
380    "gp_hash_table.html"></a><tt>gp_hash_table</tt> is
381    parametrized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
382    <tt>Comb_Probe_Fn</tt>. As before, if <tt>Hash_Fn</tt> and
383    <tt>Probe_Fn</tt> are, respectively, <a href=
384    "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href=
385    "null_probe_fn.html"><tt>null_probe_fn</tt></a>, then
386    <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise,
387    <tt>Hash_Fn</tt> is a hash functor, <tt>Probe_Fn</tt> is a
388    functor for offsets from a hash value, and
389    <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a
390    sequence of positions within the table.</p>
391
392    <h4>Pre-Defined Policies</h4>
393
394    <p><tt>pb_ds</tt> contains some pre-defined classes
395    implementing range-hashing and probing functions:</p>
396
397    <ol>
398      <li><a href=
399      "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
400      and <a href=
401      "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
402      are range-hashing functions based on a bit-mask and a modulo
403      operation, respectively.</li>
404
405      <li><a href=
406      "linear_probe_fn.html"><tt>linear_probe_fn</tt></a>, and
407      <a href=
408      "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are
409      a linear probe and a quadratic probe function,
410      respectively.</li>
411    </ol>Figure <a href="#hash_policy_cd">Hash policy class
412    diagram</a> shows a class diagram.
413
414    <h6 class="c1"><a name="hash_policy_cd" id=
415    "hash_policy_cd"><img src="hash_policy_cd.png" alt=
416    "no image" /></a></h6>
417
418    <h6 class="c1">Hash policy class diagram.</h6>
419
420    <h2><a name="resize_policies" id="resize_policies">Resize
421    Policies</a></h2>
422
423    <h3><a name="general" id="general">General Terms</a></h3>
424
425    <p>Hash-tables, as opposed to trees, do not naturally grow or
426    shrink. It is necessary to specify policies to determine how
427    and when a hash table should change its size. Usually, resize
428    policies can be decomposed into orthogonal policies:</p>
429
430    <ol>
431      <li>A <i>size policy</i> indicating <i>how</i> a hash table
432      should grow (<i>e.g.,</i> it should multiply by powers of
433      2).</li>
434
435      <li>A <i>trigger policy</i> indicating <i>when</i> a hash
436      table should grow (<i>e.g.,</i> a load factor is
437      exceeded).</li>
438    </ol>
439
440    <h3><a name="size_policies" id="size_policies">Size
441    Policies</a></h3>
442
443    <p>Size policies determine how a hash table changes size. These
444    policies are simple, and there are relatively few sensible
445    options. An exponential-size policy (with the initial size and
446    growth factors both powers of 2) works well with a mask-based
447    range-hashing function (see <a href=
448    "#hash_policies">Range-Hashing Policies</a>), and is the
449    hard-wired policy used by Dinkumware [<a href=
450    "references.html#dinkumware_stl">dinkumware_stl</a>]. A
451    prime-list based policy works well with a modulo-prime range
452    hashing function (see <a href="#hash_policies">Range-Hashing
453    Policies</a>), and is the hard-wired policy used by SGI's
454    implementation [<a href=
455    "references.html#sgi_stl">sgi_stl</a>].</p>
456
457    <h3><a name="trigger_policies" id="trigger_policies">Trigger
458    Policies</a></h3>
459
460    <p>Trigger policies determine when a hash table changes size.
461    Following is a description of two policies: <i>load-check</i>
462    policies, and collision-check policies.</p>
463
464    <p>Load-check policies are straightforward. The user specifies
465    two factors, <i>&alpha;<sub>min</sub></i> and
466    <i>&alpha;<sub>max</sub></i>, and the hash table maintains the
467    invariant that</p>
468
469    <p><i><a name="load_factor_min_max" id=
470    "load_factor_min_max">&alpha;<sub>min</sub> &le; (number of
471    stored elements) / (hash-table size) &le;
472    &alpha;<sub>max</sub></a></i> (1) .</p>
473
474    <p>Collision-check policies work in the opposite direction of
475    load-check policies. They focus on keeping the number of
476    collisions moderate and hoping that the size of the table will
477    not grow very large, instead of keeping a moderate load-factor
478    and hoping that the number of collisions will be small. A
479    maximal collision-check policy resizes when the longest
480    probe-sequence grows too large.</p>
481
482    <p>Consider Figure <a href="#balls_and_bins">Balls and
483    bins</a>. Let the size of the hash table be denoted by
484    <i>m</i>, the length of a probe sequence be denoted by
485    <i>k</i>, and some load factor be denoted by &alpha;. We would
486    like to calculate the minimal length of <i>k</i>, such that if
487    there were <i>&alpha; m</i> elements in the hash table, a probe
488    sequence of length <i>k</i> would be found with probability at
489    most <i>1/m</i>.</p>
490
491    <h6 class="c1"><a name="balls_and_bins" id=
492    "balls_and_bins"><img src="balls_and_bins.png" alt=
493    "no image" /></a></h6>
494
495    <h6 class="c1">Balls and bins.</h6>
496
497    <p>Denote the probability that a probe sequence of length
498    <i>k</i> appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the
499    length of the probe sequence of bin <i>i</i> by
500    <i>l<sub>i</sub></i>, and assume uniform distribution. Then</p>
501
502    <p><a name="prob_of_p1" id=
503    "prob_of_p1"><i>p<sub>1</sub></i></a> = (3)</p>
504
505    <p class="c2"><b>P</b>(l<sub>1</sub> &ge; k) =</p>
506
507    <p><i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1
508    ) &le;</i> (a)</p>
509
510    <p><i>e ^ ( - ( &alpha; ( k / &alpha; - 1 )<sup>2</sup> ) /2
511    )</i> ,</p>
512
513    <p>where (a) follows from the Chernoff bound [<a href=
514    "references.html#motwani95random">motwani95random</a>]. To
515    calculate the probability that <i>some</i> bin contains a probe
516    sequence greater than <i>k</i>, we note that the
517    <i>l<sub>i</sub></i> are negatively-dependent [<a href=
518    "references.html#dubhashi98neg">dubhashi98neg</a>]. Let
519    <i><b>I</b>(.)</i> denote the indicator function. Then</p>
520
521    <p><a name="at_least_k_i_n_some_bin" id=
522    "at_least_k_i_n_some_bin"><i><b>P</b>( exists<sub>i</sub>
523    l<sub>i</sub> &ge; k ) =</i> (3)</a></p>
524
525    <p class="c2"><b>P</b> ( &sum; <sub>i = 1</sub><sup>m</sup>
526    <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1 ) =</p>
527
528    <p><i><b>P</b> ( &sum; <sub>i = 1</sub><sup>m</sup> <b>I</b> (
529    l<sub>i</sub> &ge; k ) &ge; m p<sub>1</sub> ( 1 + 1 / (m
530    p<sub>1</sub>) - 1 ) ) &le;</i> (a)</p>
531
532    <p class="c2">e ^ ( ( - m p<sub>1</sub> ( 1 / (m p<sub>1</sub>)
533    - 1 ) <sup>2</sup> ) / 2 ) ,</p>
534
535    <p>where (a) follows from the fact that the Chernoff bound can
536    be applied to negatively-dependent variables [<a href=
537    "references.html#dubhashi98neg">dubhashi98neg</a>]. Inserting
538    <a href="#prob_of_p1">(2)</a> into <a href=
539    "#at_least_k_i_n_some_bin">(3)</a>, and equating with
540    <i>1/m</i>, we obtain</p>
541
542    <p><i>k ~ &radic; ( 2 &alpha;</i> ln <i>2 m</i> ln<i>(m) )
543    )</i> .</p>
544
545    <h3><a name="imp_pb_ds" id="imp_pb_ds">Implementation</a></h3>
546
547    <p>This sub-subsection describes the implementation of the
548    above in <tt>pb_ds</tt>. It first describes resize policies and
549    their decomposition into trigger and size policies, then
550    describes pre-defined classes, and finally discusses controlled
551    access the policies' internals.</p>
552
553    <h4>Resize Policies and Their Decomposition</h4>
554
555    <p>Each hash-based container is parametrized by a
556    <tt>Resize_Policy</tt> parameter; the container derives
557    <tt><b>public</b></tt>ly from <tt>Resize_Policy</tt>. For
558    example:</p>
559    <pre>
560<a href="cc_hash_table.html">cc_hash_table</a>&lt;
561    <b>typename</b> Key,
562    <b>typename</b> Mapped,
563    ...
564    <b>typename</b> Resize_Policy
565    ...&gt; :
566        <b>public</b> Resize_Policy
567</pre>
568
569    <p>As a container object is modified, it continuously notifies
570    its <tt>Resize_Policy</tt> base of internal changes
571    (<i>e.g.</i>, collisions encountered and elements being
572    inserted). It queries its <tt>Resize_Policy</tt> base whether
573    it needs to be resized, and if so, to what size.</p>
574
575    <p>Figure <a href="#insert_resize_sequence_diagram1">Insert
576    resize sequence diagram</a> shows a (possible) sequence diagram
577    of an insert operation. The user inserts an element; the hash
578    table notifies its resize policy that a search has started
579    (point A); in this case, a single collision is encountered -
580    the table notifies its resize policy of this (point B); the
581    container finally notifies its resize policy that the search
582    has ended (point C); it then queries its resize policy whether
583    a resize is needed, and if so, what is the new size (points D
584    to G); following the resize, it notifies the policy that a
585    resize has completed (point H); finally, the element is
586    inserted, and the policy notified (point I).</p>
587
588    <h6 class="c1"><a name="insert_resize_sequence_diagram1" id=
589    "insert_resize_sequence_diagram1"><img src=
590    "insert_resize_sequence_diagram1.png" alt=
591    "no image" /></a></h6>
592
593    <h6 class="c1">Insert resize sequence diagram.</h6>
594
595    <p>In practice, a resize policy can be usually orthogonally
596    decomposed to a size policy and a trigger policy. Consequently,
597    the library contains a single class for instantiating a resize
598    policy: <a href=
599    "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
600    is parametrized by <tt>Size_Policy</tt> and
601    <tt>Trigger_Policy</tt>, derives <tt><b>public</b></tt>ly from
602    both, and acts as a standard delegate [<a href=
603    "references.html#gamma95designpatterns">gamma95designpatterns</a>]
604    to these policies.</p>
605
606    <p>Figures <a href="#insert_resize_sequence_diagram2">Standard
607    resize policy trigger sequence diagram</a> and <a href=
608    "#insert_resize_sequence_diagram3">Standard resize policy size
609    sequence diagram</a> show sequence diagrams illustrating the
610    interaction between the standard resize policy and its trigger
611    and size policies, respectively.</p>
612
613    <h6 class="c1"><a name="insert_resize_sequence_diagram2" id=
614    "insert_resize_sequence_diagram2"><img src=
615    "insert_resize_sequence_diagram2.png" alt=
616    "no image" /></a></h6>
617
618    <h6 class="c1">Standard resize policy trigger sequence
619    diagram.</h6>
620
621    <h6 class="c1"><a name="insert_resize_sequence_diagram3" id=
622    "insert_resize_sequence_diagram3"><img src=
623    "insert_resize_sequence_diagram3.png" alt=
624    "no image" /></a></h6>
625
626    <h6 class="c1">Standard resize policy size sequence
627    diagram.</h6>
628
629    <h4>Pre-Defined Policies</h4>
630
631    <p>The library includes the following
632    instantiations of size and trigger policies:</p>
633
634    <ol>
635      <li><a href=
636      "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
637      implements a load check trigger policy.</li>
638
639      <li><a href=
640      "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
641      implements a collision check trigger policy.</li>
642
643      <li><a href=
644      "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>
645      implements an exponential-size policy (which should be used
646      with mask range hashing).</li>
647
648      <li><a href=
649      "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>
650      implementing a size policy based on a sequence of primes
651      [<a href="references.html#sgi_stl">sgi_stl</a>] (which should
652      be used with mod range hashing</li>
653    </ol>
654
655    <p>Figure <a href="#resize_policy_cd">Resize policy class
656    diagram</a> gives an overall picture of the resize-related
657    classes. <a href=
658    "basic_hash_table.html"><tt>basic_hash_table</tt></a>
659    is parametrized by <tt>Resize_Policy</tt>, which it subclasses
660    publicly. This class is currently instantiated only by <a href=
661    "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
662    <a href=
663    "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
664    itself is parametrized by <tt>Trigger_Policy</tt> and
665    <tt>Size_Policy</tt>. Currently, <tt>Trigger_Policy</tt> is
666    instantiated by <a href=
667    "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
668    or <a href=
669    "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>;
670    <tt>Size_Policy</tt> is instantiated by <a href=
671    "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
672    or <a href=
673    "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.</p>
674
675    <h6 class="c1"><a name="resize_policy_cd" id=
676    "resize_policy_cd"><img src="resize_policy_cd.png" alt=
677    "no image" /></a></h6>
678
679    <h6 class="c1">Resize policy class diagram.</h6>
680
681    <h4>Controlled Access to Policies' Internals</h4>
682
683    <p>There are cases where (controlled) access to resize
684    policies' internals is beneficial. <i>E.g.</i>, it is sometimes
685    useful to query a hash-table for the table's actual size (as
686    opposed to its <tt>size()</tt> - the number of values it
687    currently holds); it is sometimes useful to set a table's
688    initial size, externally resize it, or change load factors.</p>
689
690    <p>Clearly, supporting such methods both decreases the
691    encapsulation of hash-based containers, and increases the
692    diversity between different associative-containers' interfaces.
693    Conversely, omitting such methods can decrease containers'
694    flexibility.</p>
695
696    <p>In order to avoid, to the extent possible, the above
697    conflict, the hash-based containers themselves do not address
698    any of these questions; this is deferred to the resize policies,
699    which are easier to change or replace. Thus, for example,
700    neither <a href=
701    "cc_hash_table.html"><tt>cc_hash_table</tt></a> nor
702    <a href=
703    "gp_hash_table.html"><tt>gp_hash_table</tt></a>
704    contain methods for querying the actual size of the table; this
705    is deferred to <a href=
706    "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.</p>
707
708    <p>Furthermore, the policies themselves are parametrized by
709    template arguments that determine the methods they support
710    ([<a href=
711    "references.html#alexandrescu01modern">alexandrescu01modern</a>]
712    shows techniques for doing so). <a href=
713    "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
714    is parametrized by <tt>External_Size_Access</tt> that
715    determines whether it supports methods for querying the actual
716    size of the table or resizing it. <a href=
717    "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
718    is parametrized by <tt>External_Load_Access</tt> that
719    determines whether it supports methods for querying or
720    modifying the loads. <a href=
721    "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
722    is parametrized by <tt>External_Load_Access</tt> that
723    determines whether it supports methods for querying the
724    load.</p>
725
726    <p>Some operations, for example, resizing a container at
727    run time, or changing the load factors of a load-check trigger
728    policy, require the container itself to resize. As mentioned
729    above, the hash-based containers themselves do not contain
730    these types of methods, only their resize policies.
731    Consequently, there must be some mechanism for a resize policy
732    to manipulate the hash-based container. As the hash-based
733    container is a subclass of the resize policy, this is done
734    through virtual methods. Each hash-based container has a
735    <tt><b>private</b></tt> <tt><b>virtual</b></tt> method:</p>
736    <pre>
737<b>virtual void</b>
738    do_resize
739    (size_type new_size);
740</pre>
741
742    <p>which resizes the container. Implementations of
743    <tt>Resize_Policy</tt> can export public methods for resizing
744    the container externally; these methods internally call
745    <tt>do_resize</tt> to resize the table.</p>
746
747    <h2><a name="policy_interaction" id="policy_interaction">Policy
748    Interaction</a></h2>
749
750    <p>Hash-tables are unfortunately especially susceptible to
751    choice of policies. One of the more complicated aspects of this
752    is that poor combinations of good policies can form a poor
753    container. Following are some considerations.</p>
754
755    <h3><a name="policy_interaction_probe_size_trigger" id=
756    "policy_interaction_probe_size_trigger">Probe Policies, Size
757    Policies, and Trigger Policies</a></h3>
758
759    <p>Some combinations do not work well for probing containers.
760    For example, combining a quadratic probe policy with an
761    exponential size policy can yield a poor container: when an
762    element is inserted, a trigger policy might decide that there
763    is no need to resize, as the table still contains unused
764    entries; the probe sequence, however, might never reach any of
765    the unused entries.</p>
766
767    <p>Unfortunately, <tt>pb_ds</tt> cannot detect such problems at
768    compilation (they are halting reducible). It therefore defines
769    an exception class <a href=
770    "insert_error.html"><tt>insert_error</tt></a> to throw an
771    exception in this case.</p>
772
773    <h3><a name="policy_interaction_hash_trigger" id=
774    "policy_interaction_hash_trigger">Hash Policies and Trigger
775    Policies</a></h3>
776
777    <p>Some trigger policies are especially susceptible to poor
778    hash functions. Suppose, as an extreme case, that the hash
779    function transforms each key to the same hash value. After some
780    inserts, a collision detecting policy will always indicate that
781    the container needs to grow.</p>
782
783    <p>The library, therefore, by design, limits each operation to
784    one resize. For each <tt>insert</tt>, for example, it queries
785    only once whether a resize is needed.</p>
786
787    <h3><a name="policy_interaction_eq_sth_hash" id=
788    "policy_interaction_eq_sth_hash">Equivalence Functors, Storing
789    Hash Values, and Hash Functions</a></h3>
790
791    <p><a href=
792    "cc_hash_table.html"><tt>cc_hash_table</tt></a> and
793    <a href=
794    "gp_hash_table.html"><tt>gp_hash_table</tt></a> are
795    parametrized by an equivalence functor and by a
796    <tt>Store_Hash</tt> parameter. If the latter parameter is
797    <tt><b>true</b></tt>, then the container stores with each entry
798    a hash value, and uses this value in case of collisions to
799    determine whether to apply a hash value. This can lower the
800    cost of collision for some types, but increase the cost of
801    collisions for other types.</p>
802
803    <p>If a ranged-hash function or ranged probe function is
804    directly supplied, however, then it makes no sense to store the
805    hash value with each entry. <tt>pb_ds</tt>'s container will
806    fail at compilation, by design, if this is attempted.</p>
807
808    <h3><a name="policy_interaction_size_load_check" id=
809    "policy_interaction_size_load_check">Size Policies and
810    Load-Check Trigger Policies</a></h3>
811
812    <p>Assume a size policy issues an increasing sequence of sizes
813    <i>a, a q, a q<sup>1</sup>, a q<sup>2</sup>, ...</i> For
814    example, an exponential size policy might issue the sequence of
815    sizes <i>8, 16, 32, 64, ...</i></p>
816
817    <p>If a load-check trigger policy is used, with loads
818    <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>,
819    respectively, then it is a good idea to have:</p>
820
821    <ol>
822      <li><i>&alpha;<sub>max</sub> ~ 1 / q</i></li>
823
824      <li><i>&alpha;<sub>min</sub> &lt; 1 / (2 q)</i></li>
825    </ol>
826
827    <p>This will ensure that the amortized hash cost of each
828    modifying operation is at most approximately 3.</p>
829
830    <p><i>&alpha;<sub>min</sub> ~ &alpha;<sub>max</sub></i> is, in
831    any case, a bad choice, and <i>&alpha;<sub>min</sub> &gt;
832    &alpha;<sub>max</sub></i> is horrendous.</p>
833  </div>
834</body>
835</html>
836