197403Sobrien<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
297403Sobrien<html>
3132720Skan    <head>
497403Sobrien        <title>Hash-Based Containers</title>
597403Sobrien        <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
697403Sobrien        <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
797403Sobrien    </head>
897403Sobrien
997403Sobrien<body bgcolor = "white">
1097403Sobrien
1197403Sobrien<h1>Hash-Based Containers</h1>
1297403Sobrien
1397403Sobrien<p>
1497403Sobrien    This section describes hash-based containers. It is organized
1597403Sobrienas follows.
1697403Sobrien</p>
1797403Sobrien
18169691Skan<ol>
1997403Sobrien	<li>
2097403Sobrien		<a href = "#overview">Overview</a> is an overview.
2197403Sobrien	</li>
2297403Sobrien	<li>
2397403Sobrien		<a href = "#hash_policies">Hash Policies</a> discusses
2497403Sobrien	hash policies.
2597403Sobrien	</li>
2697403Sobrien	<li>
2797403Sobrien		<a href = "#resize_policies">Resize Policies</a> discusses
2897403Sobrien	resize policies.
2997403Sobrien	</li>
3097403Sobrien	<li>
3197403Sobrien		<a href = "#policy_interaction">Policy Interaction</a> discusses
3297403Sobrien	interaction between policies.
3397403Sobrien	</li>
34132720Skan</ol>
35132720Skan
3697403Sobrien
3797403Sobrien
3897403Sobrien<h2><a name = "overview">Overview</a></h2>
3997403Sobrien
4097403Sobrien
4197403Sobrien<p>
42	Figure
43<a href = "#hash_cd">Hash-based containers</a>
44	shows the container-hierarchy; the hash-based containers are circled.
45<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
46is a collision-chaining hash-based container;
47<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
48is a (general) probing hash-based container.
49</p>
50
51<h6 align = "center">
52<a name = "hash_cd">
53<img src = "hash_cd.jpg" width = "70%" alt = "no image">
54</h6>
55<h6 align = "center">
56</a>
57Hash-based containers.
58</h6>
59
60<p>
61	The collision-chaining hash-based container has the following declaration.
62</p>
63<pre>
64<b>template</b>&lt;
65	<b>typename</b> Key,
66	<b>typename</b> Data,
67	<b>class</b> Hash_Fn = std::hash&lt;Key&gt;,
68	<b>class</b> Eq_Fn = std::equal_to&lt;Key&gt;,
69	<b>class</b> Comb_Hash_Fn =
70		<a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
71	<b>class</b> Resize_Policy = <i>default explained below.</i>
72	<b>bool</b> Store_Hash = <b>false</b>,
73	<b>class</b> Allocator =
74		std::allocator&lt;<b>char</b>&gt; &gt;
75<b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>;
76</pre>
77
78<p>
79	The parameters have the following meaning:
80</p>
81<ol>
82	<li> <tt>Key</tt> is the key type.
83	</li>
84	<li> <tt>Data</tt> is the data-policy, and is explained in
85<a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>.
86	</li>
87	<li> <tt>Hash_Fn</tt> is a key hashing functor.</li>
88	<li> <tt>Eq_Fn</tt> is a key equivalence functor.</li>
89	<li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it
90describes how to translate hash values into positions within the table.
91This is described in
92<a name = "#hash_policies">Hash Policies</a>.</li>
93	</li>
94	<li> <tt>Resize_Policy</tt> describes how a container object should
95change its internal size. This is described in
96<a name = #resize_policies">Resize Policies</a>.</li>
97	<li> <tt>Store_Hash</tt> indicates whether the hash value should
98be stored with each entry. This is described in
99<a name = "#policy_interaction">Policy Interaction</a>.</li>
100	<li> <tt>Allocator</tt> is (surprisingly) an allocator type.
101	</li>
102</ol>
103
104<p>
105	The probing hash-based container has the following declaration.
106</p>
107<pre>
108<b>template</b>&lt;
109	<b>typename</b> Key,
110	<b>typename</b> Data,
111	<b>class</b> Hash_Fn =
112		std::hash&lt;
113			Key&gt;,
114	<b>class</b> Eq_Fn =
115		std::equal_to&lt;
116			Key&gt;,
117	<b>class</b> Comb_Probe_Fn =
118		<a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
119	<b>class</b> Probe_Fn = <i>default explained below.</i>
120	<b>class</b> Resize_Policy = <i>default explained below.</i>
121	<b>bool</b> Store_Hash = <b>false</b>,
122	<b>class</b> Allocator =
123		std::allocator&lt;<b>char</b>&gt; &gt;
124<b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>;
125</pre>
126
127<p>
128	The parameters are identical to those of the collision-chaining container, except
129for the following.
130</p>
131<ol>
132	<li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into
133a sequence of positions within the table.
134	</li>
135	<li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li>
136</ol>
137
138
139<p>
140	Some of the default template values depend on the values of other parameters,
141and are explained in
142<a name = "#policy_interaction">Policy Interaction</a>.
143</p>
144
145<h2><a name = "hash_policies">Hash Policies</h2></a>
146<p>
147    This subsection describes hash policies. It is organized as follows:
148</p>
149<ol>
150    <li> <a href = "#general_terms">General Terms</a>  describes
151            some general terms.
152    </li>
153    <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a>
154        describes range-hasing functions.</li>
155    <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a>
156        describes ranged-hash functions. </li>
157    <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a>
158            describes the implementation of these concepts in <tt>pb_assoc</tt>.
159    </li>
160</ol>
161
162
163<h3><a name="general_terms">General Terms</a></h3>
164
165<p>
166    There
167are actually three functions involved in transforming a key into a hash-table's
168position (see Figure
169<a href = "#hash_ranged_hash_range_hashing_fns">
170Hash runctions, ranged-hash functions, and range-hashing functions
171</a>):
172</p>
173<ol>
174    <li>
175        A <i>ranged-hash</i> function, which maps keys into an interval of the
176        non-negative integrals. This is the function actually required by the
177        hash-table algorithm.
178    </li>
179    <li>
180        A hash function, which maps keys into non-negative integral types. This is
181        typically specified by the writer of the key class.
182    </li>
183    <li>
184        A <i>range-hashing</i> function, which maps non-negative integral types into an
185        interval of non-negative integral types.
186    </li>
187</ol>
188
189<h6 align = "center">
190<a name = "hash_ranged_hash_range_hashing_fns">
191<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image">
192</h6>
193<h6 align = "center">
194</a>
195Hash runctions, ranged-hash functions, and range-hashing functions.
196</h6>
197
198<p>
199    Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
200    characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
201    into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
202    value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
203    function
204</p>
205<p>
206    <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
207</p>
208<p>
209    such that for any <i>u</i> in <i>U</i>
210,
211</p>
212<p>
213    <i>0 &le; f(u, m) &le; m - 1 </i>,
214</p>
215<p>
216    and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
217    One common solution is to use the composition of the hash function
218</p>
219<p>
220    <i>h : U &rarr; Z<sub>+</sub> </i>,
221</p>
222<p>
223    which maps elements of <i>U</i> into the non-negative integrals, and
224</p>
225<p>
226    <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
227</p>
228<p>
229    which maps a non-negative hash value, and a non-negative range upper-bound into
230    a non-negative integral in the range between 0 (inclusive) and the range upper
231    bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
232</p>
233<p>
234    <i>0 &le; g(r, m) &le; m - 1 </i>.
235</p>
236<p>
237    The resulting ranged-hash function, is
238</p>
239<p>
240    <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
241    </i>(1) .
242</p>
243
244<p>
245    From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
246    always be composed (however the converse is not true). The STL's hash-based
247    containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed.
248</p>
249
250
251<p>
252    The above describes the case where a key is to be mapped into a <i>single
253position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
254In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
255within a table, <i>e.g.</i>, in a probing table.
256</p>
257<p>
258    Similar terms apply in this case: the table requires a <i>ranged probe</i>
259function, mapping a key into a sequence of positions withing the table. This is
260typically acheived by composing a <i>hash function</i> mapping the key
261into a non-negative integral type, a <i>probe</i> function transforming the
262hash value into a sequence of hash values, and a <i>range-hashing</i> function
263transforming the sequence of hash values into a sequence of positions.
264</p>
265
266
267<h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3>
268
269<p>
270    Some common choices for range-hashing functions are the division,
271    multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
272    defined as
273</p>
274<p>
275    <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
276</p>
277<p>
278    <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
279</p>
280<p>
281    and
282</p>
283<p>
284    <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
285</p>
286<p>
287respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
288powers of 2), and some <i>a</i>. Each of these range-hashing functions works
289best for some different setting.
290</p>
291<p>
292    The division method <a href="#division_method">(2)</a> is a very common
293    choice. However, even this single method can be implemented in two very
294    different ways. It is possible to implement <a href="#division_method">(2)</a>
295    using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
296    level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
297    2), <i>i.e.</i>,
298</p>
299<p>
300    <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
301</p>
302<p>
303    and
304</p>
305<p>
306    <a name="eqn:division_method_bit_mask">
307    <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
308    </i>
309        for some<i> k) </i></a>(4) ,
310</p>
311<p>
312    respectively.
313</p>
314<p>
315    The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
316    has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
317    is affected by all the bits of <i>r</i> (minimizing the chance of collision).
318    It has the disadvantage of using the costly modulo operation. This method is
319    hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
320</p>
321
322<p>
323    The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
324    has the advantage of relying on the fast bitwise and operation. It has the
325    disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
326    This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
327</p>
328
329
330
331
332<h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3>
333
334<p>
335    In some less frequent cases it is beneficial to allow the client to
336directly specify a ranged-hash hash function. It is true, that the writer of
337the ranged-hash function cannot rely on the values of <i>m</i> having specific
338numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
339since the values of <i>m</i> are determined by a resize policy with possibly
340orthogonal considerations.
341</p>
342
343<p>
344	There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing
345[<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second
346is when the values of <i>m</i> can be used to estimate the
347"general" number of distinct values required. This is described in the following.
348</p>
349
350<p>
351    Let
352</p>
353
354<p>
355    <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
356</p>
357
358<p>
359    be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
360Consider the following ranged-hash function:
361</p>
362
363<p>
364    <a name="eqn:total_string_dna_hash">
365        <i>
366            f<sub>1</sub>(s, m) =
367            &sum; <sub>i =
368            0</sub><sup>t   - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
369    </a> (5) ,
370</p>
371
372<p>
373    where <i>a</i> is some non-negative integral value. This is the standard
374string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
375Its advantage is that it takes into account all of the characters of the
376string.
377</p>
378
379<p>
380    Now assume that <i>s</i> is the string representation of a of a long DNA
381sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
382entire string might be prohibitively expensive. A possible alternative might be
383to use only the first <i>k</i> characters of the string, where
384</p>
385
386<p>
387    k <sup>|S|</sup> &ge; m ,
388</p>
389<p>
390    <i>i.e.</i>, using the hash function
391</p>
392<p>
393    <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
394                - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
395</p>
396<p>
397    requiring scanning over only
398</p>
399<p>
400    <i>k = </i>log<i><sub>4</sub>( m ) </i>
401</p>
402<p>
403    characters.
404</p>
405<p>
406    Other more elaborate hash-functions might scan <i>k</i> characters starting at
407    a random position (determined at each resize), or scanning <i>k</i> random
408    positions (determined at each resize), <i>i.e.</i>, using
409</p>
410<p>
411    <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
412        s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
413</p>
414<p>
415    or
416</p>
417<p>
418    <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
419        a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
420</p>
421<p>
422<p>
423    respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
424    (inclusive) range <i>[0,...,t-1]</i>.
425</p>
426
427
428<h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3>
429
430<p>
431<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is
432parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor
433and a combining hash functor, respectively.
434</p>
435
436<p>
437	For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
438one of the
439<a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>,
440then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor.
441The container will synthesize a ranged-hash functor from both. For example, Figure
442<a href = "#hash_range_hashing_seq_diagram">
443Insert hash sequence diagram
444</a>
445shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
446the container transforms the key into a non-negative integral using the hash
447functor (points B and C), and transforms the result into a position
448using the combining functor (points D and E).
449</p>
450
451<h6 align = "center">
452<a name = "hash_range_hashing_seq_diagram">
453<img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image">
454</a>
455</h6>
456<h6 align = "center">
457Insert hash sequence diagram.
458</h6>
459
460
461<p>
462    <tt>pb_assoc</tt> contains the following range-hashing policies:
463</p>
464
465<ol>
466    <li>
467<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
468and
469<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
470are range-hashing functions based on a bit-mask and a modulo operation, respectively.
471    </li>
472</ol>
473
474
475<p>
476    If <tt>Comb_Hash_Fn</tt> is instantiated by
477<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
478and a combining-hash functor, the container treats
479the combining hash functor as a ranged-hash function. For example, Figure
480<a href = "#hash_range_hashing_seq_diagram2">
481Insert hash sequence diagram with a null combination policy
482</a>
483shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
484the container transforms the key into a position
485using the combining functor (points B and C).
486</p>
487
488
489<h6 align = "center">
490<a name = "hash_range_hashing_seq_diagram2">
491<img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image">
492</a>
493</h6>
494<h6 align = "center">
495Insert hash sequence diagram with a null combination policy.
496</h6>
497
498<p>
499	Similarly,
500<a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt>
501is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
502<tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt>
503and <tt>Comb_Probe_Fn</tt> are, respectively,
504<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and
505<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>,
506then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt>
507is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value,
508and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions
509within the table.
510</p>
511
512<p>
513	<tt>pb_assoc</tt> contains the following probe policies:
514</p>
515
516<ol>
517    <li>
518<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe
519function.
520    </li>
521	<li>
522<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is
523a quadratic probe function.
524    </li>
525</ol>
526
527
528
529
530
531
532
533
534
535<h2><a name = "resize_policies">Resize Policies</a></h2>
536
537<p>
538    This subsection describes resize policies. It is organized as follows:
539</p>
540
541<ol>
542    <li> <a href = "#general">General Terms</a> describes general
543        terms.
544    </li>
545    <li> <a href = "#size_policies">Size Policies</a>  describes size
546    policies.
547    </li>
548    <li> <a href = "#trigger_policies">Trigger Policies</a>  describes trigger
549    policies.
550    </li>   <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
551         describes the implementation of these concepts in <tt>pb_assoc</tt>.
552    </li>
553</ol>
554
555
556<h3><a name = "general">General Terms</a></h3>
557
558<p>
559    Hash-tables, as opposed to trees, do not naturally grow or shrink. It
560is necessary to specify policies to determine how and when a hash table should change
561its size.
562</p>
563
564<p>
565    In general, resize policies can be decomposed into (probably orthogonal)
566policies:
567</p>
568<ol>
569    <li> A <i>size policy</i> indicating <i>how</i> a hash table should
570grow (<i>e.g.,</i> it should multiply by powers of 2).
571    </li>
572    <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
573grow (<i>e.g.,</i> a load factor is exceeded).
574    </li>
575</ol>
576
577
578
579<h3><a name = "size_policies">Size Policies</a></h3>
580
581<p>
582    Size policies determine how a hash table
583changes size. These policies are simple, and there are relatively
584few sensible options. An exponential-size policy (with the initial
585size and growth factors both powers of 2) works well with a
586mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
587and is the
588hard-wired policy used by Dinkumware
589[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
590prime-list based policy works well with a modulo-prime range
591hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
592and is the
593hard-wired policy used by SGI's implementation
594[<a href = "references.html#sgi_stl">sgi_stl</a>].
595</p>
596
597
598
599
600<h3><a name = "trigger_policies">Trigger Policies</a></h3>
601
602<p>
603    Trigger policies determine when a hash table changes size.
604Following is a description of two polcies: <i>load-check</i>
605policies, and a collision-check policies.
606</p>
607
608<p>
609    Load-check policies are straightforward. The user
610specifies two factors, <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>, and
611the hash table maintains the invariant that
612</p>
613<p>
614    <i>
615        <a name = "eqn:load_factor_min_max">
616        &alpha;<sub>min</sub>
617        &le;
618        (number of stored elements) / (hash-table size)
619        &le;
620        &alpha;<sub>max</sub>
621        </a>
622    </i>
623    (1)
624    .
625</p>
626
627<p>
628    Collision-check policies work in the opposite direction of
629load-check policies. They focus on keeping the number of
630collisions moderate and hoping
631that the size of the table will not grow very large,
632instead of keeping a moderate load-factor and
633hoping that the number of collisions will be small.
634A
635maximal collision-check policy resizes when the shortest
636probe-sequence grows too large.
637</p>
638
639
640<p>
641    Consider Figure
642<a href = "#balls_and_bins">Balls and bins</a>.
643    Let the size of the hash table be denoted by <i>m</i>, the
644length of a probe sequence be denoted by <i>k</i>, and some load
645factor be denoted by &alpha;. We would like to calculate the
646minimal length of <i>k</i>, such that if there were <i>&alpha; m</i> elements
647in the hash table, a probe sequence of length <i>k</i> would be found
648with probability at most <i>1/m</i>.
649</p>
650
651<h6 align = "center">
652<a name = "balls_and_bins">
653<img src = "balls_and_bins.jpg" width = "70%" alt = "no image">
654</a>
655</h6>
656<h6 align = "center">
657Balls and bins.
658</h6>
659
660
661<p>
662    Denote the probability that a probe sequence of length <i>k</i>
663appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
664of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
665Then
666</p>
667    <p>
668    <a name = "eqn:prob_of_p1">
669        <i>p<sub>1</sub>
670        = </i>(3)
671    </a>
672    </p>
673    <p>
674    <i>
675    <b>P</b>(l<sub>1</sub> &ge; k)
676    =
677    </i>
678    </p>
679    <p>
680    <i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1 )
681    &le; </i>(a)
682    </p>
683    <p>
684    <i>
685    e
686    ^
687    (
688        -
689        (
690            &alpha; ( k / &alpha; - 1 )<sup>2</sup>
691        )
692        /2
693    )
694    </i>
695    ,
696</p>
697<p>
698    where (a) follows from the Chernoff bound
699[<a href = "references.html#motwani95random">motwani95random</a>].
700To
701calculate the probability that <i>some</i> bin contains a probe
702sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
703negatively-dependent
704[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
705Let <i><b>I</b>(.)</i>
706denote the indicator function. Then
707    <p>
708    <a name = "eqn:at_least_k_i_n_some_bin">
709        <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> &ge; k )
710        = </i>(3)
711    </a>
712    </p>
713    <p>
714    <i>
715   <b>P</b>
716    (
717        &sum; <sub>i = 1</sub><sup>m</sup>
718        <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1
719    )
720    =
721    </i>
722    </p>
723    <p>
724    <i>
725    <b>P</b>
726    (
727        &sum; <sub>i = 1</sub><sup>m</sup>
728        <b>I</b>
729        (
730            l<sub>i</sub> &ge; k
731        )
732        &ge;
733        m  p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
734    )
735    &le; </i>(a)
736    </p>
737    <p>
738    <i>
739    e
740    ^
741    (
742        (
743            -
744            m p<sub>1</sub>
745            (
746                1 / (m p<sub>1</sub>) - 1
747            )
748            <sup>2</sup>
749        )
750        /
751        2
752    )
753    ,
754    </i>
755    </p>
756<p>
757where (a) follows from the fact that the Chernoff bound can be
758applied to negatively-dependent variables
759[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
760Inserting <a href = "#prob_of_p1">(2)</a> into
761<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
762we obtain
763</p>
764<p>
765    <i>
766    k
767    ~
768    &radic;
769    (
770        2 &alpha; </i>ln<i> 2 m </i>ln<i>(m) )
771    )
772    </i>
773    .
774</p>
775
776
777
778
779
780
781
782
783
784<h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3>
785
786<p>
787    The resize policies in the previous subsection are conceptually straightforward. The design
788of hash-based containers' size-related interface is complicated by some factors.
789</p>
790<ol>
791    <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
792distinction between the number of entries the container holds and the number of entries it is using. This,
793of course, is not the case for hash-based containers. Moreover, even describing the
794"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
795holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
796    </li>
797    <li>
798    The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
799maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
800    <ol>
801        <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
802        </li>
803        <li>
804            In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
805        </li>
806    </ol>
807    </li>
808    <li>
809    Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
810discusses the previous concepts.)
811    </li>
812</ol>
813
814<p>
815    Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
816</p>
817
818
819
820<p>
821        This library attempts to address these types of problems by delegating all size-related functionality to
822policy classes. Hash-based containers
823are parameterized by a resize-policy class (among others), and derive publicly from
824the resize-policy class
825[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
826 <i>E.g.</i>, a collision-chaining
827hash table is defined as follows:
828</p>
829<pre>
830cc_ht_map&lt;
831  <b>class</b> Key,
832  <b>class</b> Data,
833  ...
834  <b>class</b> Resize_Policy
835  ...&gt; :
836    <b>public</b> Resize_Policy
837</pre>
838
839<p>
840    The containers themselves lack any functionality or public interface for manipulating sizes. A container
841object merely forwards events to its resize policy object and queries it for needed actions.
842</p>
843
844<p>
845    Figure
846<a href = "#insert_resize_sequence_diagram1">
847Insert resize sequence diagram
848</a>
849shows a (possible) sequence diagram of an insert operation.
850The user inserts an element; the hash table
851notifies its resize policy that a search has started (point A);
852in this case, a single collision is encountered - the table
853notifies its resize policy of this (point B); the container
854finally notifies its resize policy that the search has ended (point C);
855it then queries its resize policy whether a resize is needed, and if so,
856what is the new size (points D to G); following the resize, it notifies
857the policy that a resize has completed (point H); finally, the element
858is inserted, and the policy notified (point I).
859</p>
860
861<h6 align = "center">
862<a name = "insert_resize_sequence_diagram1">
863<img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image">
864</a>
865</h6>
866<h6 align = "center">
867Insert resize sequence diagram.
868</h6>
869
870<p>
871    This addresses, to some extent, the problems mentioned above:
872</p>
873<ol>
874    <li>
875        Different instantiations of range-hashing policies can be met with different instantiations of
876    resize policies.
877    </li>
878    <li>
879        Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
880    a container has no method for querying its actual size. It merely continuously forwards enough information to
881    its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design     the appropriate method. Also, a container has no methods for setting its size. It merely queries its
882resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
883supports a <tt><b>protected virtual</b></tt> function for external resize.
884    </li>
885</ol>
886
887<p>
888    The library contains a single class for instantiating a resize policy,
889<tt>pb_assoc</tt> contains
890a standard resize policy,
891<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly).
892In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
893queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
894([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
895changing between alternative interfaces at compile time.)
896</p>
897
898<p>
899As noted before,
900    size and trigger policies are usually orthogonal.
901<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
902is parameterized by size and trigger policies. For example,
903a collision-chaining hash table
904is typically be defined as follows:
905</p>
906<pre>
907cc_ht_map&lt;
908  key,
909  data,
910  ...
911  hash_standard_resize_policy&lt;
912    some_trigger_policy,
913    some_size_policy,
914    ...&gt; &gt;
915</pre>
916
917<p>
918 The sole function of
919<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
920 is to
921act as a standard delegator
922[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
923policies.
924
925<p>
926    Figures
927<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
928    and
929<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
930    show sequence diagrams illustrating the interaction between
931    the standard resize policy and its trigger and size policies, respectively.
932</p>
933
934<h6 align = "center">
935<a name = "insert_resize_sequence_diagram2">
936<img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image">
937</a>
938</h6>
939<h6 align = "center">
940Standard resize policy trigger sequence diagram.
941</h6>
942
943<h6 align = "center">
944<a name = "insert_resize_sequence_diagram3">
945<img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image">
946</a>
947</h6>
948<h6 align = "center">
949Standard resize policy size sequence diagram.
950</h6>
951
952<p>
953    The library (currently) supports the following instantiations of size
954and trigger policies:
955</p>
956
957<ol>
958    <li>
959        <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements
960    a load check trigger policy.
961    </li>
962    <li>
963        <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
964    implements a collision check trigger policy.
965    </li>
966    <li>
967<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens
968an exponential-size policy (which should be used with mask range hashing).
969    </li>
970    <li>
971<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing
972a size policy based on a sequence of primes
973[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
974    </li>
975</ol>
976
977<p>
978    The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
979</p>
980
981
982<p>
983    Figure
984<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
985of the resize-related classes.
986<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
987by <tt>Resize_Policy</tt>, from which it subclasses publicly
988[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
989This class is currently instantiated only by
990<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
991<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself
992is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
993Currently, <tt>Trigger_Policy</tt> is instantiated by
994<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
995or
996<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
997<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
998or
999<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.
1000</p>
1001
1002
1003<h6 align = "center">
1004<a name = "resize_policy_cd">
1005<img src = "resize_policy_cd.jpg" width = "70%" alt = "no image">
1006</a>
1007</h6>
1008<h6 align = "center">
1009Resize policy class diagram.
1010</h6>
1011
1012
1013
1014
1015<h2><a name = "#policy_interaction">Policy Interaction</a></h2>
1016
1017<p>
1018	Hash-tables are unfortunately susceptible to choice of policies. One
1019of the more complicated aspects of this is that poor combinations of good policies
1020can alter performance drastically. Following are some considerations.
1021</p>
1022
1023
1024
1025
1026
1027<h3>Range-Hashing Policies and Resize Policies</h3>
1028
1029<p>
1030</p>
1031
1032
1033<h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3>
1034
1035
1036<p>
1037<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
1038and
1039<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
1040are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt>
1041parameter. If the latter parameter is <tt><b>true</b></tt>, then
1042the container stores with each entry a hash value, and uses
1043this value in case of collisions to determine whether to apply a hash value.
1044This can lower the cost of collision for some types, but increase the cost of collisions for other types.
1045</p>
1046
1047<p>
1048	If a ranged-hash function or ranged probe function is directly supplied, however,
1049then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted.
1050</p>
1051
1052
1053
1054</body>
1055
1056</html>
1057