policy_data_structures.xml revision 1.3
1<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
2	 xml:id="manual.ext.containers.pbds" xreflabel="pbds">
3  <info>
4    <title>Policy-Based Data Structures</title>
5    <keywordset>
6      <keyword>ISO C++</keyword>
7      <keyword>policy</keyword>
8      <keyword>container</keyword>
9      <keyword>data</keyword>
10      <keyword>structure</keyword>
11      <keyword>associated</keyword>
12      <keyword>tree</keyword>
13      <keyword>trie</keyword>
14      <keyword>hash</keyword>
15      <keyword>metaprogramming</keyword>
16    </keywordset>
17  </info>
18  <?dbhtml filename="policy_data_structures.html"?>
19
20  <!-- 2006-04-01 Ami Tavory -->
21  <!-- 2011-05-25 Benjamin Kosnik -->
22
23  <!-- S01: intro -->
24  <section xml:id="pbds.intro">
25    <info><title>Intro</title></info>
26
27    <para>
28      This is a library of policy-based elementary data structures:
29      associative containers and priority queues. It is designed for
30      high performance, flexibility, semantic safety, and conformance to
31      the corresponding containers in <literal>std</literal> and
32      <literal>std::tr1</literal> (except for some points where it differs
33      by design).
34    </para>
35    <para>
36    </para>
37
38    <section xml:id="pbds.intro.issues">
39      <info><title>Performance Issues</title></info>
40      <para>
41      </para>
42
43      <para>
44	An attempt is made to categorize the wide variety of possible
45	container designs in terms of performance-impacting factors. These
46	performance factors are translated into design policies and
47	incorporated into container design.
48      </para>
49
50      <para>
51	There is tension between unraveling performance factors and keeping
52	the resulting set of policies coherent. Every attempt is made to keep
53	the set of factors minimal. However, in many cases multiple factors
54	make for long template names. Every attempt is made to alias and use
55	typedefs in the source files, but the generated names for external
56	symbols can be large for binary files or debuggers.
57      </para>
58
59      <para>
60	In many cases, the longer names allow capabilities and behaviours
61	controlled by macros to also be unambiguously emitted as distinct
62	generated names.
63      </para>
64
65      <para>
66	Specific issues found while unraveling performance factors in the
67	design of associative containers and priority queues follow.
68      </para>
69
70      <section xml:id="pbds.intro.issues.associative">
71	<info><title>Associative</title></info>
72
73	<para>
74	  Associative containers depend on their composite policies to a very
75	  large extent. Implicitly hard-wiring policies can hamper their
76	  performance and limit their functionality. An efficient hash-based
77	  container, for example, requires policies for testing key
78	  equivalence, hashing keys, translating hash values into positions
79	  within the hash table, and determining when and how to resize the
80	  table internally. A tree-based container can efficiently support
81	  order statistics, i.e. the ability to query what is the order of
82	  each key within the sequence of keys in the container, but only if
83	  the container is supplied with a policy to internally update
84	  meta-data. There are many other such examples.
85	</para>
86
87	<para>
88	  Ideally, all associative containers would share the same
89	  interface. Unfortunately, underlying data structures and mapping
90	  semantics differentiate between different containers. For example,
91	  suppose one writes a generic function manipulating an associative
92	  container.
93	</para>
94
95	<programlisting>
96	  template&lt;typename Cntnr&gt;
97	  void
98	  some_op_sequence(Cntnr&amp; r_cnt)
99	  {
100	  ...
101	  }
102	</programlisting>
103
104	<para>
105	  Given this, what can one assume about the instantiating
106	  container? The answer varies according to its underlying data
107	  structure. If the underlying data structure of
108	  <literal>Cntnr</literal> is based on a tree or trie, then the order
109	  of elements is well defined; otherwise, it is not, in general. If
110	  the underlying data structure of <literal>Cntnr</literal> is based
111	  on a collision-chaining hash table, then modifying
112	  <literal>r_cnt</literal> will not invalidate its iterators' order;
113	  if the underlying data structure is a probing hash table, then this
114	  is not the case. If the underlying data structure is based on a tree
115	  or trie, then a reference to the container can efficiently be split;
116	  otherwise, it cannot, in general. If the underlying data structure
117	  is a red-black tree, then splitting a reference to the container is
118	  exception-free; if it is an ordered-vector tree, exceptions can be
119	  thrown.
120	</para>
121
122      </section>
123
124      <section xml:id="pbds.intro.issues.priority_queue">
125	<info><title>Priority Queues</title></info>
126
127	<para>
128	  Priority queues are useful when one needs to efficiently access a
129	  minimum (or maximum) value as the set of values changes.
130	</para>
131
132	<para>
133	  Most useful data structures for priority queues have a relatively
134	  simple structure, as they are geared toward relatively simple
135	  requirements. Unfortunately, these structures do not support access
136	  to an arbitrary value, which turns out to be necessary in many
137	  algorithms (decreasing an arbitrary value in a graph algorithm, for
138	  example). Therefore, some extra mechanism is necessary and must be
139	  invented for accessing arbitrary values. There are at least two
140	  alternatives: embedding an associative container in a priority
141	  queue, or allowing cross-referencing through iterators. The first
142	  solution adds significant overhead; the second solution requires a
143	  precise definition of iterator invalidation, which is the next
144	  point.
145	</para>
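	<para>
	  The second alternative, cross-referencing through iterators, can
	  be sketched as follows. This is only an illustration: it assumes
	  the <classname>__gnu_pbds::priority_queue</classname> interface
	  shipped with GCC (header
	  <filename>ext/pb_ds/priority_queue.hpp</filename>), and the names
	  <classname>min_heap</classname> and
	  <function>decrease_key_demo</function> are hypothetical. The
	  handle returned by <function>push</function> is kept by the
	  caller and later used to change that value without searching for
	  it.
	</para>
	<programlisting language="cpp"><![CDATA[
#include <ext/pb_ds/priority_queue.hpp>
#include <functional>

// Min-heap of tentative distances. pairing_heap_tag is only one of the
// possible underlying data structures (an assumption for this sketch).
typedef __gnu_pbds::priority_queue<int, std::greater<int>,
                                   __gnu_pbds::pairing_heap_tag> min_heap;

// Pushes three values, then lowers one of them through the handle
// returned by push. Returns the smallest value afterwards.
inline int
decrease_key_demo()
{
  min_heap q;
  q.push(20);
  min_heap::point_iterator handle = q.push(30); // handle to the value 30
  q.push(10);

  q.modify(handle, 5); // 30 becomes 5; no look-up was needed
  return q.top();
}
]]></programlisting>
	<para>
	  How long such a handle stays usable after other operations is
	  exactly the invalidation question raised above.
	</para>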
146
147	<para>
148	  Priority queues, like hash-based containers, store values in an
149	  order that is meaningless and undefined externally. For example, a
150	  <code>push</code> operation can internally reorganize the
151	  values. Because of this characteristic, describing a priority
152	  queue's iterator is difficult: on one hand, the values to which
153	  iterators point can remain valid, but on the other, the logical
154	  order of iterators can change unpredictably.
155	</para>
156
157	<para>
158	  Roughly speaking, any element that is both inserted into a priority
159	  queue (e.g., through <code>push</code>) and removed
160	  from it (e.g., through <code>pop</code>), incurs a
161	  logarithmic overhead (in the amortized sense). Different underlying
162	  data structures place the actual cost differently: some are
163	  optimized for amortized complexity, whereas others guarantee that
164	  specific operations only have a constant cost. One underlying data
165	  structure might be chosen if modifying a value is frequent
166	  (Dijkstra's shortest-path algorithm), whereas a different one might
167	  be chosen otherwise. Unfortunately, an array-based binary heap, an
168	  underlying data structure that optimizes (in the amortized sense)
169	  <code>push</code> and <code>pop</code> operations, differs from the
170	  others in terms of its invalidation guarantees. Other design
171	  decisions also impact the cost and placement of the overhead, at the
172	  expense of more difference in the kinds of operations that the
173	  underlying data structure can support. These differences pose a
174	  challenge when creating a uniform interface for priority queues.
175	</para>
176      </section>
177    </section>
178
179    <section xml:id="pbds.intro.motivation">
180      <info><title>Goals</title></info>
181
182      <para>
183	Many fine associative-container libraries have already been written,
184	most notably, the C++ standard's associative containers. Why
185	then write another library? This section shows some possible
186	advantages of this library, when considering the challenges in
187	the introduction. Many of these points stem from the fact that
188	the ISO C++ process introduced associative containers in a
189	two-step process (first standardizing tree-based containers,
190	only then adding hash-based containers, which are fundamentally
191	different), did not standardize priority queues as containers,
192	and (in our opinion) overloads the iterator concept.
193      </para>
194
195      <section xml:id="pbds.intro.motivation.associative">
196	<info><title>Associative</title></info>
197	<para>
198	</para>
199
200	<section xml:id="motivation.associative.policy">
201	  <info><title>Policy Choices</title></info>
202	  <para>
203	    Associative containers require a relatively large number of
204	    policies to function efficiently in various settings. In some
205	    cases this is needed for making their common operations more
206	    efficient, and in other cases this allows them to support a
207	    larger set of operations.
208	  </para>
209
210	  <orderedlist>
211	    <listitem>
212	      <para>
213		Hash-based containers, for example, support look-up and
214		insertion methods (<function>find</function> and
215		<function>insert</function>). In order to locate elements
216		quickly, they are supplied a hash functor, which instructs
217		how to transform a key object into some size type; a hash
218		functor might transform <constant>"hello"</constant>
219		into <constant>1123002298</constant>. A hash table, though,
220		requires transforming each key object into a size-type value
221		in some specific domain; a hash table with a 128-long
222		table might transform <constant>"hello"</constant> into
223		position <constant>63</constant>. The policy by which the
224		hash value is transformed into a position within the table
225		can dramatically affect performance.  Hash-based containers
226		also do not resize naturally (as opposed to tree-based
227		containers, for example). The appropriate resize policy is
228		unfortunately intertwined with the policy that transforms
229		a hash value into a position within the table.
230	      </para>
231	    </listitem>
232
233	    <listitem>
234	      <para>
235		Tree-based containers, for example, also support look-up and
236		insertion methods, and are primarily useful when maintaining
237		order between elements is important. In some cases, though,
238		one can utilize their balancing algorithms for completely
239		different purposes.
240	      </para>
241
242	      <para>
243		Figure A shows a tree in which each node contains two entries:
244		a floating-point key, and some size-type
245		<emphasis>metadata</emphasis> (in bold beneath it) that is
246		the number of nodes in the sub-tree. (The root has key 0.99,
247		and has 5 nodes (including itself) in its sub-tree.) A
248		container based on this data structure can obviously answer
249		efficiently whether 0.3 is in the container object, but it
250		can also answer what is the order of 0.3 among all those in
251		the container object: see <xref linkend="biblio.clrs2001"/>.
252
253	      </para>
254
255	      <para>
256		As another example, Figure B shows a tree in which each node
257		contains two entries: a half-open geometric line interval,
258		and a number <emphasis>metadata</emphasis> (in bold beneath
259		it) that is the largest endpoint of all intervals in its
260		sub-tree.  (The root describes the interval <constant>[20,
261		36)</constant>, and the largest endpoint in its sub-tree is
262		99.) A container based on this data structure can obviously
263		answer efficiently whether <constant>[3, 41)</constant> is
264		in the container object, but it can also answer efficiently
265		whether the container object has intervals that intersect
266		<constant>[3, 41)</constant>. These types of queries are
267		very useful in geometric algorithms and lease-management
268		algorithms.
269	      </para>
270
271	      <para>
272		It is important to note, however, that as the trees are
273		modified, their internal structure changes. To maintain
274		these invariants, one must supply some policy that is aware
275		of these changes.  Without this, it would be better to use a
276		linked list (in itself very efficient for these purposes).
277	      </para>
278
279	    </listitem>
280	  </orderedlist>
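	  <para>
	    The first item's distinction between a hash value and a
	    position in the table can be sketched with two range-hashing
	    functions. The function names below are hypothetical and are
	    not this library's policies. Masking is cheaper than taking a
	    remainder, but it is only correct when the table size is a
	    power of two, which is one way the resize policy and the
	    range-hashing policy become intertwined.
	  </para>
	  <programlisting language="cpp"><![CDATA[
#include <cstddef>
#include <functional>
#include <string>

// Range-hashing by remainder: valid for any non-zero table size.
inline std::size_t
position_by_mod(std::size_t hash, std::size_t table_size)
{ return hash % table_size; }

// Range-hashing by mask: only valid when table_size is a power of two,
// so the resize policy must then only produce such sizes.
inline std::size_t
position_by_mask(std::size_t hash, std::size_t table_size)
{ return hash & (table_size - 1); }

// Key -> hash value -> position. The hash value depends on the
// implementation of std::hash; only the final range is guaranteed.
inline std::size_t
position_of(const std::string& key, std::size_t table_size)
{ return position_by_mod(std::hash<std::string>()(key), table_size); }
]]></programlisting>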
281
282	  <figure>
283	    <title>Node Invariants</title>
284	    <mediaobject>
285	      <imageobject>
286		<imagedata align="center" format="PNG" scale="100"
287			   fileref="../images/pbds_node_invariants.png"/>
288	      </imageobject>
289	      <textobject>
290		<phrase>Node Invariants</phrase>
291	      </textobject>
292	    </mediaobject>
293	  </figure>
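	  <para>
	    The order-statistics query described for Figure A can be
	    sketched with a tree that is given a node-update policy. The
	    sketch assumes the <classname>__gnu_pbds::tree</classname>
	    interface and the
	    <classname>tree_order_statistics_node_update</classname>
	    policy as shipped with GCC 4.7 or later (earlier releases
	    spell <classname>null_type</classname> differently); the names
	    <classname>order_stat_set</classname> and
	    <function>rank_of</function> are hypothetical.
	  </para>
	  <programlisting language="cpp"><![CDATA[
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cstddef>
#include <functional>

// A set of doubles whose nodes also maintain sub-tree sizes.
typedef __gnu_pbds::tree<double, __gnu_pbds::null_type, std::less<double>,
                         __gnu_pbds::rb_tree_tag,
                         __gnu_pbds::tree_order_statistics_node_update>
  order_stat_set;

// Number of keys in s that are strictly smaller than key.
inline std::size_t
rank_of(const order_stat_set& s, double key)
{ return s.order_of_key(key); }
]]></programlisting>
	  <para>
	    Without the node-update policy the same container type would
	    still support <function>find</function> and
	    <function>insert</function>, but not the rank query.
	  </para>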
294
295	</section>
296
297	<section xml:id="motivation.associative.underlying">
298	  <info><title>Underlying Data Structures</title></info>
299	  <para>
300	    The standard C++ library contains associative containers based on
301	    red-black trees and collision-chaining hash tables. These are
302	    very useful, but they are not ideal for all types of
303	    settings.
304	  </para>
305
306	  <para>
307	    The figure below shows the different underlying data structures
308	    currently supported in this library.
309	  </para>
310
311	  <figure>
312	    <title>Underlying Associative Data Structures</title>
313	    <mediaobject>
314	      <imageobject>
315		<imagedata align="center" format="PNG" scale="100"
316			   fileref="../images/pbds_different_underlying_dss_1.png"/>
317	      </imageobject>
318	      <textobject>
319		<phrase>Underlying Associative Data Structures</phrase>
320	      </textobject>
321	    </mediaobject>
322	  </figure>
323
324	  <para>
325	    A shows a collision-chaining hash-table, B shows a probing
326	    hash-table, C shows a red-black tree, D shows a splay tree, E shows
327	    a tree based on an ordered vector (implicit in the order of the
328	    elements), F shows a PATRICIA trie, and G shows a list-based
329	    container with update policies.
330	  </para>
331
332	  <para>
333	    Each of these data structures has some performance benefits, in
334	    terms of speed, size or both. For now, note that vector-based trees
335	    and probing hash tables manipulate memory more efficiently than
336	    red-black trees and collision-chaining hash tables, and that
337	    list-based associative containers are very useful for constructing
338	    "multimaps".
339	  </para>
340
341	  <para>
342	    Now consider a function manipulating a generic associative
343	    container,
344	  </para>
345	  <programlisting>
346	    template&lt;class Cntnr&gt;
347	    int
348	    some_op_sequence(Cntnr &amp;r_cnt)
349	    {
350	    ...
351	    }
352	  </programlisting>
353
354	  <para>
355	    Ideally, the underlying data structure
356	    of <classname>Cntnr</classname> would not affect what can be
357	    done with <varname>r_cnt</varname>.  Unfortunately, this is not
358	    the case.
359	  </para>
360
361	  <para>
362	    For example, if <classname>Cntnr</classname>
363	    is <classname>std::map</classname>, then the function can
364	    use
365	  </para>
366	  <programlisting>
367	    std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
368	  </programlisting>
369	  <para>
370	    in order to apply <classname>foobar</classname> to all
371	    elements between <classname>foo</classname> and
372	    <classname>bar</classname>. If
373	    <classname>Cntnr</classname> is a hash-based container,
374	    then this call's results are undefined.
375	  </para>
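	  <para>
	    A minimal sketch of the ordered case, using
	    <classname>std::map</classname> and a hypothetical helper
	    <function>join_between</function>: the iterator pair returned
	    by the two <function>find</function> calls delimits exactly
	    the elements whose keys lie between the two arguments, because
	    iteration order follows key order. The same code compiled
	    against a hash-based container would give no such assurance.
	  </para>
	  <programlisting language="cpp"><![CDATA[
#include <algorithm>
#include <map>
#include <string>
#include <utility>

// Concatenates the mapped values of all elements in [first_key, last_key).
// Assumes first_key is present and does not follow last_key; otherwise
// the iterator pair is not a valid range.
inline std::string
join_between(const std::map<int, std::string>& m, int first_key, int last_key)
{
  std::string out;
  std::for_each(m.find(first_key), m.find(last_key),
                [&out](const std::pair<const int, std::string>& kv)
                { out += kv.second; });
  return out;
}
]]></programlisting>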
376
377	  <para>
378	    Also, if <classname>Cntnr</classname> is tree-based, the type
379	    and object of the comparison functor can be
380	    accessed. If <classname>Cntnr</classname> is hash based, these
381	    queries are nonsensical.
382	  </para>
383
384	  <para>
385	    There are various other differences based on the container's
386	    underlying data structure. For one, they can be constructed by,
387	    and queried for, different policies. Furthermore:
388	  </para>
389
390	  <orderedlist>
391	    <listitem>
392	      <para>
393		Containers based on C, D, E and F store elements in a
394		meaningful order; the others store elements in a meaningless
395		(and probably time-varying) order. By implication, only
396		containers based on C, D, E and F can
397		support <function>erase</function> operations taking an
398		iterator and returning an iterator to the following element
399		without performance loss.
400	      </para>
401	    </listitem>
402
403	    <listitem>
404	      <para>
405		Containers based on C, D, E, and F can be split and joined
406		efficiently, while the others cannot. Containers based on C
407		and D, furthermore, can guarantee that this is exception-free;
408		containers based on E cannot guarantee this.
409	      </para>
410	    </listitem>
411
412	    <listitem>
413	      <para>
414		Containers based on all but E can guarantee that
415		erasing an element is exception free; containers based on E
416		cannot guarantee this. Containers based on all but B and E
417		can guarantee that modifying an object of their type does
418		not invalidate iterators or references to their elements,
419		while containers based on B and E cannot. Containers based
420		on C, D, and E can furthermore make a stronger guarantee,
421		namely that modifying an object of their type does not
422		affect the order of iterators.
423	      </para>
424	    </listitem>
425	  </orderedlist>
426
427	  <para>
428	    A unified tag and traits system (as used for the C++ standard
429	    library iterators, for example) can ease generic manipulation of
430	    associative containers based on different underlying data
431	    structures.
432	  </para>
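	  <para>
	    As a rough sketch of what such a tag and traits system makes
	    possible, the following dispatches on a hypothetical
	    <classname>container_order_traits</classname> class, in the
	    same way that standard algorithms dispatch on iterator
	    categories. The tag and trait names are invented for this
	    illustration and are applied to standard containers; they are
	    not the names used by this library.
	  </para>
	  <programlisting language="cpp"><![CDATA[
#include <cstddef>
#include <iterator>
#include <map>
#include <unordered_map>

// Hypothetical tags, in the spirit of the iterator category tags.
struct ordered_tag { };
struct unordered_tag { };

template<typename Cntnr>
struct container_order_traits; // specialized per container below

template<typename K, typename V>
struct container_order_traits<std::map<K, V> >
{ typedef ordered_tag order_category; };

template<typename K, typename V>
struct container_order_traits<std::unordered_map<K, V> >
{ typedef unordered_tag order_category; };

// Ordered containers: keys in [lo, hi) form one iterator range.
template<typename Cntnr, typename Key>
std::size_t
count_in_range(const Cntnr& c, const Key& lo, const Key& hi, ordered_tag)
{ return std::distance(c.lower_bound(lo), c.lower_bound(hi)); }

// Unordered containers: every element has to be inspected.
template<typename Cntnr, typename Key>
std::size_t
count_in_range(const Cntnr& c, const Key& lo, const Key& hi, unordered_tag)
{
  std::size_t n = 0;
  for (typename Cntnr::const_iterator it = c.begin(); it != c.end(); ++it)
    if (!(it->first < lo) && it->first < hi)
      ++n;
  return n;
}

// Generic entry point: picks an implementation at compile time.
template<typename Cntnr, typename Key>
std::size_t
count_in_range(const Cntnr& c, const Key& lo, const Key& hi)
{
  typedef typename container_order_traits<Cntnr>::order_category tag;
  return count_in_range(c, lo, hi, tag());
}
]]></programlisting>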
433
434	</section>
435
436	<section xml:id="motivation.associative.iterators">
437	  <info><title>Iterators</title></info>
438	  <para>
439	    Iterators are central to the design of the standard library
440	    containers, because of the container/algorithm/iterator
441	    decomposition that allows an algorithm to operate on a range
442	    through iterators of some sequence.  Iterators, then, are useful
443	    because they allow going over a
444	    specific <emphasis>sequence</emphasis>.  The standard library
445	    also uses iterators for accessing a
446	    specific <emphasis>element</emphasis>: when an associative
447	    container returns one through <function>find</function>. The
448	    standard library consistently uses the same types of iterators
449	    for both purposes: going over a range, and accessing a specific
450	    found element. Before the introduction of hash-based containers
451	    to the standard library, this made sense (with the exception of
452	    priority queues, which are discussed later).
453	  </para>
454
455	  <para>
456	    When the standard associative containers are used together with
457	    non-order-preserving associative containers (and with
458	    priority queues), different types of iterators may be needed
459	    for self-organizing containers:
460	    the iterator concept seems overloaded to mean two different
461	    things (in some cases). <remark> XXX
462	    "ds_gen.html#find_range">Design::Associative
463	    Containers::Data-Structure Genericity::Point-Type and Range-Type
464	    Methods</remark>.
465	  </para>
466
467	  <section xml:id="associative.iterators.using">
468	    <info>
469	      <title>Using Point Iterators for Range Operations</title>
470	    </info>
471	    <para>
472	      Suppose <classname>cntnr</classname> is some associative
473	      container, and say <varname>c</varname> is an object of
474	      type <classname>cntnr</classname>. Then what will be the outcome
475	      of
476	    </para>
477
478	    <programlisting>
479	      std::for_each(c.find(1), c.find(5), foo);
480	    </programlisting>
481
482	    <para>
483	      If <varname>c</varname> is a tree-based container
484	      object, then an in-order walk will
485	      apply <classname>foo</classname> to the relevant elements,
486	      as in the graphic below, label A. If <varname>c</varname> is
487	      a hash-based container, then the order of elements between any
488	      two elements is undefined (and probably time-varying); there is
489	      no guarantee that the elements traversed will coincide with the
490	      <emphasis>logical</emphasis> elements between 1 and 5, as in
491	      label B.
492	    </para>
493
494	    <figure>
495	      <title>Range Iteration in Different Data Structures</title>
496	      <mediaobject>
497		<imageobject>
498		  <imagedata align="center" format="PNG" scale="100"
499			     fileref="../images/pbds_point_iterators_range_ops_1.png"/>
500		</imageobject>
501		<textobject>
502		  <phrase>Range Iteration in Different Data Structures</phrase>
503		</textobject>
504	      </mediaobject>
505	    </figure>
506
507	    <para>
508	      In our opinion, this problem is not caused merely by
509	      red-black trees being order preserving while
510	      collision-chaining hash tables are (generally) not; it
511	      is more fundamental. Most of the standard's containers
512	      order sequences in a well-defined manner that is
513	      determined by their <emphasis>interface</emphasis>:
514	      calling <function>insert</function> on a tree-based
515	      container modifies its sequence in a predictable way, as
516	      does calling <function>push_back</function> on a list or
517	      a vector. Conversely, collision-chaining hash tables,
518	      probing hash tables, priority queues, and list-based
519	      containers (which are very useful for "multimaps") are
520	      self-organizing data structures; each
521	      operation modifies their sequences in a manner that is
522	      (practically) determined by their
523	      <emphasis>implementation</emphasis>.
524	    </para>
525
526	    <para>
527	      Consequently, applying an algorithm to a sequence obtained from most
528	      containers may or may not make sense, but applying it to a
529	      sub-sequence of a self-organizing container does not.
530	    </para>
531	  </section>
532
533	  <section xml:id="associative.iterators.cost">
534	    <info>
535	      <title>Cost to Point Iterators to Enable Range Operations</title>
536	    </info>
537	    <para>
538	      Suppose <varname>c</varname> is some collision-chaining
539	      hash-based container object, and one calls
540	    </para>
541	    <programlisting>c.find(3)</programlisting>
542	    <para>
543	      Then what composes the returned iterator?
544	    </para>
545
546	    <para>
547	      In the graphic below, label A shows the simplest (and
548	      most efficient) implementation of a collision-chaining
549	      hash table.  The little box marked
550	      <classname>point_iterator</classname> shows an object
551	      that contains a pointer to the element's node. Note that
552	      this "iterator" has no way to move to the next element
553	      (it cannot support
554	      <function>operator++</function>). Conversely, the little
555	      box marked <classname>iterator</classname> stores a
556	      pointer to the element, as well as some other
557	      information (the bucket number of the element). The
558	      second iterator, then, is "heavier" than the first one:
559	      it requires more time and space. If we were to use a
560	      different container to cross-reference into this
561	      hash-table using these iterators, it would take much
562	      more space. As noted above, nothing much can be done by
563	      incrementing these iterators, so why is this extra
564	      information needed?
565	    </para>
566
567	    <para>
568	      Alternatively, one might create a collision-chaining hash-table
569	      where the lists might be linked, forming a monolithic total-element
570	      list, as in the graphic below, label B.  Here the iterators are as
571	      light as can be, but the hash-table's operations are more
572	      complicated.
573	    </para>
574
575	    <figure>
576	      <title>Point Iteration in Hash Data Structures</title>
577	      <mediaobject>
578		<imageobject>
579		  <imagedata align="center" format="PNG" scale="100"
580			     fileref="../images/pbds_point_iterators_range_ops_2.png"/>
581		</imageobject>
582		<textobject>
583		  <phrase>Point Iteration in Hash Data Structures</phrase>
584		</textobject>
585	      </mediaobject>
586	    </figure>
587
588	    <para>
589	      It should be noted that containers based on collision-chaining
590	      hash-tables are not the only ones with this type of behavior;
591	      many other self-organizing data structures display it as well.
592	    </para>
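	    <para>
	      The difference in weight between the two kinds of iterator
	      can be made concrete with a small sketch. All of the types
	      below are hypothetical and far simpler than a real hash
	      table; they only show which extra state a handle needs
	      before it can support <function>operator++</function>.
	    </para>
	    <programlisting language="cpp"><![CDATA[
#include <cstddef>

// Hypothetical node of a collision-chaining hash table.
struct node
{
  int key;
  node* next; // next node in the same bucket's chain
};

// Point-type handle: enough to read or modify the found element.
struct point_handle
{
  node* p;
  int& operator*() const { return p->key; }
};

// Range-type handle: also remembers where it is within the table, so
// that operator++ can move on to the next non-empty bucket.
struct range_handle
{
  node* p;
  node** buckets;
  std::size_t bucket;
  std::size_t num_buckets;

  int& operator*() const { return p->key; }

  range_handle& operator++()
  {
    p = p->next;
    while (p == 0 && ++bucket < num_buckets)
      p = buckets[bucket];
    return *this; // p == 0 marks the end of the table
  }
};
]]></programlisting>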
593	  </section>
594
595	  <section xml:id="associative.iterators.invalidation">
596	    <info><title>Invalidation Guarantees</title></info>
597	    <para>Consider the following snippet:</para>
598	    <programlisting>
599	      it = c.find(3);
600	      c.erase(5);
601	    </programlisting>
602
603	    <para>
604	      Following the call to <function>erase</function>, what is the
605	      validity of <varname>it</varname>: can it be de-referenced?
606	      Can it be incremented?
607	    </para>
608
609	    <para>
610	      The answer depends on the underlying data structure of the
611	      container. The graphic below shows three cases: A1 and A2 show
612	      a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
613	      show a collision-chaining hash table.
614	    </para>
615
616	    <figure>
617	      <title>Effect of erase in different underlying data structures</title>
618	      <mediaobject>
619		<imageobject>
620		  <imagedata align="center" format="PNG" scale="100"
621			     fileref="../images/pbds_invalidation_guarantee_erase.png"/>
622		</imageobject>
623		<textobject>
624		  <phrase>Effect of erase in different underlying data structures</phrase>
625		</textobject>
626	      </mediaobject>
627	    </figure>
628
629	    <orderedlist>
630	      <listitem>
631		<para>
632		  Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
633		  be de-referenced and incremented. The sequence of iterators
634		  changed, but in a way that is well-defined by the interface.
635		</para>
636	      </listitem>
637
638	      <listitem>
639		<para>
640		  Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
641		  not valid at all - it cannot be de-referenced or
642		  incremented; the order of iterators changed in a way that is
643		  (practically) determined by the implementation and not by
644		  the interface.
645		</para>
646	      </listitem>
647
648	      <listitem>
649		<para>
650		  Erasing 5 from C1 yields C2. Here the situation is more
651		  complicated. On the one hand, there is no problem in
652		  de-referencing <classname>it</classname>. On the other hand,
653		  the order of iterators changed in a way that is
654		  (practically) determined by the implementation and not by
655		  the interface.
656		</para>
657	      </listitem>
658	    </orderedlist>
659
660	    <para>
661	      So in the standard library containers, it is not always possible
662	      to express whether <varname>it</varname> is valid or not. This
663	      is true also for <function>insert</function>. Again, the
664	      iterator concept seems overloaded.
665	    </para>
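	    <para>
	      For the red-black tree case (A1 and A2), the behavior can
	      be checked directly with <classname>std::map</classname>,
	      whose <function>erase</function> invalidates only iterators
	      to the erased element. The helper name
	      <function>next_after_three</function> is hypothetical.
	    </para>
	    <programlisting language="cpp"><![CDATA[
#include <map>

// Returns the key found two steps after 3 once 5 has been erased.
inline int
next_after_three()
{
  std::map<int, char> c;
  for (int k = 1; k <= 7; ++k)
    c[k] = 'x';

  std::map<int, char>::iterator it = c.find(3);
  c.erase(5);   // only iterators to the erased element are invalidated

  ++it;         // now at 4
  ++it;         // 5 is gone, so this lands on 6
  return it->first;
}
]]></programlisting>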
666	  </section>
667	</section> <!--iterators-->
668
669
670	<section xml:id="motivation.associative.functions">
671	  <info><title>Functional</title></info>
672	  <para>
673	  </para>
674
675	  <para>
676	    The design of the functional overlay to the underlying data
677	    structures differs slightly from some of the conventions used in
678	    the C++ standard.  A strict public interface comprises only
679	    those methods which depend on the class's internal
680	    structure; other operations are best designed as external
681	    functions (see <xref linkend="biblio.meyers02both"/>). By this
682	    rubric, the standard associative containers lack some useful
683	    methods, and provide other methods which would be better
684	    removed.
685	  </para>
686
687	  <section xml:id="motivation.associative.functions.erase">
688	    <info><title><function>erase</function></title></info>
689
690	    <orderedlist>
691	      <listitem>
692		<para>
693		  Order-preserving standard associative containers provide the
694		  method
695		</para>
696		<programlisting>
697		  iterator
698		  erase(iterator it)
699		</programlisting>
700
701		<para>
702		  which takes an iterator, erases the corresponding
703		  element, and returns an iterator to the following
704		  element. The standard hash-based associative
705		  containers also provide this method. This seemingly
706		  increases genericity between associative containers,
707		  since it is possible to use
708		</para>
709		<programlisting>
710		  typename C::iterator it = c.begin();
711		  typename C::iterator e_it = c.end();
712
713		  while (it != e_it)
714		    it = pred(*it) ? c.erase(it) : ++it;
715		</programlisting>
716
717		<para>
718		  in order to erase from a container object
719		  <varname>c</varname> all elements which match a
720		  predicate <classname>pred</classname>. However, in a
721		  different sense this actually decreases genericity: an
722		  integral implication of this method is that tree-based
723		  associative containers' memory use is linear in the total
724		  number of elements they store, while hash-based
725		  containers' memory use is unbounded in the total number of
726		  elements they store. Assume a hash-based container is
727		  allowed to decrease its size when an element is
728		  erased. Then the elements might be rehashed, which means
729		  that there is no "next" element - it is simply
730		  undefined. Consequently, it is possible to infer from the
731		  fact that the standard library's hash-based containers
732		  provide this method that they cannot downsize when
733		  elements are erased. As a consequence, different code is
734		  needed to manipulate different containers, assuming that
735		  memory should be conserved. Therefore, this library's
736		  non-order-preserving associative containers omit this
737		  method.
738		</para>
739	      </listitem>
740
741	      <listitem>
742		<para>
743		  All associative containers include a conditional-erase method
744		</para>
745		<programlisting>
746		  template&lt;
747		  class Pred&gt;
748		  size_type
749		  erase_if
750		  (Pred pred)
751		</programlisting>
752		<para>
753		  which erases all elements matching a predicate. This is probably the
754		  only way to ensure linear-time multiple-item erase which can
755		  actually downsize a container.
756		</para>
757	      </listitem>
758
759	      <listitem>
760		<para>
761		  The standard associative containers provide methods for
762		  multiple-item erase of the form
763		</para>
764		<programlisting>
765		  size_type
766		  erase(It b, It e)
767		</programlisting>
768		<para>
769		  erasing a range of elements given by a pair of
770		  iterators. For tree-based or trie-based containers, this can be
771		  implemented more efficiently as a (small) sequence of split
772		  and join operations. For other, unordered, containers, this
773		  method isn't much better than an external loop. Moreover,
774		  if <varname>c</varname> is a hash-based container,
775		  then
776		</para>
777		<programlisting>
778		  c.erase(c.find(2), c.find(5))
779		</programlisting>
780		<para>
781		  is almost certain to do something
782		  different than erasing all elements whose keys are between 2
783		  and 5, and is likely to produce other undefined behavior.
784		</para>
785	      </listitem>
786	    </orderedlist>
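	    <para>
	      The semantics of a conditional erase can be sketched as a
	      free function over standard containers. This is not this
	      library's <function>erase_if</function> method, which is a
	      member and may also downsize the container; the name
	      <function>erase_if_sketch</function> is hypothetical, and
	      the code relies on <function>erase</function> returning the
	      following iterator, as the C++11 associative and unordered
	      containers do.
	    </para>
	    <programlisting language="cpp"><![CDATA[
#include <cstddef>

// Erases every element for which pred returns true and reports how
// many elements were removed.
template<typename Cntnr, typename Pred>
std::size_t
erase_if_sketch(Cntnr& c, Pred pred)
{
  std::size_t erased = 0;
  for (typename Cntnr::iterator it = c.begin(); it != c.end(); )
    if (pred(*it))
      {
        it = c.erase(it); // iterator to the element after the erased one
        ++erased;
      }
    else
      ++it;
  return erased;
}
]]></programlisting>
	    <para>
	      A caller sees a single linear pass and a count, and never
	      depends on what the "next" element after an erasure is.
	    </para>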
787	  </section> <!-- erase -->
788
789	  <section xml:id="motivation.associative.functions.split">
790	    <info>
791	      <title>
792		<function>split</function> and <function>join</function>
793	      </title>
794	    </info>
795	    <para>
	      It is well known that tree-based and trie-based container
	      objects can be efficiently split or joined (see
	      <xref linkend="biblio.clrs2001"/>). Externally splitting or
	      joining trees is super-linear and, furthermore, can throw
	      exceptions. Split and join methods consequently seem good
	      choices for tree-based container methods, especially since, as
	      noted above, they are efficient replacements for erasing
	      sub-sequences.
804	    </para>
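	    <para>
	      The following sketch shows how erasing a key range might be
	      expressed with two splits and one join. It is an illustration
	      under stated assumptions, not the library's own implementation:
	      <function>erase_key_range</function> and <type>tree_t</type> are
	      hypothetical names, <varname>lo</varname> is assumed not to
	      exceed <varname>hi</varname>, and <function>split</function> is
	      assumed to move all elements whose keys are greater than its
	      first argument into its second argument.
	    </para>
	    <programlisting>
	      #include &lt;cstddef&gt;
	      #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	      typedef __gnu_pbds::tree&lt;int, char&gt; tree_t;

	      // Erases every element whose key k satisfies lo &lt; k &lt;= hi and
	      // returns the number of erased elements.
	      std::size_t erase_key_range(tree_t&amp; c, int lo, int hi)
	      {
	        tree_t middle;
	        tree_t upper;
	        c.split(lo, middle);      // c: keys &lt;= lo, middle: keys &gt; lo
	        middle.split(hi, upper);  // middle: lo &lt; keys &lt;= hi, upper: keys &gt; hi
	        const std::size_t erased = middle.size();
	        c.join(upper);            // key ranges do not overlap, so join is valid
	        return erased;            // middle is discarded when it goes out of scope
	      }
	    </programlisting>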
805
806	  </section> <!-- split -->
807
808	  <section xml:id="motivation.associative.functions.insert">
809	    <info>
810	      <title>
811		<function>insert</function>
812	      </title>
813	    </info>
814	    <para>
815	      The standard associative containers provide methods of the form
816	    </para>
817	    <programlisting>
818	      template&lt;class It&gt;
	      void
820	      insert(It b, It e);
821	    </programlisting>
822
823	    <para>
	      for inserting a range of elements given by a pair of
	      iterators. This can be implemented as an external loop
	      or, more efficiently, as a join operation (for the case of
	      tree-based or trie-based containers). Moreover, these methods seem
	      similar to constructors taking a range given by a pair of
	      iterators; the constructors, however, are transactional, whereas
	      the insert methods are not, which is possibly confusing.
831	    </para>
832
833	  </section> <!-- insert -->
834
835	  <section xml:id="motivation.associative.functions.compare">
836	    <info>
837	      <title>
838		<function>operator==</function> and <function>operator&lt;=</function>
839	      </title>
840	    </info>
841
842	    <para>
	      Associative containers are parametrized by policies that allow
	      testing key equivalence: a hash-based container can do this through
	      its equivalence functor, and a tree-based container can do this
	      through its comparison functor. In addition, some standard
	      associative containers have global function operators, like
	      <function>operator==</function> and <function>operator&lt;=</function>,
	      that allow comparing entire associative containers.
850	    </para>
851
852	    <para>
	      In our opinion, these functions are better left out. To begin
	      with, they do not significantly improve over an external
	      loop. More importantly, however, they are possibly misleading:
	      <function>operator==</function>, for example, usually checks for
	      equivalence, or interchangeability, but the associative
	      container cannot check for values' equivalence, only keys'
	      equivalence. Also, are two containers considered equivalent if
	      they store the same values in different order? Any answer is an
	      arbitrary decision.
862	    </para>
863	  </section> <!-- compare -->
864
865	</section>  <!-- functional -->
866
867      </section> <!--associative-->
868
869      <section xml:id="pbds.intro.motivation.priority_queue">
870	<info><title>Priority Queues</title></info>
871
872	<section xml:id="motivation.priority_queue.policy">
873	  <info><title>Policy Choices</title></info>
874
875	  <para>
	    Priority queues are containers that allow efficiently inserting
	    values and accessing the maximal value (in the sense of the
	    container's comparison functor). Their interface
	    supports <function>push</function>
	    and <function>pop</function>. The standard
	    container <classname>std::priority_queue</classname> indeed supports
	    these methods, but little else. For algorithmic and
	    software-engineering purposes, other methods are needed:
884	  </para>
885
886	  <orderedlist>
887	    <listitem>
888	      <para>
889		Many graph algorithms (see
890		<xref linkend="biblio.clrs2001"/>) require increasing a
891		value in a priority queue (again, in the sense of the
892		container's comparison functor), or joining two
893		priority-queue objects.
894	      </para>
895	    </listitem>
896
897	    <listitem>
898	      <para>The return type of <classname>priority_queue</classname>'s
899	      <function>push</function> method is a point-type iterator, which can
900	      be used for modifying or erasing arbitrary values. For
901	      example:</para>
902	      <programlisting>
903		priority_queue&lt;int&gt; p;
904		priority_queue&lt;int&gt;::point_iterator it = p.push(3);
905		p.modify(it, 4);
906	      </programlisting>
907
908	      <para>These types of cross-referencing operations are necessary
909	      for making priority queues useful for different applications,
910	      especially graph applications.</para>
911
912	    </listitem>
913	    <listitem>
914	      <para>
915		It is sometimes necessary to erase an arbitrary value in a
916		priority queue. For example, consider
917		the <function>select</function> function for monitoring
918		file descriptors:
919	      </para>
920
921	      <programlisting>
922		int
923		select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
924		struct timeval *timeout);
925	      </programlisting>
926	      <para>
		As the select documentation states:
928	      </para>
929	      <para>
930		<quote>
931		  The nfds argument specifies the range of file
932		  descriptors to be tested. The select() function tests file
933		descriptors in the range of 0 to nfds-1.</quote>
934	      </para>
935
936	      <para>
		It stands to reason, therefore, that we might wish to
		maintain a minimal value for <varname>nfds</varname>, and
		priority queues immediately come to mind. Note, though, that
		when a socket is closed, the largest file descriptor in use might
		change; in the absence of an efficient means to erase an
		arbitrary value from a priority queue, we might as well
		avoid its use altogether.
944	      </para>
945
946	      <para>
		The standard containers typically support iterators. It is
		somewhat unusual
		for <classname>std::priority_queue</classname> to omit them
		(see <xref linkend="biblio.meyers01stl"/>). One might
		ask why priority queues need to support iterators, since
		they are self-organizing containers with a different purpose
		than abstracting sequences. There are several reasons:
954	      </para>
955	      <orderedlist>
956		<listitem>
957		  <para>
958		    Iterators (even in self-organizing containers) are
959		    useful for many purposes: cross-referencing
960		    containers, serialization, and debugging code that uses
961		    these containers.
962		  </para>
963		</listitem>
964
965		<listitem>
966		  <para>
967		    The standard library's hash-based containers support
968		    iterators, even though they too are self-organizing
969		    containers with a different purpose than abstracting
970		    sequences.
971		  </para>
972		</listitem>
973
974		<listitem>
975		  <para>
976		    In standard-library-like containers, it is natural to specify the
977		    interface of operations for modifying a value or erasing
		    a value (discussed previously) in terms of iterators.
979		    It should be noted that the standard
980		    containers also use iterators for accessing and
981		    manipulating a specific value. In hash-based
982		    containers, one checks the existence of a key by
983		    comparing the iterator returned by <function>find</function> to the
984		    iterator returned by <function>end</function>, and not by comparing a
985		    pointer returned by <function>find</function> to <type>NULL</type>.
986		  </para>
987		</listitem>
988	      </orderedlist>
989	    </listitem>
990	  </orderedlist>
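	  <para>
	    The <function>select</function> scenario above might be sketched
	    as follows. This is an illustration only:
	    <classname>fd_tracker</classname> and its members are hypothetical
	    names, not part of this library, and the sketch assumes the default
	    underlying data structure, whose point-type iterators are assumed to
	    stay valid while other values are pushed or erased.
	  </para>
	  <programlisting>
	    #include &lt;map&gt;
	    #include &lt;utility&gt;
	    #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	    typedef __gnu_pbds::priority_queue&lt;int&gt; fd_queue;

	    // Hypothetical helper that tracks open descriptors so that the
	    // nfds argument of select() can be kept as small as possible.
	    class fd_tracker
	    {
	      typedef std::map&lt;int, fd_queue::point_iterator&gt; handle_map;

	    public:
	      // Starts tracking fd; the iterator returned by push is kept
	      // so that fd can be erased later without searching the queue.
	      void add(int fd)
	      {
	        if (handles_.find(fd) == handles_.end())
	          handles_.insert(std::make_pair(fd, fds_.push(fd)));
	      }

	      // Stops tracking fd by erasing it through its stored iterator.
	      void remove(int fd)
	      {
	        handle_map::iterator h = handles_.find(fd);
	        if (h == handles_.end())
	          return;
	        fds_.erase(h-&gt;second);
	        handles_.erase(h);
	      }

	      // One more than the largest tracked descriptor, or 0 if none.
	      int nfds() const
	      { return fds_.empty() ? 0 : fds_.top() + 1; }

	    private:
	      fd_queue fds_;
	      handle_map handles_;
	    };
	  </programlisting>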
991
992	</section>
993
994	<section xml:id="motivation.priority_queue.underlying">
995	  <info><title>Underlying Data Structures</title></info>
996
997	  <para>
998	    There are three main implementations of priority queues: the
999	    first employs a binary heap, typically one which uses a
1000	    sequence; the second uses a tree (or forest of trees), which is
1001	    typically less structured than an associative container's tree;
1002	    the third simply uses an associative container. These are
1003	    shown in the figure below with labels A1 and A2, B, and C.
1004	  </para>
1005
1006	  <figure>
1007	    <title>Underlying Priority Queue Data Structures</title>
1008	    <mediaobject>
1009	      <imageobject>
1010		<imagedata align="center" format="PNG" scale="100"
1011			   fileref="../images/pbds_different_underlying_dss_2.png"/>
1012	      </imageobject>
1013	      <textobject>
1014		<phrase>Underlying Priority Queue Data Structures</phrase>
1015	      </textobject>
1016	    </mediaobject>
1017	  </figure>
1018
1019	  <para>
1020	    No single implementation can completely replace any of the
1021	    others. Some have better <function>push</function>
1022	    and <function>pop</function> amortized performance, some have
1023	    better bounded (worst case) response time than others, some
1024	    optimize a single method at the expense of others, etc. In
1025	    general the "best" implementation is dictated by the specific
1026	    problem.
1027	  </para>
1028
1029	  <para>
	    As with associative containers, the more implementations
	    co-exist, the more necessary a traits mechanism is for handling
	    generic containers safely and efficiently. This is especially
	    important for priority queues, since the invalidation guarantees
	    of one of the most useful data structures - binary heaps - are
	    markedly different from those of most of the others.
1036	  </para>
1037
1038	</section>
1039
1040	<section xml:id="motivation.priority_queue.binary_heap">
1041	  <info><title>Binary Heaps</title></info>
1042
1043
1044	  <para>
1045	    Binary heaps are one of the most useful underlying
1046	    data structures for priority queues. They are very efficient in
1047	    terms of memory (since they don't require per-value structure
1048	    metadata), and have the best amortized <function>push</function> and
1049	    <function>pop</function> performance for primitive types like
1050	    <type>int</type>.
1051	  </para>
1052
1053	  <para>
1054	    The standard library's <classname>priority_queue</classname>
1055	    implements this data structure as an adapter over a sequence,
1056	    typically
1057	    <classname>std::vector</classname>
1058	    or <classname>std::deque</classname>, which correspond to labels
1059	    A1 and A2 respectively in the graphic above.
1060	  </para>
1061
1062	  <para>
1063	    This is indeed an elegant example of the adapter concept and
1064	    the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1065	    several reasons why a binary-heap priority queue
1066	    may be better implemented as a container instead of a
1067	    sequence adapter:
1068	  </para>
1069
1070	  <orderedlist>
1071	    <listitem>
1072	      <para>
1073		<classname>std::priority_queue</classname> cannot erase values
1074		from its adapted sequence (irrespective of the sequence
1075		type). This means that the memory use of
1076		an <classname>std::priority_queue</classname> object is always
1077		proportional to the maximal number of values it ever contained,
1078		and not to the number of values that it currently
1079		contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1080		This implementation of binary heaps acts very differently than
1081		other underlying data structures (See also pairing heaps).
1082	      </para>
1083	    </listitem>
1084
1085	    <listitem>
1086	      <para>
		Some combinations of adapted sequences and value types
		are very inefficient or just don't make sense. If one uses
		<classname>std::priority_queue&lt;std::string,
		std::vector&lt;std::string&gt; &gt;</classname>, for example, then not only will each
		operation perform a logarithmic number of
		<classname>std::string</classname> assignments, but, furthermore, any
		operation (including <function>pop</function>) can render the container
		useless due to exceptions. Conversely, if one uses
		<classname>std::priority_queue&lt;int, std::deque&lt;int&gt;
		&gt;</classname>, then each operation incurs a logarithmic
		number of indirect accesses (through pointers) unnecessarily.
		It might be better to let the container make a conservative
		deduction whether to use the structure labeled A1 or A2 in the graphic above.
1100	      </para>
1101	    </listitem>
1102
1103	    <listitem>
1104	      <para>
1105		There does not seem to be a systematic way to determine
1106		what exactly can be done with the priority queue.
1107	      </para>
1108	      <orderedlist>
1109		<listitem>
1110		  <para>
		    If <varname>p</varname> is a priority queue adapting an
		    <classname>std::vector</classname>, then it is possible to iterate over
		    all values by using <function>&amp;p.top()</function> and
		    <function>&amp;p.top() + p.size()</function>, but this will not work
		    if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
		    case, one cannot use <function>p.begin()</function> and
		    <function>p.end()</function>. If a different sequence is adapted, it
		    is even more difficult to determine what can be
		    done.
1120		  </para>
1121		</listitem>
1122
1123		<listitem>
1124		  <para>
1125		    If <varname>p</varname> is a priority queue adapting an
		    <classname>std::deque</classname>, then the reference returned by
1127		  </para>
1128		  <programlisting>
1129		    p.top()
1130		  </programlisting>
1131		  <para>
1132		    will remain valid until it is popped,
1133		    but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1134		    next <function>push</function> will invalidate it. If a different
1135		    sequence is adapted, it is even more difficult to
1136		    determine what can be done.
1137		  </para>
1138		</listitem>
1139	      </orderedlist>
1140	    </listitem>
1141
1142	    <listitem>
1143	      <para>
1144		Sequence-based binary heaps can still implement
1145		linear-time <function>erase</function> and <function>modify</function> operations.
1146		This means that if one needs to erase a small
1147		(say logarithmic) number of values, then one might still
1148		choose this underlying data structure. Using
1149		<classname>std::priority_queue</classname>, however, this will generally
1150		change the order of growth of the entire sequence of
1151		operations.
1152	      </para>
1153	    </listitem>
1154	  </orderedlist>
1155
1156	</section>
1157      </section>
1158    </section> <!-- goals/motivation -->
1159  </section> <!-- intro -->
1160
1161  <!-- S02: Using -->
1162  <section xml:id="containers.pbds.using">
1163    <info><title>Using</title></info>
1164    <?dbhtml filename="policy_data_structures_using.html"?>
1165
1166    <section xml:id="pbds.using.prereq">
1167      <info><title>Prerequisites</title></info>
1168
      <para>The library contains only header files, and does not require any
      other libraries except the standard C++ library. All classes are
      defined in namespace <code>__gnu_pbds</code>. The library internally
      uses macros beginning with <code>PB_DS</code>, but
      <code>#undef</code>s anything it <code>#define</code>s (except for
      header guards). Compiling the library in an environment where macros
      beginning with <code>PB_DS</code> are defined may yield unpredictable
      results in compilation, execution, or both.</para>
1177
1178      <para>
1179	Further dependencies are necessary to create the visual output
1180	for the performance tests. To create these graphs, an
1181	additional package is needed: <command>pychart</command>.
1182      </para>
1183    </section>
1184
1185    <section xml:id="pbds.using.organization">
1186      <info><title>Organization</title></info>
1187
1188      <para>
1189	The various data structures are organized as follows.
1190      </para>
1191
1192      <itemizedlist>
1193	<listitem>
1194	  <para>
1195	    Branch-Based
1196	  </para>
1197
1198	  <itemizedlist>
1199	    <listitem>
1200	      <para>
1201		<classname>basic_branch</classname>
		is an abstract base class for branch-based
1203		associative-containers
1204	      </para>
1205	    </listitem>
1206
1207	    <listitem>
1208	      <para>
1209		<classname>tree</classname>
1210		is a concrete base class for tree-based
1211		associative-containers
1212	      </para>
1213	    </listitem>
1214
1215	    <listitem>
1216	      <para>
1217		<classname>trie</classname>
		is a concrete base class for trie-based
1219		associative-containers
1220	      </para>
1221	    </listitem>
1222	  </itemizedlist>
1223	</listitem>
1224
1225	<listitem>
1226	  <para>
1227	    Hash-Based
1228	  </para>
1229	  <itemizedlist>
1230	    <listitem>
1231	      <para>
1232		<classname>basic_hash_table</classname>
1233		is an abstract base class for hash-based
1234		associative-containers
1235	      </para>
1236	    </listitem>
1237
1238	    <listitem>
1239	      <para>
1240		<classname>cc_hash_table</classname>
		is a concrete collision-chaining hash-based
		associative container
1243	      </para>
1244	    </listitem>
1245
1246	    <listitem>
1247	      <para>
1248		<classname>gp_hash_table</classname>
		is a concrete (general) probing hash-based
		associative container
1251	      </para>
1252	    </listitem>
1253	  </itemizedlist>
1254	</listitem>
1255
1256	<listitem>
1257	  <para>
1258	    List-Based
1259	  </para>
1260	  <itemizedlist>
1261	    <listitem>
1262	      <para>
1263		<classname>list_update</classname>
		is a list-based update-policy associative container
1265	      </para>
1266	    </listitem>
1267	  </itemizedlist>
1268	</listitem>
1269	<listitem>
1270	  <para>
1271	    Heap-Based
1272	  </para>
1273	  <itemizedlist>
1274	    <listitem>
1275	      <para>
1276		<classname>priority_queue</classname>
		is a priority queue
1278	      </para>
1279	    </listitem>
1280	  </itemizedlist>
1281	</listitem>
1282      </itemizedlist>
1283
1284      <para>
1285	The hierarchy is composed naturally so that commonality is
1286	captured by base classes. Thus <function>operator[]</function>
1287	is defined at the base of any hierarchy, since all derived
1288	containers support it. Conversely <function>split</function> is
1289	defined in <classname>basic_branch</classname>, since only
1290	tree-like containers support it.
1291      </para>
1292
1293      <para>
1294	In addition, there are the following diagnostics classes,
1295	used to report errors specific to this library's data
1296	structures.
1297      </para>
1298
1299      <figure>
1300	<title>Exception Hierarchy</title>
1301	<mediaobject>
1302	  <imageobject>
1303	    <imagedata align="center" format="PDF" scale="75"
1304		       fileref="../images/pbds_exception_hierarchy.pdf"/>
1305	  </imageobject>
1306	  <imageobject>
1307	    <imagedata align="center" format="PNG" scale="100"
1308		       fileref="../images/pbds_exception_hierarchy.png"/>
1309	  </imageobject>
1310	  <textobject>
1311	    <phrase>Exception Hierarchy</phrase>
1312	  </textobject>
1313	</mediaobject>
1314      </figure>
1315
1316    </section>
1317
1318    <section xml:id="pbds.using.tutorial">
1319      <info><title>Tutorial</title></info>
1320
1321      <section xml:id="pbds.using.tutorial.basic">
1322	<info><title>Basic Use</title></info>
1323
1324	<para>
	  For the most part, the policy-based containers in
1326	  namespace <literal>__gnu_pbds</literal> have the same interface as
1327	  the equivalent containers in the standard C++ library, except for
1328	  the names used for the container classes themselves. For example,
1329	  this shows basic operations on a collision-chaining hash-based
1330	  container:
1331	</para>
1332	<programlisting>
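	  #include &lt;cassert&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  int main()
	  {
	    // Collision-chaining hash table mapping int keys to char values.
	    __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
	    c[2] = 'b';
	    // Key 1 was never inserted, so find() returns the end iterator.
	    assert(c.find(1) == c.end());
	  }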
1341	</programlisting>
1342
1343	<para>
1344	  The container is called
1345	  <classname>__gnu_pbds::cc_hash_table</classname> instead of
1346	  <classname>std::unordered_map</classname>, since <quote>unordered
1347	  map</quote> does not necessarily mean a hash-based map as implied by
1348	  the C++ library (C++11 or TR1). For example, list-based associative
1349	  containers, which are very useful for the construction of
1350	  "multimaps," are also unordered.
1351	</para>
1352
1353	<para>This snippet shows a red-black tree based container:</para>
1354
1355	<programlisting>
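	  #include &lt;cassert&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  int main()
	  {
	    // Tree-based map; the default underlying structure is a red-black tree.
	    __gnu_pbds::tree&lt;int, char&gt; c;
	    c[2] = 'b';
	    // Key 2 was just inserted, so find() does not return the end iterator.
	    assert(c.find(2) != c.end());
	  }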
1364	</programlisting>
1365
1366	<para>The container is called <classname>tree</classname> instead of
1367	<classname>map</classname> since the underlying data structures are
1368	being named with specificity.
1369	</para>
1370
1371	<para>
1372	  The member function naming convention is to strive to be the same as
1373	  the equivalent member functions in other C++ standard library
1374	  containers. The familiar methods are unchanged:
1375	  <function>begin</function>, <function>end</function>,
1376	  <function>size</function>, <function>empty</function>, and
1377	  <function>clear</function>.
1378	</para>
1379
1380	<para>
1381	  This isn't to say that things are exactly as one would expect, given
	  the container requirements and interfaces in the C++ standard.
1383	</para>
1384
1385	<para>
1386	  The names of containers' policies and policy accessors are
	  different from the usual. For example, if <type>hash_type</type> is
1388	some type of hash-based container, then</para>
1389
1390	<programlisting>
1391	  hash_type::hash_fn
1392	</programlisting>
1393
1394	<para>
1395	  gives the type of its hash functor, and if <varname>obj</varname> is
1396	  some hash-based container object, then
1397	</para>
1398
1399	<programlisting>
1400	  obj.get_hash_fn()
1401	</programlisting>
1402
1403	<para>will return a reference to its hash-functor object.</para>
1404
1405
1406	<para>
1407	  Similarly, if <type>tree_type</type> is some type of tree-based
1408	  container, then
1409	</para>
1410
1411	<programlisting>
1412	  tree_type::cmp_fn
1413	</programlisting>
1414
1415	<para>
1416	  gives the type of its comparison functor, and if
1417	  <varname>obj</varname> is some tree-based container object,
1418	  then
1419	</para>
1420
1421	<programlisting>
1422	  obj.get_cmp_fn()
1423	</programlisting>
1424
1425	<para>will return a reference to its comparison-functor object.</para>
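	<para>
	  A minimal sketch of these accessors in use follows. The helper
	  names <function>hash_like</function> and
	  <function>ordered_before</function> are hypothetical, and the
	  choice of <classname>std::greater</classname> as the comparison
	  functor is only there to make the effect of
	  <function>get_cmp_fn</function> visible.
	</para>
	<programlisting>
	  #include &lt;cstddef&gt;
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_type;
	  typedef __gnu_pbds::tree&lt;int, char, std::greater&lt;int&gt; &gt; tree_type;

	  // Hashes a key with the same functor object the container uses.
	  std::size_t hash_like(const hash_type&amp; c, int key)
	  {
	    const hash_type::hash_fn&amp; h = c.get_hash_fn();
	    return h(key);
	  }

	  // Reports whether the container orders key a before key b.
	  bool ordered_before(const tree_type&amp; c, int a, int b)
	  {
	    const tree_type::cmp_fn&amp; cmp = c.get_cmp_fn();
	    return cmp(a, b);
	  }
	</programlisting>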
1426
1427	<para>
	  It would be nice to give names consistent with those in the existing
	  C++ standard (inclusive of TR1). Unfortunately, these standard
	  containers don't consistently name types and methods. For example,
	  <classname>std::tr1::unordered_map</classname> uses
	  <type>hasher</type> for the hash functor, but
	  <classname>std::map</classname> uses <type>key_compare</type> for
	  the comparison functor. The accessors differ in the same way:
	  <classname>std::tr1::unordered_map</classname> uses
	  <function>hash_function</function> for accessing the hash functor, but
	  <classname>std::map</classname> uses <function>key_comp</function>
	  for accessing the comparison functor.
1438	</para>
1439
1440	<para>
1441	  Instead, <literal>__gnu_pbds</literal> attempts to be internally
1442	  consistent, and uses standard-derived terminology if possible.
1443	</para>
1444
1445	<para>
1446	  Another source of difference is in scope:
1447	  <literal>__gnu_pbds</literal> contains more types of associative
1448	  containers than the standard C++ library, and more opportunities
1449	  to configure these new containers, since different types of
1450	  associative containers are useful in different settings.
1451	</para>
1452
1453	<para>
1454	  Namespace <literal>__gnu_pbds</literal> contains different classes for
1455	  hash-based containers, tree-based containers, trie-based containers,
1456	  and list-based containers.
1457	</para>
1458
1459	<para>
1460	  Since associative containers share parts of their interface, they
1461	  are organized as a class hierarchy.
1462	</para>
1463
1464	<para>Each type or method is defined in the most-common ancestor
1465	in which it makes sense.
1466	</para>
1467
1468	<para>For example, all associative containers support iteration
1469	expressed in the following form:
1470	</para>
1471
1472	<programlisting>
1473	  const_iterator
1474	  begin() const;
1475
1476	  iterator
1477	  begin();
1478
1479	  const_iterator
1480	  end() const;
1481
1482	  iterator
1483	  end();
1484	</programlisting>
1485
1486	<para>
1487	  But not all containers contain or use hash functors. Yet, both
1488	  collision-chaining and (general) probing hash-based associative
1489	  containers have a hash functor, so
1490	  <classname>basic_hash_table</classname> contains the interface:
1491	</para>
1492
1493	<programlisting>
1494	  const hash_fn&amp;
1495	  get_hash_fn() const;
1496
1497	  hash_fn&amp;
1498	  get_hash_fn();
1499	</programlisting>
1500
1501	<para>
1502	  so all hash-based associative containers inherit the same
1503	  hash-functor accessor methods.
1504	</para>
1505
1506      </section> <!--basic use -->
1507
1508      <section xml:id="pbds.using.tutorial.configuring">
1509	<info>
1510	  <title>
1511	    Configuring via Template Parameters
1512	  </title>
1513	</info>
1514
1515	<para>
1516	  In general, each of this library's containers is
1517	  parametrized by more policies than those of the standard library. For
1518	  example, the standard hash-based container is parametrized as
1519	  follows:
1520	</para>
1521	<programlisting>
1522	  template&lt;typename Key, typename Mapped, typename Hash,
	  typename Pred, typename Allocator, bool Cache_Hash_Code&gt;
1524	  class unordered_map;
1525	</programlisting>
1526
1527	<para>
1528	  and so can be configured by key type, mapped type, a functor
1529	  that translates keys to unsigned integral types, an equivalence
1530	  predicate, an allocator, and an indicator whether to store hash
	  values with each entry. This library's collision-chaining
1532	  hash-based container is parametrized as
1533	</para>
1534	<programlisting>
1535	  template&lt;typename Key, typename Mapped, typename Hash_Fn,
1536	  typename Eq_Fn, typename Comb_Hash_Fn,
	  typename Resize_Policy, bool Store_Hash,
1538	  typename Allocator&gt;
1539	  class cc_hash_table;
1540	</programlisting>
1541
1542	<para>
1543	  and so can be configured by the first four types of
1544	  <classname>std::tr1::unordered_map</classname>, then a
1545	  policy for translating the key-hash result into a position
1546	  within the table, then a policy by which the table resizes,
1547	  an indicator whether to store hash values with each entry,
1548	  and an allocator (which is typically the last template
1549	  parameter in standard containers).
1550	</para>
1551
1552	<para>
1553	  Nearly all policy parameters have default values, so this
1554	  need not be considered for casual use. It is important to
1555	  note, however, that hash-based containers' policies can
1556	  dramatically alter their performance in different settings,
1557	  and that tree-based containers' policies can make them
1558	  useful for other purposes than just look-up.
1559	</para>
1560
1561
1562	<para>As opposed to associative containers, priority queues have
1563	relatively few configuration options. The priority queue is
1564	parametrized as follows:</para>
1565	<programlisting>
	  template&lt;typename Value_Type, typename Cmp_Fn, typename Tag,
1567	  typename Allocator&gt;
1568	  class priority_queue;
1569	</programlisting>
1570
1571	<para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1572	<classname>Allocator</classname> parameters are the container's value type,
1573	comparison-functor type, and allocator type, respectively;
1574	these are very similar to the standard's priority queue. The
1575	<classname>Tag</classname> parameter is different: there are a number of
1576	pre-defined tag types corresponding to binary heaps, binomial
1577	heaps, etc., and <classname>Tag</classname> should be instantiated
1578	by one of them.</para>
1579
1580	<para>Note that as opposed to the
1581	<classname>std::priority_queue</classname>,
1582	<classname>__gnu_pbds::priority_queue</classname> is not a
1583	sequence-adapter; it is a regular container.</para>
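	<para>
	  As a rough sketch of how the <classname>Tag</classname> parameter
	  selects the underlying data structure, the same generic code can be
	  instantiated over several tags. The typedef names and
	  <function>top_after_pushes</function> are hypothetical and serve only
	  this example; the tags shown are a subset of those the library
	  defines.
	</para>
	<programlisting>
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	  // Same value type, different underlying data structures.
	  typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	                                     __gnu_pbds::binary_heap_tag&gt; binary_pq;
	  typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	                                     __gnu_pbds::pairing_heap_tag&gt; pairing_pq;
	  // Reversing the comparison functor makes top() the smallest value.
	  typedef __gnu_pbds::priority_queue&lt;int, std::greater&lt;int&gt;,
	                                     __gnu_pbds::thin_heap_tag&gt; min_thin_pq;

	  // Pushes a few values and returns the one the queue ranks highest.
	  template&lt;typename Queue&gt;
	  int top_after_pushes()
	  {
	    Queue q;
	    q.push(4);
	    q.push(9);
	    q.push(1);
	    return q.top();
	  }
	</programlisting>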
1584
1585      </section>
1586
1587      <section xml:id="pbds.using.tutorial.traits">
1588	<info>
1589	  <title>
1590	    Querying Container Attributes
1591	  </title>
1592	</info>
1593	<para></para>
1594
	<para>A container's underlying data structure
	affects its performance; unfortunately, it can also affect
	its interface. When manipulating associative containers
	generically, it is often useful to be able to statically
	determine what they can support and what they cannot.
1600	</para>
1601
1602	<para>Happily, the standard provides a good solution to a similar
1603	problem - that of the different behavior of iterators. If
1604	<classname>It</classname> is an iterator, then
1605	</para>
1606	<programlisting>
1607	  typename std::iterator_traits&lt;It&gt;::iterator_category
1608	</programlisting>
1609
1610	<para>is one of a small number of pre-defined tag classes, and
1611	</para>
1612	<programlisting>
1613	  typename std::iterator_traits&lt;It&gt;::value_type
1614	</programlisting>
1615
1616	<para>is the value type to which the iterator "points".</para>
1617
1618	<para>
1619	  Similarly, in this library, if <type>C</type> is a
1620	  container, then <classname>container_traits</classname> is a
1621	  trait class that stores information about the kind of
1622	  container that is implemented.
1623	</para>
1624	<programlisting>
1625	  typename container_traits&lt;C&gt;::container_category
1626	</programlisting>
1627	<para>
1628	  is one of a small number of predefined tag structures that
1629	  uniquely identifies the type of underlying data structure.
1630	</para>
1631
1632	<para>In most cases, however, the exact underlying data
1633	structure is not really important, but what is important is
1634	one of its other attributes: whether it guarantees storing
1635	elements by key order, for example. For this one can
1636	use</para>
1637	<programlisting>
	  container_traits&lt;C&gt;::order_preserving
1639	</programlisting>
1640	<para>
1641	  Also,
1642	</para>
1643	<programlisting>
1644	  typename container_traits&lt;C&gt;::invalidation_guarantee
1645	</programlisting>
1646
1647	<para>is the container's invalidation guarantee. Invalidation
1648	guarantees are especially important regarding priority queues,
1649	since in this library's design, iterators are practically the
1650	only way to manipulate them.</para>
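	<para>
	  The traits above might be queried as in the following sketch. It is
	  an illustration under assumptions rather than a definitive recipe:
	  all helper and typedef names are hypothetical,
	  <varname>order_preserving</varname> is assumed to be a compile-time
	  constant convertible to <type>bool</type>, and the invalidation
	  guarantee tags are assumed to form an inheritance chain in which
	  <classname>point_invalidation_guarantee</classname> derives from
	  <classname>basic_invalidation_guarantee</classname>.
	</para>
	<programlisting>
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	  typedef __gnu_pbds::tree&lt;int, char&gt; tree_t;
	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_t;
	  typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	                                     __gnu_pbds::pairing_heap_tag&gt; pairing_pq;
	  typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	                                     __gnu_pbds::binary_heap_tag&gt; binary_pq;

	  // True if the container guarantees storing elements by key order.
	  template&lt;typename C&gt;
	  bool stores_by_key_order()
	  { return __gnu_pbds::container_traits&lt;C&gt;::order_preserving; }

	  // Tag dispatch: the overload is chosen by the guarantee tag's type.
	  inline bool survives(__gnu_pbds::basic_invalidation_guarantee)
	  { return false; }
	  inline bool survives(__gnu_pbds::point_invalidation_guarantee)
	  { return true; }

	  // True if point-type iterators remain valid when the container
	  // is modified, according to its invalidation guarantee.
	  template&lt;typename C&gt;
	  bool point_iterators_survive_updates()
	  {
	    typedef typename __gnu_pbds::container_traits&lt;C&gt;::invalidation_guarantee
	      guarantee;
	    return survives(guarantee());
	  }
	</programlisting>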
1651      </section>
1652
1653      <section xml:id="pbds.using.tutorial.point_range_iteration">
1654	<info>
1655	  <title>
1656	    Point and Range Iteration
1657	  </title>
1658	</info>
1659	<para></para>
1660
1661	<para>This library differentiates between two types of methods
1662	and iterators: point-type, and range-type. For example,
1663	<function>find</function> and <function>insert</function> are point-type methods, since
1664	they each deal with a specific element; their returned
1665	iterators are point-type iterators. <function>begin</function> and
1666	<function>end</function> are range-type methods, since they are not used to
1667	find a specific element, but rather to go over all elements in
1668	a container object; their returned iterators are range-type
1669	iterators.
1670	</para>
1671
1672	<para>Most containers store elements in an order that is
1673	determined by their interface. Correspondingly, it is fine that
1674	their point-type iterators are synonymous with their range-type
1675	iterators. For example, in the following snippet
1676	</para>
1677	<programlisting>
1678	  std::for_each(c.find(1), c.find(5), foo);
1679	</programlisting>
1680	<para>
1681	  two point-type iterators (returned by <function>find</function>) are used
1682	  for a range-type purpose - going over all elements whose key is
1683	  between 1 and 5.
1684	</para>
1685
1686	<para>
1687	  Conversely, the above snippet makes no sense for
1688	  self-organizing containers - ones that order (and reorder)
1689	  their elements by implementation. It would be nice to have a
1690	  uniform iterator system that would allow the above snippet to
1691	  compile only if it made sense.
1692	</para>
1693
1694	<para>
1695	  This could trivially be done by specializing
1696	  <function>std::for_each</function> for the case of iterators returned by
1697	  <classname>std::tr1::unordered_map</classname>, but this would only solve the
1698	  problem for one algorithm and one container. Fundamentally, the
1699	  problem is that one can loop using a self-organizing
1700	  container's point-type iterators.
1701	</para>
1702
1703	<para>
1704	  This library's containers define two families of
1705	  iterators: <type>point_const_iterator</type> and
1706	  <type>point_iterator</type> are the iterator types returned by
1707	  point-type methods; <type>const_iterator</type> and
1708	  <type>iterator</type> are the iterator types returned by range-type
1709	  methods.
1710	</para>
1711	<programlisting>
1712	  class &lt;- some container -&gt;
1713	  {
1714	  public:
1715	  ...
1716
1717	  typedef &lt;- something -&gt; const_iterator;
1718
1719	  typedef &lt;- something -&gt; iterator;
1720
1721	  typedef &lt;- something -&gt; point_const_iterator;
1722
1723	  typedef &lt;- something -&gt; point_iterator;
1724
1725	  ...
1726
1727	  public:
1728	  ...
1729
1730	  const_iterator begin () const;
1731
1732	  iterator begin();
1733
1734	  point_const_iterator find(...) const;
1735
1736	  point_iterator find(...);
1737	  };
1738	</programlisting>
1739
	<para>For
	containers whose interface defines sequence order, it
	is very simple: point-type and range-type iterators are exactly
	the same, which means that the <function>std::for_each</function>
	snippet above will compile if it
	is used for an order-preserving associative container.
1745	</para>
1746
1747	<para>
	  For self-organizing containers (hash-based containers being
	  one example), however, the <function>std::for_each</function>
	  snippet will not compile, because their point-type iterators
	  do not support <function>operator++</function>.
1752	</para>
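	<para>
	  The order-preserving case can be sketched as follows. The
	  typedef <type>ordered_map</type> and the function
	  <function>count_between</function> are hypothetical names, and
	  the sketch assumes GCC's
	  <filename>ext/pb_ds/assoc_container.hpp</filename> header and
	  that both keys are present in the container.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;cstddef&gt;

	  // An order-preserving container (a red-black tree by default).
	  typedef __gnu_pbds::tree&lt;int, char&gt; ordered_map;

	  // Counts the entries in [c.find(lo), c.find(hi)).
	  // Assumes lo &lt;= hi and that both keys are stored in c.
	  std::size_t count_between(ordered_map&amp; c, int lo, int hi)
	  {
	    std::size_t n = 0;
	    ordered_map::point_iterator last = c.find(hi);
	    // Incrementing a point-type iterator is possible here only
	    // because the container is order-preserving.
	    for (ordered_map::point_iterator it = c.find(lo); it != last; ++it)
	      ++n;
	    return n;
	  }
	</programlisting>
	<para>
	  Replacing <type>ordered_map</type> by a hash-based type such as
	  <classname>cc_hash_table&lt;int, char&gt;</classname> is expected
	  to make the loop fail to compile, since the increment is not
	  available.
	</para>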
1753
1754	<para>In any case, both for order-preserving and self-organizing
1755	containers, the following snippet will compile:
1756	</para>
1757	<programlisting>
1758	  typename Cntnr::point_iterator it = c.find(2);
1759	</programlisting>
1760
1761	<para>
	  because <function>find</function> returns a point-type iterator
	  in every container (and, more generally, a range-type iterator
	  can always be converted to a point-type iterator).
1764	</para>
1765
	<para>Distinguishing between iterator types also
1767	raises the point that a container's iterators might have
1768	different invalidation rules concerning their de-referencing
1769	abilities and movement abilities. This now corresponds exactly
1770	to the question of whether point-type and range-type iterators
1771	are valid. As explained above, <classname>container_traits</classname> allows
1772	querying a container for its data structure attributes. The
1773	iterator-invalidation guarantees are certainly a property of
1774	the underlying data structure, and so
1775	</para>
1776	<programlisting>
1777	  container_traits&lt;C&gt;::invalidation_guarantee
1778	</programlisting>
1779
1780	<para>
1781	  gives one of three pre-determined types that answer this
1782	  query.
1783	</para>
1784
1785      </section>
1786    </section> <!-- tutorial -->
1787
1788    <section xml:id="pbds.using.examples">
1789      <info><title>Examples</title></info>
1790      <para>
1791	Additional code examples are provided in the source
1792	distribution, as part of the regression and performance
1793	testsuite.
1794      </para>
1795
1796      <section xml:id="pbds.using.examples.basic">
1797	<info><title>Intermediate Use</title></info>
1798
1799	<itemizedlist>
1800	  <listitem>
1801	    <para>
1802	      Basic use of maps:
1803	      <filename>basic_map.cc</filename>
1804	    </para>
1805	  </listitem>
1806
1807	  <listitem>
1808	    <para>
1809	      Basic use of sets:
1810	      <filename>basic_set.cc</filename>
1811	    </para>
1812	  </listitem>
1813
1814	  <listitem>
1815	    <para>
1816	      Conditionally erasing values from an associative container object:
1817	      <filename>erase_if.cc</filename>
1818	    </para>
1819	  </listitem>
1820
1821	  <listitem>
1822	    <para>
1823	      Basic use of multimaps:
1824	      <filename>basic_multimap.cc</filename>
1825	    </para>
1826	  </listitem>
1827
1828	  <listitem>
1829	    <para>
1830	      Basic use of multisets:
1831	      <filename>basic_multiset.cc</filename>
1832	    </para>
1833	  </listitem>
1834
1835	  <listitem>
1836	    <para>
1837	      Basic use of priority queues:
1838	      <filename>basic_priority_queue.cc</filename>
1839	    </para>
1840	  </listitem>
1841
1842	  <listitem>
1843	    <para>
1844	      Splitting and joining priority queues:
1845	      <filename>priority_queue_split_join.cc</filename>
1846	    </para>
1847	  </listitem>
1848
1849	  <listitem>
1850	    <para>
1851	      Conditionally erasing values from a priority queue:
1852	      <filename>priority_queue_erase_if.cc</filename>
1853	    </para>
1854	  </listitem>
1855	</itemizedlist>
1856
1857      </section>
1858
1859      <section xml:id="pbds.using.examples.query">
1860	<info><title>Querying with <classname>container_traits</classname> </title></info>
1861	<itemizedlist>
1862	  <listitem>
1863	    <para>
1864	      Using <classname>container_traits</classname> to query
1865	      about underlying data structure behavior:
1866	      <filename>assoc_container_traits.cc</filename>
1867	    </para>
1868	  </listitem>
1869
1870	  <listitem>
1871	    <para>
1872	      A non-compiling example showing wrong use of finding keys in
1873	      hash-based containers: <filename>hash_find_neg.cc</filename>
1874	    </para>
1875	  </listitem>
1876	  <listitem>
1877	    <para>
1878	      Using <classname>container_traits</classname>
1879	      to query about underlying data structure behavior:
1880	      <filename>priority_queue_container_traits.cc</filename>
1881	    </para>
1882	  </listitem>
1883
1884	</itemizedlist>
1885
1886      </section>
1887
1888      <section xml:id="pbds.using.examples.container">
1889	<info><title>By Container Method</title></info>
1890	<para></para>
1891
1892	<section xml:id="pbds.using.examples.container.hash">
1893	  <info><title>Hash-Based</title></info>
1894
1895	  <section xml:id="pbds.using.examples.container.hash.resize">
1896	    <info><title>size Related</title></info>
1897
1898	    <itemizedlist>
1899	      <listitem>
1900		<para>
1901		  Setting the initial size of a hash-based container
1902		  object:
1903		  <filename>hash_initial_size.cc</filename>
1904		</para>
1905	      </listitem>
1906
1907	      <listitem>
1908		<para>
1909		  A non-compiling example showing how not to resize a
1910		  hash-based container object:
1911		  <filename>hash_resize_neg.cc</filename>
1912		</para>
1913	      </listitem>
1914
1915	      <listitem>
1916		<para>
		  Resizing a hash-based container object:
1918		  <filename>hash_resize.cc</filename>
1919		</para>
1920	      </listitem>
1921
1922	      <listitem>
1923		<para>
1924		  Showing an illegal resize of a hash-based container
1925		  object:
1926		  <filename>hash_illegal_resize.cc</filename>
1927		</para>
1928	      </listitem>
1929
1930	      <listitem>
1931		<para>
1932		  Changing the load factors of a hash-based container
1933		  object: <filename>hash_load_set_change.cc</filename>
1934		</para>
1935	      </listitem>
1936	    </itemizedlist>
1937	  </section>
1938
1939	  <section xml:id="pbds.using.examples.container.hash.hashor">
1940	    <info><title>Hashing Function Related</title></info>
1941	    <para></para>
1942
1943	    <itemizedlist>
1944	      <listitem>
1945		<para>
1946		  Using a modulo range-hashing function for the case of an
1947		  unknown skewed key distribution:
1948		  <filename>hash_mod.cc</filename>
1949		</para>
1950	      </listitem>
1951
1952	      <listitem>
1953		<para>
1954		  Writing a range-hashing functor for the case of a known
1955		  skewed key distribution:
1956		  <filename>shift_mask.cc</filename>
1957		</para>
1958	      </listitem>
1959
1960	      <listitem>
1961		<para>
1962		  Storing the hash value along with each key:
1963		  <filename>store_hash.cc</filename>
1964		</para>
1965	      </listitem>
1966
1967	      <listitem>
1968		<para>
1969		  Writing a ranged-hash functor:
1970		  <filename>ranged_hash.cc</filename>
1971		</para>
1972	      </listitem>
1973	    </itemizedlist>
1974
1975	  </section>
1976
1977	</section>
1978
1979	<section xml:id="pbds.using.examples.container.branch">
1980	  <info><title>Branch-Based</title></info>
1981
1982
1983	  <section xml:id="pbds.using.examples.container.branch.split">
1984	    <info><title>split or join Related</title></info>
1985
1986	    <itemizedlist>
1987	      <listitem>
1988		<para>
1989		  Joining two tree-based container objects:
1990		  <filename>tree_join.cc</filename>
1991		</para>
1992	      </listitem>
1993
1994	      <listitem>
1995		<para>
1996		  Splitting a PATRICIA trie container object:
1997		  <filename>trie_split.cc</filename>
1998		</para>
1999	      </listitem>
2000
2001	      <listitem>
2002		<para>
2003		  Order statistics while joining two tree-based container
2004		  objects:
2005		  <filename>tree_order_statistics_join.cc</filename>
2006		</para>
2007	      </listitem>
2008	    </itemizedlist>
2009
2010	  </section>
2011
2012	  <section xml:id="pbds.using.examples.container.branch.invariants">
2013	    <info><title>Node Invariants</title></info>
2014
2015	    <itemizedlist>
2016	      <listitem>
2017		<para>
2018		  Using trees for order statistics:
2019		  <filename>tree_order_statistics.cc</filename>
2020		</para>
2021	      </listitem>
2022
2023	      <listitem>
2024		<para>
2025		  Augmenting trees to support operations on line
2026		  intervals:
2027		  <filename>tree_intervals.cc</filename>
2028		</para>
2029	      </listitem>
2030	    </itemizedlist>
2031
2032	  </section>
2033
2034	  <section xml:id="pbds.using.examples.container.branch.trie">
2035	    <info><title>trie</title></info>
2036	    <itemizedlist>
2037	      <listitem>
2038		<para>
2039		  Using a PATRICIA trie for DNA strings:
2040		  <filename>trie_dna.cc</filename>
2041		</para>
2042	      </listitem>
2043
2044	      <listitem>
2045		<para>
2046		  Using a PATRICIA
2047		  trie for finding all entries whose key matches a given prefix:
2048		  <filename>trie_prefix_search.cc</filename>
2049		</para>
2050	      </listitem>
2051	    </itemizedlist>
2052
2053	  </section>
2054
2055	</section>
2056
2057	<section xml:id="pbds.using.examples.container.priority_queue">
2058	  <info><title>Priority Queues</title></info>
2059	  <itemizedlist>
2060	    <listitem>
2061	      <para>
2062		Cross referencing an associative container and a priority
2063		queue: <filename>priority_queue_xref.cc</filename>
2064	      </para>
2065	    </listitem>
2066
2067	    <listitem>
2068	      <para>
2069		Cross referencing a vector and a priority queue using a
2070		very simple version of Dijkstra's shortest path
2071		algorithm:
2072		<filename>priority_queue_dijkstra.cc</filename>
2073	      </para>
2074	    </listitem>
2075	  </itemizedlist>
2076
2077	</section>
2078
2079
2080      </section>
2081
2082    </section>
2083
2084  </section> <!-- using -->
2085
2086  <!-- S03: Design -->
2087
2088
2089<section xml:id="containers.pbds.design">
2090  <info><title>Design</title></info>
2091  <?dbhtml filename="policy_data_structures_design.html"?>
2092  <para></para>
2093
2094  <section xml:id="pbds.design.concepts">
2095    <info><title>Concepts</title></info>
2096
2097    <section xml:id="pbds.design.concepts.null_type">
2098      <info><title>Null Policy Classes</title></info>
2099
2100      <para>
2101	Associative containers are typically parametrized by various
2102	policies. For example, a hash-based associative container is
	parametrized by a hash-functor, transforming each key into a
	non-negative numerical value. Each such value is then further mapped
2105	into a position within the table. The mapping of a key into a
2106	position within the table is therefore a two-step process.
2107      </para>
2108
2109      <para>
2110	In some cases, instantiations are redundant. For example, when the
2111	keys are integers, it is possible to use a redundant hash policy,
2112	which transforms each key into its value.
2113      </para>
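      <para>
	The two-step mapping and a redundant hash policy can be sketched
	as follows. All names here (<classname>identity_hash</classname>,
	<classname>mod_range_hashing</classname>,
	<function>table_position</function>) are hypothetical and serve
	only as an illustration; they are not this library's policy
	classes.
      </para>
      <programlisting>
	#include &lt;cstddef&gt;

	// Step 1 (hypothetical, "redundant" for integer keys):
	// the key is used as its own hash value.
	struct identity_hash
	{
	  std::size_t operator()(unsigned int key) const
	  { return key; }
	};

	// Step 2 (hypothetical): fold a hash value into [0, table_size).
	struct mod_range_hashing
	{
	  std::size_t operator()(std::size_t hash, std::size_t table_size) const
	  { return hash % table_size; }
	};

	// Composes the two steps: key -&gt; hash value -&gt; table position.
	template&lt;typename Key, typename Hash_Fn, typename Range_Fn&gt;
	std::size_t table_position(const Key&amp; key, std::size_t table_size)
	{ return Range_Fn()(Hash_Fn()(key), table_size); }
      </programlisting>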
2114
2115      <para>
2116	In some other cases, these policies are irrelevant.  For example, a
2117	hash-based associative container might transform keys into positions
2118	within a table by a different method than the two-step method
2119	described above. In such a case, the hash functor is simply
2120	irrelevant.
2121      </para>
2122
2123      <para>
2124	When a policy is either redundant or irrelevant, it can be replaced
2125	by <classname>null_type</classname>.
2126      </para>
2127
2128      <para>
2129	For example, a <emphasis>set</emphasis> is an associative
2130	container with one of its template parameters (the one for the
	mapped type) replaced with <classname>null_type</classname>. Other
	places where this technique makes simplifications possible
	include node updates in tree and trie data structures, and hash
	and probe functions for hash data structures.
2135      </para>
2136    </section>
2137
2138    <section xml:id="pbds.design.concepts.associative_semantics">
2139      <info><title>Map and Set Semantics</title></info>
2140
2141      <section xml:id="concepts.associative_semantics.set_vs_map">
2142	<info>
2143	  <title>
2144	    Distinguishing Between Maps and Sets
2145	  </title>
2146	</info>
2147
2148	<para>
2149	  Anyone familiar with the standard knows that there are four kinds
2150	  of associative containers: maps, sets, multimaps, and
2151	  multisets. The map datatype associates each key to
2152	  some data.
2153	</para>
2154
2155	<para>
2156	  Sets are associative containers that simply store keys -
2157	  they do not map them to anything. In the standard, each map class
2158	  has a corresponding set class. E.g.,
2159	  <classname>std::map&lt;int, char&gt;</classname> maps each
2160	  <classname>int</classname> to a <classname>char</classname>, but
	  <classname>std::set&lt;int&gt;</classname> simply stores
2162	  <classname>int</classname>s. In this library, however, there are no
2163	  distinct classes for maps and sets. Instead, an associative
2164	  container's <classname>Mapped</classname> template parameter is a policy: if
2165	  it is instantiated by <classname>null_type</classname>, then it
2166	  is a "set"; otherwise, it is a "map". E.g.,
2167	</para>
2168	<programlisting>
2169	  cc_hash_table&lt;int, char&gt;
2170	</programlisting>
2171	<para>
	  is a "map" mapping each <type>int</type> value to a
	  <type>char</type>, but
2174	</para>
2175	<programlisting>
2176	  cc_hash_table&lt;int, null_type&gt;
2177	</programlisting>
2178	<para>
2179	  is a type that uniquely stores <type>int</type> values.
2180	</para>
	<para>Once the <classname>Mapped</classname> template parameter is instantiated
	by <classname>null_type</classname>, the "set" acts very similarly to the
	standard's sets: it does not
	map each key to a distinct <classname>null_type</classname> object. Also,
	the container's <type>value_type</type> is essentially
	its <type>key_type</type>, just as with the standard's sets.</para>
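	<para>
	  A short sketch of the two instantiations in use. The typedefs
	  and the functions <function>distinct_keys</function> and
	  <function>lookup_or</function> are hypothetical names; the
	  sketch assumes GCC's
	  <filename>ext/pb_ds/assoc_container.hpp</filename> header and a
	  GCC release in which the null policy is spelled
	  <classname>__gnu_pbds::null_type</classname>.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;cstddef&gt;
	  #include &lt;utility&gt;

	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; int_to_char_map;
	  typedef __gnu_pbds::cc_hash_table&lt;int, __gnu_pbds::null_type&gt; int_set;

	  // A "set" stores each key once; inserting a duplicate has no effect.
	  std::size_t distinct_keys(const int* first, const int* last)
	  {
	    int_set s;
	    for (; first != last; ++first)
	      s.insert(*first);
	    return s.size();
	  }

	  // A "map" stores key/mapped pairs; find returns a point-type iterator.
	  char lookup_or(const int_to_char_map&amp; m, int key, char fallback)
	  {
	    int_to_char_map::point_const_iterator it = m.find(key);
	    return it == m.end() ? fallback : it-&gt;second;
	  }
	</programlisting>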
2188
2189	<para>
2190	  The standard's multimaps and multisets allow, respectively,
2191	  non-uniquely mapping keys and non-uniquely storing keys. As
2192	  discussed, the
2193	  reasons why this might be necessary are 1) that a key might be
2194	  decomposed into a primary key and a secondary key, 2) that a
2195	  key might appear more than once, or 3) any arbitrary
2196	  combination of 1)s and 2)s. Correspondingly,
2197	  one should use 1) "maps" mapping primary keys to secondary
2198	  keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
2199	  combination of 1)s and 2)s. Thus, for example, an
2200	  <classname>std::multiset&lt;int&gt;</classname> might be used to store
2201	  multiple instances of integers, but using this library's
2202	  containers, one might use
2203	</para>
2204	<programlisting>
2205	  tree&lt;int, size_t&gt;
2206	</programlisting>
2207
2208	<para>
2209	  i.e., a <classname>map</classname> of <type>int</type>s to
2210	  <type>size_t</type>s.
2211	</para>
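	<para>
	  A minimal sketch of such a counting "multiset" follows. The
	  names <type>int_counts</type>,
	  <function>add_instance</function>,
	  <function>instances</function>, and
	  <function>remove_instance</function> are hypothetical; the
	  sketch assumes GCC's
	  <filename>ext/pb_ds/assoc_container.hpp</filename> header.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;cstddef&gt;

	  // Maps each distinct int to the number of times it logically occurs.
	  typedef __gnu_pbds::tree&lt;int, std::size_t&gt; int_counts;

	  // operator[] inserts a zero count for a new key, then we increment it.
	  void add_instance(int_counts&amp; c, int key)
	  { ++c[key]; }

	  // Number of logical occurrences of key (0 if absent).
	  std::size_t instances(const int_counts&amp; c, int key)
	  {
	    int_counts::point_const_iterator it = c.find(key);
	    return it == c.end() ? std::size_t(0) : it-&gt;second;
	  }

	  // Removes one occurrence; erases the entry when the count reaches 0.
	  bool remove_instance(int_counts&amp; c, int key)
	  {
	    int_counts::point_iterator it = c.find(key);
	    if (it == c.end())
	      return false;
	    if (--it-&gt;second == 0)
	      c.erase(key);
	    return true;
	  }
	</programlisting>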
2212	<para>
2213	  These "multimaps" and "multisets" might be confusing to
2214	  anyone familiar with the standard's <classname>std::multimap</classname> and
2215	  <classname>std::multiset</classname>, because there is no clear
2216	  correspondence between the two. For example, in some cases
2217	  where one uses <classname>std::multiset</classname> in the standard, one might use
2218	  in this library a "multimap" of "multisets" - i.e., a
2219	  container that maps primary keys each to an associative
2220	  container that maps each secondary key to the number of times
2221	  it occurs.
2222	</para>
2223
2224	<para>
2225	  When one uses a "multimap," one should choose with care the
2226	  type of container used for secondary keys.
2227	</para>
2228      </section> <!-- map vs set -->
2229
2230
2231      <section xml:id="concepts.associative_semantics.multi">
2232	<info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2233
2234	<para>
	  Brace oneself: this library does not contain containers like
2236	  <classname>std::multimap</classname> or
2237	  <classname>std::multiset</classname>. Instead, these data
2238	  structures can be synthesized via manipulation of the
2239	  <classname>Mapped</classname> template parameter.
2240	</para>
2241	<para>
2242	  One maps the unique part of a key - the primary key, into an
2243	  associative-container of the (originally) non-unique parts of
2244	  the key - the secondary key. A primary associative-container
2245	  is an associative container of primary keys; a secondary
2246	  associative-container is an associative container of
2247	  secondary keys.
2248	</para>
2249
2250	<para>
	  Let us step back a bit and start from the beginning.
2252	</para>
2253
2254
2255	<para>
2256	  Maps (or sets) allow mapping (or storing) unique-key values.
2257	  The standard library also supplies associative containers which
2258	  map (or store) multiple values with equivalent keys:
2259	  <classname>std::multimap</classname>, <classname>std::multiset</classname>,
2260	  <classname>std::tr1::unordered_multimap</classname>, and
	  <classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
2262	  be used, then why we think it is best to avoid them.
2263	</para>
2264
2265	<para>
2266	  Suppose one builds a simple bank-account application that
2267	  records for each client (identified by an <classname>std::string</classname>)
2268	  and account-id (marked by an <type>unsigned long</type>) -
2269	  the balance in the account (described by a
2270	  <type>float</type>). Suppose further that ordering this
2271	  information is not useful, so a hash-based container is
	  preferable to a tree-based container. Then one can use
2273	</para>
2274
2275	<programlisting>
2276	  std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2277	</programlisting>
2278
2279	<para>
2280	  which hashes every combination of client and account-id. This
2281	  might work well, except for the fact that it is now impossible
2282	  to efficiently list all of the accounts of a specific client
2283	  (this would practically require iterating over all
2284	  entries). Instead, one can use
2285	</para>
2286
2287	<programlisting>
2288	  std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2289	</programlisting>
2290
2291	<para>
2292	  which hashes every client, and decides equivalence based on
2293	  client only. This will ensure that all accounts belonging to a
2294	  specific user are stored consecutively.
2295	</para>
2296
2297	<para>
	  Also, suppose one wants a priority queue of integers
2299	  (a container that supports <function>push</function>,
2300	  <function>pop</function>, and <function>top</function> operations, the last of which
2301	  returns the largest <type>int</type>) that also supports
2302	  operations such as <function>find</function> and <function>lower_bound</function>. A
2303	  reasonable solution is to build an adapter over
2304	  <classname>std::set&lt;int&gt;</classname>. In this adapter,
2305	  <function>push</function> will just call the tree-based
2306	  associative container's <function>insert</function> method; <function>pop</function>
2307	  will call its <function>end</function> method, and use it to return the
2308	  preceding element (which must be the largest). Then this might
2309	  work well, except that the container object cannot hold
	  multiple instances of the same integer (<function>push(4)</function>
2311	  will be a no-op if <constant>4</constant> is already in the
2312	  container object). If multiple keys are necessary, then one
2313	  might build the adapter over an
2314	  <classname>std::multiset&lt;int&gt;</classname>.
2315	</para>
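	<para>
	  The adapter described above might be sketched as follows, here
	  over <classname>std::multiset&lt;int&gt;</classname>. The class
	  name <classname>searchable_int_queue</classname> and its
	  <function>contains</function> member are hypothetical, chosen
	  only for illustration.
	</para>
	<programlisting>
	  #include &lt;cstddef&gt;
	  #include &lt;set&gt;

	  // Hypothetical adapter: a priority queue of ints that also supports lookup.
	  class searchable_int_queue
	  {
	  public:
	    void push(int v)
	    { elems_.insert(v); }

	    // Largest element; the queue must not be empty.
	    int top() const
	    { return *elems_.rbegin(); }

	    // Removes one instance of the largest element; the queue must
	    // not be empty. Uses the element preceding end().
	    void pop()
	    {
	      std::multiset&lt;int&gt;::iterator last = elems_.end();
	      --last;
	      elems_.erase(last);
	    }

	    bool contains(int v) const
	    { return elems_.find(v) != elems_.end(); }

	    std::size_t size() const
	    { return elems_.size(); }

	  private:
	    std::multiset&lt;int&gt; elems_;
	  };
	</programlisting>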
2316
2317	<para>
2318	  The standard library's non-unique-mapping containers are useful
	  when (1) a key can be decomposed into a primary key and a
2320	  secondary key, (2) a key is needed multiple times, or (3) any
2321	  combination of (1) and (2).
2322	</para>
2323
2324	<para>
2325	  The graphic below shows how the standard library's container
2326	  design works internally; in this figure nodes shaded equally
2327	  represent equivalent-key values. Equivalent keys are stored
2328	  consecutively using the properties of the underlying data
2329	  structure: binary search trees (label A) store equivalent-key
2330	  values consecutively (in the sense of an in-order walk)
2331	  naturally; collision-chaining hash tables (label B) store
	  equivalent-key values in the same bucket; the bucket can be
2333	  arranged so that equivalent-key values are consecutive.
2334	</para>
2335
2336	<figure>
2337	  <title>Non-unique Mapping Standard Containers</title>
2338	  <mediaobject>
2339	    <imageobject>
2340	      <imagedata align="center" format="PNG" scale="100"
2341			 fileref="../images/pbds_embedded_lists_1.png"/>
2342	    </imageobject>
2343	    <textobject>
2344	      <phrase>Non-unique Mapping Standard Containers</phrase>
2345	    </textobject>
2346	  </mediaobject>
2347	</figure>
2348
2349	<para>
	  Put differently, the standard's non-unique mapping
2351	  associative-containers are associative containers that map
2352	  primary keys to linked lists that are embedded into the
2353	  container. The graphic below shows again the two
2354	  containers from the first graphic above, this time with
2355	  the embedded linked lists of the grayed nodes marked
2356	  explicitly.
2357	</para>
2358
2359	<figure xml:id="fig.pbds_embedded_lists_2">
2360	  <title>
2361	    Effect of embedded lists in
2362	    <classname>std::multimap</classname>
2363	  </title>
2364	  <mediaobject>
2365	    <imageobject>
2366	      <imagedata align="center" format="PNG" scale="100"
2367			 fileref="../images/pbds_embedded_lists_2.png"/>
2368	    </imageobject>
2369	    <textobject>
2370	      <phrase>
2371		Effect of embedded lists in
2372		<classname>std::multimap</classname>
2373	      </phrase>
2374	    </textobject>
2375	  </mediaobject>
2376	</figure>
2377
2378	<para>
2379	  These embedded linked lists have several disadvantages.
2380	</para>
2381
2382	<orderedlist>
2383	  <listitem>
2384	    <para>
2385	      The underlying data structure embeds the linked lists
2386	      according to its own consideration, which means that the
2387	      search path for a value might include several different
	      equivalent-key values. For example, the search path for
	      the black node in either label A or label B of the first
	      graphic includes more than a single gray node.
2391	    </para>
2392	  </listitem>
2393
2394	  <listitem>
2395	    <para>
2396	      The links of the linked lists are the underlying data
2397	      structures' nodes, which typically are quite structured.  In
	      the case of tree-based containers (the graphic above, label
2399	      B), each "link" is actually a node with three pointers (one
2400	      to a parent and two to children), and a
2401	      relatively-complicated iteration algorithm. The linked
2402	      lists, therefore, can take up quite a lot of memory, and
2403	      iterating over all values equal to a given key (through the
2404	      return value of the standard
2405	      library's <function>equal_range</function>) can be
2406	      expensive.
2407	    </para>
2408	  </listitem>
2409
2410	  <listitem>
2411	    <para>
2412	      The primary key is stored multiply; this uses more memory.
2413	    </para>
2414	  </listitem>
2415
2416	  <listitem>
2417	    <para>
2418	      Finally, the interface of this design excludes several
2419	      useful underlying data structures. Of all the unordered
2420	      self-organizing data structures, practically only
2421	      collision-chaining hash tables can (efficiently) guarantee
2422	      that equivalent-key values are stored consecutively.
2423	    </para>
2424	  </listitem>
2425	</orderedlist>
2426
2427	<para>
2428	  The above reasons hold even when the ratio of secondary keys to
2429	  primary keys (or average number of identical keys) is small, but
2430	  when it is large, there are more severe problems:
2431	</para>
2432
2433	<orderedlist>
2434	  <listitem>
2435	    <para>
2436	      The underlying data structures order the links inside each
	      embedded linked list according to their internal
	      considerations, which effectively means that the links of
	      each list are unordered. Irrespective of the underlying data
2440	      structure, searching for a specific value can degrade to
2441	      linear complexity.
2442	    </para>
2443	  </listitem>
2444
2445	  <listitem>
2446	    <para>
2447	      Similarly to the above point, it is impossible to apply
2448	      to the secondary keys considerations that apply to primary
2449	      keys. For example, it is not possible to maintain secondary
2450	      keys by sorted order.
2451	    </para>
2452	  </listitem>
2453
2454	  <listitem>
2455	    <para>
2456	      While the interface "understands" that all equivalent-key
2457	      values constitute a distinct list (through
2458	      <function>equal_range</function>), the underlying data
2459	      structure typically does not. This means that operations such
2460	      as erasing from a tree-based container all values whose keys
	      are equivalent to a given key can be super-linear in the
	      size of the tree; this is also true for several other
2463	      operations that target a specific list.
2464	    </para>
2465	  </listitem>
2466
2467	</orderedlist>
2468
2469	<para>
2470	  In this library, all associative containers map
2471	  (or store) unique-key values. One can (1) map primary keys to
2472	  secondary associative-containers (containers of
	  secondary keys) or non-associative containers, (2) map identical
2474	  keys to a size-type representing the number of times they
2475	  occur, or (3) any combination of (1) and (2). Instead of
2476	  allowing multiple equivalent-key values, this library
2477	  supplies associative containers based on underlying
2478	  data structures that are suitable as secondary
2479	  associative-containers.
2480	</para>
2481
2482	<para>
	  In the figure below, labels A and B show the equivalent
	  underlying data structures in this library, corresponding to
	  labels A and B, respectively, of the first graphic above. Each shaded
2486	  box represents some size-type or secondary
2487	  associative-container.
2488	</para>
2489
2490	<figure>
2491	  <title>Non-unique Mapping Containers</title>
2492	  <mediaobject>
2493	    <imageobject>
2494	      <imagedata align="center" format="PNG" scale="100"
2495			 fileref="../images/pbds_embedded_lists_3.png"/>
2496	    </imageobject>
2497	    <textobject>
2498	      <phrase>Non-unique Mapping Containers</phrase>
2499	    </textobject>
2500	  </mediaobject>
2501	</figure>
2502
2503	<para>
	  In the first example above, then, one would use an associative
	  container mapping each client to an associative container which
	  maps each account-id to a balance (see
	  <filename>example/basic_multimap.cc</filename>); in the second
	  example, one would use an associative container mapping
	  each <classname>int</classname> to some size-type indicating the
	  number of times it logically occurs
	  (see <filename>example/basic_multiset.cc</filename>).
2512	</para>
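	<para>
	  For the bank-account example, a primary container mapping
	  clients to secondary containers might be sketched as follows.
	  The typedefs and the function
	  <function>total_balance</function> are hypothetical names, and
	  the sketch assumes GCC's
	  <filename>ext/pb_ds/assoc_container.hpp</filename> header and
	  its default hash policy for <classname>std::string</classname>.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;string&gt;

	  // Secondary container: account-id -&gt; balance.
	  typedef __gnu_pbds::cc_hash_table&lt;unsigned long, float&gt; account_balances;
	  // Primary container: client -&gt; that client's accounts.
	  typedef __gnu_pbds::cc_hash_table&lt;std::string, account_balances&gt; client_accounts;

	  // Sums the balances of one client without visiting other clients' entries.
	  float total_balance(const client_accounts&amp; clients, const std::string&amp; client)
	  {
	    client_accounts::point_const_iterator p = clients.find(client);
	    if (p == clients.end())
	      return 0.0f;
	    float total = 0.0f;
	    for (account_balances::const_iterator it = p-&gt;second.begin();
	         it != p-&gt;second.end(); ++it)
	      total += it-&gt;second;
	    return total;
	  }
	</programlisting>
	<para>
	  Under these assumptions, an entry can be added with
	  <literal>clients["alice"][1001UL] = 50.0f;</literal>, and
	  listing one client's accounts costs time proportional to that
	  client's number of accounts only.
	</para>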
2513
2514	<para>
2515	  See the discussion in list-based container types for containers
2516	  especially suited as secondary associative-containers.
2517	</para>
2518      </section>
2519
2520    </section> <!-- map and set semantics -->
2521
2522    <section xml:id="pbds.design.concepts.iterator_semantics">
2523      <info><title>Iterator Semantics</title></info>
2524
2525      <section xml:id="concepts.iterator_semantics.point_and_range">
2526	<info><title>Point and Range Iterators</title></info>
2527
2528	<para>
2529	  Iterator concepts are bifurcated in this design, and are
2530	  comprised of point-type and range-type iteration.
2531	</para>
2532
2533	<para>
2534	  A point-type iterator is an iterator that refers to a specific
2535	  element as returned through an
2536	  associative-container's <function>find</function> method.
2537	</para>
2538
2539	<para>
2540	  A range-type iterator is an iterator that is used to go over a
2541	  sequence of elements, as returned by a container's
	  <function>begin</function> method.
2543	</para>
2544
2545	<para>
2546	  A point-type method is a method that
2547	  returns a point-type iterator; a range-type method is a method
2548	  that returns a range-type iterator.
2549	</para>
2550
2551	<para>For most containers, these types are synonymous; for
2552	self-organizing containers, such as hash-based containers or
2553	priority queues, these are inherently different (in any
2554	implementation, including that of C++ standard library
2555	components), but in this design, it is made explicit. They are
2556	distinct types.
2557	</para>
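	<para>
	  Whether the two iterator families coincide for a given
	  container can be checked at compile time. The sketch below uses
	  the hypothetical helper names
	  <function>point_is_range</function> and
	  <function>range_converts_to_point</function>, and assumes GCC's
	  <filename>ext/pb_ds/assoc_container.hpp</filename> header and a
	  C++11 compiler.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;type_traits&gt;

	  // True if point-type and range-type iterators are the same type.
	  template&lt;typename Cntnr&gt;
	  constexpr bool point_is_range()
	  {
	    return std::is_same&lt;typename Cntnr::point_iterator,
	                        typename Cntnr::iterator&gt;::value;
	  }

	  // True if a range-type iterator converts to a point-type iterator.
	  template&lt;typename Cntnr&gt;
	  constexpr bool range_converts_to_point()
	  {
	    return std::is_convertible&lt;typename Cntnr::iterator,
	                               typename Cntnr::point_iterator&gt;::value;
	  }
	</programlisting>
	<para>
	  With these helpers, <function>point_is_range</function> is
	  expected to hold for a tree-based container and not for a
	  hash-based one, while
	  <function>range_converts_to_point</function> is expected to
	  hold for both.
	</para>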
2558      </section>
2559
2560
2561      <section xml:id="concepts.iterator_semantics.both">
2562	<info><title>Distinguishing Point and Range Iterators</title></info>
2563
	<para>When using this library, it is necessary to differentiate
2565	between two types of methods and iterators: point-type methods and
2566	iterators, and range-type methods and iterators. Each associative
2567	container's interface includes the methods:</para>
2568	<programlisting>
2569	  point_const_iterator
2570	  find(const_key_reference r_key) const;
2571
2572	  point_iterator
2573	  find(const_key_reference r_key);
2574
2575	  std::pair&lt;point_iterator,bool&gt;
2576	  insert(const_reference r_val);
2577	</programlisting>
2578
	<para>The relationship between these iterator types varies between
	container types. The figure below
	shows the invariants between point-type and range-type iterators.
	<emphasis>A</emphasis> shows the most general invariant:
	<literal>iterator</literal> can
	always be converted to <literal>point_iterator</literal>. <emphasis>B</emphasis>
	shows invariants for order-preserving containers: point-type
	iterators are synonymous with range-type iterators.
	Orthogonally, <emphasis>C</emphasis> shows invariants for "set"
	containers: iterators are synonymous with const iterators.</para>
2588
2589	<figure>
2590	  <title>Point Iterator Hierarchy</title>
2591	  <mediaobject>
2592	    <imageobject>
2593	      <imagedata align="center" format="PNG" scale="100"
2594			 fileref="../images/pbds_point_iterator_hierarchy.png"/>
2595	    </imageobject>
2596	    <textobject>
2597	      <phrase>Point Iterator Hierarchy</phrase>
2598	    </textobject>
2599	  </mediaobject>
2600	</figure>
2601
2602
	<para>Note that point-type iterators in self-organizing containers
	(hash-based associative containers) lack movement
	operators, such as <literal>operator++</literal>. In fact, this
	is the reason why this library differs from the standard C++
	library's design on this point.</para>
2608
	<para>Typically, one can determine an iterator's movement
	capabilities using
	<literal>std::iterator_traits&lt;It&gt;::iterator_category</literal>,
	which is a <literal>struct</literal> indicating the iterator's
	movement capabilities. Unfortunately, none of the standard predefined
	categories reflects an iterator's <emphasis>not</emphasis> having any
	movement capabilities whatsoever. Consequently,
	<literal>pb_ds</literal> adds a type
	<literal>trivial_iterator_tag</literal> (whose name is taken from
	a concept in C++ standardese), which is the category of iterators
	with no movement capabilities. All other standard C++ library
	tags, such as <literal>forward_iterator_tag</literal>, retain their
	common use.</para>
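	<para>Dispatching on such a category can be sketched in standard C++
	alone. The tag <literal>trivial_tag_demo</literal> and the two toy
	iterator types below are hypothetical stand-ins, used here only so
	that the example does not depend on the library headers; they are
	not the library's own definitions.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for a "no movement" category. It is deliberately
// unrelated to the standard iterator tag hierarchy.
struct trivial_tag_demo { };

// Toy point-type iterator: exposes only the nested types that
// std::iterator_traits looks for.
struct point_it_demo
{
  typedef trivial_tag_demo iterator_category;
  typedef int value_type;
  typedef std::ptrdiff_t difference_type;
  typedef int* pointer;
  typedef int& reference;
};

// Toy range-type iterator with a standard category.
struct range_it_demo
{
  typedef std::forward_iterator_tag iterator_category;
  typedef int value_type;
  typedef std::ptrdiff_t difference_type;
  typedef int* pointer;
  typedef int& reference;
};

// Overloads selected by the category tag.
inline bool can_advance_impl(trivial_tag_demo) { return false; }
inline bool can_advance_impl(std::input_iterator_tag) { return true; }

// Reports whether It's category promises any movement capability.
template<typename It>
bool can_advance()
{
  typedef typename std::iterator_traits<It>::iterator_category category;
  return can_advance_impl(category());
}
```
]]></programlisting>
	<para>Every standard category derives from
	<literal>std::input_iterator_tag</literal> (output iterators aside),
	so one overload covers them all, while the trivial tag selects the
	other overload.</para>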
2622
2623      </section>
2624
2625      <section xml:id="pbds.design.concepts.invalidation">
2626	<info><title>Invalidation Guarantees</title></info>
2627	<para>
2628	  If one manipulates a container object, then iterators previously
2629	  obtained from it can be invalidated. In some cases a
2630	  previously-obtained iterator cannot be de-referenced; in other cases,
2631	  the iterator's next or previous element might have changed
2632	  unpredictably. This corresponds exactly to the question whether a
2633	  point-type or range-type iterator (see previous concept) is valid or
	  not. In this design, one can query a container (at compile time) about
2635	  its invalidation guarantees.
2636	</para>
2637
2638
2639	<para>
	  Consider three different types of associative containers and a
	  modifying operation (for example, <function>erase</function>). The
	  operation can invalidate iterators in three different ways: the
	  iterator of one container remains completely valid, so it can be
	  dereferenced and incremented; the iterator of a different container
	  cannot even be dereferenced; the iterator of the third container can
	  be dereferenced, but its "next" iterator changes unpredictably.
2647	</para>
2648
2649	<para>
	  Distinguishing between point and range types allows fine-grained
	  invalidation guarantees, because the cases above correspond exactly
	  to the question of whether point-type iterators and range-type
	  iterators remain valid. The graphic below shows tags corresponding
	  to different types of invalidation guarantees.
2655	</para>
2656
2657	<figure>
2658	  <title>Invalidation Guarantee Tags Hierarchy</title>
2659	  <mediaobject>
2660	    <imageobject>
2661	      <imagedata align="center" format="PDF" scale="75"
2662			 fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2663	    </imageobject>
2664	    <imageobject>
2665	      <imagedata align="center" format="PNG" scale="100"
2666			 fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
2667	    </imageobject>
2668	    <textobject>
2669	      <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2670	    </textobject>
2671	  </mediaobject>
2672	</figure>
2673
2674	<itemizedlist>
2675	  <listitem>
2676	    <para>
2677	      <classname>basic_invalidation_guarantee</classname>
2678	      corresponds to a basic guarantee that a point-type iterator,
2679	      a found pointer, or a found reference, remains valid as long
2680	      as the container object is not modified.
2681	    </para>
2682	  </listitem>
2683
2684	  <listitem>
2685	    <para>
2686	      <classname>point_invalidation_guarantee</classname>
2687	      corresponds to a guarantee that a point-type iterator, a
2688	      found pointer, or a found reference, remains valid even if
2689	      the container object is modified.
2690	    </para>
2691	  </listitem>
2692
2693	  <listitem>
2694	    <para>
2695	      <classname>range_invalidation_guarantee</classname>
2696	      corresponds to a guarantee that a range-type iterator remains
2697	      valid even if the container object is modified.
2698	    </para>
2699	  </listitem>
2700	</itemizedlist>
2701
2702	<para>To find the invalidation guarantee of a
2703	container, one can use</para>
2704	<programlisting>
2705	  typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2706	</programlisting>
2707
	<para>Note that this hierarchy corresponds to the logic it
	represents: if a container has range-invalidation guarantees,
	then it must also have point-invalidation guarantees;
	correspondingly, its invalidation guarantee (in this case
	<classname>range_invalidation_guarantee</classname>)
	can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
	This means that this hierarchy can be used easily with
	standard metaprogramming techniques, by specializing on the
	type of <literal>invalidation_guarantee</literal>.</para>
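	<para>A minimal sketch of such a specialization, assuming GCC's
	libstdc++ (headers under <filename>ext/pb_ds</filename>, namespace
	<literal>__gnu_pbds</literal>). The helper names
	<function>keeps_point_iterators</function> and
	<function>keeps_point_iterators_of</function> are hypothetical. The
	sketch relies on the derived-to-base conversion of the tag types: a
	range-invalidation tag selects the overload that takes its base,
	the point-invalidation tag.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

namespace pbds = __gnu_pbds;

// Weakest guarantee: point-type iterators may not survive a modification.
inline bool keeps_point_iterators(pbds::basic_invalidation_guarantee)
{ return false; }

// Point guarantee or anything derived from it (the range guarantee).
inline bool keeps_point_iterators(pbds::point_invalidation_guarantee)
{ return true; }

// Compile-time query: dispatches on the container's guarantee tag.
template<typename Cntnr>
bool keeps_point_iterators_of()
{
  typedef typename pbds::container_traits<Cntnr>::invalidation_guarantee tag;
  return keeps_point_iterators(tag());
}
```
]]></programlisting>
	<para>For example,
	<literal>keeps_point_iterators_of&lt;pbds::tree&lt;int, int&gt;
	&gt;()</literal> is expected to select the second overload, because a
	node-based tree offers the range guarantee.</para>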
2717
2718	<para>
2719	  These types of problems were addressed, in a more general
2720	  setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2721	  our opinion, an invalidation-guarantee hierarchy would solve
2722	  these problems in all container types - not just associative
2723	  containers.
2724	</para>
2725
2726      </section>
2727    </section> <!-- iterator semantics -->
2728
2729    <section xml:id="pbds.design.concepts.genericity">
2730      <info><title>Genericity</title></info>
2731
2732      <para>
2733	The design attempts to address the following problem of
2734	data-structure genericity. When writing a function manipulating
2735	a generic container object, what is the behavior of the object?
2736	Suppose one writes
2737      </para>
2738      <programlisting>
2739	template&lt;typename Cntnr&gt;
2740	void
2741	some_op_sequence(Cntnr &amp;r_container)
2742	{
2743	...
2744	}
2745      </programlisting>
2746
2747      <para>
2748	then one needs to address the following questions in the body
2749	of <function>some_op_sequence</function>:
2750      </para>
2751
2752      <itemizedlist>
2753	<listitem>
2754	  <para>
2755	    Which types and methods does <literal>Cntnr</literal> support?
	    Containers based on hash tables can be queried for the
2757	    hash-functor type and object; this is meaningless for tree-based
2758	    containers. Containers based on trees can be split, joined, or
2759	    can erase iterators and return the following iterator; this
2760	    cannot be done by hash-based containers.
2761	  </para>
2762	</listitem>
2763
2764	<listitem>
2765	  <para>
2766	    What are the exception and invalidation guarantees
2767	    of <literal>Cntnr</literal>? A container based on a probing
2768	    hash-table invalidates all iterators when it is modified; this
2769	    is not the case for containers based on node-based
2770	    trees. Containers based on a node-based tree can be split or
2771	    joined without exceptions; this is not the case for containers
2772	    based on vector-based trees.
2773	  </para>
2774	</listitem>
2775
2776	<listitem>
2777	  <para>
2778	    How does the container maintain its elements? Tree-based and
	    trie-based containers store elements by key order; others,
	    typically, do not. Containers based on splay trees or on lists
	    with update policies "cache" frequently accessed elements;
	    containers based on most other underlying data structures do
	    not.
2784	  </para>
2785	</listitem>
2786	<listitem>
2787	  <para>
2788	    How does one query a container about characteristics and
2789	    capabilities? What is the relationship between two different
2790	    data structures, if anything?
	    data structures, if any?
2792	</listitem>
2793      </itemizedlist>
2794
2795      <para>The remainder of this section explains these issues in
2796      detail.</para>
2797
2798
2799      <section xml:id="concepts.genericity.tag">
2800	<info><title>Tag</title></info>
2801	<para>
2802	  Tags are very useful for manipulating generic types. For example, if
2803	  <literal>It</literal> is an iterator class, then <literal>typename
2804	  It::iterator_category</literal> or <literal>typename
2805	  std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2806	  yield its category, and <literal>typename
2807	  std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2808	  value type.
2809	</para>
2810
2811	<para>
2812	  This library contains a container tag hierarchy corresponding to the
2813	  diagram below.
2814	</para>
2815
2816	<figure>
2817	  <title>Container Tag Hierarchy</title>
2818	  <mediaobject>
2819	    <imageobject>
2820	      <imagedata align="center" format="PDF" scale="75"
2821			 fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2822	    </imageobject>
2823	    <imageobject>
2824	      <imagedata align="center" format="PNG" scale="100"
2825			 fileref="../images/pbds_container_tag_hierarchy.png"/>
2826	    </imageobject>
2827	    <textobject>
2828	      <phrase>Container Tag Hierarchy</phrase>
2829	    </textobject>
2830	  </mediaobject>
2831	</figure>
2832
2833	<para>
2834	  Given any container <type>Cntnr</type>, the tag of
2835	  the underlying data structure can be found via <literal>typename
2836	  Cntnr::container_category</literal>.
2837	</para>
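	<para>A function can branch on this tag by overloading, as in the
	following sketch. It assumes GCC's libstdc++ (namespace
	<literal>__gnu_pbds</literal>); <function>family</function> and
	<function>family_of</function> are hypothetical names.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <string>

namespace pbds = __gnu_pbds;

// One overload per branch of the container tag hierarchy of interest.
inline std::string family(pbds::basic_hash_tag) { return "hash"; }
inline std::string family(pbds::tree_tag) { return "tree"; }

// Selects an overload from the tag of the underlying data structure.
template<typename Cntnr>
std::string family_of()
{
  return family(typename Cntnr::container_category());
}
```
]]></programlisting>
	<para>Both hash-based containers share the base tag
	<literal>basic_hash_tag</literal>, so a single overload handles
	collision-chaining and probing tables alike.</para>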
2838
2839      </section> <!-- tag -->
2840
2841      <section xml:id="concepts.genericity.traits">
2842	<info><title>Traits</title></info>
	<para>Additionally, a traits mechanism can be used to query a
	container type for its attributes. Given any container
	<literal>Cntnr</literal>, <literal>container_traits&lt;Cntnr&gt;</literal>
	is a traits class identifying the properties of the
	container.</para>
2850
2851	<para>To find if a container can throw when a key is erased (which
2852	is true for vector-based trees, for example), one can
2853	use
2854	</para>
2855	<programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2856
2857	<para>
2858	  Some of the definitions in <classname>container_traits</classname>
2859	  are dependent on other
2860	  definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2861	  is <constant>true</constant> (which is the case for containers
2862	  based on trees and tries), then the container can be split or
2863	  joined; in this
2864	  case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2865	  indicates whether splits or joins can throw exceptions (which is
2866	  true for vector-based trees);
2867	  otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
	  will yield a compilation error. (This is somewhat similar to a
	  compile-time version of the COM model.)
2870	</para>
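	<para>The boolean attributes can be read like ordinary constants.
	The sketch below assumes GCC's libstdc++ (namespace
	<literal>__gnu_pbds</literal>); the wrapper names
	<function>keeps_key_order</function> and
	<function>erase_may_throw</function> are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <ext/pb_ds/assoc_container.hpp>

namespace pbds = __gnu_pbds;

// True if the container stores its elements by key order.
template<typename Cntnr>
bool keeps_key_order()
{
  return pbds::container_traits<Cntnr>::order_preserving;
}

// True if erasing a key may throw an exception.
template<typename Cntnr>
bool erase_may_throw()
{
  return pbds::container_traits<Cntnr>::erase_can_throw;
}
```
]]></programlisting>
	<para>Because the values are compile-time constants, they can also be
	used in template arguments or static assertions rather than in
	run-time branches.</para>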
2871
2872      </section> <!-- traits -->
2873
2874    </section> <!-- genericity -->
2875  </section> <!-- concepts -->
2876
2877  <section xml:id="pbds.design.container">
2878    <info><title>By Container</title></info>
2879
2880    <!-- hash -->
2881    <section xml:id="pbds.design.container.hash">
2882      <info><title>hash</title></info>
2883
2884      <!--
2885
2886// hash policies
2887/// general terms / background
2888/// range hashing policies
2889/// ranged-hash policies
2890/// implementation
2891
2892// resize policies
2893/// general
2894/// size policies
2895/// trigger policies
2896/// implementation
2897
2898// policy interactions
2899/// probe/size/trigger
2900/// hash/trigger
2901/// eq/hash/storing hash values
2902/// size/load-check trigger
2903      -->
2904      <section xml:id="container.hash.interface">
2905	<info><title>Interface</title></info>
2906
2907
2908
2909	<para>
2910	  The collision-chaining hash-based container has the
2911	following declaration.</para>
2912	<programlisting>
2913	  template&lt;
2914	  typename Key,
2915	  typename Mapped,
2916	  typename Hash_Fn = std::hash&lt;Key&gt;,
2917	  typename Eq_Fn = std::equal_to&lt;Key&gt;,
	  typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;,
	  typename Resize_Policy = /* default explained below */,
2920	  bool Store_Hash = false,
2921	  typename Allocator = std::allocator&lt;char&gt; &gt;
2922	  class cc_hash_table;
2923	</programlisting>
2924
2925	<para>The parameters have the following meaning:</para>
2926
2927	<orderedlist>
2928	  <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2929
2930	  <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2931
2932	  <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2933
2934	  <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2935
	  <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
2937	  it describes how to translate hash values into positions
2938	  within the table. </para></listitem>
2939
2940	  <listitem><para><classname>Resize_Policy</classname> describes how a container object
2941	  should change its internal size. </para></listitem>
2942
2943	  <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2944	  should be stored with each entry. </para></listitem>
2945
2946	  <listitem><para><classname>Allocator</classname> is an allocator
2947	  type.</para></listitem>
2948	</orderedlist>
2949
2950	<para>The probing hash-based container has the following
2951	declaration.</para>
2952	<programlisting>
2953	  template&lt;
2954	  typename Key,
2955	  typename Mapped,
2956	  typename Hash_Fn = std::hash&lt;Key&gt;,
2957	  typename Eq_Fn = std::equal_to&lt;Key&gt;,
	  typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;,
	  typename Probe_Fn = /* default explained below */,
	  typename Resize_Policy = /* default explained below */,
2961	  bool Store_Hash = false,
2962	  typename Allocator =  std::allocator&lt;char&gt; &gt;
2963	  class gp_hash_table;
2964	</programlisting>
2965
2966	<para>The parameters are identical to those of the
2967	collision-chaining container, except for the following.</para>
2968
2969	<orderedlist>
2970	  <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2971	  sequence into a sequence of positions within the table.</para></listitem>
2972
2973	  <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2974	</orderedlist>
2975
2976	<para>Some of the default template values depend on the values of
2977	other parameters, and are explained below.</para>
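	<para>As a hedged illustration of the two declarations, the
	following sketch names some of the policies explicitly and leaves the
	rest at their defaults. It assumes GCC's libstdc++ (namespace
	<literal>__gnu_pbds</literal>) and a C++11 compiler for
	<literal>std::hash</literal>; the typedef names and
	<function>count_inserted</function> are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <functional>

namespace pbds = __gnu_pbds;

// Collision-chaining table that maps hash values to positions with a
// modulo operation instead of the default bit mask.
typedef pbds::cc_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  pbds::direct_mod_range_hashing<> > chained_mod_table;

// Probing table with a bit-mask range-hashing function and an explicit
// linear probe function.
typedef pbds::gp_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  pbds::direct_mask_range_hashing<>,
  pbds::linear_probe_fn<> > probing_linear_table;

// Inserts n distinct keys and reports how many elements are stored.
template<typename Table>
int count_inserted(int n)
{
  Table t;
  for (int i = 0; i < n; ++i)
    t[i] = i * i;
  return static_cast<int>(t.size());
}
```
]]></programlisting>
	<para>The omitted <classname>Resize_Policy</classname>,
	<classname>Store_Hash</classname>, and
	<classname>Allocator</classname> arguments take their default
	values.</para>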
2978
2979      </section>
2980      <section xml:id="container.hash.details">
2981	<info><title>Details</title></info>
2982
2983	<section xml:id="container.hash.details.hash_policies">
2984	  <info><title>Hash Policies</title></info>
2985
2986	  <section xml:id="details.hash_policies.general">
2987	    <info><title>General</title></info>
2988
2989	    <para>Following is an explanation of some functions which hashing
2990	    involves. The graphic below illustrates the discussion.</para>
2991
2992	    <figure>
2993	      <title>Hash functions, ranged-hash functions, and
2994	      range-hashing functions</title>
2995	      <mediaobject>
2996		<imageobject>
2997		  <imagedata align="center" format="PNG" scale="100"
2998			     fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
2999		</imageobject>
3000		<textobject>
3001		  <phrase>Hash functions, ranged-hash functions, and
3002		  range-hashing functions</phrase>
3003		</textobject>
3004	      </mediaobject>
3005	    </figure>
3006	    
3007	    <para>Let U be a domain (e.g., the integers, or the
3008	    strings of 3 characters). A hash-table algorithm needs to map
3009	    elements of U "uniformly" into the range [0,..., m -
3010	    1] (where m is a non-negative integral value, and
3011	    is, in general, time varying). I.e., the algorithm needs
3012	    a ranged-hash function</para>
3013
3014	    <para>
	      f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3016	    </para>
3017
3018	    <para>such that for any u in U ,</para>
3019
	    <para>0 ≤ f(u, m) ≤ m - 1</para>
3021
	    <para>and which has "good uniformity" properties (see
	    <xref linkend="biblio.knuth98sorting"/>).
3024	    One
3025	    common solution is to use the composition of the hash
3026	    function</para>
3027
	    <para>h : U → Z<subscript>+</subscript> ,</para>
3029
3030	    <para>which maps elements of U into the non-negative
3031	    integrals, and</para>
3032
	    <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
	    Z<subscript>+</subscript>,</para>
3035
3036	    <para>which maps a non-negative hash value, and a non-negative
3037	    range upper-bound into a non-negative integral in the range
3038	    between 0 (inclusive) and the range upper bound (exclusive),
3039	    i.e., for any r in Z<subscript>+</subscript>,</para>
3040
	    <para>0 ≤ g(r, m) ≤ m - 1</para>
3042
3043
	    <para>The resulting ranged-hash function is</para>
3045
3046	    <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3047	    <equation>
3048	      <title>Ranged Hash Function</title>
3049	      <mathphrase>
3050		f(u , m) = g(h(u), m)
3051	      </mathphrase>
3052	    </equation>
3053
3054	    <para>From the above, it is obvious that given g and
3055	    h, f can always be composed (however the converse
3056	    is not true). The standard's hash-based containers allow specifying
3057	    a hash function, and use a hard-wired range-hashing function;
3058	    the ranged-hash function is implicitly composed.</para>
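	<para>The composition above can be sketched in standard C++. The
	function names are hypothetical; the hash function is taken to be
	<literal>std::hash</literal> and the range-hashing function the
	division method, both chosen only for illustration.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cstddef>
#include <functional>
#include <string>

// h : U -> Z+, here the standard hash for strings.
inline std::size_t hash_fn(const std::string& u)
{
  return std::hash<std::string>()(u);
}

// g : Z+ x Z+ -> Z+, here the division method; m must be positive.
inline std::size_t range_hash(std::size_t r, std::size_t m)
{
  return r % m;
}

// f(u, m) = g(h(u), m): the composed ranged-hash function.
inline std::size_t ranged_hash(const std::string& u, std::size_t m)
{
  return range_hash(hash_fn(u), m);
}
```
]]></programlisting>
	<para>Swapping <function>range_hash</function> for another
	range-hashing function changes the ranged-hash function without
	touching <function>hash_fn</function>.</para>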
3059
3060	    <para>The above describes the case where a key is to be mapped
3061	    into a single position within a hash table, e.g.,
3062	    in a collision-chaining table. In other cases, a key is to be
3063	    mapped into a sequence of positions within a table,
3064	    e.g., in a probing table. Similar terms apply in this
3065	    case: the table requires a ranged probe function,
	    mapping a key into a sequence of positions within the table.
3067	    This is typically achieved by composing a hash function
3068	    mapping the key into a non-negative integral type, a
3069	    probe function transforming the hash value into a
3070	    sequence of hash values, and a range-hashing function
3071	    transforming the sequence of hash values into a sequence of
3072	    positions.</para>
3073
3074	  </section>
3075
3076	  <section xml:id="details.hash_policies.range">
3077	    <info><title>Range Hashing</title></info>
3078
3079	    <para>Some common choices for range-hashing functions are the
3080	    division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3081	    as</para>
3082
3083	    <equation>
3084	      <title>Range-Hashing, Division Method</title>
3085	      <mathphrase>
3086		g(r, m) = r mod m
3087	      </mathphrase>
3088	    </equation>
3089
3090
3091
	    <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3093
3094	    <para>and</para>
3095
	    <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3097
	    <para>respectively, for some positive integrals u and
	    v (typically powers of 2), and some a. Each of
	    these range-hashing functions works best in a different
	    setting.</para>
3102
3103	    <para>The division method (see above) is a
3104	    very common choice. However, even this single method can be
3105	    implemented in two very different ways. It is possible to
3106	    implement using the low
3107	    level % (modulo) operation (for any m), or the
3108	    low level &amp; (bit-mask) operation (for the case where
3109	    m is a power of 2), i.e.,</para>
3110
3111	    <equation>
3112	      <title>Division via Prime Modulo</title>
3113	      <mathphrase>
3114		g(r, m) = r % m
3115	      </mathphrase>
3116	    </equation>
3117
3118	    <para>and</para>
3119
3120	    <equation>
3121	      <title>Division via Bit Mask</title>
3122	      <mathphrase>
		g(r, m) = r &amp; (m - 1), (with m =
3124		2<superscript>k</superscript> for some k)
3125	      </mathphrase>
3126	    </equation>
3127
3128
3129	    <para>respectively.</para>
3130
3131	    <para>The % (modulo) implementation has the advantage that for
3132	    m a prime far from a power of 2, g(r, m) is
3133	    affected by all the bits of r (minimizing the chance of
3134	    collision). It has the disadvantage of using the costly modulo
	    operation. This method is hard-wired into SGI's
	    implementation.</para>
3137
	    <para>The &amp; (bit-mask) implementation has the advantage of
	    relying on the fast bitwise AND operation. It has the
	    disadvantage that g(r, m) is affected only by the
	    low-order bits of r. This method is hard-wired into
	    Dinkumware's implementation.</para>
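	<para>The two implementations of the division method can be
	compared directly in a small sketch. The function names are
	hypothetical and the code is independent of the library's own
	policy classes.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Division via modulo: valid for any positive m.
inline std::size_t range_mod(std::size_t r, std::size_t m)
{
  assert(m > 0);
  return r % m;
}

// True if m is a power of 2.
inline bool is_power_of_two(std::size_t m)
{
  return m != 0 && (m & (m - 1)) == 0;
}

// Division via bit mask: valid only when m is a power of 2.
inline std::size_t range_mask(std::size_t r, std::size_t m)
{
  assert(is_power_of_two(m));
  return r & (m - 1);
}
```
]]></programlisting>
	<para>With m = 8, the hash values 16 and 32 both map to position 0
	under the mask, because they differ only in high-order bits. With the
	prime m = 7, the modulo maps them to positions 2 and 4.</para>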
3143
3144
3145	  </section>
3146
3147	  <section xml:id="details.hash_policies.ranged">
3148	    <info><title>Ranged Hash</title></info>
3149
	    <para>In some cases it is beneficial to allow the
	    client to directly specify a ranged-hash function. It is
	    true that the writer of the ranged-hash function cannot rely
3153	    on the values of m having specific numerical properties
3154	    suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3155	    the values of m are determined by a resize policy with
3156	    possibly orthogonal considerations.</para>
3157
	    <para>There are two cases where a ranged-hash function can be
	    superior. The first is when using perfect hashing; the
	    second is when the values of m can be used to estimate
	    the "general" number of distinct values required. The second
	    case is described in the following.</para>
3163
3164	    <para>Let</para>
3165
3166	    <para>
3167	      s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3168	    </para>
3169
3170	    <para>be a string of t characters, each of which is from
3171	    domain S. Consider the following ranged-hash
3172	    function:</para>
3173	    <equation>
3174	      <title>
3175		A Standard String Hash Function
3176	      </title>
3177	      <mathphrase>
		f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
		0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3180	      </mathphrase>
3181	    </equation>
3182	    
3183
3184	    <para>where a is some non-negative integral value. This is
3185	    the standard string-hashing function used in SGI's
3186	    implementation (with a = 5). Its advantage is that
3187	    it takes into account all of the characters of the string.</para>
3188
	    <para>Now assume that s is the string representation
	    of a long DNA sequence (and so S = {'A', 'C', 'G',
3191	    'T'}). In this case, scanning the entire string might be
3192	    prohibitively expensive. A possible alternative might be to use
3193	    only the first k characters of the string, where</para>
3194
	    <para>|S|<superscript>k</superscript> ≥ m ,</para>
3196
3197	    <para>i.e., using the hash function</para>
3198
3199	    <equation>
3200	      <title>
3201		Only k String DNA Hash
3202	      </title>
3203	      <mathphrase>
		f<subscript>2</subscript>(s, m) = ∑ <subscript>i
		= 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3206	      </mathphrase>
3207	    </equation>
3208
3209	    <para>requiring scanning over only</para>
3210
3211	    <para>k = log<subscript>4</subscript>( m )</para>
3212
3213	    <para>characters.</para>
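	<para>A sketch of the prefix hash f<subscript>2</subscript> in
	standard C++ follows. The function names are hypothetical, the
	default a = 5 is taken from the text above, k is rounded up to the
	smallest integer with 4<superscript>k</superscript> ≥ m, and m is
	assumed to be small enough that the product of two values below m
	does not overflow <literal>std::size_t</literal>.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Smallest k with 4^k >= m, computed with integer arithmetic only.
inline std::size_t prefix_length(std::size_t m)
{
  assert(m > 0);
  std::size_t k = 0;
  for (std::size_t rest = m - 1; rest > 0; rest /= 4)
    ++k;
  return k;
}

// f2(s, m): hashes only the first k characters of s, reducing modulo m
// at every step.
inline std::size_t dna_prefix_hash(const std::string& s, std::size_t m,
                                   std::size_t a = 5)
{
  std::size_t k = prefix_length(m);
  if (k > s.size())
    k = s.size();
  std::size_t sum = 0;
  std::size_t power = 1 % m;  // a^i mod m
  for (std::size_t i = 0; i < k; ++i)
    {
      const std::size_t c = static_cast<unsigned char>(s[i]);
      sum = (sum + (c % m) * power) % m;
      power = (power * (a % m)) % m;
    }
  return sum;
}
```
]]></programlisting>
	<para>Because the number of scanned characters depends on m, the
	function has to receive m as an argument; two sequences that share
	their first k characters hash to the same position.</para>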
3214
3215	    <para>Other more elaborate hash-functions might scan k
3216	    characters starting at a random position (determined at each
3217	    resize), or scanning k random positions (determined at
3218	    each resize), i.e., using</para>
3219
	    <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
	    r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
	    a<superscript>i</superscript> mod m ,</para>
3223
3224	    <para>or</para>
3225
	    <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
	    1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
	    m ,</para>
3229
3230	    <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3231	    each in the (inclusive) range [0,...,t-1].</para>
3232
	    <para>It should be noted that the above functions cannot be
	    decomposed into a hash function and a range-hashing function,
	    as in the Ranged Hash Function equation above.</para>
3235
3236
3237	  </section>
3238
3239	  <section xml:id="details.hash_policies.implementation">
3240	    <info><title>Implementation</title></info>
3241
3242	    <para>This sub-subsection describes the implementation of
3243	    the above in this library. It first explains range-hashing
3244	    functions in collision-chaining tables, then ranged-hash
3245	    functions in collision-chaining tables, then probing-based
3246	    tables, and finally lists the relevant classes in this
3247	    library.</para>
3248
3249	    <section xml:id="hash_policies.implementation.collision-chaining">
3250	      <info><title>
3251		Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3252	      </title></info>
3253
3254
3255	      <para><classname>cc_hash_table</classname> is
3256	      parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3257	      hash functor and a combining hash functor, respectively.</para>
3258
3259	      <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3260	      range-hashing functor. <classname>cc_hash_table</classname>
3261	      synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
	      <classname>Comb_Hash_Fn</classname>. The figure below shows an <function>insert</function> sequence
3263	      diagram for this case. The user inserts an element (point A),
3264	      the container transforms the key into a non-negative integral
3265	      using the hash functor (points B and C), and transforms the
3266	      result into a position using the combining functor (points D
3267	      and E).</para>
3268
3269	      <figure>
3270		<title>Insert hash sequence diagram</title>
3271		<mediaobject>
3272		  <imageobject>
3273		    <imagedata align="center" format="PNG" scale="100"
3274			       fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3275		  </imageobject>
3276		  <textobject>
3277		    <phrase>Insert hash sequence diagram</phrase>
3278		  </textobject>
3279		</mediaobject>
3280	      </figure>
3281	      
	      <para>If <classname>cc_hash_table</classname>'s
	      hash functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
	      a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3285	      diagram. The user inserts an element (point A), the container
3286	      transforms the key into a position using the combining functor
3287	      (points B and C).</para>
3288
3289	      <figure>
3290		<title>Insert hash sequence diagram with a null policy</title>
3291		<mediaobject>
3292		  <imageobject>
3293		    <imagedata align="center" format="PNG" scale="100"
3294			       fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3295		  </imageobject>
3296		  <textobject>
3297		    <phrase>Insert hash sequence diagram with a null policy</phrase>
3298		  </textobject>
3299		</mediaobject>
3300	      </figure>
3301	      
3302	    </section>
3303
3304	    <section xml:id="hash_policies.implementation.probe">
3305	      <info><title>
3306		Probing tables
3307	      </title></info>
3308	      <para><classname>gp_hash_table</classname> is parametrized by
3309	      <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3310	      and <classname>Comb_Probe_Fn</classname>. As before, if
3311	      <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3312	      are both <classname>null_type</classname>, then
3313	      <classname>Comb_Probe_Fn</classname> is a ranged-probe
3314	      functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3315	      functor, <classname>Probe_Fn</classname> is a functor for offsets
3316	      from a hash value, and <classname>Comb_Probe_Fn</classname>
3317	      transforms a probe sequence into a sequence of positions within
3318	      the table.</para>
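	<para>The three-stage composition for probing tables can be
	sketched in standard C++. The names below are hypothetical and do
	not reproduce the library's functor interfaces: the hash is
	<literal>std::hash</literal>, the probe function is linear, and the
	combining step is a bit mask, so the table size m is assumed to be a
	power of 2.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Probe function: the i-th offset from the hash value (linear probing).
inline std::size_t linear_offset(std::size_t i)
{
  return i;
}

// Returns the first n table positions to try for key, in a table of
// size m, where m is a power of 2.
inline std::vector<std::size_t>
probe_positions(int key, std::size_t m, std::size_t n)
{
  const std::size_t hash = std::hash<int>()(key);     // hash function
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < n; ++i)
    {
      const std::size_t probe = hash + linear_offset(i);  // probe function
      positions.push_back(probe & (m - 1));               // combining step
    }
  return positions;
}
```
]]></programlisting>
	<para>With a linear probe and n = m, the sequence visits every
	position of the table exactly once.</para>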
3319
3320	    </section>
3321
3322	    <section xml:id="hash_policies.implementation.predefined">
3323	      <info><title>
3324		Pre-Defined Policies
3325	      </title></info>
3326
3327	      <para>This library contains some pre-defined classes
3328	      implementing range-hashing and probing functions:</para>
3329
3330	      <orderedlist>
3331		<listitem><para><classname>direct_mask_range_hashing</classname>
3332		and <classname>direct_mod_range_hashing</classname>
3333		are range-hashing functions based on a bit-mask and a modulo
3334		operation, respectively.</para></listitem>
3335
3336		<listitem><para><classname>linear_probe_fn</classname>, and
3337		<classname>quadratic_probe_fn</classname> are
3338		a linear probe and a quadratic probe function,
3339		respectively.</para></listitem>
3340	      </orderedlist>
3341
3342	      <para>
3343		The graphic below shows the relationships.
3344	      </para>
3345	      <figure>
3346		<title>Hash policy class diagram</title>
3347		<mediaobject>
3348		  <imageobject>
3349		    <imagedata align="center" format="PNG" scale="100"
3350			       fileref="../images/pbds_hash_policy_cd.png"/>
3351		  </imageobject>
3352		  <textobject>
3353		    <phrase>Hash policy class diagram</phrase>
3354		  </textobject>
3355		</mediaobject>
3356	      </figure>
3357
3358
3359	    </section>
3360
3361	  </section> <!-- impl -->
3362
3363	</section>
3364
3365	<section xml:id="container.hash.details.resize_policies">
3366	  <info><title>Resize Policies</title></info>
3367
3368	  <section xml:id="resize_policies.general">
3369	    <info><title>General</title></info>
3370
3371	    <para>Hash-tables, as opposed to trees, do not naturally grow or
3372	    shrink. It is necessary to specify policies to determine how
3373	    and when a hash table should change its size. Usually, resize
3374	    policies can be decomposed into orthogonal policies:</para>
3375
3376	    <orderedlist>
3377	      <listitem><para>A size policy indicating how a hash table
3378	      should grow (e.g., it should multiply by powers of
3379	      2).</para></listitem>
3380
3381	      <listitem><para>A trigger policy indicating when a hash
3382	      table should grow (e.g., a load factor is
3383	      exceeded).</para></listitem>
3384	    </orderedlist>
3385
3386	  </section>
3387
3388	  <section xml:id="resize_policies.size">
3389	    <info><title>Size Policies</title></info>
3390
3391
3392	    <para>Size policies determine how a hash table changes size. These
3393	    policies are simple, and there are relatively few sensible
3394	    options. An exponential-size policy (with the initial size and
3395	    growth factors both powers of 2) works well with a mask-based
3396	    range-hashing function, and is the
3397	    hard-wired policy used by Dinkumware. A
3398	    prime-list based policy works well with a modulo-prime range
3399	    hashing function and is the hard-wired policy used by SGI's
3400	    implementation.</para>
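
	    <para>A minimal sketch of an exponential size policy is shown
	    below. The function name and its default arguments (initial size
	    8, growth factor 2) are assumptions made for illustration; they
	    are not taken from the library's interface.</para>
	    <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical exponential size policy: the legal sizes are
// start_size, start_size * grow_factor, start_size * grow_factor^2, ...
// Assumes start_size > 0 and grow_factor > 1 (otherwise the loop never ends).
inline std::size_t next_larger_size(std::size_t current,
                                    std::size_t start_size = 8,
                                    std::size_t grow_factor = 2)
{
  std::size_t candidate = start_size;
  while (candidate <= current)
    candidate *= grow_factor;
  return candidate;  // smallest legal size strictly larger than current
}
```
]]></programlisting>
	    <para>Because every size in this sequence is a power of 2, the
	    result can be used directly with a mask-based range-hashing
	    function.</para>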
3401
3402	  </section>
3403
3404	  <section xml:id="resize_policies.trigger">
3405	    <info><title>Trigger Policies</title></info>
3406
3407	    <para>Trigger policies determine when a hash table changes size.
3408	    Following is a description of two policies: load-check
3409	    policies, and collision-check policies.</para>
3410
	    <para>Load-check policies are straightforward. The user specifies
	    two factors, α<subscript>min</subscript> and
	    α<subscript>max</subscript>, and the hash table maintains the
	    invariant that</para>

	    <para>α<subscript>min</subscript> ≤ (number of
	    stored elements) / (hash-table size) ≤
	    α<subscript>max</subscript><remark>load factor min max</remark></para>
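
	    <para>The load-check decision can be sketched as a single
	    comparison against the two factors. The enumeration and function
	    below are hypothetical and only illustrate the invariant; they
	    are not the library's trigger class.</para>
	    <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

enum resize_hint { keep_size, should_grow, should_shrink };

// Hypothetical load-check trigger: compares the load factor
// (stored elements / table size) against [load_min, load_max].
// Assumes table_size > 0.
inline resize_hint check_load(std::size_t num_elements,
                              std::size_t table_size,
                              double load_min, double load_max)
{
  const double load = static_cast<double>(num_elements) / table_size;
  if (load > load_max)
    return should_grow;
  if (load < load_min)
    return should_shrink;
  return keep_size;
}
```
]]></programlisting>
	    <para>With factors 0.125 and 0.5, a table of size 8 holding 5
	    elements has load 0.625 and should grow, while the same table
	    holding 2 elements is left alone.</para>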
3419
3420	    <para>Collision-check policies work in the opposite direction of
3421	    load-check policies. They focus on keeping the number of
3422	    collisions moderate and hoping that the size of the table will
3423	    not grow very large, instead of keeping a moderate load-factor
3424	    and hoping that the number of collisions will be small. A
3425	    maximal collision-check policy resizes when the longest
3426	    probe-sequence grows too large.</para>
3427
	    <para>Consider the graphic below. Let the size of the hash table
	    be denoted by m, the length of a probe sequence be denoted by k,
	    and some load factor be denoted by α. We would like to
	    calculate the minimal length of k, such that if there were α
	    m elements in the hash table, a probe sequence of length k would
	    be found with probability at most 1/m.</para>
3434
3435	    <figure>
3436	      <title>Balls and bins</title>
3437	      <mediaobject>
3438		<imageobject>
3439		  <imagedata align="center" format="PNG" scale="100"
3440			     fileref="../images/pbds_balls_and_bins.png"/>
3441		</imageobject>
3442		<textobject>
3443		  <phrase>Balls and bins</phrase>
3444		</textobject>
3445	      </mediaobject>
3446	    </figure>
3447
3448	    <para>Denote the probability that a probe sequence of length
3449	    k appears in bin i by p<subscript>i</subscript>, the
3450	    length of the probe sequence of bin i by
3451	    l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3452
3453
3454
3455	    <equation>
3456	      <title>
3457		Probability of Probe Sequence of Length k
3458	      </title>
3459	      <mathphrase>
3460		p<subscript>1</subscript> = 
3461	      </mathphrase>
3462	    </equation>
3463
	    <para>P(l<subscript>1</subscript> ≥ k) =</para>

	    <para>
	      P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1 ) ) ≤ (a)
	    </para>

	    <para>
	      e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) / 2 )
	    </para>
3473
	    <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
	    calculate the probability that some bin contains a probe
	    sequence of length at least k, we note that the
	    l<subscript>i</subscript> are negatively-dependent
	    (<xref linkend="biblio.dubhashi98neg"/>). Let
	    I(.) denote the indicator function. Then</para>
3481
	    <equation>
	      <title>
		Probability Probe Sequence in Some Bin
	      </title>
	      <mathphrase>
		P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
	      </mathphrase>
	    </equation>

	    <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
	    I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>

	    <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
	    l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
	    p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>

	    <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
	    - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3500
3501	    <para>where (a) follows from the fact that the Chernoff bound can
3502	    be applied to negatively-dependent variables (<xref
3503	    linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3504	    equation into the second one, and equating with 1/m, we
3505	    obtain</para>
3506
3507
	    <para>k ~ √ ( 2 α ln ( 2 m ln(m) ) ) .</para>
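
	    <para>To get a feeling for the magnitudes involved, the estimate
	    can be evaluated numerically. The sketch below reads the result
	    as k ~ sqrt(2 α ln(2 m ln(m))), which is an interpretation of
	    the formula above; the function name is hypothetical.</para>
	    <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cmath>

// Evaluates sqrt(2 * alpha * ln(2 * m * ln(m))).
// Assumes m > 1 so that both logarithms are positive.
inline double max_probe_length_estimate(double alpha, double m)
{
  return std::sqrt(2.0 * alpha * std::log(2.0 * m * std::log(m)));
}
```
]]></programlisting>
	    <para>Under this reading, a table of size 1024 with load factor 1
	    yields an estimate between 4 and 5, and the estimate grows with
	    the load factor.</para>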
3510
3511	  </section>
3512
3513	  <section xml:id="resize_policies.impl">
3514	    <info><title>Implementation</title></info>
3515
3516	    <para>This sub-subsection describes the implementation of the
3517	    above in this library. It first describes resize policies and
3518	    their decomposition into trigger and size policies, then
3519	    describes pre-defined classes, and finally discusses controlled
	    access to the policies' internals.</para>
3521
3522	    <section xml:id="resize_policies.impl.decomposition">
3523	      <info><title>Decomposition</title></info>
3524
3525
3526	      <para>Each hash-based container is parametrized by a
3527	      <classname>Resize_Policy</classname> parameter; the container derives
3528	      <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3529	      example:</para>
	      <programlisting>
		template&lt;typename Key,
			 typename Mapped,
			 ...
			 typename Resize_Policy,
			 ...&gt;
		class cc_hash_table : public Resize_Policy
	      </programlisting>
3537
3538	      <para>As a container object is modified, it continuously notifies
3539	      its <classname>Resize_Policy</classname> base of internal changes
3540	      (e.g., collisions encountered and elements being
3541	      inserted). It queries its <classname>Resize_Policy</classname> base whether
3542	      it needs to be resized, and if so, to what size.</para>
3543
3544	      <para>The graphic below shows a (possible) sequence diagram
3545	      of an insert operation. The user inserts an element; the hash
3546	      table notifies its resize policy that a search has started
3547	      (point A); in this case, a single collision is encountered -
3548	      the table notifies its resize policy of this (point B); the
3549	      container finally notifies its resize policy that the search
3550	      has ended (point C); it then queries its resize policy whether
3551	      a resize is needed, and if so, what is the new size (points D
3552	      to G); following the resize, it notifies the policy that a
3553	      resize has completed (point H); finally, the element is
3554	      inserted, and the policy notified (point I).</para>
3555
3556	      <figure>
3557		<title>Insert resize sequence diagram</title>
3558		<mediaobject>
3559		  <imageobject>
3560		    <imagedata align="center" format="PNG" scale="100"
3561			       fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3562		  </imageobject>
3563		  <textobject>
3564		    <phrase>Insert resize sequence diagram</phrase>
3565		  </textobject>
3566		</mediaobject>
3567	      </figure>
3568
3569
	      <para>In practice, a resize policy can usually be decomposed
	      orthogonally into a size policy and a trigger policy. Consequently,
	      the library contains a single class for instantiating a resize
	      policy: <classname>hash_standard_resize_policy</classname>
	      is parametrized by <classname>Size_Policy</classname> and
	      <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
	      both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
	      to these policies.</para>
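
	      <para>The delegation can be sketched in a few lines. All class
	      and method names below are hypothetical and much simpler than
	      the library's; the sketch only shows a resize policy deriving
	      publicly from a size policy and a trigger policy and forwarding
	      to each.</para>
	      <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical trigger policy: asks for a resize once the load exceeds 1/2.
struct half_load_trigger
{
  bool resize_needed(std::size_t num_elements, std::size_t table_size) const
  { return 2 * num_elements > table_size; }
};

// Hypothetical size policy: doubles the table.
struct doubling_size_policy
{
  std::size_t larger_size(std::size_t table_size) const
  { return 2 * table_size; }
};

// Delegates "when" to Trigger_Policy and "how much" to Size_Policy.
template<typename Size_Policy, typename Trigger_Policy>
struct standard_resize_policy_sketch
  : public Size_Policy, public Trigger_Policy
{
  // Size the table should have while holding num_elements elements.
  std::size_t new_size(std::size_t num_elements,
                       std::size_t table_size) const
  {
    return this->resize_needed(num_elements, table_size)
      ? this->larger_size(table_size)
      : table_size;
  }
};

typedef standard_resize_policy_sketch<doubling_size_policy, half_load_trigger>
  example_resize_policy;
```
]]></programlisting>
	      <para>Either policy can be replaced without touching the other,
	      which is the point of the decomposition.</para>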
3578
3579	      <para>The two graphics immediately below show sequence diagrams
3580	      illustrating the interaction between the standard resize policy
3581	      and its trigger and size policies, respectively.</para>
3582
3583	      <figure>
3584		<title>Standard resize policy trigger sequence
3585		diagram</title>
3586		<mediaobject>
3587		  <imageobject>
3588		    <imagedata align="center" format="PNG" scale="100"
3589			       fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3590		  </imageobject>
3591		  <textobject>
3592		    <phrase>Standard resize policy trigger sequence
3593		    diagram</phrase>
3594		  </textobject>
3595		</mediaobject>
3596	      </figure>
3597
3598	      <figure>
3599		<title>Standard resize policy size sequence
3600		diagram</title>
3601		<mediaobject>
3602		  <imageobject>
3603		    <imagedata align="center" format="PNG" scale="100"
3604			       fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3605		  </imageobject>
3606		  <textobject>
3607		    <phrase>Standard resize policy size sequence
3608		    diagram</phrase>
3609		  </textobject>
3610		</mediaobject>
3611	      </figure>
3612
3613
3614	    </section>
3615
3616	    <section xml:id="resize_policies.impl.predefined">
3617	      <info><title>Predefined Policies</title></info>
3618	      <para>The library includes the following
3619	      instantiations of size and trigger policies:</para>
3620
3621	      <orderedlist>
3622		<listitem><para><classname>hash_load_check_resize_trigger</classname>
3623		implements a load check trigger policy.</para></listitem>
3624
3625		<listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3626		implements a collision check trigger policy.</para></listitem>
3627
3628		<listitem><para><classname>hash_exponential_size_policy</classname>
3629		implements an exponential-size policy (which should be used
3630		with mask range hashing).</para></listitem>
3631
		<listitem><para><classname>hash_prime_size_policy</classname>
		implements a size policy based on a sequence of primes
		(which should be used with mod range
		hashing).</para></listitem>
3636	      </orderedlist>
3637
	      <para>The resize-related classes fit together as follows.
	      <classname>basic_hash_table</classname>
	      is parametrized by <classname>Resize_Policy</classname>, which it subclasses
	      publicly. This parameter is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
	      <classname>hash_standard_resize_policy</classname>
	      itself is parametrized by <classname>Trigger_Policy</classname> and
	      <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
	      instantiated by <classname>hash_load_check_resize_trigger</classname>
	      or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
	      <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>
	      or <classname>hash_prime_size_policy</classname>.</para>
3649
3650	    </section>
3651
3652	    <section xml:id="resize_policies.impl.internals">
	      <info><title>Controlling Access to Internals</title></info>
3654
3655	      <para>There are cases where (controlled) access to resize
3656	      policies' internals is beneficial. E.g., it is sometimes
3657	      useful to query a hash-table for the table's actual size (as
3658	      opposed to its <function>size()</function> - the number of values it
3659	      currently holds); it is sometimes useful to set a table's
3660	      initial size, externally resize it, or change load factors.</para>
3661
3662	      <para>Clearly, supporting such methods both decreases the
3663	      encapsulation of hash-based containers, and increases the
3664	      diversity between different associative-containers' interfaces.
3665	      Conversely, omitting such methods can decrease containers'
3666	      flexibility.</para>
3667
3668	      <para>In order to avoid, to the extent possible, the above
3669	      conflict, the hash-based containers themselves do not address
3670	      any of these questions; this is deferred to the resize policies,
	      neither <classname>cc_hash_table</classname> nor
	      <classname>gp_hash_table</classname>
	      contains methods for querying the actual size of the table; this
	      is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3676
3677	      <para>Furthermore, the policies themselves are parametrized by
3678	      template arguments that determine the methods they support
3679	      (
3680	      <xref linkend="biblio.alexandrescu01modern"/>
3681	      shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3682	      is parametrized by <classname>External_Size_Access</classname> that
3683	      determines whether it supports methods for querying the actual
3684	      size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3685	      is parametrized by <classname>External_Load_Access</classname> that
3686	      determines whether it supports methods for querying or
3687	      modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3688	      is parametrized by <classname>External_Load_Access</classname> that
3689	      determines whether it supports methods for querying the
3690	      load.</para>
3691
3692	      <para>Some operations, for example, resizing a container at
3693	      run time, or changing the load factors of a load-check trigger
3694	      policy, require the container itself to resize. As mentioned
3695	      above, the hash-based containers themselves do not contain
3696	      these types of methods, only their resize policies.
3697	      Consequently, there must be some mechanism for a resize policy
3698	      to manipulate the hash-based container. As the hash-based
3699	      container is a subclass of the resize policy, this is done
3700	      through virtual methods. Each hash-based container has a
3701	      <classname>private</classname> <classname>virtual</classname> method:</para>
3702	      <programlisting>
3703		virtual void
3704		do_resize
3705		(size_type new_size);
3706	      </programlisting>
3707
3708	      <para>which resizes the container. Implementations of
3709	      <classname>Resize_Policy</classname> can export public methods for resizing
3710	      the container externally; these methods internally call
3711	      <classname>do_resize</classname> to resize the table.</para>
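
	      <para>The mechanism can be sketched as follows. The class names
	      are hypothetical and the container is reduced to a table size;
	      the sketch only shows a public policy method reaching the
	      container through a private virtual
	      <classname>do_resize</classname>.</para>
	      <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical resize policy exporting an external resize() method.
class resize_policy_sketch
{
public:
  virtual ~resize_policy_sketch() { }

  // Public entry point: forwards to the container via the virtual hook.
  void resize(std::size_t new_size) { do_resize(new_size); }

private:
  // Implemented by the container, which derives from this policy.
  virtual void do_resize(std::size_t new_size) = 0;
};

// Hypothetical container: only the table size is modelled here.
class table_sketch : public resize_policy_sketch
{
public:
  table_sketch() : m_table_size(8) { }
  std::size_t table_size() const { return m_table_size; }

private:
  virtual void do_resize(std::size_t new_size) { m_table_size = new_size; }
  std::size_t m_table_size;
};

// Resizes a fresh table externally and reports its new size.
inline std::size_t resized_table_size(std::size_t new_size)
{
  table_sketch t;
  t.resize(new_size);  // method inherited from the policy
  return t.table_size();
}
```
]]></programlisting>
	      <para>The container itself gains no resizing method of its own;
	      <classname>resize</classname> appears in its interface only
	      because the chosen policy exports it.</para>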
3712
3713
3714	    </section>
3715
3716	  </section>
3717
3718
3719	</section> <!-- resize policies -->
3720
3721	<section xml:id="container.hash.details.policy_interaction">
3722	  <info><title>Policy Interactions</title></info>
3725	  <para>Hash-tables are unfortunately especially susceptible to
3726	  choice of policies. One of the more complicated aspects of this
3727	  is that poor combinations of good policies can form a poor
3728	  container. Following are some considerations.</para>
3729
3730	  <section xml:id="policy_interaction.probesizetrigger">
3731	    <info><title>probe/size/trigger</title></info>
3732
3733	    <para>Some combinations do not work well for probing containers.
3734	    For example, combining a quadratic probe policy with an
3735	    exponential size policy can yield a poor container: when an
3736	    element is inserted, a trigger policy might decide that there
3737	    is no need to resize, as the table still contains unused
3738	    entries; the probe sequence, however, might never reach any of
3739	    the unused entries.</para>
3740
	    <para>Unfortunately, this library cannot detect such problems at
	    compile time (in general, detecting them is as hard as the
	    halting problem). It therefore defines
	    an exception class <classname>insert_error</classname>, which is
	    thrown in this case.</para>
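
	    <para>The unreachable-entry problem is easy to reproduce. The
	    sketch below assumes a quadratic probe sequence of the form
	    (home + i * i) mod m, which is an assumption made for
	    illustration, and counts how many distinct slots such a sequence
	    can ever visit. The function name is hypothetical.</para>
	    <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct slots reached by the probe sequence
// (home + i * i) % table_size. Offsets repeat with period table_size,
// so probing i = 0 .. table_size - 1 covers every reachable slot.
inline std::size_t reachable_slots(std::size_t home, std::size_t table_size)
{
  std::set<std::size_t> visited;
  for (std::size_t i = 0; i < table_size; ++i)
    visited.insert((home + i * i) % table_size);
  return visited.size();
}
```
]]></programlisting>
	    <para>For a table of size 8, the offsets i * i mod 8 only take
	    the values 0, 1 and 4, so only three of the eight slots are
	    reachable from any home position. A load-check trigger would see
	    a table that is far from full and decline to resize.</para>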
3745
3746	  </section>
3747
3748	  <section xml:id="policy_interaction.hashtrigger">
3749	    <info><title>hash/trigger</title></info>
3750
3751	    <para>Some trigger policies are especially susceptible to poor
3752	    hash functions. Suppose, as an extreme case, that the hash
3753	    function transforms each key to the same hash value. After some
3754	    inserts, a collision detecting policy will always indicate that
3755	    the container needs to grow.</para>
3756
3757	    <para>The library, therefore, by design, limits each operation to
3758	    one resize. For each <classname>insert</classname>, for example, it queries
3759	    only once whether a resize is needed.</para>
3760
3761	  </section>
3762
3763	  <section xml:id="policy_interaction.eqstorehash">
3764	    <info><title>equivalence functors/storing hash values/hash</title></info>
3765
	    <para><classname>cc_hash_table</classname> and
	    <classname>gp_hash_table</classname> are
	    parametrized by an equivalence functor and by a
	    <classname>Store_Hash</classname> parameter. If the latter parameter is
	    <classname>true</classname>, then the container stores a hash value
	    with each entry, and in case of collisions compares stored hash
	    values first to determine whether the equivalence functor needs
	    to be applied. This can lower the cost of collisions for some
	    types, but increase it for other types.</para>
3775
3776	    <para>If a ranged-hash function or ranged probe function is
3777	    directly supplied, however, then it makes no sense to store the
3778	    hash value with each entry. This library's container will
3779	    fail at compilation, by design, if this is attempted.</para>
3780
3781	  </section>
3782
3783	  <section xml:id="policy_interaction.sizeloadtrigger">
3784	    <info><title>size/load-check trigger</title></info>
3785
	    <para>Assume a size policy issues an increasing sequence of sizes
	    a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
	    example, an exponential size policy might issue the sequence of
	    sizes 8, 16, 32, 64, ...</para>
3790
	    <para>If a load-check trigger policy is used, with loads
	    α<subscript>min</subscript> and α<subscript>max</subscript>,
	    respectively, then it is a good idea to have:</para>

	    <orderedlist>
	      <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>

	      <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
	    </orderedlist>

	    <para>This will ensure that the amortized hash cost of each
	    modifying operation is at most approximately 3.</para>

	    <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
	    any case, a bad choice, and α<subscript>min</subscript> &gt;
	    α<subscript>max</subscript> is horrendous.</para>
3807
3808	  </section>
3809
3810	</section>
3811
3812      </section> <!-- details -->
3813
3814    </section> <!-- hash -->
3815
3816    <!-- tree -->
3817    <section xml:id="pbds.design.container.tree">
3818      <info><title>tree</title></info>
3819
3820      <section xml:id="container.tree.interface">
3821	<info><title>Interface</title></info>
3822
3823	<para>The tree-based container has the following declaration:</para>
	<programlisting>
	  template&lt;
	    typename Key,
	    typename Mapped,
	    typename Cmp_Fn = std::less&lt;Key&gt;,
	    typename Tag = rb_tree_tag,
	    template&lt;
	      typename Const_Node_Iterator,
	      typename Node_Iterator,
	      typename Cmp_Fn_,
	      typename Allocator_&gt;
	    class Node_Update = null_node_update,
	    typename Allocator = std::allocator&lt;char&gt; &gt;
	  class tree;
	</programlisting>
3839
3840	<para>The parameters have the following meaning:</para>
3841
3842	<orderedlist>
3843	  <listitem>
3844	  <para><classname>Key</classname> is the key type.</para></listitem>
3845
3846	  <listitem>
3847	  <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3848
	  <listitem>
	  <para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3851
3852	  <listitem>
3853	    <para><classname>Tag</classname> specifies which underlying data structure
3854	  to use.</para></listitem>
3855
3856	  <listitem>
3857	    <para><classname>Node_Update</classname> is a policy for updating node
3858	  invariants.</para></listitem>
3859
3860	  <listitem>
3861	    <para><classname>Allocator</classname> is an allocator
3862	  type.</para></listitem>
3863	</orderedlist>
3864
	<para>The <classname>Tag</classname> parameter specifies which underlying
	data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
	<classname>ov_tree_tag</classname>
	specifies an underlying red-black tree, splay tree, or
	ordered-vector tree, respectively; any other tag is illegal.
	Note that containers based on the former two contain more types
	and methods than the latter (e.g.,
	<classname>reverse_iterator</classname> and <classname>rbegin</classname>), and have different
	exception and invalidation guarantees.</para>
3874
3875      </section>
3876
3877      <section xml:id="container.tree.details">
3878	<info><title>Details</title></info>
3879
3880	<section xml:id="container.tree.node">
3881	  <info><title>Node Invariants</title></info>
3882
3883
3884	  <para>Consider the two trees in the graphic below, labels A and B. The first
3885	  is a tree of floats; the second is a tree of pairs, each
3886	  signifying a geometric line interval. Each element in a tree is referred to as a node of the tree. Of course, each of
3887	  these trees can support the usual queries: the first can easily
3888	  search for <classname>0.4</classname>; the second can easily search for
3889	  <classname>std::make_pair(10, 41)</classname>.</para>
3890
	  <para>Each of these trees can efficiently support other queries.
	  The first can efficiently determine that the 2nd key in the
	  tree is <constant>0.3</constant>; the second can efficiently determine
	  whether any of its intervals overlaps
	  <classname>std::make_pair(29,42)</classname> (useful in geometric
	  applications or distributed file systems with leases, for
	  example).  It should be noted that an <classname>std::set</classname> can
	  only solve these types of problems with linear complexity.</para>
3899
	  <para>In order to do so, each tree stores some metadata in
	  each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
	  each node the size of the sub-tree rooted at the node; the
	  second stores at each node the maximal endpoint of the
	  intervals in the sub-tree rooted at the node.</para>
3905
3906	  <figure>
3907	    <title>Tree node invariants</title>
3908	    <mediaobject>
3909	      <imageobject>
3910		<imagedata align="center" format="PNG" scale="100"
3911			   fileref="../images/pbds_tree_node_invariants.png"/>
3912	      </imageobject>
3913	      <textobject>
3914		<phrase>Tree node invariants</phrase>
3915	      </textobject>
3916	    </mediaobject>
3917	  </figure>
3918	  
3919	  <para>Supporting such trees is difficult for a number of
3920	  reasons:</para>
3921
3922	  <orderedlist>
3923	    <listitem><para>There must be a way to specify what a node's metadata
3924	    should be (if any).</para></listitem>
3925
	    <listitem><para>Various operations can invalidate node
	    invariants.  The graphic below shows how a right rotation,
	    performed on A, results in B, with nodes x and y having
	    corrupted invariants (the grayed nodes in C). It also shows
	    how an insert, performed on D, results in E, with nodes x and y
	    having corrupted invariants (the grayed nodes in F). It is not
	    feasible to know outside the tree the effect of an operation on
	    the nodes of the tree.</para></listitem>
3934
3935	    <listitem><para>The search paths of standard associative containers are
3936	    defined by comparisons between keys, and not through
3937	    metadata.</para></listitem>
3938
3939	    <listitem><para>It is not feasible to know in advance which methods trees
3940	    can support. Besides the usual <classname>find</classname> method, the
3941	    first tree can support a <classname>find_by_order</classname> method, while
3942	    the second can support an <classname>overlaps</classname> method.</para></listitem>
3943	  </orderedlist>
3944
3945	  <figure>
3946	    <title>Tree node invalidation</title>
3947	    <mediaobject>
3948	      <imageobject>
3949		<imagedata align="center" format="PNG" scale="100"
3950			   fileref="../images/pbds_tree_node_invalidations.png"/>
3951	      </imageobject>
3952	      <textobject>
3953		<phrase>Tree node invalidation</phrase>
3954	      </textobject>
3955	    </mediaobject>
3956	  </figure>
3957
3958	  <para>These problems are solved by a combination of two means:
3959	  node iterators, and template-template node updater
3960	  parameters.</para>
3961
3962	  <section xml:id="container.tree.node.iterators">
3963	    <info><title>Node Iterators</title></info>
3964
3965
	    <para>Each tree-based container defines two additional iterator
	    types, <classname>const_node_iterator</classname>
	    and <classname>node_iterator</classname>.
	    These iterators allow descending from a node to one of its
	    children. Node iterators allow search paths different from those
	    determined by the comparison functor. The <classname>tree</classname>
	    supports the methods:</para>
3973	    <programlisting>
3974	      const_node_iterator
3975	      node_begin() const;
3976
3977	      node_iterator
3978	      node_begin();
3979
3980	      const_node_iterator
3981	      node_end() const;
3982
3983	      node_iterator
3984	      node_end(); 
3985	    </programlisting>
3986
	    <para>The first pair returns node iterators corresponding to the
	    root node of the tree; the latter pair returns node iterators
	    corresponding to a just-after-leaf node.</para>
3990	  </section>
3991
3992	  <section xml:id="container.tree.node.updator">
	    <info><title>Node Updater</title></info>
3994
3995	    <para>The tree-based containers are parametrized by a
3996	    <classname>Node_Update</classname> template-template parameter. A
3997	    tree-based container instantiates
3998	    <classname>Node_Update</classname> to some
3999	    <classname>node_update</classname> class, and publicly subclasses
4000	    <classname>node_update</classname>. The graphic below shows this
4001	    scheme, as well as some predefined policies (which are explained
4002	    below).</para>
4003
4004	    <figure>
4005	      <title>A tree and its update policy</title>
4006	      <mediaobject>
4007		<imageobject>
4008		  <imagedata align="center" format="PNG" scale="100"
4009			     fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4010		</imageobject>
4011		<textobject>
4012		  <phrase>A tree and its update policy</phrase>
4013		</textobject>
4014	      </mediaobject>
4015	    </figure>
4016
4017	    <para><classname>node_update</classname> (an instantiation of
4018	    <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4019	    the type of metadata it requires. For order statistics,
4020	    e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4021	    The tree defines within each node a <classname>metadata_type</classname>
4022	    object.</para>
4023
4024	    <para><classname>node_update</classname> must also define the following method
4025	    for restoring node invariants:</para>
4026	    <programlisting>
4027	      void 
4028	      operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4029	    </programlisting>
4030
4031	    <para>In this method, <varname>nd_it</varname> is a
4032	    <classname>node_iterator</classname> corresponding to a node whose
4033	    A) all descendants have valid invariants, and B) its own
4034	    invariants might be violated; <classname>end_nd_it</classname> is
4035	    a <classname>const_node_iterator</classname> corresponding to a
4036	    just-after-leaf node. This method should correct the node
	    invariants of the node pointed to by
	    <classname>nd_it</classname>. For example, say node x in the
	    graphic below label A has an invalid invariant, but its children,
	    y and z, have valid invariants. After the invocation, all three
	    nodes should have valid invariants, as in label B.</para>
4042
4043
4044	    <figure>
4045	      <title>Restoring node invariants</title>
4046	      <mediaobject>
4047		<imageobject>
4048		  <imagedata align="center" format="PNG" scale="100"
4049			     fileref="../images/pbds_restoring_node_invariants.png"/>
4050		</imageobject>
4051		<textobject>
4052		  <phrase>Restoring node invariants</phrase>
4053		</textobject>
4054	      </mediaobject>
4055	    </figure>
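
	    <para>The invariant-restoring step described above can be
	    sketched for the order-statistics case, where the metadata is the
	    size of the sub-tree. The node type and function names below are
	    hypothetical; the library's updater works through node iterators
	    rather than raw pointers.</para>
	    <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tree node carrying sub-tree size as metadata.
struct node_sketch
{
  double key;
  node_sketch* left;
  node_sketch* right;
  std::size_t subtree_size;  // the metadata
};

inline std::size_t size_of(const node_sketch* nd)
{ return nd ? nd->subtree_size : 0; }

// Restores the invariant of nd, assuming both children are already valid.
inline void update_size(node_sketch* nd)
{ nd->subtree_size = 1 + size_of(nd->left) + size_of(nd->right); }

// Returns the node holding the order-th smallest key (0-based), or null.
// The search path is steered by metadata, not by key comparisons.
inline const node_sketch*
find_by_order_sketch(const node_sketch* nd, std::size_t order)
{
  while (nd)
    {
      const std::size_t left_size = size_of(nd->left);
      if (order < left_size)
        nd = nd->left;
      else if (order == left_size)
        return nd;
      else
        {
          order -= left_size + 1;
          nd = nd->right;
        }
    }
  return 0;
}

// Builds the tree y(0.1) <- x(0.3) -> z(0.4), repairs x, and returns
// the key at the given order (order must be less than 3).
inline double key_at(std::size_t order)
{
  node_sketch y = { 0.1, 0, 0, 1 };
  node_sketch z = { 0.4, 0, 0, 1 };
  node_sketch x = { 0.3, &y, &z, 0 };  // invariant of x not yet valid
  update_size(&x);
  return find_by_order_sketch(&x, order)->key;
}
```
]]></programlisting>
	    <para>After <classname>update_size</classname> runs on x, the
	    descent finds the key at order 1 (the second smallest) in a
	    single step.</para>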
4056
	    <para>When a tree operation might invalidate some node invariant,
	    it invokes this method in its <classname>node_update</classname> base to
	    restore the invariant. For example, the graphic below shows
	    an <function>insert</function> operation (point A); the tree performs some
	    operations, and calls the update functor three times (points B,
	    C, and D). (It is well known that any <function>insert</function>,
	    <function>erase</function>, <function>split</function> or <function>join</function> can restore
	    all node invariants by a small number of node invariant updates;
	    see <xref linkend="biblio.clrs2001"/>.)</para>
4066
4067	    <figure>
4068	      <title>Insert update sequence</title>
4069	      <mediaobject>
4070		<imageobject>
4071		  <imagedata align="center" format="PNG" scale="100"
4072			     fileref="../images/pbds_update_seq_diagram.png"/>
4073		</imageobject>
4074		<textobject>
4075		  <phrase>Insert update sequence</phrase>
4076		</textobject>
4077	      </mediaobject>
4078	    </figure>
4079
4080	    <para>To complete the description of the scheme, three questions
4081	    need to be answered:</para>
4082
4083	    <orderedlist>
4084	      <listitem><para>How can a tree which supports order statistics define a
4085	      method such as <classname>find_by_order</classname>?</para></listitem>
4086
4087	      <listitem><para>How can the node updater base access methods of the
4088	      tree?</para></listitem>
4089
4090	      <listitem><para>How can the following cyclic dependency be resolved?
4091	      <classname>node_update</classname> is a base class of the tree, yet it
4092	      uses node iterators defined in the tree (its child).</para></listitem>
4093	    </orderedlist>
4094
4095	    <para>The first two questions are answered by the fact that
4096	    <classname>node_update</classname> (an instantiation of
4097	    <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4098	    of the tree. Consequently:</para>
4099
4100	    <orderedlist>
	      <listitem><para>Any public methods of
	      <classname>node_update</classname> are automatically methods of
	      the tree (<xref linkend="biblio.alexandrescu01modern"/>).
	      Thus an order-statistics node updater,
	      <classname>tree_order_statistics_node_update</classname>, defines
	      the <function>find_by_order</function> method; any tree
	      instantiated by this policy consequently supports this method as
	      well.</para></listitem>
4109
	      <listitem><para>In C++, if a base class declares a method as
	      <literal>virtual</literal>, it is
	      <literal>virtual</literal> in its subclasses. If
	      <classname>node_update</classname> needs to access one of the
	      tree's methods, say the member function
	      <function>end</function>, it simply declares that method as
	      pure <literal>virtual</literal>; the tree, being the derived
	      class, supplies the implementation.</para></listitem>
4117	    </orderedlist>
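
	    <para>The first point can be seen from user code. The following is
	    a minimal sketch, assuming a GCC release in which the null mapped
	    type is spelled <classname>__gnu_pbds::null_type</classname>; the
	    names <classname>ordered_set</classname> and
	    <function>kth_smallest</function> are illustrative only.</para>
	    <programlisting>
	      #include &lt;cassert&gt;
	      #include &lt;cstddef&gt;
	      #include &lt;functional&gt;
	      #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	      #include &lt;ext/pb_ds/tree_policy.hpp&gt;

	      // A set of ints whose nodes carry order-statistics metadata.
	      typedef __gnu_pbds::tree&lt;
	        int,
	        __gnu_pbds::null_type,
	        std::less&lt;int&gt;,
	        __gnu_pbds::rb_tree_tag,
	        __gnu_pbds::tree_order_statistics_node_update&gt; ordered_set;

	      // Returns the k-th smallest key (0-based); k must be below s.size().
	      int kth_smallest(const ordered_set&amp; s, std::size_t k)
	      { return *s.find_by_order(k); }

	      int main()
	      {
	        ordered_set s;
	        s.insert(40);
	        s.insert(10);
	        s.insert(30);
	        s.insert(20);

	        // find_by_order is a public method of the node-update policy,
	        // and therefore of the tree.
	        assert(kth_smallest(s, 0) == 10);
	        assert(kth_smallest(s, 2) == 30);

	        // order_of_key counts the keys strictly smaller than its argument.
	        assert(s.order_of_key(30) == 2);
	        return 0;
	      }
	    </programlisting>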
4118
4119	    <para>The cyclic dependency is solved through template-template
4120	    parameters. <classname>Node_Update</classname> is parametrized by
4121	    the tree's node iterators, its comparison functor, and its
4122	    allocator type. Thus, instantiations of
4123	    <classname>Node_Update</classname> have all information
4124	    required.</para>
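
	    <para>The arrangement can be sketched outside the library. The
	    code below is an illustration under stated assumptions, not the
	    library's implementation: <classname>toy_container</classname>
	    and <classname>count_less_update</classname> are hypothetical
	    names, and a vector stands in for the tree. It shows a policy that
	    is a public base class, is parametrized through a
	    template-template parameter, and reaches back into the container
	    through pure virtual methods.</para>
	    <programlisting>
	      #include &lt;cassert&gt;
	      #include &lt;cstddef&gt;
	      #include &lt;functional&gt;
	      #include &lt;memory&gt;
	      #include &lt;vector&gt;

	      // Hypothetical updater policy, parametrized by the container's
	      // iterator, comparison functor, and allocator types.
	      template&lt;typename Const_Iterator, typename Cmp_Fn, typename Allocator&gt;
	      class count_less_update
	      {
	      public:
	        // Public method: becomes a method of any deriving container.
	        std::size_t count_less(int key) const
	        {
	          std::size_t n = 0;
	          for (Const_Iterator it = begin(); it != end(); ++it)
	            if (Cmp_Fn()(*it, key))
	              ++n;
	          return n;
	        }

	      protected:
	        virtual ~count_less_update() { }

	      private:
	        // Pure virtual: implemented by the deriving container.
	        virtual Const_Iterator begin() const = 0;
	        virtual Const_Iterator end() const = 0;
	      };

	      // Hypothetical container. It instantiates Node_Update with its
	      // own types, so the base class never names the container itself.
	      template&lt;typename Cmp_Fn,
	               template&lt;typename, typename, typename&gt; class Node_Update,
	               typename Allocator = std::allocator&lt;int&gt; &gt;
	      class toy_container
	        : public Node_Update&lt;
	            typename std::vector&lt;int, Allocator&gt;::const_iterator,
	            Cmp_Fn, Allocator&gt;
	      {
	        typedef std::vector&lt;int, Allocator&gt; storage;

	      public:
	        void insert(int v) { m_values.push_back(v); }

	      private:
	        virtual typename storage::const_iterator begin() const
	        { return m_values.begin(); }

	        virtual typename storage::const_iterator end() const
	        { return m_values.end(); }

	        storage m_values;
	      };

	      int main()
	      {
	        toy_container&lt;std::less&lt;int&gt;, count_less_update&gt; c;
	        c.insert(5);
	        c.insert(1);
	        c.insert(9);
	        assert(c.count_less(6) == 2);  // 5 and 1 compare less than 6
	        return 0;
	      }
	    </programlisting>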
4125
	    <para>This library assumes that constructing a metadata object and
	    modifying it are exception free. Suppose that during some method,
	    say <function>insert</function>, a metadata-related operation
	    (e.g., changing the value of a metadata object) throws an
	    exception. Rolling back the method would then be unusually
	    complex, which is why this assumption is made.</para>
4131
4132	    <para>Previously, a distinction was made between redundant
4133	    policies and null policies. Node invariants show a
4134	    case where null policies are required.</para>
4135
	    <para>Assume a regular tree is required, one which need not
	    support order statistics or interval overlap queries.
	    Seemingly, in this case a redundant policy (a policy which
	    doesn't affect nodes' contents) would suffice. This would lead
	    to the following drawbacks:</para>
4141
4142	    <orderedlist>
4143	      <listitem><para>Each node would carry a useless metadata object, wasting
4144	      space.</para></listitem>
4145
	      <listitem><para>The tree cannot know if its
	      <classname>Node_Update</classname> policy actually modifies a
	      node's metadata (deciding this is reducible to the halting
	      problem). In the graphic below, assume the shaded node is
	      inserted. The tree would have to traverse the useless path shown
	      to the root, applying redundant updates all the way.</para></listitem>
4152	    </orderedlist>
4153	    <figure>
4154	      <title>Useless update path</title>
4155	      <mediaobject>
4156		<imageobject>
4157		  <imagedata align="center" format="PNG" scale="100"
4158			     fileref="../images/pbds_rationale_null_node_updator.png"/>
4159		</imageobject>
4160		<textobject>
4161		  <phrase>Useless update path</phrase>
4162		</textobject>
4163	      </mediaobject>
4164	    </figure>
4165
4166
	    <para>A null policy class, <classname>null_node_update</classname>,
	    solves both these problems. The tree detects that node
	    invariants are irrelevant, and defines its nodes and update
	    logic accordingly.</para>
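
	    <para>One way such detection can work is template specialization
	    on the policy type. The sketch below is an assumption about the
	    general technique, not a description of the library's internals;
	    <classname>toy_node</classname>,
	    <classname>toy_null_update</classname>, and
	    <classname>toy_size_update</classname> are hypothetical.</para>
	    <programlisting>
	      #include &lt;cassert&gt;

	      // Hypothetical policies; neither is the library's own class.
	      struct toy_null_update { };
	      struct toy_size_update { typedef unsigned metadata_type; };

	      // Primary template: each node carries the policy's metadata.
	      template&lt;typename Key, typename Update&gt;
	      struct toy_node
	      {
	        Key key;
	        typename Update::metadata_type metadata;
	        static const bool has_metadata = true;
	      };

	      // Specialization chosen for the null policy: no metadata member,
	      // so a container can also skip the update pass at compile time.
	      template&lt;typename Key&gt;
	      struct toy_node&lt;Key, toy_null_update&gt;
	      {
	        Key key;
	        static const bool has_metadata = false;
	      };

	      int main()
	      {
	        assert((toy_node&lt;int, toy_size_update&gt;::has_metadata));
	        assert(!(toy_node&lt;int, toy_null_update&gt;::has_metadata));
	        assert((sizeof(toy_node&lt;int, toy_null_update&gt;)
	                &lt; sizeof(toy_node&lt;int, toy_size_update&gt;)));
	        return 0;
	      }
	    </programlisting>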
4170
4171	  </section>
4172
4173	</section> 
4174
4175	<section xml:id="container.tree.details.split">
4176	  <info><title>Split and Join</title></info>
4177
4178	  <para>Tree-based containers support split and join methods.
4179	  It is possible to split a tree so that it passes
4180	  all nodes with keys larger than a given key to a different
4181	  tree. These methods have the following advantages over the
4182	  alternative of externally inserting to the destination
4183	  tree and erasing from the source tree:</para>
4184
4185	  <orderedlist>
	    <listitem><para>These methods are efficient: red-black trees are split
	    and joined in poly-logarithmic complexity; ordered-vector
	    trees are split and joined in linear complexity. The
	    alternatives have super-linear complexity.</para></listitem>
4190
4191	    <listitem><para>Aside from orders of growth, these operations perform
4192	    few allocations and de-allocations. For red-black trees, allocations are not performed,
4193	    and the methods are exception-free. </para></listitem>
4194	  </orderedlist>
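
	  <para>A minimal usage sketch follows, assuming a GCC release in
	  which the null mapped type is spelled
	  <classname>__gnu_pbds::null_type</classname>; the typedef name
	  <classname>int_set</classname> is illustrative.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;functional&gt;
	    #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	    typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type, std::less&lt;int&gt;,
	                             __gnu_pbds::rb_tree_tag&gt; int_set;

	    int main()
	    {
	      int_set low, high;
	      for (int i = 1; i &lt;= 6; ++i)
	        low.insert(i);

	      // Pass every key larger than 3 from low to high.
	      low.split(3, high);
	      assert(low.size() == 3 &amp;&amp; high.size() == 3);
	      assert(*high.begin() == 4);

	      // Join them back; every key in high is larger than every key in low.
	      low.join(high);
	      assert(low.size() == 6 &amp;&amp; high.empty());
	      return 0;
	    }
	  </programlisting>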
4195	</section>
4196
4197      </section> <!-- details -->
4198
4199    </section> <!-- tree -->
4200
4201    <!-- trie -->
4202    <section xml:id="pbds.design.container.trie">
4203      <info><title>Trie</title></info>
4204
4205      <section xml:id="container.trie.interface">
4206	<info><title>Interface</title></info>
4207
4208	<para>The trie-based container has the following declaration:</para>
4209	<programlisting>
	  template&lt;typename Key,
	           typename Mapped,
	           typename E_Access_Traits = /* default traits for Key */,
	           typename Tag = pat_trie_tag,
	           template&lt;typename Const_Node_Iterator,
	                    typename Node_Iterator,
	                    typename E_Access_Traits_,
	                    typename Allocator_&gt;
	           class Node_Update = null_node_update,
	           typename Allocator = std::allocator&lt;char&gt; &gt;
	  class trie;
4221	</programlisting>
4222
4223	<para>The parameters have the following meaning:</para>
4224
4225	<orderedlist>
4226	  <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4227
4228	  <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4229
	  <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4231
4232	  <listitem><para><classname>Tag</classname> specifies which underlying data structure
4233	  to use, and is described shortly.</para></listitem>
4234
4235	  <listitem><para><classname>Node_Update</classname> is a policy for updating node
4236	  invariants. This is described below.</para></listitem>
4237
4238	  <listitem><para><classname>Allocator</classname> is an allocator
4239	  type.</para></listitem>
4240	</orderedlist>
4241
	<para>The <classname>Tag</classname> parameter specifies which underlying
	data structure to use. Instantiating it by
	<classname>pat_trie_tag</classname> specifies an
	underlying PATRICIA trie (explained shortly); any other tag is
	currently illegal.</para>
4246
4247	<para>Following is a description of a (PATRICIA) trie
4248	(this implementation follows <xref linkend="biblio.okasaki98mereable"/> and 
4249	<xref linkend="biblio.filliatre2000ptset"/>). 
4250	</para>
4251
4252	<para>A (PATRICIA) trie is similar to a tree, but with the
4253	following differences:</para>
4254
4255	<orderedlist>
4256	  <listitem><para>It explicitly views keys as a sequence of elements.
4257	  E.g., a trie can view a string as a sequence of
4258	  characters; a trie can view a number as a sequence of
4259	  bits.</para></listitem>
4260
4261	  <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4262	  + 1, where n is the number of distinct
4263	  elements.</para></listitem>
4264
4265	  <listitem><para>It stores values only at leaf nodes.</para></listitem>
4266
	  <listitem><para>Internal nodes have the properties that A) each has at
	  least two children, and B) each shares the same prefix with
	  all of its descendants.</para></listitem>
4270	</orderedlist>
4271
4272	<para>A (PATRICIA) trie has some useful properties:</para>
4273
4274	<orderedlist>
	  <listitem><para>It can be configured to use large node fan-out, giving it
	  very efficient find performance (albeit at the cost of insertion
	  complexity and size).</para></listitem>
4278
4279	  <listitem><para>It works well for common-prefix keys.</para></listitem>
4280
	  <listitem><para>It can efficiently support queries such as which
	  keys match a certain prefix. This is sometimes useful in file
	  systems and routers, and for "type-ahead" (predictive text)
	  matching on mobile devices.</para></listitem>
4285	</orderedlist>
4286
4287
4288      </section>
4289
4290      <section xml:id="container.trie.details">
4291	<info><title>Details</title></info>
4292
4293	<section xml:id="container.trie.details.etraits">
4294	  <info><title>Element Access Traits</title></info>
4295
	  <para>A trie inherently views its keys as sequences of elements.
	  For example, a trie can view a string as a sequence of
	  characters. A trie needs to map each of n elements to a
	  number in {0, ..., n - 1}. For example, a trie can map a
	  character <varname>c</varname> to
	  <literal>static_cast&lt;size_t&gt;(c)</literal>.</para>
4302
4303	  <para>Seemingly, then, a trie can assume that its keys support
4304	  (const) iterators, and that the <classname>value_type</classname> of this
4305	  iterator can be cast to a <classname>size_t</classname>. There are several
4306	  reasons, though, to decouple the mechanism by which the trie
4307	  accesses its keys' elements from the trie:</para>
4308
4309	  <orderedlist>
4310	    <listitem><para>In some cases, the numerical value of an element is
4311	    inappropriate. Consider a trie storing DNA strings. It is
4312	    logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4313	    'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4314
	    <listitem><para>In some cases the keys' iterators are different from what
	    is needed. For example, a trie can be used to search for
	    common suffixes, by using strings'
	    <classname>reverse_iterator</classname>. As another example, a trie mapping
	    UNICODE strings would have a huge fan-out if each node
	    branched on a UNICODE character; instead, one can define an
	    iterator iterating over groups of 8 bits (or fewer).</para></listitem>
4322	  </orderedlist>
4323
	  <para>Consequently, <classname>trie</classname> is
	  parametrized by <classname>E_Access_Traits</classname>, a
	  traits class which specifies how to access the elements of a sequence.
	  <classname>string_trie_e_access_traits</classname>
	  is a traits class for strings. Each such traits class defines some
	  types; for example:</para>
	  <programlisting>
	    typename E_Access_Traits::const_iterator
	  </programlisting>

	  <para>is a const iterator iterating over a key's elements. The
	  traits class must also define methods for obtaining iterators
	  to the beginning and the end of a key's element sequence.</para>
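
	  <para>For the DNA case mentioned above, such a traits class might
	  look roughly as follows. This is a sketch only:
	  <classname>dna_access_traits</classname> and its member names
	  (<function>begin</function>, <function>end</function>,
	  <function>e_pos</function>, <varname>max_size</varname>) are
	  hypothetical and have not been checked against the exact
	  requirements the library places on access traits.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;cstddef&gt;
	    #include &lt;string&gt;

	    // Hypothetical access traits for DNA strings.
	    struct dna_access_traits
	    {
	      typedef std::string key_type;
	      typedef std::string::const_iterator const_iterator;
	      typedef char e_type;

	      // Number of distinct elements a key can contain.
	      static const std::size_t max_size = 4;

	      static const_iterator begin(const key_type&amp; key)
	      { return key.begin(); }

	      static const_iterator end(const key_type&amp; key)
	      { return key.end(); }

	      // Maps an element to its position in [0, max_size).
	      static std::size_t e_pos(e_type e)
	      {
	        switch (e)
	          {
	          case 'A': return 0;
	          case 'C': return 1;
	          case 'G': return 2;
	          default:  return 3;  // 'T'; other input is not validated here
	          }
	      }
	    };

	    int main()
	    {
	      const std::string key("GATTACA");
	      std::size_t sum = 0;
	      for (dna_access_traits::const_iterator it
	             = dna_access_traits::begin(key);
	           it != dna_access_traits::end(key); ++it)
	        sum += dna_access_traits::e_pos(*it);

	      // G=2, A=0, T=3, T=3, A=0, C=1, A=0
	      assert(sum == 9);
	      return 0;
	    }
	  </programlisting>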
4337
4338	  <para>The graphic below shows a
4339	  (PATRICIA) trie resulting from inserting the words: "I wish
4340	  that I could ever see a poem lovely as a trie" (which,
4341	  unfortunately, does not rhyme).</para>
4342
	  <para>The leaf nodes contain values; each internal node contains
	  two <classname>typename E_Access_Traits::const_iterator</classname>
	  objects, indicating the maximal common prefix of all keys in
	  the sub-tree. For example, the shaded internal node roots a
	  sub-tree with leaves "a" and "as". The maximal common prefix is
	  "a". The internal node consequently contains two const
	  iterators, one pointing to <varname>'a'</varname>, and the other to
	  <varname>'s'</varname>.</para>
4351
4352	  <figure>
4353	    <title>A PATRICIA trie</title>
4354	    <mediaobject>
4355	      <imageobject>
4356		<imagedata align="center" format="PNG" scale="100"
4357			   fileref="../images/pbds_pat_trie.png"/>
4358	      </imageobject>
4359	      <textobject>
4360		<phrase>A PATRICIA trie</phrase>
4361	      </textobject>
4362	    </mediaobject>
4363	  </figure>
4364
4365	</section>
4366
4367	<section xml:id="container.trie.details.node">
4368	  <info><title>Node Invariants</title></info>
4369
	  <para>Trie-based containers support node invariants, as do
	  tree-based containers. There are two minor
	  differences, though, which, unfortunately, prevent the two
	  from sharing the same node-updating policies:</para>
4374
4375	  <orderedlist>
4376	    <listitem>
4377	      <para>A trie's <classname>Node_Update</classname> template-template
4378	      parameter is parametrized by <classname>E_Access_Traits</classname>, while
4379	      a tree's <classname>Node_Update</classname> template-template parameter is
4380	    parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4381
	    <listitem><para>Tree-based containers store values in all nodes, while
	    trie-based containers (at least in this implementation) store
	    values in leaves.</para></listitem>
4385	  </orderedlist>
4386
4387	  <para>The graphic below shows the scheme, as well as some predefined
4388	  policies (which are explained below).</para>
4389
4390	  <figure>
4391	    <title>A trie and its update policy</title>
4392	    <mediaobject>
4393	      <imageobject>
4394		<imagedata align="center" format="PNG" scale="100"
4395			   fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4396	      </imageobject>
4397	      <textobject>
4398		<phrase>A trie and its update policy</phrase>
4399	      </textobject>
4400	    </mediaobject>
4401	  </figure>
4402
4403
4404	  <para>This library offers the following pre-defined trie node
4405	  updating policies:</para>
4406
4407	  <orderedlist>
4408	    <listitem>
4409	      <para>
4410		<classname>trie_order_statistics_node_update</classname>
4411		supports order statistics.
4412	      </para>
4413	    </listitem>
4414
4415	    <listitem><para><classname>trie_prefix_search_node_update</classname>
4416	    supports searching for ranges that match a given prefix.</para></listitem>
4417
4418	    <listitem><para><classname>null_node_update</classname>
4419	    is the null node updater.</para></listitem>
4420	  </orderedlist>
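
	  <para>A short prefix-search sketch follows. It assumes the names
	  used by current GCC releases
	  (<classname>__gnu_pbds::null_type</classname> and
	  <classname>__gnu_pbds::trie_string_access_traits</classname>),
	  which may differ from the older spellings used elsewhere in this
	  text; <function>count_with_prefix</function> is an illustrative
	  helper.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;cstddef&gt;
	    #include &lt;string&gt;
	    #include &lt;utility&gt;
	    #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	    #include &lt;ext/pb_ds/trie_policy.hpp&gt;

	    typedef __gnu_pbds::trie&lt;
	      std::string,
	      __gnu_pbds::null_type,
	      __gnu_pbds::trie_string_access_traits&lt;&gt;,
	      __gnu_pbds::pat_trie_tag,
	      __gnu_pbds::trie_prefix_search_node_update&gt; string_trie;

	    // Counts the keys in t that start with prefix.
	    std::size_t
	    count_with_prefix(const string_trie&amp; t, const std::string&amp; prefix)
	    {
	      typedef string_trie::const_iterator const_iterator;
	      const std::pair&lt;const_iterator, const_iterator&gt; range
	        = t.prefix_range(prefix);

	      std::size_t n = 0;
	      for (const_iterator it = range.first; it != range.second; ++it)
	        ++n;
	      return n;
	    }

	    int main()
	    {
	      string_trie t;
	      t.insert(std::string("a"));
	      t.insert(std::string("as"));
	      t.insert(std::string("that"));
	      t.insert(std::string("trie"));
	      t.insert(std::string("wish"));

	      assert(count_with_prefix(t, "a") == 2);  // "a", "as"
	      assert(count_with_prefix(t, "t") == 2);  // "that", "trie"
	      assert(count_with_prefix(t, "x") == 0);
	      return 0;
	    }
	  </programlisting>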
4421
4422	</section>
4423
4424	<section xml:id="container.trie.details.split">
4425	  <info><title>Split and Join</title></info>
	  <para>Trie-based containers support split and join methods; the
	  rationale is the same as that for tree-based containers supporting
	  these methods.</para>
4429	</section>
4430
4431      </section> <!-- details -->
4432
4433    </section> <!-- trie -->
4434
4435    <!-- list_update -->
4436    <section xml:id="pbds.design.container.list">
4437      <info><title>List</title></info>
4438
4439      <section xml:id="container.list.interface">
4440	<info><title>Interface</title></info>
4441
4442	<para>The list-based container has the following declaration:</para>
4443	<programlisting>
	  template&lt;typename Key,
	           typename Mapped,
	           typename Eq_Fn = std::equal_to&lt;Key&gt;,
	           typename Update_Policy = lu_move_to_front_policy&lt;&gt;,
	           typename Allocator = std::allocator&lt;char&gt; &gt;
	  class list_update;
4450	</programlisting>
4451
4452	<para>The parameters have the following meaning:</para>
4453
4454	<orderedlist>
4455	  <listitem>
4456	    <para>
4457	      <classname>Key</classname> is the key type.
4458	    </para>
4459	  </listitem>
4460
4461	  <listitem>
4462	    <para>
4463	      <classname>Mapped</classname> is the mapped-policy.
4464	    </para>
4465	  </listitem>
4466
4467	  <listitem>
4468	    <para>
4469	      <classname>Eq_Fn</classname> is a key equivalence functor.
4470	    </para>
4471	  </listitem>
4472
4473	  <listitem>
4474	    <para>
4475	      <classname>Update_Policy</classname> is a policy updating positions in
4476	      the list based on access patterns. It is described in the
4477	      following subsection.
4478	    </para>
4479	  </listitem>
4480
4481	  <listitem>
4482	    <para>
4483	      <classname>Allocator</classname> is an allocator type.
4484	    </para>
4485	  </listitem>
4486	</orderedlist>
4487
4488	<para>A list-based associative container is a container that
4489	stores elements in a linked-list. It does not order the elements
4490	by any particular order related to the keys.  List-based
4491	containers are primarily useful for creating "multimaps". In fact,
4492	list-based containers are designed in this library expressly for
4493	this purpose.</para>
4494
4495	<para>List-based containers might also be useful for some rare
4496	cases, where a key is encapsulated to the extent that only
4497	key-equivalence can be tested. Hash-based containers need to know
4498	how to transform a key into a size type, and tree-based containers
4499	need to know if some key is larger than another.  List-based
4500	associative containers, conversely, only need to know if two keys
4501	are equivalent.</para>
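
	<para>As a sketch of this case, the key type below offers nothing
	but equality comparison, which is all the default
	<classname>Eq_Fn</classname> needs.
	<classname>opaque_key</classname> and
	<classname>opaque_map</classname> are hypothetical names introduced
	for illustration.</para>
	<programlisting>
	  #include &lt;cassert&gt;
	  #include &lt;utility&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;ext/pb_ds/list_update_policy.hpp&gt;

	  // Hypothetical key type: no ordering, no hashing, only equality.
	  class opaque_key
	  {
	  public:
	    explicit opaque_key(int id) : m_id(id) { }

	    bool operator==(const opaque_key&amp; other) const
	    { return m_id == other.m_id; }

	  private:
	    int m_id;
	  };

	  typedef __gnu_pbds::list_update&lt;opaque_key, int&gt; opaque_map;

	  int main()
	  {
	    opaque_map m;
	    m.insert(std::make_pair(opaque_key(1), 10));
	    m.insert(std::make_pair(opaque_key(2), 20));

	    opaque_map::point_iterator it = m.find(opaque_key(2));
	    assert(it != m.end());
	    assert(it-&gt;second == 20);
	    assert(m.find(opaque_key(3)) == m.end());
	    return 0;
	  }
	</programlisting>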
4502
4503	<para>Since a list-based associative container does not order
4504	elements by keys, is it possible to order the list in some
4505	useful manner? Remarkably, many on-line competitive
4506	algorithms exist for reordering lists to reflect access
4507	prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4508	</para>
4509
4510      </section>
4511
4512      <section xml:id="container.list.details">
4513	<info><title>Details</title></info>
4516	<section xml:id="container.list.details.ds">
4517	  <info><title>Underlying Data Structure</title></info>
4518
4519	  <para>The graphic below shows a
4520	  simple list of integer keys. If we search for the integer 6, we
4521	  are paying an overhead: the link with key 6 is only the fifth
4522	  link; if it were the first link, it could be accessed
4523	  faster.</para>
4524
4525	  <figure>
4526	    <title>A simple list</title>
4527	    <mediaobject>
4528	      <imageobject>
4529		<imagedata align="center" format="PNG" scale="100"
4530			   fileref="../images/pbds_simple_list.png"/>
4531	      </imageobject>
4532	      <textobject>
4533		<phrase>A simple list</phrase>
4534	      </textobject>
4535	    </mediaobject>
4536	  </figure>
4537
4538	  <para>List-update algorithms reorder lists as elements are
4539	  accessed. They try to determine, by the access history, which
4540	  keys to move to the front of the list. Some of these algorithms
4541	  require adding some metadata alongside each entry.</para>
4542
4543	  <para>For example, in the graphic below label A shows the counter
4544	  algorithm. Each node contains both a key and a count metadata
4545	  (shown in bold). When an element is accessed (e.g. 6) its count is
4546	  incremented, as shown in label B. If the count reaches some
4547	  predetermined value, say 10, as shown in label C, the count is set
4548	  to 0 and the node is moved to the front of the list, as in label
4549	  D.
4550	  </para>
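
	  <para>The counter algorithm can be sketched on top of
	  <classname>std::list</classname>. This is a standalone
	  illustration of the algorithm as described, not the library's
	  implementation; <classname>counted_entry</classname> and
	  <function>counter_find</function> are hypothetical names.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;cstddef&gt;
	    #include &lt;list&gt;

	    // One list entry: a key plus its count metadata.
	    struct counted_entry
	    {
	      int key;
	      std::size_t count;
	    };

	    // Looks up key. On a hit, increments the entry's count and, once
	    // the count reaches max_count, resets it and moves the entry to
	    // the front. Returns whether key was found.
	    bool
	    counter_find(std::list&lt;counted_entry&gt;&amp; l, int key,
	                 std::size_t max_count)
	    {
	      typedef std::list&lt;counted_entry&gt;::iterator iterator;
	      for (iterator it = l.begin(); it != l.end(); ++it)
	        if (it-&gt;key == key)
	          {
	            if (++it-&gt;count == max_count)
	              {
	                it-&gt;count = 0;
	                l.splice(l.begin(), l, it);  // relinks the node in place
	              }
	            return true;
	          }
	      return false;
	    }

	    int main()
	    {
	      std::list&lt;counted_entry&gt; l;
	      for (int k = 1; k &lt;= 6; ++k)
	        {
	          counted_entry e = { k, 0 };
	          l.push_back(e);
	        }

	      // Two hits leave 6 where it is; the third moves it to the front.
	      counter_find(l, 6, 3);
	      counter_find(l, 6, 3);
	      assert(l.front().key == 1);
	      counter_find(l, 6, 3);
	      assert(l.front().key == 6);
	      assert(l.front().count == 0);
	      return 0;
	    }
	  </programlisting>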
4551
4552	  <figure>
4553	    <title>The counter algorithm</title>
4554	    <mediaobject>
4555	      <imageobject>
4556		<imagedata align="center" format="PNG" scale="100"
4557			   fileref="../images/pbds_list_update.png"/>
4558	      </imageobject>
4559	      <textobject>
4560		<phrase>The counter algorithm</phrase>
4561	      </textobject>
4562	    </mediaobject>
4563	  </figure>
4564
4565
4566	</section>
4567
4568	<section xml:id="container.list.details.policies">
4569	  <info><title>Policies</title></info>
4570
	  <para>This library allows instantiating lists with policies
	  implementing any algorithm moving nodes to the front of the
	  list (policies implementing algorithms interchanging nodes are
	  unsupported).</para>
4575
	  <para>Associative containers based on lists are parametrized by an
	  <classname>Update_Policy</classname> parameter. This parameter defines the
	  type of metadata each node contains, how to create the
	  metadata, and how to decide, using this metadata, whether to
	  move a node to the front of the list. A list-based associative
	  container object derives (publicly) from its update policy.
	  </para>
4583
4584	  <para>An instantiation of <classname>Update_Policy</classname> must define
4585	  internally <classname>update_metadata</classname> as the metadata it
4586	  requires. Internally, each node of the list contains, besides
4587	  the usual key and data, an instance of <classname>typename
4588	  Update_Policy::update_metadata</classname>.</para>
4589
4590	  <para>An instantiation of <classname>Update_Policy</classname> must define
4591	  internally two operators:</para>
4592	  <programlisting>
4593	    update_metadata
4594	    operator()();
4595
4596	    bool
4597	    operator()(update_metadata &amp;);
4598	  </programlisting>
4599
	  <para>The first is called by the container object, when creating a
	  new node, to create the node's metadata. The second is called
	  by the container object, when a node is accessed (i.e.,
	  when a find operation's key is equivalent to the key of the
	  node), to determine whether to move the node to the front of
	  the list.
	  </para>
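
	  <para>A policy of this shape might look as follows. The sketch
	  mirrors only the two operators and the
	  <classname>update_metadata</classname> type described here;
	  <classname>toy_counter_policy</classname> is hypothetical, and the
	  library's own policy classes may require further members.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;cstddef&gt;

	    // Hypothetical update policy implementing a counter scheme.
	    template&lt;std::size_t Max_Count&gt;
	    struct toy_counter_policy
	    {
	      typedef std::size_t update_metadata;

	      // Called when a node is created: yields its initial metadata.
	      update_metadata
	      operator()() const
	      { return 0; }

	      // Called when a node is accessed: returns true if the node
	      // should be moved to the front of the list.
	      bool
	      operator()(update_metadata&amp; count) const
	      {
	        if (++count &lt; Max_Count)
	          return false;
	        count = 0;
	        return true;
	      }
	    };

	    int main()
	    {
	      toy_counter_policy&lt;3&gt; policy;
	      toy_counter_policy&lt;3&gt;::update_metadata m = policy();

	      const bool first = policy(m);
	      const bool second = policy(m);
	      const bool third = policy(m);  // third access requests the move

	      assert(!first);
	      assert(!second);
	      assert(third);
	      assert(m == 0);
	      return 0;
	    }
	  </programlisting>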
4607
	  <para>The library contains two predefined implementations of
	  list-update policies. The first
	  is <classname>lu_counter_policy</classname>, which implements the
	  counter algorithm described above. The second is
	  <classname>lu_move_to_front_policy</classname>,
	  which unconditionally moves an accessed element to the front of
	  the list. The latter type is very useful in this library,
	  since there is no need to associate metadata with each element.
	  (See <xref linkend="biblio.andrew04mtf"/>.)
	  </para>
4618
4619	</section>
4620
4621	<section xml:id="container.list.details.mapped">
4622	  <info><title>Use in Multimaps</title></info>
4623
4624	  <para>In this library, there are no equivalents for the standard's
4625	  multimaps and multisets; instead one uses an associative
4626	  container mapping primary keys to secondary keys.</para>
4627
4628	  <para>List-based containers are especially useful as associative
4629	  containers for secondary keys. In fact, they are implemented
4630	  here expressly for this purpose.</para>
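
	  <para>Such a mapping might be written as below. This is a sketch,
	  assuming a GCC release in which the null mapped type is spelled
	  <classname>__gnu_pbds::null_type</classname>; the typedef names
	  are illustrative.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;string&gt;
	    #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	    // Secondary keys: a list-based set of ints.
	    typedef __gnu_pbds::list_update&lt;int, __gnu_pbds::null_type&gt;
	      secondary_set;

	    // Each primary key maps to its own set of secondary keys.
	    typedef __gnu_pbds::tree&lt;std::string, secondary_set&gt; multimap_t;

	    int main()
	    {
	      multimap_t m;

	      secondary_set&amp; fruit = m["fruit"];
	      fruit.insert(1);
	      fruit.insert(2);
	      fruit.insert(2);  // already present; the set is unchanged

	      secondary_set&amp; stone = m["stone"];
	      stone.insert(7);

	      assert(m.size() == 2);
	      assert(fruit.size() == 2);
	      assert(stone.find(7) != stone.end());
	      return 0;
	    }
	  </programlisting>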
4631
4632	  <para>To begin with, these containers use very little per-entry
4633	  structure memory overhead, since they can be implemented as
4634	  singly-linked lists. (Arrays use even lower per-entry memory
4635	  overhead, but they are less flexible in moving around entries,
4636	  and have weaker invalidation guarantees).</para>
4637
	  <para>More importantly, though, list-based containers use very
	  little per-container memory overhead. The memory overhead of an
	  empty list-based container is practically that of a pointer.
	  This is important when they are used as secondary
	  associative-containers in situations where the average ratio of
	  secondary keys to primary keys is low (or even 1).</para>
4644
4645	  <para>In order to reduce the per-container memory overhead as much
4646	  as possible, they are implemented as closely as possible to
4647	  singly-linked lists.</para>
4648
4649	  <orderedlist>
4650	    <listitem>
4651	      <para>
		List-based containers do not store internally the number
		of values that they hold. This means that their <function>size</function>
		method has linear complexity (as was permitted for
		<classname>std::list</classname> before C++11).
		Note that finding the number of equivalent-key values in a
		standard multimap also has linear complexity (because it must be
		done via <function>std::distance</function> on the result of the
		multimap's <function>equal_range</function> method), but usually with
		higher constants.
4660	      </para>
4661	    </listitem>
4662
4663	    <listitem>
4664	      <para>
4665		Most associative-container objects each hold a policy
4666		object (a hash-based container object holds a
4667		hash functor). List-based containers, conversely, only have
4668		class-wide policy objects.
4669	      </para>
4670	    </listitem>
4671	  </orderedlist>
4672
4673
4674	</section>
4675
4676      </section> <!-- details -->
4677
4678    </section> <!-- list -->
4679
4680
4681    <!-- priority_queue -->
4682    <section xml:id="pbds.design.container.priority_queue">
4683      <info><title>Priority Queue</title></info>
4684
4685      <section xml:id="container.priority_queue.interface">
4686	<info><title>Interface</title></info>
4687
4688	<para>The priority queue container has the following
4689	declaration:
4690	</para>
4691	<programlisting>
	  template&lt;typename Value_Type,
	           typename Cmp_Fn = std::less&lt;Value_Type&gt;,
	           typename Tag = pairing_heap_tag,
	           typename Allocator = std::allocator&lt;char&gt; &gt;
	  class priority_queue;
4697	</programlisting>
4698
4699	<para>The parameters have the following meaning:</para>
4700
4701	<orderedlist>
4702	  <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4703
	  <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4705
4706	  <listitem><para><classname>Tag</classname> specifies which underlying data structure
4707	  to use.</para></listitem>
4708
4709	  <listitem><para><classname>Allocator</classname> is an allocator
4710	  type.</para></listitem>
4711	</orderedlist>
4712
	<para>The <classname>Tag</classname> parameter specifies which underlying
	data structure to use. Instantiating it by
	<classname>pairing_heap_tag</classname>,
	<classname>binary_heap_tag</classname>,
	<classname>binomial_heap_tag</classname>,
	<classname>rc_binomial_heap_tag</classname>,
	or <classname>thin_heap_tag</classname>
	specifies, respectively,
	an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
	binary heap (<xref linkend="biblio.clrs2001"/>),
	binomial heap (<xref linkend="biblio.clrs2001"/>),
	binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
	or thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
	</para>
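
	<para>Since the tag is an ordinary template argument, the same code
	can be run over every underlying structure. The helper
	<function>top_after_pushes</function> below is illustrative.</para>
	<programlisting>
	  #include &lt;cassert&gt;
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	  // Pushes the same values into a queue built on the given tag and
	  // returns the value reported by top().
	  template&lt;typename Tag&gt;
	  int top_after_pushes()
	  {
	    __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;, Tag&gt; q;
	    q.push(3);
	    q.push(7);
	    q.push(5);
	    return q.top();
	  }

	  int main()
	  {
	    assert(top_after_pushes&lt;__gnu_pbds::pairing_heap_tag&gt;() == 7);
	    assert(top_after_pushes&lt;__gnu_pbds::binary_heap_tag&gt;() == 7);
	    assert(top_after_pushes&lt;__gnu_pbds::binomial_heap_tag&gt;() == 7);
	    assert(top_after_pushes&lt;__gnu_pbds::rc_binomial_heap_tag&gt;() == 7);
	    assert(top_after_pushes&lt;__gnu_pbds::thin_heap_tag&gt;() == 7);
	    return 0;
	  }
	</programlisting>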
4725
	<para>
	  As mentioned in the tutorial,
	  <classname>__gnu_pbds::priority_queue</classname> shares most of
	  its interface with <classname>std::priority_queue</classname>.
	  E.g. if <varname>q</varname> is a priority queue of type
	  <classname>Q</classname>, then <function>q.top()</function> will
	  return the "largest" value in the container (according to
	  <classname>typename
	  Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
	  has a larger (and very slightly different) interface than
	  <classname>std::priority_queue</classname>, however, since typically
	  <function>push</function> and <function>pop</function> are deemed
	insufficient for manipulating priority-queues. </para>
4739
	<para>Different settings require different priority-queue
	implementations, which are described later. The discussion of
	traits covers ways to differentiate between the traits of the
	different implementations.</para>
4744
4745
4746      </section>
4747
4748      <section xml:id="container.priority_queue.details">
4749	<info><title>Details</title></info>
4750
4751	<section xml:id="container.priority_queue.details.iterators">
4752	  <info><title>Iterators</title></info>
4753
4754	  <para>There are many different underlying-data structures for
4755	  implementing priority queues. Unfortunately, most such
4756	  structures are oriented towards making <function>push</function> and
4757	  <function>top</function> efficient, and consequently don't allow efficient
4758	  access of other elements: for instance, they cannot support an efficient
4759	  <function>find</function> method. In the use case where it
4760	  is important to both access and "do something with" an
4761	  arbitrary value, one would be out of luck. For example, many graph algorithms require
4762	  modifying a value (typically increasing it in the sense of the
4763	  priority queue's comparison functor).</para>
4764
4765	  <para>In order to access and manipulate an arbitrary value in a
4766	  priority queue, one needs to reference the internals of the
4767	  priority queue from some form of an associative container -
4768	  this is unavoidable. Of course, in order to maintain the
4769	  encapsulation of the priority queue, this needs to be done in a
4770	  way that minimizes exposure to implementation internals.</para>
4771
	  <para>In this library the priority queue's <function>push</function>
	  method returns an iterator which, if valid, can be used for
	  subsequent <function>modify</function> and
	  <function>erase</function> operations. This both preserves the priority
	  queue's encapsulation, and allows accessing arbitrary values (since the
	  returned iterators from the <function>push</function> operation can be
	  stored in some form of associative container).</para>
4778
4779	  <para>Priority queues' iterators present a problem regarding their
4780	  invalidation guarantees. One assumes that calling
4781	  <function>operator++</function> on an iterator will associate it
4782	  with the "next" value. Priority-queues are
4783	  self-organizing: each operation changes what the "next" value
4784	  means. Consequently, it does not make sense that <function>push</function>
4785	  will return an iterator that can be incremented - this can have
4786	  no possible use. Also, as in the case of hash-based containers,
4787	  it is awkward to define if a subsequent <function>push</function> operation
4788	  invalidates a prior returned iterator: it invalidates it in the
4789	  sense that its "next" value is not related to what it
4790	  previously considered to be its "next" value. However, it might not
4791	  invalidate it, in the sense that it can be
4792	  de-referenced and used for <function>modify</function> and <function>erase</function>
4793	  operations.</para>
4794
	  <para>As with the other unordered associative
	  containers, this library distinguishes between
	  point-type and range-type iterators. A priority queue's
	  <classname>iterator</classname> can always be
	  converted to a <classname>point_iterator</classname>, and a
	  <classname>const_iterator</classname> can always be converted to a
	  <classname>point_const_iterator</classname>.</para>
4801
4802	  <para>The following snippet demonstrates manipulating an arbitrary
4803	  value:</para>
4804	  <programlisting>
4805	    // A priority queue of integers.
4806	    priority_queue&lt;int &gt; p;
4807
4808	    // Insert some values into the priority queue.
4809	    priority_queue&lt;int &gt;::point_iterator it = p.push(0);
4810
4811	    p.push(1);
4812	    p.push(2);
4813
4814	    // Now modify a value.
4815	    p.modify(it, 3);
4816
4817	    assert(p.top() == 3);
4818	  </programlisting>
4819
4820	  
	  <para>It should be noted that an alternative design could embed an
	  associative container in a priority queue. Could, but most
	  probably should not. To begin with, one
	  could always encapsulate a priority queue and an associative
	  container mapping values to priority queue iterators with no
	  performance loss. One cannot, however, "un-encapsulate" a priority
	  queue embedding an associative container, which might lead to
	  performance loss. Assume that one needs to associate each value
	  with some data unrelated to priority queues. Then, using
	  this library's design, one could use an
	  associative container mapping each value to a pair consisting of
	  this data and a priority queue's iterator. The embedded
	  design would require two associative containers. Similar
	  problems might arise in cases where a value can reside
	  simultaneously in many priority queues.</para>
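
	  <para>The external encapsulation can be sketched as follows.
	  <classname>indexed_queue</classname> is a hypothetical wrapper,
	  not part of the library; it assumes each name is pushed at most
	  once, and it uses <classname>std::map</classname> as the
	  associative container purely for brevity.</para>
	  <programlisting>
	    #include &lt;cassert&gt;
	    #include &lt;functional&gt;
	    #include &lt;map&gt;
	    #include &lt;string&gt;
	    #include &lt;utility&gt;
	    #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	    // (cost, name) pairs; std::greater makes top() the cheapest entry.
	    typedef std::pair&lt;int, std::string&gt; entry;
	    typedef __gnu_pbds::priority_queue&lt;entry, std::greater&lt;entry&gt; &gt;
	      cost_queue;

	    // Hypothetical wrapper: a queue plus an index of its iterators.
	    class indexed_queue
	    {
	    public:
	      // Assumes name has not been pushed before.
	      void push(const std::string&amp; name, int cost)
	      {
	        m_index.insert(std::make_pair(name,
	                                      m_queue.push(entry(cost, name))));
	      }

	      // Replaces the cost of a previously pushed name; unknown names
	      // are ignored.
	      void set_cost(const std::string&amp; name, int cost)
	      {
	        index_type::iterator it = m_index.find(name);
	        if (it != m_index.end())
	          m_queue.modify(it-&gt;second, entry(cost, name));
	      }

	      const std::string&amp; cheapest() const
	      { return m_queue.top().second; }

	    private:
	      typedef std::map&lt;std::string, cost_queue::point_iterator&gt;
	        index_type;

	      cost_queue m_queue;
	      index_type m_index;
	    };

	    int main()
	    {
	      indexed_queue q;
	      q.push("a", 5);
	      q.push("b", 3);
	      q.push("c", 8);
	      assert(q.cheapest() == "b");

	      q.set_cost("c", 1);
	      assert(q.cheapest() == "c");
	      return 0;
	    }
	  </programlisting>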
4836
4837	</section>
4838
4839
4840	<section xml:id="container.priority_queue.details.d">
4841	  <info><title>Underlying Data Structure</title></info>
4842
4843	  <para>There are three main implementations of priority queues: the
4844	  first employs a binary heap, typically one which uses a
4845	  sequence; the second uses a tree (or forest of trees), which is
4846	  typically less structured than an associative container's tree;
4847	  the third simply uses an associative container. These are
4848	  shown in the graphic below, in labels A1 and A2, label B, and label C.</para>
4849
4850	  <figure>
4851	    <title>Underlying Priority-Queue Data-Structures.</title>
4852	    <mediaobject>
4853	      <imageobject>
4854		<imagedata align="center" format="PNG" scale="100"
4855			   fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4856	      </imageobject>
4857	      <textobject>
4858		<phrase>Underlying Priority-Queue Data-Structures.</phrase>
4859	      </textobject>
4860	    </mediaobject>
4861	  </figure>
4862
	  <para>Roughly speaking, any value that is both pushed into and
	  popped from a priority queue must incur a logarithmic expense
	  (in the amortized sense). Pushing n values and then popping them
	  all yields the values in sorted order, so any priority queue
	  implementation that avoided this expense would violate known
	  lower bounds on comparison-based sorting
	  (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
	  </para>
4869
	  <para>Most implementations do
	  not differ in the asymptotic amortized complexity of
	  <function>push</function> and <function>pop</function> operations, but they differ in
	  the constants involved, in the complexity of other operations
	  (e.g., <function>modify</function>), and in the worst-case
	  complexity of single operations. In general, the more
	  "structured" an implementation (i.e., the more internal
	  invariants it possesses), the higher the amortized cost
	  of its <function>push</function> and <function>pop</function> operations.</para>
4879
	  <para>This library implements different algorithms using a
	  single class: <classname>priority_queue</classname>.
	  Instantiating the <classname>Tag</classname> template parameter
	  selects the implementation:</para>
4884
4885	  <orderedlist>
	    <listitem><para>
	      Instantiating <classname>Tag = binary_heap_tag</classname> creates
	      a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
	      selected by <classname>priority_queue</classname>
	      if <classname>Value_Type</classname> is instantiated by a primitive type
	      (e.g., an <type>int</type>); the latter is
	      internally selected for all other types (e.g.,
	      <classname>std::string</classname>). This implementation is relatively
	      unstructured, and so has good <function>push</function> and <function>pop</function>
	      performance; it is the "best-in-kind" for primitive
	      types, e.g., <type>int</type>s. Conversely, it has
	      high worst-case complexity, and can support only linear-time
	    <function>modify</function> and <function>erase</function> operations.</para></listitem>
4899
	    <listitem><para>Instantiating <classname>Tag =
	    pairing_heap_tag</classname> creates a pairing heap of the form
	    represented by label B in the graphic above. This
	    implementation too is relatively unstructured, and so has good
	    <function>push</function> and <function>pop</function>
	    performance; it is the "best-in-kind" for non-primitive types,
	    e.g., <classname>std::string</classname>s. It also has very good
	    worst-case <function>push</function> and
	    <function>join</function> performance (O(1)), but has high
	    worst-case <function>pop</function>
	    complexity.</para></listitem>
4911
	    <listitem><para>Instantiating <classname>Tag =
	    binomial_heap_tag</classname> creates a binomial heap of the
	    form represented by label B in the graphic above. This
	    implementation is more structured than a pairing heap, and so
	    has worse <function>push</function> and <function>pop</function>
	    performance. Conversely, it has sub-linear worst-case bounds for
	    operations such as <function>pop</function>, and so it might be
	    preferred in cases where responsiveness is important.</para></listitem>
4920
	    <listitem><para>Instantiating <classname>Tag =
	    rc_binomial_heap_tag</classname> creates a binomial heap of the
	    form represented by label B in the graphic above, accompanied
	    by a redundant counter which governs the trees. This
	    implementation is therefore more structured than a binomial
	    heap, and so has worse
	    <function>push</function> and <function>pop</function>
	    performance. Conversely, it guarantees O(1)
	    <function>push</function> complexity, and so it might be
	    preferred in cases where the responsiveness of a binomial heap
	    is insufficient.</para></listitem>
4931
	    <listitem><para>Instantiating <classname>Tag =
	    thin_heap_tag</classname> creates a thin heap of the form
	    represented by label B in the graphic above. This
	    implementation too is more structured than a pairing heap, and
	    so has worse <function>push</function> and
	    <function>pop</function> performance. Conversely, it has better
	    worst-case complexity than a Fibonacci heap, with identical
	    amortized complexity, and so might be more appropriate for
	    some graph algorithms.</para></listitem>
4941	  </orderedlist>
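
	  <para>As a rough illustration of how the
	  <classname>Tag</classname> parameter selects the implementation,
	  the sketch below drains a queue built with a caller-chosen tag.
	  The function name <function>drain_descending</function> is
	  hypothetical; only <classname>priority_queue</classname> and the
	  five tags come from the library. The observable result is the
	  same for every tag, since only the performance characteristics
	  differ.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: pushes all values into a priority queue whose
// underlying data structure is chosen by Tag, then pops them all.
// With std::less as the comparison, the largest value comes out first.
template<typename Tag>
std::vector<int>
drain_descending(const std::vector<int>& values)
{
  __gnu_pbds::priority_queue<int, std::less<int>, Tag> q;
  for (std::size_t i = 0; i < values.size(); ++i)
    q.push(values[i]);

  std::vector<int> out;
  while (!q.empty())
    {
      out.push_back(q.top());
      q.pop();
    }
  return out;
}
```
]]></programlisting>

	  <para>For example,
	  <literal>drain_descending&lt;__gnu_pbds::thin_heap_tag&gt;(v)</literal>
	  and
	  <literal>drain_descending&lt;__gnu_pbds::binary_heap_tag&gt;(v)</literal>
	  should return the same sequence for the same input.</para>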
4942
	  <para>Of course, one can use any order-preserving associative
	  container as a priority queue, as represented by label C in the
	  graphic above, possibly by creating an adapter class
	  over the associative container (much as
	  <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
	  This has the advantage that no cross-referencing is necessary
	  at all; the priority queue itself is an associative container.
	  However, most associative containers are too structured to
	  compete with priority queues in terms of
	  <function>push</function> and <function>pop</function>
	  performance.</para>
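
	  <para>A minimal sketch of such an adapter is given below. It is
	  an assumption-laden illustration, not library code: the class
	  name <classname>set_priority_queue</classname> and its members
	  are hypothetical, and <classname>std::multiset</classname> is
	  used as the order-preserving associative container. Because the
	  queue is itself an associative container, a value can be found
	  directly, with no separate cross-reference.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical adapter presenting a std::multiset as a priority queue.
// With std::less, the largest value is the last element of the set.
template<typename T, typename Cmp = std::less<T> >
class set_priority_queue
{
public:
  typedef typename std::multiset<T, Cmp>::iterator iterator;

  iterator
  push(const T& value)
  { return set_.insert(value); }

  // Precondition: the queue is not empty.
  const T&
  top() const
  { return *set_.rbegin(); }

  // Precondition: the queue is not empty.
  void
  pop()
  {
    iterator last = set_.end();
    --last;
    set_.erase(last);
  }

  // Replaces the value at it; returns an iterator to the new value.
  iterator
  modify(iterator it, const T& value)
  {
    set_.erase(it);
    return set_.insert(value);
  }

  // Logarithmic lookup by value: no cross-referencing is needed.
  bool
  contains(const T& value) const
  { return set_.find(value) != set_.end(); }

  bool
  empty() const
  { return set_.empty(); }

  std::size_t
  size() const
  { return set_.size(); }

private:
  std::multiset<T, Cmp> set_;
};
```
]]></programlisting>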
4952
4953
4954
4955	</section>
4956
4957	<section xml:id="container.priority_queue.details.traits">
4958	  <info><title>Traits</title></info>
4959
	  <para>It would be nice if all priority queues could
	  share exactly the same behavior regardless of implementation.
	  Sadly, this is not possible. One example is the join operation:
	  joining two binary heaps might throw an exception (without
	  corrupting either of the heaps on which it operates), but
	  joining two pairing heaps is exception free.</para>
4965
4966	  <para>Tags and traits are very useful for manipulating generic
4967	  types. <classname>__gnu_pbds::priority_queue</classname>
4968	  publicly defines <classname>container_category</classname> as one of the tags. Given any
4969	  container <classname>Cntnr</classname>, the tag of the underlying
4970	  data structure can be found via <classname>typename 
4971	  Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4972	  </para>
4973
4974	  <figure>
4975	    <title>Priority-Queue Data-Structure Tags.</title>
4976	    <mediaobject>
4977	      <imageobject>
4978		<imagedata align="center" format="PNG" scale="100"
4979                 fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
4980	      </imageobject>
4981	      <textobject>
4982		<phrase>Priority-Queue Data-Structure Tags.</phrase>
4983	      </textobject>
4984	    </mediaobject>
4985	  </figure>
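
	  <para>One common way to use
	  <classname>container_category</classname> is tag dispatching,
	  sketched below. The functions <function>describe</function> and
	  <function>underlying_structure</function> are hypothetical names
	  introduced for this example; only the tag types and the
	  <classname>container_category</classname> typedef are taken from
	  the library.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical overload set: one overload per priority-queue tag.
inline const char*
describe(__gnu_pbds::binary_heap_tag)
{ return "binary heap"; }

inline const char*
describe(__gnu_pbds::pairing_heap_tag)
{ return "pairing heap"; }

inline const char*
describe(__gnu_pbds::binomial_heap_tag)
{ return "binomial heap"; }

inline const char*
describe(__gnu_pbds::rc_binomial_heap_tag)
{ return "redundant-counter binomial heap"; }

inline const char*
describe(__gnu_pbds::thin_heap_tag)
{ return "thin heap"; }

// Selects an overload at compile time from the container's tag.
template<typename Cntnr>
std::string
underlying_structure(const Cntnr&)
{ return describe(typename Cntnr::container_category()); }
```
]]></programlisting>

	  <para>The overload is chosen at compile time, so generic code can
	  adapt to the underlying data structure with no run-time
	  cost.</para>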
4986
4987
	  <para>Additionally, a traits mechanism can be used to query a
	  container type for its attributes. Given any container
	  <classname>Cntnr</classname>, <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
	  is a traits class identifying the properties of the
	  container.</para>
4993
4994	  <para>To find if a container might throw if two of its objects are
4995	  joined, one can use 
4996	  <programlisting>
4997	    container_traits&lt;Cntnr&gt;::split_join_can_throw
4998	  </programlisting>
4999	  </para>
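
	  <para>The sketch below shows one way this trait might be used.
	  The helpers <function>join_may_throw</function> and
	  <function>checked_join</function> are hypothetical names; the
	  example assumes only that
	  <classname>split_join_can_throw</classname> is a compile-time
	  constant and that <function>join</function> moves all values of
	  its argument into the object on which it is called.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: true if joining two Cntnr objects might throw.
template<typename Cntnr>
bool
join_may_throw()
{
  return static_cast<bool>(
    __gnu_pbds::container_traits<Cntnr>::split_join_can_throw);
}

// Hypothetical helper: joins src into dst, reporting failure through
// the return value. The try block is only needed for containers whose
// join might throw.
template<typename Cntnr>
bool
checked_join(Cntnr& dst, Cntnr& src)
{
  if (!join_may_throw<Cntnr>())
    {
      dst.join(src);
      return true;
    }
  try
    {
      dst.join(src);
      return true;
    }
  catch (...)
    { return false; }
}
```
]]></programlisting>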
5000
	  <para>
	    Different priority-queue implementations have different invalidation guarantees. This is
	    especially important, since there is no way to access an arbitrary
	    value of a priority queue other than through iterators. As with
	    associative containers, one can use
	    <programlisting>
	      container_traits&lt;Cntnr&gt;::invalidation_guarantee
	    </programlisting>
	  to get the invalidation guarantee type of a priority queue.</para>
5010
	  <para>The graphic of underlying data structures above makes it
	  easy to see what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
	  will be for different implementations. All implementations of
	  the type represented by label B have <classname>point_invalidation_guarantee</classname>:
	  the container can freely reorganize the nodes internally, so
	  range-type iterators are invalidated, but point-type iterators
	  remain valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
	  the container can freely reallocate the array internally, so both
	  point-type and range-type iterators might be invalidated.</para>
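
	  <para>A hedged sketch of querying this guarantee in generic code
	  follows. The function names are hypothetical. The example
	  assumes that <classname>point_invalidation_guarantee</classname>
	  derives from <classname>basic_invalidation_guarantee</classname>,
	  so that overload resolution picks the most specific
	  match.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical overloads dispatching on the invalidation guarantee.
// A guarantee derived from point_invalidation_guarantee selects the
// second overload, because it is the closer base class.
inline bool
keeps_point_iterators(__gnu_pbds::basic_invalidation_guarantee)
{ return false; }

inline bool
keeps_point_iterators(__gnu_pbds::point_invalidation_guarantee)
{ return true; }

// True if point-type iterators of Cntnr stay valid while other
// values are inserted or modified.
template<typename Cntnr>
bool
point_iterators_survive_updates()
{
  typedef typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee
    guarantee;
  return keeps_point_iterators(guarantee());
}
```
]]></programlisting>

	  <para>Code that stores point-type iterators for later use could
	  check this property before relying on them.</para>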
5019
	  <para>
	    This has major implications, and constitutes a good reason to avoid
	    using binary heaps. A binary heap can perform <function>modify</function>
	    or <function>erase</function> efficiently given a valid point-type
	    iterator. However, in order to supply it with a valid point-type
	    iterator, one needs to iterate (linearly) over all
	    values and then supply the relevant iterator (recall that a
	    range-type iterator can always be converted to a point-type
	    iterator). This means that if the number of <function>modify</function> or
	    <function>erase</function> operations is non-negligible (say,
	    super-logarithmic in the length of the total sequence of
	    operations), binary heaps will perform badly.
	  </para>
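
	  <para>The linear scan described above might look like the
	  following sketch. The function
	  <function>modify_by_scan</function> is a hypothetical helper, not
	  part of the library. It relies on the conversion from a
	  range-type iterator to a point-type iterator mentioned above, and
	  it stops iterating as soon as the container has been
	  modified.</para>

	  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: finds old_value by iterating over all values
// (linear time), then modifies it through the iterator that was found.
// Returns false if old_value is not present.
template<typename Queue>
bool
modify_by_scan(Queue& q,
               const typename Queue::value_type& old_value,
               const typename Queue::value_type& new_value)
{
  for (typename Queue::iterator it = q.begin(); it != q.end(); ++it)
    if (*it == old_value)
      {
        // The range-type iterator converts to a point-type iterator.
        q.modify(it, new_value);
        return true;  // stop: iterators may no longer be valid
      }
  return false;
}
```
]]></programlisting>

	  <para>Every call costs time linear in the size of the queue,
	  which is why a workload with many <function>modify</function> or
	  <function>erase</function> operations is usually better served
	  by an implementation whose point-type iterators can be stored
	  and reused.</para>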
5033
5034	</section>
5035
5036      </section> <!-- details -->
5037
5038    </section> <!-- priority_queue -->
5039
5040
5041
5042  </section> <!-- container -->
5043
5044  </section> <!-- design -->
5045
5046
5047
5048  <!-- S04: Test -->
5049  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5050	      href="test_policy_data_structures.xml">
5051  </xi:include>
5052
5053  <!-- S05: Reference/Acknowledgments -->
5054  <section xml:id="pbds.ack">
5055    <info><title>Acknowledgments</title></info>
5056    <?dbhtml filename="policy_data_structures_ack.html"?>
5057
5058    <para>
5059      Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5060      Laboratories), and Benjamin Kosnik (Red Hat).
5061    </para>
5062
    <para>
      This library was partially written at IBM's Haifa Research Labs.
      It is based heavily on policy-based design and uses many useful
      techniques from <citetitle>Modern C++ Design: Generic Programming
      and Design Patterns Applied</citetitle> by Andrei Alexandrescu.
    </para>
5069
5070    <para>
5071      Two ideas are borrowed from the SGI-STL implementation:
5072    </para>
5073
5074    <orderedlist>
5075      <listitem>
5076	<para>
5077	  The prime-based resize policies use a list of primes taken from
5078	  the SGI-STL implementation.
5079	</para>
5080      </listitem>
5081
5082      <listitem>
5083	<para>
5084	  The red-black trees contain both a root node and a header node
5085	  (containing metadata), connected in a way that forward and
5086	  reverse iteration can be performed efficiently.
5087	</para>
5088      </listitem>
5089    </orderedlist>
5090
5091    <para>
5092      Some test utilities borrow ideas from
5093      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
5094    </para>
5095
5096    <para>
5097      We would like to thank Scott Meyers for useful comments (without
5098      attributing to him any flaws in the design or implementation of the
5099      library).
5100    </para>
5101    <para>We would like to thank Matt Austern for the suggestion to
5102    include tries.</para>
5103  </section>
5104
5105  <!-- S06: Biblio -->
5106<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5107	    href="policy_data_structures_biblio.xml">
5108</xi:include>
5109
5110</chapter>
5111