1<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
2	 xml:id="manual.ext.containers.pbds" xreflabel="pbds">
3  <info>
4    <title>Policy-Based Data Structures</title>
5    <keywordset>
6      <keyword>ISO C++</keyword>
7      <keyword>policy</keyword>
8      <keyword>container</keyword>
9      <keyword>data</keyword>
10      <keyword>structure</keyword>
11      <keyword>associated</keyword>
12      <keyword>tree</keyword>
13      <keyword>trie</keyword>
14      <keyword>hash</keyword>
15      <keyword>metaprogramming</keyword>
16    </keywordset>
17  </info>
18  <?dbhtml filename="policy_data_structures.html"?>
19
20  <!-- 2006-04-01 Ami Tavory -->
21  <!-- 2011-05-25 Benjamin Kosnik -->
22
23  <!-- S01: intro -->
24  <section xml:id="pbds.intro">
25    <info><title>Intro</title></info>
26
27    <para>
28      This is a library of policy-based elementary data structures:
29      associative containers and priority queues. It is designed for
      high performance, flexibility, semantic safety, and conformance to
31      the corresponding containers in <literal>std</literal> and
32      <literal>std::tr1</literal> (except for some points where it differs
33      by design).
34    </para>
37
38    <section xml:id="pbds.intro.issues">
39      <info><title>Performance Issues</title></info>
42
43      <para>
44	An attempt is made to categorize the wide variety of possible
45	container designs in terms of performance-impacting factors. These
46	performance factors are translated into design policies and
47	incorporated into container design.
48      </para>
49
50      <para>
	There is a tension between separating out these factors and
	combining them into a coherent set of policies. Every attempt is
	made to keep the set of factors minimal. In many cases, however,
	multiple factors make for long template names. The source files use
	aliases and typedefs wherever possible, but the generated names of
	external symbols can still be long in binary files and debuggers.
57      </para>
58
59      <para>
	In many cases, the longer names also allow capabilities and
	behaviours controlled by macros to be emitted unambiguously as
	distinct generated names.
63      </para>
64
65      <para>
66	Specific issues found while unraveling performance factors in the
67	design of associative containers and priority queues follow.
68      </para>
69
70      <section xml:id="pbds.intro.issues.associative">
71	<info><title>Associative</title></info>
72
73	<para>
74	  Associative containers depend on their composite policies to a very
75	  large extent. Implicitly hard-wiring policies can hamper their
76	  performance and limit their functionality. An efficient hash-based
77	  container, for example, requires policies for testing key
78	  equivalence, hashing keys, translating hash values into positions
79	  within the hash table, and determining when and how to resize the
80	  table internally. A tree-based container can efficiently support
81	  order statistics, i.e. the ability to query what is the order of
82	  each key within the sequence of keys in the container, but only if
83	  the container is supplied with a policy to internally update
84	  meta-data. There are many other such examples.
85	</para>
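
	<para>
	  As a rough illustration of one of these hash policies, the sketch
	  below shows two ways of translating a hash value into a position
	  within a table. The function names are hypothetical and are not
	  part of this library's interface;
	  <classname>std::hash</classname> assumes a C++11 compiler.
	</para>

	<programlisting>
	  #include &lt;cstddef&gt;
	  #include &lt;functional&gt;
	  #include &lt;string&gt;

	  // Hypothetical range-hashing policy: reduce the hash value modulo
	  // the table size. Works for any non-zero table size.
	  inline std::size_t
	  mod_range_hash(std::size_t hash, std::size_t table_size)
	  { return hash % table_size; }

	  // Hypothetical alternative: keep only the low bits. Valid only
	  // when table_size is a power of two.
	  inline std::size_t
	  mask_range_hash(std::size_t hash, std::size_t table_size)
	  { return hash &amp; (table_size - 1); }

	  // Composes a hash functor with a range-hashing policy.
	  inline std::size_t
	  position_of(const std::string&amp; key, std::size_t table_size)
	  { return mask_range_hash(std::hash&lt;std::string&gt;()(key), table_size); }
	</programlisting>

	<para>
	  Masking is only correct for power-of-two table sizes, so choosing
	  it also constrains how the table may be resized.
	</para>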
86
87	<para>
88	  Ideally, all associative containers would share the same
89	  interface. Unfortunately, underlying data structures and mapping
90	  semantics differentiate between different containers. For example,
91	  suppose one writes a generic function manipulating an associative
92	  container.
93	</para>
94
95	<programlisting>
96	  template&lt;typename Cntnr&gt;
97	  void
98	  some_op_sequence(Cntnr&amp; r_cnt)
99	  {
100	  ...
101	  }
102	</programlisting>
103
104	<para>
	  Given this, what can one assume about the instantiating
	  container? The answer varies according to its underlying data
	  structure. If the underlying data structure of
	  <literal>Cntnr</literal> is based on a tree or trie, then the order
	  of elements is well defined; otherwise, it is not, in general. If
	  the underlying data structure of <literal>Cntnr</literal> is based
	  on a collision-chaining hash table, then modifying
	  <literal>r_cnt</literal> will not invalidate its iterators' order;
113	  if the underlying data structure is a probing hash table, then this
114	  is not the case. If the underlying data structure is based on a tree
115	  or trie, then a reference to the container can efficiently be split;
116	  otherwise, it cannot, in general. If the underlying data structure
117	  is a red-black tree, then splitting a reference to the container is
118	  exception-free; if it is an ordered-vector tree, exceptions can be
119	  thrown.
120	</para>
121
122      </section>
123
124      <section xml:id="pbds.intro.issues.priority_queue">
	<info><title>Priority Queue</title></info>
126
127	<para>
128	  Priority queues are useful when one needs to efficiently access a
129	  minimum (or maximum) value as the set of values changes.
130	</para>
131
132	<para>
	  Most useful data structures for priority queues have a relatively
	  simple structure, as they are geared toward relatively simple
	  requirements. Unfortunately, these structures do not support access
	  to an arbitrary value, which turns out to be necessary in many
	  algorithms; decreasing an arbitrary value in a graph algorithm is
	  one example. Therefore, some extra mechanism must be provided for
	  accessing arbitrary values. There are at least two alternatives:
	  embedding an associative container in a priority queue, or
	  allowing cross-referencing through iterators. The first solution
	  adds significant overhead; the second requires a precise
	  definition of iterator invalidation, which is the next point.
145	</para>
146
147	<para>
148	  Priority queues, like hash-based containers, store values in an
149	  order that is meaningless and undefined externally. For example, a
150	  <code>push</code> operation can internally reorganize the
	  values. Because of this characteristic, describing a priority
	  queue's iterators is difficult: on one hand, the values to which
153	  iterators point can remain valid, but on the other, the logical
154	  order of iterators can change unpredictably.
155	</para>
156
157	<para>
	  Roughly speaking, any element that is both inserted into a priority
	  queue (e.g., through <code>push</code>) and removed
	  from it (e.g., through <code>pop</code>) incurs a
	  logarithmic overhead (in the amortized sense). Different underlying
	  data structures place the actual cost differently: some are
	  optimized for amortized complexity, whereas others guarantee that
	  specific operations only have a constant cost. One underlying data
	  structure might be chosen if modifying a value is frequent
	  (as in Dijkstra's shortest-path algorithm), whereas a different one
	  might be chosen otherwise. Unfortunately, an array-based binary
	  heap, an underlying data structure that optimizes (in the amortized
	  sense) <code>push</code> and <code>pop</code> operations, differs
	  from the others in terms of its invalidation guarantees. Other
	  design decisions also impact the cost and placement of the
	  overhead, at the expense of more differences in the kinds of
	  operations that the underlying data structure can support. These
	  differences pose a challenge when creating a uniform interface for
	  priority queues.
175	</para>
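
	<para>
	  The cross-referencing alternative mentioned above can be sketched
	  as follows. This assumes the <classname>priority_queue</classname>
	  container from <filename>ext/pb_ds/priority_queue.hpp</filename> in
	  a recent GCC release, whose <code>push</code> returns a
	  <classname>point_iterator</classname> that can later be handed to
	  <code>modify</code>; the function name is illustrative only.
	</para>

	<programlisting>
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	  // std::greater makes top() return the smallest value.
	  typedef __gnu_pbds::priority_queue&lt;
	    int,
	    std::greater&lt;int&gt;,
	    __gnu_pbds::pairing_heap_tag&gt; min_queue;

	  inline int
	  decrease_key_demo()
	  {
	    min_queue q;
	    q.push(10);
	    // Keep the point iterator returned by push for later access.
	    min_queue::point_iterator it = q.push(20);
	    q.push(15);

	    // Decrease the arbitrary value 20 to 5 through the saved iterator.
	    q.modify(it, 5);
	    return q.top();
	  }
	</programlisting>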
176      </section>
177    </section>
178
179    <section xml:id="pbds.intro.motivation">
180      <info><title>Goals</title></info>
181
182      <para>
	Many fine associative-container libraries have already been written,
	most notably the C++ standard's associative containers. Why
	then write another library? This section shows some possible
	advantages of this library in light of the challenges described in
	the introduction. Many of these points stem from the fact that
	the ISO C++ process introduced associative containers in a
	two-step process (first standardizing tree-based containers,
	only then adding hash-based containers, which are fundamentally
	different), did not standardize priority queues as containers,
	and (in our opinion) overloads the iterator concept.
193      </para>
194
195      <section xml:id="pbds.intro.motivation.associative">
196	<info><title>Associative</title></info>
199
200	<section xml:id="motivation.associative.policy">
201	  <info><title>Policy Choices</title></info>
202	  <para>
	    Associative containers require a relatively large number of
	    policies to function efficiently in various settings. In some
	    cases this is needed for making their common operations more
	    efficient, and in other cases this allows them to support a
	    larger set of operations.
208	  </para>
209
210	  <orderedlist>
211	    <listitem>
212	      <para>
		Hash-based containers, for example, support look-up and
		insertion methods (<function>find</function> and
		<function>insert</function>). In order to locate elements
		quickly, they are supplied a hash functor, which specifies
		how to transform a key object into some size type; a hash
		functor might transform <constant>"hello"</constant>
		into <constant>1123002298</constant>. A hash table, though,
		requires transforming each key object into a size-type
		value within some specific domain; a hash table with 128
		entries might transform <constant>"hello"</constant> into
		position <constant>63</constant>. The policy by which the
		hash value is transformed into a position within the table
		can dramatically affect performance. Hash-based containers
		also do not resize naturally (as opposed to tree-based
		containers, for example). The appropriate resize policy is
		unfortunately intertwined with the policy that transforms
		a hash value into a position within the table.
230	      </para>
231	    </listitem>
232
233	    <listitem>
234	      <para>
235		Tree-based containers, for example, also support look-up and
236		insertion methods, and are primarily useful when maintaining
237		order between elements is important. In some cases, though,
238		one can utilize their balancing algorithms for completely
239		different purposes.
240	      </para>
241
242	      <para>
		Figure A shows a tree in which each node contains two entries:
		a floating-point key, and some size-type
		<emphasis>metadata</emphasis> (in bold beneath it) that is
		the number of nodes in the sub-tree. (The root has key 0.99,
		and has 5 nodes (including itself) in its sub-tree.) A
		container based on this data structure can obviously answer
		efficiently whether 0.3 is in the container object, but it
		can also efficiently answer what the order of 0.3 is among
		all keys in the container object: see
		<xref linkend="biblio.clrs2001"/>.
253	      </para>
254
255	      <para>
		As another example, Figure B shows a tree in which each node
		contains two entries: a half-open geometric line interval,
		and a numeric <emphasis>metadata</emphasis> (in bold beneath
259		it) that is the largest endpoint of all intervals in its
260		sub-tree.  (The root describes the interval <constant>[20,
261		36)</constant>, and the largest endpoint in its sub-tree is
262		99.) A container based on this data structure can obviously
263		answer efficiently whether <constant>[3, 41)</constant> is
264		in the container object, but it can also answer efficiently
265		whether the container object has intervals that intersect
266		<constant>[3, 41)</constant>. These types of queries are
267		very useful in geometric algorithms and lease-management
268		algorithms.
269	      </para>
270
271	      <para>
272		It is important to note, however, that as the trees are
273		modified, their internal structure changes. To maintain
274		these invariants, one must supply some policy that is aware
275		of these changes.  Without this, it would be better to use a
276		linked list (in itself very efficient for these purposes).
277	      </para>
278
279	    </listitem>
280	  </orderedlist>
281
282	  <figure>
283	    <title>Node Invariants</title>
284	    <mediaobject>
285	      <imageobject>
286		<imagedata align="center" format="PNG" scale="100"
287			   fileref="../images/pbds_node_invariants.png"/>
288	      </imageobject>
289	      <textobject>
290		<phrase>Node Invariants</phrase>
291	      </textobject>
292	    </mediaobject>
293	  </figure>
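
	  <para>
	    The order-statistics case described for Figure A can be sketched
	    with this library's tree container, assuming a recent GCC release
	    in which <classname>null_type</classname> denotes a set (older
	    releases spell this differently). The typedef and function names
	    are illustrative only.
	  </para>

	  <programlisting>
	    #include &lt;cstddef&gt;
	    #include &lt;functional&gt;
	    #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	    #include &lt;ext/pb_ds/tree_policy.hpp&gt;

	    // A red-black tree set whose nodes also maintain sub-tree sizes.
	    typedef __gnu_pbds::tree&lt;
	      double,
	      __gnu_pbds::null_type,
	      std::less&lt;double&gt;,
	      __gnu_pbds::rb_tree_tag,
	      __gnu_pbds::tree_order_statistics_node_update&gt; order_stat_set;

	    // Number of keys in s that are strictly smaller than key.
	    inline std::size_t
	    rank_of(const order_stat_set&amp; s, double key)
	    { return s.order_of_key(key); }
	  </programlisting>

	  <para>
	    The node-update policy is what keeps the sub-tree sizes correct
	    as the tree is rebalanced; without it the container has no
	    <code>order_of_key</code> or <code>find_by_order</code> methods.
	  </para>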
294
295	</section>
296
297	<section xml:id="motivation.associative.underlying">
298	  <info><title>Underlying Data Structures</title></info>
299	  <para>
300	    The standard C++ library contains associative containers based on
301	    red-black trees and collision-chaining hash tables. These are
302	    very useful, but they are not ideal for all types of
303	    settings.
304	  </para>
305
306	  <para>
307	    The figure below shows the different underlying data structures
308	    currently supported in this library.
309	  </para>
310
311	  <figure>
312	    <title>Underlying Associative Data Structures</title>
313	    <mediaobject>
314	      <imageobject>
315		<imagedata align="center" format="PNG" scale="100"
316			   fileref="../images/pbds_different_underlying_dss_1.png"/>
317	      </imageobject>
318	      <textobject>
319		<phrase>Underlying Associative Data Structures</phrase>
320	      </textobject>
321	    </mediaobject>
322	  </figure>
323
324	  <para>
	    In the figure, A shows a collision-chaining hash-table, B shows a
	    probing hash-table, C shows a red-black tree, D shows a splay
	    tree, E shows a tree based on an ordered vector (implicit in the
	    order of the elements), F shows a PATRICIA trie, and G shows a
	    list-based container with update policies.
330	  </para>
331
332	  <para>
333	    Each of these data structures has some performance benefits, in
334	    terms of speed, size or both. For now, note that vector-based trees
335	    and probing hash tables manipulate memory more efficiently than
336	    red-black trees and collision-chaining hash tables, and that
337	    list-based associative containers are very useful for constructing
338	    "multimaps".
339	  </para>
340
341	  <para>
342	    Now consider a function manipulating a generic associative
343	    container,
344	  </para>
345	  <programlisting>
346	    template&lt;class Cntnr&gt;
347	    int
348	    some_op_sequence(Cntnr &amp;r_cnt)
349	    {
350	    ...
351	    }
352	  </programlisting>
353
354	  <para>
355	    Ideally, the underlying data structure
356	    of <classname>Cntnr</classname> would not affect what can be
357	    done with <varname>r_cnt</varname>.  Unfortunately, this is not
358	    the case.
359	  </para>
360
361	  <para>
362	    For example, if <classname>Cntnr</classname>
363	    is <classname>std::map</classname>, then the function can
364	    use
365	  </para>
366	  <programlisting>
367	    std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
368	  </programlisting>
369	  <para>
370	    in order to apply <classname>foobar</classname> to all
371	    elements between <classname>foo</classname> and
372	    <classname>bar</classname>. If
373	    <classname>Cntnr</classname> is a hash-based container,
374	    then this call's results are undefined.
375	  </para>
376
377	  <para>
378	    Also, if <classname>Cntnr</classname> is tree-based, the type
379	    and object of the comparison functor can be
	    accessed. If <classname>Cntnr</classname> is hash-based, these
381	    queries are nonsensical.
382	  </para>
383
384	  <para>
385	    There are various other differences based on the container's
386	    underlying data structure. For one, they can be constructed by,
387	    and queried for, different policies. Furthermore:
388	  </para>
389
390	  <orderedlist>
391	    <listitem>
392	      <para>
393		Containers based on C, D, E and F store elements in a
394		meaningful order; the others store elements in a meaningless
395		(and probably time-varying) order. By implication, only
396		containers based on C, D, E and F can
397		support <function>erase</function> operations taking an
398		iterator and returning an iterator to the following element
399		without performance loss.
400	      </para>
401	    </listitem>
402
403	    <listitem>
404	      <para>
405		Containers based on C, D, E, and F can be split and joined
406		efficiently, while the others cannot. Containers based on C
407		and D, furthermore, can guarantee that this is exception-free;
408		containers based on E cannot guarantee this.
409	      </para>
410	    </listitem>
411
412	    <listitem>
413	      <para>
414		Containers based on all but E can guarantee that
415		erasing an element is exception free; containers based on E
416		cannot guarantee this. Containers based on all but B and E
417		can guarantee that modifying an object of their type does
418		not invalidate iterators or references to their elements,
419		while containers based on B and E cannot. Containers based
420		on C, D, and E can furthermore make a stronger guarantee,
421		namely that modifying an object of their type does not
422		affect the order of iterators.
423	      </para>
424	    </listitem>
425	  </orderedlist>
426
427	  <para>
428	    A unified tag and traits system (as used for the C++ standard
429	    library iterators, for example) can ease generic manipulation of
430	    associative containers based on different underlying data
431	    structures.
432	  </para>
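
	  <para>
	    A minimal sketch of such a traits query follows. It assumes the
	    <classname>container_traits</classname> class template from
	    <filename>ext/pb_ds/tag_and_trait.hpp</filename> in a recent GCC
	    release; the two helper function names are hypothetical.
	  </para>

	  <programlisting>
	    #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	    #include &lt;ext/pb_ds/tag_and_trait.hpp&gt;

	    // True if Cntnr stores its keys in a meaningful order.
	    template&lt;typename Cntnr&gt;
	    bool
	    keeps_keys_in_order()
	    { return __gnu_pbds::container_traits&lt;Cntnr&gt;::order_preserving; }

	    // True if erasing an element from a Cntnr object may throw.
	    template&lt;typename Cntnr&gt;
	    bool
	    erase_may_throw()
	    { return __gnu_pbds::container_traits&lt;Cntnr&gt;::erase_can_throw; }
	  </programlisting>

	  <para>
	    Generic code can branch on such values, or dispatch on the
	    container's category tag, instead of hard-wiring assumptions
	    about one particular underlying data structure.
	  </para>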
433
434	</section>
435
436	<section xml:id="motivation.associative.iterators">
437	  <info><title>Iterators</title></info>
438	  <para>
	    Iterators are central to the design of the standard library
440	    containers, because of the container/algorithm/iterator
441	    decomposition that allows an algorithm to operate on a range
442	    through iterators of some sequence.  Iterators, then, are useful
443	    because they allow going over a
444	    specific <emphasis>sequence</emphasis>.  The standard library
445	    also uses iterators for accessing a
446	    specific <emphasis>element</emphasis>: when an associative
447	    container returns one through <function>find</function>. The
448	    standard library consistently uses the same types of iterators
449	    for both purposes: going over a range, and accessing a specific
450	    found element. Before the introduction of hash-based containers
451	    to the standard library, this made sense (with the exception of
452	    priority queues, which are discussed later).
453	  </para>
454
455	  <para>
	    When the standard associative containers are used together with
	    non-order-preserving associative containers (and also with
	    priority queues), different types of iterators may be needed
	    for self-organizing containers: the iterator concept seems
	    overloaded to mean two different things (in some
	    cases). <!-- <remark> XXX
462	    "ds_gen.html#find_range">Design::Associative
463	    Containers::Data-Structure Genericity::Point-Type and Range-Type
464	    Methods</remark>. -->
465	  </para>
466
467	  <section xml:id="associative.iterators.using">
468	    <info>
469	      <title>Using Point Iterators for Range Operations</title>
470	    </info>
471	    <para>
472	      Suppose <classname>cntnr</classname> is some associative
473	      container, and say <varname>c</varname> is an object of
474	      type <classname>cntnr</classname>. Then what will be the outcome
475	      of
476	    </para>
477
478	    <programlisting>
479	      std::for_each(c.find(1), c.find(5), foo);
480	    </programlisting>
481
482	    <para>
	      If <varname>c</varname> is a tree-based container
484	      object, then an in-order walk will
485	      apply <classname>foo</classname> to the relevant elements,
486	      as in the graphic below, label A. If <varname>c</varname> is
487	      a hash-based container, then the order of elements between any
488	      two elements is undefined (and probably time-varying); there is
489	      no guarantee that the elements traversed will coincide with the
490	      <emphasis>logical</emphasis> elements between 1 and 5, as in
491	      label B.
492	    </para>
493
494	    <figure>
495	      <title>Range Iteration in Different Data Structures</title>
496	      <mediaobject>
497		<imageobject>
498		  <imagedata align="center" format="PNG" scale="100"
499			     fileref="../images/pbds_point_iterators_range_ops_1.png"/>
500		</imageobject>
501		<textobject>
		  <phrase>Range Iteration in Different Data Structures</phrase>
503		</textobject>
504	      </mediaobject>
505	    </figure>
506
507	    <para>
	      In our opinion, this problem is not caused merely by the fact
	      that red-black trees are order preserving while
	      collision-chaining hash tables are (generally) not; it
	      is more fundamental. Most of the standard's containers
512	      order sequences in a well-defined manner that is
513	      determined by their <emphasis>interface</emphasis>:
514	      calling <function>insert</function> on a tree-based
515	      container modifies its sequence in a predictable way, as
516	      does calling <function>push_back</function> on a list or
517	      a vector. Conversely, collision-chaining hash tables,
518	      probing hash tables, priority queues, and list-based
519	      containers (which are very useful for "multimaps") are
520	      self-organizing data structures; the effect of each
521	      operation modifies their sequences in a manner that is
522	      (practically) determined by their
523	      <emphasis>implementation</emphasis>.
524	    </para>
525
526	    <para>
527	      Consequently, applying an algorithm to a sequence obtained from most
528	      containers may or may not make sense, but applying it to a
529	      sub-sequence of a self-organizing container does not.
530	    </para>
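
	    <para>
	      The tree-based case can be made concrete with a small sketch.
	      The helper below is hypothetical and assumes a map-like
	      container with <type>int</type> keys in which both keys are
	      present and the first precedes the second in iteration order.
	      That assumption holds for <classname>std::map</classname>
	      whenever the first key is smaller, but nothing guarantees it
	      for a hash-based container, where the walk could run past the
	      end.
	    </para>

	    <programlisting>
	      #include &lt;map&gt;
	      #include &lt;vector&gt;

	      // Collects the keys visited when walking [c.find(lo), c.find(hi)).
	      template&lt;typename Cntnr&gt;
	      std::vector&lt;int&gt;
	      keys_between(Cntnr&amp; c, int lo, int hi)
	      {
	        std::vector&lt;int&gt; visited;
	        typename Cntnr::iterator last = c.find(hi);
	        for (typename Cntnr::iterator it = c.find(lo); it != last; ++it)
	          visited.push_back(it-&gt;first);
	        return visited;
	      }
	    </programlisting>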
531	  </section>
532
533	  <section xml:id="associative.iterators.cost">
534	    <info>
535	      <title>Cost to Point Iterators to Enable Range Operations</title>
536	    </info>
537	    <para>
538	      Suppose <varname>c</varname> is some collision-chaining
539	      hash-based container object, and one calls
540	    </para>
541	    <programlisting>c.find(3)</programlisting>
542	    <para>
543	      Then what composes the returned iterator?
544	    </para>
545
546	    <para>
	      In the graphic below, label A shows the simplest (and
	      most efficient) implementation of a collision-chaining
	      hash table.  The little box marked
	      <classname>point_iterator</classname> shows an object
	      that contains a pointer to the element's node. Note that
	      this "iterator" has no way to move to the next element
	      (it cannot support
	      <function>operator++</function>). Conversely, the little
	      box marked <classname>iterator</classname> stores both a
	      pointer to the element and some other
	      information (the bucket number of the element). The
	      second iterator, then, is "heavier" than the first one:
	      it requires more time and space. If we were to use a
	      different container to cross-reference into this
	      hash-table using these iterators, it would take much
	      more space. As noted above, nothing much can be done by
	      incrementing these iterators, so why is this extra
	      information needed?
565	    </para>
566
567	    <para>
568	      Alternatively, one might create a collision-chaining hash-table
569	      where the lists might be linked, forming a monolithic total-element
570	      list, as in the graphic below, label B.  Here the iterators are as
571	      light as can be, but the hash-table's operations are more
572	      complicated.
573	    </para>
574
575	    <figure>
576	      <title>Point Iteration in Hash Data Structures</title>
577	      <mediaobject>
578		<imageobject>
579		  <imagedata align="center" format="PNG" scale="100"
580			     fileref="../images/pbds_point_iterators_range_ops_2.png"/>
581		</imageobject>
582		<textobject>
583		  <phrase>Point Iteration in Hash Data Structures</phrase>
584		</textobject>
585	      </mediaobject>
586	    </figure>
587
588	    <para>
589	      It should be noted that containers based on collision-chaining
590	      hash-tables are not the only ones with this type of behavior;
591	      many other self-organizing data structures display it as well.
592	    </para>
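
	    <para>
	      The difference in weight can be sketched with two hypothetical
	      structures for a table of singly linked chains. These are
	      illustrations only, not the types this library defines.
	    </para>

	    <programlisting>
	      #include &lt;cstddef&gt;

	      struct node
	      {
	        int   key;
	        node* next;   // next node in the same bucket's chain
	      };

	      // Enough to reach one element; cannot move to the next bucket.
	      struct point_iterator_sketch
	      {
	        node* p;
	      };

	      // Also remembers the bucket, so that it can be incremented.
	      struct range_iterator_sketch
	      {
	        node*       p;
	        std::size_t bucket;
	      };

	      // Moves to the next element, skipping empty buckets.
	      // Returns false once the end of the table has been reached.
	      inline bool
	      advance(range_iterator_sketch&amp; it, node* const* buckets,
	              std::size_t num_buckets)
	      {
	        it.p = it.p-&gt;next;
	        while (it.p == 0 &amp;&amp; ++it.bucket &lt; num_buckets)
	          it.p = buckets[it.bucket];
	        return it.p != 0;
	      }
	    </programlisting>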
593	  </section>
594
595	  <section xml:id="associative.iterators.invalidation">
596	    <info><title>Invalidation Guarantees</title></info>
597	    <para>Consider the following snippet:</para>
598	    <programlisting>
599	      it = c.find(3);
600	      c.erase(5);
601	    </programlisting>
602
603	    <para>
604	      Following the call to <classname>erase</classname>, what is the
605	      validity of <classname>it</classname>: can it be de-referenced?
	      Can it be incremented?
607	    </para>
608
609	    <para>
610	      The answer depends on the underlying data structure of the
611	      container. The graphic below shows three cases: A1 and A2 show
612	      a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
613	      show a collision-chaining hash table.
614	    </para>
615
616	    <figure>
617	      <title>Effect of erase in different underlying data structures</title>
618	      <mediaobject>
619		<imageobject>
620		  <imagedata align="center" format="PNG" scale="100"
621			     fileref="../images/pbds_invalidation_guarantee_erase.png"/>
622		</imageobject>
623		<textobject>
624		  <phrase>Effect of erase in different underlying data structures</phrase>
625		</textobject>
626	      </mediaobject>
627	    </figure>
628
629	    <orderedlist>
630	      <listitem>
631		<para>
632		  Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
633		  be de-referenced and incremented. The sequence of iterators
634		  changed, but in a way that is well-defined by the interface.
635		</para>
636	      </listitem>
637
638	      <listitem>
639		<para>
		  Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
		  not valid at all: it cannot be de-referenced or
642		  incremented; the order of iterators changed in a way that is
643		  (practically) determined by the implementation and not by
644		  the interface.
645		</para>
646	      </listitem>
647
648	      <listitem>
649		<para>
650		  Erasing 5 from C1 yields C2. Here the situation is more
651		  complicated. On the one hand, there is no problem in
652		  de-referencing <classname>it</classname>. On the other hand,
653		  the order of iterators changed in a way that is
654		  (practically) determined by the implementation and not by
655		  the interface.
656		</para>
657	      </listitem>
658	    </orderedlist>
659
660	    <para>
661	      So in the standard library containers, it is not always possible
662	      to express whether <varname>it</varname> is valid or not. This
663	      is true also for <function>insert</function>. Again, the
664	      iterator concept seems overloaded.
665	    </para>
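
	    <para>
	      For the red-black tree case, the snippet can be completed as
	      follows, using <classname>std::map</classname>, for which
	      erasing an element invalidates only iterators to that element.
	      The function name is illustrative.
	    </para>

	    <programlisting>
	      #include &lt;map&gt;

	      inline bool
	      iterator_survives_erase()
	      {
	        std::map&lt;int, char&gt; c;
	        c[1] = 'a'; c[3] = 'b'; c[5] = 'c'; c[7] = 'd';

	        std::map&lt;int, char&gt;::iterator it = c.find(3);
	        c.erase(5);

	        // The iterator can still be de-referenced ...
	        const bool still_points_to_3 = (it-&gt;first == 3);
	        // ... and incrementing it skips the erased key.
	        ++it;
	        return still_points_to_3 &amp;&amp; it-&gt;first == 7;
	      }
	    </programlisting>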
666	  </section>
667	</section> <!--iterators-->
668
669
670	<section xml:id="motivation.associative.functions">
671	  <info><title>Functional</title></info>
674
675	  <para>
	    The design of the functional overlay to the underlying data
	    structures differs slightly from some of the conventions used in
	    the C++ standard.  A strict public interface should comprise
	    only methods whose operations depend on the class's internal
	    structure; other operations are best designed as external
	    functions. (See <xref linkend="biblio.meyers02both"/>.) By this
	    rubric, the standard associative containers lack some useful
	    methods, and provide other methods which would be better
	    removed.
685	  </para>
686
687	  <section xml:id="motivation.associative.functions.erase">
688	    <info><title><function>erase</function></title></info>
689
690	    <orderedlist>
691	      <listitem>
692		<para>
693		  Order-preserving standard associative containers provide the
694		  method
695		</para>
696		<programlisting>
697		  iterator
698		  erase(iterator it)
699		</programlisting>
700
701		<para>
		  which takes an iterator, erases the corresponding
		  element, and returns an iterator to the following
		  element. Standard hash-based associative
		  containers also provide this method. This seemingly
		  increases genericity between associative containers,
		  since it is possible to use
708		</para>
709		<programlisting>
		  typename C::iterator it = c.begin();
		  typename C::iterator e_it = c.end();

		  while (it != e_it)
		    it = pred(*it) ? c.erase(it) : ++it;
715		</programlisting>
716
717		<para>
		  in order to erase from a container object
		  <varname>c</varname> all elements which match a
		  predicate <classname>pred</classname>. However, in a
		  different sense this actually decreases genericity: an
		  integral implication of this method is that tree-based
		  associative containers' memory use is linear in the total
		  number of elements they store, while hash-based
		  containers' memory use is unbounded in the total number of
		  elements they store. Assume a hash-based container is
		  allowed to decrease its size when an element is
		  erased. Then the elements might be rehashed, which means
		  that there is no "next" element: it is simply
		  undefined. Consequently, it is possible to infer from the
		  fact that the standard library's hash-based containers
		  provide this method that they cannot downsize when
		  elements are erased. As a consequence, different code is
		  needed to manipulate different containers, assuming that
		  memory should be conserved. Therefore, this library's
		  non-order-preserving associative containers omit this
		  method.
738		</para>
739	      </listitem>
740
741	      <listitem>
742		<para>
743		  All associative containers include a conditional-erase method
744		</para>
745		<programlisting>
		  template&lt;class Pred&gt;
		  size_type
		  erase_if(Pred pred)
751		</programlisting>
752		<para>
753		  which erases all elements matching a predicate. This is probably the
754		  only way to ensure linear-time multiple-item erase which can
755		  actually downsize a container.
756		</para>
757	      </listitem>
758
759	      <listitem>
760		<para>
761		  The standard associative containers provide methods for
762		  multiple-item erase of the form
763		</para>
764		<programlisting>
765		  size_type
766		  erase(It b, It e)
767		</programlisting>
768		<para>
		  erasing a range of elements given by a pair of
		  iterators. For tree-based or trie-based containers, this can
		  be implemented more efficiently as a (small) sequence of
		  split and join operations. For other, unordered, containers,
		  this method isn't much better than an external loop.
		  Moreover, if <varname>c</varname> is a hash-based container,
		  then
776		</para>
777		<programlisting>
778		  c.erase(c.find(2), c.find(5))
779		</programlisting>
780		<para>
781		  is almost certain to do something
782		  different than erasing all elements whose keys are between 2
783		  and 5, and is likely to produce other undefined behavior.
784		</para>
785	      </listitem>
786	    </orderedlist>
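
	    <para>
	      The conditional-erase method described above might be used as
	      in the following sketch. It assumes the
	      <classname>cc_hash_table</classname> container and
	      <classname>null_type</classname> from a recent GCC release;
	      the predicate and helper names are hypothetical.
	    </para>

	    <programlisting>
	      #include &lt;cstddef&gt;
	      #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	      // A collision-chaining hash set of ints.
	      typedef __gnu_pbds::cc_hash_table&lt;
	        int,
	        __gnu_pbds::null_type&gt; int_hash_set;

	      // Predicate selecting the keys to erase.
	      struct is_even
	      {
	        bool
	        operator()(int key) const
	        { return key % 2 == 0; }
	      };

	      // Erases all even keys; returns the number of erased elements.
	      inline std::size_t
	      erase_even_keys(int_hash_set&amp; s)
	      { return s.erase_if(is_even()); }
	    </programlisting>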
787	  </section> <!-- erase -->
788
789	  <section xml:id="motivation.associative.functions.split">
790	    <info>
791	      <title>
792		<function>split</function> and <function>join</function>
793	      </title>
794	    </info>
795	    <para>
796	      It is well-known that tree-based and trie-based container
797	      objects can be efficiently split or joined (See
798	      <xref linkend="biblio.clrs2001"/>). Externally splitting or
799	      joining trees is super-linear, and, furthermore, can throw
	      exceptions. Split and join methods consequently seem good
	      choices for tree-based container methods, especially since, as
	      noted above, they are efficient replacements for erasing
	      sub-sequences.
804	    </para>
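	    <para>
	      The following sketch shows how a sub-sequence erase might be
	      expressed with two splits and one join. The helper
	      <function>erase_key_range</function> is a hypothetical name, and
	      the sketch assumes that <function>split</function> moves all
	      elements with keys larger than the given key into the other
	      container, and that <function>join</function> requires the key
	      ranges of the two containers not to overlap.
	    </para>
	    <programlisting>
	      #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	      typedef __gnu_pbds::tree&lt;int, char&gt; map_type;

	      // Hypothetical helper: erases every element whose key k
	      // satisfies lo &lt; k &lt;= hi.
	      void
	      erase_key_range(map_type&amp; c, int lo, int hi)
	      {
		map_type middle;
		map_type upper;

		c.split(lo, middle);     // middle receives keys larger than lo
		middle.split(hi, upper); // upper receives keys larger than hi
		c.join(upper);           // c regains the keys larger than hi

		// middle, holding the keys in (lo, hi], is destroyed on return.
	      }
	    </programlisting>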
805
806	  </section> <!-- split -->
807
808	  <section xml:id="motivation.associative.functions.insert">
809	    <info>
810	      <title>
811		<function>insert</function>
812	      </title>
813	    </info>
814	    <para>
815	      The standard associative containers provide methods of the form
816	    </para>
817	    <programlisting>
818	      template&lt;class It&gt;
	      void
820	      insert(It b, It e);
821	    </programlisting>
822
823	    <para>
	      for inserting a range of elements given by a pair of
	      iterators. This can be implemented as an external loop or,
	      more efficiently for tree-based or trie-based containers, as a
	      join operation. Moreover, these methods seem
828	      similar to constructors taking a range given by a pair of
829	      iterators; the constructors, however, are transactional, whereas
830	      the insert methods are not; this is possibly confusing.
831	    </para>
832
833	  </section> <!-- insert -->
834
835	  <section xml:id="motivation.associative.functions.compare">
836	    <info>
837	      <title>
838		<function>operator==</function> and <function>operator&lt;=</function>
839	      </title>
840	    </info>
841
842	    <para>
	      Associative containers are parametrized by policies that allow
	      testing key equivalence: a hash-based container can do this through
845	      its equivalence functor, and a tree-based container can do this
846	      through its comparison functor. In addition, some standard
847	      associative containers have global function operators, like
848	      <function>operator==</function> and <function>operator&lt;=</function>,
849	      that allow comparing entire associative containers.
850	    </para>
851
852	    <para>
853	      In our opinion, these functions are better left out. To begin
854	      with, they do not significantly improve over an external
855	      loop. More importantly, however, they are possibly misleading -
856	      <function>operator==</function>, for example, usually checks for
857	      equivalence, or interchangeability, but the associative
858	      container cannot check for values' equivalence, only keys'
	      equivalence; also, are two containers considered equivalent if
	      they store the same values in different order? This is an
	      arbitrary decision.
862	    </para>
863	  </section> <!-- compare -->
864
865	</section>  <!-- functional -->
866
867      </section> <!--associative-->
868
869      <section xml:id="pbds.intro.motivation.priority_queue">
870	<info><title>Priority Queues</title></info>
871
872	<section xml:id="motivation.priority_queue.policy">
873	  <info><title>Policy Choices</title></info>
874
875	  <para>
876	    Priority queues are containers that allow efficiently inserting
877	    values and accessing the maximal value (in the sense of the
878	    container's comparison functor). Their interface
879	    supports <function>push</function>
	    and <function>pop</function>. The standard
	    container <classname>std::priority_queue</classname> indeed supports
	    these methods, but little else. For algorithmic and
883	    software-engineering purposes, other methods are needed:
884	  </para>
885
886	  <orderedlist>
887	    <listitem>
888	      <para>
889		Many graph algorithms (see
890		<xref linkend="biblio.clrs2001"/>) require increasing a
891		value in a priority queue (again, in the sense of the
892		container's comparison functor), or joining two
893		priority-queue objects.
894	      </para>
895	    </listitem>
896
897	    <listitem>
898	      <para>The return type of <classname>priority_queue</classname>'s
899	      <function>push</function> method is a point-type iterator, which can
900	      be used for modifying or erasing arbitrary values. For
901	      example:</para>
902	      <programlisting>
903		priority_queue&lt;int&gt; p;
904		priority_queue&lt;int&gt;::point_iterator it = p.push(3);
905		p.modify(it, 4);
906	      </programlisting>
907
908	      <para>These types of cross-referencing operations are necessary
909	      for making priority queues useful for different applications,
910	      especially graph applications.</para>
911
912	    </listitem>
913	    <listitem>
914	      <para>
915		It is sometimes necessary to erase an arbitrary value in a
916		priority queue. For example, consider
917		the <function>select</function> function for monitoring
918		file descriptors:
919	      </para>
920
921	      <programlisting>
922		int
923		select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
924		struct timeval *timeout);
925	      </programlisting>
926	      <para>
		As the <function>select</function> documentation states:
928	      </para>
929	      <para>
930		<quote>
931		  The nfds argument specifies the range of file
932		  descriptors to be tested. The select() function tests file
933		descriptors in the range of 0 to nfds-1.</quote>
934	      </para>
935
936	      <para>
937		It stands to reason, therefore, that we might wish to
938		maintain a minimal value for <varname>nfds</varname>, and
939		priority queues immediately come to mind. Note, though, that
		when a socket is closed, this minimal value might
941		change; in the absence of an efficient means to erase an
942		arbitrary value from a priority queue, we might as well
943		avoid its use altogether.
944	      </para>
945
946	      <para>
947		The standard containers typically support iterators. It is
948		somewhat unusual
949		for <classname>std::priority_queue</classname> to omit them
950		(See <xref linkend="biblio.meyers01stl"/>). One might
		ask why priority queues need to support iterators, since
952		they are self-organizing containers with a different purpose
953		than abstracting sequences. There are several reasons:
954	      </para>
955	      <orderedlist>
956		<listitem>
957		  <para>
958		    Iterators (even in self-organizing containers) are
959		    useful for many purposes: cross-referencing
960		    containers, serialization, and debugging code that uses
961		    these containers.
962		  </para>
963		</listitem>
964
965		<listitem>
966		  <para>
967		    The standard library's hash-based containers support
968		    iterators, even though they too are self-organizing
969		    containers with a different purpose than abstracting
970		    sequences.
971		  </para>
972		</listitem>
973
974		<listitem>
975		  <para>
976		    In standard-library-like containers, it is natural to specify the
977		    interface of operations for modifying a value or erasing
		    a value (discussed previously) in terms of iterators.
979		    It should be noted that the standard
980		    containers also use iterators for accessing and
981		    manipulating a specific value. In hash-based
982		    containers, one checks the existence of a key by
983		    comparing the iterator returned by <function>find</function> to the
984		    iterator returned by <function>end</function>, and not by comparing a
985		    pointer returned by <function>find</function> to <type>NULL</type>.
986		  </para>
987		</listitem>
988	      </orderedlist>
989	    </listitem>
990	  </orderedlist>
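	  <para>
	    The file-descriptor scenario above might be sketched as follows.
	    The class <classname>fd_tracker</classname> and its members are
	    hypothetical names introduced for illustration; the sketch
	    assumes that each descriptor is opened at most once before it is
	    closed, and that the point-type iterators returned by
	    <function>push</function> stay valid until their values are
	    erased, as is the case for the default underlying data structure.
	  </para>
	  <programlisting>
	    #include &lt;map&gt;
	    #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	    // Hypothetical tracker of open file descriptors.
	    class fd_tracker
	    {
	      typedef __gnu_pbds::priority_queue&lt;int&gt; queue_type;
	      typedef std::map&lt;int, queue_type::point_iterator&gt; handle_map;

	    public:
	      // Records fd and remembers the iterator returned by push.
	      void
	      open(int fd)
	      { m_handles[fd] = m_queue.push(fd); }

	      // Erases an arbitrary value through its stored iterator.
	      void
	      close(int fd)
	      {
		handle_map::iterator it = m_handles.find(fd);
		if (it == m_handles.end())
		  return;
		m_queue.erase(it-&gt;second);
		m_handles.erase(it);
	      }

	      // Smallest nfds that covers every open descriptor.
	      int
	      nfds() const
	      { return m_queue.empty() ? 0 : m_queue.top() + 1; }

	    private:
	      queue_type m_queue;
	      handle_map m_handles;
	    };
	  </programlisting>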
991
992	</section>
993
994	<section xml:id="motivation.priority_queue.underlying">
995	  <info><title>Underlying Data Structures</title></info>
996
997	  <para>
998	    There are three main implementations of priority queues: the
999	    first employs a binary heap, typically one which uses a
1000	    sequence; the second uses a tree (or forest of trees), which is
1001	    typically less structured than an associative container's tree;
1002	    the third simply uses an associative container. These are
1003	    shown in the figure below with labels A1 and A2, B, and C.
1004	  </para>
1005
1006	  <figure>
1007	    <title>Underlying Priority Queue Data Structures</title>
1008	    <mediaobject>
1009	      <imageobject>
1010		<imagedata align="center" format="PNG" scale="100"
1011			   fileref="../images/pbds_different_underlying_dss_2.png"/>
1012	      </imageobject>
1013	      <textobject>
1014		<phrase>Underlying Priority Queue Data Structures</phrase>
1015	      </textobject>
1016	    </mediaobject>
1017	  </figure>
1018
1019	  <para>
1020	    No single implementation can completely replace any of the
1021	    others. Some have better <function>push</function>
1022	    and <function>pop</function> amortized performance, some have
1023	    better bounded (worst case) response time than others, some
1024	    optimize a single method at the expense of others, etc. In
1025	    general the "best" implementation is dictated by the specific
1026	    problem.
1027	  </para>
1028
1029	  <para>
1030	    As with associative containers, the more implementations
1031	    co-exist, the more necessary a traits mechanism is for handling
1032	    generic containers safely and efficiently. This is especially
1033	    important for priority queues, since the invalidation guarantees
	    of one of the most useful data structures - binary heaps - are
	    markedly different from those of most of the others.
1036	  </para>
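	  <para>
	    As a sketch of how such a traits mechanism might be queried, the
	    following checks at compile time whether a priority queue
	    guarantees that point-type iterators stay valid across
	    modifications. The helper
	    <function>keeps_point_iterators_valid</function> and the two
	    typedefs are hypothetical names; the sketch assumes a C++11
	    compiler (for <classname>std::is_base_of</classname>) and that
	    the invalidation-guarantee tags form an inheritance chain.
	  </para>
	  <programlisting>
	    #include &lt;functional&gt;
	    #include &lt;type_traits&gt;
	    #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	    typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	      __gnu_pbds::binary_heap_tag&gt; binary_heap_queue;

	    typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
	      __gnu_pbds::pairing_heap_tag&gt; pairing_heap_queue;

	    // Hypothetical helper: true if the container's guarantee is at
	    // least the point invalidation guarantee.
	    template&lt;typename Container&gt;
	    bool
	    keeps_point_iterators_valid()
	    {
	      typedef typename
		__gnu_pbds::container_traits&lt;Container&gt;::invalidation_guarantee
		guarantee;

	      return std::is_base_of&lt;__gnu_pbds::point_invalidation_guarantee,
				     guarantee&gt;::value;
	    }
	  </programlisting>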
1037
1038	</section>
1039
1040	<section xml:id="motivation.priority_queue.binary_heap">
1041	  <info><title>Binary Heaps</title></info>
1042
1043
1044	  <para>
1045	    Binary heaps are one of the most useful underlying
1046	    data structures for priority queues. They are very efficient in
1047	    terms of memory (since they don't require per-value structure
1048	    metadata), and have the best amortized <function>push</function> and
1049	    <function>pop</function> performance for primitive types like
1050	    <type>int</type>.
1051	  </para>
1052
1053	  <para>
1054	    The standard library's <classname>priority_queue</classname>
1055	    implements this data structure as an adapter over a sequence,
1056	    typically
1057	    <classname>std::vector</classname>
1058	    or <classname>std::deque</classname>, which correspond to labels
1059	    A1 and A2 respectively in the graphic above.
1060	  </para>
1061
1062	  <para>
1063	    This is indeed an elegant example of the adapter concept and
1064	    the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1065	    several reasons why a binary-heap priority queue
1066	    may be better implemented as a container instead of a
1067	    sequence adapter:
1068	  </para>
1069
1070	  <orderedlist>
1071	    <listitem>
1072	      <para>
1073		<classname>std::priority_queue</classname> cannot erase values
1074		from its adapted sequence (irrespective of the sequence
1075		type). This means that the memory use of
1076		an <classname>std::priority_queue</classname> object is always
1077		proportional to the maximal number of values it ever contained,
1078		and not to the number of values that it currently
1079		contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1080		This implementation of binary heaps acts very differently than
1081		other underlying data structures (See also pairing heaps).
1082	      </para>
1083	    </listitem>
1084
1085	    <listitem>
1086	      <para>
1087		Some combinations of adapted sequences and value types
1088		are very inefficient or just don't make sense. If one uses
		<classname>std::priority_queue&lt;std::string,
		std::vector&lt;std::string&gt; &gt;</classname>, for example, then not only will each
		operation perform a logarithmic number of
		<classname>std::string</classname> assignments, but, furthermore, any
		operation (including <function>pop</function>) can render the container
		useless due to exceptions. Conversely, if one uses
		<classname>std::priority_queue&lt;int, std::deque&lt;int&gt;
		&gt;</classname>, then each operation incurs a logarithmic
		number of indirect accesses (through pointers) unnecessarily.
		It might be better to let the container make a conservative
		deduction whether to use the structure labeled A1 or A2 in the graphic above.
1100	      </para>
1101	    </listitem>
1102
1103	    <listitem>
1104	      <para>
1105		There does not seem to be a systematic way to determine
1106		what exactly can be done with the priority queue.
1107	      </para>
1108	      <orderedlist>
1109		<listitem>
1110		  <para>
		    If <varname>p</varname> is a priority queue adapting an
		    <classname>std::vector</classname>, then it is possible to iterate over
		    all values by using <function>&amp;p.top()</function> and
		    <function>&amp;p.top() + p.size()</function>, but this will not work
		    if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
		    case, one cannot use <function>p.begin()</function> and
		    <function>p.end()</function>. If a different sequence is adapted, it
1118		    is even more difficult to determine what can be
1119		    done.
1120		  </para>
1121		</listitem>
1122
1123		<listitem>
1124		  <para>
1125		    If <varname>p</varname> is a priority queue adapting an
		    <classname>std::deque</classname>, then the reference returned by
1127		  </para>
1128		  <programlisting>
1129		    p.top()
1130		  </programlisting>
1131		  <para>
1132		    will remain valid until it is popped,
1133		    but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1134		    next <function>push</function> will invalidate it. If a different
1135		    sequence is adapted, it is even more difficult to
1136		    determine what can be done.
1137		  </para>
1138		</listitem>
1139	      </orderedlist>
1140	    </listitem>
1141
1142	    <listitem>
1143	      <para>
1144		Sequence-based binary heaps can still implement
1145		linear-time <function>erase</function> and <function>modify</function> operations.
1146		This means that if one needs to erase a small
1147		(say logarithmic) number of values, then one might still
1148		choose this underlying data structure. Using
1149		<classname>std::priority_queue</classname>, however, this will generally
1150		change the order of growth of the entire sequence of
1151		operations.
1152	      </para>
1153	    </listitem>
1154	  </orderedlist>
1155
1156	</section>
1157      </section>
1158    </section> <!-- goals/motivation -->
1159  </section> <!-- intro -->
1160
1161  <!-- S02: Using -->
1162  <section xml:id="containers.pbds.using">
1163    <info><title>Using</title></info>
1164    <?dbhtml filename="policy_data_structures_using.html"?>
1165
1166    <section xml:id="pbds.using.prereq">
1167      <info><title>Prerequisites</title></info>
1168
      <para>The library contains only header files, and does not require any
      other libraries except the standard C++ library. All classes are
      defined in namespace <code>__gnu_pbds</code>. The library internally
      uses macros beginning with <code>PB_DS</code>, but
      <code>#undef</code>s anything it <code>#define</code>s (except for
      header guards). Compiling the library in an environment where macros
      beginning with <code>PB_DS</code> are defined may yield unpredictable
      results in compilation, execution, or both.</para>
1177
1178      <para>
1179	Further dependencies are necessary to create the visual output
1180	for the performance tests. To create these graphs, an
1181	additional package is needed: <command>pychart</command>.
1182      </para>
1183    </section>
1184
1185    <section xml:id="pbds.using.organization">
1186      <info><title>Organization</title></info>
1187
1188      <para>
1189	The various data structures are organized as follows.
1190      </para>
1191
1192      <itemizedlist>
1193	<listitem>
1194	  <para>
1195	    Branch-Based
1196	  </para>
1197
1198	  <itemizedlist>
1199	    <listitem>
1200	      <para>
1201		<classname>basic_branch</classname>
		is an abstract base class for branch-based
1203		associative-containers
1204	      </para>
1205	    </listitem>
1206
1207	    <listitem>
1208	      <para>
1209		<classname>tree</classname>
1210		is a concrete base class for tree-based
1211		associative-containers
1212	      </para>
1213	    </listitem>
1214
1215	    <listitem>
1216	      <para>
1217		<classname>trie</classname>
		is a concrete base class for trie-based
1219		associative-containers
1220	      </para>
1221	    </listitem>
1222	  </itemizedlist>
1223	</listitem>
1224
1225	<listitem>
1226	  <para>
1227	    Hash-Based
1228	  </para>
1229	  <itemizedlist>
1230	    <listitem>
1231	      <para>
1232		<classname>basic_hash_table</classname>
1233		is an abstract base class for hash-based
1234		associative-containers
1235	      </para>
1236	    </listitem>
1237
1238	    <listitem>
1239	      <para>
1240		<classname>cc_hash_table</classname>
		is a concrete collision-chaining hash-based
		associative container
1243	      </para>
1244	    </listitem>
1245
1246	    <listitem>
1247	      <para>
1248		<classname>gp_hash_table</classname>
		is a concrete (general) probing hash-based
		associative container
1251	      </para>
1252	    </listitem>
1253	  </itemizedlist>
1254	</listitem>
1255
1256	<listitem>
1257	  <para>
1258	    List-Based
1259	  </para>
1260	  <itemizedlist>
1261	    <listitem>
1262	      <para>
1263		<classname>list_update</classname>
		is a list-based update-policy associative container
1265	      </para>
1266	    </listitem>
1267	  </itemizedlist>
1268	</listitem>
1269	<listitem>
1270	  <para>
1271	    Heap-Based
1272	  </para>
1273	  <itemizedlist>
1274	    <listitem>
1275	      <para>
1276		<classname>priority_queue</classname>
		is a priority queue
1278	      </para>
1279	    </listitem>
1280	  </itemizedlist>
1281	</listitem>
1282      </itemizedlist>
1283
1284      <para>
1285	The hierarchy is composed naturally so that commonality is
1286	captured by base classes. Thus <function>operator[]</function>
1287	is defined at the base of any hierarchy, since all derived
1288	containers support it. Conversely <function>split</function> is
1289	defined in <classname>basic_branch</classname>, since only
1290	tree-like containers support it.
1291      </para>
1292
1293      <para>
1294	In addition, there are the following diagnostics classes,
1295	used to report errors specific to this library's data
1296	structures.
1297      </para>
1298
1299      <figure>
1300	<title>Exception Hierarchy</title>
1301	<mediaobject>
1302	  <imageobject>
1303	    <imagedata align="center" format="PDF" scale="75"
1304		       fileref="../images/pbds_exception_hierarchy.pdf"/>
1305	  </imageobject>
1306	  <imageobject>
1307	    <imagedata align="center" format="PNG" scale="100"
1308		       fileref="../images/pbds_exception_hierarchy.png"/>
1309	  </imageobject>
1310	  <textobject>
1311	    <phrase>Exception Hierarchy</phrase>
1312	  </textobject>
1313	</mediaobject>
1314      </figure>
1315
1316    </section>
1317
1318    <section xml:id="pbds.using.tutorial">
1319      <info><title>Tutorial</title></info>
1320
1321      <section xml:id="pbds.using.tutorial.basic">
1322	<info><title>Basic Use</title></info>
1323
1324	<para>
	  For the most part, the policy-based containers in
1326	  namespace <literal>__gnu_pbds</literal> have the same interface as
1327	  the equivalent containers in the standard C++ library, except for
1328	  the names used for the container classes themselves. For example,
1329	  this shows basic operations on a collision-chaining hash-based
1330	  container:
1331	</para>
1332	<programlisting>
	  #include &lt;cassert&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  int main()
	  {
	    // Collision-chaining hash table mapping int keys to char values.
	    __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
	    c[2] = 'b';
	    assert(c.find(1) == c.end());
	  }
1341	</programlisting>
1342
1343	<para>
1344	  The container is called
1345	  <classname>__gnu_pbds::cc_hash_table</classname> instead of
1346	  <classname>std::unordered_map</classname>, since <quote>unordered
1347	  map</quote> does not necessarily mean a hash-based map as implied by
1348	  the C++ library (C++11 or TR1). For example, list-based associative
1349	  containers, which are very useful for the construction of
1350	  "multimaps," are also unordered.
1351	</para>
1352
1353	<para>This snippet shows a red-black tree based container:</para>
1354
1355	<programlisting>
	  #include &lt;cassert&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  int main()
	  {
	    // Tree-based map; the default underlying structure is a red-black tree.
	    __gnu_pbds::tree&lt;int, char&gt; c;
	    c[2] = 'b';
	    assert(c.find(2) != c.end());
	  }
1364	</programlisting>
1365
1366	<para>The container is called <classname>tree</classname> instead of
1367	<classname>map</classname> since the underlying data structures are
1368	being named with specificity.
1369	</para>
1370
1371	<para>
	  Member function names strive to match those of
	  the equivalent member functions in other C++ standard library
	  containers. The familiar methods are unchanged:
1375	  <function>begin</function>, <function>end</function>,
1376	  <function>size</function>, <function>empty</function>, and
1377	  <function>clear</function>.
1378	</para>
1379
1380	<para>
1381	  This isn't to say that things are exactly as one would expect, given
	  the container requirements and interfaces in the C++ standard.
1383	</para>
1384
1385	<para>
1386	  The names of containers' policies and policy accessors are
	  different from the usual. For example, if <type>hash_type</type> is
1388	some type of hash-based container, then</para>
1389
1390	<programlisting>
1391	  hash_type::hash_fn
1392	</programlisting>
1393
1394	<para>
1395	  gives the type of its hash functor, and if <varname>obj</varname> is
1396	  some hash-based container object, then
1397	</para>
1398
1399	<programlisting>
1400	  obj.get_hash_fn()
1401	</programlisting>
1402
1403	<para>will return a reference to its hash-functor object.</para>
1404
1405
1406	<para>
1407	  Similarly, if <type>tree_type</type> is some type of tree-based
1408	  container, then
1409	</para>
1410
1411	<programlisting>
1412	  tree_type::cmp_fn
1413	</programlisting>
1414
1415	<para>
1416	  gives the type of its comparison functor, and if
1417	  <varname>obj</varname> is some tree-based container object,
1418	  then
1419	</para>
1420
1421	<programlisting>
1422	  obj.get_cmp_fn()
1423	</programlisting>
1424
1425	<para>will return a reference to its comparison-functor object.</para>
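	<para>
	  As a brief sketch of these accessors, the following retrieves a
	  tree-based container's comparison functor and applies it to two
	  keys. The typedef <type>tree_type</type> and the helper
	  <function>ordered_before</function> are hypothetical names used
	  only for illustration.
	</para>

	<programlisting>
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  // A tree-based map ordered by descending key.
	  typedef __gnu_pbds::tree&lt;int, char, std::greater&lt;int&gt; &gt; tree_type;

	  // Hypothetical helper: asks the container's own comparison functor
	  // whether key a is ordered before key b.
	  bool
	  ordered_before(const tree_type&amp; obj, int a, int b)
	  {
	    const tree_type::cmp_fn&amp; cmp = obj.get_cmp_fn();
	    return cmp(a, b);
	  }
	</programlisting>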
1426
1427	<para>
1428	  It would be nice to give names consistent with those in the existing
1429	  C++ standard (inclusive of TR1). Unfortunately, these standard
1430	  containers don't consistently name types and methods. For example,
1431	  <classname>std::tr1::unordered_map</classname> uses
1432	  <type>hasher</type> for the hash functor, but
1433	  <classname>std::map</classname> uses <type>key_compare</type> for
	  the comparison functor. Also,
	  <classname>std::tr1::unordered_map</classname> uses
	  <function>hash_function</function> for accessing the hash functor, but
	  <classname>std::map</classname> uses <function>key_comp</function>
	  for accessing the comparison functor.
1438	</para>
1439
1440	<para>
1441	  Instead, <literal>__gnu_pbds</literal> attempts to be internally
1442	  consistent, and uses standard-derived terminology if possible.
1443	</para>
1444
1445	<para>
1446	  Another source of difference is in scope:
1447	  <literal>__gnu_pbds</literal> contains more types of associative
1448	  containers than the standard C++ library, and more opportunities
1449	  to configure these new containers, since different types of
1450	  associative containers are useful in different settings.
1451	</para>
1452
1453	<para>
1454	  Namespace <literal>__gnu_pbds</literal> contains different classes for
1455	  hash-based containers, tree-based containers, trie-based containers,
1456	  and list-based containers.
1457	</para>
1458
1459	<para>
1460	  Since associative containers share parts of their interface, they
1461	  are organized as a class hierarchy.
1462	</para>
1463
1464	<para>Each type or method is defined in the most-common ancestor
1465	in which it makes sense.
1466	</para>
1467
1468	<para>For example, all associative containers support iteration
1469	expressed in the following form:
1470	</para>
1471
1472	<programlisting>
1473	  const_iterator
1474	  begin() const;
1475
1476	  iterator
1477	  begin();
1478
1479	  const_iterator
1480	  end() const;
1481
1482	  iterator
1483	  end();
1484	</programlisting>
1485
1486	<para>
1487	  But not all containers contain or use hash functors. Yet, both
1488	  collision-chaining and (general) probing hash-based associative
1489	  containers have a hash functor, so
1490	  <classname>basic_hash_table</classname> contains the interface:
1491	</para>
1492
1493	<programlisting>
1494	  const hash_fn&amp;
1495	  get_hash_fn() const;
1496
1497	  hash_fn&amp;
1498	  get_hash_fn();
1499	</programlisting>
1500
1501	<para>
1502	  so all hash-based associative containers inherit the same
1503	  hash-functor accessor methods.
1504	</para>
1505
1506      </section> <!--basic use -->
1507
1508      <section xml:id="pbds.using.tutorial.configuring">
1509	<info>
1510	  <title>
1511	    Configuring via Template Parameters
1512	  </title>
1513	</info>
1514
1515	<para>
1516	  In general, each of this library's containers is
1517	  parametrized by more policies than those of the standard library. For
1518	  example, the standard hash-based container is parametrized as
1519	  follows:
1520	</para>
1521	<programlisting>
1522	  template&lt;typename Key, typename Mapped, typename Hash,
	  typename Pred, typename Allocator, bool Cache_Hash_Code&gt;
1524	  class unordered_map;
1525	</programlisting>
1526
1527	<para>
1528	  and so can be configured by key type, mapped type, a functor
1529	  that translates keys to unsigned integral types, an equivalence
1530	  predicate, an allocator, and an indicator whether to store hash
	  values with each entry. This library's collision-chaining
1532	  hash-based container is parametrized as
1533	</para>
1534	<programlisting>
1535	  template&lt;typename Key, typename Mapped, typename Hash_Fn,
1536	  typename Eq_Fn, typename Comb_Hash_Fn,
	  typename Resize_Policy, bool Store_Hash,
1538	  typename Allocator&gt;
1539	  class cc_hash_table;
1540	</programlisting>
1541
1542	<para>
1543	  and so can be configured by the first four types of
1544	  <classname>std::tr1::unordered_map</classname>, then a
1545	  policy for translating the key-hash result into a position
1546	  within the table, then a policy by which the table resizes,
1547	  an indicator whether to store hash values with each entry,
1548	  and an allocator (which is typically the last template
1549	  parameter in standard containers).
1550	</para>
1551
1552	<para>
1553	  Nearly all policy parameters have default values, so this
1554	  need not be considered for casual use. It is important to
1555	  note, however, that hash-based containers' policies can
1556	  dramatically alter their performance in different settings,
1557	  and that tree-based containers' policies can make them
1558	  useful for other purposes than just look-up.
1559	</para>
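	<para>
	  As one illustration of a policy that makes a tree useful beyond
	  look-up, the following sketch configures a tree-based set with the
	  order-statistics node-update policy. The typedef
	  <type>ordered_set</type> and the two helpers are hypothetical
	  names; the sketch assumes a GCC release in which the mapped type of
	  a set is spelled <classname>null_type</classname>, and that
	  <varname>k</varname> is smaller than the container's size.
	</para>
	<programlisting>
	  #include &lt;cstddef&gt;
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;ext/pb_ds/tree_policy.hpp&gt;

	  typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type, std::less&lt;int&gt;,
	    __gnu_pbds::rb_tree_tag,
	    __gnu_pbds::tree_order_statistics_node_update&gt; ordered_set;

	  // Hypothetical helper: number of stored keys strictly smaller than key.
	  std::size_t
	  count_smaller(const ordered_set&amp; s, int key)
	  { return s.order_of_key(key); }

	  // Hypothetical helper: the k-th smallest key, counting from zero.
	  int
	  kth_smallest(ordered_set&amp; s, std::size_t k)
	  { return *s.find_by_order(k); }
	</programlisting>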
1560
1561
1562	<para>As opposed to associative containers, priority queues have
1563	relatively few configuration options. The priority queue is
1564	parametrized as follows:</para>
1565	<programlisting>
	  template&lt;typename Value_Type, typename Cmp_Fn, typename Tag,
1567	  typename Allocator&gt;
1568	  class priority_queue;
1569	</programlisting>
1570
1571	<para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1572	<classname>Allocator</classname> parameters are the container's value type,
1573	comparison-functor type, and allocator type, respectively;
1574	these are very similar to the standard's priority queue. The
1575	<classname>Tag</classname> parameter is different: there are a number of
1576	pre-defined tag types corresponding to binary heaps, binomial
1577	heaps, etc., and <classname>Tag</classname> should be instantiated
1578	by one of them.</para>
1579
	<para>Note that, unlike
	<classname>std::priority_queue</classname>,
	<classname>__gnu_pbds::priority_queue</classname> is not a
	sequence adapter; it is a regular container.</para>
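	<para>
	  The role of the <classname>Tag</classname> parameter might be
	  sketched as follows: the same code is instantiated over different
	  underlying data structures simply by changing the tag. The helper
	  <function>largest_of_three</function> is a hypothetical name used
	  only for illustration.
	</para>
	<programlisting>
	  #include &lt;functional&gt;
	  #include &lt;ext/pb_ds/priority_queue.hpp&gt;

	  // Hypothetical helper: pushes three values into a priority queue
	  // whose underlying data structure is selected by Tag, and returns
	  // the maximal one.
	  template&lt;typename Tag&gt;
	  int
	  largest_of_three(int a, int b, int c)
	  {
	    __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;, Tag&gt; q;
	    q.push(a);
	    q.push(b);
	    q.push(c);
	    return q.top();
	  }
	</programlisting>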
1584
1585      </section>
1586
1587      <section xml:id="pbds.using.tutorial.traits">
1588	<info>
1589	  <title>
1590	    Querying Container Attributes
1591	  </title>
1592	</info>
1593	<para></para>
1594
	<para>A container's underlying data structure
	affects its performance; unfortunately, it can also affect
	its interface. When manipulating associative containers
	generically, it is often useful to be able to statically
	determine what they can support and what they cannot.
1600	</para>
1601
1602	<para>Happily, the standard provides a good solution to a similar
1603	problem - that of the different behavior of iterators. If
1604	<classname>It</classname> is an iterator, then
1605	</para>
1606	<programlisting>
1607	  typename std::iterator_traits&lt;It&gt;::iterator_category
1608	</programlisting>
1609
1610	<para>is one of a small number of pre-defined tag classes, and
1611	</para>
1612	<programlisting>
1613	  typename std::iterator_traits&lt;It&gt;::value_type
1614	</programlisting>
1615
1616	<para>is the value type to which the iterator "points".</para>
1617
1618	<para>
1619	  Similarly, in this library, if <type>C</type> is a
1620	  container, then <classname>container_traits</classname> is a
1621	  trait class that stores information about the kind of
1622	  container that is implemented.
1623	</para>
1624	<programlisting>
1625	  typename container_traits&lt;C&gt;::container_category
1626	</programlisting>
1627	<para>
1628	  is one of a small number of predefined tag structures that
1629	  uniquely identifies the type of underlying data structure.
1630	</para>
1631
1632	<para>In most cases, however, the exact underlying data
1633	structure is not really important, but what is important is
1634	one of its other attributes: whether it guarantees storing
1635	elements by key order, for example. For this one can
1636	use</para>
1637	<programlisting>
	  container_traits&lt;C&gt;::order_preserving
1639	</programlisting>
1640	<para>
1641	  Also,
1642	</para>
1643	<programlisting>
1644	  typename container_traits&lt;C&gt;::invalidation_guarantee
1645	</programlisting>
1646
1647	<para>is the container's invalidation guarantee. Invalidation
1648	guarantees are especially important regarding priority queues,
1649	since in this library's design, iterators are practically the
1650	only way to manipulate them.</para>
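	<para>
	  A minimal sketch of such a query follows. The helper
	  <function>stores_by_key_order</function> and the two typedefs are
	  hypothetical names; the sketch assumes that
	  <literal>order_preserving</literal> is exposed as a compile-time
	  constant convertible to <type>bool</type>.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  typedef __gnu_pbds::tree&lt;int, char&gt; tree_map;
	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_map;

	  // Hypothetical helper: true if Container guarantees storing
	  // elements by key order.
	  template&lt;typename Container&gt;
	  bool
	  stores_by_key_order()
	  { return __gnu_pbds::container_traits&lt;Container&gt;::order_preserving; }
	</programlisting>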
1651      </section>
1652
1653      <section xml:id="pbds.using.tutorial.point_range_iteration">
1654	<info>
1655	  <title>
1656	    Point and Range Iteration
1657	  </title>
1658	</info>
1659	<para></para>
1660
1661	<para>This library differentiates between two types of methods
1662	and iterators: point-type, and range-type. For example,
1663	<function>find</function> and <function>insert</function> are point-type methods, since
1664	they each deal with a specific element; their returned
1665	iterators are point-type iterators. <function>begin</function> and
1666	<function>end</function> are range-type methods, since they are not used to
1667	find a specific element, but rather to go over all elements in
1668	a container object; their returned iterators are range-type
1669	iterators.
1670	</para>
1671
1672	<para>Most containers store elements in an order that is
1673	determined by their interface. Correspondingly, it is fine that
1674	their point-type iterators are synonymous with their range-type
1675	iterators. For example, in the following snippet
1676	</para>
1677	<programlisting>
1678	  std::for_each(c.find(1), c.find(5), foo);
1679	</programlisting>
1680	<para>
1681	  two point-type iterators (returned by <function>find</function>) are used
1682	  for a range-type purpose - going over all elements whose key is
1683	  between 1 and 5.
1684	</para>
1685
1686	<para>
1687	  Conversely, the above snippet makes no sense for
1688	  self-organizing containers - ones that order (and reorder)
1689	  their elements by implementation. It would be nice to have a
1690	  uniform iterator system that would allow the above snippet to
1691	  compile only if it made sense.
1692	</para>
1693
1694	<para>
1695	  This could trivially be done by specializing
1696	  <function>std::for_each</function> for the case of iterators returned by
1697	  <classname>std::tr1::unordered_map</classname>, but this would only solve the
1698	  problem for one algorithm and one container. Fundamentally, the
1699	  problem is that one can loop using a self-organizing
1700	  container's point-type iterators.
1701	</para>
1702
1703	<para>
1704	  This library's containers define two families of
1705	  iterators: <type>point_const_iterator</type> and
1706	  <type>point_iterator</type> are the iterator types returned by
1707	  point-type methods; <type>const_iterator</type> and
1708	  <type>iterator</type> are the iterator types returned by range-type
1709	  methods.
1710	</para>
1711	<programlisting>
1712	  class &lt;- some container -&gt;
1713	  {
1714	  public:
1715	  ...
1716
1717	  typedef &lt;- something -&gt; const_iterator;
1718
1719	  typedef &lt;- something -&gt; iterator;
1720
1721	  typedef &lt;- something -&gt; point_const_iterator;
1722
1723	  typedef &lt;- something -&gt; point_iterator;
1724
1725	  ...
1726
1727	  public:
1728	  ...
1729
1730	  const_iterator begin () const;
1731
1732	  iterator begin();
1733
1734	  point_const_iterator find(...) const;
1735
1736	  point_iterator find(...);
1737	  };
1738	</programlisting>
1739
	<para>For
	containers whose interface defines sequence order, it
	is very simple: point-type and range-type iterators are exactly
	the same, which means that the <function>std::for_each</function>
	snippet above will compile if it
	is used for an order-preserving associative container.
	</para>
1746
	<para>
	  For self-organizing containers, however (hash-based
	  containers being a notable example), the same snippet will
	  not compile, because their point-type iterators do not support
	  <function>operator++</function>.
	</para>
1753
1754	<para>In any case, both for order-preserving and self-organizing
1755	containers, the following snippet will compile:
1756	</para>
1757	<programlisting>
1758	  typename Cntnr::point_iterator it = c.find(2);
1759	</programlisting>
1760
	<para>
	  because <function>find</function> returns a point-type
	  iterator in either case. In addition, a range-type iterator
	  can always be converted to a point-type iterator.
	</para>
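	<para>
	  A minimal sketch of the difference, assuming GCC's
	  <filename>ext/pb_ds</filename> headers; the typedefs and the
	  functions <function>sum_range</function> and
	  <function>lookup</function> are illustrative names, not library
	  API. The tree version walks a range with the iterators returned
	  by <function>find</function>; the hash version can only
	  dereference them.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;

	  typedef __gnu_pbds::tree&lt;int, int&gt; tree_map;          // order-preserving
	  typedef __gnu_pbds::cc_hash_table&lt;int, int&gt; hash_map; // self-organizing

	  // Sums the mapped values of all keys in [first_key, last_key).
	  // Both keys are assumed to be present in the tree.
	  int sum_range(tree_map&amp; c, int first_key, int last_key)
	  {
	    int sum = 0;
	    // Point-type iterators double as range-type iterators here.
	    for (tree_map::point_iterator it = c.find(first_key);
	         it != c.find(last_key); ++it)
	      sum += it-&gt;second;
	    return sum;
	  }

	  // Looks up a single key; returns fallback if the key is absent.
	  int lookup(hash_map&amp; c, int key, int fallback)
	  {
	    hash_map::point_iterator it = c.find(key);
	    // Writing ++it here would be rejected by the compiler.
	    return it == c.end() ? fallback : it-&gt;second;
	  }
	</programlisting>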
1765
	<para>Distinguishing between iterator types also
1767	raises the point that a container's iterators might have
1768	different invalidation rules concerning their de-referencing
1769	abilities and movement abilities. This now corresponds exactly
1770	to the question of whether point-type and range-type iterators
1771	are valid. As explained above, <classname>container_traits</classname> allows
1772	querying a container for its data structure attributes. The
1773	iterator-invalidation guarantees are certainly a property of
1774	the underlying data structure, and so
1775	</para>
1776	<programlisting>
1777	  container_traits&lt;C&gt;::invalidation_guarantee
1778	</programlisting>
1779
1780	<para>
1781	  gives one of three pre-determined types that answer this
1782	  query.
1783	</para>
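	<para>
	  One way to act on this query is tag dispatch. The sketch below
	  assumes GCC's <filename>ext/pb_ds</filename> headers; the
	  functions <function>describe</function> and
	  <function>invalidation_of</function> are hypothetical helpers
	  written for this example.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;ext/pb_ds/tag_and_trait.hpp&gt;
	  #include &lt;string&gt;

	  // One overload per predefined guarantee type (helper names are
	  // assumptions, the tag types come from the library).
	  inline std::string describe(__gnu_pbds::basic_invalidation_guarantee)
	  { return "basic"; }

	  inline std::string describe(__gnu_pbds::point_invalidation_guarantee)
	  { return "point"; }

	  inline std::string describe(__gnu_pbds::range_invalidation_guarantee)
	  { return "range"; }

	  // Selects the overload that matches the container's guarantee.
	  template&lt;typename C&gt;
	  std::string invalidation_of()
	  {
	    typedef typename __gnu_pbds::container_traits&lt;C&gt;::invalidation_guarantee tag;
	    return describe(tag());
	  }
	</programlisting>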
1784
1785      </section>
1786    </section> <!-- tutorial -->
1787
1788    <section xml:id="pbds.using.examples">
1789      <info><title>Examples</title></info>
1790      <para>
1791	Additional code examples are provided in the source
1792	distribution, as part of the regression and performance
1793	testsuite.
1794      </para>
1795
1796      <section xml:id="pbds.using.examples.basic">
	<info><title>Basic Use</title></info>
1798
1799	<itemizedlist>
1800	  <listitem>
1801	    <para>
1802	      Basic use of maps:
1803	      <filename>basic_map.cc</filename>
1804	    </para>
1805	  </listitem>
1806
1807	  <listitem>
1808	    <para>
1809	      Basic use of sets:
1810	      <filename>basic_set.cc</filename>
1811	    </para>
1812	  </listitem>
1813
1814	  <listitem>
1815	    <para>
1816	      Conditionally erasing values from an associative container object:
1817	      <filename>erase_if.cc</filename>
1818	    </para>
1819	  </listitem>
1820
1821	  <listitem>
1822	    <para>
1823	      Basic use of multimaps:
1824	      <filename>basic_multimap.cc</filename>
1825	    </para>
1826	  </listitem>
1827
1828	  <listitem>
1829	    <para>
1830	      Basic use of multisets:
1831	      <filename>basic_multiset.cc</filename>
1832	    </para>
1833	  </listitem>
1834
1835	  <listitem>
1836	    <para>
1837	      Basic use of priority queues:
1838	      <filename>basic_priority_queue.cc</filename>
1839	    </para>
1840	  </listitem>
1841
1842	  <listitem>
1843	    <para>
1844	      Splitting and joining priority queues:
1845	      <filename>priority_queue_split_join.cc</filename>
1846	    </para>
1847	  </listitem>
1848
1849	  <listitem>
1850	    <para>
1851	      Conditionally erasing values from a priority queue:
1852	      <filename>priority_queue_erase_if.cc</filename>
1853	    </para>
1854	  </listitem>
1855	</itemizedlist>
1856
1857      </section>
1858
1859      <section xml:id="pbds.using.examples.query">
1860	<info><title>Querying with <classname>container_traits</classname> </title></info>
1861	<itemizedlist>
1862	  <listitem>
1863	    <para>
	      Using <classname>container_traits</classname> to query
	      about the underlying data structure behavior of an
	      associative container:
1866	      <filename>assoc_container_traits.cc</filename>
1867	    </para>
1868	  </listitem>
1869
1870	  <listitem>
1871	    <para>
1872	      A non-compiling example showing wrong use of finding keys in
1873	      hash-based containers: <filename>hash_find_neg.cc</filename>
1874	    </para>
1875	  </listitem>
1876	  <listitem>
1877	    <para>
	      Using <classname>container_traits</classname>
	      to query about the underlying data structure behavior of a
	      priority queue:
1880	      <filename>priority_queue_container_traits.cc</filename>
1881	    </para>
1882	  </listitem>
1883
1884	</itemizedlist>
1885
1886      </section>
1887
1888      <section xml:id="pbds.using.examples.container">
1889	<info><title>By Container Method</title></info>
1890	<para></para>
1891
1892	<section xml:id="pbds.using.examples.container.hash">
1893	  <info><title>Hash-Based</title></info>
1894
1895	  <section xml:id="pbds.using.examples.container.hash.resize">
1896	    <info><title>size Related</title></info>
1897
1898	    <itemizedlist>
1899	      <listitem>
1900		<para>
1901		  Setting the initial size of a hash-based container
1902		  object:
1903		  <filename>hash_initial_size.cc</filename>
1904		</para>
1905	      </listitem>
1906
1907	      <listitem>
1908		<para>
1909		  A non-compiling example showing how not to resize a
1910		  hash-based container object:
1911		  <filename>hash_resize_neg.cc</filename>
1912		</para>
1913	      </listitem>
1914
1915	      <listitem>
1916		<para>
		  Resizing a hash-based container object:
1918		  <filename>hash_resize.cc</filename>
1919		</para>
1920	      </listitem>
1921
1922	      <listitem>
1923		<para>
1924		  Showing an illegal resize of a hash-based container
1925		  object:
1926		  <filename>hash_illegal_resize.cc</filename>
1927		</para>
1928	      </listitem>
1929
1930	      <listitem>
1931		<para>
1932		  Changing the load factors of a hash-based container
1933		  object: <filename>hash_load_set_change.cc</filename>
1934		</para>
1935	      </listitem>
1936	    </itemizedlist>
1937	  </section>
1938
1939	  <section xml:id="pbds.using.examples.container.hash.hashor">
1940	    <info><title>Hashing Function Related</title></info>
1941	    <para></para>
1942
1943	    <itemizedlist>
1944	      <listitem>
1945		<para>
1946		  Using a modulo range-hashing function for the case of an
1947		  unknown skewed key distribution:
1948		  <filename>hash_mod.cc</filename>
1949		</para>
1950	      </listitem>
1951
1952	      <listitem>
1953		<para>
1954		  Writing a range-hashing functor for the case of a known
1955		  skewed key distribution:
1956		  <filename>shift_mask.cc</filename>
1957		</para>
1958	      </listitem>
1959
1960	      <listitem>
1961		<para>
1962		  Storing the hash value along with each key:
1963		  <filename>store_hash.cc</filename>
1964		</para>
1965	      </listitem>
1966
1967	      <listitem>
1968		<para>
1969		  Writing a ranged-hash functor:
1970		  <filename>ranged_hash.cc</filename>
1971		</para>
1972	      </listitem>
1973	    </itemizedlist>
1974
1975	  </section>
1976
1977	</section>
1978
1979	<section xml:id="pbds.using.examples.container.branch">
1980	  <info><title>Branch-Based</title></info>
1981
1982
1983	  <section xml:id="pbds.using.examples.container.branch.split">
1984	    <info><title>split or join Related</title></info>
1985
1986	    <itemizedlist>
1987	      <listitem>
1988		<para>
1989		  Joining two tree-based container objects:
1990		  <filename>tree_join.cc</filename>
1991		</para>
1992	      </listitem>
1993
1994	      <listitem>
1995		<para>
1996		  Splitting a PATRICIA trie container object:
1997		  <filename>trie_split.cc</filename>
1998		</para>
1999	      </listitem>
2000
2001	      <listitem>
2002		<para>
2003		  Order statistics while joining two tree-based container
2004		  objects:
2005		  <filename>tree_order_statistics_join.cc</filename>
2006		</para>
2007	      </listitem>
2008	    </itemizedlist>
2009
2010	  </section>
2011
2012	  <section xml:id="pbds.using.examples.container.branch.invariants">
2013	    <info><title>Node Invariants</title></info>
2014
2015	    <itemizedlist>
2016	      <listitem>
2017		<para>
2018		  Using trees for order statistics:
2019		  <filename>tree_order_statistics.cc</filename>
2020		</para>
2021	      </listitem>
2022
2023	      <listitem>
2024		<para>
2025		  Augmenting trees to support operations on line
2026		  intervals:
2027		  <filename>tree_intervals.cc</filename>
2028		</para>
2029	      </listitem>
2030	    </itemizedlist>
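	    <para>
	      As a brief illustration of order statistics (a sketch
	      assuming GCC's <filename>ext/pb_ds</filename> headers; the
	      typedef <type>ordered_set</type> and the two wrapper
	      functions are names chosen for this example), a tree can
	      be instantiated with the
	      <classname>tree_order_statistics_node_update</classname>
	      policy:
	    </para>
	    <programlisting>
	      #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	      #include &lt;ext/pb_ds/tree_policy.hpp&gt;
	      #include &lt;cstddef&gt;
	      #include &lt;functional&gt;

	      // A "set" of ints whose nodes also maintain subtree sizes.
	      typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type, std::less&lt;int&gt;,
	                               __gnu_pbds::rb_tree_tag,
	                               __gnu_pbds::tree_order_statistics_node_update&gt;
	        ordered_set;

	      // Returns the k-th smallest key (0-based); k &lt; s.size() is assumed.
	      int kth_smallest(ordered_set&amp; s, std::size_t k)
	      { return *s.find_by_order(k); }

	      // Returns the number of keys strictly smaller than key.
	      std::size_t rank_of(ordered_set&amp; s, int key)
	      { return s.order_of_key(key); }
	    </programlisting>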
2031
2032	  </section>
2033
2034	  <section xml:id="pbds.using.examples.container.branch.trie">
2035	    <info><title>trie</title></info>
2036	    <itemizedlist>
2037	      <listitem>
2038		<para>
2039		  Using a PATRICIA trie for DNA strings:
2040		  <filename>trie_dna.cc</filename>
2041		</para>
2042	      </listitem>
2043
2044	      <listitem>
2045		<para>
2046		  Using a PATRICIA
2047		  trie for finding all entries whose key matches a given prefix:
2048		  <filename>trie_prefix_search.cc</filename>
2049		</para>
2050	      </listitem>
2051	    </itemizedlist>
2052
2053	  </section>
2054
2055	</section>
2056
2057	<section xml:id="pbds.using.examples.container.priority_queue">
2058	  <info><title>Priority Queues</title></info>
2059	  <itemizedlist>
2060	    <listitem>
2061	      <para>
2062		Cross referencing an associative container and a priority
2063		queue: <filename>priority_queue_xref.cc</filename>
2064	      </para>
2065	    </listitem>
2066
2067	    <listitem>
2068	      <para>
2069		Cross referencing a vector and a priority queue using a
2070		very simple version of Dijkstra's shortest path
2071		algorithm:
2072		<filename>priority_queue_dijkstra.cc</filename>
2073	      </para>
2074	    </listitem>
2075	  </itemizedlist>
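	  <para>
	    Cross referencing relies on the point-type iterators that
	    <function>push</function> returns. The following is a
	    minimal sketch, assuming GCC's
	    <filename>ext/pb_ds</filename> headers; the typedef
	    <type>pq_t</type> and the function
	    <function>push_all</function> are illustrative and not part
	    of the library.
	  </para>
	  <programlisting>
	    #include &lt;ext/pb_ds/priority_queue.hpp&gt;
	    #include &lt;cstddef&gt;
	    #include &lt;vector&gt;

	    // Default policies: pairing heap, largest value on top.
	    typedef __gnu_pbds::priority_queue&lt;int&gt; pq_t;

	    // Pushes every value and records its point-type iterator, so
	    // that entry i of the result refers to values[i] in the queue.
	    std::vector&lt;pq_t::point_iterator&gt;
	    push_all(pq_t&amp; q, const std::vector&lt;int&gt;&amp; values)
	    {
	      std::vector&lt;pq_t::point_iterator&gt; its;
	      for (std::size_t i = 0; i &lt; values.size(); ++i)
	        its.push_back(q.push(values[i]));
	      return its;
	    }
	  </programlisting>
	  <para>
	    The stored iterators can later be passed to
	    <function>modify</function> or <function>erase</function>,
	    which is how an algorithm such as Dijkstra's can update the
	    priority of an entry it has already queued.
	  </para>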
2076
2077	</section>
2078
2079
2080      </section>
2081
2082    </section>
2083
2084  </section> <!-- using -->
2085
2086  <!-- S03: Design -->
2087
2088
2089<section xml:id="containers.pbds.design">
2090  <info><title>Design</title></info>
2091  <?dbhtml filename="policy_data_structures_design.html"?>
2092  <para></para>
2093
2094  <section xml:id="pbds.design.concepts">
2095    <info><title>Concepts</title></info>
2096
2097    <section xml:id="pbds.design.concepts.null_type">
2098      <info><title>Null Policy Classes</title></info>
2099
2100      <para>
2101	Associative containers are typically parametrized by various
2102	policies. For example, a hash-based associative container is
	parametrized by a hash-functor, transforming each key into a
	non-negative numerical value. Each such value is then further mapped
2105	into a position within the table. The mapping of a key into a
2106	position within the table is therefore a two-step process.
2107      </para>
2108
2109      <para>
2110	In some cases, instantiations are redundant. For example, when the
2111	keys are integers, it is possible to use a redundant hash policy,
2112	which transforms each key into its value.
2113      </para>
2114
2115      <para>
2116	In some other cases, these policies are irrelevant.  For example, a
2117	hash-based associative container might transform keys into positions
2118	within a table by a different method than the two-step method
2119	described above. In such a case, the hash functor is simply
2120	irrelevant.
2121      </para>
2122
2123      <para>
2124	When a policy is either redundant or irrelevant, it can be replaced
2125	by <classname>null_type</classname>.
2126      </para>
2127
2128      <para>
2129	For example, a <emphasis>set</emphasis> is an associative
2130	container with one of its template parameters (the one for the
	mapped type) replaced with <classname>null_type</classname>. Other
	places where this technique makes simplifications possible
	include node updates in tree and trie data structures, and hash
	and probe functions for hash data structures.
2135      </para>
2136    </section>
2137
2138    <section xml:id="pbds.design.concepts.associative_semantics">
2139      <info><title>Map and Set Semantics</title></info>
2140
2141      <section xml:id="concepts.associative_semantics.set_vs_map">
2142	<info>
2143	  <title>
2144	    Distinguishing Between Maps and Sets
2145	  </title>
2146	</info>
2147
2148	<para>
2149	  Anyone familiar with the standard knows that there are four kinds
2150	  of associative containers: maps, sets, multimaps, and
2151	  multisets. The map datatype associates each key to
2152	  some data.
2153	</para>
2154
2155	<para>
2156	  Sets are associative containers that simply store keys -
2157	  they do not map them to anything. In the standard, each map class
2158	  has a corresponding set class. E.g.,
2159	  <classname>std::map&lt;int, char&gt;</classname> maps each
2160	  <classname>int</classname> to a <classname>char</classname>, but
	  <classname>std::set&lt;int&gt;</classname> simply stores
2162	  <classname>int</classname>s. In this library, however, there are no
2163	  distinct classes for maps and sets. Instead, an associative
2164	  container's <classname>Mapped</classname> template parameter is a policy: if
2165	  it is instantiated by <classname>null_type</classname>, then it
2166	  is a "set"; otherwise, it is a "map". E.g.,
2167	</para>
2168	<programlisting>
2169	  cc_hash_table&lt;int, char&gt;
2170	</programlisting>
2171	<para>
2172	  is a "map" mapping each <type>int</type> value to a <type>
2173	  char</type>, but
2174	</para>
2175	<programlisting>
2176	  cc_hash_table&lt;int, null_type&gt;
2177	</programlisting>
2178	<para>
2179	  is a type that uniquely stores <type>int</type> values.
2180	</para>
	<para>Once the <classname>Mapped</classname> template parameter is instantiated
	by <classname>null_type</classname>, the "set" acts very similarly
	to the standard's sets: it does not
	map each key to a distinct <classname>null_type</classname> object. Also,
	the container's <type>value_type</type> is essentially
	its <type>key_type</type>, just as with the standard's sets.</para>
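	<para>
	  The "map" and "set" instantiations can be compared in a short
	  sketch. It assumes GCC's <filename>ext/pb_ds</filename>
	  headers; the typedef names and the function
	  <function>insert_twice</function> are made up for this
	  example.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;cstddef&gt;

	  // Same class template, two instantiations of the Mapped policy.
	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; int_to_char_map;
	  typedef __gnu_pbds::cc_hash_table&lt;int, __gnu_pbds::null_type&gt; int_set;

	  // Inserts key twice and reports how many elements the "set"
	  // holds afterwards.
	  std::size_t insert_twice(int_set&amp; s, int key)
	  {
	    s.insert(key);
	    s.insert(key); // no effect: keys are stored uniquely
	    return s.size();
	  }
	</programlisting>
	<para>
	  For the "set", dereferencing the iterator returned by
	  <function>find</function> yields the key itself; for the
	  "map", it yields a key and mapped-value pair.
	</para>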
2188
2189	<para>
2190	  The standard's multimaps and multisets allow, respectively,
2191	  non-uniquely mapping keys and non-uniquely storing keys. As
2192	  discussed, the
2193	  reasons why this might be necessary are 1) that a key might be
2194	  decomposed into a primary key and a secondary key, 2) that a
2195	  key might appear more than once, or 3) any arbitrary
2196	  combination of 1)s and 2)s. Correspondingly,
2197	  one should use 1) "maps" mapping primary keys to secondary
2198	  keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
2199	  combination of 1)s and 2)s. Thus, for example, an
2200	  <classname>std::multiset&lt;int&gt;</classname> might be used to store
2201	  multiple instances of integers, but using this library's
2202	  containers, one might use
2203	</para>
2204	<programlisting>
2205	  tree&lt;int, size_t&gt;
2206	</programlisting>
2207
2208	<para>
2209	  i.e., a <classname>map</classname> of <type>int</type>s to
2210	  <type>size_t</type>s.
2211	</para>
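	<para>
	  A minimal sketch of this counting approach, assuming GCC's
	  <filename>ext/pb_ds</filename> headers; the typedef
	  <type>counting_multiset</type> and the functions
	  <function>add</function> and <function>occurrences</function>
	  are hypothetical names used only here.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;cstddef&gt;

	  // Each distinct key is stored once, together with its count.
	  typedef __gnu_pbds::tree&lt;int, std::size_t&gt; counting_multiset;

	  // Records one more logical occurrence of key.
	  void add(counting_multiset&amp; c, int key)
	  { ++c[key]; } // a missing key starts from a count of 0

	  // Returns how many times key logically occurs.
	  std::size_t occurrences(counting_multiset&amp; c, int key)
	  {
	    counting_multiset::point_iterator it = c.find(key);
	    return it == c.end() ? 0 : it-&gt;second;
	  }
	</programlisting>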
2212	<para>
2213	  These "multimaps" and "multisets" might be confusing to
2214	  anyone familiar with the standard's <classname>std::multimap</classname> and
2215	  <classname>std::multiset</classname>, because there is no clear
2216	  correspondence between the two. For example, in some cases
2217	  where one uses <classname>std::multiset</classname> in the standard, one might use
2218	  in this library a "multimap" of "multisets" - i.e., a
2219	  container that maps primary keys each to an associative
2220	  container that maps each secondary key to the number of times
2221	  it occurs.
2222	</para>
2223
2224	<para>
2225	  When one uses a "multimap," one should choose with care the
2226	  type of container used for secondary keys.
2227	</para>
2228      </section> <!-- map vs set -->
2229
2230
2231      <section xml:id="concepts.associative_semantics.multi">
2232	<info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2233
2234	<para>
	  Brace oneself: this library does not contain containers like
2236	  <classname>std::multimap</classname> or
2237	  <classname>std::multiset</classname>. Instead, these data
2238	  structures can be synthesized via manipulation of the
2239	  <classname>Mapped</classname> template parameter.
2240	</para>
2241	<para>
2242	  One maps the unique part of a key - the primary key, into an
2243	  associative-container of the (originally) non-unique parts of
2244	  the key - the secondary key. A primary associative-container
2245	  is an associative container of primary keys; a secondary
2246	  associative-container is an associative container of
2247	  secondary keys.
2248	</para>
2249
2250	<para>
	  Let us step back a bit and start from the beginning.
2252	</para>
2253
2254
2255	<para>
2256	  Maps (or sets) allow mapping (or storing) unique-key values.
2257	  The standard library also supplies associative containers which
2258	  map (or store) multiple values with equivalent keys:
2259	  <classname>std::multimap</classname>, <classname>std::multiset</classname>,
	  <classname>std::tr1::unordered_multimap</classname>, and
	  <classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
2262	  be used, then why we think it is best to avoid them.
2263	</para>
2264
2265	<para>
2266	  Suppose one builds a simple bank-account application that
2267	  records for each client (identified by an <classname>std::string</classname>)
2268	  and account-id (marked by an <type>unsigned long</type>) -
2269	  the balance in the account (described by a
2270	  <type>float</type>). Suppose further that ordering this
2271	  information is not useful, so a hash-based container is
2272	  preferable to a tree based container. Then one can use
2273	</para>
2274
2275	<programlisting>
2276	  std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2277	</programlisting>
2278
2279	<para>
2280	  which hashes every combination of client and account-id. This
2281	  might work well, except for the fact that it is now impossible
2282	  to efficiently list all of the accounts of a specific client
2283	  (this would practically require iterating over all
2284	  entries). Instead, one can use
2285	</para>
2286
2287	<programlisting>
2288	  std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2289	</programlisting>
2290
2291	<para>
2292	  which hashes every client, and decides equivalence based on
2293	  client only. This will ensure that all accounts belonging to a
2294	  specific user are stored consecutively.
2295	</para>
2296
	<para>
	  Also, suppose one wants a priority queue of integers
	  (a container that supports <function>push</function>,
	  <function>pop</function>, and <function>top</function> operations, the last of which
	  returns the largest <type>int</type>) that also supports
	  operations such as <function>find</function> and <function>lower_bound</function>. A
	  reasonable solution is to build an adapter over
	  <classname>std::set&lt;int&gt;</classname>. In this adapter,
	  <function>push</function> will just call the tree-based
	  associative container's <function>insert</function> method; <function>pop</function>
	  will call its <function>end</function> method, and use it to reach the
	  preceding element (which must be the largest). This might
	  work well, except that the container object cannot hold
	  multiple instances of the same integer (<function>push(4)</function>
	  will be a no-op if <constant>4</constant> is already in the
	  container object). If multiple keys are necessary, then one
	  might build the adapter over an
	  <classname>std::multiset&lt;int&gt;</classname>.
	</para>
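	<para>
	  Such an adapter might look like the sketch below. It uses only
	  the standard library; the class name
	  <classname>searchable_int_queue</classname> and its
	  <function>contains</function> member are illustrative
	  choices, and a non-empty queue is assumed for
	  <function>top</function> and <function>pop</function>.
	</para>
	<programlisting>
	  #include &lt;cstddef&gt;
	  #include &lt;set&gt;

	  // Hypothetical adapter: a priority queue of ints that can
	  // also be searched.
	  class searchable_int_queue
	  {
	  public:
	    void push(int v) { m_set.insert(v); }

	    // Largest element: the one preceding end().
	    int top() const
	    {
	      std::multiset&lt;int&gt;::const_iterator it = m_set.end();
	      --it;
	      return *it;
	    }

	    // Removes a single instance of the largest element.
	    void pop()
	    {
	      std::multiset&lt;int&gt;::iterator it = m_set.end();
	      --it;
	      m_set.erase(it);
	    }

	    bool contains(int v) const { return m_set.find(v) != m_set.end(); }

	    std::size_t size() const { return m_set.size(); }

	  private:
	    std::multiset&lt;int&gt; m_set; // equivalent keys are allowed
	  };
	</programlisting>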
2316
2317	<para>
2318	  The standard library's non-unique-mapping containers are useful
	  when (1) a key can be decomposed into a primary key and a
2320	  secondary key, (2) a key is needed multiple times, or (3) any
2321	  combination of (1) and (2).
2322	</para>
2323
2324	<para>
2325	  The graphic below shows how the standard library's container
2326	  design works internally; in this figure nodes shaded equally
2327	  represent equivalent-key values. Equivalent keys are stored
2328	  consecutively using the properties of the underlying data
2329	  structure: binary search trees (label A) store equivalent-key
2330	  values consecutively (in the sense of an in-order walk)
2331	  naturally; collision-chaining hash tables (label B) store
	  equivalent-key values in the same bucket, and the bucket can be
	  arranged so that equivalent-key values are consecutive.
2334	</para>
2335
2336	<figure>
2337	  <title>Non-unique Mapping Standard Containers</title>
2338	  <mediaobject>
2339	    <imageobject>
2340	      <imagedata align="center" format="PNG" scale="100"
2341			 fileref="../images/pbds_embedded_lists_1.png"/>
2342	    </imageobject>
2343	    <textobject>
2344	      <phrase>Non-unique Mapping Standard Containers</phrase>
2345	    </textobject>
2346	  </mediaobject>
2347	</figure>
2348
2349	<para>
	  Put differently, the standard's non-unique mapping
2351	  associative-containers are associative containers that map
2352	  primary keys to linked lists that are embedded into the
2353	  container. The graphic below shows again the two
2354	  containers from the first graphic above, this time with
2355	  the embedded linked lists of the grayed nodes marked
2356	  explicitly.
2357	</para>
2358
2359	<figure xml:id="fig.pbds_embedded_lists_2">
2360	  <title>
2361	    Effect of embedded lists in
2362	    <classname>std::multimap</classname>
2363	  </title>
2364	  <mediaobject>
2365	    <imageobject>
2366	      <imagedata align="center" format="PNG" scale="100"
2367			 fileref="../images/pbds_embedded_lists_2.png"/>
2368	    </imageobject>
2369	    <textobject>
2370	      <phrase>
2371		Effect of embedded lists in
2372		<classname>std::multimap</classname>
2373	      </phrase>
2374	    </textobject>
2375	  </mediaobject>
2376	</figure>
2377
2378	<para>
2379	  These embedded linked lists have several disadvantages.
2380	</para>
2381
2382	<orderedlist>
2383	  <listitem>
2384	    <para>
2385	      The underlying data structure embeds the linked lists
2386	      according to its own consideration, which means that the
2387	      search path for a value might include several different
	      equivalent-key values. For example, the search path for
	      the black node in either label A or label B of the first
	      graphic includes more than a single gray node.
2391	    </para>
2392	  </listitem>
2393
2394	  <listitem>
2395	    <para>
2396	      The links of the linked lists are the underlying data
2397	      structures' nodes, which typically are quite structured.  In
	      the case of tree-based containers (the graphic above, label
	      A), each "link" is actually a node with three pointers (one
2400	      to a parent and two to children), and a
2401	      relatively-complicated iteration algorithm. The linked
2402	      lists, therefore, can take up quite a lot of memory, and
2403	      iterating over all values equal to a given key (through the
2404	      return value of the standard
2405	      library's <function>equal_range</function>) can be
2406	      expensive.
2407	    </para>
2408	  </listitem>
2409
2410	  <listitem>
2411	    <para>
	      The primary key is stored multiple times; this uses more memory.
2413	    </para>
2414	  </listitem>
2415
2416	  <listitem>
2417	    <para>
2418	      Finally, the interface of this design excludes several
2419	      useful underlying data structures. Of all the unordered
2420	      self-organizing data structures, practically only
2421	      collision-chaining hash tables can (efficiently) guarantee
2422	      that equivalent-key values are stored consecutively.
2423	    </para>
2424	  </listitem>
2425	</orderedlist>
2426
2427	<para>
2428	  The above reasons hold even when the ratio of secondary keys to
2429	  primary keys (or average number of identical keys) is small, but
2430	  when it is large, there are more severe problems:
2431	</para>
2432
2433	<orderedlist>
2434	  <listitem>
2435	    <para>
	      The underlying data structures order the links inside each
	      embedded linked list according to their internal
	      considerations, which effectively means that each of the
	      lists is unordered. Irrespective of the underlying data
	      structure, searching for a specific value can degrade to
	      linear complexity.
2442	    </para>
2443	  </listitem>
2444
2445	  <listitem>
2446	    <para>
2447	      Similarly to the above point, it is impossible to apply
2448	      to the secondary keys considerations that apply to primary
2449	      keys. For example, it is not possible to maintain secondary
2450	      keys by sorted order.
2451	    </para>
2452	  </listitem>
2453
2454	  <listitem>
2455	    <para>
2456	      While the interface "understands" that all equivalent-key
2457	      values constitute a distinct list (through
2458	      <function>equal_range</function>), the underlying data
2459	      structure typically does not. This means that operations such
2460	      as erasing from a tree-based container all values whose keys
	      are equivalent to a given key can be super-linear in the
	      size of the tree; this is also true for several other
2463	      operations that target a specific list.
2464	    </para>
2465	  </listitem>
2466
2467	</orderedlist>
2468
2469	<para>
2470	  In this library, all associative containers map
2471	  (or store) unique-key values. One can (1) map primary keys to
2472	  secondary associative-containers (containers of
	  secondary keys) or non-associative containers, (2) map identical
2474	  keys to a size-type representing the number of times they
2475	  occur, or (3) any combination of (1) and (2). Instead of
2476	  allowing multiple equivalent-key values, this library
2477	  supplies associative containers based on underlying
2478	  data structures that are suitable as secondary
2479	  associative-containers.
2480	</para>
2481
2482	<para>
	  In the figure below, labels A and B show the equivalent
	  underlying data structures in this library, corresponding to
	  labels A and B of the first graphic above, respectively. Each shaded
	  box represents some size-type or secondary
	  associative-container.
2488	</para>
2489
2490	<figure>
2491	  <title>Non-unique Mapping Containers</title>
2492	  <mediaobject>
2493	    <imageobject>
2494	      <imagedata align="center" format="PNG" scale="100"
2495			 fileref="../images/pbds_embedded_lists_3.png"/>
2496	    </imageobject>
2497	    <textobject>
2498	      <phrase>Non-unique Mapping Containers</phrase>
2499	    </textobject>
2500	  </mediaobject>
2501	</figure>
2502
2503	<para>
	  In the first example above, then, one would use an associative
	  container mapping each client to an associative container which
	  maps each account-id to a balance (see
	  <filename>example/basic_multimap.cc</filename>); in the second
	  example, one would use an associative container mapping
	  each <classname>int</classname> to some size-type indicating the
	  number of times it logically occurs
	  (see <filename>example/basic_multiset.cc</filename>).
2512	</para>
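	<para>
	  For the bank-account case, a primary container of clients
	  mapping to secondary containers of accounts could be sketched
	  as follows. This assumes GCC's
	  <filename>ext/pb_ds</filename> headers and C++11 (for
	  <classname>std::hash</classname>, passed explicitly as the
	  hash functor); the typedef names and the function
	  <function>total_balance</function> are illustrative and not
	  taken from the shipped example files.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;functional&gt;
	  #include &lt;string&gt;

	  // Secondary container: account-id to balance.
	  typedef __gnu_pbds::cc_hash_table&lt;unsigned long, float&gt; accounts_t;

	  // Primary container: client to that client's accounts.
	  typedef __gnu_pbds::cc_hash_table&lt;std::string, accounts_t,
	                                    std::hash&lt;std::string&gt; &gt; clients_t;

	  // Sums the balances of all accounts of one client; returns 0
	  // for an unknown client. Only that client's entries are visited.
	  float total_balance(clients_t&amp; clients, const std::string&amp; client)
	  {
	    clients_t::point_iterator it = clients.find(client);
	    if (it == clients.end())
	      return 0;
	    accounts_t&amp; accounts = it-&gt;second;
	    float total = 0;
	    for (accounts_t::iterator a = accounts.begin();
	         a != accounts.end(); ++a)
	      total += a-&gt;second;
	    return total;
	  }
	</programlisting>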
2513
2514	<para>
2515	  See the discussion in list-based container types for containers
2516	  especially suited as secondary associative-containers.
2517	</para>
2518      </section>
2519
2520    </section> <!-- map and set semantics -->
2521
2522    <section xml:id="pbds.design.concepts.iterator_semantics">
2523      <info><title>Iterator Semantics</title></info>
2524
2525      <section xml:id="concepts.iterator_semantics.point_and_range">
2526	<info><title>Point and Range Iterators</title></info>
2527
2528	<para>
	  Iterator concepts are bifurcated in this design into
	  point-type and range-type iteration.
2531	</para>
2532
2533	<para>
2534	  A point-type iterator is an iterator that refers to a specific
2535	  element as returned through an
2536	  associative-container's <function>find</function> method.
2537	</para>
2538
2539	<para>
	  A range-type iterator is an iterator that is used to go over a
	  sequence of elements, as returned by a container's
	  <function>begin</function> method.
2543	</para>
2544
2545	<para>
2546	  A point-type method is a method that
2547	  returns a point-type iterator; a range-type method is a method
2548	  that returns a range-type iterator.
2549	</para>
2550
	<para>For most containers, these types are synonymous; for
	self-organizing containers, such as hash-based containers or
	priority queues, these are inherently different (in any
	implementation, including that of the C++ standard library
	components), but in this design the difference is made explicit:
	they are distinct types.
	</para>
2558      </section>
2559
2560
2561      <section xml:id="concepts.iterator_semantics.both">
2562	<info><title>Distinguishing Point and Range Iterators</title></info>
2563
	<para>When using this library, it is necessary to differentiate
2565	between two types of methods and iterators: point-type methods and
2566	iterators, and range-type methods and iterators. Each associative
2567	container's interface includes the methods:</para>
2568	<programlisting>
2569	  point_const_iterator
2570	  find(const_key_reference r_key) const;
2571
2572	  point_iterator
2573	  find(const_key_reference r_key);
2574
2575	  std::pair&lt;point_iterator,bool&gt;
2576	  insert(const_reference r_val);
2577	</programlisting>
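	<para>
	  The return types of these methods can be seen in a small
	  sketch, assuming GCC's <filename>ext/pb_ds</filename>
	  headers; the typedef <type>map_t</type> and the function
	  <function>insert_or_keep</function> are names invented for
	  this example.
	</para>
	<programlisting>
	  #include &lt;ext/pb_ds/assoc_container.hpp&gt;
	  #include &lt;utility&gt;

	  typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; map_t;

	  // Inserts (key, value) unless key is already present; returns
	  // the value stored under key afterwards.
	  char insert_or_keep(map_t&amp; m, int key, char value)
	  {
	    // insert yields a point-type iterator and an "inserted" flag.
	    std::pair&lt;map_t::point_iterator, bool&gt; r =
	      m.insert(std::make_pair(key, value));
	    return r.first-&gt;second;
	  }
	</programlisting>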
2578
	<para>The relationship between these iterator types varies between
	container types. The figure below
	shows the most general invariant between point-type and
	range-type iterators: in <emphasis>A</emphasis>, <literal>iterator</literal> can
	always be converted to <literal>point_iterator</literal>.
	<emphasis>B</emphasis> shows invariants for order-preserving containers: point-type
	iterators are synonymous with range-type iterators.
	Orthogonally, <emphasis>C</emphasis> shows invariants for "set"
	containers: iterators are synonymous with const iterators.</para>
2588
2589	<figure>
2590	  <title>Point Iterator Hierarchy</title>
2591	  <mediaobject>
2592	    <imageobject>
2593	      <imagedata align="center" format="PNG" scale="100"
2594			 fileref="../images/pbds_point_iterator_hierarchy.png"/>
2595	    </imageobject>
2596	    <textobject>
2597	      <phrase>Point Iterator Hierarchy</phrase>
2598	    </textobject>
2599	  </mediaobject>
2600	</figure>
2601
2602
	<para>Note that point-type iterators in self-organizing containers
	(hash-based associative containers) lack movement
	operators, such as <literal>operator++</literal>; in fact, this
	is the reason why this library differs from the standard C++
	library's design on this point.</para>
2608
	<para>Typically, one can determine an iterator's movement
	capabilities using
	<literal>std::iterator_traits&lt;It&gt;::iterator_category</literal>,
	which is a <literal>struct</literal> indicating the iterator's
	movement capabilities. Unfortunately, none of the standard predefined
	categories reflects an iterator's <emphasis>not</emphasis> having any
	movement capabilities whatsoever. Consequently,
	<literal>pb_ds</literal> adds a type
	<literal>trivial_iterator_tag</literal> (whose name is taken from
	a concept in C++ standardese), which is the category of iterators
	with no movement capabilities. All other standard C++ library
	tags, such as <literal>forward_iterator_tag</literal>, retain their
	common use.</para>
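	<para>As a hedged illustration, the category tag can drive
	overload selection at compile time. The sketch assumes GCC's
	<filename>ext/pb_ds</filename> headers; the function names
	<function>can_advance</function> and
	<function>supports_increment</function> are hypothetical and not
	part of the library.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <iterator>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Overload set: the iterator's category tag selects the answer.
inline bool can_advance(__gnu_pbds::trivial_iterator_tag) { return false; }
inline bool can_advance(std::forward_iterator_tag)        { return true; }

// Reports whether iterators of type It have movement capabilities.
template<typename It>
bool supports_increment()
{
  typedef typename It::iterator_category category;
  return can_advance(category());
}
```
]]></programlisting>
	<para>With a hash-based container, the point-type iterator would be
	expected to select the first overload and the range-type iterator
	the second.</para>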
2622
2623      </section>
2624
2625      <section xml:id="pbds.design.concepts.invalidation">
2626	<info><title>Invalidation Guarantees</title></info>
2627	<para>
2628	  If one manipulates a container object, then iterators previously
2629	  obtained from it can be invalidated. In some cases a
2630	  previously-obtained iterator cannot be de-referenced; in other cases,
2631	  the iterator's next or previous element might have changed
2632	  unpredictably. This corresponds exactly to the question whether a
2633	  point-type or range-type iterator (see previous concept) is valid or
2634	  not. In this design, one can query a container (in compile time) about
2635	  its invalidation guarantees.
2636	</para>
2637
2638
	<para>
	  Recall the earlier example in which a modifying operation (there,
	  <function>erase</function>) was applied to three different types
	  of associative containers and invalidated iterators in three
	  different ways: the iterator of one container remained completely
	  valid (it could be de-referenced and incremented); the iterator
	  of a different container could not even be de-referenced; the
	  iterator of the third container could be de-referenced, but its
	  "next" iterator changed unpredictably.
	</para>
2648
2649	<para>
	  Distinguishing between point and range types allows fine-grained
2651	  invalidation guarantees, because these questions correspond exactly
2652	  to the question of whether point-type iterators and range-type
2653	  iterators are valid. The graphic below shows tags corresponding to
2654	  different types of invalidation guarantees.
2655	</para>
2656
2657	<figure>
2658	  <title>Invalidation Guarantee Tags Hierarchy</title>
2659	  <mediaobject>
2660	    <imageobject>
2661	      <imagedata align="center" format="PDF" scale="75"
2662			 fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2663	    </imageobject>
2664	    <imageobject>
2665	      <imagedata align="center" format="PNG" scale="100"
2666			 fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
2667	    </imageobject>
2668	    <textobject>
2669	      <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2670	    </textobject>
2671	  </mediaobject>
2672	</figure>
2673
2674	<itemizedlist>
2675	  <listitem>
2676	    <para>
2677	      <classname>basic_invalidation_guarantee</classname>
2678	      corresponds to a basic guarantee that a point-type iterator,
2679	      a found pointer, or a found reference, remains valid as long
2680	      as the container object is not modified.
2681	    </para>
2682	  </listitem>
2683
2684	  <listitem>
2685	    <para>
2686	      <classname>point_invalidation_guarantee</classname>
2687	      corresponds to a guarantee that a point-type iterator, a
2688	      found pointer, or a found reference, remains valid even if
2689	      the container object is modified.
2690	    </para>
2691	  </listitem>
2692
2693	  <listitem>
2694	    <para>
2695	      <classname>range_invalidation_guarantee</classname>
2696	      corresponds to a guarantee that a range-type iterator remains
2697	      valid even if the container object is modified.
2698	    </para>
2699	  </listitem>
2700	</itemizedlist>
2701
2702	<para>To find the invalidation guarantee of a
2703	container, one can use</para>
2704	<programlisting>
2705	  typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2706	</programlisting>
2707
	<para>Note that this hierarchy corresponds to the logic it
	represents: if a container has range-invalidation guarantees,
	then it must also have point-invalidation guarantees;
	correspondingly, its invalidation guarantee (in this case
	<classname>range_invalidation_guarantee</classname>)
	can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
	This means that this hierarchy can be used easily with
	standard metaprogramming techniques, by specializing on the
	type of <literal>invalidation_guarantee</literal>.</para>
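	<para>A minimal sketch of such a technique, using tag dispatch on
	the guarantee type, might look as follows. It assumes GCC's
	<filename>ext/pb_ds</filename> headers; the helper names are
	hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Basic guarantee only: point-type iterators may be invalidated.
inline bool survives(__gnu_pbds::basic_invalidation_guarantee)
{ return false; }

// Point guarantee; a range guarantee also lands here, because it
// derives from the point guarantee.
inline bool survives(__gnu_pbds::point_invalidation_guarantee)
{ return true; }

// True if point-type iterators of Cntnr stay valid after modification.
template<typename Cntnr>
bool point_iterators_survive_modification()
{
  typedef typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee
    guarantee;
  return survives(guarantee());
}
```
]]></programlisting>
	<para>Overload resolution prefers the closest base class, so a
	container with a range guarantee selects the point-guarantee
	overload without any extra code.</para>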
2717
2718	<para>
2719	  These types of problems were addressed, in a more general
2720	  setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2721	  our opinion, an invalidation-guarantee hierarchy would solve
2722	  these problems in all container types - not just associative
2723	  containers.
2724	</para>
2725
2726      </section>
2727    </section> <!-- iterator semantics -->
2728
2729    <section xml:id="pbds.design.concepts.genericity">
2730      <info><title>Genericity</title></info>
2731
2732      <para>
2733	The design attempts to address the following problem of
2734	data-structure genericity. When writing a function manipulating
2735	a generic container object, what is the behavior of the object?
2736	Suppose one writes
2737      </para>
2738      <programlisting>
2739	template&lt;typename Cntnr&gt;
2740	void
2741	some_op_sequence(Cntnr &amp;r_container)
2742	{
2743	...
2744	}
2745      </programlisting>
2746
2747      <para>
2748	then one needs to address the following questions in the body
2749	of <function>some_op_sequence</function>:
2750      </para>
2751
2752      <itemizedlist>
2753	<listitem>
2754	  <para>
2755	    Which types and methods does <literal>Cntnr</literal> support?
	    Containers based on hash tables can be queried for the
2757	    hash-functor type and object; this is meaningless for tree-based
2758	    containers. Containers based on trees can be split, joined, or
2759	    can erase iterators and return the following iterator; this
2760	    cannot be done by hash-based containers.
2761	  </para>
2762	</listitem>
2763
2764	<listitem>
2765	  <para>
2766	    What are the exception and invalidation guarantees
2767	    of <literal>Cntnr</literal>? A container based on a probing
2768	    hash-table invalidates all iterators when it is modified; this
2769	    is not the case for containers based on node-based
2770	    trees. Containers based on a node-based tree can be split or
2771	    joined without exceptions; this is not the case for containers
2772	    based on vector-based trees.
2773	  </para>
2774	</listitem>
2775
2776	<listitem>
2777	  <para>
	    How does the container maintain its elements? Tree-based and
	    trie-based containers store elements by key order; others,
	    typically, do not. Containers based on splay trees or on lists
	    with update policies "cache" frequently accessed elements;
	    containers based on most other underlying data structures do
	    not.
2784	  </para>
2785	</listitem>
2786	<listitem>
2787	  <para>
2788	    How does one query a container about characteristics and
2789	    capabilities? What is the relationship between two different
	    data structures, if any?
2791	  </para>
2792	</listitem>
2793      </itemizedlist>
2794
2795      <para>The remainder of this section explains these issues in
2796      detail.</para>
2797
2798
2799      <section xml:id="concepts.genericity.tag">
2800	<info><title>Tag</title></info>
2801	<para>
2802	  Tags are very useful for manipulating generic types. For example, if
2803	  <literal>It</literal> is an iterator class, then <literal>typename
2804	  It::iterator_category</literal> or <literal>typename
2805	  std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2806	  yield its category, and <literal>typename
2807	  std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2808	  value type.
2809	</para>
2810
2811	<para>
2812	  This library contains a container tag hierarchy corresponding to the
2813	  diagram below.
2814	</para>
2815
2816	<figure>
2817	  <title>Container Tag Hierarchy</title>
2818	  <mediaobject>
2819	    <imageobject>
2820	      <imagedata align="center" format="PDF" scale="75"
2821			 fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2822	    </imageobject>
2823	    <imageobject>
2824	      <imagedata align="center" format="PNG" scale="100"
2825			 fileref="../images/pbds_container_tag_hierarchy.png"/>
2826	    </imageobject>
2827	    <textobject>
2828	      <phrase>Container Tag Hierarchy</phrase>
2829	    </textobject>
2830	  </mediaobject>
2831	</figure>
2832
2833	<para>
2834	  Given any container <type>Cntnr</type>, the tag of
2835	  the underlying data structure can be found via <literal>typename
2836	  Cntnr::container_category</literal>.
2837	</para>
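	<para>As a hedged sketch, the container category can select an
	overload in generic code. The example assumes GCC's
	<filename>ext/pb_ds</filename> headers; the functions
	<function>family</function> and <function>family_of</function>
	are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// One overload per branch of the tag hierarchy of interest.
inline std::string family(__gnu_pbds::basic_hash_tag) { return "hash"; }
inline std::string family(__gnu_pbds::tree_tag)       { return "tree"; }

// Names the underlying data-structure family of any such container.
template<typename Cntnr>
std::string family_of(const Cntnr&)
{
  typedef typename Cntnr::container_category category;
  return family(category());
}
```
]]></programlisting>
	<para>Because the tags form a hierarchy, a single overload taking
	a base tag covers every container whose tag derives from it.</para>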
2838
2839      </section> <!-- tag -->
2840
2841      <section xml:id="concepts.genericity.traits">
2842	<info><title>Traits</title></info>
	<para>Additionally, a traits mechanism can be used to query a
	container type for its attributes. Given any container
	<literal>Cntnr</literal>, <literal>container_traits&lt;Cntnr&gt;</literal>
	is a traits class identifying the properties of the
	container.</para>
2850
2851	<para>To find if a container can throw when a key is erased (which
2852	is true for vector-based trees, for example), one can
2853	use
2854	</para>
2855	<programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2856
2857	<para>
2858	  Some of the definitions in <classname>container_traits</classname>
2859	  are dependent on other
2860	  definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2861	  is <constant>true</constant> (which is the case for containers
2862	  based on trees and tries), then the container can be split or
2863	  joined; in this
2864	  case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2865	  indicates whether splits or joins can throw exceptions (which is
2866	  true for vector-based trees);
2867	  otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2868	  will yield a compilation error. (This is somewhat similar to a
2869	  compile-time version of the COM model).
2870	</para>
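	<para>The queries above can be wrapped in small helpers, as in the
	following sketch. It assumes GCC's <filename>ext/pb_ds</filename>
	headers; <function>keeps_key_order</function> and
	<function>erase_may_throw</function> are hypothetical names.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// True if Cntnr stores its elements by key order.
template<typename Cntnr>
bool keeps_key_order()
{ return __gnu_pbds::container_traits<Cntnr>::order_preserving; }

// True if erasing a key from Cntnr can throw an exception.
template<typename Cntnr>
bool erase_may_throw()
{ return __gnu_pbds::container_traits<Cntnr>::erase_can_throw; }
```
]]></programlisting>
	<para>Both values are compile-time constants, so they can equally
	be used to select a specialization rather than tested at run
	time.</para>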
2871
2872      </section> <!-- traits -->
2873
2874    </section> <!-- genericity -->
2875  </section> <!-- concepts -->
2876
2877  <section xml:id="pbds.design.container">
2878    <info><title>By Container</title></info>
2879
2880    <!-- hash -->
2881    <section xml:id="pbds.design.container.hash">
2882      <info><title>hash</title></info>
2883
2884      <!--
2885
2886// hash policies
2887/// general terms / background
2888/// range hashing policies
2889/// ranged-hash policies
2890/// implementation
2891
2892// resize policies
2893/// general
2894/// size policies
2895/// trigger policies
2896/// implementation
2897
2898// policy interactions
2899/// probe/size/trigger
2900/// hash/trigger
2901/// eq/hash/storing hash values
2902/// size/load-check trigger
2903      -->
2904      <section xml:id="container.hash.interface">
2905	<info><title>Interface</title></info>
2906
2907
2908
2909	<para>
2910	  The collision-chaining hash-based container has the
2911	following declaration.</para>
2912	<programlisting>
2913	  template&lt;
2914	  typename Key,
2915	  typename Mapped,
2916	  typename Hash_Fn = std::hash&lt;Key&gt;,
2917	  typename Eq_Fn = std::equal_to&lt;Key&gt;,
	  typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;,
	  typename Resize_Policy = /* default explained below */,
2920	  bool Store_Hash = false,
2921	  typename Allocator = std::allocator&lt;char&gt; &gt;
2922	  class cc_hash_table;
2923	</programlisting>
2924
2925	<para>The parameters have the following meaning:</para>
2926
2927	<orderedlist>
2928	  <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2929
2930	  <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2931
2932	  <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2933
2934	  <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2935
	  <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
2937	  it describes how to translate hash values into positions
2938	  within the table. </para></listitem>
2939
2940	  <listitem><para><classname>Resize_Policy</classname> describes how a container object
2941	  should change its internal size. </para></listitem>
2942
2943	  <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2944	  should be stored with each entry. </para></listitem>
2945
2946	  <listitem><para><classname>Allocator</classname> is an allocator
2947	  type.</para></listitem>
2948	</orderedlist>
2949
2950	<para>The probing hash-based container has the following
2951	declaration.</para>
2952	<programlisting>
2953	  template&lt;
2954	  typename Key,
2955	  typename Mapped,
2956	  typename Hash_Fn = std::hash&lt;Key&gt;,
2957	  typename Eq_Fn = std::equal_to&lt;Key&gt;,
	  typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;,
	  typename Probe_Fn = /* default explained below */,
	  typename Resize_Policy = /* default explained below */,
2961	  bool Store_Hash = false,
2962	  typename Allocator =  std::allocator&lt;char&gt; &gt;
2963	  class gp_hash_table;
2964	</programlisting>
2965
2966	<para>The parameters are identical to those of the
2967	collision-chaining container, except for the following.</para>
2968
2969	<orderedlist>
2970	  <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2971	  sequence into a sequence of positions within the table.</para></listitem>
2972
2973	  <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2974	</orderedlist>
2975
2976	<para>Some of the default template values depend on the values of
2977	other parameters, and are explained below.</para>
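	<para>With all defaults taken, both containers can be used much
	like a standard map. The following sketch assumes GCC's
	<filename>ext/pb_ds</filename> headers; the aliases and the helper
	<function>count_word</function> are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>

// Collision-chaining and probing tables with default policies.
typedef __gnu_pbds::cc_hash_table<std::string, int> chained_counts;
typedef __gnu_pbds::gp_hash_table<std::string, int> probed_counts;

// Increments and returns the count stored for a word; works with
// either table type because both provide operator[].
template<typename Table>
int count_word(Table& table, const std::string& word)
{ return ++table[word]; }
```
]]></programlisting>
	<para>Only the table type changes between the two variants; the
	calling code is identical.</para>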
2978
2979      </section>
2980      <section xml:id="container.hash.details">
2981	<info><title>Details</title></info>
2982
2983	<section xml:id="container.hash.details.hash_policies">
2984	  <info><title>Hash Policies</title></info>
2985
2986	  <section xml:id="details.hash_policies.general">
2987	    <info><title>General</title></info>
2988
2989	    <para>Following is an explanation of some functions which hashing
2990	    involves. The graphic below illustrates the discussion.</para>
2991
2992	    <figure>
2993	      <title>Hash functions, ranged-hash functions, and
2994	      range-hashing functions</title>
2995	      <mediaobject>
2996		<imageobject>
2997		  <imagedata align="center" format="PNG" scale="100"
2998			     fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
2999		</imageobject>
3000		<textobject>
3001		  <phrase>Hash functions, ranged-hash functions, and
3002		  range-hashing functions</phrase>
3003		</textobject>
3004	      </mediaobject>
3005	    </figure>
3006	
3007	    <para>Let U be a domain (e.g., the integers, or the
3008	    strings of 3 characters). A hash-table algorithm needs to map
3009	    elements of U "uniformly" into the range [0,..., m -
3010	    1] (where m is a non-negative integral value, and
3011	    is, in general, time varying). I.e., the algorithm needs
3012	    a ranged-hash function</para>
3013
3014	    <para>
	      f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3016	    </para>
3017
3018	    <para>such that for any u in U ,</para>
3019
	    <para>0 ≤ f(u, m) ≤ m - 1</para>
3021
	    <para>and which has "good uniformity" properties (see
	    <xref linkend="biblio.knuth98sorting"/>).
	    One
3025	    common solution is to use the composition of the hash
3026	    function</para>
3027
	    <para>h : U → Z<subscript>+</subscript> ,</para>
3029
3030	    <para>which maps elements of U into the non-negative
3031	    integrals, and</para>
3032
	    <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
	    Z<subscript>+</subscript>,</para>
3035
3036	    <para>which maps a non-negative hash value, and a non-negative
3037	    range upper-bound into a non-negative integral in the range
3038	    between 0 (inclusive) and the range upper bound (exclusive),
3039	    i.e., for any r in Z<subscript>+</subscript>,</para>
3040
	    <para>0 ≤ g(r, m) ≤ m - 1</para>
3042
3043
	    <para>The resulting ranged-hash function is</para>
3045
3046	    <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3047	    <equation>
3048	      <title>Ranged Hash Function</title>
3049	      <mathphrase>
3050		f(u , m) = g(h(u), m)
3051	      </mathphrase>
3052	    </equation>
3053
3054	    <para>From the above, it is obvious that given g and
3055	    h, f can always be composed (however the converse
3056	    is not true). The standard's hash-based containers allow specifying
3057	    a hash function, and use a hard-wired range-hashing function;
3058	    the ranged-hash function is implicitly composed.</para>
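	<para>The composition can be sketched in standalone C++ as
	follows. This is an illustration of the formula only, not the
	library's implementation; the names
	<function>range_hash</function> and
	<function>ranged_hash</function> are hypothetical, and the
	division method is assumed for g.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// g(r, m): range hashing by the division method.
inline std::size_t range_hash(std::size_t r, std::size_t m)
{
  assert(m > 0);
  return r % m;
}

// f(u, m) = g(h(u), m): the ranged hash composed from h and g.
template<typename Key, typename Hash = std::hash<Key> >
std::size_t ranged_hash(const Key& u, std::size_t m, Hash h = Hash())
{ return range_hash(h(u), m); }
```
]]></programlisting>
	<para>Whatever hash functor is supplied, the result always lies in
	the range [0, m - 1].</para>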
3059
3060	    <para>The above describes the case where a key is to be mapped
3061	    into a single position within a hash table, e.g.,
3062	    in a collision-chaining table. In other cases, a key is to be
3063	    mapped into a sequence of positions within a table,
3064	    e.g., in a probing table. Similar terms apply in this
3065	    case: the table requires a ranged probe function,
	    mapping a key into a sequence of positions within the table.
3067	    This is typically achieved by composing a hash function
3068	    mapping the key into a non-negative integral type, a
3069	    probe function transforming the hash value into a
3070	    sequence of hash values, and a range-hashing function
3071	    transforming the sequence of hash values into a sequence of
3072	    positions.</para>
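	<para>As a hedged, standalone sketch of this three-part
	composition, the following computes the first few positions tried
	for a given hash value. Linear probing and the division method are
	assumed; all names are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Range hashing g(r, m), division method.
inline std::size_t range_hash(std::size_t r, std::size_t m)
{ return r % m; }

// Probe function: the i-th offset from a hash value (linear probing).
inline std::size_t linear_offset(std::size_t i)
{ return i; }

// Ranged probe: the first `count` table positions tried for a key
// whose hash value is `hash`, in a table of size m.
inline std::vector<std::size_t>
probe_positions(std::size_t hash, std::size_t m, std::size_t count)
{
  assert(m > 0);
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < count; ++i)
    positions.push_back(range_hash(hash + linear_offset(i), m));
  return positions;
}
```
]]></programlisting>
	<para>For a hash value of 6 in a table of size 8, the sequence
	wraps around the end of the table: 6, 7, 0, 1.</para>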
3073
3074	  </section>
3075
3076	  <section xml:id="details.hash_policies.range">
3077	    <info><title>Range Hashing</title></info>
3078
3079	    <para>Some common choices for range-hashing functions are the
3080	    division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3081	    as</para>
3082
3083	    <equation>
3084	      <title>Range-Hashing, Division Method</title>
3085	      <mathphrase>
3086		g(r, m) = r mod m
3087	      </mathphrase>
3088	    </equation>
3089
3090
3091
	    <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3093
3094	    <para>and</para>
3095
	    <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3097
3098	    <para>respectively, for some positive integrals u and
3099	    v (typically powers of 2), and some a. Each of
3100	    these range-hashing functions works best for some different
3101	    setting.</para>
3102
3103	    <para>The division method (see above) is a
3104	    very common choice. However, even this single method can be
3105	    implemented in two very different ways. It is possible to
3106	    implement using the low
3107	    level % (modulo) operation (for any m), or the
3108	    low level &amp; (bit-mask) operation (for the case where
3109	    m is a power of 2), i.e.,</para>
3110
3111	    <equation>
3112	      <title>Division via Prime Modulo</title>
3113	      <mathphrase>
3114		g(r, m) = r % m
3115	      </mathphrase>
3116	    </equation>
3117
3118	    <para>and</para>
3119
3120	    <equation>
3121	      <title>Division via Bit Mask</title>
3122	      <mathphrase>
		g(r, m) = r &amp; (m - 1), (with m =
		2<superscript>k</superscript> for some k)
3125	      </mathphrase>
3126	    </equation>
3127
3128
3129	    <para>respectively.</para>
3130
3131	    <para>The % (modulo) implementation has the advantage that for
3132	    m a prime far from a power of 2, g(r, m) is
3133	    affected by all the bits of r (minimizing the chance of
3134	    collision). It has the disadvantage of using the costly modulo
	    operation. This method is hard-wired into SGI's implementation.</para>
3137
3138	    <para>The &amp; (bit-mask) implementation has the advantage of
3139	    relying on the fast bit-wise and operation. It has the
	    disadvantage that g(r, m) is affected only by the
3141	    low order bits of r. This method is hard-wired into
3142	    Dinkumware's implementation.</para>
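	<para>The two implementations and their trade-off can be sketched
	in standalone C++. The function names are hypothetical and are not
	the library's policy classes.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Division method via the modulo operation; valid for any m > 0.
inline std::size_t mod_range_hash(std::size_t r, std::size_t m)
{
  assert(m > 0);
  return r % m;
}

// Division method via a bit mask; valid only when m is a power of 2.
inline std::size_t mask_range_hash(std::size_t r, std::size_t m)
{
  assert(m > 0 && (m & (m - 1)) == 0);
  return r & (m - 1);
}
```
]]></programlisting>
	<para>For m = 16 both functions agree, but the hash values 0x10 and
	0x20 both map to position 0 because only the low four bits are
	used. With the prime m = 13, the modulo version maps them to
	positions 3 and 6.</para>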
3143
3144
3145	  </section>
3146
3147	  <section xml:id="details.hash_policies.ranged">
3148	    <info><title>Ranged Hash</title></info>
3149
	    <para>In some cases it is beneficial to allow the
	    client to directly specify a ranged-hash function. It is
	    true that the writer of the ranged-hash function cannot rely
3153	    on the values of m having specific numerical properties
3154	    suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3155	    the values of m are determined by a resize policy with
3156	    possibly orthogonal considerations.</para>
3157
3158	    <para>There are two cases where a ranged-hash function can be
	    superior. The first is when using perfect hashing; the
3160	    second is when the values of m can be used to estimate
3161	    the "general" number of distinct values required. This is
3162	    described in the following.</para>
3163
3164	    <para>Let</para>
3165
3166	    <para>
3167	      s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3168	    </para>
3169
3170	    <para>be a string of t characters, each of which is from
3171	    domain S. Consider the following ranged-hash
3172	    function:</para>
3173	    <equation>
3174	      <title>
3175		A Standard String Hash Function
3176	      </title>
3177	      <mathphrase>
		f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
		0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3180	      </mathphrase>
3181	    </equation>
3182	
3183
3184	    <para>where a is some non-negative integral value. This is
3185	    the standard string-hashing function used in SGI's
3186	    implementation (with a = 5). Its advantage is that
3187	    it takes into account all of the characters of the string.</para>
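	<para>A standalone sketch of f<subscript>1</subscript> follows. It
	illustrates the formula only and is not SGI's code; the name
	<function>string_ranged_hash</function> is hypothetical, and the
	sketch assumes m is below 2<superscript>32</superscript> so that
	the intermediate products cannot overflow.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// f1(s, m) = (sum over i of s_i * a^i) mod m, reduced at every step.
inline std::size_t
string_ranged_hash(const std::string& s, std::size_t m, std::size_t a = 5)
{
  assert(m > 0);
  unsigned long long sum = 0;
  unsigned long long power = 1 % m;   // a^0 mod m
  for (std::string::size_type i = 0; i < s.size(); ++i)
    {
      const unsigned long long c = static_cast<unsigned char>(s[i]);
      sum = (sum + (c % m) * power) % m;
      power = (power * (a % m)) % m;  // a^(i+1) mod m
    }
  return static_cast<std::size_t>(sum);
}
```
]]></programlisting>
	<para>Every character contributes to the result, so the cost is
	linear in the length of the string.</para>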
3188
	    <para>Now assume that s is the string representation of
	    a long DNA sequence (and so S = {'A', 'C', 'G',
3191	    'T'}). In this case, scanning the entire string might be
3192	    prohibitively expensive. A possible alternative might be to use
3193	    only the first k characters of the string, where</para>
3194
	    <para>|S|<superscript>k</superscript> ≥ m ,</para>
3196
3197	    <para>i.e., using the hash function</para>
3198
3199	    <equation>
3200	      <title>
3201		Only k String DNA Hash
3202	      </title>
3203	      <mathphrase>
		f<subscript>2</subscript>(s, m) = ∑ <subscript>i
		= 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3206	      </mathphrase>
3207	    </equation>
3208
3209	    <para>requiring scanning over only</para>
3210
3211	    <para>k = log<subscript>4</subscript>( m )</para>
3212
3213	    <para>characters.</para>
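	<para>The prefix-only variant f<subscript>2</subscript> can be
	sketched as follows. The names are hypothetical; k is taken as the
	smallest integer with 4<superscript>k</superscript> at least m,
	and m is assumed to be below 2<superscript>32</superscript>.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Smallest k such that 4^k >= m, i.e. the ceiling of log4(m).
inline std::size_t prefix_length(std::size_t m)
{
  assert(m > 0 && m <= (1ULL << 62));
  std::size_t k = 0;
  for (unsigned long long reach = 1; reach < m; reach *= 4)
    ++k;
  return k;
}

// f2(s, m): like f1, but scans only the first k characters of s.
inline std::size_t
dna_prefix_hash(const std::string& s, std::size_t m, std::size_t a = 5)
{
  const std::size_t k = std::min<std::size_t>(prefix_length(m), s.size());
  unsigned long long sum = 0;
  unsigned long long power = 1 % m;
  for (std::size_t i = 0; i < k; ++i)
    {
      const unsigned long long c = static_cast<unsigned char>(s[i]);
      sum = (sum + (c % m) * power) % m;
      power = (power * (a % m)) % m;
    }
  return static_cast<std::size_t>(sum);
}
```
]]></programlisting>
	<para>Since m changes at each resize, the number of characters
	scanned adapts to the table size; this is why the function takes m
	directly instead of being split into a hash and a range-hashing
	step.</para>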
3214
3215	    <para>Other more elaborate hash-functions might scan k
3216	    characters starting at a random position (determined at each
3217	    resize), or scanning k random positions (determined at
3218	    each resize), i.e., using</para>
3219
	    <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
	    r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
	    a<superscript>i</superscript> mod m ,</para>
3223
3224	    <para>or</para>
3225
	    <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
	    1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
	    m ,</para>
3229
3230	    <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3231	    each in the (inclusive) range [0,...,t-1].</para>
3232
	    <para>It should be noted that the above functions cannot be
	    decomposed into a hash function and a range-hashing function,
	    as the composed ranged-hash function described earlier can.</para>
3235
3236
3237	  </section>
3238
3239	  <section xml:id="details.hash_policies.implementation">
3240	    <info><title>Implementation</title></info>
3241
3242	    <para>This sub-subsection describes the implementation of
3243	    the above in this library. It first explains range-hashing
3244	    functions in collision-chaining tables, then ranged-hash
3245	    functions in collision-chaining tables, then probing-based
3246	    tables, and finally lists the relevant classes in this
3247	    library.</para>
3248
3249	    <section xml:id="hash_policies.implementation.collision-chaining">
3250	      <info><title>
3251		Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3252	      </title></info>
3253
3254
3255	      <para><classname>cc_hash_table</classname> is
3256	      parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3257	      hash functor and a combining hash functor, respectively.</para>
3258
3259	      <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3260	      range-hashing functor. <classname>cc_hash_table</classname>
3261	      synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
	      <classname>Comb_Hash_Fn</classname>. The figure below shows an <function>insert</function> sequence
3263	      diagram for this case. The user inserts an element (point A),
3264	      the container transforms the key into a non-negative integral
3265	      using the hash functor (points B and C), and transforms the
3266	      result into a position using the combining functor (points D
3267	      and E).</para>
3268
3269	      <figure>
3270		<title>Insert hash sequence diagram</title>
3271		<mediaobject>
3272		  <imageobject>
3273		    <imagedata align="center" format="PNG" scale="100"
3274			       fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3275		  </imageobject>
3276		  <textobject>
3277		    <phrase>Insert hash sequence diagram</phrase>
3278		  </textobject>
3279		</mediaobject>
3280	      </figure>
3281	
3282	      <para>If <classname>cc_hash_table</classname>'s
	      hash-functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
3284	      a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3285	      diagram. The user inserts an element (point A), the container
3286	      transforms the key into a position using the combining functor
3287	      (points B and C).</para>
3288
3289	      <figure>
3290		<title>Insert hash sequence diagram with a null policy</title>
3291		<mediaobject>
3292		  <imageobject>
3293		    <imagedata align="center" format="PNG" scale="100"
3294			       fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3295		  </imageobject>
3296		  <textobject>
3297		    <phrase>Insert hash sequence diagram with a null policy</phrase>
3298		  </textobject>
3299		</mediaobject>
3300	      </figure>
3301	
3302	    </section>
3303
3304	    <section xml:id="hash_policies.implementation.probe">
3305	      <info><title>
3306		Probing tables
3307	      </title></info>
3308	      <para><classname>gp_hash_table</classname> is parametrized by
3309	      <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3310	      and <classname>Comb_Probe_Fn</classname>. As before, if
3311	      <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3312	      are both <classname>null_type</classname>, then
3313	      <classname>Comb_Probe_Fn</classname> is a ranged-probe
3314	      functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3315	      functor, <classname>Probe_Fn</classname> is a functor for offsets
3316	      from a hash value, and <classname>Comb_Probe_Fn</classname>
3317	      transforms a probe sequence into a sequence of positions within
3318	      the table.</para>
3319
3320	    </section>
3321
3322	    <section xml:id="hash_policies.implementation.predefined">
3323	      <info><title>
3324		Pre-Defined Policies
3325	      </title></info>
3326
3327	      <para>This library contains some pre-defined classes
3328	      implementing range-hashing and probing functions:</para>
3329
3330	      <orderedlist>
3331		<listitem><para><classname>direct_mask_range_hashing</classname>
3332		and <classname>direct_mod_range_hashing</classname>
3333		are range-hashing functions based on a bit-mask and a modulo
3334		operation, respectively.</para></listitem>
3335
3336		<listitem><para><classname>linear_probe_fn</classname>, and
3337		<classname>quadratic_probe_fn</classname> are
3338		a linear probe and a quadratic probe function,
3339		respectively.</para></listitem>
3340	      </orderedlist>
3341
3342	      <para>
3343		The graphic below shows the relationships.
3344	      </para>
3345	      <figure>
3346		<title>Hash policy class diagram</title>
3347		<mediaobject>
3348		  <imageobject>
3349		    <imagedata align="center" format="PNG" scale="100"
3350			       fileref="../images/pbds_hash_policy_cd.png"/>
3351		  </imageobject>
3352		  <textobject>
3353		    <phrase>Hash policy class diagram</phrase>
3354		  </textobject>
3355		</mediaobject>
3356	      </figure>
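	<para>As a hedged sketch, the pre-defined policies can be passed
	explicitly when instantiating the containers. The example assumes
	GCC's <filename>ext/pb_ds</filename> headers and a C++11 compiler
	(for <classname>std::hash</classname>); the alias names are
	hypothetical, and the resize policy is left at its default.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Collision-chaining table using modulo-based range hashing.
typedef __gnu_pbds::cc_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mod_range_hashing<> > mod_chained_map;

// Probing table using mask-based range hashing and linear probing.
typedef __gnu_pbds::gp_hash_table<
  int, int, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::linear_probe_fn<> > linear_probed_map;

// Stores i -> i * i for i in [0, n) in either kind of table.
template<typename Table>
void fill_squares(Table& table, int n)
{
  for (int i = 0; i < n; ++i)
    table[i] = i * i;
}
```
]]></programlisting>
	<para>Changing the range-hashing or probe policy affects only the
	alias; code that uses the container, such as
	<function>fill_squares</function>, is unaffected.</para>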
3357
3358
3359	    </section>
3360
3361	  </section> <!-- impl -->
3362
3363	</section>
3364
3365	<section xml:id="container.hash.details.resize_policies">
3366	  <info><title>Resize Policies</title></info>
3367
3368	  <section xml:id="resize_policies.general">
3369	    <info><title>General</title></info>
3370
3371	    <para>Hash-tables, as opposed to trees, do not naturally grow or
3372	    shrink. It is necessary to specify policies to determine how
3373	    and when a hash table should change its size. Usually, resize
3374	    policies can be decomposed into orthogonal policies:</para>
3375
3376	    <orderedlist>
3377	      <listitem><para>A size policy indicating how a hash table
3378	      should grow (e.g., it should multiply by powers of
3379	      2).</para></listitem>
3380
3381	      <listitem><para>A trigger policy indicating when a hash
3382	      table should grow (e.g., a load factor is
3383	      exceeded).</para></listitem>
3384	    </orderedlist>
3385
3386	  </section>
3387
3388	  <section xml:id="resize_policies.size">
3389	    <info><title>Size Policies</title></info>
3390
3391
3392	    <para>Size policies determine how a hash table changes size. These
3393	    policies are simple, and there are relatively few sensible
3394	    options. An exponential-size policy (with the initial size and
3395	    growth factors both powers of 2) works well with a mask-based
3396	    range-hashing function, and is the
3397	    hard-wired policy used by Dinkumware. A
3398	    prime-list based policy works well with a modulo-prime range
3399	    hashing function and is the hard-wired policy used by SGI's
3400	    implementation.</para>
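
	    <para>The two kinds of size policy can be sketched as functions
	    that return the next table size. This is only an illustration:
	    the function names and the short prime list are assumptions made
	    here, not the library's implementation.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Exponential policy (hypothetical): the smallest power of two that is
// strictly greater than n. Assumes n is small enough that doubling does
// not overflow.
inline std::size_t next_pow2_size(std::size_t n)
{
  std::size_t s = 1;
  while (s <= n)
    s *= 2;
  return s;
}

// Prime-list policy (hypothetical): the first prime in a fixed list that
// is strictly greater than n. The list is hand-picked for illustration.
inline std::size_t next_prime_size(std::size_t n)
{
  static const std::size_t primes[] = { 7, 13, 31, 61, 127, 251, 509, 1021 };
  const std::size_t count = sizeof(primes) / sizeof(primes[0]);
  for (std::size_t i = 0; i < count; ++i)
    if (primes[i] > n)
      return primes[i];
  // List exhausted; a real policy would have to report an error here.
  return primes[count - 1];
}
```
]]></programlisting>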
3401
3402	  </section>
3403
3404	  <section xml:id="resize_policies.trigger">
3405	    <info><title>Trigger Policies</title></info>
3406
3407	    <para>Trigger policies determine when a hash table changes size.
3408	    Following is a description of two policies: load-check
3409	    policies, and collision-check policies.</para>
3410
3411	    <para>Load-check policies are straightforward. The user specifies
	    two factors, α<subscript>min</subscript> and
	    α<subscript>max</subscript>, and the hash table maintains the
	    invariant that</para>

	    <para>α<subscript>min</subscript> ≤ (number of
	    stored elements) / (hash-table size) ≤
	    α<subscript>max</subscript>
3419            <!-- <remark>load factor min max</remark> -->
3420            </para>
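
	    <para>A load-check trigger can be sketched as a small class that
	    compares the current load with the two user-supplied factors. The
	    type <classname>load_check_trigger</classname> and the
	    <classname>resize_hint</classname> enumeration are hypothetical
	    names used only for this illustration.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

enum resize_hint { keep_size, grow, shrink };

// Hypothetical load-check trigger: asks for a resize whenever the load
// (stored elements / table size) leaves the range [load_min, load_max].
struct load_check_trigger
{
  double load_min;
  double load_max;

  resize_hint check(std::size_t num_elements, std::size_t table_size) const
  {
    assert(table_size != 0);
    const double load = static_cast<double>(num_elements) / table_size;
    if (load > load_max)
      return grow;
    if (load < load_min)
      return shrink;
    return keep_size;
  }
};
```
]]></programlisting>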
3421
3422	    <para>Collision-check policies work in the opposite direction of
3423	    load-check policies. They focus on keeping the number of
3424	    collisions moderate and hoping that the size of the table will
3425	    not grow very large, instead of keeping a moderate load-factor
3426	    and hoping that the number of collisions will be small. A
3427	    maximal collision-check policy resizes when the longest
3428	    probe-sequence grows too large.</para>
3429
3430	    <para>Consider the graphic below. Let the size of the hash table
3431	    be denoted by m, the length of a probe sequence be denoted by k,
	    and some load factor be denoted by α. We would like to
	    calculate the minimal length of k, such that if there were α
3434	    m elements in the hash table, a probe sequence of length k would
3435	    be found with probability at most 1/m.</para>
3436
3437	    <figure>
3438	      <title>Balls and bins</title>
3439	      <mediaobject>
3440		<imageobject>
3441		  <imagedata align="center" format="PNG" scale="100"
3442			     fileref="../images/pbds_balls_and_bins.png"/>
3443		</imageobject>
3444		<textobject>
3445		  <phrase>Balls and bins</phrase>
3446		</textobject>
3447	      </mediaobject>
3448	    </figure>
3449
3450	    <para>Denote the probability that a probe sequence of length
3451	    k appears in bin i by p<subscript>i</subscript>, the
3452	    length of the probe sequence of bin i by
3453	    l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3454
3455
3456
3457	    <equation>
3458	      <title>
3459		Probability of Probe Sequence of Length k
3460	      </title>
3461	      <mathphrase>
3462		p<subscript>1</subscript> =
3463	      </mathphrase>
3464	    </equation>
3465
	    <para>P(l<subscript>1</subscript> ≥ k) =</para>

	    <para>
	      P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1 ) ) ≤ (a)
	    </para>

	    <para>
	      e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
	    </para>
3475
	    <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
	    calculate the probability that some bin contains a probe
	    sequence longer than k, we note that the
	    l<subscript>i</subscript> are negatively-dependent
	    (<xref linkend="biblio.dubhashi98neg"/>). Let
	    I(.) denote the indicator function. Then</para>
3483
3484	    <equation>
3485	      <title>
3486		Probability Probe Sequence in Some Bin
3487	      </title>
3488	      <mathphrase>
		P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
	      </mathphrase>
	    </equation>

	    <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
	    I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>

	    <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
	    l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
	    p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>
3499
3500	    <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
3501	    - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3502
3503	    <para>where (a) follows from the fact that the Chernoff bound can
3504	    be applied to negatively-dependent variables (<xref
3505	    linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3506	    equation into the second one, and equating with 1/m, we
3507	    obtain</para>
3508
3509
	    <para>k ~ √ ( 2 α ln ( 2 m ln(m) ) ) .</para>
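
	    <para>To get a feel for the magnitudes involved, the estimate can
	    be evaluated numerically. The sketch below assumes the final
	    expression is read as k ~ sqrt(2 α ln(2 m ln(m))); both this
	    reading and the function name
	    <function>max_probe_length_estimate</function> are assumptions
	    made for illustration.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Evaluates sqrt(2 * alpha * ln(2 * m * ln(m))) for a load factor alpha
// and a table of m entries. The formula is the assumed reading of the
// estimate derived in the text.
inline double max_probe_length_estimate(double alpha, std::size_t m)
{
  assert(alpha > 0.0 && m >= 2);
  const double md = static_cast<double>(m);
  return std::sqrt(2.0 * alpha * std::log(2.0 * md * std::log(md)));
}
```
]]></programlisting>
	    <para>Under this reading, a table of 1024 entries at load 0.5
	    gives an estimate slightly above 3, and the estimate grows only
	    slowly with the table size.</para>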
3512
3513	  </section>
3514
3515	  <section xml:id="resize_policies.impl">
3516	    <info><title>Implementation</title></info>
3517
	    <para>This section describes how the library implements the
	    above. It first describes resize policies and
	    their decomposition into trigger and size policies, then
	    describes the predefined classes, and finally discusses controlled
	    access to the policies' internals.</para>
3523
3524	    <section xml:id="resize_policies.impl.decomposition">
3525	      <info><title>Decomposition</title></info>
3526
3527
3528	      <para>Each hash-based container is parametrized by a
3529	      <classname>Resize_Policy</classname> parameter; the container derives
3530	      <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3531	      example:</para>
3532	      <programlisting>
3533		cc_hash_table&lt;typename Key,
3534		typename Mapped,
3535		...
3536		typename Resize_Policy
3537		...&gt; : public Resize_Policy
3538	      </programlisting>
3539
3540	      <para>As a container object is modified, it continuously notifies
3541	      its <classname>Resize_Policy</classname> base of internal changes
3542	      (e.g., collisions encountered and elements being
3543	      inserted). It queries its <classname>Resize_Policy</classname> base whether
3544	      it needs to be resized, and if so, to what size.</para>
3545
3546	      <para>The graphic below shows a (possible) sequence diagram
3547	      of an insert operation. The user inserts an element; the hash
3548	      table notifies its resize policy that a search has started
3549	      (point A); in this case, a single collision is encountered -
3550	      the table notifies its resize policy of this (point B); the
3551	      container finally notifies its resize policy that the search
3552	      has ended (point C); it then queries its resize policy whether
3553	      a resize is needed, and if so, what is the new size (points D
3554	      to G); following the resize, it notifies the policy that a
3555	      resize has completed (point H); finally, the element is
3556	      inserted, and the policy notified (point I).</para>
3557
3558	      <figure>
3559		<title>Insert resize sequence diagram</title>
3560		<mediaobject>
3561		  <imageobject>
3562		    <imagedata align="center" format="PNG" scale="100"
3563			       fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3564		  </imageobject>
3565		  <textobject>
3566		    <phrase>Insert resize sequence diagram</phrase>
3567		  </textobject>
3568		</mediaobject>
3569	      </figure>
3570
3571
	      <para>In practice, a resize policy can usually be decomposed
	      orthogonally into a size policy and a trigger policy. Consequently,
3574	      the library contains a single class for instantiating a resize
3575	      policy: <classname>hash_standard_resize_policy</classname>
3576	      is parametrized by <classname>Size_Policy</classname> and
3577	      <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
3578	      both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
3579	      to these policies.</para>
3580
3581	      <para>The two graphics immediately below show sequence diagrams
3582	      illustrating the interaction between the standard resize policy
3583	      and its trigger and size policies, respectively.</para>
3584
3585	      <figure>
3586		<title>Standard resize policy trigger sequence
3587		diagram</title>
3588		<mediaobject>
3589		  <imageobject>
3590		    <imagedata align="center" format="PNG" scale="100"
3591			       fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3592		  </imageobject>
3593		  <textobject>
3594		    <phrase>Standard resize policy trigger sequence
3595		    diagram</phrase>
3596		  </textobject>
3597		</mediaobject>
3598	      </figure>
3599
3600	      <figure>
3601		<title>Standard resize policy size sequence
3602		diagram</title>
3603		<mediaobject>
3604		  <imageobject>
3605		    <imagedata align="center" format="PNG" scale="100"
3606			       fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3607		  </imageobject>
3608		  <textobject>
3609		    <phrase>Standard resize policy size sequence
3610		    diagram</phrase>
3611		  </textobject>
3612		</mediaobject>
3613	      </figure>
3614
3615
3616	    </section>
3617
3618	    <section xml:id="resize_policies.impl.predefined">
3619	      <info><title>Predefined Policies</title></info>
3620	      <para>The library includes the following
3621	      instantiations of size and trigger policies:</para>
3622
3623	      <orderedlist>
3624		<listitem><para><classname>hash_load_check_resize_trigger</classname>
3625		implements a load check trigger policy.</para></listitem>
3626
3627		<listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3628		implements a collision check trigger policy.</para></listitem>
3629
3630		<listitem><para><classname>hash_exponential_size_policy</classname>
3631		implements an exponential-size policy (which should be used
3632		with mask range hashing).</para></listitem>
3633
		<listitem><para><classname>hash_prime_size_policy</classname>
		implements a size policy based on a sequence of primes
		(which should be used with mod range hashing).</para></listitem>
3638	      </orderedlist>
3639
3640	      <para>The graphic below gives an overall picture of the resize-related
3641	      classes. <classname>basic_hash_table</classname>
3642	      is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3643	      publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3644	      <classname>hash_standard_resize_policy</classname>
3645	      itself is parametrized by <classname>Trigger_Policy</classname> and
3646	      <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3647	      instantiated by <classname>hash_load_check_resize_trigger</classname>,
3648	      or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3649	      <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3650	      or <classname>hash_prime_size_policy</classname>.</para>
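
	      <para>A possible way to combine these classes is sketched
	      below: a collision-chaining table that pairs modulo range
	      hashing with the prime size policy and a load-check trigger.
	      The functor <classname>int_hash</classname>, the typedef
	      <classname>prime_table</classname>, and the helper function are
	      hypothetical names; the sketch assumes a GCC toolchain that
	      ships the <filename>ext/pb_ds</filename> headers.</para>
	      <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Hypothetical hash functor, kept trivial for the example.
struct int_hash
{
  std::size_t operator()(int i) const { return static_cast<std::size_t>(i); }
};

// Modulo range hashing paired with the prime size policy; the load-check
// trigger decides when to resize.
typedef __gnu_pbds::cc_hash_table<
  int,                                             // Key
  char,                                            // Mapped
  int_hash,                                        // Hash_Fn
  std::equal_to<int>,                              // Eq_Fn
  __gnu_pbds::direct_mod_range_hashing<>,          // Comb_Hash_Fn
  __gnu_pbds::hash_standard_resize_policy<
    __gnu_pbds::hash_prime_size_policy,            // Size_Policy
    __gnu_pbds::hash_load_check_resize_trigger<> > // Trigger_Policy
> prime_table;

// Inserts enough elements to force several resizes, then looks some up.
inline bool prime_table_roundtrip()
{
  prime_table t;
  for (int i = 0; i < 1000; ++i)
    t[i] = static_cast<char>('a' + i % 26);
  return t.size() == 1000
    && t.find(42) != t.end()
    && t[42] == 'q'
    && t.find(-1) == t.end();
}
```
]]></programlisting>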
3651
3652	    </section>
3653
3654	    <section xml:id="resize_policies.impl.internals">
	      <info><title>Controlling Access to Internals</title></info>
3656
	      <para>There are cases where (controlled) access to resize
	      policies' internals is beneficial. For example, it is sometimes
	      useful to query a hash table for the table's actual size (as
	      opposed to its <function>size()</function>, the number of values it
	      currently holds); it is sometimes useful to set a table's
	      initial size, externally resize it, or change load factors.</para>
3663
3664	      <para>Clearly, supporting such methods both decreases the
3665	      encapsulation of hash-based containers, and increases the
3666	      diversity between different associative-containers' interfaces.
3667	      Conversely, omitting such methods can decrease containers'
3668	      flexibility.</para>
3669
3670	      <para>In order to avoid, to the extent possible, the above
3671	      conflict, the hash-based containers themselves do not address
3672	      any of these questions; this is deferred to the resize policies,
3673	      which are easier to change or replace. Thus, for example,
3674	      neither <classname>cc_hash_table</classname> nor
3675	      <classname>gp_hash_table</classname>
3676	      contain methods for querying the actual size of the table; this
3677	      is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3678
3679	      <para>Furthermore, the policies themselves are parametrized by
3680	      template arguments that determine the methods they support
3681	      (
3682	      <xref linkend="biblio.alexandrescu01modern"/>
3683	      shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3684	      is parametrized by <classname>External_Size_Access</classname> that
3685	      determines whether it supports methods for querying the actual
3686	      size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3687	      is parametrized by <classname>External_Load_Access</classname> that
3688	      determines whether it supports methods for querying or
3689	      modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3690	      is parametrized by <classname>External_Load_Access</classname> that
3691	      determines whether it supports methods for querying the
3692	      load.</para>
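
	      <para>The sketch below shows how external size access might be
	      enabled. It assumes that passing <literal>true</literal> as the
	      third argument of
	      <classname>hash_standard_resize_policy</classname> exposes
	      <function>get_actual_size</function> and
	      <function>resize</function> on the container; the names
	      <classname>int_hash</classname>,
	      <classname>sized_table</classname>, and
	      <function>actual_size_after_resize</function> are
	      hypothetical.</para>
	      <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Hypothetical hash functor, kept trivial for the example.
struct int_hash
{
  std::size_t operator()(int i) const { return static_cast<std::size_t>(i); }
};

// Mask range hashing with an exponential size policy; the final 'true'
// is the External_Size_Access argument.
typedef __gnu_pbds::cc_hash_table<
  int, char, int_hash, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::hash_standard_resize_policy<
    __gnu_pbds::hash_exponential_size_policy<>,
    __gnu_pbds::hash_load_check_resize_trigger<>,
    true>
> sized_table;

// Resizes an empty table externally and reports the actual table size,
// which is chosen by the size policy and may exceed the request.
inline std::size_t actual_size_after_resize(std::size_t requested)
{
  sized_table t;
  t.resize(requested);
  return t.get_actual_size();
}
```
]]></programlisting>
	      <para>With the third argument left at its default, the same
	      calls are expected to be rejected at compile time, which is the
	      point of parametrizing the access.</para>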
3693
3694	      <para>Some operations, for example, resizing a container at
3695	      run time, or changing the load factors of a load-check trigger
3696	      policy, require the container itself to resize. As mentioned
3697	      above, the hash-based containers themselves do not contain
3698	      these types of methods, only their resize policies.
3699	      Consequently, there must be some mechanism for a resize policy
3700	      to manipulate the hash-based container. As the hash-based
3701	      container is a subclass of the resize policy, this is done
3702	      through virtual methods. Each hash-based container has a
3703	      <classname>private</classname> <classname>virtual</classname> method:</para>
3704	      <programlisting>
3705		virtual void
3706		do_resize
3707		(size_type new_size);
3708	      </programlisting>
3709
3710	      <para>which resizes the container. Implementations of
3711	      <classname>Resize_Policy</classname> can export public methods for resizing
3712	      the container externally; these methods internally call
3713	      <classname>do_resize</classname> to resize the table.</para>
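
	      <para>The mechanism can be sketched in isolation as follows.
	      The classes <classname>sketch_resize_policy</classname> and
	      <classname>sketch_table</classname> are hypothetical and far
	      simpler than the library's; they only show a public policy
	      method reaching the container through a private virtual
	      hook.</para>
	      <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resize policy: exposes a public resize() that forwards to
// a virtual hook implemented by the container.
class sketch_resize_policy
{
public:
  virtual ~sketch_resize_policy() { }

  // Public entry point for external resizing.
  void resize(std::size_t new_size) { do_resize(new_size); }

private:
  // Implemented by the derived container.
  virtual void do_resize(std::size_t new_size) = 0;
};

// Hypothetical container: derives publicly from its policy and overrides
// the private virtual hook. Rehashing of elements is omitted.
template<typename Resize_Policy>
class sketch_table : public Resize_Policy
{
public:
  sketch_table() : m_buckets(8) { }

  std::size_t actual_size() const { return m_buckets.size(); }

private:
  virtual void do_resize(std::size_t new_size) { m_buckets.resize(new_size); }

  std::vector<int> m_buckets;
};
```
]]></programlisting>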
3714
3715
3716	    </section>
3717
3718	  </section>
3719
3720
3721	</section> <!-- resize policies -->
3722
3723	<section xml:id="container.hash.details.policy_interaction">
3724	  <info><title>Policy Interactions</title></info>
	  <para>Hash tables are, unfortunately, especially sensitive to the
	  choice of policies. One of the more complicated aspects of this
	  is that a poor combination of individually good policies can
	  form a poor container. Following are some considerations.</para>
3731
3732	  <section xml:id="policy_interaction.probesizetrigger">
3733	    <info><title>probe/size/trigger</title></info>
3734
3735	    <para>Some combinations do not work well for probing containers.
3736	    For example, combining a quadratic probe policy with an
3737	    exponential size policy can yield a poor container: when an
3738	    element is inserted, a trigger policy might decide that there
3739	    is no need to resize, as the table still contains unused
3740	    entries; the probe sequence, however, might never reach any of
3741	    the unused entries.</para>
3742
	    <para>Unfortunately, this library cannot detect such problems at
	    compile time (doing so is at least as hard as the halting
	    problem). It therefore defines an exception class,
	    <classname>insert_error</classname>, which is thrown in this case.</para>
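
	    <para>The effect can be reproduced with a few lines of code. The
	    sketch below assumes a quadratic probe that adds
	    i<superscript>2</superscript> to the home position, which is an
	    assumption made for illustration; the function
	    <function>reachable_slots</function> is hypothetical.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct slots visited by the probe sequence
// (home + i * i) % table_size for i = 0 .. table_size - 1. Later probes
// repeat the same slots, because (i + table_size)^2 is congruent to i^2
// modulo table_size.
inline std::size_t reachable_slots(std::size_t home, std::size_t table_size)
{
  std::set<std::size_t> seen;
  for (std::size_t i = 0; i < table_size; ++i)
    seen.insert((home + i * i) % table_size);
  return seen.size();
}
```
]]></programlisting>
	    <para>With 8 entries only 3 slots are ever probed, and with 16
	    entries only 4, so an insert can fail while most of the table is
	    still unused.</para>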
3747
3748	  </section>
3749
3750	  <section xml:id="policy_interaction.hashtrigger">
3751	    <info><title>hash/trigger</title></info>
3752
3753	    <para>Some trigger policies are especially susceptible to poor
3754	    hash functions. Suppose, as an extreme case, that the hash
3755	    function transforms each key to the same hash value. After some
3756	    inserts, a collision detecting policy will always indicate that
3757	    the container needs to grow.</para>
3758
3759	    <para>The library, therefore, by design, limits each operation to
3760	    one resize. For each <classname>insert</classname>, for example, it queries
3761	    only once whether a resize is needed.</para>
3762
3763	  </section>
3764
3765	  <section xml:id="policy_interaction.eqstorehash">
3766	    <info><title>equivalence functors/storing hash values/hash</title></info>
3767
3768	    <para><classname>cc_hash_table</classname> and
3769	    <classname>gp_hash_table</classname> are
3770	    parametrized by an equivalence functor and by a
3771	    <classname>Store_Hash</classname> parameter. If the latter parameter is
3772	    <classname>true</classname>, then the container stores with each entry
3773	    a hash value, and uses this value in case of collisions to
	    determine whether to apply the equivalence functor. This can lower the
	    cost of collisions for some types, but increase the cost of
	    collisions for other types.</para>
3777
	    <para>If a ranged-hash function or ranged probe function is
	    directly supplied, however, then it makes no sense to store the
	    hash value with each entry. By design, this library's containers
	    fail to compile if this is attempted.</para>
3782
3783	  </section>
3784
3785	  <section xml:id="policy_interaction.sizeloadtrigger">
3786	    <info><title>size/load-check trigger</title></info>
3787
	    <para>Assume a size policy issues an increasing sequence of sizes
	    a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
	    example, an exponential size policy might issue the sequence of
	    sizes 8, 16, 32, 64, ...</para>
3792
3793	    <para>If a load-check trigger policy is used, with loads
	    α<subscript>min</subscript> and α<subscript>max</subscript>,
	    respectively, then it is a good idea to have:</para>

	    <orderedlist>
	      <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>

	      <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
	    </orderedlist>

	    <para>This will ensure that the amortized hash cost of each
	    modifying operation is at most approximately 3.</para>

	    <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
	    any case, a bad choice, and α<subscript>min</subscript> &gt;
	    α<subscript>max</subscript> is horrendous.</para>
3809
3810	  </section>
3811
3812	</section>
3813
3814      </section> <!-- details -->
3815
3816    </section> <!-- hash -->
3817
3818    <!-- tree -->
3819    <section xml:id="pbds.design.container.tree">
3820      <info><title>tree</title></info>
3821
3822      <section xml:id="container.tree.interface">
3823	<info><title>Interface</title></info>
3824
3825	<para>The tree-based container has the following declaration:</para>
3826	<programlisting>
3827	  template&lt;
3828	  typename Key,
3829	  typename Mapped,
3830	  typename Cmp_Fn = std::less&lt;Key&gt;,
3831	  typename Tag = rb_tree_tag,
3832	  template&lt;
3833	  typename Const_Node_Iterator,
3834	  typename Node_Iterator,
3835	  typename Cmp_Fn_,
3836	  typename Allocator_&gt;
3837	  class Node_Update = null_node_update,
3838	  typename Allocator = std::allocator&lt;char&gt; &gt;
3839	  class tree;
3840	</programlisting>
3841
3842	<para>The parameters have the following meaning:</para>
3843
3844	<orderedlist>
3845	  <listitem>
3846	  <para><classname>Key</classname> is the key type.</para></listitem>
3847
3848	  <listitem>
3849	  <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3850
3851	  <listitem>
	  <para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3853
3854	  <listitem>
3855	    <para><classname>Tag</classname> specifies which underlying data structure
3856	  to use.</para></listitem>
3857
3858	  <listitem>
3859	    <para><classname>Node_Update</classname> is a policy for updating node
3860	  invariants.</para></listitem>
3861
3862	  <listitem>
3863	    <para><classname>Allocator</classname> is an allocator
3864	  type.</para></listitem>
3865	</orderedlist>
3866
3867	<para>The <classname>Tag</classname> parameter specifies which underlying
3868	data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3869	<classname>ov_tree_tag</classname>,
3870	specifies an underlying red-black tree, splay tree, or
3871	ordered-vector tree, respectively; any other tag is illegal.
	Note that containers based on the former two contain more types
	and methods than those based on the latter (e.g.,
	<classname>reverse_iterator</classname> and <classname>rbegin</classname>), and offer different
	exception and invalidation guarantees.</para>
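
	<para>The following sketch instantiates the container with each of
	the three tags. It assumes a GCC release in which the set-forming
	mapped type is spelled <classname>null_type</classname>; the typedef
	and function names are hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A map from int to char backed by a red-black tree (the default Tag).
typedef __gnu_pbds::tree<int, char> rb_map;

// A set of ints backed by a splay tree; null_type as Mapped makes a set.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::splay_tree_tag> splay_set;

// A map backed by an ordered-vector tree.
typedef __gnu_pbds::tree<int, char, std::less<int>,
                         __gnu_pbds::ov_tree_tag> ov_map;

// Inserts three entries out of order and returns the value mapped to the
// smallest key, which iteration visits first.
template<typename Map>
char first_mapped_value()
{
  Map m;
  m.insert(std::make_pair(3, 'c'));
  m.insert(std::make_pair(1, 'a'));
  m.insert(std::make_pair(2, 'b'));
  return m.begin()->second;
}

// Same idea for the set: returns the smallest key.
inline int smallest_key_in_splay_set()
{
  splay_set s;
  s.insert(5);
  s.insert(2);
  s.insert(9);
  return *s.begin();
}
```
]]></programlisting>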
3876
3877      </section>
3878
3879      <section xml:id="container.tree.details">
3880	<info><title>Details</title></info>
3881
3882	<section xml:id="container.tree.node">
3883	  <info><title>Node Invariants</title></info>
3884
3885
	  <para>Consider the two trees in the graphic below, labeled A and B. The first
	  is a tree of floats; the second is a tree of pairs, each
	  signifying a geometric line interval. Each element in a tree is
	  referred to as a node of the tree. Of course, each of
	  these trees can support the usual queries: the first can easily
	  search for <classname>0.4</classname>; the second can easily search for
	  <classname>std::make_pair(10, 41)</classname>.</para>
3892
	  <para>Each of these trees can efficiently support other queries.
	  The first can efficiently determine that the 2nd key in the
	  tree is <constant>0.3</constant>; the second can efficiently determine
	  whether any of its intervals overlaps
	  <classname>std::make_pair(29,42)</classname> (useful in geometric
	  applications or distributed file systems with leases, for
	  example).  It should be noted that a <classname>std::set</classname> can
	  only solve these types of problems with linear complexity.</para>
3901
	  <para>In order to do so, each tree stores some metadata in
	  each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
	  each node the size of the sub-tree rooted at the node; the
	  second stores at each node the maximal endpoint of the
	  intervals in the sub-tree rooted at the node.</para>
3907
3908	  <figure>
3909	    <title>Tree node invariants</title>
3910	    <mediaobject>
3911	      <imageobject>
3912		<imagedata align="center" format="PNG" scale="100"
3913			   fileref="../images/pbds_tree_node_invariants.png"/>
3914	      </imageobject>
3915	      <textobject>
3916		<phrase>Tree node invariants</phrase>
3917	      </textobject>
3918	    </mediaobject>
3919	  </figure>
3920	
3921	  <para>Supporting such trees is difficult for a number of
3922	  reasons:</para>
3923
3924	  <orderedlist>
3925	    <listitem><para>There must be a way to specify what a node's metadata
3926	    should be (if any).</para></listitem>
3927
	    <listitem><para>Various operations can invalidate node
	    invariants.  The graphic below shows how a right rotation,
	    performed on A, results in B, with nodes x and y having
	    corrupted invariants (the grayed nodes in C). It also shows
	    how an insert, performed on D, results in E, with nodes x and y
	    having corrupted invariants (the grayed nodes in F). It is not
	    feasible to know outside the tree the effect of an operation on
	    the nodes of the tree.</para></listitem>
3936
3937	    <listitem><para>The search paths of standard associative containers are
3938	    defined by comparisons between keys, and not through
3939	    metadata.</para></listitem>
3940
3941	    <listitem><para>It is not feasible to know in advance which methods trees
3942	    can support. Besides the usual <classname>find</classname> method, the
3943	    first tree can support a <classname>find_by_order</classname> method, while
3944	    the second can support an <classname>overlaps</classname> method.</para></listitem>
3945	  </orderedlist>
3946
3947	  <figure>
3948	    <title>Tree node invalidation</title>
3949	    <mediaobject>
3950	      <imageobject>
3951		<imagedata align="center" format="PNG" scale="100"
3952			   fileref="../images/pbds_tree_node_invalidations.png"/>
3953	      </imageobject>
3954	      <textobject>
3955		<phrase>Tree node invalidation</phrase>
3956	      </textobject>
3957	    </mediaobject>
3958	  </figure>
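
	  <para>The rotation case can be made concrete with a hand-written
	  node that stores its sub-tree size. This is a plain C++ sketch
	  and not the library's node type; <classname>size_node</classname>,
	  <function>update</function>, and
	  <function>rotate_right</function> are hypothetical names.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node storing the size of its sub-tree as metadata.
struct size_node
{
  size_node* left;
  size_node* right;
  std::size_t subtree_size;
};

inline std::size_t size_of(const size_node* n)
{
  return n ? n->subtree_size : 0;
}

// Restores the invariant of one node, assuming its children are valid.
inline void update(size_node* n)
{
  n->subtree_size = 1 + size_of(n->left) + size_of(n->right);
}

// Right rotation around y; x (the left child of y) becomes the root of
// the sub-tree. Only x and y change children, so only their invariants
// need restoring, the lower node (y) first.
inline size_node* rotate_right(size_node* y)
{
  size_node* x = y->left;
  y->left = x->right;
  x->right = y;
  update(y);
  update(x);
  return x;
}
```
]]></programlisting>
	  <para>Omitting the two <function>update</function> calls leaves x
	  and y with stale sizes, which is the corruption the graphic
	  depicts.</para>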
3959
3960	  <para>These problems are solved by a combination of two means:
3961	  node iterators, and template-template node updater
3962	  parameters.</para>
3963
3964	  <section xml:id="container.tree.node.iterators">
3965	    <info><title>Node Iterators</title></info>
3966
3967
	    <para>Each tree-based container defines two additional iterator
	    types, <classname>const_node_iterator</classname>
	    and <classname>node_iterator</classname>.
	    These iterators allow descending from a node to one of its
	    children. Node iterators allow search paths different from those
	    determined by the comparison functor. The <classname>tree</classname>
	    supports the methods:</para>
3975	    <programlisting>
3976	      const_node_iterator
3977	      node_begin() const;
3978
3979	      node_iterator
3980	      node_begin();
3981
3982	      const_node_iterator
3983	      node_end() const;
3984
3985	      node_iterator
3986	      node_end();
3987	    </programlisting>
3988
	    <para>The first pair returns node iterators corresponding to the
	    root node of the tree; the latter pair returns node iterators
	    corresponding to a just-after-leaf node.</para>
3992	  </section>
3993
3994	  <section xml:id="container.tree.node.updator">
	    <info><title>Node Updater</title></info>
3996
3997	    <para>The tree-based containers are parametrized by a
3998	    <classname>Node_Update</classname> template-template parameter. A
3999	    tree-based container instantiates
4000	    <classname>Node_Update</classname> to some
4001	    <classname>node_update</classname> class, and publicly subclasses
4002	    <classname>node_update</classname>. The graphic below shows this
4003	    scheme, as well as some predefined policies (which are explained
4004	    below).</para>
4005
4006	    <figure>
4007	      <title>A tree and its update policy</title>
4008	      <mediaobject>
4009		<imageobject>
4010		  <imagedata align="center" format="PNG" scale="100"
4011			     fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4012		</imageobject>
4013		<textobject>
4014		  <phrase>A tree and its update policy</phrase>
4015		</textobject>
4016	      </mediaobject>
4017	    </figure>
4018
4019	    <para><classname>node_update</classname> (an instantiation of
4020	    <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4021	    the type of metadata it requires. For order statistics,
4022	    e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4023	    The tree defines within each node a <classname>metadata_type</classname>
4024	    object.</para>
4025
4026	    <para><classname>node_update</classname> must also define the following method
4027	    for restoring node invariants:</para>
4028	    <programlisting>
4029	      void
4030	      operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4031	    </programlisting>
4032
	    <para>In this method, <varname>nd_it</varname> is a
	    <classname>node_iterator</classname> corresponding to a node whose
	    A) descendants all have valid invariants, and B) own
	    invariants might be violated; <classname>end_nd_it</classname> is
	    a <classname>const_node_iterator</classname> corresponding to a
	    just-after-leaf node. This method should correct the node
	    invariants of the node pointed to by
	    <classname>nd_it</classname>. For example, say node x in the
	    graphic below, labeled A, has an invalid invariant, but its children,
	    y and z, have valid invariants. After the invocation, all three
	    nodes should have valid invariants, as in label B.</para>
4044
4045
4046	    <figure>
4047	      <title>Restoring node invariants</title>
4048	      <mediaobject>
4049		<imageobject>
4050		  <imagedata align="center" format="PNG" scale="100"
4051			     fileref="../images/pbds_restoring_node_invariants.png"/>
4052		</imageobject>
4053		<textobject>
4054		  <phrase>Restoring node invariants</phrase>
4055		</textobject>
4056	      </mediaobject>
4057	    </figure>
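
	    <para>A user-defined policy following this protocol might look
	    like the sketch below, in which each node's metadata is the sum
	    of the mapped values in its sub-tree. The policy
	    <classname>subtree_sum_update</classname>, the typedef
	    <classname>sum_tree</classname>, and the helper
	    <function>total_of</function> are hypothetical; the sketch
	    assumes the node-iterator members
	    <function>get_l_child</function>,
	    <function>get_r_child</function>, and
	    <function>get_metadata</function> provided by GCC's
	    <filename>ext/pb_ds</filename> headers.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// Hypothetical policy: the metadata of a node is the sum of the mapped
// values stored in the sub-tree rooted at that node.
template<typename Node_CItr, typename Node_Itr,
         typename Cmp_Fn, typename Alloc>
struct subtree_sum_update
{
  typedef long metadata_type;

  // Recomputes the metadata of the node at nd_it from its own entry and
  // from its children, whose metadata is assumed to be valid already.
  void operator()(Node_Itr nd_it, Node_CItr end_nd_it)
  {
    long sum = (*nd_it)->second;
    Node_Itr l = nd_it.get_l_child();
    if (l != end_nd_it)
      sum += l.get_metadata();
    Node_Itr r = nd_it.get_r_child();
    if (r != end_nd_it)
      sum += r.get_metadata();
    const_cast<long&>(nd_it.get_metadata()) = sum;
  }

  virtual ~subtree_sum_update() { }
};

typedef __gnu_pbds::tree<int, int, std::less<int>,
                         __gnu_pbds::rb_tree_tag,
                         subtree_sum_update> sum_tree;

// Reads the metadata at the root, i.e. the sum over the whole tree.
inline long total_of(const sum_tree& t)
{
  return t.node_begin() == t.node_end() ? 0 : t.node_begin().get_metadata();
}
```
]]></programlisting>
	    <para>Note that the metadata is refreshed only by operations of
	    the tree itself; writing to a mapped value through an iterator
	    would leave the stored sums stale.</para>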
4058
	    <para>When a tree operation might invalidate some node invariant,
	    it invokes this method in its <classname>node_update</classname> base to
	    restore the invariant. For example, the graphic below shows
	    an <function>insert</function> operation (point A); the tree performs some
	    operations, and calls the update functor three times (points B,
	    C, and D). (It is well known that any <function>insert</function>,
	    <function>erase</function>, <function>split</function>, or <function>join</function> can restore
	    all node invariants by a small number of node invariant updates;
	    see <xref linkend="biblio.clrs2001"/>.)</para>
4068
4069	    <figure>
4070	      <title>Insert update sequence</title>
4071	      <mediaobject>
4072		<imageobject>
4073		  <imagedata align="center" format="PNG" scale="100"
4074			     fileref="../images/pbds_update_seq_diagram.png"/>
4075		</imageobject>
4076		<textobject>
4077		  <phrase>Insert update sequence</phrase>
4078		</textobject>
4079	      </mediaobject>
4080	    </figure>
4081
4082	    <para>To complete the description of the scheme, three questions
4083	    need to be answered:</para>
4084
4085	    <orderedlist>
4086	      <listitem><para>How can a tree which supports order statistics define a
4087	      method such as <classname>find_by_order</classname>?</para></listitem>
4088
4089	      <listitem><para>How can the node updater base access methods of the
4090	      tree?</para></listitem>
4091
4092	      <listitem><para>How can the following cyclic dependency be resolved?
4093	      <classname>node_update</classname> is a base class of the tree, yet it
4094	      uses node iterators defined in the tree (its child).</para></listitem>
4095	    </orderedlist>
4096
4097	    <para>The first two questions are answered by the fact that
4098	    <classname>node_update</classname> (an instantiation of
4099	    <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4100	    of the tree. Consequently:</para>
4101
4102	    <orderedlist>
4103	      <listitem><para>Any public methods of
4104	      <classname>node_update</classname> are automatically methods of
4105	      the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4106	      Thus an order-statistics node updater,
4107	      <classname>tree_order_statistics_node_update</classname> defines
4108	      the <function>find_by_order</function> method; any tree
4109	      instantiated by this policy consequently supports this method as
4110	      well.</para></listitem>
4111
	      <listitem><para>In C++, if a base class declares a method as
	      <literal>virtual</literal>, it is
	      <literal>virtual</literal> in its subclasses. If
	      <classname>node_update</classname> needs to access one of the
	      tree's methods, say the member function
	      <function>end</function>, it simply declares that method as
	      pure <literal>virtual</literal>; the tree's own definition
	      then overrides it.</para></listitem>
4119	    </orderedlist>
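
	    <para>The first point can be seen from the user's side. The
	    following is a minimal sketch, assuming a libstdc++ release in
	    which the set-style mapped type is spelled
	    <classname>null_type</classname>; the helper names
	    <function>kth_smallest</function> and
	    <function>rank_of</function> are hypothetical. Both
	    <function>find_by_order</function> and
	    <function>order_of_key</function> come from the policy, yet
	    they are called on the tree.</para>
	    <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A set of ints whose Node_Update policy is the order-statistics updater.
// null_type as the mapped type makes this a set rather than a map
// (assumption: older releases used a different name for this type).
typedef __gnu_pbds::tree<
    int,
    __gnu_pbds::null_type,
    std::less<int>,
    __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update> ordered_set;

// Hypothetical helper: the k-th smallest key (0-based); requires k < s.size().
inline int kth_smallest(const ordered_set& s, std::size_t k)
{
  // find_by_order is defined by the policy and inherited by the tree.
  return *s.find_by_order(k);
}

// Hypothetical helper: the number of keys strictly smaller than key.
inline std::size_t rank_of(const ordered_set& s, int key)
{
  return s.order_of_key(key);
}
```
]]></programlisting>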
4120
4121	    <para>The cyclic dependency is solved through template-template
4122	    parameters. <classname>Node_Update</classname> is parametrized by
4123	    the tree's node iterators, its comparison functor, and its
4124	    allocator type. Thus, instantiations of
4125	    <classname>Node_Update</classname> have all information
4126	    required.</para>
4127
	    <para>This library assumes that constructing a metadata object and
	    modifying it are exception free. The reason is that recovery
	    would be unusually complex: if, during some method,
	    say <function>insert</function>, a metadata-related operation
	    (e.g., changing the value of a metadata object) threw an exception,
	    the method would have to be rolled back part-way through its updates.</para>
4133
4134	    <para>Previously, a distinction was made between redundant
4135	    policies and null policies. Node invariants show a
4136	    case where null policies are required.</para>
4137
	    <para>Assume a regular tree is required, one which need not
	    support order statistics or interval overlap queries.
	    Seemingly, in this case a redundant policy, that is, a policy which
	    doesn't affect nodes' contents, would suffice. This would lead
	    to the following drawbacks:</para>
4143
4144	    <orderedlist>
4145	      <listitem><para>Each node would carry a useless metadata object, wasting
4146	      space.</para></listitem>
4147
	      <listitem><para>The tree cannot know if its
	      <classname>Node_Update</classname> policy actually modifies a
	      node's metadata (in general, deciding this is as hard as the
	      halting problem). In the graphic
	      below, assume the shaded node is inserted. The tree would have
	      to traverse the useless path shown to the root, applying
	      redundant updates all the way.</para></listitem>
4154	    </orderedlist>
4155	    <figure>
4156	      <title>Useless update path</title>
4157	      <mediaobject>
4158		<imageobject>
4159		  <imagedata align="center" format="PNG" scale="100"
4160			     fileref="../images/pbds_rationale_null_node_updator.png"/>
4161		</imageobject>
4162		<textobject>
4163		  <phrase>Useless update path</phrase>
4164		</textobject>
4165	      </mediaobject>
4166	    </figure>
4167
4168
	    <para>A null policy class, <classname>null_node_update</classname>,
	    solves both these problems. The tree detects that node
	    invariants are irrelevant, and adjusts its definitions
	    accordingly.</para>
4172
4173	  </section>
4174
4175	</section>
4176
4177	<section xml:id="container.tree.details.split">
4178	  <info><title>Split and Join</title></info>
4179
4180	  <para>Tree-based containers support split and join methods.
4181	  It is possible to split a tree so that it passes
4182	  all nodes with keys larger than a given key to a different
4183	  tree. These methods have the following advantages over the
4184	  alternative of externally inserting to the destination
4185	  tree and erasing from the source tree:</para>
4186
4187	  <orderedlist>
4188	    <listitem><para>These methods are efficient - red-black trees are split
4189	    and joined in poly-logarithmic complexity; ordered-vector
4190	    trees are split and joined at linear complexity. The
4191	    alternatives have super-linear complexity.</para></listitem>
4192
4193	    <listitem><para>Aside from orders of growth, these operations perform
4194	    few allocations and de-allocations. For red-black trees, allocations are not performed,
4195	    and the methods are exception-free. </para></listitem>
4196	  </orderedlist>
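
	  <para>As a minimal sketch of how these methods are called,
	  assuming the default red-black tree and the
	  <classname>null_type</classname> spelling of the set-style
	  mapped type (the helper names are hypothetical):
	  <function>split</function> moves the larger keys into another
	  tree, and <function>join</function> moves them back, provided
	  the key ranges of the two trees do not overlap.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// A set of ints with the default comparison, tag and node-update policy.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type> int_set;

// Hypothetical helper: moves every key greater than pivot from src to dst.
inline void move_larger(int_set& src, int pivot, int_set& dst)
{
  src.split(pivot, dst);
}

// Hypothetical helper: moves all keys of hi into lo, leaving hi empty.
// The key ranges of lo and hi are assumed not to overlap.
inline void merge_back(int_set& lo, int_set& hi)
{
  lo.join(hi);
}
```
]]></programlisting>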
4197	</section>
4198
4199      </section> <!-- details -->
4200
4201    </section> <!-- tree -->
4202
4203    <!-- trie -->
4204    <section xml:id="pbds.design.container.trie">
4205      <info><title>Trie</title></info>
4206
4207      <section xml:id="container.trie.interface">
4208	<info><title>Interface</title></info>
4209
4210	<para>The trie-based container has the following declaration:</para>
4211	<programlisting>
4212	  template&lt;typename Key,
4213	  typename Mapped,
	  typename E_Access_Traits = /* default element-access traits for Key */,
4215	  typename Tag = pat_trie_tag,
4216	  template&lt;typename Const_Node_Iterator,
4217	  typename Node_Iterator,
4218	  typename E_Access_Traits_,
4219	  typename Allocator_&gt;
4220	  class Node_Update = null_node_update,
4221	  typename Allocator = std::allocator&lt;char&gt; &gt;
4222	  class trie;
4223	</programlisting>
4224
4225	<para>The parameters have the following meaning:</para>
4226
4227	<orderedlist>
4228	  <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4229
4230	  <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4231
	  <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4233
4234	  <listitem><para><classname>Tag</classname> specifies which underlying data structure
4235	  to use, and is described shortly.</para></listitem>
4236
4237	  <listitem><para><classname>Node_Update</classname> is a policy for updating node
4238	  invariants. This is described below.</para></listitem>
4239
4240	  <listitem><para><classname>Allocator</classname> is an allocator
4241	  type.</para></listitem>
4242	</orderedlist>
4243
	<para>The <classname>Tag</classname> parameter specifies which underlying
	data structure to use. Instantiating it by <classname>pat_trie_tag</classname> specifies an
	underlying PATRICIA trie (explained shortly); any other tag is
	currently illegal.</para>
4248
4249	<para>Following is a description of a (PATRICIA) trie
4250	(this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4251	<xref linkend="biblio.filliatre2000ptset"/>).
4252	</para>
4253
4254	<para>A (PATRICIA) trie is similar to a tree, but with the
4255	following differences:</para>
4256
4257	<orderedlist>
4258	  <listitem><para>It explicitly views keys as a sequence of elements.
4259	  E.g., a trie can view a string as a sequence of
4260	  characters; a trie can view a number as a sequence of
4261	  bits.</para></listitem>
4262
4263	  <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4264	  + 1, where n is the number of distinct
4265	  elements.</para></listitem>
4266
4267	  <listitem><para>It stores values only at leaf nodes.</para></listitem>
4268
	  <listitem><para>Internal nodes have the properties that A) each has at
	  least two children, and B) each shares the same prefix with
	  all of its descendants.</para></listitem>
4272	</orderedlist>
4273
4274	<para>A (PATRICIA) trie has some useful properties:</para>
4275
4276	<orderedlist>
	  <listitem><para>It can be configured to use large node fan-out, giving it
	  very efficient find performance (albeit at the expense of
	  insertion complexity and size).</para></listitem>

	  <listitem><para>It works well for common-prefix keys.</para></listitem>

	  <listitem><para>It can efficiently support queries such as which
	  keys match a certain prefix. This is sometimes useful in file
	  systems and routers, and for "type-ahead" (predictive text) matching
	  on mobile devices.</para></listitem>
4287	</orderedlist>
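
	<para>The prefix query mentioned in the last item can be sketched
	as follows. This is a minimal example under stated assumptions:
	it uses the names found in recent libstdc++ releases
	(<classname>null_type</classname> and
	<classname>trie_string_access_traits</classname>), which may be
	spelled differently in older ones, and the helper
	<function>count_with_prefix</function> is hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>

// A set of strings stored in a PATRICIA trie whose node-update policy
// supports prefix searches.
typedef __gnu_pbds::trie<
    std::string,
    __gnu_pbds::null_type,
    __gnu_pbds::trie_string_access_traits<>,
    __gnu_pbds::pat_trie_tag,
    __gnu_pbds::trie_prefix_search_node_update> string_trie;

// Hypothetical helper: counts the keys that start with prefix.
inline std::size_t count_with_prefix(string_trie& t, const std::string& prefix)
{
  // prefix_range is supplied by the node-update policy; it returns the
  // range of entries whose keys match the prefix.
  std::pair<string_trie::iterator, string_trie::iterator> range =
      t.prefix_range(prefix);
  std::size_t n = 0;
  for (string_trie::iterator it = range.first; it != range.second; ++it)
    ++n;
  return n;
}
```
]]></programlisting>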
4288
4289
4290      </section>
4291
4292      <section xml:id="container.trie.details">
4293	<info><title>Details</title></info>
4294
4295	<section xml:id="container.trie.details.etraits">
4296	  <info><title>Element Access Traits</title></info>
4297
	  <para>A trie inherently views its keys as sequences of elements.
	  For example, a trie can view a string as a sequence of
	  characters. A trie needs to map each of n elements to a
	  number in the range 0 to n - 1. For example, a trie can map a
	  character <varname>c</varname> to
	  <code>static_cast&lt;size_t&gt;(c)</code>.</para>
4304
4305	  <para>Seemingly, then, a trie can assume that its keys support
4306	  (const) iterators, and that the <classname>value_type</classname> of this
4307	  iterator can be cast to a <classname>size_t</classname>. There are several
4308	  reasons, though, to decouple the mechanism by which the trie
4309	  accesses its keys' elements from the trie:</para>
4310
4311	  <orderedlist>
4312	    <listitem><para>In some cases, the numerical value of an element is
4313	    inappropriate. Consider a trie storing DNA strings. It is
4314	    logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4315	    'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4316
4317	    <listitem><para>In some cases the keys' iterators are different than what
4318	    is needed. For example, a trie can be used to search for
4319	    common suffixes, by using strings'
4320	    <classname>reverse_iterator</classname>. As another example, a trie mapping
4321	    UNICODE strings would have a huge fan-out if each node would
4322	    branch on a UNICODE character; instead, one can define an
4323	    iterator iterating over 8-bit (or less) groups.</para></listitem>
4324	  </orderedlist>
4325
	  <para>The trie is, consequently, parametrized by
	  <classname>E_Access_Traits</classname>, a traits class that
	  specifies how to access the elements of a sequence.
	  <classname>string_trie_e_access_traits</classname>
	  is such a traits class for strings. Each traits class defines some
	  types; for example,</para>
	  <programlisting>
	    typename E_Access_Traits::const_iterator
	  </programlisting>

	  <para>is a const iterator iterating over a key's elements. The
	  traits class must also define methods for obtaining iterators
	  to the first and last element of a key.</para>
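
	  <para>To make the DNA case concrete, here is a minimal sketch of
	  the kind of information such a traits class carries. The class
	  <classname>dna_access_traits_sketch</classname> and all of its
	  member names are hypothetical; it illustrates the shape of the
	  idea and is not claimed to satisfy the exact interface this
	  library requires of <classname>E_Access_Traits</classname>.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical traits class for keys that are DNA strings.
struct dna_access_traits_sketch
{
  typedef std::string key_type;
  typedef std::string::const_iterator const_iterator;

  // Four distinct elements, so a trie would use a fan-out of 4 + 1.
  enum { max_size = 4 };

  // Iterator to the first element of a key.
  static const_iterator begin(const key_type& key) { return key.begin(); }

  // Iterator one past the last element of a key.
  static const_iterator end(const key_type& key) { return key.end(); }

  // Maps an element to its position in the range 0 to max_size - 1.
  // Any other character yields max_size, used here as an "invalid" marker.
  static std::size_t e_pos(char e)
  {
    switch (e)
      {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      case 'T': return 3;
      default:  return max_size;
      }
  }
};
```
]]></programlisting>
	  <para>With this mapping, 'T' is assigned 3 regardless of its
	  numerical character value.</para>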
4339
4340	  <para>The graphic below shows a
4341	  (PATRICIA) trie resulting from inserting the words: "I wish
4342	  that I could ever see a poem lovely as a trie" (which,
4343	  unfortunately, does not rhyme).</para>
4344
	  <para>The leaf nodes contain values; each internal node contains
	  two <classname>typename E_Access_Traits::const_iterator</classname>
	  objects, indicating the maximal common prefix of all keys in
	  the sub-tree. For example, the shaded internal node roots a
	  sub-tree with leaves "a" and "as". The maximal common prefix is
	  "a". The internal node contains, consequently, two const
	  iterators, one pointing to <varname>'a'</varname>, and the other to
	  <varname>'s'</varname>.</para>
4353
4354	  <figure>
4355	    <title>A PATRICIA trie</title>
4356	    <mediaobject>
4357	      <imageobject>
4358		<imagedata align="center" format="PNG" scale="100"
4359			   fileref="../images/pbds_pat_trie.png"/>
4360	      </imageobject>
4361	      <textobject>
4362		<phrase>A PATRICIA trie</phrase>
4363	      </textobject>
4364	    </mediaobject>
4365	  </figure>
4366
4367	</section>
4368
4369	<section xml:id="container.trie.details.node">
4370	  <info><title>Node Invariants</title></info>
4371
	  <para>Trie-based containers support node invariants, as do
	  tree-based containers. There are two minor
	  differences, though, which, unfortunately, prevent them from
	  sharing the same node-updating policies:</para>
4376
4377	  <orderedlist>
4378	    <listitem>
4379	      <para>A trie's <classname>Node_Update</classname> template-template
4380	      parameter is parametrized by <classname>E_Access_Traits</classname>, while
4381	      a tree's <classname>Node_Update</classname> template-template parameter is
4382	    parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4383
	    <listitem><para>Tree-based containers store values in all nodes, while
	    trie-based containers (at least in this implementation) store
	    values in leaves.</para></listitem>
4387	  </orderedlist>
4388
4389	  <para>The graphic below shows the scheme, as well as some predefined
4390	  policies (which are explained below).</para>
4391
4392	  <figure>
4393	    <title>A trie and its update policy</title>
4394	    <mediaobject>
4395	      <imageobject>
4396		<imagedata align="center" format="PNG" scale="100"
4397			   fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4398	      </imageobject>
4399	      <textobject>
4400		<phrase>A trie and its update policy</phrase>
4401	      </textobject>
4402	    </mediaobject>
4403	  </figure>
4404
4405
4406	  <para>This library offers the following pre-defined trie node
4407	  updating policies:</para>
4408
4409	  <orderedlist>
4410	    <listitem>
4411	      <para>
4412		<classname>trie_order_statistics_node_update</classname>
4413		supports order statistics.
4414	      </para>
4415	    </listitem>
4416
4417	    <listitem><para><classname>trie_prefix_search_node_update</classname>
4418	    supports searching for ranges that match a given prefix.</para></listitem>
4419
4420	    <listitem><para><classname>null_node_update</classname>
4421	    is the null node updater.</para></listitem>
4422	  </orderedlist>
4423
4424	</section>
4425
4426	<section xml:id="container.trie.details.split">
4427	  <info><title>Split and Join</title></info>
	  <para>Trie-based containers support split and join methods; the
	  rationale is the same as for tree-based containers supporting
	  these methods.</para>
4431	</section>
4432
4433      </section> <!-- details -->
4434
4435    </section> <!-- trie -->
4436
4437    <!-- list_update -->
4438    <section xml:id="pbds.design.container.list">
4439      <info><title>List</title></info>
4440
4441      <section xml:id="container.list.interface">
4442	<info><title>Interface</title></info>
4443
4444	<para>The list-based container has the following declaration:</para>
4445	<programlisting>
4446	  template&lt;typename Key,
4447	  typename Mapped,
4448	  typename Eq_Fn = std::equal_to&lt;Key&gt;,
	  typename Update_Policy = lu_move_to_front_policy&lt;&gt;,
4450	  typename Allocator = std::allocator&lt;char&gt; &gt;
4451	  class list_update;
4452	</programlisting>
4453
4454	<para>The parameters have the following meaning:</para>
4455
4456	<orderedlist>
4457	  <listitem>
4458	    <para>
4459	      <classname>Key</classname> is the key type.
4460	    </para>
4461	  </listitem>
4462
4463	  <listitem>
4464	    <para>
4465	      <classname>Mapped</classname> is the mapped-policy.
4466	    </para>
4467	  </listitem>
4468
4469	  <listitem>
4470	    <para>
4471	      <classname>Eq_Fn</classname> is a key equivalence functor.
4472	    </para>
4473	  </listitem>
4474
4475	  <listitem>
4476	    <para>
4477	      <classname>Update_Policy</classname> is a policy updating positions in
4478	      the list based on access patterns. It is described in the
4479	      following subsection.
4480	    </para>
4481	  </listitem>
4482
4483	  <listitem>
4484	    <para>
4485	      <classname>Allocator</classname> is an allocator type.
4486	    </para>
4487	  </listitem>
4488	</orderedlist>
4489
4490	<para>A list-based associative container is a container that
4491	stores elements in a linked-list. It does not order the elements
4492	by any particular order related to the keys.  List-based
4493	containers are primarily useful for creating "multimaps". In fact,
4494	list-based containers are designed in this library expressly for
4495	this purpose.</para>
4496
4497	<para>List-based containers might also be useful for some rare
4498	cases, where a key is encapsulated to the extent that only
4499	key-equivalence can be tested. Hash-based containers need to know
4500	how to transform a key into a size type, and tree-based containers
4501	need to know if some key is larger than another.  List-based
4502	associative containers, conversely, only need to know if two keys
4503	are equivalent.</para>
4504
	<para>Since a list-based associative container does not order
	elements by keys, is it possible to order the list in some
	useful manner? Remarkably, many on-line competitive
	algorithms exist for reordering lists to reflect access
	prediction (see <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
	</para>
4511
4512      </section>
4513
4514      <section xml:id="container.list.details">
4515	<info><title>Details</title></info>
4516	<para>
4517	</para>
4518	<section xml:id="container.list.details.ds">
4519	  <info><title>Underlying Data Structure</title></info>
4520
4521	  <para>The graphic below shows a
4522	  simple list of integer keys. If we search for the integer 6, we
4523	  are paying an overhead: the link with key 6 is only the fifth
4524	  link; if it were the first link, it could be accessed
4525	  faster.</para>
4526
4527	  <figure>
4528	    <title>A simple list</title>
4529	    <mediaobject>
4530	      <imageobject>
4531		<imagedata align="center" format="PNG" scale="100"
4532			   fileref="../images/pbds_simple_list.png"/>
4533	      </imageobject>
4534	      <textobject>
4535		<phrase>A simple list</phrase>
4536	      </textobject>
4537	    </mediaobject>
4538	  </figure>
4539
4540	  <para>List-update algorithms reorder lists as elements are
4541	  accessed. They try to determine, by the access history, which
4542	  keys to move to the front of the list. Some of these algorithms
4543	  require adding some metadata alongside each entry.</para>
4544
4545	  <para>For example, in the graphic below label A shows the counter
4546	  algorithm. Each node contains both a key and a count metadata
4547	  (shown in bold). When an element is accessed (e.g. 6) its count is
4548	  incremented, as shown in label B. If the count reaches some
4549	  predetermined value, say 10, as shown in label C, the count is set
4550	  to 0 and the node is moved to the front of the list, as in label
4551	  D.
4552	  </para>
4553
4554	  <figure>
4555	    <title>The counter algorithm</title>
4556	    <mediaobject>
4557	      <imageobject>
4558		<imagedata align="center" format="PNG" scale="100"
4559			   fileref="../images/pbds_list_update.png"/>
4560	      </imageobject>
4561	      <textobject>
4562		<phrase>The counter algorithm</phrase>
4563	      </textobject>
4564	    </mediaobject>
4565	  </figure>
4566
4567
4568	</section>
4569
4570	<section xml:id="container.list.details.policies">
4571	  <info><title>Policies</title></info>
4572
	  <para>This library allows instantiating lists with policies
	  implementing any algorithm moving nodes to the front of the
	  list (policies implementing algorithms interchanging nodes are
	  unsupported).</para>

	  <para>Associative containers based on lists are parametrized by an
	  <classname>Update_Policy</classname> parameter. This parameter defines the
4580	  type of metadata each node contains, how to create the
4581	  metadata, and how to decide, using this metadata, whether to
4582	  move a node to the front of the list. A list-based associative
4583	  container object derives (publicly) from its update policy.
4584	  </para>
4585
4586	  <para>An instantiation of <classname>Update_Policy</classname> must define
4587	  internally <classname>update_metadata</classname> as the metadata it
4588	  requires. Internally, each node of the list contains, besides
4589	  the usual key and data, an instance of <classname>typename
4590	  Update_Policy::update_metadata</classname>.</para>
4591
4592	  <para>An instantiation of <classname>Update_Policy</classname> must define
4593	  internally two operators:</para>
4594	  <programlisting>
4595	    update_metadata
4596	    operator()();
4597
4598	    bool
4599	    operator()(update_metadata &amp;);
4600	  </programlisting>
4601
	  <para>The first is called by the container object, when creating a
	  new node, to create the node's metadata. The second is called
	  by the container object, when a node is accessed (that is,
	  when a find operation's key is equivalent to the key of the
	  node), to determine whether to move the node to the front of
	  the list.
	  </para>
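
	  <para>The two operators can be illustrated with a stand-alone
	  sketch of the counter algorithm described earlier. The class
	  <classname>counter_update_policy_sketch</classname> and its
	  threshold of 10 are hypothetical; this is not the library's own
	  counter policy, only an illustration of the protocol between
	  container and policy.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical update policy implementing the counter algorithm.
struct counter_update_policy_sketch
{
  // The metadata stored alongside each list node.
  struct update_metadata
  {
    std::size_t count;
  };

  // Number of accesses after which a node is moved to the front.
  enum { max_count = 10 };

  // Called when a node is created: returns fresh metadata.
  update_metadata operator()() const
  {
    update_metadata m = { 0 };
    return m;
  }

  // Called when a node is accessed: returns true if the node should be
  // moved to the front of the list, resetting its count in that case.
  bool operator()(update_metadata& m) const
  {
    if (++m.count < static_cast<std::size_t>(max_count))
      return false;
    m.count = 0;
    return true;
  }
};
```
]]></programlisting>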
4609
	  <para>The library contains two predefined implementations of
	  list-update policies. The first
	  is <classname>lu_counter_policy</classname>, which implements the
	  counter algorithm described above. The second is
	  <classname>lu_move_to_front_policy</classname>,
	  which unconditionally moves an accessed element to the front of
	  the list. The latter type is very useful in this library,
	  since there is no need to associate metadata with each element
	  (see <xref linkend="biblio.andrew04mtf"/>).
	  </para>
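
	  <para>A minimal sketch of instantiating a list-based container
	  with each predefined policy follows. The typedef names and the
	  helper <function>lookup_or</function> are hypothetical, and the
	  threshold of 3 passed to
	  <classname>lu_counter_policy</classname> is an arbitrary choice
	  for illustration.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/list_update_policy.hpp>

// A list-based map that moves a node to the front after 3 accesses.
typedef __gnu_pbds::list_update<
    std::string, int, std::equal_to<std::string>,
    __gnu_pbds::lu_counter_policy<3> > counted_list;

// A list-based map that moves a node to the front on every access.
typedef __gnu_pbds::list_update<
    std::string, int, std::equal_to<std::string>,
    __gnu_pbds::lu_move_to_front_policy<> > mtf_list;

// Hypothetical helper: the value mapped to key, or fallback if absent.
// The find call is what gives the policy a chance to reorder the list.
template<typename List>
int lookup_or(List& l, const std::string& key, int fallback)
{
  typename List::point_iterator it = l.find(key);
  return it == l.end() ? fallback : it->second;
}
```
]]></programlisting>
	  <para>The interface is the same for both policies; only the
	  internal ordering of the list differs.</para>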
4620
4621	</section>
4622
4623	<section xml:id="container.list.details.mapped">
4624	  <info><title>Use in Multimaps</title></info>
4625
4626	  <para>In this library, there are no equivalents for the standard's
4627	  multimaps and multisets; instead one uses an associative
4628	  container mapping primary keys to secondary keys.</para>
4629
4630	  <para>List-based containers are especially useful as associative
4631	  containers for secondary keys. In fact, they are implemented
4632	  here expressly for this purpose.</para>
4633
4634	  <para>To begin with, these containers use very little per-entry
4635	  structure memory overhead, since they can be implemented as
4636	  singly-linked lists. (Arrays use even lower per-entry memory
4637	  overhead, but they are less flexible in moving around entries,
4638	  and have weaker invalidation guarantees).</para>
4639
4640	  <para>More importantly, though, list-based containers use very
4641	  little per-container memory overhead. The memory overhead of an
4642	  empty list-based container is practically that of a pointer.
4643	  This is important for when they are used as secondary
4644	  associative-containers in situations where the average ratio of
4645	  secondary keys to primary keys is low (or even 1).</para>
4646
4647	  <para>In order to reduce the per-container memory overhead as much
4648	  as possible, they are implemented as closely as possible to
4649	  singly-linked lists.</para>
4650
4651	  <orderedlist>
4652	    <listitem>
4653	      <para>
		List-based containers do not store internally the number
		of values that they hold. This means that their <function>size</function>
		method has linear complexity (as was permitted for
		<classname>std::list</classname> before C++11).
		Note that finding the number of equivalent-key values in a
		standard multimap also has linear complexity (because it must be
		done via <function>std::distance</function> on the result of the
		multimap's <function>equal_range</function> method), but usually with
		higher constants.
4662	      </para>
4663	    </listitem>
4664
4665	    <listitem>
4666	      <para>
4667		Most associative-container objects each hold a policy
4668		object (a hash-based container object holds a
4669		hash functor). List-based containers, conversely, only have
4670		class-wide policy objects.
4671	      </para>
4672	    </listitem>
4673	  </orderedlist>
4674
4675
4676	</section>
4677
4678      </section> <!-- details -->
4679
4680    </section> <!-- list -->
4681
4682
4683    <!-- priority_queue -->
4684    <section xml:id="pbds.design.container.priority_queue">
4685      <info><title>Priority Queue</title></info>
4686
4687      <section xml:id="container.priority_queue.interface">
4688	<info><title>Interface</title></info>
4689
4690	<para>The priority queue container has the following
4691	declaration:
4692	</para>
4693	<programlisting>
4694	  template&lt;typename  Value_Type,
4695	  typename  Cmp_Fn = std::less&lt;Value_Type&gt;,
4696	  typename  Tag = pairing_heap_tag,
4697	  typename  Allocator = std::allocator&lt;char &gt; &gt;
4698	  class priority_queue;
4699	</programlisting>
4700
4701	<para>The parameters have the following meaning:</para>
4702
4703	<orderedlist>
4704	  <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4705
	  <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4707
4708	  <listitem><para><classname>Tag</classname> specifies which underlying data structure
4709	  to use.</para></listitem>
4710
4711	  <listitem><para><classname>Allocator</classname> is an allocator
4712	  type.</para></listitem>
4713	</orderedlist>
4714
4715	<para>The <classname>Tag</classname> parameter specifies which underlying
	data structure to use. Instantiating it by <classname>pairing_heap_tag</classname>, <classname>binary_heap_tag</classname>,
4717	<classname>binomial_heap_tag</classname>,
4718	<classname>rc_binomial_heap_tag</classname>,
4719	or <classname>thin_heap_tag</classname>,
4720	specifies, respectively,
4721	an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4722	binary heap (<xref linkend="biblio.clrs2001"/>),
4723	binomial heap (<xref linkend="biblio.clrs2001"/>),
4724	a binomial heap with a redundant binary counter (<xref linkend="biblio.maverick_lowerbounds"/>),
4725	or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4726	</para>
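
	<para>Because the tag is an ordinary template argument, code can
	be written once and instantiated with any of these data
	structures. The following is a minimal sketch; the helper
	<function>drain_descending</function> is hypothetical.</para>
	<programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: pushes all values into a priority queue backed by
// the data structure selected by Tag, then pops them in priority order.
template<typename Tag>
std::vector<int> drain_descending(const std::vector<int>& in)
{
  __gnu_pbds::priority_queue<int, std::less<int>, Tag> q;
  for (std::size_t i = 0; i < in.size(); ++i)
    q.push(in[i]);

  std::vector<int> out;
  while (!q.empty())
    {
      out.push_back(q.top());  // the "largest" value under std::less
      q.pop();
    }
  return out;
}
```
]]></programlisting>
	<para>For instance,
	<code>drain_descending&lt;__gnu_pbds::thin_heap_tag&gt;(v)</code>
	and
	<code>drain_descending&lt;__gnu_pbds::binary_heap_tag&gt;(v)</code>
	differ only in the underlying data structure.</para>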
4727
4728	<para>
4729	  As mentioned in the tutorial,
4730	  <classname>__gnu_pbds::priority_queue</classname> shares most of the
4731	  same interface with <classname>std::priority_queue</classname>.
4732	  E.g. if <varname>q</varname> is a priority queue of type
4733	  <classname>Q</classname>, then <function>q.top()</function> will
4734	  return the "largest" value in the container (according to
4735	  <classname>typename
4736	  Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4737	  has a larger (and very slightly different) interface than
4738	  <classname>std::priority_queue</classname>, however, since typically
4739	  <classname>push</classname> and <classname>pop</classname> are deemed
4740	insufficient for manipulating priority-queues. </para>
4741
	<para>Different settings require different priority-queue
	implementations, which are described later; the discussion of
	traits covers ways to differentiate between the different
	implementations.</para>
4746
4747
4748      </section>
4749
4750      <section xml:id="container.priority_queue.details">
4751	<info><title>Details</title></info>
4752
4753	<section xml:id="container.priority_queue.details.iterators">
4754	  <info><title>Iterators</title></info>
4755
4756	  <para>There are many different underlying-data structures for
4757	  implementing priority queues. Unfortunately, most such
4758	  structures are oriented towards making <function>push</function> and
4759	  <function>top</function> efficient, and consequently don't allow efficient
4760	  access of other elements: for instance, they cannot support an efficient
4761	  <function>find</function> method. In the use case where it
4762	  is important to both access and "do something with" an
4763	  arbitrary value, one would be out of luck. For example, many graph algorithms require
4764	  modifying a value (typically increasing it in the sense of the
4765	  priority queue's comparison functor).</para>
4766
4767	  <para>In order to access and manipulate an arbitrary value in a
4768	  priority queue, one needs to reference the internals of the
4769	  priority queue from some form of an associative container -
4770	  this is unavoidable. Of course, in order to maintain the
4771	  encapsulation of the priority queue, this needs to be done in a
4772	  way that minimizes exposure to implementation internals.</para>
4773
	  <para>In this library the priority queue's <function>push</function>
	  method returns an iterator, which, if valid, can be used for subsequent <function>modify</function> and
	  <function>erase</function> operations. This both preserves the priority
	  queue's encapsulation, and allows accessing arbitrary values (since the
	  returned iterators from the <function>push</function> operation can be
	  stored in some form of associative container).</para>
4780
4781	  <para>Priority queues' iterators present a problem regarding their
4782	  invalidation guarantees. One assumes that calling
4783	  <function>operator++</function> on an iterator will associate it
4784	  with the "next" value. Priority-queues are
4785	  self-organizing: each operation changes what the "next" value
4786	  means. Consequently, it does not make sense that <function>push</function>
4787	  will return an iterator that can be incremented - this can have
4788	  no possible use. Also, as in the case of hash-based containers,
4789	  it is awkward to define if a subsequent <function>push</function> operation
4790	  invalidates a prior returned iterator: it invalidates it in the
4791	  sense that its "next" value is not related to what it
4792	  previously considered to be its "next" value. However, it might not
4793	  invalidate it, in the sense that it can be
4794	  de-referenced and used for <function>modify</function> and <function>erase</function>
4795	  operations.</para>
4796
	  <para>Similarly to the case of the other unordered associative
	  containers, this library uses a distinction between
	  point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
	  converted to a <classname>point_iterator</classname>, and a
	  <classname>const_iterator</classname> can always be converted to a
	  <classname>point_const_iterator</classname>.</para>
4803
4804	  <para>The following snippet demonstrates manipulating an arbitrary
4805	  value:</para>
4806	  <programlisting>
4807	    // A priority queue of integers.
4808	    priority_queue&lt;int &gt; p;
4809
4810	    // Insert some values into the priority queue.
4811	    priority_queue&lt;int &gt;::point_iterator it = p.push(0);
4812
4813	    p.push(1);
4814	    p.push(2);
4815
4816	    // Now modify a value.
4817	    p.modify(it, 3);
4818
4819	    assert(p.top() == 3);
4820	  </programlisting>
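
	  <para>The idea of keeping the returned iterators in an
	  associative container can be sketched as follows. The class
	  <classname>named_queue</classname> and its members are
	  hypothetical; the sketch assumes the default pairing heap, that
	  each name is pushed at most once, and that entries are removed
	  only through <function>remove</function>, so that stored
	  iterators never outlive their entries.</para>
	  <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <ext/pb_ds/priority_queue.hpp>

typedef __gnu_pbds::priority_queue<int> int_queue;

// Hypothetical wrapper: a priority queue whose entries can be reached by name.
class named_queue
{
  typedef std::map<std::string, int_queue::point_iterator> handle_map;

public:
  // Pushes a priority and remembers the iterator that identifies it.
  void push(const std::string& name, int priority)
  {
    handles_.insert(std::make_pair(name, queue_.push(priority)));
  }

  // Changes the priority of a previously pushed entry.
  void reprioritize(const std::string& name, int priority)
  {
    queue_.modify(handles_.find(name)->second, priority);
  }

  // Erases a previously pushed entry, wherever it sits in the queue.
  void remove(const std::string& name)
  {
    handle_map::iterator h = handles_.find(name);
    queue_.erase(h->second);
    handles_.erase(h);
  }

  int top() const { return queue_.top(); }

private:
  int_queue queue_;
  handle_map handles_;
};
```
]]></programlisting>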
4821
4822	
	  <para>It should be noted that an alternative design could embed an
	  associative container in a priority queue. Could, but most
	  probably should not. To begin with, one can always encapsulate
	  a priority queue together with an associative
	  container mapping values to priority queue iterators, with no
	  performance loss. One cannot, however, "un-encapsulate" a priority
	  queue that embeds an associative container, which might lead to
	  performance loss. Assume that one needs to associate each value
	  with some data unrelated to priority queues. Then, using
	  this library's design, one could use a single
	  associative container mapping each value to a pair consisting of
	  this data and a priority queue's iterator. The embedded
	  design would require two associative containers. Similar
	  problems might arise in cases where a value can reside
	  simultaneously in many priority queues.</para>
4838
4839	</section>
4840
4841
4842	<section xml:id="container.priority_queue.details.d">
4843	  <info><title>Underlying Data Structure</title></info>
4844
4845	  <para>There are three main implementations of priority queues: the
4846	  first employs a binary heap, typically one which uses a
4847	  sequence; the second uses a tree (or forest of trees), which is
4848	  typically less structured than an associative container's tree;
4849	  the third simply uses an associative container. These are
4850	  shown in the graphic below, in labels A1 and A2, label B, and label C.</para>
4851
4852	  <figure>
4853	    <title>Underlying Priority-Queue Data-Structures.</title>
4854	    <mediaobject>
4855	      <imageobject>
4856		<imagedata align="center" format="PNG" scale="100"
4857			   fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4858	      </imageobject>
4859	      <textobject>
4860		<phrase>Underlying Priority-Queue Data-Structures.</phrase>
4861	      </textobject>
4862	    </mediaobject>
4863	  </figure>
4864
	  <para>Roughly speaking, any value that is both pushed to and popped
	  from a priority queue must incur a logarithmic expense (in the
	  amortized sense). Any priority queue implementation that
	  avoided this would violate known bounds on comparison-based
	  sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
	  </para>
4871
	  <para>Most implementations do
	  not differ in the asymptotic amortized complexity of
	  <function>push</function> and <function>pop</function> operations, but they differ in
	  the constants involved, in the complexity of other operations
	  (e.g., <function>modify</function>), and in the worst-case
	  complexity of single operations. In general, the more
	  "structured" an implementation (i.e., the more internal
	  invariants it possesses), the higher its amortized complexity
	  of <function>push</function> and <function>pop</function> operations.</para>
4881
	  <para>This library implements different algorithms using a
	  single class: <classname>priority_queue</classname>.
	  Instantiating the <classname>Tag</classname> template parameter "selects"
	  the implementation:</para>
4886
4887	  <orderedlist>
	    <listitem><para>
	      Instantiating <classname>Tag = binary_heap_tag</classname> creates
	      a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
	      selected by <classname>priority_queue</classname>
	      if <classname>Value_Type</classname> is instantiated by a primitive type
	      (e.g., an <type>int</type>); the latter is
	      internally selected for all other types (e.g.,
	      <classname>std::string</classname>). This implementation is relatively
	      unstructured, and so has good <function>push</function> and <function>pop</function>
	      performance; it is the "best-in-kind" for primitive
	      types, e.g., <type>int</type>s. Conversely, it has
	      poor worst-case performance, and can support only linear-time
	    <function>modify</function> and <function>erase</function> operations.</para></listitem>
4901
	    <listitem><para>Instantiating <classname>Tag =
	    pairing_heap_tag</classname> creates a pairing heap of the form
	    represented by label B in the graphic above. This
	    implementation too is relatively unstructured, and so has good
	    <function>push</function> and <function>pop</function>
	    performance; it is the "best-in-kind" for non-primitive types,
	    e.g., <classname>std::string</classname>s. It also has very good
	    worst-case <function>push</function> and
	    <function>join</function> performance (O(1)), but has high
	    worst-case <function>pop</function>
	    complexity.</para></listitem>
4913
	    <listitem><para>Instantiating <classname>Tag =
	    binomial_heap_tag</classname> creates a binomial heap of the
	    form represented by label B in the graphic above. This
	    implementation is more structured than a pairing heap, and so
	    has worse <function>push</function> and <function>pop</function>
	    performance. Conversely, it has sub-linear worst-case bounds for
	    operations such as <function>pop</function>, and so it might be
	    preferred in cases where responsiveness is important.</para></listitem>
4922
	    <listitem><para>Instantiating <classname>Tag =
	    rc_binomial_heap_tag</classname> creates a binomial heap of the
	    form represented by label B above, accompanied by a redundant
	    counter which governs the trees. This implementation is
	    therefore more structured than a binomial heap, and so has worse
	    <function>push</function> and <function>pop</function>
	    performance. Conversely, it guarantees O(1)
	    <function>push</function> complexity, and so it might be
	    preferred in cases where the responsiveness of a binomial heap
	    is insufficient.</para></listitem>
4933
	    <listitem><para>Instantiating <classname>Tag =
	    thin_heap_tag</classname> creates a thin heap of the form
	    represented by label B in the graphic above. This
	    implementation too is more structured than a pairing heap, and
	    so has worse <function>push</function> and
	    <function>pop</function> performance. Conversely, it has better
	    worst-case complexity than a Fibonacci heap, with identical
	    amortized complexities, and so might be more appropriate for
	    some graph algorithms.</para></listitem>
4943	  </orderedlist>
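
	  <para>As a minimal sketch of the selection mechanism listed
	  above, the following helper (a hypothetical name, not part of
	  the library) runs the same code over whichever implementation
	  <classname>Tag</classname> selects. Only the third template
	  argument changes; the observable order of the popped values is
	  the same for every tag.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: pushes all values into a priority queue whose
// underlying data structure is selected by Tag, then pops them back
// out, largest first.
template<typename Tag>
std::vector<int>
drain(const std::vector<int>& values)
{
  __gnu_pbds::priority_queue<int, std::less<int>, Tag> p;
  for (std::size_t i = 0; i < values.size(); ++i)
    p.push(values[i]);

  std::vector<int> out;
  while (!p.empty())
    {
      out.push_back(p.top());
      p.pop();
    }
  assert(out.size() == values.size());
  return out;
}
]]></programlisting>

	  <para>For instance,
	  <code>drain&lt;__gnu_pbds::thin_heap_tag&gt;(v)</code> and
	  <code>drain&lt;__gnu_pbds::binary_heap_tag&gt;(v)</code> return
	  the same sequence; they differ only in the costs discussed
	  above.</para>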
4944
	  <para>Of course, one can use any order-preserving associative
	  container as a priority queue, as in label C of the graphic above, possibly by creating an adapter class
	  over the associative container (much as
	  <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
	  This has the advantage that no cross-referencing is necessary
	  at all; the priority queue itself is an associative container.
	  However, most associative containers are too structured to compete with
	  priority queues in terms of <function>push</function> and <function>pop</function>
	  performance.</para>
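
	  <para>Such an adapter might look roughly like the sketch below.
	  The class name <classname>set_priority_queue</classname> is
	  hypothetical, and <classname>std::multiset</classname> is used
	  here only as a convenient order-preserving container; any
	  comparable container would do.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical adapter (an assumption of this sketch, not library code).
template<typename T, typename Cmp = std::less<T> >
class set_priority_queue
{
  typedef std::multiset<T, Cmp> set_t;
  set_t m_set;

public:
  bool empty() const { return m_set.empty(); }
  std::size_t size() const { return m_set.size(); }

  void push(const T& value) { m_set.insert(value); }

  // The largest value is the last one in sorted order.
  const T& top() const
  {
    assert(!m_set.empty());
    return *m_set.rbegin();
  }

  void pop()
  {
    assert(!m_set.empty());
    typename set_t::iterator last = m_set.end();
    m_set.erase(--last);
  }

  // Erases one occurrence of an arbitrary value in logarithmic time,
  // with no stored iterator. Returns false if the value is absent.
  bool erase(const T& value)
  {
    typename set_t::iterator pos = m_set.find(value);
    if (pos == m_set.end())
      return false;
    m_set.erase(pos);
    return true;
  }
};
]]></programlisting>

	  <para>The <function>erase</function> member illustrates why no
	  cross-referencing is needed: the container can locate any value
	  by itself, at the price of maintaining a fully ordered
	  structure on every <function>push</function>.</para>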
4954
4955
4956
4957	</section>
4958
4959	<section xml:id="container.priority_queue.details.traits">
4960	  <info><title>Traits</title></info>
4961
	  <para>It would be nice if all priority queues could
	  share exactly the same behavior regardless of implementation. Sadly, this is not possible. One example is the join operation: joining
	  two binary heaps might throw an exception (without corrupting
	  either of the heaps on which it operates), whereas joining two pairing
	  heaps is exception-free.</para>
4967
4968	  <para>Tags and traits are very useful for manipulating generic
4969	  types. <classname>__gnu_pbds::priority_queue</classname>
4970	  publicly defines <classname>container_category</classname> as one of the tags. Given any
4971	  container <classname>Cntnr</classname>, the tag of the underlying
4972	  data structure can be found via <classname>typename
4973	  Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4974	  </para>
4975
4976	  <figure>
4977	    <title>Priority-Queue Data-Structure Tags.</title>
4978	    <mediaobject>
4979	      <imageobject>
4980		<imagedata align="center" format="PNG" scale="100"
4981                 fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
4982	      </imageobject>
4983	      <textobject>
4984		<phrase>Priority-Queue Data-Structure Tags.</phrase>
4985	      </textobject>
4986	    </mediaobject>
4987	  </figure>
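
	  <para>One common use of <classname>container_category</classname>
	  is tag dispatching. The sketch below assumes the tag hierarchy
	  declared in <filename>ext/pb_ds/tag_and_trait.hpp</filename>, in
	  which every priority-queue tag derives from
	  <classname>priority_queue_tag</classname>; the function names
	  are hypothetical.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Chosen for binary heaps: exact match on the tag type.
inline bool
is_array_based_imp(__gnu_pbds::binary_heap_tag)
{ return true; }

// Chosen for every other priority-queue tag, through the
// derived-to-base conversion to priority_queue_tag.
inline bool
is_array_based_imp(__gnu_pbds::priority_queue_tag)
{ return false; }

// Hypothetical query: does Cntnr store its values in an array?
template<typename Cntnr>
bool
is_array_based()
{ return is_array_based_imp(typename Cntnr::container_category()); }
]]></programlisting>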
4988
4989
	  <para>Additionally, a traits mechanism can be used to query a
	  container type for its attributes. Given any container
	  <classname>Cntnr</classname>, <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
	  is a traits class identifying the properties of the
	  container.</para>
4995
4996	  <para>To find if a container might throw if two of its objects are
4997	  joined, one can use
4998	  <programlisting>
4999	    container_traits&lt;Cntnr&gt;::split_join_can_throw
5000	  </programlisting>
5001	  </para>
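
	  <para>A minimal sketch of such a query follows. The helper names
	  <function>join_can_throw</function> and
	  <function>absorb</function> are hypothetical; only
	  <classname>container_traits</classname> and
	  <function>join</function> belong to the library.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helper: reports whether joining two Cntnr objects
// might throw.
template<typename Cntnr>
bool
join_can_throw()
{ return __gnu_pbds::container_traits<Cntnr>::split_join_can_throw; }

// Hypothetical helper: moves every value of src into dst. After
// join returns, src is empty.
template<typename Cntnr>
void
absorb(Cntnr& dst, Cntnr& src)
{
  dst.join(src);
  assert(src.empty());
}
]]></programlisting>

	  <para>Generic code can consult
	  <function>join_can_throw</function> to decide whether a call
	  such as <function>absorb</function> needs to be guarded by
	  exception handling.</para>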
5002
	  <para>
	    Different priority-queue implementations have different invalidation guarantees. This is
	    especially important, since there is no way to access an arbitrary
	    value of a priority queue except through iterators. Similarly to
	    associative containers, one can use
	    <programlisting>
	      container_traits&lt;Cntnr&gt;::invalidation_guarantee
	    </programlisting>
	  to get the invalidation guarantee type of a priority queue.</para>
5012
	  <para>From the graphic of underlying data structures above, it is easy to see what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
	  will be for different implementations. All implementations of
	  the type represented by label B have <classname>point_invalidation_guarantee</classname>:
	  the container can freely internally reorganize the nodes, so
	  range-type iterators are invalidated, but point-type iterators
	  are always valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
	  the container can freely internally reallocate the array, so both
	  point-type and range-type iterators might be invalidated.</para>
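
	  <para>The guarantee can also be checked at compile time. The
	  sketch below assumes a C++11 compiler and relies on the
	  invalidation-guarantee tags forming an inheritance chain, with
	  stronger guarantees deriving from weaker ones. The names
	  <classname>keeps_point_iterators</classname> and
	  <function>push_tracked</function> are hypothetical.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <functional>
#include <type_traits>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical trait: true if point-type iterators into Cntnr stay
// valid while the container is modified.
template<typename Cntnr>
struct keeps_point_iterators
: std::is_base_of<
    __gnu_pbds::point_invalidation_guarantee,
    typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee>
{ };

// Hypothetical helper: pushes a value and returns an iterator that
// is safe to store, rejecting unsuitable containers at compile time.
template<typename Cntnr>
typename Cntnr::point_iterator
push_tracked(Cntnr& c, typename Cntnr::const_reference value)
{
  static_assert(keeps_point_iterators<Cntnr>::value,
		"stored iterators could be invalidated");
  return c.push(value);
}
]]></programlisting>

	  <para>Under these assumptions,
	  <function>push_tracked</function> compiles for a pairing heap
	  but is rejected for a binary heap.</para>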
5021
	  <para>
	    This has major implications, and constitutes a good reason to avoid
	    using binary heaps. A binary heap can perform <function>modify</function>
	    or <function>erase</function> efficiently given a valid point-type
	    iterator. However, in order to supply it with a valid point-type
	    iterator, one needs to iterate (linearly) over all
	    values, then supply the relevant iterator (recall that a
	    range-type iterator can always be converted to a point-type
	    iterator). This means that if the number of <function>modify</function> or
	    <function>erase</function> operations is non-negligible (say
	    super-logarithmic in the total sequence of operations), binary
	    heaps will perform badly.
	  </para>
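
	  <para>The linear search described above can be sketched as
	  follows; <function>modify_value</function> is a hypothetical
	  helper that changes the first occurrence it finds. Since a
	  binary heap offers only the basic invalidation guarantee, the
	  iterator is obtained immediately before use and discarded
	  afterwards.</para>

	  <programlisting language="cpp"><![CDATA[
#include <cassert>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

typedef __gnu_pbds::priority_queue<int, std::less<int>,
				   __gnu_pbds::binary_heap_tag> heap_t;

// Hypothetical helper: replaces one occurrence of old_value by
// new_value. The search visits up to all values of the heap.
bool
modify_value(heap_t& h, int old_value, int new_value)
{
  for (heap_t::iterator it = h.begin(); it != h.end(); ++it)
    if (*it == old_value)
      {
	// A range-type iterator converts to a point-type iterator.
	heap_t::point_iterator pos = it;
	h.modify(pos, new_value);
	// Iterators may now be invalid, so stop iterating.
	return true;
      }
  return false;
}
]]></programlisting>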
5035
5036	</section>
5037
5038      </section> <!-- details -->
5039
5040    </section> <!-- priority_queue -->
5041
5042
5043
5044  </section> <!-- container -->
5045
5046  </section> <!-- design -->
5047
5048
5049
5050  <!-- S04: Test -->
5051  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5052	      href="test_policy_data_structures.xml">
5053  </xi:include>
5054
5055  <!-- S05: Reference/Acknowledgments -->
5056  <section xml:id="pbds.ack">
5057    <info><title>Acknowledgments</title></info>
5058    <?dbhtml filename="policy_data_structures_ack.html"?>
5059
5060    <para>
5061      Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5062      Laboratories), and Benjamin Kosnik (Red Hat).
5063    </para>
5064
5065    <para>
5066      This library was partially written at IBM's Haifa Research Labs.
5067      It is based heavily on policy-based design and uses many useful
5068      techniques from Modern C++ Design: Generic Programming and Design
5069      Patterns Applied by Andrei Alexandrescu.
5070    </para>
5071
5072    <para>
5073      Two ideas are borrowed from the SGI-STL implementation:
5074    </para>
5075
5076    <orderedlist>
5077      <listitem>
5078	<para>
5079	  The prime-based resize policies use a list of primes taken from
5080	  the SGI-STL implementation.
5081	</para>
5082      </listitem>
5083
5084      <listitem>
5085	<para>
5086	  The red-black trees contain both a root node and a header node
5087	  (containing metadata), connected in a way that forward and
5088	  reverse iteration can be performed efficiently.
5089	</para>
5090      </listitem>
5091    </orderedlist>
5092
5093    <para>
5094      Some test utilities borrow ideas from
5095      <link xmlns:xlink="http://www.w3.org/1999/xlink"
5096	    xlink:href="http://www.boost.org/libs/timer/">boost::timer</link>.
5097    </para>
5098
5099    <para>
5100      We would like to thank Scott Meyers for useful comments (without
5101      attributing to him any flaws in the design or implementation of the
5102      library).
5103    </para>
5104    <para>We would like to thank Matt Austern for the suggestion to
5105    include tries.</para>
5106  </section>
5107
5108  <!-- S06: Biblio -->
5109<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5110	    href="policy_data_structures_biblio.xml">
5111</xi:include>
5112
5113</chapter>
5114