1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5<head>
6  <meta name="generator" content=
7  "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
8
9  <title>Motivation</title>
10  <meta http-equiv="Content-Type" content=
11  "text/html; charset=us-ascii" />
12  </head>
13
14<body>
15  <div id="page">
16    <h1>Motivation</h1>
17
18    <p>Many fine associative-container libraries were already
19    written, most notably, the STL's associative containers. Why
20    then write another library? This section shows some possible
21    advantages of this library, when considering the challenges in
22    <a href="introduction.html">Introduction</a>. Many of these
23    points stem from the fact that the STL introduced
24    associative-containers in a two-step process (first
25    standardizing tree-based containers, only then adding
26    hash-based containers, which are fundamentally different), did
27    not standardize priority queues as containers, and (in our
28    opinion) overloads the iterator concept.</p>
29
30    <h2><a name="assoc" id="assoc">Associative Containers</a></h2>
31
32    <h3><a name="assoc_policies" id="assoc_policies">More
33    Configuration Choices</a></h3>
34
35    <p>Associative containers require a relatively large number of
36    policies to function efficiently in various settings. In some
37    cases this is needed for making their common operations more
38    efficient, and in other cases this allows them to support a
39    larger set of operations</p>
40
41    <ol>
42      <li>Hash-based containers, for example, support look-up and
43      insertion methods (<i>e.g.</i>, <tt>find</tt> and
44      <tt>insert</tt>). In order to locate elements quickly, they
45      are supplied a hash functor, which instruct how to transform
46      a key object into some size type; <i>e.g.</i>, a hash functor
47      might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A
48      hash table, though, requires transforming each key object
49      into some size-type type in some specific domain;
50      <i>e.g.</i>, a hash table with a 128-long table might
51      transform <tt>"hello"</tt> into position 63. The policy by
52      which the hash value is transformed into a position within
53      the table can dramatically affect performance (see <a href=
54      "hash_based_containers.html#hash_policies">Design::Associative
55      Containers::Hash-Based Containers::Hash Policies</a>).
56      Hash-based containers also do not resize naturally (as
57      opposed to tree-based containers, for example). The
58      appropriate resize policy is unfortunately intertwined with
59      the policy that transforms hash value into a position within
60      the table (see <a href=
61      "hash_based_containers.html#resize_policies">Design::Associative
62      Containers::Hash-Based Containers::Resize Policies</a>).
63
64        <p><a href=
65        "assoc_performance_tests.html#hash_based">Associative-Container
66        Performance Tests::Hash-Based Containers</a> quantifies
67        some of these points.</p>
68      </li>
69
70      <li>Tree-based containers, for example, also support look-up
71      and insertion methods, and are primarily useful when
72      maintaining order between elements is important. In some
73      cases, though, one can utilize their balancing algorithms for
74      completely different purposes.
75
76        <p>Figure <a href="#node_invariants">Metadata for
77        order-statistics and interval intersections</a>-A, for
78        example, shows a tree whose each node contains two entries:
79        a floating-point key, and some size-type <i>metadata</i>
80        (in bold beneath it) that is the number of nodes in the
81        sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5
82        nodes (including itself) in its sub-tree.) A container based
83        on this data structure can obviously answer efficiently
84        whether 0.3 is in the container object, but it can also
85        answer what is the order of 0.3 among all those in the
86        container object [<a href=
87        "references.html#clrs2001">clrs2001</a>] (see <a href=
88        "assoc_examples.html#tree_like_based">Associative Container
89        Examples::Tree-Like-Based Containers (Trees and
90        Tries)</a>).</p>
91
92        <p>As another example, Figure <a href=
93        "#node_invariants">Metadata for order-statistics and
94        interval intersections</a>-B shows a tree whose each node
95        contains two entries: a half-open geometric line interval,
96        and a number <i>metadata</i> (in bold beneath it) that is
97        the largest endpoint of all intervals in its sub-tree.
98        (<i>E.g.</i>, the root describes the interval <i>[20,
99        36)</i>, and the largest endpoint in its sub-tree is 99.) A
100        container based on this data structure can obviously answer
101        efficiently whether <i>[3, 41)</i> is in the container
102        object, but it can also answer efficiently whether the
103        container object has intervals that intersect <i>[3,
104        41)</i> (see <a href=
105        "assoc_examples.html#tree_like_based">Associative Container
106        Examples::Tree-Like-Based Containers (Trees and
107        Tries)</a>). These types of queries are very useful in
108        geometric algorithms and lease-management algorithms.</p>
109
110        <p>It is important to note, however, that as the trees are
111        modified, their internal structure changes. To maintain
112        these invariants, one must supply some policy that is aware
113        of these changes (see <a href=
114        "tree_based_containers.html#invariants">Design::Associative
115        Containers::Tree-Based Containers::Node Invariants</a>);
116        without this, it would be better to use a linked list (in
117        itself very efficient for these purposes).</p>
118
119        <p><a href=
120        "assoc_performance_tests.html#tree_like_based">Associative-Container
121        Performance Tests::Tree-Like-Based Containers</a>
122        quantifies some of these points.</p>
123      </li>
124    </ol>
125
126    <h6 class="c1"><a name="node_invariants" id=
127    "node_invariants"><img src="node_invariants.png" alt=
128    "no image" /></a></h6>
129
130    <h6 class="c1">Metadata for order-statistics and interval
131    intersections.</h6>
132
133    <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More
134    Data Structures and Traits</a></h3>
135
136    <p>The STL contains associative containers based on red-black
137    trees and collision-chaining hash tables. These are obviously
138    very useful, but they are not ideal for all types of
139    settings.</p>
140
141    <p>Figure <a href=
142    "#different_underlying_data_structures">Different underlying
143    data structures</a> shows different underlying data structures
144    (the ones currently supported in <tt>pb_ds</tt>). A shows a
145    collision-chaining hash-table, B shows a probing hash-table, C
146    shows a red-black tree, D shows a splay tree, E shows a tree
147    based on an ordered vector(implicit in the order of the
148    elements), F shows a PATRICIA trie, and G shows a list-based
149    container with update policies.</p>
150
151    <p>Each of these data structures has some performance benefits,
152    in terms of speed, size or both (see <a href=
153    "assoc_performance_tests.html">Associative-Container
154    Performance Tests</a> and <a href=
155    "assoc_performance_tests.html#dss_family_choice">Associative-Container
156    Performance Tests::Observations::Underlying Data-Structure
157    Families</a>). For now, though, note that <i>e.g.</i>,
158    vector-based trees and probing hash tables manipulate memory
159    more efficiently than red-black trees and collision-chaining
160    hash tables, and that list-based associative containers are
161    very useful for constructing "multimaps" (see <a href=
162    "#assoc_mapping_semantics">Alternative to Multiple Equivalent
163    Keys</a>, <a href=
164    "assoc_performance_tests.html#multimaps">Associative Container
165    Performance Tests::Multimaps</a>, and <a href=
166    "assoc_performance_tests.html#msc">Associative Container
167    Performance Tests::Observations::Mapping-Semantics
168    Considerations</a>).</p>
169
170    <h6 class="c1"><a name="different_underlying_data_structures"
171    id="different_underlying_data_structures"><img src=
172    "different_underlying_dss.png" alt="no image" /></a></h6>
173
174    <h6 class="c1">Different underlying data structures.</h6>
175
176    <p>Now consider a function manipulating a generic associative
177    container, <i>e.g.</i>,</p>
178    <pre>
179<b>template</b>&lt;
180    <b>class</b> Cntnr&gt;
181<b>int</b> 
182    some_op_sequence
183    (Cntnr &amp;r_cnt)
184{
185    ...
186}
187</pre>
188
189    <p>Ideally, the underlying data structure of <tt>Cntnr</tt>
190    would not affect what can be done with <tt>r_cnt</tt>.
191    Unfortunately, this is not the case.</p>
192
193    <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then
194    the function can use <tt>std::for_each(r_cnt.find(foo),
195    r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt>
196    to all elements between <tt>foo</tt> and <tt>bar</tt>. If
197    <tt>Cntnr</tt> is a hash-based container, then this call's
198    results are undefined.</p>
199
200    <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object
201    of the comparison functor can be accessed. If <tt>Cntnr</tt> is
202    hash based, these queries are nonsensical.</p>
203
204    <p>There are various other differences based on the container's
205    underlying data structure. For one, they can be constructed by,
206    and queried for, different policies. Furthermore:</p>
207
208    <ol>
209      <li>Containers based on C, D, E and F store elements in a
210      meaningful order; the others store elements in a meaningless
211      (and probably time-varying) order. By implication, only
212      containers based on C, D, E and F can support erase
213      operations taking an iterator and returning an iterator to
214      the following element without performance loss (see <a href=
215      "#assoc_ers_methods">Slightly Different Methods::Methods
216      Related to Erase</a>).</li>
217
218      <li>Containers based on C, D, E, and F can be split and
219      joined efficiently, while the others cannot. Containers based
220      on C and D, furthermore, can guarantee that this is
221      exception-free; containers based on E cannot guarantee
222      this.</li>
223
224      <li>Containers based on all but E can guarantee that erasing
225      an element is exception free; containers based on E cannot
226      guarantee this. Containers based on all but B and E can
227      guarantee that modifying an object of their type does not
228      invalidate iterators or references to their elements, while
229      containers based on B and E cannot. Containers based on C, D,
230      and E can furthermore make a stronger guarantee, namely that
231      modifying an object of their type does not affect the order
232      of iterators.</li>
233    </ol>
234
235    <p>A unified tag and traits system (as used for the STL's
236    iterators, for example) can ease generic manipulation of
237    associative containers based on different underlying
238    data structures (see <a href=
239    "tutorial.html#assoc_ds_gen">Tutorial::Associative
240    Containers::Determining Containers' Attributes</a> and <a href=
241    "ds_gen.html#container_traits">Design::Associative
242    Containers::Data-Structure Genericity::Data-Structure Tags and
243    Traits</a>).</p>
244
245    <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating
246    between Iterator Types</a></h3>
247
248    <p>Iterators are centric to the STL's design, because of the
249    container/algorithm/iterator decomposition that allows an
250    algorithm to operate on a range through iterators of some
251    sequence (<i>e.g.</i>, one originating from a container).
252    Iterators, then, are useful because they allow going over a
253    <u>sequence</u>. The STL also uses iterators for accessing a
254    <u>specific</u> element - <i>e.g.</i>, when an associative
255    container returns one through <tt>find</tt>. The STL, however,
256    consistently uses the same types of iterators for both
257    purposes: going over a range, and accessing a specific found
258    element. Before the introduction of hash-based containers to
259    the STL, this made sense (with the exception of priority
260    queues, which are discussed in <a href="#pq">Priority
261    Queues</a>).</p>
262
263    <p>Using the STL's associative containers together with
264    non-order-preserving associative containers (and also because
265    of priority-queues container), there is a possible need for
266    different types of iterators for self-organizing containers -
267    the iterator concept seems overloaded to mean two different
268    things (in some cases). The following subsections explain this;
269    <a href="tutorial.html#assoc_find_range">Tutorial::Associative
270    Containers::Point-Type and Range-Type Methods</a> explains an
271    alternative design which does not complicate the use of
272    order-preserving containers, but is better for unordered
273    containers; <a href=
274    "ds_gen.html#find_range">Design::Associative
275    Containers::Data-Structure Genericity::Point-Type and
276    Range-Type Methods</a> explains the design further.</p>
277
278    <h4><a name="assoc_find_it_range_it" id=
279    "assoc_find_it_range_it">Using Point-Type Iterators for
280    Range-Type Operations</a></h4>
281
282    <p>Suppose <tt>cntnr</tt> is some associative container, and
283    say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what
284    will be the outcome of</p>
285    <pre>
286std::for_each(c.find(1), c.find(5), foo);
287</pre>
288
289    <p>If <tt>cntnr</tt> is a tree-based container object, then an
290    in-order walk will apply <tt>foo</tt> to the relevant elements,
291    <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range
292    iteration in different data structures</a> -A. If <tt>c</tt> is
293    a hash-based container, then the order of elements between any
294    two elements is undefined (and probably time-varying); there is
295    no guarantee that the elements traversed will coincide with the
296    <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in
297    Figure <a href="#range_it_in_hts">Range iteration in different
298    data structures</a>-B.</p>
299
300    <h6 class="c1"><a name="range_it_in_hts" id=
301    "range_it_in_hts"><img src="point_iterators_range_ops_1.png"
302    alt="no image" /></a></h6>
303
304    <h6 class="c1">Range iteration in different
305    data structures.</h6>
306
307    <p>In our opinion, this problem is not caused just because
308    red-black trees are order preserving while collision-chaining
309    hash tables are (generally) not - it is more fundamental. Most
310    of the STL's containers order sequences in a well-defined
311    manner that is determined by their <u>interface</u>: calling
312    <tt>insert</tt> on a tree-based container modifies its sequence
313    in a predictable way, as does calling <tt>push_back</tt> on a
314    list or a vector. Conversely, collision-chaining hash tables,
315    probing hash tables, priority queues, and list-based containers
316    (which are very useful for "multimaps") are self-organizing
317    data structures; the effect of each operation modifies their
318    sequences in a manner that is (practically) determined by their
319    <u>implementation</u>.</p>
320
321    <p>Consequently, applying an algorithm to a sequence obtained
322    from most containers <u>may or may not</u> make sense, but
323    applying it to a sub-sequence of a self-organizing container
324    <u>does not</u>.</p>
325
326    <h4><a name="assoc_range_it_for_find_it" id=
327    "assoc_range_it_for_find_it">The Cost of Enabling Range
328    Capabilities to Point-Type Iterators</a></h4>
329
330    <p>Suppose <tt>c</tt> is some collision-chaining hash-based
331    container object, and one calls <tt>c.find(3)</tt>. Then what
332    composes the returned iterator?</p>
333
334    <p>Figure <a href="#find_its_in_hash_tables">Point-type
335    iterators in hash tables</a>-A shows the simplest (and most
336    efficient) implementation of a collision-chaining hash table.
337    The little box marked <tt>point_iterator</tt> shows an object
338    that contains a pointer to the element's node. Note that this
339    "iterator" has no way to move to the next element (<i>i.e.</i>,
340    it cannot support <tt><b>operator</b>++</tt>). Conversely, the
341    little box marked <tt>iterator</tt> stores both a pointer to
342    the element, as well as some other information (<i>e.g.</i>,
343    the bucket number of the element). the second iterator, then,
344    is "heavier" than the first one- it requires more time and
345    space. If we were to use a different container to
346    cross-reference into this hash-table using these iterators - it
347    would take much more space. As noted in <a href=
348    "#assoc_find_it_range_it">Using Point-Type Iterators for
349    Range-Type Operations</a>, nothing much can be done by
350    incrementing these iterators, so why is this extra information
351    needed?</p>
352
353    <p>Alternatively, one might create a collision-chaining
354    hash-table where the lists might be linked, forming a
355    monolithic total-element list, as in Figure <a href=
356    "#find_its_in_hash_tables">Point-type iterators in hash
357    tables</a>-B (this seems similar to the Dinkumware design
358    [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]).
359    Here the iterators are as light as can be, but the hash-table's
360    operations are more complicated.</p>
361
362    <h6 class="c1"><a name="find_its_in_hash_tables" id=
363    "find_its_in_hash_tables"><img src=
364    "point_iterators_range_ops_2.png" alt="no image" /></a></h6>
365
366    <h6 class="c1">Point-type iterators in hash tables.</h6>
367
368    <p>It should be noted that containers based on
369    collision-chaining hash-tables are not the only ones with this
370    type of behavior; many other self-organizing data structures
371    display it as well.</p>
372
373    <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation
374    Guarantees</a></h4>
375
376    <p>Consider the following snippet:</p>
377    <pre>
378it = c.find(3);
379
380c.erase(5);
381</pre>
382
383    <p>Following the call to <tt>erase</tt>, what is the validity
384    of <tt>it</tt>: can it be de-referenced? can it be
385    incremented?</p>
386
387    <p>The answer depends on the underlying data structure of the
388    container. Figure <a href=
389    "#invalidation_guarantee_erase">Effect of erase in different
390    underlying data structures</a> shows three cases: A1 and A2
391    show a red-black tree; B1 and B2 show a probing hash-table; C1
392    and C2 show a collision-chaining hash table.</p>
393
394    <h6 class="c1"><a name="invalidation_guarantee_erase" id=
395    "invalidation_guarantee_erase"><img src=
396    "invalidation_guarantee_erase.png" alt="no image" /></a></h6>
397
398    <h6 class="c1">Effect of erase in different underlying
399    data structures.</h6>
400
401    <ol>
402      <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3
403      can be de-referenced and incremented. The sequence of
404      iterators changed, but in a way that is well-defined by the
405      <u>interface</u>.</li>
406
407      <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
408      not valid at all - it cannot be de-referenced or incremented;
409      the order of iterators changed in a way that is (practically)
410      determined by the <u>implementation</u> and not by the
411      <u>interface</u>.</li>
412
413      <li>Erasing 5 from C1 yields C2. Here the situation is more
414      complicated. On the one hand, there is no problem in
415      de-referencing <tt>it</tt>. On the other hand, the order of
416      iterators changed in a way that is (practically) determined
417      by the <u>implementation</u> and not by the
418      <u>interface</u>.</li>
419    </ol>
420
421    <p>So in classic STL, it is not always possible to express
422    whether <tt>it</tt> is valid or not. This is true also for
423    <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems
424    overloaded.</p>
425
426    <h3><a name="assoc_methods" id="assoc_methods">Slightly
427    Different Methods</a></h3>
428
429    <p>[<a href="references.html#meyers02both">meyers02both</a>]
430    points out that a class's methods should comprise only
431    operations which depend on the class's internal structure;
432    other operations are best designed as external functions.
433    Possibly, therefore, the STL's associative containers lack some
434    useful methods, and provide some other methods which would be
435    better left out (<i>e.g.</i>, [<a href=
436    "references.html#sgi_stl">sgi_stl</a>] ).</p>
437
438    <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods
439    Related to Erase</a></h4>
440
441    <ol>
442      <li>Order-preserving STL associative containers provide the
443      method
444        <pre>
445iterator 
446    erase
447    (iterator it)
448</pre>which takes an iterator, erases the corresponding element,
449and returns an iterator to the following element. Also hash-based
450STL associative containers provide this method. This <u>seemingly
451increases</u> genericity between associative containers, since, <i>
452        e.g.</i>, it is possible to use
453        <pre>
454<b>typename</b> C::iterator it = c.begin();
455<b>typename</b> C::iterator e_it = c.end();
456
457<b>while</b>(it != e_it)
458    it = pred(*it)? c.erase(it) : ++it;
459</pre>in order to erase from a container object <tt>
460        c</tt> all element which match a predicate <tt>pred</tt>.
461        However, in a different sense this actually
462        <u>decreases</u> genericity: an integral implication of
463        this method is that tree-based associative containers'
464        memory use is linear in the total number of elements they
465        store, while hash-based containers' memory use is unbounded
466        in the total number of elements they store. Assume a
467        hash-based container is allowed to decrease its size when
468        an element is erased. Then the elements might be rehashed,
469        which means that there is no "next" element - it is simply
470        undefined. Consequently, it is possible to infer from the
471        fact that STL hash-based containers provide this method
472        that they cannot downsize when elements are erased
473        (<a href="assoc_performance_tests.html#hash_based">Performance
474        Tests::Hash-Based Container Tests</a> quantifies this.) As
475        a consequence, different code is needed to manipulate
476        different containers, assuming that memory should be
477        conserved. <tt>pb_ds</tt>'s non-order preserving
478        associative containers omit this method.
479      </li>
480
481      <li>All of <tt>pb_ds</tt>'s associative containers include a
482      conditional-erase method
483        <pre>
484<b>template</b>&lt;
485    <b>class</b> Pred&gt;
486size_type
487    erase_if
488    (Pred pred)
489</pre>which erases all elements matching a predicate. This is
490probably the only way to ensure linear-time multiple-item erase
491which can actually downsize a container.
492      </li>
493
494      <li>STL associative containers provide methods for
495      multiple-item erase of the form
496        <pre>
497size_type
498    erase
499    (It b, 
500        It e)
501</pre>erasing a range of elements given by a pair of iterators. For
502tree-based or trie-based containers, this can implemented more
503efficiently as a (small) sequence of split and join operations. For
504other, unordered, containers, this method isn't much better than an
505external loop. Moreover, if <tt>c</tt> is a hash-based container,
506then, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost
507certain to do something different than erasing all elements whose
508keys are between 2 and 5, and is likely to produce other undefined
509behavior.
510      </li>
511    </ol>
512
513    <h4><a name="assoc_split_join_methods" id=
514    "assoc_split_join_methods">Methods Related to Split and
515    Join</a></h4>
516
517    <p>It is well-known that tree-based and trie-based container
518    objects can be efficiently split or joined [<a href=
519    "references.html#clrs2001">clrs2001</a>]. Externally splitting
520    or joining trees is super-linear, and, furthermore, can throw
521    exceptions. Split and join methods, consequently, seem good
522    choices for tree-based container methods, especially, since as
523    noted just before, they are efficient replacements for erasing
524    sub-sequences. <a href=
525    "assoc_performance_tests.html#tree_like_based">Performance
526    Tests::Tree-Like-Based Container Tests</a> quantifies this.</p>
527
528    <h4><a name="assoc_insert_methods" id=
529    "assoc_insert_methods">Methods Related to Insert</a></h4>
530
531    <p>STL associative containers provide methods of the form</p>
532    <pre>
533<b>template</b>&lt;
534    <b>class</b> It&gt;
535size_type
536    insert
537    (It b,
538        It e);
539</pre>for inserting a range of elements given by a pair of
540iterators. At best, this can be implemented as an external loop,
541or, even more efficiently, as a join operation (for the case of
542tree-based or trie-based containers). Moreover, these methods seem
543similar to constructors taking a range given by a pair of
544iterators; the constructors, however, are transactional, whereas
545the insert methods are not; this is possibly confusing.
546
547    <h4><a name="assoc_equiv_comp_methods" id=
548    "assoc_equiv_comp_methods">Functions Related to
549    Comparison</a></h4>
550
551    <p>Associative containers are parametrized by policies
552    allowing to test key equivalence; <i>e.g.</i> a hash-based
553    container can do this through its equivalence functor, and a
554    tree-based container can do this through its comparison
555    functor. In addition, some STL associative containers have
556    global function operators, <i>e.g.</i>,
557    <tt><b>operator</b>==</tt> and <tt><b>operator</b>&lt;=</tt>,
558    that allow comparing entire associative containers.</p>
559
560    <p>In our opinion, these functions are better left out. To
561    begin with, they do not significantly improve over an external
562    loop. More importantly, however, they are possibly misleading -
563    <tt><b>operator</b>==</tt>, for example, usually checks for
564    equivalence, or interchangeability, but the associative
565    container cannot check for values' equivalence, only keys'
566    equivalence; also, are two containers considered equivalent if
567    they store the same values in different order? this is an
568    arbitrary decision.</p>
569
570    <h3><a name="assoc_mapping_semantics" id=
571    "assoc_mapping_semantics">Alternative to Multiple Equivalent
572    Keys</a></h3>
573
574    <p>Maps (or sets) allow mapping (or storing) unique-key values.
575    The STL, however, also supplies associative containers which
576    map (or store) multiple values with equivalent keys:
577    <tt>std::multimap</tt>, <tt>std::multiset</tt>,
578    <tt>std::tr1::unordered_multimap</tt>, and
579    <tt>unordered_multiset</tt>. We first discuss how these might
580    be used, then why we think it is best to avoid them.</p>
581
582    <p>Suppose one builds a simple bank-account application that
583    records for each client (identified by an <tt>std::string</tt>)
584    and account-id (marked by an <tt><b>unsigned long</b></tt>) -
585    the balance in the account (described by a
586    <tt><b>float</b></tt>). Suppose further that ordering this
587    information is not useful, so a hash-based container is
588    preferable to a tree based container. Then one can use</p>
589    <pre>
590std::tr1::unordered_map&lt;std::pair&lt;std::string, <b>unsigned long</b>&gt;, <b>float</b>, ...&gt;
591</pre>which <u>hashes every combination of client and
592account-id</u>. This might work well, except for the fact that it
593is now impossible to efficiently list all of the accounts of a
594specific client (this would practically require iterating over all
595entries). Instead, one can use
596    <pre>
597std::tr1::unordered_multimap&lt;std::pair&lt;std::string, <tt><b>unsigned long</b></tt>&gt;, <b>float</b>, ...&gt;
598</pre>which <u>hashes every client</u>, and <u>decides equivalence
599based on client</u> only. This will ensure that all accounts
600belonging to a specific user are stored consecutively.
601
602    <p>Also, suppose one wants an integers' priority queue
603    (<i>i.e.,</i> a container that supports <tt>push</tt>,
604    <tt>pop</tt>, and <tt>top</tt> operations, the last of which
605    returns the largest <tt><b>int</b></tt>) that also supports
606    operations such as <tt>find</tt> and <tt>lower_bound</tt>. A
607    reasonable solution is to build an adapter over
608    <tt>std::set&lt;<b>int</b>&gt;</tt>. In this adapter,
609    <i>e.g.</i>, <tt>push</tt> will just call the tree-based
610    associative container's <tt>insert</tt> method; <tt>pop</tt>
611    will call its <tt>end</tt> method, and use it to return the
612    preceding element (which must be the largest). Then this might
613    work well, except that the container object cannot hold
614    multiple instances of the same integer (<tt>push(4)</tt>,
615    <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the
616    container object). If multiple keys are necessary, then one
617    might build the adapter over an
618    <tt>std::multiset&lt;<b>int</b>&gt;</tt>.</p>
619
620    <p class="c1">STL non-unique-mapping containers, then, are
621    useful when (1) a key can be decomposed in to a primary key and
622    a secondary key, (2) a key is needed multiple times, or (3) any
623    combination of (1) and (2).</p>
624
625    <p>Figure <a href="#embedded_lists_1">Non-unique mapping
626    containers in the STL's design</a> shows how the STL's design
627    works internally; in this figure nodes shaded equally represent
628    equivalent-key values. Equivalent keys are stored consecutively
629    using the properties of the underlying data structure: binary
630    search trees (Figure <a href="#embedded_lists_1">Non-unique
631    mapping containers in the STL's design</a>-A) store
632    equivalent-key values consecutively (in the sense of an
633    in-order walk) naturally; collision-chaining hash tables
634    (Figure <a href="#embedded_lists_1">Non-unique mapping
635    containers in the STL's design</a>-B) store equivalent-key
636    values in the same bucket, the bucket can be arranged so that
637    equivalent-key values are consecutive.</p>
638
639    <h6 class="c1"><a name="embedded_lists_1" id=
640    "embedded_lists_1"><img src="embedded_lists_1.png" alt=
641    "no image" /></a></h6>
642
643    <h6 class="c1">Non-unique mapping containers in the STL's
644    design.</h6>
645
646    <p>Put differently, STL non-unique mapping
647    associative-containers are associative containers that map
648    primary keys to linked lists that are embedded into the
649    container. Figure <a href="#embedded_lists_2">Effect of
650    embedded lists in STL multimaps</a> shows again the two
651    containers from Figure <a href="#embedded_lists_1">Non-unique
652    mapping containers in the STL's design</a>, this time with the
653    embedded linked lists of the grayed nodes marked
654    explicitly.</p>
655
656    <h6 class="c1"><a name="embedded_lists_2" id=
657    "embedded_lists_2"><img src="embedded_lists_2.png" alt=
658    "no image" /></a></h6>
659
660    <h6 class="c1">Effect of embedded lists in STL multimaps.</h6>
661
662    <p>These embedded linked lists have several disadvantages.</p>
663
664    <ol>
665      <li>The underlying data structure embeds the linked lists
666      according to its own consideration, which means that the
667      search path for a value might include several different
668      equivalent-key values. For example, the search path for the
669      the black node in either of Figures <a href=
670      "#embedded_lists_1">Non-unique mapping containers in the
671      STL's design</a> A or B, includes more than a single gray
672      node.</li>
673
674      <li>The links of the linked lists are the underlying
675      data structures' nodes, which typically are quite structured.
676      <i>E.g.</i>, in the case of tree-based containers (Figure
677      <a href="#embedded_lists_2">Effect of embedded lists in STL
678      multimaps</a>-B), each "link" is actually a node with three
679      pointers (one to a parent and two to children), and a
680      relatively-complicated iteration algorithm. The linked lists,
681      therefore, can take up quite a lot of memory, and iterating
682      over all values equal to a given key (<i>e.g.</i>, through
683      the return value of the STL's <tt>equal_range</tt>) can be
684      expensive.</li>
685
686      <li>The primary key is stored multiply; this uses more
687      memory.</li>
688
689      <li>Finally, the interface of this design excludes several
690      useful underlying data structures. <i>E.g.</i>, of all the
691      unordered self-organizing data structures, practically only
692      collision-chaining hash tables can (efficiently) guarantee
693      that equivalent-key values are stored consecutively.</li>
694    </ol>
695
696    <p>The above reasons hold even when the ratio of secondary keys
697    to primary keys (or average number of identical keys) is small,
698    but when it is large, there are more severe problems:</p>
699
700    <ol>
701      <li>The underlying data structures order the links inside
702      each embedded linked-lists according to their internal
703      considerations, which effectively means that each of the
704      links is unordered. Irrespective of the underlying
705      data structure, searching for a specific value can degrade to
706      linear complexity.</li>
707
708      <li>Similarly to the above point, it is impossible to apply
709      to the secondary keys considerations that apply to primary
710      keys. For example, it is not possible to maintain secondary
711      keys by sorted order.</li>
712
713      <li>While the interface "understands" that all equivalent-key
714      values constitute a distinct list (<i>e.g.</i>, through
715      <tt>equal_range</tt>), the underlying data structure
716      typically does not. This means, <i>e.g.</i>, that operations
717      such as erasing from a tree-based container all values whose
718      keys are equivalent to a a given key can be super-linear in
719      the size of the tree; this is also true also for several
720      other operations that target a specific list.</li>
721    </ol>
722
723    <p>In <tt>pb_ds</tt>, therefore, all associative containers map
724    (or store) unique-key values. One can (1) map primary keys to
725    secondary associative-containers (<i>i.e.</i>, containers of
726    secondary keys) or non-associative containers (2) map identical
727    keys to a size-type representing the number of times they
728    occur, or (3) any combination of (1) and (2). Instead of
729    allowing multiple equivalent-key values, <tt>pb_ds</tt>
730    supplies associative containers based on underlying
731    data structures that are suitable as secondary
732    associative-containers (see <a href=
733    "assoc_performance_tests.html#msc">Associative-Container
734    Performance Tests::Observations::Mapping-Semantics
735    Considerations</a>).</p>
736
737    <p>Figures <a href="#embedded_lists_3">Non-unique mapping
738    containers in <tt>pb_ds</tt></a> A and B show the equivalent
739    structures in <tt>pb_ds</tt>'s design, to those in Figures
740    <a href="#embedded_lists_1">Non-unique mapping containers in
741    the STL's design</a> A and B, respectively. Each shaded box
742    represents some size-type or secondary
743    associative-container.</p>
744
745    <h6 class="c1"><a name="embedded_lists_3" id=
746    "embedded_lists_3"><img src="embedded_lists_3.png" alt=
747    "no image" /></a></h6>
748
749    <h6 class="c1">Non-unique mapping containers in the
750    <tt>pb_ds</tt>.</h6>
751
752    <p>In the first example above, then, one would use an
753    associative container mapping each user to an associative
754    container which maps each application id to a start time (see
755    <a href=
756    "http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>);
757    in the second example, one would use an associative container
758    mapping each <tt><b>int</b></tt> to some size-type indicating
759    the number of times it logically occurs (see <a href=
760    "http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p>
761
762    <p><a href=
763    "assoc_performance_tests.html#multimaps">Associative-Container
764    Performance Tests::Multimaps</a> quantifies some of these
765    points, and <a href=
766    "assoc_performance_tests.html#msc">Associative-Container
767    Performance Tests::Observations::Mapping-Semantics
768    Considerations</a> shows some simple calculations.</p>
769
770    <p><a href="assoc_examples.html#mmaps">Associative-Container
771    Examples::Multimaps</a> shows some simple examples of using
772    "multimaps".</p>
773
774    <p><a href="lu_based_containers.html">Design::Associative
775    Containers::List-Based Containers</a> discusses types of
776    containers especially suited as secondary
777    associative-containers.</p>
778
779    <h2><a name="pq" id="pq">Priority Queues</a></h2>
780
781    <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different
782    Methods</a></h3>
783
784    <p>Priority queues are containers that allow efficiently
785    inserting values and accessing the maximal value (in the sense
786    of the container's comparison functor); <i>i.e.</i>, their
787    interface supports <tt>push</tt> and <tt>pop</tt>. The STL's
788    priority queues indeed support these methods, but they support
789    little else. For algorithmic and software-engineering purposes,
790    other methods are needed:</p>
791
792    <ol>
793      <li>Many graph algorithms [<a href=
794      "references.html#clrs2001">clrs2001</a>] require increasing a
795      value in a priority queue (again, in the sense of the
796      container's comparison functor), or joining two
797      priority-queue objects.</li>
798
799      <li>It is sometimes necessary to erase an arbitrary value in
800      a priority queue. For example, consider the <tt>select</tt>
801      function for monitoring file descriptors:
802        <pre>
803<b>int</b> 
804    select
805    (<b>int</b> nfds,             
806        fd_set *readfds,                
807        fd_set *writefds,
808        fd_set *errorfds, 
809        <b>struct</b> timeval *timeout);
810</pre>then, as the <tt>select</tt> manual page [<a href=
811"references.html#select_man">select_man</a>] states:
812
813        <p><q>The nfds argument specifies the range of file
814        descriptors to be tested. The select() function tests file
815        descriptors in the range of 0 to nfds-1.</q></p>
816
817        <p>It stands to reason, therefore, that we might wish to
818        maintain a minimal value for <tt>nfds</tt>, and priority
819        queues immediately come to mind. Note, though, that when a
820        socket is closed, the minimal file description might
821        change; in the absence of an efficient means to erase an
822        arbitrary value from a priority queue, we might as well
823        avoid its use altogether.</p>
824
825        <p><a href="pq_examples.html#xref">Priority-Queue
826        Examples::Cross-Referencing</a> shows examples for these
827        types of operations.</p>
828      </li>
829
830      <li>STL containers typically support iterators. It is
831      somewhat unusual for <tt>std::priority_queue</tt> to omit
832      them (see, <i>e.g.</i>, [<a href=
833      "references.html#meyers01stl">meyers01stl</a>]). One might
834      ask why do priority queues need to support iterators, since
835      they are self-organizing containers with a different purpose
836      than abstracting sequences. There are several reasons:
837
838        <ol>
839          <li>Iterators (even in self-organizing containers) are
840          useful for many purposes, <i>e.g.</i>, cross-referencing
841          containers, serialization, and debugging code that uses
842          these containers.</li>
843
844          <li>The STL's hash-based containers support iterators,
845          even though they too are self-organizing containers with
846          a different purpose than abstracting sequences.</li>
847
848          <li>In STL-like containers, it is natural to specify the
849          interface of operations for modifying a value or erasing
850          a value (discussed previously) in terms of a iterators.
851          This is discussed further in <a href=
852          "pq_design.html#pq_it">Design::Priority
853          Queues::Iterators</a>. It should be noted that the STL's
854          containers also use iterators for accessing and
855          manipulating a specific value. <i>E.g.</i>, in hash-based
856          containers, one checks the existence of a key by
857          comparing the iterator returned by <tt>find</tt> to the
858          iterator returned by <tt>end</tt>, and not by comparing a
859          pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li>
860        </ol>
861      </li>
862    </ol>
863
864    <p><a href="pq_performance_tests.html">Performance
865    Tests::Priority Queues</a> quantifies some of these points.</p>
866
867    <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data
868    Structures and Traits</a></h3>
869
870    <p>There are three main implementations of priority queues: the
871    first employs a binary heap, typically one which uses a
872    sequence; the second uses a tree (or forest of trees), which is
873    typically less structured than an associative container's tree;
874    the third simply uses an associative container. These are
875    shown, respectively, in Figures <a href=
876    "#pq_different_underlying_dss">Underlying Priority-Queue
877    Data-Structures</a> A1 and A2, B, and C.</p>
878
879    <h6 class="c1"><a name="pq_different_underlying_dss" id=
880    "pq_different_underlying_dss"><img src=
881    "pq_different_underlying_dss.png" alt="no image" /></a></h6>
882
883    <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6>
884
885    <p>No single implementation can completely replace any of the
886    others. Some have better <tt>push</tt> and <tt>pop</tt>
887    amortized performance, some have better bounded (worst case)
888    response time than others, some optimize a single method at the
889    expense of others, <i>etc.</i>. In general the "best"
890    implementation is dictated by the problem (see <a href=
891    "pq_performance_tests.html#pq_observations">Performance
892    Tests::Priority Queues::Observations</a>).</p>
893
894    <p>As with associative containers (see <a href=
895    "#assoc_ds_genericity">Associative Containers::Traits for
896    Underlying Data-Structures</a>), the more implementations
897    co-exist, the more necessary a traits mechanism is for handling
898    generic containers safely and efficiently. This is especially
899    important for priority queues, since the invalidation
900    guarantees of one of the most useful data structures - binary
901    heaps - is markedly different than those of most of the
902    others.</p>
903
904    <p><a href="pq_design.html#pq_traits">Design::Priority
905    Queues::Traits</a> discusses this further.</p>
906
907    <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap
908    Implementation</a></h3>
909
910    <p>Binary heaps are one of the most useful underlying
911    data structures for priority queues. They are very efficient in
912    terms of memory (since they don't require per-value structure
913    metadata), and have the best amortized <tt>push</tt> and
914    <tt>pop</tt> performance for primitive types (<i>e.g.</i>,
915    <tt><b>int</b></tt>s).</p>
916
917    <p>The STL's <tt>priority_queue</tt> implements this data
918    structure as an adapter over a sequence, typically
919    <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond
920    to Figures <a href="#pq_different_underlying_dss">Underlying
921    Priority-Queue Data-Structures</a> A1 and A2, respectively.</p>
922
923    <p>This is indeed an elegant example of the adapter concept and
924    the algorithm/container/iterator decomposition (see [<a href=
925    "references.html#nelson96stlpq">nelson96stlpql</a>]). There are
926    possibly reasons, however, why a binary-heap priority queue
927    would be better implemented as a container instead of a
928    sequence adapter:</p>
929
930    <ol>
931      <li><tt>std::priority_queue</tt> cannot erase values from its
932      adapted sequence (irrespective of the sequence type). This
933      means that the memory use of an <tt>std::priority_queue</tt>
934      object is always proportional to the maximal number of values
935      it ever contained, and not to the number of values that it
936      currently contains (see <a href=
937      "priority_queue_text_pop_mem_usage_test.html">Priority Queue
938      Text <tt>pop</tt> Memory Use Test</a>); this implementation
939      of binary heaps acts very differently than other underlying
940      data structures (<i>e.g.</i>, pairing heaps).</li>
941
942      <li>Some combinations of adapted sequences and value types
943      are very inefficient or just don't make sense. If one uses
944      <tt>std::priority_queue&lt;std::vector&lt;std::string&gt;
945      &gt; &gt;</tt>, for example, then not only will each
946      operation perform a logarithmic number of
947      <tt>std::string</tt> assignments, but, furthermore, any
948      operation (including <tt>pop</tt>) can render the container
949      useless due to exceptions. Conversely, if one uses
950      <tt>std::priority_queue&lt;std::deque&lt;<b>int</b>&gt; &gt;
951      &gt;</tt>, then each operation uses incurs a logarithmic
952      number of indirect accesses (through pointers) unnecessarily.
953      It might be better to let the container make a conservative
954      deduction whether to use the structure in Figures <a href=
955      "#pq_different_underlying_dss">Underlying Priority-Queue
956      Data-Structures</a> A1 or A2.</li>
957
958      <li>There does not seem to be a systematic way to determine
959      what exactly can be done with the priority queue.
960
961        <ol>
962          <li>If <tt>p</tt> is a priority queue adapting an
963          <tt>std::vector</tt>, then it is possible to iterate over
964          all values by using <tt>&amp;p.top()</tt> and
965          <tt>&amp;p.top() + p.size()</tt>, but this will not work
966          if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any
967          case, one cannot use <tt>p.begin()</tt> and
968          <tt>p.end()</tt>. If a different sequence is adapted, it
969          is even more difficult to determine what can be
970          done.</li>
971
972          <li>If <tt>p</tt> is a priority queue adapting an
973          <tt>std::deque</tt>, then the reference return by
974          <tt>p.top()</tt> will remain valid until it is popped,
975          but if <tt>p</tt> adapts an <tt>std::vector</tt>, the
976          next <tt>push</tt> will invalidate it. If a different
977          sequence is adapted, it is even more difficult to
978          determine what can be done.</li>
979        </ol>
980      </li>
981
982      <li>Sequence-based binary heaps can still implement
983      linear-time <tt>erase</tt> and <tt>modify</tt> operations.
984      This means that if one needs, <i>e.g.</i>, to erase a small
985      (say logarithmic) number of values, then one might still
986      choose this underlying data structure. Using
987      <tt>std::priority_queue</tt>, however, this will generally
988      change the order of growth of the entire sequence of
989      operations.</li>
990    </ol>
991  </div>
992</body>
993</html>
994