policy_data_structures.xml revision 1.1.1.1
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
	 xml:id="manual.ext.containers.pbds" xreflabel="pbds">
<info>
<title>Policy-Based Data Structures</title>
<keywordset>
<keyword>ISO C++</keyword>
<keyword>policy</keyword>
<keyword>container</keyword>
<keyword>data</keyword>
<keyword>structure</keyword>
<keyword>associated</keyword>
<keyword>tree</keyword>
<keyword>trie</keyword>
<keyword>hash</keyword>
<keyword>metaprogramming</keyword>
</keywordset>
</info>
<?dbhtml filename="policy_data_structures.html"?>

<!-- 2006-04-01 Ami Tavory -->
<!-- 2011-05-25 Benjamin Kosnik -->

<!-- S01: intro -->
<section xml:id="pbds.intro">
<info><title>Intro</title></info>

<para>
This is a library of policy-based elementary data structures:
associative containers and priority queues. It is designed for
high performance, flexibility, semantic safety, and conformance to
the corresponding containers in <literal>std</literal> and
<literal>std::tr1</literal> (except for some points where it differs
by design).
</para>
<para>
</para>

<section xml:id="pbds.intro.issues">
<info><title>Performance Issues</title></info>
<para>
</para>

<para>
An attempt is made to categorize the wide variety of possible
container designs in terms of performance-impacting factors. These
performance factors are translated into design policies and
incorporated into container design.
</para>

<para>
There is tension in unravelling these factors into a coherent set of
policies. Every attempt is made to keep the set of factors
minimal. However, in many cases multiple factors make for long
template names. Every attempt is made to alias and use typedefs in
the source files, but the generated names for external symbols can
be large for binary files or debuggers.
</para>

<para>
In many cases, the longer names allow capabilities and behaviours
controlled by macros to also be unambiguously emitted as distinct
generated names.
</para>

<para>
Specific issues found while unraveling performance factors in the
design of associative containers and priority queues follow.
</para>

<section xml:id="pbds.intro.issues.associative">
<info><title>Associative</title></info>

<para>
Associative containers depend on their composite policies to a very
large extent. Implicitly hard-wiring policies can hamper their
performance and limit their functionality. An efficient hash-based
container, for example, requires policies for testing key
equivalence, hashing keys, translating hash values into positions
within the hash table, and determining when and how to resize the
table internally. A tree-based container can efficiently support
order statistics, i.e. the ability to query the order of
each key within the sequence of keys in the container, but only if
the container is supplied with a policy to internally update
meta-data. There are many other such examples.
</para>

<para>
Ideally, all associative containers would share the same
interface. Unfortunately, underlying data structures and mapping
semantics differentiate between different containers. For example,
suppose one writes a generic function manipulating an associative
container.
</para>

<programlisting>
template&lt;typename Cntnr&gt;
  void
  some_op_sequence(Cntnr&amp; r_cnt)
  {
    ...
  }
</programlisting>

<para>
Given this, what can one assume about the instantiating
container? The answer varies according to its underlying data
structure. If the underlying data structure of
<literal>Cntnr</literal> is based on a tree or trie, then the order
of elements is well defined; otherwise, it is not, in general.
If
the underlying data structure of <literal>Cntnr</literal> is based
on a collision-chaining hash table, then modifying
<literal>r_cnt</literal> will not invalidate its iterators' order;
if the underlying data structure is a probing hash table, then this
is not the case. If the underlying data structure is based on a tree
or trie, then a reference to the container can efficiently be split;
otherwise, it cannot, in general. If the underlying data structure
is a red-black tree, then splitting a reference to the container is
exception-free; if it is an ordered-vector tree, exceptions can be
thrown.
</para>

</section>

<section xml:id="pbds.intro.issues.priority_queue">
<info><title>Priority Queues</title></info>

<para>
Priority queues are useful when one needs to efficiently access a
minimum (or maximum) value as the set of values changes.
</para>

<para>
Most useful data structures for priority queues have a relatively
simple structure, as they are geared toward relatively simple
requirements. Unfortunately, these structures do not support access
to an arbitrary value, which turns out to be necessary in many
algorithms, for example when decreasing an arbitrary value in a
graph algorithm. Therefore, some extra mechanism is necessary and
must be invented for accessing arbitrary values. There are at least
two alternatives: embedding an associative container in a priority
queue, or allowing cross-referencing through iterators. The first
solution adds significant overhead; the second solution requires a
precise definition of iterator invalidation, which is the next
point.
</para>

<para>
Priority queues, like hash-based containers, store values in an
order that is meaningless and undefined externally. For example, a
<code>push</code> operation can internally reorganize the
values.
Because of this characteristic, describing a priority
queue's iterators is difficult: on one hand, the values to which
iterators point can remain valid, but on the other, the logical
order of iterators can change unpredictably.
</para>

<para>
Roughly speaking, any element that is both inserted into a priority
queue (e.g., through <code>push</code>) and removed
from it (e.g., through <code>pop</code>) incurs a
logarithmic overhead (in the amortized sense). Different underlying
data structures place the actual cost differently: some are
optimized for amortized complexity, whereas others guarantee that
specific operations only have a constant cost. One underlying data
structure might be chosen if modifying a value is frequent
(as in Dijkstra's shortest-path algorithm), whereas a different one
might be chosen otherwise. Unfortunately, an array-based binary
heap, an underlying data structure that optimizes (in the amortized
sense) <code>push</code> and <code>pop</code> operations, differs
from the others in terms of its invalidation guarantees. Other design
decisions also impact the cost and placement of the overhead, at the
expense of greater differences in the kinds of operations that the
underlying data structure can support. These differences pose a
challenge when creating a uniform interface for priority queues.
</para>
</section>
</section>

<section xml:id="pbds.intro.motivation">
<info><title>Goals</title></info>

<para>
Many fine associative-container libraries have already been written,
most notably the C++ standard's associative containers. Why
then write another library? This section shows some possible
advantages of this library, when considering the challenges in
the introduction.
Many of these points stem from the fact that
the ISO C++ process introduced associative containers in a
two-step process (first standardizing tree-based containers,
only then adding hash-based containers, which are fundamentally
different), did not standardize priority queues as containers,
and (in our opinion) overloaded the iterator concept.
</para>

<section xml:id="pbds.intro.motivation.associative">
<info><title>Associative</title></info>
<para>
</para>

<section xml:id="motivation.associative.policy">
<info><title>Policy Choices</title></info>
<para>
Associative containers require a relatively large number of
policies to function efficiently in various settings. In some
cases this is needed for making their common operations more
efficient, and in other cases this allows them to support a
larger set of operations.
</para>

<orderedlist>
<listitem>
<para>
Hash-based containers, for example, support look-up and
insertion methods (<function>find</function> and
<function>insert</function>). In order to locate elements
quickly, they are supplied a hash functor, which instructs
how to transform a key object into some size type; a hash
functor might transform <constant>"hello"</constant>
into <constant>1123002298</constant>. A hash table, though,
requires transforming each key object into some size-type
value in some specific domain; a hash table with a 128-entry
table might transform <constant>"hello"</constant> into
position <constant>63</constant>. The policy by which the
hash value is transformed into a position within the table
can dramatically affect performance. Hash-based containers
also do not resize naturally (as opposed to tree-based
containers, for example). The appropriate resize policy is
unfortunately intertwined with the policy that transforms a
hash value into a position within the table.
</para>
</listitem>

<listitem>
<para>
Tree-based containers, for example, also support look-up and
insertion methods, and are primarily useful when maintaining
order between elements is important. In some cases, though,
one can utilize their balancing algorithms for completely
different purposes.
</para>

<para>
Figure A shows a tree in which each node contains two entries:
a floating-point key, and some size-type
<emphasis>metadata</emphasis> (in bold beneath it) that is
the number of nodes in the sub-tree. (The root has key 0.99,
and has 5 nodes (including itself) in its sub-tree.) A
container based on this data structure can obviously answer
efficiently whether 0.3 is in the container object, but it
can also answer what the order of 0.3 is among all the keys in
the container object: see <xref linkend="biblio.clrs2001"/>.

</para>

<para>
As another example, Figure B shows a tree in which each node
contains two entries: a half-open geometric line interval,
and numeric <emphasis>metadata</emphasis> (in bold beneath
it) that is the largest endpoint of all intervals in its
sub-tree. (The root describes the interval <constant>[20,
36)</constant>, and the largest endpoint in its sub-tree is
99.) A container based on this data structure can obviously
answer efficiently whether <constant>[3, 41)</constant> is
in the container object, but it can also answer efficiently
whether the container object has intervals that intersect
<constant>[3, 41)</constant>. These types of queries are
very useful in geometric algorithms and lease-management
algorithms.
</para>

<para>
It is important to note, however, that as the trees are
modified, their internal structure changes. To maintain
these invariants, one must supply some policy that is aware
of these changes.
Without this, it would be better to use a
linked list (in itself very efficient for these purposes).
</para>

</listitem>
</orderedlist>

<figure>
<title>Node Invariants</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
	   fileref="../images/pbds_node_invariants.png"/>
</imageobject>
<textobject>
<phrase>Node Invariants</phrase>
</textobject>
</mediaobject>
</figure>

</section>

<section xml:id="motivation.associative.underlying">
<info><title>Underlying Data Structures</title></info>
<para>
The standard C++ library contains associative containers based on
red-black trees and collision-chaining hash tables. These are
very useful, but they are not ideal for all types of
settings.
</para>

<para>
The figure below shows the different underlying data structures
currently supported in this library.
</para>

<figure>
<title>Underlying Associative Data Structures</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
	   fileref="../images/pbds_different_underlying_dss_1.png"/>
</imageobject>
<textobject>
<phrase>Underlying Associative Data Structures</phrase>
</textobject>
</mediaobject>
</figure>

<para>
A shows a collision-chaining hash-table, B shows a probing
hash-table, C shows a red-black tree, D shows a splay tree, E shows
a tree based on an ordered vector (implicit in the order of the
elements), F shows a PATRICIA trie, and G shows a list-based
container with update policies.
</para>

<para>
Each of these data structures has some performance benefits, in
terms of speed, size or both.
For now, note that vector-based trees
and probing hash tables manipulate memory more efficiently than
red-black trees and collision-chaining hash tables, and that
list-based associative containers are very useful for constructing
"multimaps".
</para>

<para>
Now consider a function manipulating a generic associative
container,
</para>
<programlisting>
template&lt;class Cntnr&gt;
  int
  some_op_sequence(Cntnr &amp;r_cnt)
  {
    ...
  }
</programlisting>

<para>
Ideally, the underlying data structure
of <classname>Cntnr</classname> would not affect what can be
done with <varname>r_cnt</varname>. Unfortunately, this is not
the case.
</para>

<para>
For example, if <classname>Cntnr</classname>
is <classname>std::map</classname>, then the function can
use
</para>
<programlisting>
std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
</programlisting>
<para>
in order to apply <classname>foobar</classname> to all
elements between <classname>foo</classname> and
<classname>bar</classname>. If
<classname>Cntnr</classname> is a hash-based container,
then this call's results are undefined.
</para>

<para>
Also, if <classname>Cntnr</classname> is tree-based, the type
and object of the comparison functor can be
accessed. If <classname>Cntnr</classname> is hash-based, these
queries are nonsensical.
</para>

<para>
There are various other differences based on the container's
underlying data structure. For one, they can be constructed by,
and queried for, different policies. Furthermore:
</para>

<orderedlist>
<listitem>
<para>
Containers based on C, D, E and F store elements in a
meaningful order; the others store elements in a meaningless
(and probably time-varying) order.
By implication, only
containers based on C, D, E and F can
support <function>erase</function> operations taking an
iterator and returning an iterator to the following element
without performance loss.
</para>
</listitem>

<listitem>
<para>
Containers based on C, D, E, and F can be split and joined
efficiently, while the others cannot. Containers based on C
and D, furthermore, can guarantee that this is exception-free;
containers based on E cannot guarantee this.
</para>
</listitem>

<listitem>
<para>
Containers based on all but E can guarantee that
erasing an element is exception-free; containers based on E
cannot guarantee this. Containers based on all but B and E
can guarantee that modifying an object of their type does
not invalidate iterators or references to their elements,
while containers based on B and E cannot. Containers based
on C, D, and E can furthermore make a stronger guarantee,
namely that modifying an object of their type does not
affect the order of iterators.
</para>
</listitem>
</orderedlist>

<para>
A unified tag and traits system (as used for the C++ standard
library iterators, for example) can ease generic manipulation of
associative containers based on different underlying data
structures.
</para>

</section>

<section xml:id="motivation.associative.iterators">
<info><title>Iterators</title></info>
<para>
Iterators are central to the design of the standard library
containers, because of the container/algorithm/iterator
decomposition that allows an algorithm to operate on a range
through iterators of some sequence. Iterators, then, are useful
because they allow going over a
specific <emphasis>sequence</emphasis>.
The standard library
also uses iterators for accessing a
specific <emphasis>element</emphasis>, for example when an
associative container returns one through
<function>find</function>. The
standard library consistently uses the same types of iterators
for both purposes: going over a range, and accessing a specific
found element. Before the introduction of hash-based containers
to the standard library, this made sense (with the exception of
priority queues, which are discussed later).
</para>

<para>
When the standard associative containers are used together with
non-order-preserving associative containers (and also with
priority-queue containers), there is a possible need for
different types of iterators for self-organizing containers:
the iterator concept seems overloaded to mean two different
things (in some cases). <remark>XXX: see Design::Associative
Containers::Data-Structure Genericity::Point-Type and Range-Type
Methods</remark>.
</para>

<section xml:id="associative.iterators.using">
<info>
<title>Using Point Iterators for Range Operations</title>
</info>
<para>
Suppose <classname>cntnr</classname> is some associative
container, and say <varname>c</varname> is an object of
type <classname>cntnr</classname>. Then what will be the outcome
of
</para>

<programlisting>
std::for_each(c.find(1), c.find(5), foo);
</programlisting>

<para>
If <classname>cntnr</classname> is a tree-based container,
then an in-order walk will
apply <classname>foo</classname> to the relevant elements,
as in the graphic below, label A.
If <varname>c</varname> is
a hash-based container, then the order of elements between any
two elements is undefined (and probably time-varying); there is
no guarantee that the elements traversed will coincide with the
<emphasis>logical</emphasis> elements between 1 and 5, as in
label B.
</para>

<figure>
<title>Range Iteration in Different Data Structures</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
	   fileref="../images/pbds_point_iterators_range_ops_1.png"/>
</imageobject>
<textobject>
<phrase>Range Iteration in Different Data Structures</phrase>
</textobject>
</mediaobject>
</figure>

<para>
In our opinion, this problem is not caused merely by the fact that
red-black trees are order preserving while
collision-chaining hash tables are (generally) not; it
is more fundamental. Most of the standard's containers
order sequences in a well-defined manner that is
determined by their <emphasis>interface</emphasis>:
calling <function>insert</function> on a tree-based
container modifies its sequence in a predictable way, as
does calling <function>push_back</function> on a list or
a vector. Conversely, collision-chaining hash tables,
probing hash tables, priority queues, and list-based
containers (which are very useful for "multimaps") are
self-organizing data structures; each
operation modifies their sequences in a manner that is
(practically) determined by their
<emphasis>implementation</emphasis>.
</para>

<para>
Consequently, applying an algorithm to a sequence obtained from most
containers may or may not make sense, but applying it to a
sub-sequence of a self-organizing container does not.
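As a minimal sketch of this point, using only standard containers
(the helper name <function>keys_between</function> is hypothetical
and is not part of this library), the following walks the range
delimited by two <function>find</function> results. For a
<classname>std::map</classname> holding the keys 0 to 7 it visits
exactly 1, 2, 3 and 4; for a
<classname>std::unordered_map</classname> holding the same keys,
which elements are visited depends on the implementation.
<programlisting><![CDATA[
#include <cassert>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical helper: collect the keys visited when walking from
// c.find(lo) towards c.find(hi).
template<typename Cntnr>
  std::vector<int>
  keys_between(const Cntnr& c, int lo, int hi)
  {
    std::vector<int> visited;
    auto last = c.find(hi);
    // Also stop at end(): in a hash-based container, find(hi) may
    // precede find(lo) in iteration order.
    for (auto it = c.find(lo); it != c.end() && it != last; ++it)
      visited.push_back(it->first);
    return visited;
  }
]]></programlisting>
The extra test against <function>end</function> is only there to
keep the walk well-defined for the hash-based case; it does not make
the visited sub-sequence meaningful.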
</para>
</section>

<section xml:id="associative.iterators.cost">
<info>
<title>Cost to Point Iterators to Enable Range Operations</title>
</info>
<para>
Suppose <varname>c</varname> is some collision-chaining
hash-based container object, and one calls
</para>
<programlisting>c.find(3)</programlisting>
<para>
Then what composes the returned iterator?
</para>

<para>
In the graphic below, label A shows the simplest (and
most efficient) implementation of a collision-chaining
hash table. The little box marked
<classname>point_iterator</classname> shows an object
that contains a pointer to the element's node. Note that
this "iterator" has no way to move to the next element
(it cannot support
<function>operator++</function>). Conversely, the little
box marked <classname>iterator</classname> stores both a
pointer to the element, as well as some other
information (the bucket number of the element). The
second iterator, then, is "heavier" than the first one:
it requires more time and space. If we were to use a
different container to cross-reference into this
hash-table using these iterators, it would take much
more space. As noted above, nothing much can be done by
incrementing these iterators, so why is this extra
information needed?
</para>

<para>
Alternatively, one might create a collision-chaining hash-table
where the lists might be linked, forming a monolithic total-element
list, as in the graphic below, label B. Here the iterators are as
light as can be, but the hash-table's operations are more
complicated.
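The difference between the two kinds of iterator in label A can be
sketched as follows. This is an illustration under stated
assumptions, not this library's implementation: the
<classname>node</classname> layout and the extra members
<varname>table</varname> and <varname>num_buckets</varname> are
hypothetical, chosen only so that <function>operator++</function>
can be written out.
<programlisting><![CDATA[
#include <cassert>
#include <cstddef>

// Hypothetical node of a collision-chaining hash table.
struct node
{
  int value;
  node* next; // next entry in the same bucket, or nullptr
};

// Point-type iterator: reaches one element, cannot advance.
struct point_iterator
{
  node* p_nd;
  int& operator*() const { return p_nd->value; }
};

// Range-type iterator: also remembers the table and the bucket
// number, so that it can step to the next non-empty bucket.
struct iterator
{
  node* p_nd;
  node** table;
  std::size_t bucket;
  std::size_t num_buckets;

  int& operator*() const { return p_nd->value; }

  // Precondition: p_nd is not null. A null p_nd afterwards means
  // that the end of the table was reached.
  iterator& operator++()
  {
    p_nd = p_nd->next;
    while (p_nd == nullptr && ++bucket < num_buckets)
      p_nd = table[bucket];
    return *this;
  }
};
]]></programlisting>
In this sketch the point-type object is a single pointer, while the
range-type object carries three further members, which is the extra
cost paid by every cross-reference stored in another container.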
</para>

<figure>
<title>Point Iteration in Hash Data Structures</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
	   fileref="../images/pbds_point_iterators_range_ops_2.png"/>
</imageobject>
<textobject>
<phrase>Point Iteration in Hash Data Structures</phrase>
</textobject>
</mediaobject>
</figure>

<para>
It should be noted that containers based on collision-chaining
hash-tables are not the only ones with this type of behavior;
many other self-organizing data structures display it as well.
</para>
</section>

<section xml:id="associative.iterators.invalidation">
<info><title>Invalidation Guarantees</title></info>
<para>Consider the following snippet:</para>
<programlisting>
it = c.find(3);
c.erase(5);
</programlisting>

<para>
Following the call to <classname>erase</classname>, what is the
validity of <classname>it</classname>: can it be de-referenced?
Can it be incremented?
</para>

<para>
The answer depends on the underlying data structure of the
container. The graphic below shows three cases: A1 and A2 show
a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
show a collision-chaining hash table.
</para>

<figure>
<title>Effect of erase in different underlying data structures</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
	   fileref="../images/pbds_invalidation_guarantee_erase.png"/>
</imageobject>
<textobject>
<phrase>Effect of erase in different underlying data structures</phrase>
</textobject>
</mediaobject>
</figure>

<orderedlist>
<listitem>
<para>
Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
be de-referenced and incremented. The sequence of iterators
changed, but in a way that is well-defined by the interface.
</para>
</listitem>

<listitem>
<para>
Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
not valid at all: it cannot be de-referenced or
incremented; the order of iterators changed in a way that is
(practically) determined by the implementation and not by
the interface.
</para>
</listitem>

<listitem>
<para>
Erasing 5 from C1 yields C2. Here the situation is more
complicated. On the one hand, there is no problem in
de-referencing <classname>it</classname>. On the other hand,
the order of iterators changed in a way that is
(practically) determined by the implementation and not by
the interface.
</para>
</listitem>
</orderedlist>

<para>
So in the standard library containers, it is not always possible
to express whether <varname>it</varname> is valid or not. This
is true also for <function>insert</function>. Again, the
iterator concept seems overloaded.
</para>
</section>
</section> <!--iterators-->


<section xml:id="motivation.associative.functions">
<info><title>Functional</title></info>
<para>
</para>

<para>
The design of the functional overlay to the underlying data
structures differs slightly from some of the conventions used in
the C++ standard. A strict public interface comprises only
those methods whose operations depend on the class's internal
structure; other operations are best designed as external
functions (see <xref linkend="biblio.meyers02both"/>). By this
rubric, the standard associative containers lack some useful
methods, and provide other methods which would be better
removed.
</para>

<section xml:id="motivation.associative.functions.erase">
<info><title><function>erase</function></title></info>

<orderedlist>
<listitem>
<para>
Order-preserving standard associative containers provide the
method
</para>
<programlisting>
iterator
erase(iterator it)
</programlisting>

<para>
which takes an iterator, erases the corresponding
element, and returns an iterator to the following
element. Standard hash-based associative
containers also provide this method. This seemingly
increases genericity between associative containers,
since it is possible to use
</para>
<programlisting>
typename C::iterator it = c.begin();
typename C::iterator e_it = c.end();

while(it != e_it)
  it = pred(*it)? c.erase(it) : ++it;
</programlisting>

<para>
in order to erase from a container object
<varname>c</varname> all elements which match a
predicate <classname>pred</classname>. However, in a
different sense this actually decreases genericity: an
integral implication of this method is that tree-based
associative containers' memory use is linear in the total
number of elements they store, while hash-based
containers' memory use is unbounded in the total number of
elements they store. Assume a hash-based container is
allowed to decrease its size when an element is
erased. Then the elements might be rehashed, which means
that there is no "next" element: it is simply
undefined. Consequently, it is possible to infer from the
fact that the standard library's hash-based containers
provide this method that they cannot downsize when
elements are erased. As a consequence, different code is
needed to manipulate different containers, assuming that
memory should be conserved. Therefore, this library's
non-order-preserving associative containers omit this
method.
</para>
</listitem>

<listitem>
<para>
All associative containers in this library include a
conditional-erase method
</para>
<programlisting>
template&lt;
  class Pred&gt;
size_type
erase_if
(Pred pred)
</programlisting>
<para>
which erases all elements matching a predicate. This is probably the
only way to ensure linear-time multiple-item erase which can
actually downsize a container.
</para>
</listitem>

<listitem>
<para>
The standard associative containers provide methods for
multiple-item erase of the form
</para>
<programlisting>
size_type
erase(It b, It e)
</programlisting>
<para>
erasing a range of elements given by a pair of
iterators. For tree-based or trie-based containers, this can
be implemented more efficiently as a (small) sequence of split
and join operations. For other, unordered, containers, this
method isn't much better than an external loop. Moreover,
if <varname>c</varname> is a hash-based container,
then
</para>
<programlisting>
c.erase(c.find(2), c.find(5))
</programlisting>
<para>
is almost certain to do something
different than erasing all elements whose keys are between 2
and 5, and is likely to produce other undefined behavior.
</para>
</listitem>
</orderedlist>
</section> <!-- erase -->

<section xml:id="motivation.associative.functions.split">
<info>
<title>
<function>split</function> and <function>join</function>
</title>
</info>
<para>
It is well-known that tree-based and trie-based container
objects can be efficiently split or joined (see
<xref linkend="biblio.clrs2001"/>). Externally splitting or
joining trees is super-linear, and, furthermore, can throw
exceptions.
Split and join methods, consequently, seem good
choices for tree-based container methods, especially since, as
noted just before, they are efficient replacements for erasing
sub-sequences.
</para>

</section> <!-- split -->

<section xml:id="motivation.associative.functions.insert">
<info>
<title>
<function>insert</function>
</title>
</info>
<para>
The standard associative containers provide methods of the form
</para>
<programlisting>
template&lt;class It&gt;
size_type
insert(It b, It e);
</programlisting>

<para>
for inserting a range of elements given by a pair of
iterators. At best, this can be implemented as an external loop,
or, even more efficiently, as a join operation (for the case of
tree-based or trie-based containers). Moreover, these methods seem
similar to constructors taking a range given by a pair of
iterators; the constructors, however, are transactional, whereas
the insert methods are not; this is possibly confusing.
</para>

</section> <!-- insert -->

<section xml:id="motivation.associative.functions.compare">
<info>
<title>
<function>operator==</function> and <function>operator&lt;=</function>
</title>
</info>

<para>
Associative containers are parametrized by policies that allow
testing key equivalence: a hash-based container can do this through
its equivalence functor, and a tree-based container can do this
through its comparison functor. In addition, some standard
associative containers have global function operators, like
<function>operator==</function> and <function>operator&lt;=</function>,
that allow comparing entire associative containers.
</para>

<para>
In our opinion, these functions are better left out. To begin
with, they do not significantly improve over an external
loop.
More importantly, however, they are possibly misleading:
<function>operator==</function>, for example, usually checks for
equivalence, or interchangeability, but the associative
container cannot check for values' equivalence, only keys'
equivalence; also, are two containers considered equivalent if
they store the same values in a different order? This is an
arbitrary decision.
</para>
</section> <!-- compare -->

</section> <!-- functional -->

</section> <!--associative-->

<section xml:id="pbds.intro.motivation.priority_queue">
<info><title>Priority Queues</title></info>

<section xml:id="motivation.priority_queue.policy">
<info><title>Policy Choices</title></info>

<para>
Priority queues are containers that allow efficiently inserting
values and accessing the maximal value (in the sense of the
container's comparison functor). Their interface
supports <function>push</function>
and <function>pop</function>. The standard
container <classname>std::priority_queue</classname> indeed supports
these methods, but little else. For algorithmic and
software-engineering purposes, other methods are needed:
</para>

<orderedlist>
<listitem>
<para>
Many graph algorithms (see
<xref linkend="biblio.clrs2001"/>) require increasing a
value in a priority queue (again, in the sense of the
container's comparison functor), or joining two
priority-queue objects.
</para>
</listitem>

<listitem>
<para>The return type of <classname>priority_queue</classname>'s
<function>push</function> method is a point-type iterator, which can
be used for modifying or erasing arbitrary values.
        For example:</para>
        <programlisting>
          priority_queue<int> p;
          priority_queue<int>::point_iterator it = p.push(3);
          p.modify(it, 4);
        </programlisting>

        <para>These types of cross-referencing operations are necessary
        for making priority queues useful for different applications,
        especially graph applications.</para>

      </listitem>
      <listitem>
        <para>
          It is sometimes necessary to erase an arbitrary value in a
          priority queue. For example, consider
          the <function>select</function> function for monitoring
          file descriptors:
        </para>

        <programlisting>
          int
          select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
                 struct timeval *timeout);
        </programlisting>
        <para>
          As the <function>select</function> documentation states:
        </para>
        <para>
          <quote>
            The nfds argument specifies the range of file
            descriptors to be tested. The select() function tests file
            descriptors in the range of 0 to nfds-1.</quote>
        </para>

        <para>
          It stands to reason, therefore, that we might wish to
          maintain a minimal value for <varname>nfds</varname>, and
          priority queues immediately come to mind. Note, though, that
          when a socket is closed, the minimal file descriptor might
          change; in the absence of an efficient means to erase an
          arbitrary value from a priority queue, we might as well
          avoid its use altogether.
        </para>

        <para>
          The standard containers typically support iterators. It is
          somewhat unusual
          for <classname>std::priority_queue</classname> to omit them
          (see <xref linkend="biblio.meyers01stl"/>). One might
          ask why priority queues need to support iterators, since
          they are self-organizing containers with a different purpose
          than abstracting sequences.
          There are several reasons:
        </para>
        <orderedlist>
          <listitem>
            <para>
              Iterators (even in self-organizing containers) are
              useful for many purposes: cross-referencing
              containers, serialization, and debugging code that uses
              these containers.
            </para>
          </listitem>

          <listitem>
            <para>
              The standard library's hash-based containers support
              iterators, even though they too are self-organizing
              containers with a different purpose than abstracting
              sequences.
            </para>
          </listitem>

          <listitem>
            <para>
              In standard-library-like containers, it is natural to specify the
              interface of operations for modifying a value or erasing
              a value (discussed previously) in terms of iterators.
              It should be noted that the standard
              containers also use iterators for accessing and
              manipulating a specific value. In hash-based
              containers, one checks the existence of a key by
              comparing the iterator returned by <function>find</function> to the
              iterator returned by <function>end</function>, and not by comparing a
              pointer returned by <function>find</function> to <type>NULL</type>.
            </para>
          </listitem>
        </orderedlist>
      </listitem>
    </orderedlist>

  </section>

  <section xml:id="motivation.priority_queue.underlying">
    <info><title>Underlying Data Structures</title></info>

    <para>
      There are three main implementations of priority queues: the
      first employs a binary heap, typically one which uses a
      sequence; the second uses a tree (or forest of trees), which is
      typically less structured than an associative container's tree;
      the third simply uses an associative container. These are
      shown in the figure below with labels A1 and A2, B, and C.
    </para>

    <figure>
      <title>Underlying Priority Queue Data Structures</title>
      <mediaobject>
        <imageobject>
          <imagedata align="center" format="PNG" scale="100"
                     fileref="../images/pbds_different_underlying_dss_2.png"/>
        </imageobject>
        <textobject>
          <phrase>Underlying Priority Queue Data Structures</phrase>
        </textobject>
      </mediaobject>
    </figure>

    <para>
      No single implementation can completely replace any of the
      others. Some have better <function>push</function>
      and <function>pop</function> amortized performance, some have
      better bounded (worst case) response time than others, some
      optimize a single method at the expense of others, etc. In
      general the "best" implementation is dictated by the specific
      problem.
    </para>

    <para>
      As with associative containers, the more implementations
      co-exist, the more necessary a traits mechanism is for handling
      generic containers safely and efficiently. This is especially
      important for priority queues, since the invalidation guarantees
      of one of the most useful data structures, binary heaps, are
      markedly different from those of most of the others.
    </para>

  </section>

  <section xml:id="motivation.priority_queue.binary_heap">
    <info><title>Binary Heaps</title></info>


    <para>
      Binary heaps are one of the most useful underlying
      data structures for priority queues. They are very efficient in
      terms of memory (since they don't require per-value structure
      metadata), and have the best amortized <function>push</function> and
      <function>pop</function> performance for primitive types like
      <type>int</type>.
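      As a minimal sketch of how a binary-heap-backed queue can be
      requested from this library, the following selects the
      underlying structure with <classname>binary_heap_tag</classname>.
      The typedef <type>int_heap</type> and the helper
      <function>pop_max</function> are illustrative names introduced
      here, not part of the library:
      <programlisting>
#include <ext/pb_ds/priority_queue.hpp>
#include <functional>

// A priority queue of ints backed by a binary heap.
typedef __gnu_pbds::priority_queue<int, std::less<int>,
                                   __gnu_pbds::binary_heap_tag> int_heap;

// Removes and returns the maximal value (in the sense of std::less).
// Assumes h is not empty.
inline int
pop_max(int_heap& h)
{
  const int v = h.top();
  h.pop();
  return v;
}
      </programlisting>
      Only <function>push</function>, <function>top</function>, and
      <function>pop</function> are used in this sketch, so the tag
      could be exchanged for another heap tag without touching
      <function>pop_max</function>.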
    </para>

    <para>
      The standard library's <classname>priority_queue</classname>
      implements this data structure as an adapter over a sequence,
      typically
      <classname>std::vector</classname>
      or <classname>std::deque</classname>, which correspond to labels
      A1 and A2 respectively in the figure above.
    </para>

    <para>
      This is indeed an elegant example of the adapter concept and
      the algorithm/container/iterator decomposition (see <xref linkend="biblio.nelson96stlpq"/>). There are
      several reasons why a binary-heap priority queue
      may be better implemented as a container instead of a
      sequence adapter:
    </para>

    <orderedlist>
      <listitem>
        <para>
          <classname>std::priority_queue</classname> cannot erase values
          from its adapted sequence (irrespective of the sequence
          type). This means that the memory use of
          an <classname>std::priority_queue</classname> object is always
          proportional to the maximal number of values it ever contained,
          and not to the number of values that it currently
          contains (see <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>).
          This implementation of binary heaps acts very differently from
          other underlying data structures (see also pairing heaps).
        </para>
      </listitem>

      <listitem>
        <para>
          Some combinations of adapted sequences and value types
          are very inefficient or just don't make sense. If one uses
          <classname>std::priority_queue<std::string,
          std::vector<std::string> ></classname>, for example, then not only will each
          operation perform a logarithmic number of
          <classname>std::string</classname> assignments, but, furthermore, any
          operation (including <function>pop</function>) can render the container
          useless due to exceptions.
          Conversely, if one uses
          <classname>std::priority_queue<int, std::deque<int>
          ></classname>, then each operation incurs a logarithmic
          number of indirect accesses (through pointers) unnecessarily.
          It might be better to let the container make a conservative
          deduction whether to use the structure in the figure above, labels A1 or A2.
        </para>
      </listitem>

      <listitem>
        <para>
          There does not seem to be a systematic way to determine
          what exactly can be done with the priority queue.
        </para>
        <orderedlist>
          <listitem>
            <para>
              If <classname>p</classname> is a priority queue adapting an
              <classname>std::vector</classname>, then it is possible to iterate over
              all values by using <function>&p.top()</function> and
              <function>&p.top() + p.size()</function>, but this will not work
              if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
              case, one cannot use <classname>p.begin()</classname> and
              <classname>p.end()</classname>. If a different sequence is adapted, it
              is even more difficult to determine what can be
              done.
            </para>
          </listitem>

          <listitem>
            <para>
              If <varname>p</varname> is a priority queue adapting an
              <classname>std::deque</classname>, then the reference returned by
            </para>
            <programlisting>
              p.top()
            </programlisting>
            <para>
              will remain valid until it is popped,
              but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
              next <function>push</function> will invalidate it. If a different
              sequence is adapted, it is even more difficult to
              determine what can be done.
            </para>
          </listitem>
        </orderedlist>
      </listitem>

      <listitem>
        <para>
          Sequence-based binary heaps can still implement
          linear-time <function>erase</function> and <function>modify</function> operations.
          This means that if one needs to erase a small
          (say logarithmic) number of values, then one might still
          choose this underlying data structure. Using
          <classname>std::priority_queue</classname>, however, this will generally
          change the order of growth of the entire sequence of
          operations.
        </para>
      </listitem>
    </orderedlist>

  </section>
</section>
</section> <!-- goals/motivation -->
</section> <!-- intro -->

<!-- S02: Using -->
<section xml:id="containers.pbds.using">
  <info><title>Using</title></info>
  <?dbhtml filename="policy_data_structures_using.html"?>

  <section xml:id="pbds.using.prereq">
    <info><title>Prerequisites</title></info>

    <para>The library contains only header files, and does not require any
    other libraries except the standard C++ library. All classes are
    defined in namespace <code>__gnu_pbds</code>. The library internally
    uses macros beginning with <code>PB_DS</code>, but
    <code>#undef</code>s anything it <code>#define</code>s (except for
    header guards). Compiling the library in an environment where macros
    beginning with <code>PB_DS</code> are defined may yield unpredictable
    results in compilation, execution, or both.</para>

    <para>
      Further dependencies are necessary to create the visual output
      for the performance tests. To create these graphs, an
      additional package is needed: <command>pychart</command>.
    </para>
  </section>

  <section xml:id="pbds.using.organization">
    <info><title>Organization</title></info>

    <para>
      The various data structures are organized as follows.
    </para>

    <itemizedlist>
      <listitem>
        <para>
          Branch-Based
        </para>

        <itemizedlist>
          <listitem>
            <para>
              <classname>basic_branch</classname>
              is an abstract base class for branch-based
              associative containers
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>tree</classname>
              is a concrete base class for tree-based
              associative containers
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>trie</classname>
              is a concrete base class for trie-based
              associative containers
            </para>
          </listitem>
        </itemizedlist>
      </listitem>

      <listitem>
        <para>
          Hash-Based
        </para>
        <itemizedlist>
          <listitem>
            <para>
              <classname>basic_hash_table</classname>
              is an abstract base class for hash-based
              associative containers
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>cc_hash_table</classname>
              is a concrete collision-chaining hash-based
              associative container
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>gp_hash_table</classname>
              is a concrete (general) probing hash-based
              associative container
            </para>
          </listitem>
        </itemizedlist>
      </listitem>

      <listitem>
        <para>
          List-Based
        </para>
        <itemizedlist>
          <listitem>
            <para>
              <classname>list_update</classname>
              is a list-based update-policy associative container
            </para>
          </listitem>
        </itemizedlist>
      </listitem>
      <listitem>
        <para>
          Heap-Based
        </para>
        <itemizedlist>
          <listitem>
            <para>
              <classname>priority_queue</classname>
              is a priority queue
            </para>
          </listitem>
        </itemizedlist>
      </listitem>
    </itemizedlist>

    <para>
      The hierarchy is composed naturally so that commonality is
      captured by base classes.
      Thus <function>operator[]</function>
      is defined at the base of any hierarchy, since all derived
      containers support it. Conversely, <function>split</function> is
      defined in <classname>basic_branch</classname>, since only
      tree-like containers support it.
    </para>

    <para>
      In addition, there are the following diagnostics classes,
      used to report errors specific to this library's data
      structures.
    </para>

    <figure>
      <title>Exception Hierarchy</title>
      <mediaobject>
        <imageobject>
          <imagedata align="center" format="PDF" scale="75"
                     fileref="../images/pbds_exception_hierarchy.pdf"/>
        </imageobject>
        <imageobject>
          <imagedata align="center" format="PNG" scale="100"
                     fileref="../images/pbds_exception_hierarchy.png"/>
        </imageobject>
        <textobject>
          <phrase>Exception Hierarchy</phrase>
        </textobject>
      </mediaobject>
    </figure>

  </section>

  <section xml:id="pbds.using.tutorial">
    <info><title>Tutorial</title></info>

    <section xml:id="pbds.using.tutorial.basic">
      <info><title>Basic Use</title></info>

      <para>
        For the most part, the policy-based containers in
        namespace <literal>__gnu_pbds</literal> have the same interface as
        the equivalent containers in the standard C++ library, except for
        the names used for the container classes themselves.
        For example,
        this shows basic operations on a collision-chaining hash-based
        container:
      </para>
      <programlisting>
        #include <cassert>
        #include <ext/pb_ds/assoc_container.hpp>

        int main()
        {
          __gnu_pbds::cc_hash_table<int, char> c;
          c[2] = 'b';
          // Key 1 was never inserted, so the lookup yields end().
          assert(c.find(1) == c.end());
        }
      </programlisting>

      <para>
        The container is called
        <classname>__gnu_pbds::cc_hash_table</classname> instead of
        <classname>std::unordered_map</classname>, since <quote>unordered
        map</quote> does not necessarily mean a hash-based map as implied by
        the C++ library (C++11 or TR1). For example, list-based associative
        containers, which are very useful for the construction of
        "multimaps," are also unordered.
      </para>

      <para>This snippet shows a red-black tree based container:</para>

      <programlisting>
        #include <cassert>
        #include <ext/pb_ds/assoc_container.hpp>

        int main()
        {
          __gnu_pbds::tree<int, char> c;
          c[2] = 'b';
          // Key 2 was inserted above, so the lookup succeeds.
          assert(c.find(2) != c.end());
        }
      </programlisting>

      <para>The container is called <classname>tree</classname> instead of
      <classname>map</classname> since the underlying data structures are
      being named with specificity.
      </para>

      <para>
        The member function naming convention is to strive to be the same as
        the equivalent member functions in other C++ standard library
        containers. The familiar methods are unchanged:
        <function>begin</function>, <function>end</function>,
        <function>size</function>, <function>empty</function>, and
        <function>clear</function>.
      </para>

      <para>
        This isn't to say that things are exactly as one would expect, given
        the container requirements and interfaces in the C++ standard.
      </para>

      <para>
        The names of containers' policies and policy accessors are
        different from the usual.
        For example, if <type>hash_type</type> is
        some type of hash-based container, then</para>

      <programlisting>
        hash_type::hash_fn
      </programlisting>

      <para>
        gives the type of its hash functor, and if <varname>obj</varname> is
        some hash-based container object, then
      </para>

      <programlisting>
        obj.get_hash_fn()
      </programlisting>

      <para>will return a reference to its hash-functor object.</para>


      <para>
        Similarly, if <type>tree_type</type> is some type of tree-based
        container, then
      </para>

      <programlisting>
        tree_type::cmp_fn
      </programlisting>

      <para>
        gives the type of its comparison functor, and if
        <varname>obj</varname> is some tree-based container object,
        then
      </para>

      <programlisting>
        obj.get_cmp_fn()
      </programlisting>

      <para>will return a reference to its comparison-functor object.</para>

      <para>
        It would be nice to give names consistent with those in the existing
        C++ standard (inclusive of TR1). Unfortunately, these standard
        containers don't consistently name types and methods. For example,
        <classname>std::tr1::unordered_map</classname> uses
        <type>hasher</type> for the hash functor, but
        <classname>std::map</classname> uses <type>key_compare</type> for
        the comparison functor. Also,
        <classname>std::tr1::unordered_map</classname> uses
        <function>hash_function</function> for accessing the hash functor, but
        <classname>std::map</classname> uses <function>key_comp</function>
        for accessing the comparison functor.
      </para>

      <para>
        Instead, <literal>__gnu_pbds</literal> attempts to be internally
        consistent, and uses standard-derived terminology if possible.
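        As a minimal sketch of this naming scheme, the comparison
        functor of a tree-based container can be named through
        <type>cmp_fn</type> and fetched through
        <function>get_cmp_fn</function>. The typedef
        <type>tree_type</type>, the helper
        <function>ordered_before</function>, and the choice of
        <classname>std::greater</classname> are assumptions made for
        illustration only:
        <programlisting>
#include <ext/pb_ds/assoc_container.hpp>
#include <functional>

// A red-black tree map ordered by descending key (illustrative choice).
typedef __gnu_pbds::tree<int, char, std::greater<int> > tree_type;

// Asks the container's own comparison functor whether a is ordered
// before b.
inline bool
ordered_before(const tree_type& t, int a, int b)
{
  const tree_type::cmp_fn& cmp = t.get_cmp_fn();
  return cmp(a, b);
}
        </programlisting>
        With <classname>std::greater</classname> as the functor, larger
        keys are ordered first, so the helper reports that 5 precedes 2
        in such a container.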
      </para>

      <para>
        Another source of difference is in scope:
        <literal>__gnu_pbds</literal> contains more types of associative
        containers than the standard C++ library, and more opportunities
        to configure these new containers, since different types of
        associative containers are useful in different settings.
      </para>

      <para>
        Namespace <literal>__gnu_pbds</literal> contains different classes for
        hash-based containers, tree-based containers, trie-based containers,
        and list-based containers.
      </para>

      <para>
        Since associative containers share parts of their interface, they
        are organized as a class hierarchy.
      </para>

      <para>Each type or method is defined in the most-common ancestor
      in which it makes sense.
      </para>

      <para>For example, all associative containers support iteration
      expressed in the following form:
      </para>

      <programlisting>
        const_iterator
        begin() const;

        iterator
        begin();

        const_iterator
        end() const;

        iterator
        end();
      </programlisting>

      <para>
        But not all containers contain or use hash functors. Yet, both
        collision-chaining and (general) probing hash-based associative
        containers have a hash functor, so
        <classname>basic_hash_table</classname> contains the interface:
      </para>

      <programlisting>
        const hash_fn&
        get_hash_fn() const;

        hash_fn&
        get_hash_fn();
      </programlisting>

      <para>
        so all hash-based associative containers inherit the same
        hash-functor accessor methods.
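        A short sketch of what this shared accessor allows: one
        generic helper can read the hash functor of either hash-based
        container. The functor <classname>identity_hash</classname> and
        the helper <function>hash_of</function> are hypothetical names
        introduced for this illustration, not part of the library:
        <programlisting>
#include <ext/pb_ds/assoc_container.hpp>
#include <cstddef>

// Hypothetical hash functor: maps an int key to itself.
struct identity_hash
{
  std::size_t
  operator()(int key) const
  { return static_cast<std::size_t>(key); }
};

// Works for any hash-based container that provides get_hash_fn().
template<typename Hash_Cntnr>
std::size_t
hash_of(const Hash_Cntnr& c, int key)
{
  const typename Hash_Cntnr::hash_fn& h = c.get_hash_fn();
  return h(key);
}
        </programlisting>
        The same call, <code>hash_of(c, 2)</code>, compiles whether
        <varname>c</varname> is a
        <classname>cc_hash_table<int, char, identity_hash></classname>
        or a
        <classname>gp_hash_table<int, char, identity_hash></classname>.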
      </para>

    </section> <!--basic use -->

    <section xml:id="pbds.using.tutorial.configuring">
      <info>
        <title>
          Configuring via Template Parameters
        </title>
      </info>

      <para>
        In general, each of this library's containers is
        parametrized by more policies than those of the standard library. For
        example, the standard hash-based container is parametrized as
        follows:
      </para>
      <programlisting>
        template<typename Key, typename Mapped, typename Hash,
                 typename Pred, typename Allocator, bool Cache_Hash_Code>
        class unordered_map;
      </programlisting>

      <para>
        and so can be configured by key type, mapped type, a functor
        that translates keys to unsigned integral types, an equivalence
        predicate, an allocator, and an indicator whether to store hash
        values with each entry. This library's collision-chaining
        hash-based container is parametrized as
      </para>
      <programlisting>
        template<typename Key, typename Mapped, typename Hash_Fn,
                 typename Eq_Fn, typename Comb_Hash_Fn,
                 typename Resize_Policy, bool Store_Hash,
                 typename Allocator>
        class cc_hash_table;
      </programlisting>

      <para>
        and so can be configured by the first four types of
        <classname>std::tr1::unordered_map</classname>, then a
        policy for translating the key-hash result into a position
        within the table, then a policy by which the table resizes,
        an indicator whether to store hash values with each entry,
        and an allocator (which is typically the last template
        parameter in standard containers).
      </para>

      <para>
        Nearly all policy parameters have default values, so this
        need not be considered for casual use.
        It is important to
        note, however, that hash-based containers' policies can
        dramatically alter their performance in different settings,
        and that tree-based containers' policies can make them
        useful for other purposes than just look-up.
      </para>


      <para>As opposed to associative containers, priority queues have
      relatively few configuration options. The priority queue is
      parametrized as follows:</para>
      <programlisting>
        template<typename Value_Type, typename Cmp_Fn, typename Tag,
                 typename Allocator>
        class priority_queue;
      </programlisting>

      <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
      <classname>Allocator</classname> parameters are the container's value type,
      comparison-functor type, and allocator type, respectively;
      these are very similar to the standard's priority queue. The
      <classname>Tag</classname> parameter is different: there are a number of
      pre-defined tag types corresponding to binary heaps, binomial
      heaps, etc., and <classname>Tag</classname> should be instantiated
      by one of them.</para>

      <para>Note that as opposed to the
      <classname>std::priority_queue</classname>,
      <classname>__gnu_pbds::priority_queue</classname> is not a
      sequence-adapter; it is a regular container.</para>

    </section>

    <section xml:id="pbds.using.tutorial.traits">
      <info>
        <title>
          Querying Container Attributes
        </title>
      </info>
      <para></para>

      <para>A container's underlying data structure
      affects its performance; unfortunately, it can also affect
      its interface. When manipulating associative
      containers generically, it is often useful to be able to statically
      determine what they can support and what they cannot.
      </para>

      <para>Happily, the standard provides a good solution to a similar
      problem: that of the different behavior of iterators. If
      <classname>It</classname> is an iterator, then
      </para>
      <programlisting>
        typename std::iterator_traits<It>::iterator_category
      </programlisting>

      <para>is one of a small number of pre-defined tag classes, and
      </para>
      <programlisting>
        typename std::iterator_traits<It>::value_type
      </programlisting>

      <para>is the value type to which the iterator "points".</para>

      <para>
        Similarly, in this library, if <type>C</type> is a
        container, then <classname>container_traits</classname> is a
        trait class that stores information about the kind of
        container that is implemented.
      </para>
      <programlisting>
        typename container_traits<C>::container_category
      </programlisting>
      <para>
        is one of a small number of predefined tag structures that
        uniquely identifies the type of underlying data structure.
      </para>

      <para>In most cases, however, the exact underlying data
      structure is not really important, but what is important is
      one of its other attributes: whether it guarantees storing
      elements by key order, for example. For this one can
      use the compile-time constant</para>
      <programlisting>
        container_traits<C>::order_preserving
      </programlisting>
      <para>
        Also,
      </para>
      <programlisting>
        typename container_traits<C>::invalidation_guarantee
      </programlisting>

      <para>is the container's invalidation guarantee.
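      The queries above can be sketched as follows. The helpers
      <function>keeps_key_order</function> and
      <function>guarantee_name</function> are hypothetical names used
      only for this illustration; the sketch assumes that
      <code>order_preserving</code> is an enumeration constant and that
      the three invalidation-guarantee tags are distinct types, as in
      the library's <filename>tag_and_trait.hpp</filename> header:
      <programlisting>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// True if containers of type C guarantee storing elements by key order.
template<typename C>
bool
keeps_key_order()
{ return __gnu_pbds::container_traits<C>::order_preserving; }

// Tag dispatch on the invalidation guarantee: overload resolution
// picks the overload that matches the tag type exactly.
inline const char*
guarantee_name(__gnu_pbds::basic_invalidation_guarantee)
{ return "basic"; }

inline const char*
guarantee_name(__gnu_pbds::point_invalidation_guarantee)
{ return "point"; }

inline const char*
guarantee_name(__gnu_pbds::range_invalidation_guarantee)
{ return "range"; }

// Names the invalidation guarantee of container type C.
template<typename C>
const char*
guarantee_of()
{
  typedef typename __gnu_pbds::container_traits<C>::invalidation_guarantee tag;
  return guarantee_name(tag());
}
      </programlisting>
      Generic code can branch on these answers at compile time instead
      of hard-coding knowledge about a particular container.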
      Invalidation
      guarantees are especially important regarding priority queues,
      since in this library's design, iterators are practically the
      only way to manipulate them.</para>
    </section>

    <section xml:id="pbds.using.tutorial.point_range_iteration">
      <info>
        <title>
          Point and Range Iteration
        </title>
      </info>
      <para></para>

      <para>This library differentiates between two types of methods
      and iterators: point-type, and range-type. For example,
      <function>find</function> and <function>insert</function> are point-type methods, since
      they each deal with a specific element; their returned
      iterators are point-type iterators. <function>begin</function> and
      <function>end</function> are range-type methods, since they are not used to
      find a specific element, but rather to go over all elements in
      a container object; their returned iterators are range-type
      iterators.
      </para>

      <para>Most containers store elements in an order that is
      determined by their interface. Correspondingly, it is fine that
      their point-type iterators are synonymous with their range-type
      iterators. For example, in the following snippet
      </para>
      <programlisting>
        std::for_each(c.find(1), c.find(5), foo);
      </programlisting>
      <para>
        two point-type iterators (returned by <function>find</function>) are used
        for a range-type purpose: going over all elements whose key is
        between 1 and 5.
      </para>

      <para>
        Conversely, the above snippet makes no sense for
        self-organizing containers, ones that order (and reorder)
        their elements by implementation. It would be nice to have a
        uniform iterator system that would allow the above snippet to
        compile only if it made sense.
      </para>

      <para>
        This could trivially be done by specializing
        <function>std::for_each</function> for the case of iterators returned by
        <classname>std::tr1::unordered_map</classname>, but this would only solve the
        problem for one algorithm and one container. Fundamentally, the
        problem is that one can loop using a self-organizing
        container's point-type iterators.
      </para>

      <para>
        This library's containers define two families of
        iterators: <type>point_const_iterator</type> and
        <type>point_iterator</type> are the iterator types returned by
        point-type methods; <type>const_iterator</type> and
        <type>iterator</type> are the iterator types returned by range-type
        methods.
      </para>
      <programlisting>
        class <- some container ->
        {
        public:
          ...

          typedef <- something -> const_iterator;

          typedef <- something -> iterator;

          typedef <- something -> point_const_iterator;

          typedef <- something -> point_iterator;

          ...

        public:
          ...

          const_iterator begin() const;

          iterator begin();

          point_const_iterator find(...) const;

          point_iterator find(...);
        };
      </programlisting>

      <para>For
      containers whose interface defines sequence order, it
      is very simple: point-type and range-type iterators are exactly
      the same, which means that the above snippet will compile if it
      is used for an order-preserving associative container.
      </para>

      <para>
        For self-organizing containers, however (hash-based
        containers as a special example), the preceding snippet will
        not compile, because their point-type iterators do not support
        <function>operator++</function>.
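        The distinction can be sketched with two small helpers. The
        names <type>tree_map</type>, <type>hash_map</type>,
        <function>sum_between</function>, and
        <function>lookup_or</function> are illustrative assumptions,
        not library names:
        <programlisting>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::tree<int, int> tree_map;          // order-preserving
typedef __gnu_pbds::cc_hash_table<int, int> hash_map; // self-organizing

// Sums the mapped values for keys in [lo, hi), using two find() results
// as a range. Assumes lo <= hi and that hi is a key of m (or that the
// range is meant to run to the end of m).
inline int
sum_between(tree_map& m, int lo, int hi)
{
  tree_map::point_iterator first = m.find(lo);
  if (first == m.end())
    return 0;
  const tree_map::point_iterator last = m.find(hi);
  int sum = 0;
  // For a tree, point-type iterators can be incremented.
  for (; first != last; ++first)
    sum += first->second;
  return sum;
}

// Point-type lookup in a hash table: the iterator is dereferenced and
// compared with end(), but never incremented.
inline int
lookup_or(hash_map& m, int key, int fallback)
{
  const hash_map::point_iterator it = m.find(key);
  return it == m.end() ? fallback : it->second;
}
        </programlisting>
        Replacing <type>tree_map</type> with <type>hash_map</type> in
        <function>sum_between</function> is expected to be rejected by
        the compiler, which is the intended effect of separating the
        two iterator families.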
      </para>

      <para>In any case, both for order-preserving and self-organizing
      containers, the following snippet will compile:
      </para>
      <programlisting>
        typename Cntnr::point_iterator it = c.find(2);
      </programlisting>

      <para>
        because a range-type iterator can always be converted to a
        point-type iterator.
      </para>

      <para>Distinguishing between iterator types also
      raises the point that a container's iterators might have
      different invalidation rules concerning their de-referencing
      abilities and movement abilities. This now corresponds exactly
      to the question of whether point-type and range-type iterators
      are valid. As explained above, <classname>container_traits</classname> allows
      querying a container for its data structure attributes. The
      iterator-invalidation guarantees are certainly a property of
      the underlying data structure, and so
      </para>
      <programlisting>
        container_traits<C>::invalidation_guarantee
      </programlisting>

      <para>
        gives one of three pre-determined types that answer this
        query.
      </para>

    </section>
  </section> <!-- tutorial -->

  <section xml:id="pbds.using.examples">
    <info><title>Examples</title></info>
    <para>
      Additional code examples are provided in the source
      distribution, as part of the regression and performance
      testsuite.
    </para>

    <section xml:id="pbds.using.examples.basic">
      <info><title>Intermediate Use</title></info>

      <itemizedlist>
        <listitem>
          <para>
            Basic use of maps:
            <filename>basic_map.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Basic use of sets:
            <filename>basic_set.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Conditionally erasing values from an associative container object:
            <filename>erase_if.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Basic use of multimaps:
            <filename>basic_multimap.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Basic use of multisets:
            <filename>basic_multiset.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Basic use of priority queues:
            <filename>basic_priority_queue.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Splitting and joining priority queues:
            <filename>priority_queue_split_join.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            Conditionally erasing values from a priority queue:
            <filename>priority_queue_erase_if.cc</filename>
          </para>
        </listitem>
      </itemizedlist>

    </section>

    <section xml:id="pbds.using.examples.query">
      <info><title>Querying with <classname>container_traits</classname> </title></info>
      <itemizedlist>
        <listitem>
          <para>
            Using <classname>container_traits</classname> to query
            about underlying data structure behavior:
            <filename>assoc_container_traits.cc</filename>
          </para>
        </listitem>

        <listitem>
          <para>
            A non-compiling example showing wrong use of finding keys in
            hash-based containers: <filename>hash_find_neg.cc</filename>
          </para>
        </listitem>
        <listitem>
          <para>
            Using
<classname>container_traits</classname>
to query about underlying data structure behavior:
<filename>priority_queue_container_traits.cc</filename>
</para>
</listitem>

</itemizedlist>

</section>

<section xml:id="pbds.using.examples.container">
<info><title>By Container Method</title></info>
<para></para>

<section xml:id="pbds.using.examples.container.hash">
<info><title>Hash-Based</title></info>

<section xml:id="pbds.using.examples.container.hash.resize">
<info><title>size Related</title></info>

<itemizedlist>
<listitem>
<para>
Setting the initial size of a hash-based container
object:
<filename>hash_initial_size.cc</filename>
</para>
</listitem>

<listitem>
<para>
A non-compiling example showing how not to resize a
hash-based container object:
<filename>hash_resize_neg.cc</filename>
</para>
</listitem>

<listitem>
<para>
Resizing a hash-based container object:
<filename>hash_resize.cc</filename>
</para>
</listitem>

<listitem>
<para>
Showing an illegal resize of a hash-based container
object:
<filename>hash_illegal_resize.cc</filename>
</para>
</listitem>

<listitem>
<para>
Changing the load factors of a hash-based container
object: <filename>hash_load_set_change.cc</filename>
</para>
</listitem>
</itemizedlist>
</section>

<section xml:id="pbds.using.examples.container.hash.hashor">
<info><title>Hashing Function Related</title></info>
<para></para>

<itemizedlist>
<listitem>
<para>
Using a modulo range-hashing function for the case of an
unknown skewed key distribution:
<filename>hash_mod.cc</filename>
</para>
</listitem>

<listitem>
<para>
Writing a range-hashing functor for the case
of a known
skewed key distribution:
<filename>shift_mask.cc</filename>
</para>
</listitem>

<listitem>
<para>
Storing the hash value along with each key:
<filename>store_hash.cc</filename>
</para>
</listitem>

<listitem>
<para>
Writing a ranged-hash functor:
<filename>ranged_hash.cc</filename>
</para>
</listitem>
</itemizedlist>

</section>

</section>

<section xml:id="pbds.using.examples.container.branch">
<info><title>Branch-Based</title></info>


<section xml:id="pbds.using.examples.container.branch.split">
<info><title>split or join Related</title></info>

<itemizedlist>
<listitem>
<para>
Joining two tree-based container objects:
<filename>tree_join.cc</filename>
</para>
</listitem>

<listitem>
<para>
Splitting a PATRICIA trie container object:
<filename>trie_split.cc</filename>
</para>
</listitem>

<listitem>
<para>
Order statistics while joining two tree-based container
objects:
<filename>tree_order_statistics_join.cc</filename>
</para>
</listitem>
</itemizedlist>

</section>

<section xml:id="pbds.using.examples.container.branch.invariants">
<info><title>Node Invariants</title></info>

<itemizedlist>
<listitem>
<para>
Using trees for order statistics:
<filename>tree_order_statistics.cc</filename>
</para>
</listitem>

<listitem>
<para>
Augmenting trees to support operations on line
intervals:
<filename>tree_intervals.cc</filename>
</para>
</listitem>
</itemizedlist>

</section>

<section xml:id="pbds.using.examples.container.branch.trie">
<info><title>trie</title></info>
<itemizedlist>
<listitem>
<para>
Using a PATRICIA trie for DNA strings:
<filename>trie_dna.cc</filename>
</para>
</listitem>

<listitem>
<para>
Using a PATRICIA
trie for finding all entries whose key matches a given prefix:
<filename>trie_prefix_search.cc</filename>
</para>
</listitem>
</itemizedlist>

</section>

</section>

<section xml:id="pbds.using.examples.container.priority_queue">
<info><title>Priority Queues</title></info>
<itemizedlist>
<listitem>
<para>
Cross referencing an associative container and a priority
queue: <filename>priority_queue_xref.cc</filename>
</para>
</listitem>

<listitem>
<para>
Cross referencing a vector and a priority queue using a
very simple version of Dijkstra's shortest path
algorithm:
<filename>priority_queue_dijkstra.cc</filename>
</para>
</listitem>
</itemizedlist>

</section>


</section>

</section>

</section> <!-- using -->

<!-- S03: Design -->


<section xml:id="containers.pbds.design">
<info><title>Design</title></info>
<?dbhtml filename="policy_data_structures_design.html"?>
<para></para>

<section xml:id="pbds.design.concepts">
<info><title>Concepts</title></info>

<section xml:id="pbds.design.concepts.null_type">
<info><title>Null Policy Classes</title></info>

<para>
Associative containers are typically parametrized by various
policies. For example, a hash-based associative container is
parametrized by a hash-functor, transforming each key into a
non-negative numerical type. Each such value is then further mapped
into a position within the table. The mapping of a key into a
position within the table is therefore a two-step process.
</para>

<para>
In some cases, instantiations are redundant.
For example, when the
keys are integers, it is possible to use a redundant hash policy,
which transforms each key into its value.
</para>

<para>
In some other cases, these policies are irrelevant. For example, a
hash-based associative container might transform keys into positions
within a table by a different method than the two-step method
described above. In such a case, the hash functor is simply
irrelevant.
</para>

<para>
When a policy is either redundant or irrelevant, it can be replaced
by <classname>null_type</classname>.
</para>

<para>
For example, a <emphasis>set</emphasis> is an associative
container with one of its template parameters (the one for the
mapped type) replaced with <classname>null_type</classname>. Other
places where this technique makes simplifications possible
include node updates in tree and trie data structures, and hash
and probe functions for hash data structures.
</para>
</section>

<section xml:id="pbds.design.concepts.associative_semantics">
<info><title>Map and Set Semantics</title></info>

<section xml:id="concepts.associative_semantics.set_vs_map">
<info>
<title>
Distinguishing Between Maps and Sets
</title>
</info>

<para>
Anyone familiar with the standard knows that there are four kinds
of associative containers: maps, sets, multimaps, and
multisets. The map datatype associates each key with
some data.
</para>

<para>
Sets are associative containers that simply store keys -
they do not map them to anything. In the standard, each map class
has a corresponding set class. E.g.,
<classname>std::map<int, char></classname> maps each
<classname>int</classname> to a <classname>char</classname>, but
<classname>std::set<int></classname> simply stores
<classname>int</classname>s.
In this library, however, there are no
distinct classes for maps and sets. Instead, an associative
container's <classname>Mapped</classname> template parameter is a policy: if
it is instantiated by <classname>null_type</classname>, then it
is a "set"; otherwise, it is a "map". E.g.,
</para>
<programlisting>
cc_hash_table<int, char>
</programlisting>
<para>
is a "map" mapping each <type>int</type> value to a
<type>char</type>, but
</para>
<programlisting>
cc_hash_table<int, null_type>
</programlisting>
<para>
is a type that uniquely stores <type>int</type> values.
</para>
<para>Once the <classname>Mapped</classname> template parameter is instantiated
by <classname>null_type</classname>,
the "set" acts very similarly to the standard's sets - it does not
map each key to a distinct <classname>null_type</classname> object. Also,
the container's <type>value_type</type> is essentially
its <type>key_type</type> - just as with the standard's sets.
</para>

<para>
The standard's multimaps and multisets allow, respectively,
non-uniquely mapping keys and non-uniquely storing keys. As
discussed, the
reasons why this might be necessary are 1) that a key might be
decomposed into a primary key and a secondary key, 2) that a
key might appear more than once, or 3) any arbitrary
combination of 1) and 2). Correspondingly,
one should use 1) "maps" mapping primary keys to secondary
keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
combination of 1) and 2).
Thus, for example, an
<classname>std::multiset<int></classname> might be used to store
multiple instances of integers, but using this library's
containers, one might use
</para>
<programlisting>
tree<int, size_t>
</programlisting>

<para>
i.e., a <classname>map</classname> of <type>int</type>s to
<type>size_t</type>s.
</para>
<para>
These "multimaps" and "multisets" might be confusing to
anyone familiar with the standard's <classname>std::multimap</classname> and
<classname>std::multiset</classname>, because there is no clear
correspondence between the two. For example, in some cases
where one uses <classname>std::multiset</classname> in the standard, one might use,
in this library, a "multimap" of "multisets" - i.e., a
container that maps each primary key to an associative
container that maps each secondary key to the number of times
it occurs.
</para>

<para>
When one uses a "multimap," one should choose with care the
type of container used for secondary keys.
</para>
</section> <!-- map vs set -->


<section xml:id="concepts.associative_semantics.multi">
<info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>

<para>
Brace oneself: this library does not contain containers like
<classname>std::multimap</classname> or
<classname>std::multiset</classname>. Instead, these data
structures can be synthesized via manipulation of the
<classname>Mapped</classname> template parameter.
</para>
<para>
One maps the unique part of a key (the primary key) into an
associative container of the (originally) non-unique parts of
the key (the secondary keys).
A primary associative-container
is an associative container of primary keys; a secondary
associative-container is an associative container of
secondary keys.
</para>

<para>
Let us step back and start from the beginning.
</para>


<para>
Maps (or sets) allow mapping (or storing) unique-key values.
The standard library also supplies associative containers which
map (or store) multiple values with equivalent keys:
<classname>std::multimap</classname>, <classname>std::multiset</classname>,
<classname>std::tr1::unordered_multimap</classname>, and
<classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
be used, then why we think it is best to avoid them.
</para>

<para>
Suppose one builds a simple bank-account application that
records, for each client (identified by an <classname>std::string</classname>)
and account-id (marked by an <type>unsigned long</type>),
the balance in the account (described by a
<type>float</type>). Suppose further that ordering this
information is not useful, so a hash-based container is
preferable to a tree-based container. Then one can use
</para>

<programlisting>
std::tr1::unordered_map<std::pair<std::string, unsigned long>, float, ...>
</programlisting>

<para>
which hashes every combination of client and account-id. This
might work well, except for the fact that it is now impossible
to efficiently list all of the accounts of a specific client
(this would practically require iterating over all
entries). Instead, one can use
</para>

<programlisting>
std::tr1::unordered_multimap<std::pair<std::string, unsigned long>, float, ...>
</programlisting>

<para>
which hashes every client, and decides equivalence based on
client only.
This will ensure that all accounts belonging to a
specific client are stored consecutively.
</para>

<para>
Also, suppose one wants a priority queue of integers
(a container that supports <function>push</function>,
<function>pop</function>, and <function>top</function> operations, the last of which
returns the largest <type>int</type>) that also supports
operations such as <function>find</function> and <function>lower_bound</function>. A
reasonable solution is to build an adapter over
<classname>std::set<int></classname>. In this adapter,
<function>push</function> will just call the tree-based
associative container's <function>insert</function> method; <function>pop</function>
will call its <function>end</function> method, and use it to return the
preceding element (which must be the largest). This might
work well, except that the container object cannot hold
multiple instances of the same integer (<function>push(4)</function>
will be a no-op if <constant>4</constant> is already in the
container object). If multiple keys are necessary, then one
might build the adapter over an
<classname>std::multiset<int></classname>.
</para>

<para>
The standard library's non-unique-mapping containers are useful
when (1) a key can be decomposed into a primary key and a
secondary key, (2) a key is needed multiple times, or (3) any
combination of (1) and (2).
</para>

<para>
The graphic below shows how the standard library's container
design works internally; in this figure nodes shaded equally
represent equivalent-key values.
Equivalent keys are stored
consecutively using the properties of the underlying data
structure: binary search trees (label A) store equivalent-key
values consecutively (in the sense of an in-order walk)
naturally; collision-chaining hash tables (label B) store
equivalent-key values in the same bucket, which can be
arranged so that equivalent-key values are consecutive.
</para>

<figure>
<title>Non-unique Mapping Standard Containers</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_embedded_lists_1.png"/>
</imageobject>
<textobject>
<phrase>Non-unique Mapping Standard Containers</phrase>
</textobject>
</mediaobject>
</figure>

<para>
Put differently, the standard's non-unique mapping
associative-containers are associative containers that map
primary keys to linked lists that are embedded into the
container. The graphic below shows again the two
containers from the first graphic above, this time with
the embedded linked lists of the grayed nodes marked
explicitly.
</para>

<figure xml:id="fig.pbds_embedded_lists_2">
<title>
Effect of embedded lists in
<classname>std::multimap</classname>
</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_embedded_lists_2.png"/>
</imageobject>
<textobject>
<phrase>
Effect of embedded lists in
<classname>std::multimap</classname>
</phrase>
</textobject>
</mediaobject>
</figure>

<para>
These embedded linked lists have several disadvantages.
</para>

<orderedlist>
<listitem>
<para>
The underlying data structure embeds the linked lists
according to its own considerations, which means that the
search path for a value might include several different
equivalent-key values. For example, the search path for
the black node in either label A or label B of the first
graphic includes more than a single gray node.
</para>
</listitem>

<listitem>
<para>
The links of the linked lists are the underlying data
structure's nodes, which typically are quite structured. In
the case of tree-based containers (the graphic above, label
A), each "link" is actually a node with three pointers (one
to a parent and two to children), and a
relatively-complicated iteration algorithm. The linked
lists, therefore, can take up quite a lot of memory, and
iterating over all values equal to a given key (through the
return value of the standard
library's <function>equal_range</function>) can be
expensive.
</para>
</listitem>

<listitem>
<para>
The primary key is stored multiply; this uses more memory.
</para>
</listitem>

<listitem>
<para>
Finally, the interface of this design excludes several
useful underlying data structures. Of all the unordered
self-organizing data structures, practically only
collision-chaining hash tables can (efficiently) guarantee
that equivalent-key values are stored consecutively.
</para>
</listitem>
</orderedlist>

<para>
The above reasons hold even when the ratio of secondary keys to
primary keys (or average number of identical keys) is small, but
when it is large, there are more severe problems:
</para>

<orderedlist>
<listitem>
<para>
The underlying data structures order the links inside each
embedded linked list according to their internal
considerations, which effectively means that the links within
each list are unordered. Irrespective of the underlying data
structure, searching for a specific value can degrade to
linear complexity.
</para>
</listitem>

<listitem>
<para>
Similarly to the above point, it is impossible to apply
to the secondary keys considerations that apply to primary
keys. For example, it is not possible to maintain secondary
keys in sorted order.
</para>
</listitem>

<listitem>
<para>
While the interface "understands" that all equivalent-key
values constitute a distinct list (through
<function>equal_range</function>), the underlying data
structure typically does not. This means that operations such
as erasing from a tree-based container all values whose keys
are equivalent to a given key can be super-linear in the
size of the tree; this is also true for several other
operations that target a specific list.
</para>
</listitem>

</orderedlist>

<para>
In this library, all associative containers map
(or store) unique-key values. One can (1) map primary keys to
secondary associative-containers (containers of
secondary keys) or non-associative containers, (2) map identical
keys to a size-type representing the number of times they
occur, or (3) any combination of (1) and (2).
Instead of
allowing multiple equivalent-key values, this library
supplies associative containers based on underlying
data structures that are suitable as secondary
associative-containers.
</para>

<para>
In the figure below, labels A and B show the equivalent
underlying data structures in this library, corresponding to
labels A and B, respectively, of the first graphic above. Each shaded
box represents some size-type or secondary
associative-container.
</para>

<figure>
<title>Non-unique Mapping Containers</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_embedded_lists_3.png"/>
</imageobject>
<textobject>
<phrase>Non-unique Mapping Containers</phrase>
</textobject>
</mediaobject>
</figure>

<para>
In the first example above, then, one would use an associative
container mapping each client to an associative container which
maps each account-id to a balance (see
<filename>example/basic_multimap.cc</filename> for a similar arrangement); in the second
example, one would use an associative container mapping
each <classname>int</classname> to some size-type indicating the
number of times it logically occurs
(see <filename>example/basic_multiset.cc</filename>).
</para>

<para>
See the discussion in list-based container types for containers
especially suited as secondary associative-containers.
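
The two alternatives described in this section can be sketched as
follows. This is only a minimal sketch, assuming GCC's libstdc++
header <filename>ext/pb_ds/assoc_container.hpp</filename> and the
<literal>__gnu_pbds</literal> namespace; the typedef names and the
helpers <function>multiset_push</function>,
<function>multiset_pop_largest</function>, and
<function>client_total</function> are hypothetical and are introduced
here purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <ext/pb_ds/assoc_container.hpp> // assumption: GCC's libstdc++

// "Multiset" of ints: a unique-key map from each int to its occurrence count.
typedef __gnu_pbds::tree<int, std::size_t> int_multiset;

// "Multimap": primary key (client) -> secondary container
// (account-id -> balance). All names here are illustrative.
typedef __gnu_pbds::cc_hash_table<unsigned long, float> account_map;
typedef __gnu_pbds::cc_hash_table<std::string, account_map> client_map;

// Hypothetical helper: record one more occurrence of x.
inline void multiset_push(int_multiset& ms, int x)
{
  ++ms[x]; // operator[] inserts a zero count on first use
}

// Hypothetical helper: remove one occurrence of the largest key and return it.
inline int multiset_pop_largest(int_multiset& ms)
{
  assert(!ms.empty());
  int_multiset::iterator last = ms.end();
  --last;                      // the largest key precedes end()
  const int key = last->first;
  if (--last->second == 0)     // drop the entry once its count reaches zero
    ms.erase(key);
  return key;
}

// Hypothetical helper: sum one client's balances by walking only that
// client's secondary container, not the whole table.
inline float client_total(const client_map& clients, const std::string& name)
{
  client_map::point_const_iterator it = clients.find(name);
  if (it == clients.end())
    return 0.0f;
  float total = 0.0f;
  for (account_map::const_iterator a = it->second.begin();
       a != it->second.end(); ++a)
    total += a->second;
  return total;
}
```

With these definitions, pushing the same integer twice increments a
count instead of storing a second node, and listing the accounts of
one client touches only that client's secondary container. Which
secondary container type is appropriate depends on how many secondary
keys each primary key is expected to have.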
</para>
</section>

</section> <!-- map and set semantics -->

<section xml:id="pbds.design.concepts.iterator_semantics">
<info><title>Iterator Semantics</title></info>

<section xml:id="concepts.iterator_semantics.point_and_range">
<info><title>Point and Range Iterators</title></info>

<para>
Iterator concepts are bifurcated in this design into
point-type and range-type iteration.
</para>

<para>
A point-type iterator is an iterator that refers to a specific
element as returned through an
associative-container's <function>find</function> method.
</para>

<para>
A range-type iterator is an iterator that is used to go over a
sequence of elements, as returned by a container's
<function>begin</function> method.
</para>

<para>
A point-type method is a method that
returns a point-type iterator; a range-type method is a method
that returns a range-type iterator.
</para>

<para>For most containers, these types are synonymous; for
self-organizing containers, such as hash-based containers or
priority queues, they are inherently different (in any
implementation, including that of C++ standard library
components). This design makes the difference explicit by
giving them distinct types.
</para>
</section>


<section xml:id="concepts.iterator_semantics.both">
<info><title>Distinguishing Point and Range Iterators</title></info>

<para>When using this library, it is necessary to differentiate
between two types of methods and iterators: point-type methods and
iterators, and range-type methods and iterators.
Each associative
container's interface includes the methods:</para>
<programlisting>
point_const_iterator
find(const_key_reference r_key) const;

point_iterator
find(const_key_reference r_key);

std::pair<point_iterator,bool>
insert(const_reference r_val);
</programlisting>

<para>The relationship between these iterator types varies between
container types. The figure below
shows the most general invariant between point-type and
range-type iterators: in <emphasis>A</emphasis>, <literal>iterator</literal> can
always be converted to <literal>point_iterator</literal>. <emphasis>B</emphasis>
shows invariants for order-preserving containers: point-type
iterators are synonymous with range-type iterators.
Orthogonally, <emphasis>C</emphasis> shows invariants for "set"
containers: iterators are synonymous with const iterators.</para>

<figure>
<title>Point Iterator Hierarchy</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_point_iterator_hierarchy.png"/>
</imageobject>
<textobject>
<phrase>Point Iterator Hierarchy</phrase>
</textobject>
</mediaobject>
</figure>


<para>Note that point-type iterators in self-organizing containers
(hash-based associative containers) lack movement
operators, such as <literal>operator++</literal> - in fact, this
is the reason why this library differs from the standard C++ library's
design on this point.</para>

<para>Typically, one can determine an iterator's movement
capabilities using
<literal>std::iterator_traits<It>::iterator_category</literal>,
which is a <literal>struct</literal> indicating the iterator's
movement capabilities.
Unfortunately, none of the standard predefined
categories reflect an iterator's <emphasis>not</emphasis> having any
movement capabilities whatsoever. Consequently,
<literal>pb_ds</literal> adds a type
<literal>trivial_iterator_tag</literal> (whose name is taken from
a concept in C++ standardese, which is the category of iterators
with no movement capabilities). All other standard C++ library
tags, such as <literal>forward_iterator_tag</literal>, retain their
common use.</para>

</section>

<section xml:id="pbds.design.concepts.invalidation">
<info><title>Invalidation Guarantees</title></info>
<para>
If one manipulates a container object, then iterators previously
obtained from it can be invalidated. In some cases a
previously-obtained iterator cannot be de-referenced; in other cases,
the iterator's next or previous element might have changed
unpredictably. This corresponds exactly to the question whether a
point-type or range-type iterator (see previous concept) is valid or
not. In this design, one can query a container (at compile time) about
its invalidation guarantees.
</para>


<para>
In the earlier example with three different types of associative
containers, a modifying operation (<function>erase</function>) invalidated
iterators in three different ways: the iterator of one container
remained completely valid - it could be de-referenced and
incremented; the iterator of a different container could not even be
de-referenced; the iterator of the third container could be
de-referenced, but its "next" iterator changed unpredictably.
</para>

<para>
Distinguishing between point and range types allows fine-grained
invalidation guarantees, because these questions correspond exactly
to the question of whether point-type iterators and range-type
iterators are valid.
The graphic below shows tags corresponding to
different types of invalidation guarantees.
</para>

<figure>
<title>Invalidation Guarantee Tags Hierarchy</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PDF" scale="75"
fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
</imageobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
</imageobject>
<textobject>
<phrase>Invalidation Guarantee Tags Hierarchy</phrase>
</textobject>
</mediaobject>
</figure>

<itemizedlist>
<listitem>
<para>
<classname>basic_invalidation_guarantee</classname>
corresponds to a basic guarantee that a point-type iterator,
a found pointer, or a found reference, remains valid as long
as the container object is not modified.
</para>
</listitem>

<listitem>
<para>
<classname>point_invalidation_guarantee</classname>
corresponds to a guarantee that a point-type iterator, a
found pointer, or a found reference, remains valid even if
the container object is modified.
</para>
</listitem>

<listitem>
<para>
<classname>range_invalidation_guarantee</classname>
corresponds to a guarantee that a range-type iterator remains
valid even if the container object is modified.
</para>
</listitem>
</itemizedlist>

<para>To find the invalidation guarantee of a
container, one can use</para>
<programlisting>
typename container_traits<Cntnr>::invalidation_guarantee
</programlisting>

<para>Note that this hierarchy corresponds to the logic it
represents: if a container has range-invalidation guarantees,
then it must also have point-invalidation guarantees;
correspondingly, its invalidation guarantee (in this case
<classname>range_invalidation_guarantee</classname>)
can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
This means that this hierarchy can be used easily with
standard metaprogramming techniques, by specializing on the
type of <literal>invalidation_guarantee</literal>.</para>

<para>
These types of problems were addressed, in a more general
setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
our opinion, an invalidation-guarantee hierarchy would solve
these problems in all container types - not just associative
containers.
</para>

</section>
</section> <!-- iterator semantics -->

<section xml:id="pbds.design.concepts.genericity">
<info><title>Genericity</title></info>

<para>
The design attempts to address the following problem of
data-structure genericity. When writing a function manipulating
a generic container object, what is the behavior of the object?
Suppose one writes
</para>
<programlisting>
template<typename Cntnr>
void
some_op_sequence(Cntnr &r_container)
{
  ...
}
</programlisting>

<para>
then one needs to address the following questions in the body
of <function>some_op_sequence</function>:
</para>

<itemizedlist>
<listitem>
<para>
Which types and methods does <literal>Cntnr</literal> support?
Containers based on hash tables can be queried for the
hash-functor type and object; this is meaningless for tree-based
containers. Containers based on trees can be split, joined, or
can erase iterators and return the following iterator; this
cannot be done by hash-based containers.
</para>
</listitem>

<listitem>
<para>
What are the exception and invalidation guarantees
of <literal>Cntnr</literal>? A container based on a probing
hash-table invalidates all iterators when it is modified; this
is not the case for containers based on node-based
trees. Containers based on a node-based tree can be split or
joined without exceptions; this is not the case for containers
based on vector-based trees.
</para>
</listitem>

<listitem>
<para>
How does the container maintain its elements? Tree-based and
trie-based containers store elements by key order; others,
typically, do not. Containers based on splay trees or on lists
with update policies "cache" frequently accessed elements;
containers based on most other underlying data structures do
not.
</para>
</listitem>
<listitem>
<para>
How does one query a container about characteristics and
capabilities? What is the relationship between two different
data structures, if any?
</para>
</listitem>
</itemizedlist>

<para>The remainder of this section explains these issues in
detail.</para>


<section xml:id="concepts.genericity.tag">
<info><title>Tag</title></info>
<para>
Tags are very useful for manipulating generic types. For example, if
<literal>It</literal> is an iterator class, then <literal>typename
It::iterator_category</literal> or <literal>typename
std::iterator_traits<It>::iterator_category</literal> will
yield its category, and <literal>typename
std::iterator_traits<It>::value_type</literal> will yield its
value type.
</para>

<para>
This library contains a container tag hierarchy corresponding to the
diagram below.
</para>

<figure>
<title>Container Tag Hierarchy</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PDF" scale="75"
fileref="../images/pbds_container_tag_hierarchy.pdf"/>
</imageobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_container_tag_hierarchy.png"/>
</imageobject>
<textobject>
<phrase>Container Tag Hierarchy</phrase>
</textobject>
</mediaobject>
</figure>

<para>
Given any container <type>Cntnr</type>, the tag of
the underlying data structure can be found via <literal>typename
Cntnr::container_category</literal>.
</para>

</section> <!-- tag -->

<section xml:id="concepts.genericity.traits">
<info><title>Traits</title></info>
<para></para>

<para>Additionally, a traits mechanism can be used to query a
container type for its attributes.
Given any container
<literal>Cntnr</literal>, <literal>container_traits<Cntnr></literal>
is a traits class identifying the properties of the
container.</para>

<para>To find if a container can throw when a key is erased (which
is true for vector-based trees, for example), one can
use
</para>
<programlisting>container_traits<Cntnr>::erase_can_throw</programlisting>

<para>
Some of the definitions in <classname>container_traits</classname>
are dependent on other
definitions. If <classname>container_traits<Cntnr>::order_preserving</classname>
is <constant>true</constant> (which is the case for containers
based on trees and tries), then the container can be split or
joined; in this
case, <classname>container_traits<Cntnr>::split_join_can_throw</classname>
indicates whether splits or joins can throw exceptions (which is
true for vector-based trees);
otherwise <classname>container_traits<Cntnr>::split_join_can_throw</classname>
will yield a compilation error. (This is somewhat similar to a
compile-time version of the COM model).
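</para>

<para>
The tag and traits queries described in this section can be combined
through ordinary overload resolution. The following is a minimal
sketch rather than library code: the helper names
<function>family</function> and <function>describe</function> are
hypothetical, and the sketch assumes the <literal>__gnu_pbds</literal>
namespace and the <filename>ext/pb_ds/assoc_container.hpp</filename>
header as shipped with GCC.
</para>
<programlisting>
```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <functional>
#include <string>

// Hypothetical helpers: one overload per tag family. A derived tag
// (e.g. rb_tree_tag) converts implicitly to its base (tree_tag), so
// overload resolution selects the matching family.
inline std::string family(__gnu_pbds::basic_hash_tag)  { return "hash"; }
inline std::string family(__gnu_pbds::tree_tag)        { return "tree"; }
inline std::string family(__gnu_pbds::trie_tag)        { return "trie"; }
inline std::string family(__gnu_pbds::list_update_tag) { return "list"; }

// Hypothetical helper: summarizes a container type using only
// compile-time information (its tag and its traits).
template<typename Cntnr>
std::string
describe()
{
  typedef __gnu_pbds::container_traits<Cntnr> traits;

  // Dispatch on the tag of the underlying data structure.
  std::string s = family(typename Cntnr::container_category());

  // Query attributes through the traits class.
  s += traits::order_preserving ? ", ordered" : ", unordered";
  s += traits::erase_can_throw ? ", erase may throw"
                               : ", erase cannot throw";
  return s;
}
```
</programlisting>
<para>
Under these assumptions, calling <function>describe</function> with a
default <classname>tree</classname> should report an ordered tree
whose erase cannot throw, while a <classname>tree</classname>
instantiated with <classname>ov_tree_tag</classname> should report
that erase may throw, matching the remark on vector-based trees above.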
</para>

</section> <!-- traits -->

</section> <!-- genericity -->
</section> <!-- concepts -->

<section xml:id="pbds.design.container">
<info><title>By Container</title></info>

<!-- hash -->
<section xml:id="pbds.design.container.hash">
<info><title>hash</title></info>

<!--

// hash policies
/// general terms / background
/// range hashing policies
/// ranged-hash policies
/// implementation

// resize policies
/// general
/// size policies
/// trigger policies
/// implementation

// policy interactions
/// probe/size/trigger
/// hash/trigger
/// eq/hash/storing hash values
/// size/load-check trigger
-->
<section xml:id="container.hash.interface">
<info><title>Interface</title></info>



<para>
The collision-chaining hash-based container has the
following declaration.</para>
<programlisting>
template<
typename Key,
typename Mapped,
typename Hash_Fn = std::hash<Key>,
typename Eq_Fn = std::equal_to<Key>,
typename Comb_Hash_Fn = direct_mask_range_hashing<>,
typename Resize_Policy = /* default explained below */,
bool Store_Hash = false,
typename Allocator = std::allocator<char> >
class cc_hash_table;
</programlisting>

<para>The parameters have the following meaning:</para>

<orderedlist>
<listitem><para><classname>Key</classname> is the key type.</para></listitem>

<listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>

<listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>

<listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>

<listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
it describes how to translate hash values into positions
within the table.</para></listitem>

<listitem><para><classname>Resize_Policy</classname> describes how a container object
should change its internal size.</para></listitem>

<listitem><para><classname>Store_Hash</classname> indicates whether the hash value
should be stored with each entry.</para></listitem>

<listitem><para><classname>Allocator</classname> is an allocator
type.</para></listitem>
</orderedlist>

<para>The probing hash-based container has the following
declaration.</para>
<programlisting>
template<
typename Key,
typename Mapped,
typename Hash_Fn = std::hash<Key>,
typename Eq_Fn = std::equal_to<Key>,
typename Comb_Probe_Fn = direct_mask_range_hashing<>,
typename Probe_Fn = /* default explained below */,
typename Resize_Policy = /* default explained below */,
bool Store_Hash = false,
typename Allocator = std::allocator<char> >
class gp_hash_table;
</programlisting>

<para>The parameters are identical to those of the
collision-chaining container, except for the following.</para>

<orderedlist>
<listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
sequence into a sequence of positions within the table.</para></listitem>

<listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
</orderedlist>

<para>Some of the default template values depend on the values of
other parameters, and are explained below.</para>

</section>
<section xml:id="container.hash.details">
<info><title>Details</title></info>

<section xml:id="container.hash.details.hash_policies">
<info><title>Hash Policies</title></info>

<section xml:id="details.hash_policies.general">
<info><title>General</title></info>

<para>Following is an explanation of some functions which hashing
involves. The graphic below illustrates the discussion.</para>

<figure>
<title>Hash functions, ranged-hash functions, and
range-hashing functions</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
</imageobject>
<textobject>
<phrase>Hash functions, ranged-hash functions, and
range-hashing functions</phrase>
</textobject>
</mediaobject>
</figure>

<para>Let U be a domain (e.g., the integers, or the
strings of 3 characters). A hash-table algorithm needs to map
elements of U "uniformly" into the range [0,..., m -
1] (where m is a non-negative integral value, and
is, in general, time varying).
I.e., the algorithm needs
a ranged-hash function</para>

<para>
f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
</para>

<para>such that for any u in U,</para>

<para>0 ≤ f(u, m) ≤ m - 1</para>

<para>and which has "good uniformity" properties (see
<xref linkend="biblio.knuth98sorting"/>).
One
common solution is to use the composition of the hash
function</para>

<para>h : U → Z<subscript>+</subscript> ,</para>

<para>which maps elements of U into the non-negative
integrals, and</para>

<para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
Z<subscript>+</subscript>,</para>

<para>which maps a non-negative hash value, and a non-negative
range upper-bound into a non-negative integral in the range
between 0 (inclusive) and the range upper bound (exclusive),
i.e., for any r in Z<subscript>+</subscript>,</para>

<para>0 ≤ g(r, m) ≤ m - 1</para>


<para>The resulting ranged-hash function is</para>

<!-- ranged_hash_composed_of_hash_and_range_hashing -->
<equation>
<title>Ranged Hash Function</title>
<mathphrase>
f(u , m) = g(h(u), m)
</mathphrase>
</equation>

<para>From the above, it is obvious that given g and
h, f can always be composed (however the converse
is not true). The standard's hash-based containers allow specifying
a hash function, and use a hard-wired range-hashing function;
the ranged-hash function is implicitly composed.</para>

<para>The above describes the case where a key is to be mapped
into a single position within a hash table, e.g.,
in a collision-chaining table. In other cases, a key is to be
mapped into a sequence of positions within a table,
e.g., in a probing table.
Similar terms apply in this
case: the table requires a ranged probe function,
mapping a key into a sequence of positions within the table.
This is typically achieved by composing a hash function
mapping the key into a non-negative integral type, a
probe function transforming the hash value into a
sequence of hash values, and a range-hashing function
transforming the sequence of hash values into a sequence of
positions.</para>

</section>

<section xml:id="details.hash_policies.range">
<info><title>Range Hashing</title></info>

<para>Some common choices for range-hashing functions are the
division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
as</para>

<equation>
<title>Range-Hashing, Division Method</title>
<mathphrase>
g(r, m) = r mod m
</mathphrase>
</equation>



<para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>

<para>and</para>

<para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>

<para>respectively, for some positive integrals u and
v (typically powers of 2), and some a. Each of
these range-hashing functions works best for some different
setting.</para>

<para>The division method (see above) is a
very common choice. However, even this single method can be
implemented in two very different ways.
It is possible to
implement it using the low
level % (modulo) operation (for any m), or the
low level & (bit-mask) operation (for the case where
m is a power of 2), i.e.,</para>

<equation>
<title>Division via Prime Modulo</title>
<mathphrase>
g(r, m) = r % m
</mathphrase>
</equation>

<para>and</para>

<equation>
<title>Division via Bit Mask</title>
<mathphrase>
g(r, m) = r & (m - 1), (with m =
2<superscript>k</superscript> for some k)
</mathphrase>
</equation>


<para>respectively.</para>

<para>The % (modulo) implementation has the advantage that for
m a prime far from a power of 2, g(r, m) is
affected by all the bits of r (minimizing the chance of
collision). It has the disadvantage of using the costly modulo
operation. This method is hard-wired into SGI's implementation.</para>

<para>The & (bit-mask) implementation has the advantage of
relying on the fast bit-wise and operation. It has the
disadvantage that g(r, m) is affected only by the
low order bits of r. This method is hard-wired into
Dinkumware's implementation.</para>


</section>

<section xml:id="details.hash_policies.ranged">
<info><title>Ranged Hash</title></info>

<para>In some cases it is beneficial to allow the
client to directly specify a ranged-hash function. It is
true that the writer of the ranged-hash function cannot rely
on the values of m having specific numerical properties
suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
the values of m are determined by a resize policy with
possibly orthogonal considerations.</para>

<para>There are two cases where a ranged-hash function can be
superior.
The first is when using perfect hashing; the
second is when the values of m can be used to estimate
the "general" number of distinct values required. This is
described in the following.</para>

<para>Let</para>

<para>
s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
</para>

<para>be a string of t characters, each of which is from
domain S. Consider the following ranged-hash
function:</para>
<equation>
<title>
A Standard String Hash Function
</title>
<mathphrase>
f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
</mathphrase>
</equation>


<para>where a is some non-negative integral value. This is
the standard string-hashing function used in SGI's
implementation (with a = 5). Its advantage is that
it takes into account all of the characters of the string.</para>

<para>Now assume that s is the string representation of
a long DNA sequence (and so S = {'A', 'C', 'G',
'T'}). In this case, scanning the entire string might be
prohibitively expensive.
A possible alternative might be to use
only the first k characters of the string, where</para>

<para>|S|<superscript>k</superscript> ≥ m ,</para>

<para>i.e., using the hash function</para>

<equation>
<title>
Only k String DNA Hash
</title>
<mathphrase>
f<subscript>2</subscript>(s, m) = ∑ <subscript>i
= 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
</mathphrase>
</equation>

<para>requiring scanning over only</para>

<para>k = log<subscript>4</subscript>( m )</para>

<para>characters.</para>

<para>Other more elaborate hash-functions might scan k
characters starting at a random position (determined at each
resize), or scan k random positions (determined at
each resize), i.e., using</para>

<para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
a<superscript>i</superscript> mod m ,</para>

<para>or</para>

<para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
m ,</para>

<para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
each in the (inclusive) range [0,...,t-1].</para>

<para>It should be noted that the above functions cannot be
decomposed as per a ranged hash composed of hash and range hashing.</para>


</section>

<section xml:id="details.hash_policies.implementation">
<info><title>Implementation</title></info>

<para>This sub-subsection describes the implementation of
the above in this library.
It first explains range-hashing
functions in collision-chaining tables, then ranged-hash
functions in collision-chaining tables, then probing-based
tables, and finally lists the relevant classes in this
library.</para>

<section xml:id="hash_policies.implementation.collision-chaining">
<info><title>
Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
</title></info>


<para><classname>cc_hash_table</classname> is
parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
hash functor and a combining hash functor, respectively.</para>

<para>In general, <classname>Comb_Hash_Fn</classname> is considered a
range-hashing functor. <classname>cc_hash_table</classname>
synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
<classname>Comb_Hash_Fn</classname>. The figure below shows an <classname>insert</classname> sequence
diagram for this case. The user inserts an element (point A),
the container transforms the key into a non-negative integral
using the hash functor (points B and C), and transforms the
result into a position using the combining functor (points D
and E).</para>

<figure>
<title>Insert hash sequence diagram</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
</imageobject>
<textobject>
<phrase>Insert hash sequence diagram</phrase>
</textobject>
</mediaobject>
</figure>

<para>If <classname>cc_hash_table</classname>'s
hash-functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
a ranged-hash function. The graphic below shows an <function>insert</function> sequence
diagram.
The user inserts an element (point A), the container
transforms the key into a position using the combining functor
(points B and C).</para>

<figure>
<title>Insert hash sequence diagram with a null policy</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
</imageobject>
<textobject>
<phrase>Insert hash sequence diagram with a null policy</phrase>
</textobject>
</mediaobject>
</figure>

</section>

<section xml:id="hash_policies.implementation.probe">
<info><title>
Probing tables
</title></info>
<para><classname>gp_hash_table</classname> is parametrized by
<classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
and <classname>Comb_Probe_Fn</classname>. As before, if
<classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
are both <classname>null_type</classname>, then
<classname>Comb_Probe_Fn</classname> is a ranged-probe
functor.
Otherwise, <classname>Hash_Fn</classname> is a hash
functor, <classname>Probe_Fn</classname> is a functor for offsets
from a hash value, and <classname>Comb_Probe_Fn</classname>
transforms a probe sequence into a sequence of positions within
the table.</para>

</section>

<section xml:id="hash_policies.implementation.predefined">
<info><title>
Pre-Defined Policies
</title></info>

<para>This library contains some pre-defined classes
implementing range-hashing and probing functions:</para>

<orderedlist>
<listitem><para><classname>direct_mask_range_hashing</classname>
and <classname>direct_mod_range_hashing</classname>
are range-hashing functions based on a bit-mask and a modulo
operation, respectively.</para></listitem>

<listitem><para><classname>linear_probe_fn</classname>, and
<classname>quadratic_probe_fn</classname> are
a linear probe and a quadratic probe function,
respectively.</para></listitem>
</orderedlist>

<para>
The graphic below shows the relationships.
</para>
<figure>
<title>Hash policy class diagram</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_hash_policy_cd.png"/>
</imageobject>
<textobject>
<phrase>Hash policy class diagram</phrase>
</textobject>
</mediaobject>
</figure>


</section>

</section> <!-- impl -->

</section>

<section xml:id="container.hash.details.resize_policies">
<info><title>Resize Policies</title></info>

<section xml:id="resize_policies.general">
<info><title>General</title></info>

<para>Hash-tables, as opposed to trees, do not naturally grow or
shrink. It is necessary to specify policies to determine how
and when a hash table should change its size.
Usually, resize
policies can be decomposed into orthogonal policies:</para>

<orderedlist>
<listitem><para>A size policy indicating how a hash table
should grow (e.g., it should multiply by powers of
2).</para></listitem>

<listitem><para>A trigger policy indicating when a hash
table should grow (e.g., a load factor is
exceeded).</para></listitem>
</orderedlist>

</section>

<section xml:id="resize_policies.size">
<info><title>Size Policies</title></info>


<para>Size policies determine how a hash table changes size. These
policies are simple, and there are relatively few sensible
options. An exponential-size policy (with the initial size and
growth factors both powers of 2) works well with a mask-based
range-hashing function, and is the
hard-wired policy used by Dinkumware. A
prime-list based policy works well with a modulo-prime range
hashing function and is the hard-wired policy used by SGI's
implementation.</para>

</section>

<section xml:id="resize_policies.trigger">
<info><title>Trigger Policies</title></info>

<para>Trigger policies determine when a hash table changes size.
Following is a description of two policies: load-check
policies, and collision-check policies.</para>

<para>Load-check policies are straightforward. The user specifies
two factors, α<subscript>min</subscript> and
α<subscript>max</subscript>, and the hash table maintains the
invariant that</para>

<para>α<subscript>min</subscript> ≤ (number of
stored elements) / (hash-table size) ≤
α<subscript>max</subscript><remark>load factor min max</remark></para>

<para>Collision-check policies work in the opposite direction of
load-check policies.
They focus on keeping the number of
collisions moderate and hoping that the size of the table will
not grow very large, instead of keeping a moderate load-factor
and hoping that the number of collisions will be small. A
maximal collision-check policy resizes when the longest
probe-sequence grows too large.</para>

<para>Consider the graphic below. Let the size of the hash table
be denoted by m, the length of a probe sequence be denoted by k,
and some load factor be denoted by α. We would like to
calculate the minimal length of k, such that if there were α
m elements in the hash table, a probe sequence of length k would
be found with probability at most 1/m.</para>

<figure>
<title>Balls and bins</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_balls_and_bins.png"/>
</imageobject>
<textobject>
<phrase>Balls and bins</phrase>
</textobject>
</mediaobject>
</figure>

<para>Denote the probability that a probe sequence of length
k appears in bin i by p<subscript>i</subscript>, the
length of the probe sequence of bin i by
l<subscript>i</subscript>, and assume uniform distribution. Then</para>



<equation>
<title>
Probability of Probe Sequence of Length k
</title>
<mathphrase>
p<subscript>1</subscript> =
</mathphrase>
</equation>

<para>P(l<subscript>1</subscript> ≥ k) =</para>

<para>
P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1 ) ) ≤ (a)
</para>

<para>
e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
</para>

<para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>).
To
calculate the probability that some bin contains a probe
sequence greater than k, we note that the
l<subscript>i</subscript> are negatively-dependent
(<xref linkend="biblio.dubhashi98neg"/>). Let
I(.) denote the indicator function. Then</para>

<equation>
<title>
Probability Probe Sequence in Some Bin
</title>
<mathphrase>
P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
</mathphrase>
</equation>

<para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>

<para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>

<para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
- 1 ) <superscript>2</superscript> ) / 2 ) ,</para>

<para>where (a) follows from the fact that the Chernoff bound can
be applied to negatively-dependent variables (<xref
linkend="biblio.dubhashi98neg"/>). Inserting the first probability
equation into the second one, and equating with 1/m, we
obtain</para>


<para>k ~ √ ( 2 α ln 2 m ln(m) ) .</para>

</section>

<section xml:id="resize_policies.impl">
<info><title>Implementation</title></info>

<para>This sub-subsection describes the implementation of the
above in this library.
It first describes resize policies and
their decomposition into trigger and size policies, then
describes pre-defined classes, and finally discusses controlled
access to the policies' internals.</para>

<section xml:id="resize_policies.impl.decomposition">
<info><title>Decomposition</title></info>


<para>Each hash-based container is parametrized by a
<classname>Resize_Policy</classname> parameter; the container derives
<classname>public</classname>ly from <classname>Resize_Policy</classname>. For
example:</para>
<programlisting>
cc_hash_table<typename Key,
typename Mapped,
...
typename Resize_Policy
...> : public Resize_Policy
</programlisting>

<para>As a container object is modified, it continuously notifies
its <classname>Resize_Policy</classname> base of internal changes
(e.g., collisions encountered and elements being
inserted). It queries its <classname>Resize_Policy</classname> base whether
it needs to be resized, and if so, to what size.</para>

<para>The graphic below shows a (possible) sequence diagram
of an insert operation.
The user inserts an element; the hash
table notifies its resize policy that a search has started
(point A); in this case, a single collision is encountered -
the table notifies its resize policy of this (point B); the
container finally notifies its resize policy that the search
has ended (point C); it then queries its resize policy whether
a resize is needed, and if so, what is the new size (points D
to G); following the resize, it notifies the policy that a
resize has completed (point H); finally, the element is
inserted, and the policy notified (point I).</para>

<figure>
<title>Insert resize sequence diagram</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
</imageobject>
<textobject>
<phrase>Insert resize sequence diagram</phrase>
</textobject>
</mediaobject>
</figure>


<para>In practice, a resize policy can usually be decomposed
orthogonally into a size policy and a trigger policy.
Consequently,
the library contains a single class for instantiating a resize
policy: <classname>hash_standard_resize_policy</classname>
is parametrized by <classname>Size_Policy</classname> and
<classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
to these policies.</para>

<para>The two graphics immediately below show sequence diagrams
illustrating the interaction between the standard resize policy
and its trigger and size policies, respectively.</para>

<figure>
<title>Standard resize policy trigger sequence
diagram</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
</imageobject>
<textobject>
<phrase>Standard resize policy trigger sequence
diagram</phrase>
</textobject>
</mediaobject>
</figure>

<figure>
<title>Standard resize policy size sequence
diagram</title>
<mediaobject>
<imageobject>
<imagedata align="center" format="PNG" scale="100"
fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
</imageobject>
<textobject>
<phrase>Standard resize policy size sequence
diagram</phrase>
</textobject>
</mediaobject>
</figure>


</section>

<section xml:id="resize_policies.impl.predefined">
<info><title>Predefined Policies</title></info>
<para>The library includes the following
instantiations of size and trigger policies:</para>

<orderedlist>
<listitem><para><classname>hash_load_check_resize_trigger</classname>
implements a load check trigger policy.</para></listitem>

<listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
implements a collision check trigger
policy.</para></listitem>

<listitem><para><classname>hash_exponential_size_policy</classname>
implements an exponential-size policy (which should be used
with mask range hashing).</para></listitem>

<listitem><para><classname>hash_prime_size_policy</classname>
implements a size policy based on a sequence of primes
(which should
be used with mod range hashing).</para></listitem>
</orderedlist>

<para>The graphic below gives an overall picture of the resize-related
classes. <classname>basic_hash_table</classname>
is parametrized by <classname>Resize_Policy</classname>, which it subclasses
publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
<classname>hash_standard_resize_policy</classname>
itself is parametrized by <classname>Trigger_Policy</classname> and
<classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
instantiated by <classname>hash_load_check_resize_trigger</classname>,
or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
<classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
or <classname>hash_prime_size_policy</classname>.</para>

</section>

<section xml:id="resize_policies.impl.internals">
<info><title>Controlling Access to Internals</title></info>

<para>There are cases where (controlled) access to resize
policies' internals is beneficial.
E.g., it is sometimes
useful to query a hash table for the table's actual size (as
opposed to its <function>size()</function> - the number of values it
currently holds); it is sometimes useful to set a table's
initial size, externally resize it, or change load factors.</para>

<para>Clearly, supporting such methods both decreases the
encapsulation of hash-based containers, and increases the
diversity between different associative containers' interfaces.
Conversely, omitting such methods can decrease containers'
flexibility.</para>

<para>In order to avoid, to the extent possible, the above
conflict, the hash-based containers themselves do not address
any of these questions; this is deferred to the resize policies,
which are easier to change or replace. Thus, for example,
neither <classname>cc_hash_table</classname> nor
<classname>gp_hash_table</classname>
contains methods for querying the actual size of the table; this
is deferred to <classname>hash_standard_resize_policy</classname>.</para>

<para>Furthermore, the policies themselves are parametrized by
template arguments that determine the methods they support
(<xref linkend="biblio.alexandrescu01modern"/>
shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
is parametrized by <classname>External_Size_Access</classname> that
determines whether it supports methods for querying the actual
size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
is parametrized by <classname>External_Load_Access</classname> that
determines whether it supports methods for querying or
modifying the loads.
<classname>cc_hash_max_collision_check_resize_trigger</classname>
is parametrized by <classname>External_Load_Access</classname> that
determines whether it supports methods for querying the
load.</para>

<para>Some operations, for example, resizing a container at
run time, or changing the load factors of a load-check trigger
policy, require the container itself to resize. As mentioned
above, the hash-based containers themselves do not contain
these types of methods, only their resize policies.
Consequently, there must be some mechanism for a resize policy
to manipulate the hash-based container. As the hash-based
container is a subclass of the resize policy, this is done
through virtual methods. Each hash-based container has a
<classname>private</classname> <classname>virtual</classname> method:</para>
<programlisting>
  virtual void
  do_resize(size_type new_size);
</programlisting>

<para>which resizes the container. Implementations of
<classname>Resize_Policy</classname> can export public methods for resizing
the container externally; these methods internally call
<classname>do_resize</classname> to resize the table.</para>


</section>

</section>


</section> <!-- resize policies -->

<section xml:id="container.hash.details.policy_interaction">
<info><title>Policy Interactions</title></info>
<para>
</para>
<para>Hash tables are, unfortunately, especially sensitive to the
choice of policies. One of the more complicated aspects of this
is that poor combinations of good policies can form a poor
container. Following are some considerations.</para>

<section xml:id="policy_interaction.probesizetrigger">
<info><title>probe/size/trigger</title></info>

<para>Some combinations do not work well for probing containers.
For example, combining a quadratic probe policy with an
exponential size policy can yield a poor container: when an
element is inserted, a trigger policy might decide that there
is no need to resize, as the table still contains unused
entries; the probe sequence, however, might never reach any of
the unused entries.</para>

<para>Unfortunately, this library cannot detect such problems at
compilation (doing so is reducible to the halting problem). It
therefore defines an exception class <classname>insert_error</classname>,
which is thrown in this case.</para>

</section>

<section xml:id="policy_interaction.hashtrigger">
<info><title>hash/trigger</title></info>

<para>Some trigger policies are especially susceptible to poor
hash functions. Suppose, as an extreme case, that the hash
function transforms each key to the same hash value. After some
inserts, a collision-detecting policy will always indicate that
the container needs to grow.</para>

<para>The library, therefore, by design, limits each operation to
one resize. For each <classname>insert</classname>, for example, it queries
only once whether a resize is needed.</para>

</section>

<section xml:id="policy_interaction.eqstorehash">
<info><title>equivalence functors/storing hash values/hash</title></info>

<para><classname>cc_hash_table</classname> and
<classname>gp_hash_table</classname> are
parametrized by an equivalence functor and by a
<classname>Store_Hash</classname> parameter. If the latter parameter is
<classname>true</classname>, then the container stores with each entry
a hash value, and uses this value in case of collisions to
determine whether to apply the equivalence functor.
This can lower the
cost of collisions for some types, but increase the cost of
collisions for other types.</para>

<para>If a ranged-hash function or ranged probe function is
directly supplied, however, then it makes no sense to store the
hash value with each entry. This library's container will
fail at compilation, by design, if this is attempted.</para>

</section>

<section xml:id="policy_interaction.sizeloadtrigger">
<info><title>size/load-check trigger</title></info>

<para>Assume a size policy issues an increasing sequence of sizes
a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
example, an exponential size policy might issue the sequence of
sizes 8, 16, 32, 64, ...</para>

<para>If a load-check trigger policy is used, with loads
α<subscript>min</subscript> and α<subscript>max</subscript>,
respectively, then it is a good idea to have:</para>

<orderedlist>
<listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>

<listitem><para>α<subscript>min</subscript> < 1 / (2 q)</para></listitem>
</orderedlist>

<para>This will ensure that the amortized hash cost of each
modifying operation is at most approximately 3.</para>

<para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
any case, a bad choice, and α<subscript>min</subscript> >
α<subscript>max</subscript> is horrendous.</para>

</section>

</section>

</section> <!-- details -->

</section> <!-- hash -->

<!-- tree -->
<section xml:id="pbds.design.container.tree">
<info><title>tree</title></info>

<section xml:id="container.tree.interface">
<info><title>Interface</title></info>

<para>The tree-based container has the following declaration:</para>
<programlisting>
  template<
  typename Key,
  typename Mapped,
  typename Cmp_Fn = std::less<Key>,
  typename Tag = rb_tree_tag,
  template<
    typename Const_Node_Iterator,
    typename Node_Iterator,
    typename Cmp_Fn_,
    typename Allocator_>
  class Node_Update = null_node_update,
  typename Allocator = std::allocator<char> >
  class tree;
</programlisting>

<para>The parameters have the following meaning:</para>

<orderedlist>
<listitem>
<para><classname>Key</classname> is the key type.</para></listitem>

<listitem>
<para><classname>Mapped</classname> is the mapped-policy.</para></listitem>

<listitem>
<para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>

<listitem>
<para><classname>Tag</classname> specifies which underlying data structure
to use.</para></listitem>

<listitem>
<para><classname>Node_Update</classname> is a policy for updating node
invariants.</para></listitem>

<listitem>
<para><classname>Allocator</classname> is an allocator
type.</para></listitem>
</orderedlist>

<para>The <classname>Tag</classname> parameter specifies which underlying
data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
<classname>ov_tree_tag</classname>
specifies an underlying red-black tree, splay tree, or
ordered-vector tree, respectively; any other tag is illegal.
Note that containers based on the former two contain more types
and methods than the latter (e.g.,
<classname>reverse_iterator</classname> and <classname>rbegin</classname>), and provide different
exception and invalidation guarantees.</para>

</section>

<section xml:id="container.tree.details">
<info><title>Details</title></info>

<section xml:id="container.tree.node">
<info><title>Node Invariants</title></info>


<para>Consider the two trees in the graphic below, labeled A and B. The first
is a tree of floats; the second is a tree of pairs, each
signifying a geometric line interval. Each element in a tree is referred to as a node of the tree. Of course, each of
these trees can support the usual queries: the first can easily
search for <classname>0.4</classname>; the second can easily search for
<classname>std::make_pair(10, 41)</classname>.</para>

<para>Each of these trees can efficiently support other queries.
The first can efficiently determine that the 2nd key in the
tree is <constant>0.3</constant>; the second can efficiently determine
whether any of its intervals overlaps
<programlisting>std::make_pair(29,42)</programlisting> (useful in geometric
applications or distributed file systems with leases, for
example). It should be noted that a <classname>std::set</classname> can
only solve these types of problems with linear complexity.</para>

<para>In order to do so, each tree stores some metadata in
each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>).
The first stores in
each node the size of the sub-tree rooted at the node; the
second stores at each node the maximal endpoint of the
intervals in the sub-tree rooted at the node.</para>

<figure>
  <title>Tree node invariants</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_tree_node_invariants.png"/>
    </imageobject>
    <textobject>
      <phrase>Tree node invariants</phrase>
    </textobject>
  </mediaobject>
</figure>

<para>Supporting such trees is difficult for a number of
reasons:</para>

<orderedlist>
<listitem><para>There must be a way to specify what a node's metadata
should be (if any).</para></listitem>

<listitem><para>Various operations can invalidate node
invariants. The graphic below shows how a right rotation,
performed on A, results in B, with nodes x and y having
corrupted invariants (the grayed nodes in C). The same graphic also
shows how an insert, performed on D, results in E, with nodes x and y
having corrupted invariants (the grayed nodes in F). It is not
feasible to know outside the tree the effect of an operation on
the nodes of the tree.</para></listitem>

<listitem><para>The search paths of standard associative containers are
defined by comparisons between keys, and not through
metadata.</para></listitem>

<listitem><para>It is not feasible to know in advance which methods trees
can support.
Besides the usual <classname>find</classname> method, the
first tree can support a <classname>find_by_order</classname> method, while
the second can support an <classname>overlaps</classname> method.</para></listitem>
</orderedlist>

<figure>
  <title>Tree node invalidation</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_tree_node_invalidations.png"/>
    </imageobject>
    <textobject>
      <phrase>Tree node invalidation</phrase>
    </textobject>
  </mediaobject>
</figure>

<para>These problems are solved by a combination of two means:
node iterators, and template-template node updater
parameters.</para>

<section xml:id="container.tree.node.iterators">
<info><title>Node Iterators</title></info>


<para>Each tree-based container defines two additional iterator
types, <classname>const_node_iterator</classname>
and <classname>node_iterator</classname>.
These iterators allow descending from a node to one of its
children. Node iterators allow search paths different from those
determined by the comparison functor. The <classname>tree</classname>
supports the methods:</para>
<programlisting>
  const_node_iterator
  node_begin() const;

  node_iterator
  node_begin();

  const_node_iterator
  node_end() const;

  node_iterator
  node_end();
</programlisting>

<para>The first pair returns node iterators corresponding to the
root node of the tree; the latter pair returns node iterators
corresponding to a just-after-leaf node.</para>
</section>

<section xml:id="container.tree.node.updator">
<info><title>Node Updater</title></info>

<para>The tree-based containers are parametrized by a
<classname>Node_Update</classname> template-template parameter.
A
tree-based container instantiates
<classname>Node_Update</classname> to some
<classname>node_update</classname> class, and publicly subclasses
<classname>node_update</classname>. The graphic below shows this
scheme, as well as some predefined policies (which are explained
below).</para>

<figure>
  <title>A tree and its update policy</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
    </imageobject>
    <textobject>
      <phrase>A tree and its update policy</phrase>
    </textobject>
  </mediaobject>
</figure>

<para><classname>node_update</classname> (an instantiation of
<classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
the type of metadata it requires. For order statistics,
e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
The tree defines within each node a <classname>metadata_type</classname>
object.</para>

<para><classname>node_update</classname> must also define the following method
for restoring node invariants:</para>
<programlisting>
  void
  operator()(node_iterator nd_it, const_node_iterator end_nd_it)
</programlisting>

<para>In this method, <varname>nd_it</varname> is a
<classname>node_iterator</classname> corresponding to a node for which
A) all descendants have valid invariants, and B) its own
invariants might be violated; <classname>end_nd_it</classname> is
a <classname>const_node_iterator</classname> corresponding to a
just-after-leaf node. This method should correct the node
invariants of the node pointed to by
<classname>nd_it</classname>. For example, say node x in the
graphic below, label A, has an invalid invariant, but its children,
y and z, have valid invariants.
After the invocation, all three
nodes should have valid invariants, as in label B.</para>


<figure>
  <title>Restoring node invariants</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_restoring_node_invariants.png"/>
    </imageobject>
    <textobject>
      <phrase>Restoring node invariants</phrase>
    </textobject>
  </mediaobject>
</figure>

<para>When a tree operation might invalidate some node invariant,
it invokes this method in its <classname>node_update</classname> base to
restore the invariant. For example, the graphic below shows
an <function>insert</function> operation (point A); the tree performs some
operations, and calls the update functor three times (points B,
C, and D). (It is well known that any <function>insert</function>,
<function>erase</function>, <function>split</function>, or <function>join</function> can restore
all node invariants by a small number of node invariant updates; see <xref linkend="biblio.clrs2001"/>.)
</para>

<figure>
  <title>Insert update sequence</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_update_seq_diagram.png"/>
    </imageobject>
    <textobject>
      <phrase>Insert update sequence</phrase>
    </textobject>
  </mediaobject>
</figure>

<para>To complete the description of the scheme, three questions
need to be answered:</para>

<orderedlist>
<listitem><para>How can a tree which supports order statistics define a
method such as <classname>find_by_order</classname>?</para></listitem>

<listitem><para>How can the node updater base access methods of the
tree?</para></listitem>

<listitem><para>How can the following cyclic dependency be resolved?
<classname>node_update</classname> is a base class of the tree, yet it
uses node iterators defined in the tree (its child).</para></listitem>
</orderedlist>

<para>The first two questions are answered by the fact that
<classname>node_update</classname> (an instantiation of
<classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
of the tree. Consequently:</para>

<orderedlist>
<listitem><para>Any public methods of
<classname>node_update</classname> are automatically methods of
the tree (<xref linkend="biblio.alexandrescu01modern"/>).
Thus an order-statistics node updater,
<classname>tree_order_statistics_node_update</classname>, defines
the <function>find_by_order</function> method; any tree
instantiated by this policy consequently supports this method as
well.</para></listitem>

<listitem><para>In C++, if a base class declares a method as
<literal>virtual</literal>, it is
<literal>virtual</literal> in its subclasses. If
<classname>node_update</classname> needs to access one of the
tree's methods, say the member function
<function>end</function>, it simply declares that method as
pure <literal>virtual</literal>.</para></listitem>
</orderedlist>

<para>The cyclic dependency is solved through template-template
parameters. <classname>Node_Update</classname> is parametrized by
the tree's node iterators, its comparison functor, and its
allocator type. Thus, instantiations of
<classname>Node_Update</classname> have all the information
required.</para>

<para>This library assumes that constructing a metadata object and
modifying it are exception free. Suppose that during some method,
say <classname>insert</classname>, a metadata-related operation
(e.g., changing the value of a metadata object) throws an exception. Ack!
Rolling back the method is unusually complex.</para>

<para>Previously, a distinction was made between redundant
policies and null policies. Node invariants show a
case where null policies are required.</para>

<para>Assume a regular tree is required, one which need not
support order statistics or interval overlap queries.
Seemingly, in this case a redundant policy (a policy which
doesn't affect nodes' contents) would suffice. This, however, would lead
to the following drawbacks:</para>

<orderedlist>
<listitem><para>Each node would carry a useless metadata object, wasting
space.</para></listitem>

<listitem><para>The tree cannot know if its
<classname>Node_Update</classname> policy actually modifies a
node's metadata (determining this is reducible to the halting
problem). In the graphic
below, assume the shaded node is inserted. The tree would have
to traverse the useless path shown to the root, applying
redundant updates all the way.</para></listitem>
</orderedlist>
<figure>
  <title>Useless update path</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_rationale_null_node_updator.png"/>
    </imageobject>
    <textobject>
      <phrase>Useless update path</phrase>
    </textobject>
  </mediaobject>
</figure>


<para>A null policy class, <classname>null_node_update</classname>,
solves both these problems. The tree detects that node
invariants are irrelevant, and defines all accordingly.</para>

</section>

</section>

<section xml:id="container.tree.details.split">
<info><title>Split and Join</title></info>

<para>Tree-based containers support split and join methods.
It is possible to split a tree so that it passes
all nodes with keys larger than a given key to a different
tree.
These methods have the following advantages over the
alternative of externally inserting into the destination
tree and erasing from the source tree:</para>

<orderedlist>
<listitem><para>These methods are efficient: red-black trees are split
and joined in poly-logarithmic complexity; ordered-vector
trees are split and joined in linear complexity. The
alternatives have super-linear complexity.</para></listitem>

<listitem><para>Aside from orders of growth, these operations perform
few allocations and de-allocations. For red-black trees, allocations are not performed,
and the methods are exception-free.</para></listitem>
</orderedlist>
</section>

</section> <!-- details -->

</section> <!-- tree -->

<!-- trie -->
<section xml:id="pbds.design.container.trie">
<info><title>Trie</title></info>

<section xml:id="container.trie.interface">
<info><title>Interface</title></info>

<para>The trie-based container has the following declaration:</para>
<programlisting>
  template<typename Key,
           typename Mapped,
           typename Cmp_Fn = std::less<Key>,
           typename Tag = pat_trie_tag,
           template<typename Const_Node_Iterator,
                    typename Node_Iterator,
                    typename E_Access_Traits_,
                    typename Allocator_>
           class Node_Update = null_node_update,
           typename Allocator = std::allocator<char> >
  class trie;
</programlisting>

<para>The parameters have the following meaning:</para>

<orderedlist>
<listitem><para><classname>Key</classname> is the key type.</para></listitem>

<listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>

<listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>

<listitem><para><classname>Tag</classname> specifies which underlying data structure
to use, and is described
shortly.</para></listitem>

<listitem><para><classname>Node_Update</classname> is a policy for updating node
invariants. This is described below.</para></listitem>

<listitem><para><classname>Allocator</classname> is an allocator
type.</para></listitem>
</orderedlist>

<para>The <classname>Tag</classname> parameter specifies which underlying
data structure to use. Instantiating it by <classname>pat_trie_tag</classname> specifies an
underlying PATRICIA trie (explained shortly); any other tag is
currently illegal.</para>

<para>Following is a description of a (PATRICIA) trie
(this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
<xref linkend="biblio.filliatre2000ptset"/>).
</para>

<para>A (PATRICIA) trie is similar to a tree, but with the
following differences:</para>

<orderedlist>
<listitem><para>It explicitly views keys as a sequence of elements.
E.g., a trie can view a string as a sequence of
characters; a trie can view a number as a sequence of
bits.</para></listitem>

<listitem><para>It is not (necessarily) binary.
Each node has fan-out n
+ 1, where n is the number of distinct
elements.</para></listitem>

<listitem><para>It stores values only at leaf nodes.</para></listitem>

<listitem><para>Internal nodes have the properties that A) each has at
least two children, and B) each shares the same prefix with
all of its descendants.</para></listitem>
</orderedlist>

<para>A (PATRICIA) trie has some useful properties:</para>

<orderedlist>
<listitem><para>It can be configured to use large node fan-out, giving it
very efficient find performance (albeit at the cost of insertion
complexity and size).</para></listitem>

<listitem><para>It works well for common-prefix keys.</para></listitem>

<listitem><para>It can efficiently support queries such as which
keys match a certain prefix. This is sometimes useful in file
systems and routers, and for "type-ahead" (predictive text) matching
on mobile devices.</para></listitem>
</orderedlist>


</section>

<section xml:id="container.trie.details">
<info><title>Details</title></info>

<section xml:id="container.trie.details.etraits">
<info><title>Element Access Traits</title></info>

<para>A trie inherently views its keys as sequences of elements.
For example, a trie can view a string as a sequence of
characters. A trie needs to map each of n elements to a
number in {0, n - 1}. For example, a trie can map a
character <varname>c</varname> to
<programlisting>static_cast<size_t>(c)</programlisting>.</para>

<para>Seemingly, then, a trie can assume that its keys support
(const) iterators, and that the <classname>value_type</classname> of this
iterator can be cast to a <classname>size_t</classname>.
There are several
reasons, though, to decouple the mechanism by which the trie
accesses its keys' elements from the trie:</para>

<orderedlist>
<listitem><para>In some cases, the numerical value of an element is
inappropriate. Consider a trie storing DNA strings. It is
logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>

<listitem><para>In some cases the keys' iterators are different from what
is needed. For example, a trie can be used to search for
common suffixes, by using strings'
<classname>reverse_iterator</classname>. As another example, a trie mapping
UNICODE strings would have a huge fan-out if each node
branched on a UNICODE character; instead, one can define an
iterator iterating over 8-bit (or smaller) groups.</para></listitem>
</orderedlist>

<para>The trie is,
consequently, parametrized by <classname>E_Access_Traits</classname>:
traits which instruct how to access sequences' elements.
<classname>string_trie_e_access_traits</classname>
is a traits class for strings. Each such traits class defines some
types; for example,</para>
<programlisting>
  typename E_Access_Traits::const_iterator
</programlisting>

<para>is a const iterator iterating over a key's elements. The
traits class must also define methods for obtaining an iterator
to the first and last element of a key.</para>

<para>The graphic below shows a
(PATRICIA) trie resulting from inserting the words: "I wish
that I could ever see a poem lovely as a trie" (which,
unfortunately, does not rhyme).</para>

<para>The leaf nodes contain values; each internal node contains
two <classname>typename E_Access_Traits::const_iterator</classname>
objects, indicating the maximal common prefix of all keys in
the sub-tree.
For example, the shaded internal node roots a
sub-tree with leaves "a" and "as". The maximal common prefix is
"a". The internal node contains, consequently, two const
iterators, one pointing to <varname>'a'</varname>, and the other to
<varname>'s'</varname>.</para>

<figure>
  <title>A PATRICIA trie</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_pat_trie.png"/>
    </imageobject>
    <textobject>
      <phrase>A PATRICIA trie</phrase>
    </textobject>
  </mediaobject>
</figure>

</section>

<section xml:id="container.trie.details.node">
<info><title>Node Invariants</title></info>

<para>Trie-based containers support node invariants, as do
tree-based containers. There are two minor
differences, though, which, unfortunately, prevent them from
sharing the same node-updating policies:</para>

<orderedlist>
<listitem>
<para>A trie's <classname>Node_Update</classname> template-template
parameter is parametrized by <classname>E_Access_Traits</classname>, while
a tree's <classname>Node_Update</classname> template-template parameter is
parametrized by <classname>Cmp_Fn</classname>.</para></listitem>

<listitem><para>Tree-based containers store values in all nodes, while
trie-based containers (at least in this implementation) store
values in leaves.</para></listitem>
</orderedlist>

<para>The graphic below shows the scheme, as well as some predefined
policies (which are explained below).</para>

<figure>
  <title>A trie and its update policy</title>
  <mediaobject>
    <imageobject>
      <imagedata align="center" format="PNG" scale="100"
                 fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
    </imageobject>
    <textobject>
      <phrase>A trie and its update policy</phrase>
    </textobject>
  </mediaobject>
</figure>


<para>This library offers the following pre-defined trie node
updating policies:</para>

<orderedlist>
<listitem>
<para>
<classname>trie_order_statistics_node_update</classname>
supports order statistics.
</para>
</listitem>

<listitem><para><classname>trie_prefix_search_node_update</classname>
supports searching for ranges that match a given prefix.</para></listitem>

<listitem><para><classname>null_node_update</classname>
is the null node updater.</para></listitem>
</orderedlist>

</section>

<section xml:id="container.trie.details.split">
<info><title>Split and Join</title></info>
<para>Trie-based containers support split and join methods; the
rationale is the same as that of tree-based containers supporting
these methods.</para>
</section>

</section> <!-- details -->

</section> <!-- trie -->

<!-- list_update -->
<section xml:id="pbds.design.container.list">
<info><title>List</title></info>

<section xml:id="container.list.interface">
<info><title>Interface</title></info>

<para>The list-based container has the following declaration:</para>
<programlisting>
  template<typename Key,
           typename Mapped,
           typename Eq_Fn = std::equal_to<Key>,
           typename Update_Policy = move_to_front_lu_policy<>,
           typename Allocator = std::allocator<char> >
  class list_update;
</programlisting>

<para>The parameters have the following meaning:</para>

<orderedlist>
<listitem>
<para>
<classname>Key</classname> is the key type.
</para>
</listitem>

<listitem>
<para>
<classname>Mapped</classname> is the mapped-policy.
</para>
</listitem>

<listitem>
<para>
<classname>Eq_Fn</classname> is a key equivalence functor.
          </para>
        </listitem>

        <listitem>
          <para>
            <classname>Update_Policy</classname> is a policy updating positions in
            the list based on access patterns. It is described in the
            following subsection.
          </para>
        </listitem>

        <listitem>
          <para>
            <classname>Allocator</classname> is an allocator type.
          </para>
        </listitem>
      </orderedlist>

      <para>A list-based associative container is a container that
      stores elements in a linked list. It does not keep the elements
      in any particular order related to the keys. List-based
      containers are primarily useful for creating "multimaps". In fact,
      list-based containers are designed in this library expressly for
      this purpose.</para>

      <para>List-based containers might also be useful in some rare
      cases where a key is encapsulated to the extent that only
      key-equivalence can be tested. Hash-based containers need to know
      how to transform a key into a size type, and tree-based containers
      need to know if some key is larger than another. List-based
      associative containers, conversely, only need to know if two keys
      are equivalent.</para>

      <para>Since a list-based associative container does not order
      elements by keys, is it possible to order the list in some
      useful manner? Remarkably, many on-line competitive
      algorithms exist for reordering lists to reflect access
      prediction (see <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
      </para>

    </section>

    <section xml:id="container.list.details">
      <info><title>Details</title></info>
      <para>
      </para>
      <section xml:id="container.list.details.ds">
        <info><title>Underlying Data Structure</title></info>

        <para>The graphic below shows a
        simple list of integer keys.
        If we search for the integer 6, we
        are paying an overhead: the link with key 6 is the fifth
        link; if it were the first link, it could be accessed
        faster.</para>

        <figure>
          <title>A simple list</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_simple_list.png"/>
            </imageobject>
            <textobject>
              <phrase>A simple list</phrase>
            </textobject>
          </mediaobject>
        </figure>

        <para>List-update algorithms reorder lists as elements are
        accessed. They try to determine, from the access history, which
        keys to move to the front of the list. Some of these algorithms
        require adding some metadata alongside each entry.</para>

        <para>For example, in the graphic below label A shows the counter
        algorithm. Each node contains both a key and a count as metadata
        (shown in bold). When an element is accessed (e.g. 6) its count is
        incremented, as shown in label B. If the count reaches some
        predetermined value, say 10, as shown in label C, the count is set
        to 0 and the node is moved to the front of the list, as in label
        D.
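        The counter rule described here can be sketched as follows. This is
        a minimal standalone illustration built on
        <classname>std::list</classname>, not the library's implementation;
        the names <classname>counted_list</classname>,
        <function>find_and_update</function>, and
        <varname>Max_Count</varname> are hypothetical.
        <programlisting>
```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical illustration of the counter list-update rule.
// It is NOT the library's lu_counter_policy.
template<typename Key, std::size_t Max_Count>
class counted_list
{
public:
  // New keys are appended at the back with a count of 0.
  void
  insert(const Key& key)
  { nodes_.push_back(std::make_pair(key, std::size_t(0))); }

  // Returns true if key is present. On a hit the node's count is
  // incremented; once it reaches Max_Count it is reset to 0 and the
  // node is moved to the front of the list.
  bool
  find_and_update(const Key& key)
  {
    for (auto it = nodes_.begin(); it != nodes_.end(); ++it)
      if (it->first == key)
        {
          if (++it->second == Max_Count)
            {
              it->second = 0;
              nodes_.splice(nodes_.begin(), nodes_, it);
            }
          return true;
        }
    return false;
  }

  // Key currently stored in the first node (list must be non-empty).
  const Key&
  front() const
  { return nodes_.front().first; }

private:
  std::list<std::pair<Key, std::size_t> > nodes_;
};
```
        </programlisting>
        With <varname>Max_Count</varname> set to 3, the third successful
        lookup of a key moves its node to the front and resets its count.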
        </para>

        <figure>
          <title>The counter algorithm</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_list_update.png"/>
            </imageobject>
            <textobject>
              <phrase>The counter algorithm</phrase>
            </textobject>
          </mediaobject>
        </figure>


      </section>

      <section xml:id="container.list.details.policies">
        <info><title>Policies</title></info>

        <para>This library allows instantiating lists with policies
        implementing any algorithm moving nodes to the front of the
        list (policies implementing algorithms interchanging nodes are
        unsupported).</para>

        <para>Associative containers based on lists are parametrized by an
        <classname>Update_Policy</classname> parameter. This parameter defines the
        type of metadata each node contains, how to create the
        metadata, and how to decide, using this metadata, whether to
        move a node to the front of the list. A list-based associative
        container object derives (publicly) from its update policy.
        </para>

        <para>An instantiation of <classname>Update_Policy</classname> must define
        internally <classname>update_metadata</classname> as the metadata it
        requires. Internally, each node of the list contains, besides
        the usual key and data, an instance of <classname>typename
        Update_Policy::update_metadata</classname>.</para>

        <para>An instantiation of <classname>Update_Policy</classname> must define
        internally two operators:</para>
        <programlisting>
	  update_metadata
	  operator()();

	  bool
	  operator()(update_metadata &);
        </programlisting>

        <para>The first is called by the container object, when creating a
        new node, to create the node's metadata.
        The second is called
        by the container object, when a node is accessed (i.e.,
        when a find operation's key is equivalent to the key of the
        node), to determine whether to move the node to the front of
        the list.
        </para>

        <para>The library contains two predefined implementations of
        list-update policies. The first
        is <classname>lu_counter_policy</classname>, which implements the
        counter algorithm described above. The second is
        <classname>lu_move_to_front_policy</classname>,
        which unconditionally moves an accessed element to the front of
        the list. The latter type is very useful in this library,
        since there is no need to associate metadata with each element.
        (See <xref linkend="biblio.andrew04mtf"/>.)
        </para>

      </section>

      <section xml:id="container.list.details.mapped">
        <info><title>Use in Multimaps</title></info>

        <para>In this library, there are no equivalents for the standard's
        multimaps and multisets; instead one uses an associative
        container mapping primary keys to secondary keys.</para>

        <para>List-based containers are especially useful as associative
        containers for secondary keys. In fact, they are implemented
        here expressly for this purpose.</para>

        <para>To begin with, these containers have very little per-entry
        structural memory overhead, since they can be implemented as
        singly-linked lists. (Arrays have even lower per-entry memory
        overhead, but they are less flexible in moving entries around,
        and have weaker invalidation guarantees.)</para>

        <para>More importantly, though, list-based containers have very
        little per-container memory overhead. The memory overhead of an
        empty list-based container is practically that of a pointer.
        This is important when they are used as secondary
        associative containers in situations where the average ratio of
        secondary keys to primary keys is low (or even 1).</para>

        <para>In order to reduce the per-container memory overhead as much
        as possible, they are implemented as closely as possible to
        singly-linked lists.</para>

        <orderedlist>
          <listitem>
            <para>
              List-based containers do not store internally the number
              of values that they hold. This means that their <function>size</function>
              method has linear complexity (as was permitted for
              <classname>std::list</classname> before C++11).
              Note that finding the number of equivalent-key values in a
              standard multimap also has linear complexity (because it must be
              done by applying <function>std::distance</function> to the result of the
              multimap's <function>equal_range</function> method), but usually with
              higher constants.
            </para>
          </listitem>

          <listitem>
            <para>
              Most associative-container objects each hold a policy
              object (e.g., a hash-based container object holds a
              hash functor). List-based containers, conversely, only have
              class-wide policy objects.
            </para>
          </listitem>
        </orderedlist>


      </section>

    </section> <!-- details -->

  </section> <!-- list -->


  <!-- priority_queue -->
  <section xml:id="pbds.design.container.priority_queue">
    <info><title>Priority Queue</title></info>

    <section xml:id="container.priority_queue.interface">
      <info><title>Interface</title></info>

      <para>The priority queue container has the following
      declaration:
      </para>
      <programlisting>
	template<typename Value_Type,
		 typename Cmp_Fn = std::less<Value_Type>,
		 typename Tag = pairing_heap_tag,
		 typename Allocator = std::allocator<char > >
	class priority_queue;
      </programlisting>

      <para>The parameters have the following meaning:</para>

      <orderedlist>
        <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>

        <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>

        <listitem><para><classname>Tag</classname> specifies which underlying data structure
        to use.</para></listitem>

        <listitem><para><classname>Allocator</classname> is an allocator
        type.</para></listitem>
      </orderedlist>

      <para>The <classname>Tag</classname> parameter specifies which underlying
      data structure to use.
      Instantiating it by <classname>pairing_heap_tag</classname>, <classname>binary_heap_tag</classname>,
      <classname>binomial_heap_tag</classname>,
      <classname>rc_binomial_heap_tag</classname>,
      or <classname>thin_heap_tag</classname>,
      specifies, respectively,
      an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
      binary heap (<xref linkend="biblio.clrs2001"/>),
      binomial heap (<xref linkend="biblio.clrs2001"/>),
      a binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
      or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
      </para>

      <para>
        As mentioned in the tutorial,
        <classname>__gnu_pbds::priority_queue</classname> shares most of its
        interface with <classname>std::priority_queue</classname>.
        E.g. if <varname>q</varname> is a priority queue of type
        <classname>Q</classname>, then <function>q.top()</function> will
        return the "largest" value in the container (according to
        <classname>typename
        Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
        has a larger (and very slightly different) interface than
        <classname>std::priority_queue</classname>, however, since typically
        <classname>push</classname> and <classname>pop</classname> are deemed
        insufficient for manipulating priority queues. </para>

      <para>Different settings require different priority-queue
      implementations, which are described later; the section on traits
      discusses ways to differentiate between the traits of the
      different implementations.</para>


    </section>

    <section xml:id="container.priority_queue.details">
      <info><title>Details</title></info>

      <section xml:id="container.priority_queue.details.iterators">
        <info><title>Iterators</title></info>

        <para>There are many different underlying data structures for
        implementing priority queues.
        Unfortunately, most such
        structures are oriented towards making <function>push</function> and
        <function>top</function> efficient, and consequently don't allow efficient
        access to other elements: for instance, they cannot support an efficient
        <function>find</function> method. In the use case where it
        is important to both access and "do something with" an
        arbitrary value, one would be out of luck. For example, many graph algorithms require
        modifying a value (typically increasing it in the sense of the
        priority queue's comparison functor).</para>

        <para>In order to access and manipulate an arbitrary value in a
        priority queue, one needs to reference the internals of the
        priority queue from some form of associative container;
        this is unavoidable. Of course, in order to maintain the
        encapsulation of the priority queue, this needs to be done in a
        way that minimizes exposure to implementation internals.</para>

        <para>In this library the priority queue's <function>push</function>
        method returns an iterator, which if valid can be used for subsequent <function>modify</function> and
        <function>erase</function> operations. This both preserves the priority
        queue's encapsulation, and allows accessing arbitrary values (since the
        iterators returned from the <function>push</function> operation can be
        stored in some form of associative container).</para>

        <para>Priority queues' iterators present a problem regarding their
        invalidation guarantees. One assumes that calling
        <function>operator++</function> on an iterator will associate it
        with the "next" value. Priority queues are
        self-organizing: each operation changes what the "next" value
        means. Consequently, it does not make sense for <function>push</function>
        to return an iterator that can be incremented, since this can have
        no possible use.
        Also, as in the case of hash-based containers,
        it is awkward to define whether a subsequent <function>push</function> operation
        invalidates a previously returned iterator: it invalidates it in the
        sense that its "next" value is not related to what it
        previously considered to be its "next" value. However, it might not
        invalidate it, in the sense that it can be
        de-referenced and used for <function>modify</function> and <function>erase</function>
        operations.</para>

        <para>Similarly to the case of the other unordered associative
        containers, this library uses a distinction between
        point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
        converted to a <classname>point_iterator</classname>, and a
        <classname>const_iterator</classname> can always be converted to a
        <classname>point_const_iterator</classname>.</para>

        <para>The following snippet demonstrates manipulating an arbitrary
        value:</para>
        <programlisting>
	  // A priority queue of integers.
	  priority_queue<int > p;

	  // Insert some values into the priority queue.
	  priority_queue<int >::point_iterator it = p.push(0);

	  p.push(1);
	  p.push(2);

	  // Now modify a value.
	  p.modify(it, 3);

	  assert(p.top() == 3);
        </programlisting>


        <para>It should be noted that an alternative design could embed an
        associative container in a priority queue. Could, but most
        probably should not. To begin with, it should be noted that one
        could always encapsulate a priority queue and an associative
        container mapping values to priority queue iterators with no
        performance loss. One cannot, however, "un-encapsulate" a priority
        queue embedding an associative container, which might lead to
        performance loss. Assume that one needs to associate each value
        with some data unrelated to priority queues.
        Then, using
        this library's design, one could use an
        associative container mapping each value to a pair consisting of
        this data and a priority queue's iterator. The embedded
        design would require two associative containers. Similar
        problems might arise in cases where a value can reside
        simultaneously in many priority queues.</para>

      </section>


      <section xml:id="container.priority_queue.details.d">
        <info><title>Underlying Data Structure</title></info>

        <para>There are three main implementations of priority queues: the
        first employs a binary heap, typically one which uses a
        sequence; the second uses a tree (or forest of trees), which is
        typically less structured than an associative container's tree;
        the third simply uses an associative container. These are
        shown in the graphic below, in labels A1 and A2, label B, and label C,
        respectively.</para>

        <figure>
          <title>Underlying Priority-Queue Data-Structures.</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
            </imageobject>
            <textobject>
              <phrase>Underlying Priority-Queue Data-Structures.</phrase>
            </textobject>
          </mediaobject>
        </figure>

        <para>Roughly speaking, any value that is both pushed and popped
        from a priority queue must incur a logarithmic expense (in the
        amortized sense). Any priority queue implementation that
        avoided this would violate known bounds on comparison-based
        sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
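        The reduction behind this bound can be sketched with
        <classname>std::priority_queue</classname>: pushing n values and
        then popping them all yields them in sorted order, so n
        <function>push</function>/<function>pop</function> pairs cannot be
        cheaper than a comparison sort. The function
        <function>sort_with_priority_queue</function> is a hypothetical name
        used only for this illustration.
        <programlisting>
```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical helper: sorts values in descending order using nothing
// but push, top and pop on a priority queue.
template<typename T>
std::vector<T>
sort_with_priority_queue(const std::vector<T>& values)
{
  std::priority_queue<T> q;
  for (const T& v : values)
    q.push(v);

  std::vector<T> sorted;
  sorted.reserve(values.size());
  while (!q.empty())
    {
      sorted.push_back(q.top());  // largest remaining value
      q.pop();
    }
  return sorted;
}
```
        </programlisting>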
        </para>

        <para>Most implementations do
        not differ in the asymptotic amortized complexity of
        <function>push</function> and <function>pop</function> operations, but they differ in
        the constants involved, in the complexity of other operations
        (e.g., <function>modify</function>), and in the worst-case
        complexity of single operations. In general, the more
        "structured" an implementation (i.e., the more internal
        invariants it possesses), the higher its amortized complexity
        of <function>push</function> and <function>pop</function> operations.</para>

        <para>This library implements different algorithms using a
        single class: <classname>priority_queue</classname>.
        Instantiating the <classname>Tag</classname> template parameter "selects"
        the implementation:</para>

        <orderedlist>
          <listitem><para>
            Instantiating <classname>Tag = binary_heap_tag</classname> creates
            a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
            selected by priority_queue
            if <classname>Value_Type</classname> is instantiated by a primitive type
            (e.g., an <type>int</type>); the latter is
            internally selected for all other types (e.g.,
            <classname>std::string</classname>). This implementation is relatively
            unstructured, and so has good <classname>push</classname> and <classname>pop</classname>
            performance; it is the "best-in-kind" for primitive
            types, e.g., <type>int</type>s. Conversely, it has
            high worst-case complexity, and can support only linear-time
            <function>modify</function> and <function>erase</function> operations.</para></listitem>

          <listitem><para>Instantiating <classname>Tag =
          pairing_heap_tag</classname> creates a pairing heap of the form
          represented by label B in the graphic above.
          This
          implementation, too, is relatively unstructured, and so has good
          <function>push</function> and <function>pop</function>
          performance; it is the "best-in-kind" for non-primitive types,
          e.g., <classname>std::string</classname>s. It also has very good
          worst-case <function>push</function> and
          <function>join</function> performance (O(1)), but has high
          worst-case <function>pop</function>
          complexity.</para></listitem>

          <listitem><para>Instantiating <classname>Tag =
          binomial_heap_tag</classname> creates a binomial heap of the
          form represented by label B in the graphic above. This
          implementation is more structured than a pairing heap, and so
          has worse <function>push</function> and <function>pop</function>
          performance. Conversely, it has sub-linear worst-case bounds for
          operations such as <function>pop</function>, and so it might be preferred in
          cases where responsiveness is important.</para></listitem>

          <listitem><para>Instantiating <classname>Tag =
          rc_binomial_heap_tag</classname> creates a binomial heap of the
          form represented by label B above, accompanied by a redundant
          counter which governs the trees. This implementation is
          therefore more structured than a binomial heap, and so has worse
          <function>push</function> and <function>pop</function>
          performance. Conversely, it guarantees O(1)
          <function>push</function> complexity, and so it might be
          preferred in cases where the responsiveness of a binomial heap
          is insufficient.</para></listitem>

          <listitem><para>Instantiating <classname>Tag =
          thin_heap_tag</classname> creates a thin heap of the form
          represented by label B in the graphic above. This
          implementation, too, is more structured than a pairing heap, and
          so has worse <function>push</function> and
          <function>pop</function> performance.
          Conversely, it has better
          worst-case complexity than, and the same amortized complexities as, a Fibonacci
          heap, and so might be more appropriate for some graph
          algorithms.</para></listitem>
        </orderedlist>

        <para>Of course, one can use any order-preserving associative
        container as a priority queue, as in label C of the graphic above, possibly by creating an adapter class
        over the associative container (much as
        <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
        This has the advantage that no cross-referencing is necessary
        at all; the priority queue itself is an associative container.
        However, most associative containers are too structured to compete with
        priority queues in terms of <function>push</function> and <function>pop</function>
        performance.</para>



      </section>

      <section xml:id="container.priority_queue.details.traits">
        <info><title>Traits</title></info>

        <para>It would be nice if all priority queues could
        share exactly the same behavior regardless of implementation. Sadly, this is not possible. One example is the join operation: joining
        two binary heaps might throw an exception (without corrupting
        either of the heaps on which it operates), whereas joining two pairing
        heaps is exception free.</para>

        <para>Tags and traits are very useful for manipulating generic
        types. <classname>__gnu_pbds::priority_queue</classname>
        publicly defines <classname>container_category</classname> as one of the tags. Given any
        container <classname>Cntnr</classname>, the tag of the underlying
        data structure can be found via <classname>typename
        Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
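        Dispatching on <classname>container_category</classname> might look
        like the following sketch. It assumes that GCC's
        <filename>ext/pb_ds/priority_queue.hpp</filename> header is
        available; the functions <function>describe</function> and
        <function>describe_container</function> are hypothetical helpers
        introduced only for illustration.
        <programlisting>
```cpp
#include <cassert>
#include <functional>
#include <string>
#include <ext/pb_ds/priority_queue.hpp>

// Hypothetical helpers: one overload per tag of interest. The overload
// taking the base tag catches every other priority-queue tag through
// the derived-to-base conversion.
inline std::string
describe(__gnu_pbds::binary_heap_tag)
{ return "binary heap"; }

inline std::string
describe(__gnu_pbds::pairing_heap_tag)
{ return "pairing heap"; }

inline std::string
describe(__gnu_pbds::priority_queue_tag)
{ return "other priority queue"; }

// Picks an overload at compile time from the container's tag.
template<typename Cntnr>
std::string
describe_container(const Cntnr&)
{ return describe(typename Cntnr::container_category()); }
```
        </programlisting>
        A call such as <function>describe_container(q)</function> then
        resolves to the overload matching the <classname>Tag</classname>
        with which <varname>q</varname>'s type was instantiated.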
        </para>

        <figure>
          <title>Priority-Queue Data-Structure Tags.</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
            </imageobject>
            <textobject>
              <phrase>Priority-Queue Data-Structure Tags.</phrase>
            </textobject>
          </mediaobject>
        </figure>


        <para>Additionally, a traits mechanism can be used to query a
        container type for its attributes. Given any container
        <classname>Cntnr</classname>, <programlisting>__gnu_pbds::container_traits<Cntnr></programlisting>
        is a traits class identifying the properties of the
        container.</para>

        <para>To find if a container might throw if two of its objects are
        joined, one can use
        <programlisting>
	  container_traits<Cntnr>::split_join_can_throw
        </programlisting>
        </para>

        <para>
          Different priority-queue implementations have different invalidation guarantees. This is
          especially important, since there is no way to access an arbitrary
          value of a priority queue except through iterators. Similarly to
          associative containers, one can use
          <programlisting>
	    container_traits<Cntnr>::invalidation_guarantee
          </programlisting>
          to get the invalidation guarantee type of a priority queue.</para>

        <para>From the graphic of underlying data structures above, it is easy to understand what <classname>container_traits<Cntnr>::invalidation_guarantee</classname>
        will be for different implementations. All implementations of the
        type represented by label B have <classname>point_invalidation_guarantee</classname>:
        the container can freely internally reorganize the nodes, so
        range-type iterators are invalidated, but point-type iterators
        are always valid.
        Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
        the container can freely internally reallocate the array, so both
        point-type and range-type iterators might be invalidated.</para>

        <para>
          This has major implications, and constitutes a good reason to avoid
          using binary heaps. A binary heap can perform <function>modify</function>
          or <function>erase</function> efficiently given a valid point-type
          iterator. However, in order to supply it with a valid point-type
          iterator, one needs to iterate (linearly) over all
          values, then supply the relevant iterator (recall that a
          range-type iterator can always be converted to a point-type
          iterator). This means that if the number of <function>modify</function> or
          <function>erase</function> operations is non-negligible (say
          super-logarithmic in the total sequence of operations), binary
          heaps will perform badly.
        </para>

      </section>

    </section> <!-- details -->

  </section> <!-- priority_queue -->



  </section> <!-- container -->

  </section> <!-- design -->



  <!-- S04: Test -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
	      href="test_policy_data_structures.xml">
  </xi:include>

  <!-- S05: Reference/Acknowledgments -->
  <section xml:id="pbds.ack">
    <info><title>Acknowledgments</title></info>
    <?dbhtml filename="policy_data_structures_ack.html"?>

    <para>
      Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
      Laboratories), and Benjamin Kosnik (Red Hat).
    </para>

    <para>
      This library was partially written at IBM's Haifa Research Labs.
      It is based heavily on policy-based design and uses many useful
      techniques from Modern C++ Design: Generic Programming and Design
      Patterns Applied by Andrei Alexandrescu.
    </para>

    <para>
      Two ideas are borrowed from the SGI-STL implementation:
    </para>

    <orderedlist>
      <listitem>
	<para>
	  The prime-based resize policies use a list of primes taken from
	  the SGI-STL implementation.
	</para>
      </listitem>

      <listitem>
	<para>
	  The red-black trees contain both a root node and a header node
	  (containing metadata), connected in a way that forward and
	  reverse iteration can be performed efficiently.
	</para>
      </listitem>
    </orderedlist>

    <para>
      Some test utilities borrow ideas from
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
    </para>

    <para>
      We would like to thank Scott Meyers for useful comments (without
      attributing to him any flaws in the design or implementation of the
      library).
    </para>
    <para>We would like to thank Matt Austern for the suggestion to
    include tries.</para>
  </section>

  <!-- S06: Biblio -->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
	    href="policy_data_structures_biblio.xml">
</xi:include>

</chapter>