1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 3 4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 5<head> 6 <meta name="generator" content= 7 "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" /> 8 9 <title>Motivation</title> 10 <meta http-equiv="Content-Type" content= 11 "text/html; charset=us-ascii" /> 12 </head> 13 14<body> 15 <div id="page"> 16 <h1>Motivation</h1> 17 18 <p>Many fine associative-container libraries were already 19 written, most notably, the STL's associative containers. Why 20 then write another library? This section shows some possible 21 advantages of this library, when considering the challenges in 22 <a href="introduction.html">Introduction</a>. Many of these 23 points stem from the fact that the STL introduced 24 associative-containers in a two-step process (first 25 standardizing tree-based containers, only then adding 26 hash-based containers, which are fundamentally different), did 27 not standardize priority queues as containers, and (in our 28 opinion) overloads the iterator concept.</p> 29 30 <h2><a name="assoc" id="assoc">Associative Containers</a></h2> 31 32 <h3><a name="assoc_policies" id="assoc_policies">More 33 Configuration Choices</a></h3> 34 35 <p>Associative containers require a relatively large number of 36 policies to function efficiently in various settings. In some 37 cases this is needed for making their common operations more 38 efficient, and in other cases this allows them to support a 39 larger set of operations</p> 40 41 <ol> 42 <li>Hash-based containers, for example, support look-up and 43 insertion methods (<i>e.g.</i>, <tt>find</tt> and 44 <tt>insert</tt>). In order to locate elements quickly, they 45 are supplied a hash functor, which instruct how to transform 46 a key object into some size type; <i>e.g.</i>, a hash functor 47 might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A 48 hash table, though, requires transforming each key object 49 into some size-type type in some specific domain; 50 <i>e.g.</i>, a hash table with a 128-long table might 51 transform <tt>"hello"</tt> into position 63. The policy by 52 which the hash value is transformed into a position within 53 the table can dramatically affect performance (see <a href= 54 "hash_based_containers.html#hash_policies">Design::Associative 55 Containers::Hash-Based Containers::Hash Policies</a>). 56 Hash-based containers also do not resize naturally (as 57 opposed to tree-based containers, for example). The 58 appropriate resize policy is unfortunately intertwined with 59 the policy that transforms hash value into a position within 60 the table (see <a href= 61 "hash_based_containers.html#resize_policies">Design::Associative 62 Containers::Hash-Based Containers::Resize Policies</a>). 63 64 <p><a href= 65 "assoc_performance_tests.html#hash_based">Associative-Container 66 Performance Tests::Hash-Based Containers</a> quantifies 67 some of these points.</p> 68 </li> 69 70 <li>Tree-based containers, for example, also support look-up 71 and insertion methods, and are primarily useful when 72 maintaining order between elements is important. In some 73 cases, though, one can utilize their balancing algorithms for 74 completely different purposes. 75 76 <p>Figure <a href="#node_invariants">Metadata for 77 order-statistics and interval intersections</a>-A, for 78 example, shows a tree whose each node contains two entries: 79 a floating-point key, and some size-type <i>metadata</i> 80 (in bold beneath it) that is the number of nodes in the 81 sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5 82 nodes (including itself) in its sub-tree.) A container based 83 on this data structure can obviously answer efficiently 84 whether 0.3 is in the container object, but it can also 85 answer what is the order of 0.3 among all those in the 86 container object [<a href= 87 "references.html#clrs2001">clrs2001</a>] (see <a href= 88 "assoc_examples.html#tree_like_based">Associative Container 89 Examples::Tree-Like-Based Containers (Trees and 90 Tries)</a>).</p> 91 92 <p>As another example, Figure <a href= 93 "#node_invariants">Metadata for order-statistics and 94 interval intersections</a>-B shows a tree whose each node 95 contains two entries: a half-open geometric line interval, 96 and a number <i>metadata</i> (in bold beneath it) that is 97 the largest endpoint of all intervals in its sub-tree. 98 (<i>E.g.</i>, the root describes the interval <i>[20, 99 36)</i>, and the largest endpoint in its sub-tree is 99.) A 100 container based on this data structure can obviously answer 101 efficiently whether <i>[3, 41)</i> is in the container 102 object, but it can also answer efficiently whether the 103 container object has intervals that intersect <i>[3, 104 41)</i> (see <a href= 105 "assoc_examples.html#tree_like_based">Associative Container 106 Examples::Tree-Like-Based Containers (Trees and 107 Tries)</a>). These types of queries are very useful in 108 geometric algorithms and lease-management algorithms.</p> 109 110 <p>It is important to note, however, that as the trees are 111 modified, their internal structure changes. To maintain 112 these invariants, one must supply some policy that is aware 113 of these changes (see <a href= 114 "tree_based_containers.html#invariants">Design::Associative 115 Containers::Tree-Based Containers::Node Invariants</a>); 116 without this, it would be better to use a linked list (in 117 itself very efficient for these purposes).</p> 118 119 <p><a href= 120 "assoc_performance_tests.html#tree_like_based">Associative-Container 121 Performance Tests::Tree-Like-Based Containers</a> 122 quantifies some of these points.</p> 123 </li> 124 </ol> 125 126 <h6 class="c1"><a name="node_invariants" id= 127 "node_invariants"><img src="node_invariants.png" alt= 128 "no image" /></a></h6> 129 130 <h6 class="c1">Metadata for order-statistics and interval 131 intersections.</h6> 132 133 <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More 134 Data Structures and Traits</a></h3> 135 136 <p>The STL contains associative containers based on red-black 137 trees and collision-chaining hash tables. These are obviously 138 very useful, but they are not ideal for all types of 139 settings.</p> 140 141 <p>Figure <a href= 142 "#different_underlying_data_structures">Different underlying 143 data structures</a> shows different underlying data structures 144 (the ones currently supported in <tt>pb_ds</tt>). A shows a 145 collision-chaining hash-table, B shows a probing hash-table, C 146 shows a red-black tree, D shows a splay tree, E shows a tree 147 based on an ordered vector(implicit in the order of the 148 elements), F shows a PATRICIA trie, and G shows a list-based 149 container with update policies.</p> 150 151 <p>Each of these data structures has some performance benefits, 152 in terms of speed, size or both (see <a href= 153 "assoc_performance_tests.html">Associative-Container 154 Performance Tests</a> and <a href= 155 "assoc_performance_tests.html#dss_family_choice">Associative-Container 156 Performance Tests::Observations::Underlying Data-Structure 157 Families</a>). For now, though, note that <i>e.g.</i>, 158 vector-based trees and probing hash tables manipulate memory 159 more efficiently than red-black trees and collision-chaining 160 hash tables, and that list-based associative containers are 161 very useful for constructing "multimaps" (see <a href= 162 "#assoc_mapping_semantics">Alternative to Multiple Equivalent 163 Keys</a>, <a href= 164 "assoc_performance_tests.html#multimaps">Associative Container 165 Performance Tests::Multimaps</a>, and <a href= 166 "assoc_performance_tests.html#msc">Associative Container 167 Performance Tests::Observations::Mapping-Semantics 168 Considerations</a>).</p> 169 170 <h6 class="c1"><a name="different_underlying_data_structures" 171 id="different_underlying_data_structures"><img src= 172 "different_underlying_dss.png" alt="no image" /></a></h6> 173 174 <h6 class="c1">Different underlying data structures.</h6> 175 176 <p>Now consider a function manipulating a generic associative 177 container, <i>e.g.</i>,</p> 178 <pre> 179<b>template</b>< 180 <b>class</b> Cntnr> 181<b>int</b> 182 some_op_sequence 183 (Cntnr &r_cnt) 184{ 185 ... 186} 187</pre> 188 189 <p>Ideally, the underlying data structure of <tt>Cntnr</tt> 190 would not affect what can be done with <tt>r_cnt</tt>. 191 Unfortunately, this is not the case.</p> 192 193 <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then 194 the function can use <tt>std::for_each(r_cnt.find(foo), 195 r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt> 196 to all elements between <tt>foo</tt> and <tt>bar</tt>. If 197 <tt>Cntnr</tt> is a hash-based container, then this call's 198 results are undefined.</p> 199 200 <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object 201 of the comparison functor can be accessed. If <tt>Cntnr</tt> is 202 hash based, these queries are nonsensical.</p> 203 204 <p>There are various other differences based on the container's 205 underlying data structure. For one, they can be constructed by, 206 and queried for, different policies. Furthermore:</p> 207 208 <ol> 209 <li>Containers based on C, D, E and F store elements in a 210 meaningful order; the others store elements in a meaningless 211 (and probably time-varying) order. By implication, only 212 containers based on C, D, E and F can support erase 213 operations taking an iterator and returning an iterator to 214 the following element without performance loss (see <a href= 215 "#assoc_ers_methods">Slightly Different Methods::Methods 216 Related to Erase</a>).</li> 217 218 <li>Containers based on C, D, E, and F can be split and 219 joined efficiently, while the others cannot. Containers based 220 on C and D, furthermore, can guarantee that this is 221 exception-free; containers based on E cannot guarantee 222 this.</li> 223 224 <li>Containers based on all but E can guarantee that erasing 225 an element is exception free; containers based on E cannot 226 guarantee this. Containers based on all but B and E can 227 guarantee that modifying an object of their type does not 228 invalidate iterators or references to their elements, while 229 containers based on B and E cannot. Containers based on C, D, 230 and E can furthermore make a stronger guarantee, namely that 231 modifying an object of their type does not affect the order 232 of iterators.</li> 233 </ol> 234 235 <p>A unified tag and traits system (as used for the STL's 236 iterators, for example) can ease generic manipulation of 237 associative containers based on different underlying 238 data structures (see <a href= 239 "tutorial.html#assoc_ds_gen">Tutorial::Associative 240 Containers::Determining Containers' Attributes</a> and <a href= 241 "ds_gen.html#container_traits">Design::Associative 242 Containers::Data-Structure Genericity::Data-Structure Tags and 243 Traits</a>).</p> 244 245 <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating 246 between Iterator Types</a></h3> 247 248 <p>Iterators are centric to the STL's design, because of the 249 container/algorithm/iterator decomposition that allows an 250 algorithm to operate on a range through iterators of some 251 sequence (<i>e.g.</i>, one originating from a container). 252 Iterators, then, are useful because they allow going over a 253 <u>sequence</u>. The STL also uses iterators for accessing a 254 <u>specific</u> element - <i>e.g.</i>, when an associative 255 container returns one through <tt>find</tt>. The STL, however, 256 consistently uses the same types of iterators for both 257 purposes: going over a range, and accessing a specific found 258 element. Before the introduction of hash-based containers to 259 the STL, this made sense (with the exception of priority 260 queues, which are discussed in <a href="#pq">Priority 261 Queues</a>).</p> 262 263 <p>Using the STL's associative containers together with 264 non-order-preserving associative containers (and also because 265 of priority-queues container), there is a possible need for 266 different types of iterators for self-organizing containers - 267 the iterator concept seems overloaded to mean two different 268 things (in some cases). The following subsections explain this; 269 <a href="tutorial.html#assoc_find_range">Tutorial::Associative 270 Containers::Point-Type and Range-Type Methods</a> explains an 271 alternative design which does not complicate the use of 272 order-preserving containers, but is better for unordered 273 containers; <a href= 274 "ds_gen.html#find_range">Design::Associative 275 Containers::Data-Structure Genericity::Point-Type and 276 Range-Type Methods</a> explains the design further.</p> 277 278 <h4><a name="assoc_find_it_range_it" id= 279 "assoc_find_it_range_it">Using Point-Type Iterators for 280 Range-Type Operations</a></h4> 281 282 <p>Suppose <tt>cntnr</tt> is some associative container, and 283 say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what 284 will be the outcome of</p> 285 <pre> 286std::for_each(c.find(1), c.find(5), foo); 287</pre> 288 289 <p>If <tt>cntnr</tt> is a tree-based container object, then an 290 in-order walk will apply <tt>foo</tt> to the relevant elements, 291 <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range 292 iteration in different data structures</a> -A. If <tt>c</tt> is 293 a hash-based container, then the order of elements between any 294 two elements is undefined (and probably time-varying); there is 295 no guarantee that the elements traversed will coincide with the 296 <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in 297 Figure <a href="#range_it_in_hts">Range iteration in different 298 data structures</a>-B.</p> 299 300 <h6 class="c1"><a name="range_it_in_hts" id= 301 "range_it_in_hts"><img src="point_iterators_range_ops_1.png" 302 alt="no image" /></a></h6> 303 304 <h6 class="c1">Range iteration in different 305 data structures.</h6> 306 307 <p>In our opinion, this problem is not caused just because 308 red-black trees are order preserving while collision-chaining 309 hash tables are (generally) not - it is more fundamental. Most 310 of the STL's containers order sequences in a well-defined 311 manner that is determined by their <u>interface</u>: calling 312 <tt>insert</tt> on a tree-based container modifies its sequence 313 in a predictable way, as does calling <tt>push_back</tt> on a 314 list or a vector. Conversely, collision-chaining hash tables, 315 probing hash tables, priority queues, and list-based containers 316 (which are very useful for "multimaps") are self-organizing 317 data structures; the effect of each operation modifies their 318 sequences in a manner that is (practically) determined by their 319 <u>implementation</u>.</p> 320 321 <p>Consequently, applying an algorithm to a sequence obtained 322 from most containers <u>may or may not</u> make sense, but 323 applying it to a sub-sequence of a self-organizing container 324 <u>does not</u>.</p> 325 326 <h4><a name="assoc_range_it_for_find_it" id= 327 "assoc_range_it_for_find_it">The Cost of Enabling Range 328 Capabilities to Point-Type Iterators</a></h4> 329 330 <p>Suppose <tt>c</tt> is some collision-chaining hash-based 331 container object, and one calls <tt>c.find(3)</tt>. Then what 332 composes the returned iterator?</p> 333 334 <p>Figure <a href="#find_its_in_hash_tables">Point-type 335 iterators in hash tables</a>-A shows the simplest (and most 336 efficient) implementation of a collision-chaining hash table. 337 The little box marked <tt>point_iterator</tt> shows an object 338 that contains a pointer to the element's node. Note that this 339 "iterator" has no way to move to the next element (<i>i.e.</i>, 340 it cannot support <tt><b>operator</b>++</tt>). Conversely, the 341 little box marked <tt>iterator</tt> stores both a pointer to 342 the element, as well as some other information (<i>e.g.</i>, 343 the bucket number of the element). the second iterator, then, 344 is "heavier" than the first one- it requires more time and 345 space. If we were to use a different container to 346 cross-reference into this hash-table using these iterators - it 347 would take much more space. As noted in <a href= 348 "#assoc_find_it_range_it">Using Point-Type Iterators for 349 Range-Type Operations</a>, nothing much can be done by 350 incrementing these iterators, so why is this extra information 351 needed?</p> 352 353 <p>Alternatively, one might create a collision-chaining 354 hash-table where the lists might be linked, forming a 355 monolithic total-element list, as in Figure <a href= 356 "#find_its_in_hash_tables">Point-type iterators in hash 357 tables</a>-B (this seems similar to the Dinkumware design 358 [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]). 359 Here the iterators are as light as can be, but the hash-table's 360 operations are more complicated.</p> 361 362 <h6 class="c1"><a name="find_its_in_hash_tables" id= 363 "find_its_in_hash_tables"><img src= 364 "point_iterators_range_ops_2.png" alt="no image" /></a></h6> 365 366 <h6 class="c1">Point-type iterators in hash tables.</h6> 367 368 <p>It should be noted that containers based on 369 collision-chaining hash-tables are not the only ones with this 370 type of behavior; many other self-organizing data structures 371 display it as well.</p> 372 373 <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation 374 Guarantees</a></h4> 375 376 <p>Consider the following snippet:</p> 377 <pre> 378it = c.find(3); 379 380c.erase(5); 381</pre> 382 383 <p>Following the call to <tt>erase</tt>, what is the validity 384 of <tt>it</tt>: can it be de-referenced? can it be 385 incremented?</p> 386 387 <p>The answer depends on the underlying data structure of the 388 container. Figure <a href= 389 "#invalidation_guarantee_erase">Effect of erase in different 390 underlying data structures</a> shows three cases: A1 and A2 391 show a red-black tree; B1 and B2 show a probing hash-table; C1 392 and C2 show a collision-chaining hash table.</p> 393 394 <h6 class="c1"><a name="invalidation_guarantee_erase" id= 395 "invalidation_guarantee_erase"><img src= 396 "invalidation_guarantee_erase.png" alt="no image" /></a></h6> 397 398 <h6 class="c1">Effect of erase in different underlying 399 data structures.</h6> 400 401 <ol> 402 <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3 403 can be de-referenced and incremented. The sequence of 404 iterators changed, but in a way that is well-defined by the 405 <u>interface</u>.</li> 406 407 <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is 408 not valid at all - it cannot be de-referenced or incremented; 409 the order of iterators changed in a way that is (practically) 410 determined by the <u>implementation</u> and not by the 411 <u>interface</u>.</li> 412 413 <li>Erasing 5 from C1 yields C2. Here the situation is more 414 complicated. On the one hand, there is no problem in 415 de-referencing <tt>it</tt>. On the other hand, the order of 416 iterators changed in a way that is (practically) determined 417 by the <u>implementation</u> and not by the 418 <u>interface</u>.</li> 419 </ol> 420 421 <p>So in classic STL, it is not always possible to express 422 whether <tt>it</tt> is valid or not. This is true also for 423 <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems 424 overloaded.</p> 425 426 <h3><a name="assoc_methods" id="assoc_methods">Slightly 427 Different Methods</a></h3> 428 429 <p>[<a href="references.html#meyers02both">meyers02both</a>] 430 points out that a class's methods should comprise only 431 operations which depend on the class's internal structure; 432 other operations are best designed as external functions. 433 Possibly, therefore, the STL's associative containers lack some 434 useful methods, and provide some other methods which would be 435 better left out (<i>e.g.</i>, [<a href= 436 "references.html#sgi_stl">sgi_stl</a>] ).</p> 437 438 <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods 439 Related to Erase</a></h4> 440 441 <ol> 442 <li>Order-preserving STL associative containers provide the 443 method 444 <pre> 445iterator 446 erase 447 (iterator it) 448</pre>which takes an iterator, erases the corresponding element, 449and returns an iterator to the following element. Also hash-based 450STL associative containers provide this method. This <u>seemingly 451increases</u> genericity between associative containers, since, <i> 452 e.g.</i>, it is possible to use 453 <pre> 454<b>typename</b> C::iterator it = c.begin(); 455<b>typename</b> C::iterator e_it = c.end(); 456 457<b>while</b>(it != e_it) 458 it = pred(*it)? c.erase(it) : ++it; 459</pre>in order to erase from a container object <tt> 460 c</tt> all element which match a predicate <tt>pred</tt>. 461 However, in a different sense this actually 462 <u>decreases</u> genericity: an integral implication of 463 this method is that tree-based associative containers' 464 memory use is linear in the total number of elements they 465 store, while hash-based containers' memory use is unbounded 466 in the total number of elements they store. Assume a 467 hash-based container is allowed to decrease its size when 468 an element is erased. Then the elements might be rehashed, 469 which means that there is no "next" element - it is simply 470 undefined. Consequently, it is possible to infer from the 471 fact that STL hash-based containers provide this method 472 that they cannot downsize when elements are erased 473 (<a href="assoc_performance_tests.html#hash_based">Performance 474 Tests::Hash-Based Container Tests</a> quantifies this.) As 475 a consequence, different code is needed to manipulate 476 different containers, assuming that memory should be 477 conserved. <tt>pb_ds</tt>'s non-order preserving 478 associative containers omit this method. 479 </li> 480 481 <li>All of <tt>pb_ds</tt>'s associative containers include a 482 conditional-erase method 483 <pre> 484<b>template</b>< 485 <b>class</b> Pred> 486size_type 487 erase_if 488 (Pred pred) 489</pre>which erases all elements matching a predicate. This is 490probably the only way to ensure linear-time multiple-item erase 491which can actually downsize a container. 492 </li> 493 494 <li>STL associative containers provide methods for 495 multiple-item erase of the form 496 <pre> 497size_type 498 erase 499 (It b, 500 It e) 501</pre>erasing a range of elements given by a pair of iterators. For 502tree-based or trie-based containers, this can implemented more 503efficiently as a (small) sequence of split and join operations. For 504other, unordered, containers, this method isn't much better than an 505external loop. Moreover, if <tt>c</tt> is a hash-based container, 506then, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost 507certain to do something different than erasing all elements whose 508keys are between 2 and 5, and is likely to produce other undefined 509behavior. 510 </li> 511 </ol> 512 513 <h4><a name="assoc_split_join_methods" id= 514 "assoc_split_join_methods">Methods Related to Split and 515 Join</a></h4> 516 517 <p>It is well-known that tree-based and trie-based container 518 objects can be efficiently split or joined [<a href= 519 "references.html#clrs2001">clrs2001</a>]. Externally splitting 520 or joining trees is super-linear, and, furthermore, can throw 521 exceptions. Split and join methods, consequently, seem good 522 choices for tree-based container methods, especially, since as 523 noted just before, they are efficient replacements for erasing 524 sub-sequences. <a href= 525 "assoc_performance_tests.html#tree_like_based">Performance 526 Tests::Tree-Like-Based Container Tests</a> quantifies this.</p> 527 528 <h4><a name="assoc_insert_methods" id= 529 "assoc_insert_methods">Methods Related to Insert</a></h4> 530 531 <p>STL associative containers provide methods of the form</p> 532 <pre> 533<b>template</b>< 534 <b>class</b> It> 535size_type 536 insert 537 (It b, 538 It e); 539</pre>for inserting a range of elements given by a pair of 540iterators. At best, this can be implemented as an external loop, 541or, even more efficiently, as a join operation (for the case of 542tree-based or trie-based containers). Moreover, these methods seem 543similar to constructors taking a range given by a pair of 544iterators; the constructors, however, are transactional, whereas 545the insert methods are not; this is possibly confusing. 546 547 <h4><a name="assoc_equiv_comp_methods" id= 548 "assoc_equiv_comp_methods">Functions Related to 549 Comparison</a></h4> 550 551 <p>Associative containers are parametrized by policies 552 allowing to test key equivalence; <i>e.g.</i> a hash-based 553 container can do this through its equivalence functor, and a 554 tree-based container can do this through its comparison 555 functor. In addition, some STL associative containers have 556 global function operators, <i>e.g.</i>, 557 <tt><b>operator</b>==</tt> and <tt><b>operator</b><=</tt>, 558 that allow comparing entire associative containers.</p> 559 560 <p>In our opinion, these functions are better left out. To 561 begin with, they do not significantly improve over an external 562 loop. More importantly, however, they are possibly misleading - 563 <tt><b>operator</b>==</tt>, for example, usually checks for 564 equivalence, or interchangeability, but the associative 565 container cannot check for values' equivalence, only keys' 566 equivalence; also, are two containers considered equivalent if 567 they store the same values in different order? this is an 568 arbitrary decision.</p> 569 570 <h3><a name="assoc_mapping_semantics" id= 571 "assoc_mapping_semantics">Alternative to Multiple Equivalent 572 Keys</a></h3> 573 574 <p>Maps (or sets) allow mapping (or storing) unique-key values. 575 The STL, however, also supplies associative containers which 576 map (or store) multiple values with equivalent keys: 577 <tt>std::multimap</tt>, <tt>std::multiset</tt>, 578 <tt>std::tr1::unordered_multimap</tt>, and 579 <tt>unordered_multiset</tt>. We first discuss how these might 580 be used, then why we think it is best to avoid them.</p> 581 582 <p>Suppose one builds a simple bank-account application that 583 records for each client (identified by an <tt>std::string</tt>) 584 and account-id (marked by an <tt><b>unsigned long</b></tt>) - 585 the balance in the account (described by a 586 <tt><b>float</b></tt>). Suppose further that ordering this 587 information is not useful, so a hash-based container is 588 preferable to a tree based container. Then one can use</p> 589 <pre> 590std::tr1::unordered_map<std::pair<std::string, <b>unsigned long</b>>, <b>float</b>, ...> 591</pre>which <u>hashes every combination of client and 592account-id</u>. This might work well, except for the fact that it 593is now impossible to efficiently list all of the accounts of a 594specific client (this would practically require iterating over all 595entries). Instead, one can use 596 <pre> 597std::tr1::unordered_multimap<std::pair<std::string, <tt><b>unsigned long</b></tt>>, <b>float</b>, ...> 598</pre>which <u>hashes every client</u>, and <u>decides equivalence 599based on client</u> only. This will ensure that all accounts 600belonging to a specific user are stored consecutively. 601 602 <p>Also, suppose one wants an integers' priority queue 603 (<i>i.e.,</i> a container that supports <tt>push</tt>, 604 <tt>pop</tt>, and <tt>top</tt> operations, the last of which 605 returns the largest <tt><b>int</b></tt>) that also supports 606 operations such as <tt>find</tt> and <tt>lower_bound</tt>. A 607 reasonable solution is to build an adapter over 608 <tt>std::set<<b>int</b>></tt>. In this adapter, 609 <i>e.g.</i>, <tt>push</tt> will just call the tree-based 610 associative container's <tt>insert</tt> method; <tt>pop</tt> 611 will call its <tt>end</tt> method, and use it to return the 612 preceding element (which must be the largest). Then this might 613 work well, except that the container object cannot hold 614 multiple instances of the same integer (<tt>push(4)</tt>, 615 <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the 616 container object). If multiple keys are necessary, then one 617 might build the adapter over an 618 <tt>std::multiset<<b>int</b>></tt>.</p> 619 620 <p class="c1">STL non-unique-mapping containers, then, are 621 useful when (1) a key can be decomposed in to a primary key and 622 a secondary key, (2) a key is needed multiple times, or (3) any 623 combination of (1) and (2).</p> 624 625 <p>Figure <a href="#embedded_lists_1">Non-unique mapping 626 containers in the STL's design</a> shows how the STL's design 627 works internally; in this figure nodes shaded equally represent 628 equivalent-key values. Equivalent keys are stored consecutively 629 using the properties of the underlying data structure: binary 630 search trees (Figure <a href="#embedded_lists_1">Non-unique 631 mapping containers in the STL's design</a>-A) store 632 equivalent-key values consecutively (in the sense of an 633 in-order walk) naturally; collision-chaining hash tables 634 (Figure <a href="#embedded_lists_1">Non-unique mapping 635 containers in the STL's design</a>-B) store equivalent-key 636 values in the same bucket, the bucket can be arranged so that 637 equivalent-key values are consecutive.</p> 638 639 <h6 class="c1"><a name="embedded_lists_1" id= 640 "embedded_lists_1"><img src="embedded_lists_1.png" alt= 641 "no image" /></a></h6> 642 643 <h6 class="c1">Non-unique mapping containers in the STL's 644 design.</h6> 645 646 <p>Put differently, STL non-unique mapping 647 associative-containers are associative containers that map 648 primary keys to linked lists that are embedded into the 649 container. Figure <a href="#embedded_lists_2">Effect of 650 embedded lists in STL multimaps</a> shows again the two 651 containers from Figure <a href="#embedded_lists_1">Non-unique 652 mapping containers in the STL's design</a>, this time with the 653 embedded linked lists of the grayed nodes marked 654 explicitly.</p> 655 656 <h6 class="c1"><a name="embedded_lists_2" id= 657 "embedded_lists_2"><img src="embedded_lists_2.png" alt= 658 "no image" /></a></h6> 659 660 <h6 class="c1">Effect of embedded lists in STL multimaps.</h6> 661 662 <p>These embedded linked lists have several disadvantages.</p> 663 664 <ol> 665 <li>The underlying data structure embeds the linked lists 666 according to its own consideration, which means that the 667 search path for a value might include several different 668 equivalent-key values. For example, the search path for the 669 the black node in either of Figures <a href= 670 "#embedded_lists_1">Non-unique mapping containers in the 671 STL's design</a> A or B, includes more than a single gray 672 node.</li> 673 674 <li>The links of the linked lists are the underlying 675 data structures' nodes, which typically are quite structured. 676 <i>E.g.</i>, in the case of tree-based containers (Figure 677 <a href="#embedded_lists_2">Effect of embedded lists in STL 678 multimaps</a>-B), each "link" is actually a node with three 679 pointers (one to a parent and two to children), and a 680 relatively-complicated iteration algorithm. The linked lists, 681 therefore, can take up quite a lot of memory, and iterating 682 over all values equal to a given key (<i>e.g.</i>, through 683 the return value of the STL's <tt>equal_range</tt>) can be 684 expensive.</li> 685 686 <li>The primary key is stored multiply; this uses more 687 memory.</li> 688 689 <li>Finally, the interface of this design excludes several 690 useful underlying data structures. <i>E.g.</i>, of all the 691 unordered self-organizing data structures, practically only 692 collision-chaining hash tables can (efficiently) guarantee 693 that equivalent-key values are stored consecutively.</li> 694 </ol> 695 696 <p>The above reasons hold even when the ratio of secondary keys 697 to primary keys (or average number of identical keys) is small, 698 but when it is large, there are more severe problems:</p> 699 700 <ol> 701 <li>The underlying data structures order the links inside 702 each embedded linked-lists according to their internal 703 considerations, which effectively means that each of the 704 links is unordered. Irrespective of the underlying 705 data structure, searching for a specific value can degrade to 706 linear complexity.</li> 707 708 <li>Similarly to the above point, it is impossible to apply 709 to the secondary keys considerations that apply to primary 710 keys. For example, it is not possible to maintain secondary 711 keys by sorted order.</li> 712 713 <li>While the interface "understands" that all equivalent-key 714 values constitute a distinct list (<i>e.g.</i>, through 715 <tt>equal_range</tt>), the underlying data structure 716 typically does not. This means, <i>e.g.</i>, that operations 717 such as erasing from a tree-based container all values whose 718 keys are equivalent to a a given key can be super-linear in 719 the size of the tree; this is also true also for several 720 other operations that target a specific list.</li> 721 </ol> 722 723 <p>In <tt>pb_ds</tt>, therefore, all associative containers map 724 (or store) unique-key values. One can (1) map primary keys to 725 secondary associative-containers (<i>i.e.</i>, containers of 726 secondary keys) or non-associative containers (2) map identical 727 keys to a size-type representing the number of times they 728 occur, or (3) any combination of (1) and (2). Instead of 729 allowing multiple equivalent-key values, <tt>pb_ds</tt> 730 supplies associative containers based on underlying 731 data structures that are suitable as secondary 732 associative-containers (see <a href= 733 "assoc_performance_tests.html#msc">Associative-Container 734 Performance Tests::Observations::Mapping-Semantics 735 Considerations</a>).</p> 736 737 <p>Figures <a href="#embedded_lists_3">Non-unique mapping 738 containers in <tt>pb_ds</tt></a> A and B show the equivalent 739 structures in <tt>pb_ds</tt>'s design, to those in Figures 740 <a href="#embedded_lists_1">Non-unique mapping containers in 741 the STL's design</a> A and B, respectively. Each shaded box 742 represents some size-type or secondary 743 associative-container.</p> 744 745 <h6 class="c1"><a name="embedded_lists_3" id= 746 "embedded_lists_3"><img src="embedded_lists_3.png" alt= 747 "no image" /></a></h6> 748 749 <h6 class="c1">Non-unique mapping containers in the 750 <tt>pb_ds</tt>.</h6> 751 752 <p>In the first example above, then, one would use an 753 associative container mapping each user to an associative 754 container which maps each application id to a start time (see 755 <a href= 756 "http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>); 757 in the second example, one would use an associative container 758 mapping each <tt><b>int</b></tt> to some size-type indicating 759 the number of times it logically occurs (see <a href= 760 "http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p> 761 762 <p><a href= 763 "assoc_performance_tests.html#multimaps">Associative-Container 764 Performance Tests::Multimaps</a> quantifies some of these 765 points, and <a href= 766 "assoc_performance_tests.html#msc">Associative-Container 767 Performance Tests::Observations::Mapping-Semantics 768 Considerations</a> shows some simple calculations.</p> 769 770 <p><a href="assoc_examples.html#mmaps">Associative-Container 771 Examples::Multimaps</a> shows some simple examples of using 772 "multimaps".</p> 773 774 <p><a href="lu_based_containers.html">Design::Associative 775 Containers::List-Based Containers</a> discusses types of 776 containers especially suited as secondary 777 associative-containers.</p> 778 779 <h2><a name="pq" id="pq">Priority Queues</a></h2> 780 781 <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different 782 Methods</a></h3> 783 784 <p>Priority queues are containers that allow efficiently 785 inserting values and accessing the maximal value (in the sense 786 of the container's comparison functor); <i>i.e.</i>, their 787 interface supports <tt>push</tt> and <tt>pop</tt>. The STL's 788 priority queues indeed support these methods, but they support 789 little else. For algorithmic and software-engineering purposes, 790 other methods are needed:</p> 791 792 <ol> 793 <li>Many graph algorithms [<a href= 794 "references.html#clrs2001">clrs2001</a>] require increasing a 795 value in a priority queue (again, in the sense of the 796 container's comparison functor), or joining two 797 priority-queue objects.</li> 798 799 <li>It is sometimes necessary to erase an arbitrary value in 800 a priority queue. For example, consider the <tt>select</tt> 801 function for monitoring file descriptors: 802 <pre> 803<b>int</b> 804 select 805 (<b>int</b> nfds, 806 fd_set *readfds, 807 fd_set *writefds, 808 fd_set *errorfds, 809 <b>struct</b> timeval *timeout); 810</pre>then, as the <tt>select</tt> manual page [<a href= 811"references.html#select_man">select_man</a>] states: 812 813 <p><q>The nfds argument specifies the range of file 814 descriptors to be tested. The select() function tests file 815 descriptors in the range of 0 to nfds-1.</q></p> 816 817 <p>It stands to reason, therefore, that we might wish to 818 maintain a minimal value for <tt>nfds</tt>, and priority 819 queues immediately come to mind. Note, though, that when a 820 socket is closed, the minimal file description might 821 change; in the absence of an efficient means to erase an 822 arbitrary value from a priority queue, we might as well 823 avoid its use altogether.</p> 824 825 <p><a href="pq_examples.html#xref">Priority-Queue 826 Examples::Cross-Referencing</a> shows examples for these 827 types of operations.</p> 828 </li> 829 830 <li>STL containers typically support iterators. It is 831 somewhat unusual for <tt>std::priority_queue</tt> to omit 832 them (see, <i>e.g.</i>, [<a href= 833 "references.html#meyers01stl">meyers01stl</a>]). One might 834 ask why do priority queues need to support iterators, since 835 they are self-organizing containers with a different purpose 836 than abstracting sequences. There are several reasons: 837 838 <ol> 839 <li>Iterators (even in self-organizing containers) are 840 useful for many purposes, <i>e.g.</i>, cross-referencing 841 containers, serialization, and debugging code that uses 842 these containers.</li> 843 844 <li>The STL's hash-based containers support iterators, 845 even though they too are self-organizing containers with 846 a different purpose than abstracting sequences.</li> 847 848 <li>In STL-like containers, it is natural to specify the 849 interface of operations for modifying a value or erasing 850 a value (discussed previously) in terms of a iterators. 851 This is discussed further in <a href= 852 "pq_design.html#pq_it">Design::Priority 853 Queues::Iterators</a>. It should be noted that the STL's 854 containers also use iterators for accessing and 855 manipulating a specific value. <i>E.g.</i>, in hash-based 856 containers, one checks the existence of a key by 857 comparing the iterator returned by <tt>find</tt> to the 858 iterator returned by <tt>end</tt>, and not by comparing a 859 pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li> 860 </ol> 861 </li> 862 </ol> 863 864 <p><a href="pq_performance_tests.html">Performance 865 Tests::Priority Queues</a> quantifies some of these points.</p> 866 867 <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data 868 Structures and Traits</a></h3> 869 870 <p>There are three main implementations of priority queues: the 871 first employs a binary heap, typically one which uses a 872 sequence; the second uses a tree (or forest of trees), which is 873 typically less structured than an associative container's tree; 874 the third simply uses an associative container. These are 875 shown, respectively, in Figures <a href= 876 "#pq_different_underlying_dss">Underlying Priority-Queue 877 Data-Structures</a> A1 and A2, B, and C.</p> 878 879 <h6 class="c1"><a name="pq_different_underlying_dss" id= 880 "pq_different_underlying_dss"><img src= 881 "pq_different_underlying_dss.png" alt="no image" /></a></h6> 882 883 <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6> 884 885 <p>No single implementation can completely replace any of the 886 others. Some have better <tt>push</tt> and <tt>pop</tt> 887 amortized performance, some have better bounded (worst case) 888 response time than others, some optimize a single method at the 889 expense of others, <i>etc.</i>. In general the "best" 890 implementation is dictated by the problem (see <a href= 891 "pq_performance_tests.html#pq_observations">Performance 892 Tests::Priority Queues::Observations</a>).</p> 893 894 <p>As with associative containers (see <a href= 895 "#assoc_ds_genericity">Associative Containers::Traits for 896 Underlying Data-Structures</a>), the more implementations 897 co-exist, the more necessary a traits mechanism is for handling 898 generic containers safely and efficiently. This is especially 899 important for priority queues, since the invalidation 900 guarantees of one of the most useful data structures - binary 901 heaps - is markedly different than those of most of the 902 others.</p> 903 904 <p><a href="pq_design.html#pq_traits">Design::Priority 905 Queues::Traits</a> discusses this further.</p> 906 907 <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap 908 Implementation</a></h3> 909 910 <p>Binary heaps are one of the most useful underlying 911 data structures for priority queues. They are very efficient in 912 terms of memory (since they don't require per-value structure 913 metadata), and have the best amortized <tt>push</tt> and 914 <tt>pop</tt> performance for primitive types (<i>e.g.</i>, 915 <tt><b>int</b></tt>s).</p> 916 917 <p>The STL's <tt>priority_queue</tt> implements this data 918 structure as an adapter over a sequence, typically 919 <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond 920 to Figures <a href="#pq_different_underlying_dss">Underlying 921 Priority-Queue Data-Structures</a> A1 and A2, respectively.</p> 922 923 <p>This is indeed an elegant example of the adapter concept and 924 the algorithm/container/iterator decomposition (see [<a href= 925 "references.html#nelson96stlpq">nelson96stlpql</a>]). There are 926 possibly reasons, however, why a binary-heap priority queue 927 would be better implemented as a container instead of a 928 sequence adapter:</p> 929 930 <ol> 931 <li><tt>std::priority_queue</tt> cannot erase values from its 932 adapted sequence (irrespective of the sequence type). This 933 means that the memory use of an <tt>std::priority_queue</tt> 934 object is always proportional to the maximal number of values 935 it ever contained, and not to the number of values that it 936 currently contains (see <a href= 937 "priority_queue_text_pop_mem_usage_test.html">Priority Queue 938 Text <tt>pop</tt> Memory Use Test</a>); this implementation 939 of binary heaps acts very differently than other underlying 940 data structures (<i>e.g.</i>, pairing heaps).</li> 941 942 <li>Some combinations of adapted sequences and value types 943 are very inefficient or just don't make sense. If one uses 944 <tt>std::priority_queue<std::vector<std::string> 945 > ></tt>, for example, then not only will each 946 operation perform a logarithmic number of 947 <tt>std::string</tt> assignments, but, furthermore, any 948 operation (including <tt>pop</tt>) can render the container 949 useless due to exceptions. Conversely, if one uses 950 <tt>std::priority_queue<std::deque<<b>int</b>> > 951 ></tt>, then each operation uses incurs a logarithmic 952 number of indirect accesses (through pointers) unnecessarily. 953 It might be better to let the container make a conservative 954 deduction whether to use the structure in Figures <a href= 955 "#pq_different_underlying_dss">Underlying Priority-Queue 956 Data-Structures</a> A1 or A2.</li> 957 958 <li>There does not seem to be a systematic way to determine 959 what exactly can be done with the priority queue. 960 961 <ol> 962 <li>If <tt>p</tt> is a priority queue adapting an 963 <tt>std::vector</tt>, then it is possible to iterate over 964 all values by using <tt>&p.top()</tt> and 965 <tt>&p.top() + p.size()</tt>, but this will not work 966 if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any 967 case, one cannot use <tt>p.begin()</tt> and 968 <tt>p.end()</tt>. If a different sequence is adapted, it 969 is even more difficult to determine what can be 970 done.</li> 971 972 <li>If <tt>p</tt> is a priority queue adapting an 973 <tt>std::deque</tt>, then the reference return by 974 <tt>p.top()</tt> will remain valid until it is popped, 975 but if <tt>p</tt> adapts an <tt>std::vector</tt>, the 976 next <tt>push</tt> will invalidate it. If a different 977 sequence is adapted, it is even more difficult to 978 determine what can be done.</li> 979 </ol> 980 </li> 981 982 <li>Sequence-based binary heaps can still implement 983 linear-time <tt>erase</tt> and <tt>modify</tt> operations. 984 This means that if one needs, <i>e.g.</i>, to erase a small 985 (say logarithmic) number of values, then one might still 986 choose this underlying data structure. Using 987 <tt>std::priority_queue</tt>, however, this will generally 988 change the order of growth of the entire sequence of 989 operations.</li> 990 </ol> 991 </div> 992</body> 993</html> 994