containers.xml revision 1.5
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="std.containers" xreflabel="Containers">
<?dbhtml filename="containers.html"?>

<info><title>
  Containers
  <indexterm><primary>Containers</primary></indexterm>
</title>
  <keywordset>
    <keyword>ISO C++</keyword>
    <keyword>library</keyword>
  </keywordset>
</info>


<!-- Sect1 01 : Sequences -->
<section xml:id="std.containers.sequences" xreflabel="Sequences"><info><title>Sequences</title></info>
<?dbhtml filename="sequences.html"?>


<section xml:id="containers.sequences.list" xreflabel="list"><info><title>list</title></info>
<?dbhtml filename="list.html"?>

  <section xml:id="sequences.list.size" xreflabel="list::size() is O(n)"><info><title>list::size() is O(n)</title></info>

   <para>
     Yes it is, and that's okay.  This is a decision that we preserved
     when we imported SGI's STL implementation.  The following is
     quoted from <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.sgi.com/tech/stl/FAQ.html">their FAQ</link>:
   </para>
   <blockquote>
     <para>
       The size() member function, for list and slist, takes time
       proportional to the number of elements in the list.  This was a
       deliberate tradeoff.  The only way to get a constant-time
       size() for linked lists would be to maintain an extra member
       variable containing the list's size.  This would require taking
       extra time to update that variable (it would make splice() a
       linear time operation, for example), and it would also make the
       list larger.  Many list algorithms don't require that extra
       word (algorithms that do require it might do better with
       vectors than with lists), and, when it is necessary to maintain
       an explicit size count, it's something that users can do
       themselves.
     </para>
     <para>
       This choice is permitted by the C++ standard.
       The standard says
       that size() <quote>should</quote> be constant time, and
       <quote>should</quote> does not mean the same thing as
       <quote>shall</quote>.  This is the officially recommended ISO
       wording for saying that an implementation is supposed to do
       something unless there is a good reason not to.
      </para>
      <para>
        One implication of linear time size(): you should never write
      </para>
         <programlisting>
         if (L.size() == 0)
             ...
         </programlisting>

     <para>
       Instead, you should write
     </para>

     <programlisting>
         if (L.empty())
             ...
     </programlisting>
   </blockquote>
  </section>
</section>

</section>

<!-- Sect1 02 : Associative -->
<section xml:id="std.containers.associative" xreflabel="Associative"><info><title>Associative</title></info>
<?dbhtml filename="associative.html"?>


  <section xml:id="containers.associative.insert_hints" xreflabel="Insertion Hints"><info><title>Insertion Hints</title></info>

   <para>
     Section [23.1.2], Table 69, of the C++ standard lists this
     function for all of the associative containers (map, set, etc.):
   </para>
   <programlisting>
      a.insert(p,t);
   </programlisting>
   <para>
     where 'p' is an iterator into the container 'a', and 't' is the
     item to insert.  The standard says that <quote><code>t</code> is
     inserted as close as possible to the position just prior to
     <code>p</code>.</quote> (Library DR #233 addresses this topic,
     referring to <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1780.html">N1780</link>.)
     Since version 4.2, GCC implements the resolution to DR 233, so
     that insertions happen as close as possible to the hint.  For
     earlier releases the hint was only used as described below.
   </para>
   <para>
     Here we'll describe how the hinting works in the libstdc++
     implementation, and what you need to do in order to take
     advantage of it.
     (If the hint is used properly, insertions can go from logarithmic
     complexity to amortized constant time.)  Because the current
     implementation is based on the SGI STL one, and the HP/SGI code is
     used in a lot of places, these points may also hold true for other
     library implementations.
   </para>
   <para>
     In the following text, the phrases <emphasis>greater
     than</emphasis> and <emphasis>less than</emphasis> refer to the
     results of the strict weak ordering imposed on the container by
     its comparison object, which defaults to (basically)
     <quote><</quote>.  Using those phrases is semantically sloppy,
     but I didn't want to get bogged down in syntax.  I assume that if
     you are intelligent enough to use your own comparison objects,
     you are also intelligent enough to assign <quote>greater</quote>
     and <quote>lesser</quote> their new meanings in the next
     paragraph.  *grin*
   </para>
   <para>
     If the <code>hint</code> parameter ('p' above) is equivalent to:
   </para>
     <itemizedlist>
       <listitem>
         <para>
           <code>begin()</code>, then the item being inserted should
           have a key less than all the other keys in the container.
           The item will be inserted at the beginning of the container,
           becoming the new entry at <code>begin()</code>.
         </para>
       </listitem>
       <listitem>
         <para>
           <code>end()</code>, then the item being inserted should have
           a key greater than all the other keys in the container.  The
           item will be inserted at the end of the container, becoming
           the new entry before <code>end()</code>.
         </para>
       </listitem>
       <listitem>
         <para>
           neither <code>begin()</code> nor <code>end()</code>, then:
           Let <code>h</code> be the entry in the container pointed to
           by <code>hint</code>, that is, <code>h = *hint</code>.
           Then the item being inserted should have a key less than that of
           <code>h</code>, and greater than that of the item preceding
           <code>h</code>.  The new item will be inserted between
           <code>h</code> and <code>h</code>'s predecessor.
         </para>
       </listitem>
     </itemizedlist>
   <para>
     For <code>multimap</code> and <code>multiset</code>, the
     restrictions are slightly looser: <quote>greater than</quote>
     should be replaced by <quote>not less than</quote> and <quote>less
     than</quote> should be replaced by <quote>not greater
     than.</quote> (Why not replace greater with
     greater-than-or-equal-to?  You probably could in your head, but
     the mathematicians will tell you that it isn't the same thing.)
   </para>
   <para>
     If the conditions are not met, then the hint is not used, and the
     insertion proceeds as if you had called <code>a.insert(t)</code>
     instead.  (<emphasis>Note</emphasis> that GCC releases
     prior to 3.0.2 had a bug in the case with <code>hint ==
     begin()</code> for the <code>map</code> and <code>set</code>
     classes.  You should not use a hint argument in those releases.)
   </para>
   <para>
     This behavior is consistent with the <code>insert()</code>
     functions of the other containers that take an iterator: if the
     hint is used, the new item is inserted before the iterator passed
     as an argument.
   </para>
   <para>
     <emphasis>Note</emphasis> also that the hint in this
     implementation is a one-shot.  The older insertion-with-hint
     routines check the immediately surrounding entries to ensure that
     the new item would in fact belong there.  If the hint does not
     point to the correct place, then no further local searching is
     done; the search begins from scratch in logarithmic time.
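     To make the <code>end()</code> case above concrete, here is a
     minimal sketch rather than library code: <code>set_from_sorted</code>
     is a hypothetical helper, and it assumes its input is already in
     strictly ascending order, so that <code>end()</code> is a correct
     hint for every insertion.
   <programlisting>
#include <cassert>
#include <set>

// Hypothetical helper: build a std::set<int> from input that is already
// in strictly ascending order.  Every new key is greater than all keys
// already present, so end() is a correct hint for each insertion.
template<typename InputIt>
  std::set<int>
  set_from_sorted(InputIt first, InputIt last)
  {
    std::set<int> s;
    for (; first != last; ++first)
      {
        // Debug-only check of the precondition: input must ascend.
        assert(s.empty() || *s.rbegin() < *first);
        s.insert(s.end(), *first);  // new item goes just before end()
      }
    return s;
  }
   </programlisting>
     A hint pointing at the successor of the new key is also correct:
     in a set containing 5, <code>s.insert(s.find(5), 4)</code> places
     4 immediately before 5.  Whether a correct hint actually pays off
     for your data is something to measure rather than assume.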
   </para>
  </section>


  <section xml:id="containers.associative.bitset" xreflabel="bitset"><info><title>bitset</title></info>
    <?dbhtml filename="bitset.html"?>

    <section xml:id="associative.bitset.size_variable" xreflabel="Variable"><info><title>Size Variable</title></info>

      <para>
        No, you cannot write code of the form
      </para>
      <!-- Careful, the leading spaces in PRE show up directly. -->
   <programlisting>
      #include <bitset>

      void foo (std::size_t n)
      {
          std::bitset<n>   bits;   // error: n is not a constant expression
          // ...
      }
   </programlisting>
      <para>
        because <code>n</code> must be known at compile time.  Your
        compiler is correct; it is not a bug.  That's the way templates
        work.  (Yes, it <emphasis>is</emphasis> a feature.)
      </para>
      <para>
        There are several ways to handle this kind of thing.  Please
        consider all of them before passing judgement.  They include, in
        no particular order:
      </para>
      <itemizedlist>
        <listitem><para>A very large N in <code>bitset<N></code>.</para></listitem>
        <listitem><para>A container<bool>.</para></listitem>
        <listitem><para>Extremely weird solutions.</para></listitem>
      </itemizedlist>
      <para>
        <emphasis>A very large N in
        <code>bitset<N></code>.</emphasis>  It has been
        pointed out a few times in newsgroups that N bits take up only
        about N/8 bytes on most systems, and division by a factor of eight is
        pretty impressive when speaking of memory.  Half a megabyte given
        over to a bitset (recall that there is zero space overhead for
        housekeeping info; it is known at compile time exactly how large
        the set is) will hold over four million bits.  If you're using
        those bits as status flags (e.g.,
        <quote>changed</quote>/<quote>unchanged</quote> flags), that's a
        <emphasis>lot</emphasis> of state.
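        As a rough sketch of this approach (the names
        <code>flag_store</code>, <code>max_flags</code> and
        <code>make_flag_store</code>, and the 4M-bit size, are
        illustrative assumptions), the following fixes a generous N up
        front, records the highest bit actually used, and allocates the
        object on the heap so that the half-megabyte bitset does not sit
        on the stack.  It assumes C++11 for <code>std::unique_ptr</code>.
   <programlisting>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <memory>

// Illustrative sizing (an assumption, not a recommendation):
// 4M bits, i.e. 512 KiB of bit storage inside the object.
const std::size_t max_flags = 4u * 1024u * 1024u;

// Hypothetical flag store: a large fixed-size bitset plus a record of
// the highest index ever set, to help choose a smaller N later.
struct flag_store
{
  std::bitset<max_flags> bits;   // all bits start cleared
  std::size_t highest = 0;       // highest index marked so far

  void mark(std::size_t i)
  {
    assert(i < bits.size());     // operator[] does no range checking
    bits[i] = true;
    if (i > highest)
      highest = i;
  }
};

// Allocate on the heap so the large object does not occupy the stack.
std::unique_ptr<flag_store> make_flag_store()
{
  return std::unique_ptr<flag_store>(new flag_store());
}
   </programlisting>
        After runs on representative data, comparing
        <code>highest</code> with <code>max_flags</code> gives a first
        estimate of how far N could be reduced.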
      </para>
      <para>
        You can then keep track of the <quote>maximum bit used</quote>
        during some testing runs on representative data, make note of how
        many of those bits really need to be there, and then reduce N to
        a smaller number.  Leave some extra space, of course.  (If you
        plan to write code like the incorrect example above, where the
        bitset is a local variable, then you may have to talk your
        compiler into allowing that much stack space; there may be zero
        space overhead, but it's all allocated inside the object.)
      </para>
      <para>
        <emphasis>A container<bool>.</emphasis>  The
        Committee made provision for the space savings possible with that
        (N/8) usage previously mentioned, so that you don't have to do
        wasteful things like <code>Container<char></code> or
        <code>Container<short int></code>.  Specifically,
        <code>vector<bool></code> is required to be specialized for
        that space savings.
      </para>
      <para>
        The problem is that <code>vector<bool></code> doesn't
        behave like a normal vector anymore.  Several journal articles
        discuss the problems (the ones by Herb
        Sutter in the May and July/August 1999 issues of C++ Report cover
        it well).  Future revisions of the ISO C++ Standard may change
        the requirement for <code>vector<bool></code>
        specialization.  In the meantime, <code>deque<bool></code>
        is recommended: its behavior is sane, although you will probably
        not get the space savings, and its allocation scheme differs
        from that of vector.
      </para>
      <para>
        <emphasis>Extremely weird solutions.</emphasis>  If
        you have access to the compiler and linker at runtime, you can do
        something insane, like figuring out just how many bits you need,
        then writing a temporary source code file.
        That file contains an
        instantiation of <code>bitset</code> for the required number of
        bits, inside some wrapper functions with unchanging signatures.
        Have your program then invoke the compiler on that file,
        generating position-independent code, then open the newly created
        object file and load those wrapper functions.  You'll have an
        instantiation of <code>bitset<N></code> for the exact
        <code>N</code> that you need at the time.  Don't forget to delete
        the temporary files.  (Yes, this <emphasis>can</emphasis> be, and
        <emphasis>has been</emphasis>, done.)
      </para>
      <!-- I wonder if this next paragraph will get me in trouble... -->
      <para>
        This would be the approach of either a visionary genius or a
        raving lunatic, depending on your programming and management
        style.  Probably the latter.
      </para>
      <para>
        Which of the above techniques you use, if any, is up to you and
        your intended application.  Some time/space profiling is
        indicated if it really matters (don't just guess).  And, if you
        manage to do anything along the lines of the third category, the
        author would love to hear from you...
      </para>
      <para>
        Also note that the implementation of bitset used in libstdc++ has
        <link linkend="manual.ext.containers.sgi">some extensions</link>.
      </para>

    </section>
    <section xml:id="associative.bitset.type_string" xreflabel="Type String"><info><title>Type String</title></info>

      <para>
        In C++98, <code>bitset</code> has no constructor taking a
        <code>char*</code> or <code>const char*</code> argument.  This is
        something of an accident, but you can read
        about the problem: follow the library's <quote>Links</quote> from
        the homepage, and from the C++ information <quote>defect
        reflector</quote> link, select the library issues list.  Issue
        number 116 describes the problem.  (C++11 added a constructor
        taking a <code>const charT*</code>, so a string literal can be
        passed directly there.)
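        If the conversion is needed in many places, a small wrapper can
        hide the temporary <code>std::string</code> that C++98 requires.
        This is a minimal sketch: <code>bitset_from_cstr</code> is a
        hypothetical helper, not part of libstdc++.
   <programlisting>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libstdc++): build a bitset<N> from a
// C string of '0' and '1' characters by wrapping it in a temporary
// std::string, which the C++98 bitset constructors do accept.
// Any other character makes the constructor throw std::invalid_argument.
template<std::size_t N>
  std::bitset<N>
  bitset_from_cstr(const char* s)
  {
    assert(s != 0);                  // std::string needs a valid C string
    return std::bitset<N>(std::string(s));
  }
   </programlisting>
        Note the bit order: the leftmost character becomes the
        highest-numbered bit, so <code>bitset_from_cstr<5>("10110")</code>
        has bit 0 clear, bits 1, 2 and 4 set, and the value 22.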
      </para>
      <para>
        For now you can simply make a temporary string object using the
        constructor expression:
      </para>
      <programlisting>
      std::bitset<5> b ( std::string("10110") );
      </programlisting>

      <para>
        instead of
      </para>

      <programlisting>
      std::bitset<5> b ( "10110" );    // invalid in C++98; accepted since C++11
      </programlisting>
    </section>
  </section>

</section>

<!-- Sect1 03 : Unordered Associative -->
<section xml:id="std.containers.unordered" xreflabel="Unordered">
  <info><title>Unordered Associative</title></info>
  <?dbhtml filename="unordered_associative.html"?>

  <section xml:id="containers.unordered.insert_hints" xreflabel="Insertion Hints">
    <info><title>Insertion Hints</title></info>

    <para>
      Here is how the hinting works in the libstdc++ implementation of unordered
      containers, and the rationale behind this behavior.
    </para>
    <para>
      In the following text, the phrase <emphasis>equivalent to</emphasis> refers
      to the result of the invocation of the equality predicate imposed on the
      container by its <code>key_equal</code> object, which defaults to (basically)
      <quote>==</quote>.
    </para>
    <para>
      Unordered containers can be seen as a <code>std::vector</code> of
      <code>std::forward_list</code>.  The <code>std::vector</code> represents
      the buckets and each <code>std::forward_list</code> is the list of nodes
      belonging to the same bucket.  When inserting an element into such a data
      structure, we first compute the element's hash code to find the bucket
      the element belongs to; the second step depends on whether the elements
      of the container are unique.
    </para>
    <para>
      In the case of <code>std::unordered_set</code> and
      <code>std::unordered_map</code>, the implementation has to look through
      all of the bucket's elements for an equivalent one.  If there is none, the
      element is inserted; otherwise, the insertion fails.
      Because we always have to loop through all of the bucket's elements
      anyway, the hint cannot tell us whether the element is already present,
      and there is no constraint on where the new element must be inserted.
      The hint is therefore of no help and is ignored.
    </para>
    <para>
      In the case of <code>std::unordered_multiset</code>
      and <code>std::unordered_multimap</code>, equivalent elements must be
      linked together so that <code>equal_range(const key_type&)</code>
      can return the range of iterators pointing to all equivalent elements.
      This is where a hint can help: it can point to an equivalent element
      already in the container, letting the implementation skip all
      non-equivalent elements of the bucket.  To be useful, therefore, the hint
      should point to an element equivalent to the one being inserted; the new
      element is then inserted right after the hint.  Note that, because of an
      implementation detail, inserting after a node can require updating the
      bucket of the following node, and checking whether that bucket must be
      modified requires computing the following node's hash code.  So for the
      hint to be really efficient, it should be followed by another equivalent
      element: the implementation will detect this equivalence and will not
      compute the next element's hash code.
    </para>
    <para>
      Use hints with unordered containers only if you have a benchmark that
      demonstrates a benefit.  If you don't, do not use hints; they might do
      more harm than good.
    </para>
  </section>

  <section xml:id="containers.unordered.hash" xreflabel="Hash">
    <info><title>Hash Code</title></info>

    <section xml:id="containers.unordered.cache" xreflabel="Cache">
      <info><title>Hash Code Caching Policy</title></info>

      <para>
        The unordered containers in libstdc++ may cache the hash code for each
        element alongside the element itself.
        In some cases, not recomputing
        the hash code every time it is needed can improve performance, but the
        additional memory overhead can also reduce performance, so whether an
        unordered associative container caches the hash code depends on
        the properties described below.
      </para>
      <para>
        The C++ standard requires that <code>erase</code> and <code>swap</code>
        operations not throw exceptions.  Those operations might need an
        element's hash code, but cannot use the hash function if it could
        throw.
        This means the hash codes will be cached unless the hash function
        has a non-throwing exception specification such as <code>noexcept</code>
        or <code>throw()</code>.
      </para>
      <para>
        If the hash function is non-throwing, then libstdc++ doesn't need to
        cache the hash code for
        correctness, but might still do so for performance if computing a
        hash code is an expensive operation, as it may be for arbitrarily
        long strings.
        As an extension, libstdc++ provides a trait type to describe whether
        a hash function is fast.  By default, hash functions are assumed to be
        fast unless the trait is specialized for the hash function and the
        trait's value is false, in which case the hash code will always be
        cached.
        The trait can be specialized for user-defined hash functions like so:
      </para>
      <programlisting>
        #include <type_traits>
        #include <unordered_set>

        struct hasher
        {
          std::size_t operator()(int val) const noexcept
          {
            // Some very slow computation of a hash code from an int!
            // (Placeholder only: the expensive computation is omitted here.)
            return static_cast<std::size_t>(val);
          }
        };

        namespace std
        {
          template<>
            struct __is_fast_hash<hasher> : std::false_type
            { };
        }
      </programlisting>
    </section>
  </section>

</section>

<!-- Sect1 04 : Interacting with C -->
<section xml:id="std.containers.c" xreflabel="Interacting with C"><info><title>Interacting with C</title></info>
<?dbhtml filename="containers_and_c.html"?>


  <section xml:id="containers.c.vs_array" xreflabel="Containers vs. Arrays"><info><title>Containers vs. Arrays</title></info>

   <para>
     You're writing some code and can't decide whether to use built-in
     arrays or some kind of container.  There are compelling reasons
     to use one of the container classes, but you're afraid that
     you'll eventually run into difficulties, change everything back
     to arrays, and then have to change all the code that uses those
     data types to keep up with the change.
   </para>
   <para>
     If your code makes use of the standard algorithms, this isn't as
     scary as it sounds.  The algorithms neither know nor care about
     the kind of <quote>container</quote> on which they work, since
     the algorithms are only given endpoints to work with.  For the
     container classes, these are iterators (usually
     <code>begin()</code> and <code>end()</code>, but not always).
     For built-in arrays, these are the address of the first element
     and the <link linkend="iterators.predefined.end">past-the-end</link> element.
   </para>
   <para>
     Some very simple wrapper functions can hide all of that from the
     rest of the code.  For example, a pair of functions called
     <code>beginof</code> can be written, one that takes an array,
     another that takes a vector.  The first returns a pointer to the
     first element, and the second returns the vector's
     <code>begin()</code> iterator.
   </para>
   <para>
     The functions should be made function templates, and should also
     be declared inline.
     Declared that way, calls to <code>beginof</code> can be optimized
     out of existence by the compiler, so you should pay essentially
     nothing in terms of increased code size or execution time.
   </para>
   <para>
     The result is that if all your algorithm calls look like
   </para>
   <programlisting>
   std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction);
   </programlisting>
   <para>
     then the type of foo can change from an array of ints to a vector
     of ints to a deque of ints and back again, without ever changing
     any client code.
   </para>

<programlisting>
#include <vector>

// beginof
template<typename T>
  inline typename std::vector<T>::iterator
  beginof(std::vector<T> &v)
  { return v.begin(); }

template<typename T, unsigned int sz>
  inline T*
  beginof(T (&array)[sz]) { return array; }

// endof
template<typename T>
  inline typename std::vector<T>::iterator
  endof(std::vector<T> &v)
  { return v.end(); }

template<typename T, unsigned int sz>
  inline T*
  endof(T (&array)[sz]) { return array + sz; }

// lengthof
template<typename T>
  inline typename std::vector<T>::size_type
  lengthof(std::vector<T> &v)
  { return v.size(); }

template<typename T, unsigned int sz>
  inline unsigned int
  lengthof(T (&)[sz]) { return sz; }
</programlisting>

   <para>
     Astute readers will notice two things at once: first, that the
     container class is still a <code>vector<T></code> instead
     of a more general <code>Container<T></code>.  As written,
     three more functions would have to be added for <code>deque</code>,
     another three for <code>list</code>, and so on.  This is
     due to problems with getting template resolution correct; I find
     it easier just to write the extra three functions and avoid confusion.
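     With a C++11 compiler, one way around the template-resolution
     problem is to give a single generic overload a trailing return type,
     so that it simply drops out of overload resolution for built-in
     arrays.  The following is a sketch under that assumption;
     <code>sum_all</code> is a hypothetical client function.  (C++11 also
     provides <code>std::begin</code> and <code>std::end</code> in
     <code><iterator></code>, which serve the same purpose for arrays
     and containers.)
   <programlisting>
#include <cstddef>
#include <deque>
#include <list>
#include <numeric>
#include <vector>

// Generic overloads for any container with begin()/end() (C++11).
// For a built-in array, c.begin() is ill-formed, so substitution fails
// and these templates drop out of overload resolution (SFINAE).
template<typename C>
  inline auto beginof(C& c) -> decltype(c.begin())
  { return c.begin(); }

template<typename C>
  inline auto endof(C& c) -> decltype(c.end())
  { return c.end(); }

// Array overloads, as in the code above.
template<typename T, std::size_t sz>
  inline T* beginof(T (&array)[sz]) { return array; }

template<typename T, std::size_t sz>
  inline T* endof(T (&array)[sz]) { return array + sz; }

// Hypothetical client code: unchanged whether r is an array,
// a vector, a deque or a list of ints.
template<typename Range>
  int sum_all(Range& r)
  { return std::accumulate(beginof(r), endof(r), 0); }
   </programlisting>
     Only the generic overloads need to be written once; adding
     <code>deque</code> or <code>list</code> support requires no extra
     functions.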
   </para>
   <para>
     Second, the line
   </para>
   <programlisting>
    inline unsigned int lengthof (T (&)[sz]) { return sz; }
   </programlisting>
   <para>
     looks just weird!  Hint: unused parameters can be left nameless.
   </para>
  </section>

</section>

</chapter>