iterators.xml revision 1.1
<?xml version='1.0'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
[ ]>

<chapter id="std.iterators" xreflabel="Iterators">
<?dbhtml filename="iterators.html"?>

<chapterinfo>
  <keywordset>
    <keyword>
      ISO C++
    </keyword>
    <keyword>
      library
    </keyword>
  </keywordset>
</chapterinfo>

<title>
  Iterators
  <indexterm><primary>Iterators</primary></indexterm>
</title>

<!-- Sect1 01 : Predefined -->
<sect1 id="std.iterators.predefined" xreflabel="Predefined">
  <title>Predefined</title>

  <sect2 id="iterators.predefined.vs_pointers" xreflabel="Versus Pointers">
    <title>Iterators vs. Pointers</title>
   <para>
     This FAQ <link linkend="faq.iterator_as_pod">entry</link> points out
     that iterators are not implemented as pointers.  They are a
     generalization of pointers, but libstdc++ implements them as
     separate classes.
   </para>
   <para>
     Keeping that simple fact in mind as you design your code will
     prevent a whole lot of difficult-to-understand bugs.
   </para>
   <para>
     You can think of it the other way 'round, even.  Since iterators
     are a generalization, that means
     that <emphasis>pointers</emphasis> are
     <emphasis>iterators</emphasis>, and that pointers can be used
     wherever an iterator would be.  All those functions in the
     Algorithms section of the Standard will work just as well on plain
     arrays and their pointers.
   </para>
   <para>
     That doesn't mean that when you pass in a pointer, it gets
     wrapped into some special delegating iterator-to-pointer class
     with a layer of overhead.  (If you think that's the case
     anywhere, you don't understand templates to begin with...)
     Oh, no; if you pass in a pointer, then the compiler will instantiate
     that template using <code>T*</code> as a type, and good old
     high-speed pointer arithmetic as its operations, so the resulting
     code will be doing exactly the same things as it would be doing if
     you had hand-coded it yourself (for the 273rd time).
   </para>
   <para>
     How much overhead <emphasis>is</emphasis> there when using an
     iterator class?  Very little.  Most of the layering classes
     contain nothing but typedefs, and typedefs are
     "meta-information" that simply tell the compiler some
     nicknames; they don't create code.  That information gets passed
     down through inheritance, so while the compiler has to do work
     looking up all the names, your runtime code does not.  (This has
     been a prime concern from the beginning.)
   </para>


  </sect2>

  <sect2 id="iterators.predefined.end" xreflabel="end() Is One Past the End">
    <title>One Past the End</title>

   <para>This starts off sounding complicated, but is actually very easy,
      especially towards the end.  Trust me.
   </para>
   <para>Beginners usually have a little trouble understanding the whole
      'past-the-end' thing, until they remember their early algebra classes
      (see, they <emphasis>told</emphasis> you that stuff would come in handy!) and
      the concept of half-open ranges.
   </para>
   <para>First, some history, and a reminder of some of the funkier rules in
      C and C++ for builtin arrays.  The following rules have always been
      true for both languages:
   </para>
   <orderedlist>
      <listitem>
	<para>You can point anywhere in the array, <emphasis>or to the first element
	  past the end of the array</emphasis>.  A pointer that points to one
	  past the end of the array is guaranteed to be as unique as a
	  pointer to somewhere inside the array, so that you can compare
	  such pointers safely.
	</para>
      </listitem>
      <listitem>
	<para>You can only dereference a pointer that points into an array.
	  If your array pointer points outside the array -- even to just
	  one past the end -- and you dereference it, Bad Things happen.
	</para>
      </listitem>
      <listitem>
	<para>Strictly speaking, simply pointing anywhere else invokes
	  undefined behavior.  Most programs won't puke until such a
	  pointer is actually dereferenced, but the standards leave that
	  up to the platform.
	</para>
      </listitem>
   </orderedlist>
   <para>The reason this past-the-end addressing was allowed is to make it
      easy to write a loop that goes over an entire array, e.g.,
      <code>while (*d++ = *s++);</code>.
   </para>
   <para>So, when you think of two pointers delimiting an array, don't think
      of them as indexing 0 through n-1.  Think of them as <emphasis>boundary
      markers</emphasis>:
   </para>
   <programlisting>

   beginning              end
       |                   |
       |                   |             This is bad.  Always having to
       |                   |             remember to add or subtract one.
       |                   |             Off-by-one bugs very common here.
       V                   V
	array of N elements
     |---|---|--...--|---|---|
     | 0 | 1 |  ...  |N-2|N-1|
     |---|---|--...--|---|---|

     ^                       ^
     |                       |
     |                       |           This is good.  This is safe.  This
     |                       |           is guaranteed to work.  Just don't
     |                       |           dereference 'end'.
 beginning                  end

   </programlisting>
   <para>See?  Everything between the boundary markers is part of the array.
      Simple.
   </para>
   <para>Now think back to your junior-high school algebra course, when you
      were learning how to draw graphs.  Remember that a graph terminating
      with a solid dot meant, "Everything up through this point,"
      and a graph terminating with an open dot meant, "Everything up
      to, but not including, this point," respectively called closed
      and open ranges?  Remember how closed ranges were written with
      brackets, <emphasis>[a,b]</emphasis>, and open ranges were written with parentheses,
      <emphasis>(a,b)</emphasis>?
   </para>
   <para>The boundary markers for arrays describe a <emphasis>half-open range</emphasis>,
      starting with (and including) the first element, and ending with (but
      not including) the one-past-the-end position:
      <emphasis>[beginning,end)</emphasis>.  See, I
      told you it would be simple in the end.
   </para>
   <para>Iterators, and everything working with iterators, follow this same
      time-honored tradition.  A container's <code>begin()</code> method returns
      an iterator referring to the first element, and its <code>end()</code>
      method returns a past-the-end iterator, which is guaranteed to be
      unique and comparable against any other iterator pointing into the
      middle of the container.
   </para>
   <para>Container constructors, container methods, and algorithms all take
      pairs of iterators describing a range of values on which to operate.
      All of these ranges are half-open ranges, so you pass the beginning
      iterator as the starting parameter, and the one-past-the-end iterator
      as the finishing parameter.
   </para>
   <para>This generalizes very well.  You can operate on sub-ranges quite
      easily this way; functions accepting a <emphasis>[first,last)</emphasis> range
      don't know or care whether they are the boundaries of an entire {array,
      sequence, container, whatever}, or whether they only enclose a few
      elements from the center.  This approach also makes zero-length
      sequences very simple to recognize:  if the two endpoints compare
      equal, then the {array, sequence, container, whatever} is empty.
   </para>
   <para>Just don't dereference <code>end()</code>.
   </para>

  </sect2>
</sect1>

<!-- Sect1 02 : Stream -->

</chapter>