<?xml version='1.0'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
[ ]>

<chapter id="std.iterators" xreflabel="Iterators">
<?dbhtml filename="iterators.html"?>

<chapterinfo>
  <keywordset>
    <keyword>
      ISO C++
    </keyword>
    <keyword>
      library
    </keyword>
  </keywordset>
</chapterinfo>

<title>
  Iterators
  <indexterm><primary>Iterators</primary></indexterm>
</title>

<!-- Sect1 01 : Predefined -->
<sect1 id="std.iterators.predefined" xreflabel="Predefined">
  <title>Predefined</title>

  <sect2 id="iterators.predefined.vs_pointers" xreflabel="Versus Pointers">
    <title>Iterators vs. Pointers</title>
   <para>
     This FAQ <link linkend="faq.iterator_as_pod">entry</link> points
     out that iterators are not implemented as pointers.  They are a
     generalization of pointers, but they are implemented in libstdc++
     as separate classes.
   </para>
   <para>
     Keeping that simple fact in mind as you design your code will
     prevent a whole lot of difficult-to-understand bugs.
   </para>
   <para>
     You can think of it the other way 'round, even.  Since iterators
     are a generalization, that means
     that <emphasis>pointers</emphasis> are
     <emphasis>iterators</emphasis>, and that pointers can be used
     wherever an iterator would be.  All those functions in the
     Algorithms chapter of the Standard will work just as well on plain
     arrays and their pointers.
   </para>
   <para>
     That doesn't mean that when you pass in a pointer, it gets
     wrapped into some special delegating iterator-to-pointer class
     with a layer of overhead.  (If you think that's the case
     anywhere, you don't understand templates to begin with...)  Oh,
     no; if you pass in a pointer, then the compiler will instantiate
     that template using <code>T*</code> as a type, and good old
     high-speed pointer arithmetic as its operations, so the resulting
     code will be doing exactly the same things as it would be doing
     if you had hand-coded it yourself (for the 273rd time).
   </para>
   <para>
     How much overhead <emphasis>is</emphasis> there when using an
     iterator class?  Very little.  Most of the layering classes
     contain nothing but typedefs, and typedefs are
     &quot;meta-information&quot; that simply tell the compiler some
     nicknames; they don't create code.  That information gets passed
     down through inheritance, so while the compiler has to do work
     looking up all the names, your runtime code does not.  (This has
     been a prime concern from the beginning.)
   </para>
  </sect2>

  <sect2 id="iterators.predefined.end" xreflabel="end() Is One Past the End">
    <title>One Past the End</title>

   <para>This starts off sounding complicated, but is actually very easy,
      especially towards the end.  Trust me.
   </para>
   <para>Beginners usually have a little trouble understanding the whole
      'past-the-end' thing, until they remember their early algebra classes
      (see, they <emphasis>told</emphasis> you that stuff would come in handy!) and
      the concept of half-open ranges.
   </para>
   <para>First, some history, and a reminder of some of the funkier rules in
      C and C++ for builtin arrays.  The following rules have always been
      true for both languages:
   </para>
   <orderedlist>
      <listitem>
	<para>You can point anywhere in the array, <emphasis>or to the position
	  one past the end of the array</emphasis>.  A pointer to one past
	  the end of the array is a valid pointer value, just like a
	  pointer to somewhere inside the array, so you can compare
	  such pointers safely.
	</para>
      </listitem>
      <listitem>
	<para>You can only dereference a pointer that points into an array.
	  If your array pointer points outside the array -- even to just
	  one past the end -- and you dereference it, Bad Things happen.
	</para>
      </listitem>
      <listitem>
	<para>Strictly speaking, simply forming a pointer to anywhere else
	  (before the start of the array, or more than one past its end)
	  invokes undefined behavior.  Most programs won't puke until such
	  a pointer is actually dereferenced, but the standards leave that
	  up to the platform.
	</para>
      </listitem>
   </orderedlist>
   <para>The reason this past-the-end addressing was allowed is to make it
      easy to write a loop to go over an entire array, e.g.,
      <code>while (*d++ = *s++);</code>.
   </para>
   <para>So, when you think of two pointers delimiting an array, don't think
      of them as indexing 0 through n-1.  Think of them as <emphasis>boundary
      markers</emphasis>:
   </para>
   <programlisting>

   beginning            end
     |                   |
     |                   |               This is bad.  Always having to
     |                   |               remember to add or subtract one.
     |                   |               Off-by-one bugs very common here.
     V                   V
        array of N elements
     |---|---|--...--|---|---|
     | 0 | 1 |  ...  |N-2|N-1|
     |---|---|--...--|---|---|

     ^                       ^
     |                       |
     |                       |           This is good.  This is safe.  This
     |                       |           is guaranteed to work.  Just don't
     |                       |           dereference 'end'.
   beginning                end

   </programlisting>
   <para>See?  Everything between the boundary markers is part of the array.
      Simple.
   </para>
   <para>Now think back to your junior-high school algebra course, when you
      were learning how to draw graphs.  Remember that a graph terminating
      with a solid dot meant, &quot;Everything up through this point,&quot;
      and a graph terminating with an open dot meant, &quot;Everything up
      to, but not including, this point,&quot; respectively called closed
      and open ranges?  Remember how closed ranges were written with
      brackets, <emphasis>[a,b]</emphasis>, and open ranges were written with parentheses,
      <emphasis>(a,b)</emphasis>?
   </para>
   <para>The boundary markers for arrays describe a <emphasis>half-open range</emphasis>,
      starting with (and including) the first element, and ending with (but
      not including) the position one past the last element:
      <emphasis>[beginning,end)</emphasis>.  See, I
      told you it would be simple in the end.
   </para>
   <para>Iterators, and everything working with iterators, follow this same
      time-honored tradition.  A container's <code>begin()</code> method returns
      an iterator referring to the first element, and its <code>end()</code>
      method returns a past-the-end iterator, which is guaranteed to be
      unique and comparable against any other iterator pointing into the
      middle of the container.
   </para>
   <para>Container constructors, container methods, and algorithms all take
      pairs of iterators describing a range of values on which to operate.
      All of these ranges are half-open ranges, so you pass the beginning
      iterator as the starting parameter, and the one-past-the-end iterator
      as the finishing parameter.
   </para>
   <para>This generalizes very well.  You can operate on sub-ranges quite
      easily this way; functions accepting a <emphasis>[first,last)</emphasis> range
      don't know or care whether they are the boundaries of an entire {array,
      sequence, container, whatever}, or whether they only enclose a few
      elements from the center.  This approach also makes zero-length
      sequences very simple to recognize:  if the two endpoints compare
      equal, then the {array, sequence, container, whatever} is empty.
   </para>
   <para>Just don't dereference <code>end()</code>.
   </para>

  </sect2>
</sect1>

<!-- Sect1 02 : Stream -->

</chapter>