Although we can use subscripts to access the characters of a string
or the elements in a vector
, there is a more general mechanism—known as iterators—that we can use for the same purpose. As we’ll see in Part II, in addition to vector
, the library defines several other kinds of containers. All of the library containers have iterators, but only a few of them support the subscript operator. Technically speaking, a string
is not a container type, but string
supports many of the container operations. As we’ve seen string
, like vector
has a subscript operator. Like vector
s, string
s also have iterators.
Like pointers (§ 2.3.2, p. 52), iterators give us indirect access to an object. In the case of an iterator, that object is an element in a container or a character in a string
. We can use an iterator to fetch an element and iterators have operations to move from one element to another. As with pointers, an iterator may be valid or invalid. A valid iterator either denotes an element or denotes a position one past the last element in a container. All other iterator values are invalid.
Unlike pointers, we do not use the address-of operator to obtain an iterator. Instead, types that have iterators have members that return iterators. In particular, these types have members named begin
and end
. The begin
member returns an iterator that denotes the first element (or first character), if there is one:
// the compiler determines the type of b and e; see § 2.5.2 (p. 68)
// b denotes the first element and e denotes one past the last element in v
auto b = v.begin(), e = v.end(); // b and e have the same type
The iterator returned by end
is an iterator positioned “one past the end” of the associated container (or string
). This iterator denotes a nonexistent element “off the end” of the container. It is used as a marker indicating when we have processed all the elements. The iterator returned by end
is often referred to as the off-the-end iterator or abbreviated as “the end
iterator.” If the container is empty, begin
returns the same iterator as the one returned by end
.
If the container is empty, the iterators returned by
begin
andend
are equal—they are both off-the-end iterators.
In general, we do not know (or care about) the precise type that an iterator has. In this example, we used auto
to define b
and e
(§ 2.5.2, p. 68). As a result, these variables have whatever type is returned by the begin
and end
members, respectively. We’ll have more to say about those types on page 108.
Iterators support only a few operations, which are listed in Table 3.6. We can compare two valid iterators using ==
or !=
. Iterators are equal if they denote the same element or if they are both off-the-end iterators for the same container. Otherwise, they are unequal.
Table 3.6. Standard Container Iterator Operations
As with pointers, we can dereference an iterator to obtain the element denoted by an iterator. Also, like pointers, we may dereference only a valid iterator that denotes an element (§ 2.3.2, p. 53). Dereferencing an invalid iterator or an off-the-end iterator has undefined behavior.
As an example, we’ll rewrite the program from § 3.2.3 (p. 94) that capitalized the first character of a string
using an iterator instead of a subscript:
string s("some string");
if (s.begin() != s.end()) { // make sure s is not empty
auto it = s.begin(); // it denotes the first character in s
*it = toupper(*it); // make that character uppercase
}
As in our original program, we first check that s
isn’t empty. In this case, we do so by comparing the iterators returned by begin
and end
. Those iterators are equal if the string
is empty. If they are unequl, there is at least one character in s
.
Inside the if
body, we obtain an iterator to the first character by assigning the iterator returned by begin
to it
. We dereference that iterator to pass that character to toupper
. We also dereference it
on the left-hand side of the assignment in order to assign the character returned from toupper
to the first character in s
. As in our original program, the output of this loop will be:
Some string
Iterators use the increment (++
) operator (§ 1.4.1, p. 12) to move from one element to the next. Incrementing an iterator is a logically similar operation to incrementing an integer. In the case of integers, the effect is to “add 1
” to the integer’s value. In the case of iterators, the effect is to “advance the iterator by one position.”
Because the iterator returned from
end
does not denote an element, it may not be incremented or dereferenced.
Using the increment operator, we can rewrite our program that changed the case of the first word in a string
to use iterators instead:
// process characters in s until we run out of characters or we hit a whitespace
for (auto it = s.begin(); it != s.end() && !isspace(*it); ++it)
*it = toupper(*it); // capitalize the current character
This loop, like the one in § 3.2.3 (p. 94), iterates through the characters in s
, stopping when we encounter a whitespace character. However, this loop accesses these characters using an iterator, not a subscript.
The loop starts by initializing it
from s.begin
, meaning that it
denotes the first character (if any) in s
. The condition checks whether it
has reached the end
of s
. If not, the condition next dereferences it
to pass the current character to isspace
to see whether we’re done. At the end of each iteration, we execute ++it
to advance the iterator to access the next character in s
.
The body of this loop, is the same as the last statement in the previous if
. We dereference it
to pass the current character to toupper
and assign the resulting uppercase letter back into the character denoted by it
.
Programmers coming to C++ from C or Java might be surprised that we used
!=
rather than<
in ourfor
loops such as the one above and in the one on page 94. C++ programmers use!=
as a matter of habit. They do so for the same reason that they use iterators rather than subscripts: This coding style applies equally well to various kinds of containers provided by the library.As we’ve seen, only a few library types,
vector
andstring
being among them, have the subscript operator. Similarly, all of the library containers have iterators that define the==
and!=
operators. Most of those iterators do not have the<
operator. By routinely using iterators and!=
, we don’t have to worry about the precise type of container we’re processing.
Just as we do not know the precise type of a vector
’s or string
’s size_type
member (§ 3.2.2, p. 88), so too, we generally do not know—and do not need to know—the precise type of an iterator. Instead, as with size_type
, the library types that have iterators define types named iterator
and const_iterator
that represent actual iterator types:
vector<int>::iterator it; // it can read and write vector<int> elements
string::iterator it2; // it2 can read and write characters in a string
vector<int>::const_iterator it3; // it3 can read but not write elements
string::const_iterator it4; // it4 can read but not write characters
A const_iterator
behaves like a const
pointer (§ 2.4.2, p. 62). Like a const
pointer, a const_iterator
may read but not write the element it denotes; an object of type iterator
can both read and write. If a vector
or string
is const
, we may use only its const_iterator
type. With a nonconst vector
or string
, we can use either iterator
or const_iterator
.
The term iterator is used to refer to three different entities. We might mean the concept of an iterator, or we might refer to the
iterator
type defined by a container, or we might refer to an object as an iterator.What’s important to understand is that there is a collection of types that are related conceptually. A type is an iterator if it supports a common set of actions. Those actions let us access an element in a container and let us move from one element to another.
Each container class defines a type named
iterator;
thatiterator
type supports the actions of an (conceptual) iterator.
begin
and end
OperationsThe type returned by begin
and end
depends on whether the object on which they operator is const
. If the object is const
, then begin
and end
return a const_iterator;
if the object is not const
, they return iterator
:
vector<int> v;
const vector<int> cv;
auto it1 = v.begin(); // it1 has type vector<int>::iterator
auto it2 = cv.begin(); // it2 has type vector<int>::const_iterator
Often this default behavior is not what we want. For reasons we’ll explain in § 6.2.3 (p. 213), it is usually best to use a const
type (such as const_iterator
) when we need to read but do not need to write to an object. To let us ask specifically for the const_iterator
type, the new standard introduced two new functions named cbegin
and cend
:
auto it3 = v.cbegin(); // it3 has type vector<int>::const_iterator
As do the begin
and end
members, these members return iterators to the first and one past the last element in the container. However, regardless of whether the vector
(or string
) is const
, they return a const_iterator
.
When we dereference an iterator, we get the object that the iterator denotes. If that object has a class type, we may want to access a member of that object. For example, we might have a vector
of string
s and we might need to know whether a given element is empty. Assuming it
is an iterator into this vector
, we can check whether the string
that it
denotes is empty as follows:
(*it).empty()
For reasons we’ll cover in § 4.1.2 (p. 136), the parentheses in (*it).empty()
are necessary. The parentheses say to apply the dereference operator to it
and to apply the dot operator (§ 1.5.2, p. 23) to the result of dereferencing it
. Without parentheses, the dot operator would apply to it
, not to the resulting object:
(*it).empty() // dereferences it and calls the member empty on the resulting object
*it.empty() // error: attempts to fetch the member named empty from it
// but it is an iterator and has no member named empty
The second expression is interpreted as a request to fetch the empty
member from the object named it
. However, it
is an iterator and has no member named empty
. Hence, the second expression is in error.
To simplify expressions such as this one, the language defines the arrow operator (the ->
operator). The arrow operator combines dereference and member access into a single operation. That is, it->mem
is a synonym for (* it).mem
.
For example, assume we have a vector<string>
named text
that holds the data from a text file. Each element in the vector
is either a sentence or an empty string
representing a paragraph break. If we want to print the contents of the first paragraph from text
, we’d write a loop that iterates through text
until we encounter an element that is empty:
// print each line in text up to the first blank line
for (auto it = text.cbegin();
it != text.cend() && !it->empty(); ++it)
cout << *it << endl;
We start by initializing it
to denote the first element in text
. The loop continues until either we process every element in text
or we find an element that is empty. So long as there are elements and we haven’t seen an empty element, we print the current element. It is worth noting that because the loop reads but does not write to the elements in text
, we use cbegin
and cend
to control the iteration.
vector
Operations Invalidate IteratorsIn § 3.3.2 (p. 101) we noted that there are implications of the fact that vector
s can grow dynamically. We also noted that one such implication is that we cannot add elements to a vector
inside a range for
loop. Another implication is that any operation, such as push_back
, that changes the size of a vector
potentially invalidates all iterators into that vector
. We’ll explore how iterators become invalid in more detail in § 9.3.6 (p. 353).
For now, it is important to realize that loops that use iterators should not add elements to the container to which the iterators refer.
Exercises Section 3.4.1
Exercise 3.21: Redo the first exercise from § 3.3.3 (p. 105) using iterators.
Exercise 3.22: Revise the loop that printed the first paragraph in
text
to instead change the elements intext
that correspond to the first paragraph to all uppercase. After you’ve updatedtext
, print its contents.Exercise 3.23: Write a program to create a
vector
with tenint
elements. Using an iterator, assign each element a value that is twice its current value. Test your program by printing thevector
.
Incrementing an iterator moves the iterator one element at a time. All the library containers have iterators that support increment. Similarly, we can use ==
and !=
to compare two valid iterators (§ 3.4, p. 106) into any of the library container types.
Iterators for string
and vector
support additional operations that can move an iterator multiple elements at a time. They also support all the relational operators. These operations, which are often referred to as iterator arithmetic, are described in Table 3.7.
Table 3.7. Operations Supported by vector
and string
Iterators
We can add (or subtract) an integral value and an iterator. Doing so returns an iterator positioned forward (or backward) that many elements. When we add or subtract an integral value and an iterator, the result must denote an element in the same vector
(or string
) or denote one past the end of the associated vector
(or string
). As an example, we can compute an iterator to the element nearest the middle of a vector
:
// compute an iterator to the element closest to the midpoint of vi
auto mid = vi.begin() + vi.size() / 2;
If vi
has 20 elements, then vi.size()/2
is 10
. In this case, we’d set mid
equal to vi.begin() + 10
. Remembering that subscripts start at 0, this element is the same as vi[10]
, the element ten past the first.
In addition to comparing two iterators for equality, we can compare vector
and string
iterators using the relational operators (<
, <=
, >
, >=
). The iterators must be valid and must denote elements in (or one past the end of) the same vector
or string
. For example, assuming it
is an iterator into the same vector
as mid
, we can check whether it
denotes an element before or after mid
as follows:
if (it < mid)
// process elements in the first half of vi
We can also subtract two iterators so long as they refer to elements in, or one off the end of, the same vector
or string
. The result is the distance between the iterators. By distance we mean the amount by which we’d have to change one iterator to get the other. The result type is a signed integral type named difference_type
. Both vector
and string
define difference_type
. This type is signed, because subtraction might have a negative result.
A classic algorithm that uses iterator arithmetic is binary search. A binary search looks for a particular value in a sorted sequence. It operates by looking at the element closest to the middle of the sequence. If that element is the one we want, we’re done. Otherwise, if that element is smaller than the one we want, we continue our search by looking only at elements after the rejected one. If the middle element is larger than the one we want, we continue by looking only in the first half. We compute a new middle element in the reduced range and continue looking until we either find the element or run out of elements.
We can do a binary search using iterators as follows:
// text must be sorted
// beg and end will denote the range we're searching
auto beg = text.begin(), end = text.end();
auto mid = text.begin() + (end - beg)/2; // original midpoint
// while there are still elements to look at and we haven't yet found sought
while (mid != end && *mid != sought) {
if (sought < *mid) // is the element we want in the first half?
end = mid; // if so, adjust the range to ignore the second half
else // the element we want is in the second half
beg = mid + 1; // start looking with the element just after mid
mid = beg + (end - beg)/2; // new midpoint
}
We start by defining three iterators: beg
will be the first element in the range, end
one past the last element, and mid
the element closest to the middle. We initialize these iterators to denote the entire range in a vector<string>
named text
.
Our loop first checks that the range is not empty. If mid
is equal to the current value of end
, then we’ve run out of elements to search. In this case, the condition fails and we exit the while
. Otherwise, mid
refers to an element and we check whether mid
denotes the one we want. If so, we’re done and we exit the loop.
If we still have elements to process, the code inside the while
adjusts the range by moving end
or beg
. If the element denoted by mid
is greater than sought
, we know that if sought
is in text
, it will appear before the element denoted by mid
. Therefore, we can ignore elements after mid
, which we do by assigning mid
to end
. If *mid
is smaller than sought
, the element must be in the range of elements after the one denoted by mid
. In this case, we adjust the range by making beg
denote the element just after mid
. We already know that mid
is not the one we want, so we can eliminate it from the range.
At the end of the while
, mid
will be equal to end
or it will denote the element for which we are looking. If mid
equals end
, then the element was not in text
.
Exercises Section 3.4.2
Exercise 3.24: Redo the last exercise from § 3.3.3 (p. 105) using iterators.
Exercise 3.25: Rewrite the grade clustering program from § 3.3.3 (p. 104) using iterators instead of subscripts.
Exercise 3.26: In the binary search program on page 112, why did we write
mid = beg + (end - beg) / 2;
instead ofmid = (beg + end) /2;
?