3.4. Introducing Iterators

Although we can use subscripts to access the characters of a string or the elements in a vector, there is a more general mechanism—known as iterators—that we can use for the same purpose. As we’ll see in Part II, in addition to vector, the library defines several other kinds of containers. All of the library containers have iterators, but only a few of them support the subscript operator. Technically speaking, a string is not a container type, but string supports many of the container operations. As we’ve seen string, like vector has a subscript operator. Like vectors, strings also have iterators.

Like pointers (§ 2.3.2, p. 52), iterators give us indirect access to an object. In the case of an iterator, that object is an element in a container or a character in a string. We can use an iterator to fetch an element and iterators have operations to move from one element to another. As with pointers, an iterator may be valid or invalid. A valid iterator either denotes an element or denotes a position one past the last element in a container. All other iterator values are invalid.

3.4.1. Using Iterators

Unlike pointers, we do not use the address-of operator to obtain an iterator. Instead, types that have iterators have members that return iterators. In particular, these types have members named begin and end. The begin member returns an iterator that denotes the first element (or first character), if there is one:

// the compiler determines the type of b and e; see § 2.5.2 (p. 68) // b denotes the first element and e denotes one past the last element in v auto b = v.begin(), e = v.end(); // b and e have the same type

The iterator returned by end is an iterator positioned “one past the end” of the associated container (or string). This iterator denotes a nonexistent element “off the end” of the container. It is used as a marker indicating when we have processed all the elements. The iterator returned by end is often referred to as the off-the-end iterator or abbreviated as “the end iterator.” If the container is empty, begin returns the same iterator as the one returned by end.

Note

If the container is empty, the iterators returned by begin and end are equal—they are both off-the-end iterators.

In general, we do not know (or care about) the precise type that an iterator has. In this example, we used auto to define b and e (§ 2.5.2, p. 68). As a result, these variables have whatever type is returned by the begin and end members, respectively. We’ll have more to say about those types on page 108.

Iterator Operations

Iterators support only a few operations, which are listed in Table 3.6. We can compare two valid iterators using == or !=. Iterators are equal if they denote the same element or if they are both off-the-end iterators for the same container. Otherwise, they are unequal.

Table 3.6. Standard Container Iterator Operations

As with pointers, we can dereference an iterator to obtain the element denoted by an iterator. Also, like pointers, we may dereference only a valid iterator that denotes an element (§ 2.3.2, p. 53). Dereferencing an invalid iterator or an off-the-end iterator has undefined behavior.

As an example, we’ll rewrite the program from § 3.2.3 (p. 94) that capitalized the first character of a string using an iterator instead of a subscript:

string s("some string"); if (s.begin() != s.end()) { // make sure s is not empty auto it = s.begin(); // it denotes the first character in s *it = toupper(*it); // make that character uppercase }

As in our original program, we first check that s isn’t empty. In this case, we do so by comparing the iterators returned by begin and end. Those iterators are equal if the string is empty. If they are unequl, there is at least one character in s.

Inside the if body, we obtain an iterator to the first character by assigning the iterator returned by begin to it. We dereference that iterator to pass that character to toupper. We also dereference it on the left-hand side of the assignment in order to assign the character returned from toupper to the first character in s. As in our original program, the output of this loop will be:

Some string

Moving Iterators from One Element to Another

Iterators use the increment (++) operator (§ 1.4.1, p. 12) to move from one element to the next. Incrementing an iterator is a logically similar operation to incrementing an integer. In the case of integers, the effect is to “add 1” to the integer’s value. In the case of iterators, the effect is to “advance the iterator by one position.”

Note

Because the iterator returned from end does not denote an element, it may not be incremented or dereferenced.

Using the increment operator, we can rewrite our program that changed the case of the first word in a string to use iterators instead:

// process characters in s until we run out of characters or we hit a whitespace for (auto it = s.begin(); it != s.end() && !isspace(*it); ++it) *it = toupper(*it); // capitalize the current character

This loop, like the one in § 3.2.3 (p. 94), iterates through the characters in s, stopping when we encounter a whitespace character. However, this loop accesses these characters using an iterator, not a subscript.

The loop starts by initializing it from s.begin, meaning that it denotes the first character (if any) in s. The condition checks whether it has reached the end of s. If not, the condition next dereferences it to pass the current character to isspace to see whether we’re done. At the end of each iteration, we execute ++it to advance the iterator to access the next character in s.

The body of this loop, is the same as the last statement in the previous if. We dereference it to pass the current character to toupper and assign the resulting uppercase letter back into the character denoted by it.

Key Concept: Generic Programming

Programmers coming to C++ from C or Java might be surprised that we used != rather than < in our for loops such as the one above and in the one on page 94. C++ programmers use != as a matter of habit. They do so for the same reason that they use iterators rather than subscripts: This coding style applies equally well to various kinds of containers provided by the library.

As we’ve seen, only a few library types, vector and string being among them, have the subscript operator. Similarly, all of the library containers have iterators that define the == and != operators. Most of those iterators do not have the < operator. By routinely using iterators and !=, we don’t have to worry about the precise type of container we’re processing.

Iterator Types

Just as we do not know the precise type of a vector’s or string’s size_type member (§ 3.2.2, p. 88), so too, we generally do not know—and do not need to know—the precise type of an iterator. Instead, as with size_type, the library types that have iterators define types named iterator and const_iterator that represent actual iterator types:

vector<int>::iterator it; // it can read and write vector<int> elements string::iterator it2; // it2 can read and write characters in a string vector<int>::const_iterator it3; // it3 can read but not write elements string::const_iterator it4; // it4 can read but not write characters

A const_iterator behaves like a const pointer (§ 2.4.2, p. 62). Like a const pointer, a const_iterator may read but not write the element it denotes; an object of type iterator can both read and write. If a vector or string is const, we may use only its const_iterator type. With a nonconst vector or string, we can use either iterator or const_iterator.

Terminology: Iterators and Iterator Types

The term iterator is used to refer to three different entities. We might mean the concept of an iterator, or we might refer to the iterator type defined by a container, or we might refer to an object as an iterator.

What’s important to understand is that there is a collection of types that are related conceptually. A type is an iterator if it supports a common set of actions. Those actions let us access an element in a container and let us move from one element to another.

Each container class defines a type named iterator; that iterator type supports the actions of an (conceptual) iterator.

The `begin` and `end` Operations

The type returned by begin and end depends on whether the object on which they operator is const. If the object is const, then begin and end return a const_iterator; if the object is not const, they return iterator:

vector<int> v; const vector<int> cv; auto it1 = v.begin(); // it1 has type vector<int>::iterator auto it2 = cv.begin(); // it2 has type vector<int>::const_iterator

Often this default behavior is not what we want. For reasons we’ll explain in § 6.2.3 (p. 213), it is usually best to use a const type (such as const_iterator) when we need to read but do not need to write to an object. To let us ask specifically for the const_iterator type, the new standard introduced two new functions named cbegin and cend:

auto it3 = v.cbegin(); // it3 has type vector<int>::const_iterator

As do the begin and end members, these members return iterators to the first and one past the last element in the container. However, regardless of whether the vector (or string) is const, they return a const_iterator.

Combining Dereference and Member Access

When we dereference an iterator, we get the object that the iterator denotes. If that object has a class type, we may want to access a member of that object. For example, we might have a vector of strings and we might need to know whether a given element is empty. Assuming it is an iterator into this vector, we can check whether the string that it denotes is empty as follows:

(*it).empty()

For reasons we’ll cover in § 4.1.2 (p. 136), the parentheses in (*it).empty() are necessary. The parentheses say to apply the dereference operator to it and to apply the dot operator (§ 1.5.2, p. 23) to the result of dereferencing it. Without parentheses, the dot operator would apply to it, not to the resulting object:

(*it).empty() // dereferences it and calls the member empty on the resulting object *it.empty() // error: attempts to fetch the member named empty from it // but it is an iterator and has no member named empty

The second expression is interpreted as a request to fetch the empty member from the object named it. However, it is an iterator and has no member named empty. Hence, the second expression is in error.

To simplify expressions such as this one, the language defines the arrow operator (the -> operator). The arrow operator combines dereference and member access into a single operation. That is, it->mem is a synonym for (* it).mem.

For example, assume we have a vector<string> named text that holds the data from a text file. Each element in the vector is either a sentence or an empty string representing a paragraph break. If we want to print the contents of the first paragraph from text, we’d write a loop that iterates through text until we encounter an element that is empty:

// print each line in text up to the first blank line for (auto it = text.cbegin(); it != text.cend() && !it->empty(); ++it) cout << *it << endl;

We start by initializing it to denote the first element in text. The loop continues until either we process every element in text or we find an element that is empty. So long as there are elements and we haven’t seen an empty element, we print the current element. It is worth noting that because the loop reads but does not write to the elements in text, we use cbegin and cend to control the iteration.

Some `vector` Operations Invalidate Iterators

In § 3.3.2 (p. 101) we noted that there are implications of the fact that vectors can grow dynamically. We also noted that one such implication is that we cannot add elements to a vector inside a range for loop. Another implication is that any operation, such as push_back, that changes the size of a vector potentially invalidates all iterators into that vector. We’ll explore how iterators become invalid in more detail in § 9.3.6 (p. 353).

Warning

For now, it is important to realize that loops that use iterators should not add elements to the container to which the iterators refer.

Exercises Section 3.4.1

Exercise 3.21: Redo the first exercise from § 3.3.3 (p. 105) using iterators.

Exercise 3.22: Revise the loop that printed the first paragraph in text to instead change the elements in text that correspond to the first paragraph to all uppercase. After you’ve updated text, print its contents.

Exercise 3.23: Write a program to create a vector with ten int elements. Using an iterator, assign each element a value that is twice its current value. Test your program by printing the vector.

3.4.2. Iterator Arithmetic

Incrementing an iterator moves the iterator one element at a time. All the library containers have iterators that support increment. Similarly, we can use == and != to compare two valid iterators (§ 3.4, p. 106) into any of the library container types.

Iterators for string and vector support additional operations that can move an iterator multiple elements at a time. They also support all the relational operators. These operations, which are often referred to as iterator arithmetic, are described in Table 3.7.

Table 3.7. Operations Supported by vector and string Iterators

Arithmetic Operations on Iterators

We can add (or subtract) an integral value and an iterator. Doing so returns an iterator positioned forward (or backward) that many elements. When we add or subtract an integral value and an iterator, the result must denote an element in the same vector (or string) or denote one past the end of the associated vector (or string). As an example, we can compute an iterator to the element nearest the middle of a vector:

// compute an iterator to the element closest to the midpoint of vi auto mid = vi.begin() + vi.size() / 2;

If vi has 20 elements, then vi.size()/2 is 10. In this case, we’d set mid equal to vi.begin() + 10. Remembering that subscripts start at 0, this element is the same as vi[10], the element ten past the first.

In addition to comparing two iterators for equality, we can compare vector and string iterators using the relational operators (<, <=, >, >=). The iterators must be valid and must denote elements in (or one past the end of) the same vector or string. For example, assuming it is an iterator into the same vector as mid, we can check whether it denotes an element before or after mid as follows:

if (it < mid) // process elements in the first half of vi

We can also subtract two iterators so long as they refer to elements in, or one off the end of, the same vector or string. The result is the distance between the iterators. By distance we mean the amount by which we’d have to change one iterator to get the other. The result type is a signed integral type named difference_type. Both vector and string define difference_type. This type is signed, because subtraction might have a negative result.

Using Iterator Arithmetic

A classic algorithm that uses iterator arithmetic is binary search. A binary search looks for a particular value in a sorted sequence. It operates by looking at the element closest to the middle of the sequence. If that element is the one we want, we’re done. Otherwise, if that element is smaller than the one we want, we continue our search by looking only at elements after the rejected one. If the middle element is larger than the one we want, we continue by looking only in the first half. We compute a new middle element in the reduced range and continue looking until we either find the element or run out of elements.

We can do a binary search using iterators as follows:

// text must be sorted // beg and end will denote the range we're searching auto beg = text.begin(), end = text.end(); auto mid = text.begin() + (end - beg)/2; // original midpoint // while there are still elements to look at and we haven't yet found sought while (mid != end && *mid != sought) { if (sought < *mid) // is the element we want in the first half? end = mid; // if so, adjust the range to ignore the second half else // the element we want is in the second half beg = mid + 1; // start looking with the element just after mid mid = beg + (end - beg)/2; // new midpoint }

We start by defining three iterators: beg will be the first element in the range, end one past the last element, and mid the element closest to the middle. We initialize these iterators to denote the entire range in a vector<string> named text.

Our loop first checks that the range is not empty. If mid is equal to the current value of end, then we’ve run out of elements to search. In this case, the condition fails and we exit the while. Otherwise, mid refers to an element and we check whether mid denotes the one we want. If so, we’re done and we exit the loop.

If we still have elements to process, the code inside the while adjusts the range by moving end or beg. If the element denoted by mid is greater than sought, we know that if sought is in text, it will appear before the element denoted by mid. Therefore, we can ignore elements after mid, which we do by assigning mid to end. If *mid is smaller than sought, the element must be in the range of elements after the one denoted by mid. In this case, we adjust the range by making beg denote the element just after mid. We already know that mid is not the one we want, so we can eliminate it from the range.

At the end of the while, mid will be equal to end or it will denote the element for which we are looking. If mid equals end, then the element was not in text.

Exercises Section 3.4.2

Exercise 3.24: Redo the last exercise from § 3.3.3 (p. 105) using iterators.

Exercise 3.25: Rewrite the grade clustering program from § 3.3.3 (p. 104) using iterators instead of subscripts.

Exercise 3.26: In the binary search program on page 112, why did we write mid = beg + (end - beg) / 2; instead of mid = (beg + end) /2;?