10.5. Structure of Generic Algorithms

The most fundamental property of any algorithm is the list of operations it requires from its iterator(s). Some algorithms, such as find, require only the ability to access an element through the iterator, to increment the iterator, and to compare two iterators for equality. Others, such as sort, require the ability to read, write, and randomly access elements. The iterator operations required by the algorithms are grouped into five iterator categories listed in Table 10.5. Each algorithm specifies what kind of iterator must be supplied for each of its iterator parameters.

Table 10.5. Iterator Categories

A second way is to classify the algorithms (as we did in the beginning of this chapter) is by whether they read, write, or reorder the elements in the sequence. Appendix A covers all the algorithms according to this classification.

The algorithms also share a set of parameter-passing conventions and a set of naming conventions, which we shall cover after looking at iterator categories.

10.5.1. The Five Iterator Categories

Like the containers, iterators define a common set of operations. Some operations are provided by all iterators; other operations are supported by only specific kinds of iterators. For example, ostream_iterators have only increment, dereference, and assignment. Iterators on vector, strings, and deques support these operations and the decrement, relational, and arithmetic operators.

Iterators are categorized by the operations they provide and the categories form a sort of hierarchy. With the exception of output iterators, an iterator of a higher category provides all the operations of the iterators of a lower categories.

The standard specifies the minimum category for each iterator parameter of the generic and numeric algorithms. For example, find—which implements a one-pass, read-only traversal over a sequence—minimally requires an input iterator. The replace function requires a pair of iterators that are at least forward iterators. Similarly, replace_copy requires forward iterators for its first two iterators. Its third iterator, which represents a destination, must be at least an output iterator, and so on. For each parameter, the iterator must be at least as powerful as the stipulated minimum. Passing an iterator of a lesser power is an error.

Warning

Many compilers will not complain when we pass the wrong category of iterator to an algorithm.

The Iterator Categories

Input iterators: can read elements in a sequence. An input iterator must provide

• Equality and inequality operators (==, !=) to compare two iterators

• Prefix and postfix increment (++) to advance the iterator

• Dereference operator (*) to read an element; dereference may appear only on the right-hand side of an assignment

• The arrow operator (->) as a synonym for (* it).member—that is, dereference the iterator and fetch a member from the underlying object

Input iterators may be used only sequentially. We are guaranteed that *it++ is valid, but incrementing an input iterator may invalidate all other iterators into the stream. As a result, there is no guarantee that we can save the state of an input iterator and examine an element through that saved iterator. Input iterators, therefore, may be used only for single-pass algorithms. The find and accumulate algorithms require input iterators; istream_iterators are input iterators.

Output iterators: can be thought of as having complementary functionality to input iterators; they write rather than read elements. Output iterators must provide

• Prefix and postfix increment (++) to advance the iterator

• Dereference (*), which may appear only as the left-hand side of an assignment (Assigning to a dereferenced output iterator writes to the underlying element.)

We may assign to a given value of an output iterator only once. Like input iterators, output iterators may be used only for single-pass algorithms. Iterators used as a destination are typically output iterators. For example, the third parameter to copy is an output iterator. The ostream_iterator type is an output iterator.

Forward iterators: can read and write a given sequence. They move in only one direction through the sequence. Forward iterators support all the operations of both input iterators and output iterators. Moreover, they can read or write the same element multiple times. Therefore, we can use the saved state of a forward iterator. Hence, algorithms that use forward iterators may make multiple passes through the sequence. The replace algorithm requires a forward iterator; iterators on forward_list are forward iterators.

Bidirectional iterators: can read and write a sequence forward or backward. In addition to supporting all the operations of a forward iterator, a bidirectional iterator also supports the prefix and postfix decrement (--) operators. The reverse algorithm requires bidirectional iterators, and aside from forward_list, the library containers supply iterators that meet the requirements for a bidirectional iterator.

Random-access iterators: provide constant-time access to any position in the sequence. These iterators support all the functionality of bidirectional iterators. In addition, random-access iterators support the operations from Table 3.7 (p. 111):

• The relational operators (<, <=, >, and >=) to compare the relative positions of two iterators.

• Addition and subtraction operators (+, +=, -, and -=) on an iterator and an integral value. The result is the iterator advanced (or retreated) the integral number of elements within the sequence.

• The subtraction operator (-) when applied to two iterators, which yields the distance between two iterators.

• The subscript operator (iter[n]) as a synonym for * (iter + n).

The sort algorithms require random-access iterators. Iterators for array, deque, string, and vector are random-access iterators, as are pointers when used to access elements of a built-in array.

Exercises Section 10.5.1

Exercise 10.38: List the five iterator categories and the operations that each supports.

Exercise 10.39: What kind of iterator does a list have? What about a vector?

Exercise 10.40: What kinds of iterators do you think copy requires? What about reverse or unique?

10.5.2. Algorithm Parameter Patterns

Superimposed on any other classification of the algorithms is a set of parameter conventions. Understanding these parameter conventions can aid in learning new algorithms—by knowing what the parameters mean, you can concentrate on understanding the operation the algorithm performs. Most of the algorithms have one of the following four forms:

alg(beg, end, other args); alg(beg, end, dest, other args); alg(beg, end, beg2, other args); alg(beg, end, beg2, end2, other args);

where alg is the name of the algorithm, and beg and end denote the input range on which the algorithm operates. Although nearly all algorithms take an input range, the presence of the other parameters depends on the work being performed. The common ones listed here—dest, beg2, and end2—are all iterators. When used, these iterators fill similar roles. In addition to these iterator parameters, some algorithms take additional, noniterator parameters that are algorithm specific.

Algorithms with a Single Destination Iterator

A dest parameter is an iterator that denotes a destination in which the algorithm can write its output. Algorithms assume that it is safe to write as many elements as needed.

Warning

Algorithms that write to an output iterator assume the destination is large enough to hold the output.

If dest is an iterator that refers directly to a container, then the algorithm writes its output to existing elements within the container. More commonly, dest is bound to an insert iterator (§ 10.4.1, p. 401) or an ostream_iterator (§ 10.4.2, p. 403). An insert iterator adds new elements to the container, thereby ensuring that there is enough space. An ostream_iterator writes to an output stream, again presenting no problem regardless of how many elements are written.

Algorithms with a Second Input Sequence

Algorithms that take either beg2 alone or beg2 and end2 use those iterators to denote a second input range. These algorithms typically use the elements from the second range in combination with the input range to perform a computation.

When an algorithm takes both beg2 and end2, these iterators denote a second range. Such algorithms take two completely specified ranges: the input range denoted by [beg, end), and a second input range denoted by [beg2, end2).

Algorithms that take only beg2 (and not end2) treat beg2 as the first element in a second input range. The end of this range is not specified. Instead, these algorithms assume that the range starting at beg2 is at least as large as the one denoted by beg, end.

Warning

Algorithms that take beg2 alone assume that the sequence beginning at beg2 is as large as the range denoted by beg and end.

10.5.3. Algorithm Naming Conventions

Separate from the parameter conventions, the algorithms also conform to a set of naming and overload conventions. These conventions deal with how we supply an operation to use in place of the default < or == operator and with whether the algorithm writes to its input sequence or to a separate destination.

Some Algorithms Use Overloading to Pass a Predicate

Algorithms that take a predicate to use in place of the < or == operator, and that do not take other arguments, typically are overloaded. One version of the function uses the element type’s operator to compare elements; the second takes an extra parameter that is a predicate to use in place of < or ==:

unique(beg, end); // uses the == operator to compare the elements unique(beg, end, comp); // uses comp to compare the elements

Both calls reorder the given sequence by removing adjacent duplicated elements. The first uses the element type’s == operator to check for duplicates; the second calls comp to decide whether two elements are equal. Because the two versions of the function differ as to the number of arguments, there is no possible ambiguity (§ 6.4, p. 233) as to which function is being called.

Algorithms with `_if` Versions

Algorithms that take an element value typically have a second named (not overloaded) version that takes a predicate (§ 10.3.1, p. 386) in place of the value. The algorithms that take a predicate have the suffix _if appended:

find(beg, end, val); // find the first instance of val in the input range find_if(beg, end, pred); // find the first instance for which pred is true

These algorithms both find the first instance of a specific element in the input range. The find algorithm looks for a specific value; the find_if algorithm looks for a value for which pred returns a nonzero value.

These algorithms provide a named version rather than an overloaded one because both versions of the algorithm take the same number of arguments. Overloading ambiguities would therefore be possible, albeit rare. To avoid any possible ambiguities, the library provides separate named versions for these algorithms.

Distinguishing Versions That Copy from Those That Do Not

By default, algorithms that rearrange elements write the rearranged elements back into the given input range. These algorithms provide a second version that writes to a specified output destination. As we’ve seen, algorithms that write to a destination append _copy to their names (§ 10.2.2, p. 383):

reverse(beg, end); // reverse the elements in the input range reverse_copy(beg, end, dest);// copy elements in reverse order into dest

Some algorithms provide both _copy and _if versions. These versions take a destination iterator and a predicate:

// removes the odd elements from v1 remove_if(v1.begin(), v1.end(), [](int i) { return i % 2; }); // copies only the even elements from v1 into v2; v1 is unchanged remove_copy_if(v1.begin(), v1.end(), back_inserter(v2), [](int i) { return i % 2; });

Both calls use a lambda (§ 10.3.2, p. 388) to determine whether an element is odd. In the first case, we remove the odd elements from the input sequence itself. In the second, we copy the non-odd (aka even) elements from the input range into v2.

Exercises Section 10.5.3

Exercise 10.41: Based only on the algorithm and argument names, describe the operation that the each of the following library algorithms performs:

replace(beg, end, old_val, new_val); replace_if(beg, end, pred, new_val); replace_copy(beg, end, dest, old_val, new_val); replace_copy_if(beg, end, dest, pred, new_val);