Andrew Koenig is a former AT&T researcher and programmer. He is the author of C Traps and Pitfalls and, with Barbara Moo, coauthor of Ruminations on C++ and Accelerated C++.
Barbara Moo is an independent consultant with 20 years of experience in the software field.
We have long been interested in how programmers' tools affect the way they approach problems. We recently found a particularly clear example that shows how three different toolsets lead to three different solutions to the same problem.
The first solution is nearly 35 years old, and is a classically structured solution that is easy to express in any imperative programming language such as C++ or C. The second uses the ideas of functional programming, particularly the notion of lazy evaluation, to solve the problem in a radically different way. Several times we have seen functional-programming advocates use this solution to argue for their paradigm.
In the third solution, we return to traditional imperative programming, but we add the ordered associative containers from the C++ Standard Library. This addition turns out to simplify the solution dramatically.
The problem comes from Edsger Dijkstra's book A Discipline of Programming (Prentice-Hall, 1976), in which he attributes the problem to R.W. Hamming: Compute the elements of the series of integers with only 2, 3, and 5 as prime factors. In other words:
- 1 is a member of the series.
- If n is a member of the series, so are 2n, 3n, and 5n.
- The series consists only of members determined by rules 1 and 2.
Dijkstra points out that this series is easy to compute if speed is no object, and then goes on to give a clever algorithm for computing it quickly. Like most clever algorithms, it requires some explanation to be understood.
Since A Discipline of Programming was published, functional-programming advocates have seized on this example to show how much easier some problems are to solve in a functional style than in Dijkstra's imperative style. They typically show a beautiful, elegant solution that depends on the notion of lazy evaluation, a notion that is an essential part of pure functional-programming languages but is not widely present in imperative languages.
Recently, we had occasion to translate Dijkstra's algorithm into C++. In doing so, we realized that he made an assumption about the language features he was willing to use that does not apply to modern C++. By weakening this assumption and taking advantage of one of the data structures from the C++ Standard Library, we were able to come up with a new solution that we believe is much easier to understand than either the functional solution or Dijkstra's original algorithm.
A Naive Solution
Before we think about an efficient way to solve our problem, let's try to solve it as straightforwardly as possible. We know that elements of our sequence are going to depend on the values of earlier elements, which suggests that we should store the entire sequence as we generate it. Moreover, we know that 1 is the first element in the sequence. These two facts suggest that we begin by defining a sequence that contains 1 as its only element. In C++, we should usually use vectors unless we have a good reason to prefer another container. Because we are adopting a naive approach, we shall not even think about other containers.
This discussion leads us to write the following code as the first part of our solution:
// Initialize the sequence
vector<int> seq;
seq.push_back(1);
The remaining part of the problem is to find the next element in the sequence, given whatever elements we have generated already. We would like to generate elements in order, so each element that we generate should be strictly larger than any elements we have already. Under that constraint, the next element in the sequence is the smallest integer obtained by multiplying each of the existing elements by 2, 3, and 5. In other words, we have to multiply every element in the sequence by 2, 3, and 5, and choose the smallest of these products that is greater than every existing element in the sequence.
Because we are going to have to do three similar computations, we should define a function that we can call three times. This function will multiply each element in the sequence by a given factor and return the smallest product that is greater than any existing element.
Because we will be generating elements in order, we can take advantage of this order by searching the elements from the beginning toward the end. That way, the smallest product will also be the first:
int firstprod(const vector<int>& v, int n)
{
    assert(!v.empty());
    for (vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        int val = *it * n;
        if (val > v.back())
            return val;
    }
    assert(false);
}
We begin by verifying that our vector argument is not empty because, otherwise, v.back() would be undefined. Then we examine each vector element, multiplying it by n and returning the first product that is greater than v.back(), which is presumably the largest element of v. We should always find an appropriate value; if we don't, the program is broken so we halt it.
Once we have this function, it is easy to append an appropriate new element to the vector:
// Extend the sequence
seq.push_back(min(firstprod(seq, 2),
             min(firstprod(seq, 3), firstprod(seq, 5))));
We call firstprod three times, call min twice to find the smallest of the three values we obtain, and append that value to seq.
Dijkstra's Algorithm
Dijkstra did not go so far as to present a solution along these lines because it is unacceptably slow. Each call to firstprod searches the sequence from the beginning, so we can expect that it will take O(n) time to search an n-element sequence. Therefore, generating an n-element sequence, which requires n searches, will take O(n²) time.
It is a good idea to avoid quadratic behavior if there is any way to do so. Accordingly, Dijkstra suggested the faster (and more complicated) approach of beginning each search for an appropriate product at the point where the last search ended.
The most straightforward way to remember where each search ended is to store the locations in three variables, one each for 2, 3, and 5. In C++, it is tempting to use iterators for these locations, but extending the sequence will invalidate those iterators. Accordingly, we shall use indices instead.
How do we obtain those indices? Each call to firstprod returns a product, from which it is not easy to obtain an index. Accordingly, we shall modify our search function to return an index instead. It will be easy enough to compute the appropriate product in the calling code:
vector<int>::size_type
firstindex(const vector<int>& v, int n, vector<int>::size_type start)
{
    assert(v.size() > start);
    for (vector<int>::size_type i = start; i != v.size(); ++i)
        if (v[i] * n > v.back())
            return i;
    assert(false);
}
This function is similar to firstprod. However, it has acquired a third argument, which is the position in the vector at which to begin the search. Moreover, its return type now indicates that it is returning a vector index. As a general rule, whenever we change the interface or behavior of a function, we should change its name to avoid confusion.
We begin by verifying that the vector has at least one element at or beyond the starting point; otherwise the search would fail, and we know that such a failure should never happen. The inner loop is similar to the one in firstprod, but once we find an appropriate value, we return the index of the vector element rather than the product.
The code that uses this function is now slightly more complicated. As before, we initialize a vector to contain 1 as its only element. However, we now also define three variables that will remember the current states of our three searches:
// Initialize the sequence
vector<int> seq;
seq.push_back(1);
vector<int>::size_type ind2 = 0, ind3 = 0, ind5 = 0;
The variable ind2 remembers where we are in our search for a vector element to multiply by 2, and analogously for ind3 and ind5.
To extend the sequence, we now do our three searches, changing ind2, ind3, and ind5 appropriately, multiply each element found by its corresponding factor and append the smallest product to the vector:
// Extend the sequence
ind2 = firstindex(seq, 2, ind2);
ind3 = firstindex(seq, 3, ind3);
ind5 = firstindex(seq, 5, ind5);
seq.push_back(min(2 * seq[ind2],
             min(3 * seq[ind3], 5 * seq[ind5])));
Because this revised program visits each element of the sequence at most twice, it should be able to create an n-element sequence in O(n) time, which is a dramatic improvement on the previous version.
The Functional Solution
On several occasions, we have seen functional-programming advocates use this example to show how much simpler functional programming is than imperative programming. They argue that if you have lazy evaluation, it is easy to solve this problem by saying simply that the sequence we are trying to generate is the result of merging three sequences, each of which is the product of the appropriate integer and the sequence that we are generating.
This claim is much easier to illustrate than it is to describe. In this example, we use Haskell, which is a popular functional language that supports lazy evaluation. In Haskell, the notation x:y refers to a sequence that has x as its first element and y as its remaining elements. The empty sequence is []. The = symbol is not an assignment, in the sense that it doesn't change the value of anything. Instead, it defines whatever is on the left side as a synonym for the expression on the right side. For example, the statement:
x = 1:x
defines x to be an infinite sequence, each element of which is 1. We can understand this definition by seeing that the first element of x is 1, and that the remaining elements of x are the elements of x. So the second element is the same as the first, the third is the same as the second, and so on. This kind of infinite recursion is no great problem in Haskell, as the language is implemented so as to evaluate only as many elements of x as are needed.
With this notation in mind, we need to write code to solve three subproblems:
- Multiply a sequence by a constant.
- Merge two sorted sequences.
- Use these two functions to define the main sequence.
The first of these subproblems is easy:
scale n (x:xs) = (n * x) : (scale n xs)
Function arguments are separated by spaces in Haskell, so we can think of scale as a function with two parameters, n and (x:xs). The second parameter's form says that we want to split it into two pieces: the first element (x) and the remaining elements (xs). Here, we can think of the s in xs as a plural, indicating that xs might represent many elements.
This definition, then, says that the first element of the result of scale is obtained by multiplying n by the first element of the second argument, and the remaining elements are obtained by applying scale recursively to the remaining elements.
The program to merge two sequences is somewhat longer, but not much more complicated:
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
    if x == y then x : (merge xs ys)
    else if x < y then x : (merge xs (y:ys))
    else y : (merge (x:xs) ys)
The first two lines note that if one argument is empty, the result is the other argument. Otherwise, we must compare the first element of each sequence so as to avoid duplicates. If the elements are equal, we pick one of them (x in this case) to be the first element of the result; the rest of the result is obtained by merging the rest of the two sequences. Otherwise, the first element of the result is the smaller of the two comparands, followed by the result of merging the tail of the corresponding sequence with the entire other sequence.
The program to compute the overall sequence looks at first like magic:
seq = 1 : (merge (scale 2 seq) (merge (scale 3 seq) (scale 5 seq)))
However, it is no more magical than our earlier example that defined x as 1:x. We are merely saying that the first element of seq is 1, and that subsequent elements are to be obtained by multiplying the elements of seq by 2, 3, and 5, respectively, and merging the resulting sequences.
An Idiomatic C++ Solution
When we read Dijkstra's description of his solution to the problem, we were struck by his assumption that the sequence would be stored in a vector-like data structure. In particular, he assumed that we would want to generate new elements at the end of the sequence because, otherwise, inserting newly generated elements into the sequence would require "an amount of reshuffling, which we would like to avoid."
Suppose, however, that we do not feel the need to avoid that reshuffling? In particular, suppose we have a data structure that will automatically maintain values in order for us and remove duplicates? Then we can examine its elements in turn, multiply each element by 2, 3, and 5, and put the multiples into our data structure.
Come to think of it, we have just such a data structure; it's what the C++ Standard Library calls a set. Our strategy, then, will be to maintain an iterator that refers to each element of the set in turn, and insert the appropriate multiples of each element into the set as we encounter it.
Initializing the set is similar to initializing the vector that we were using before. The iterator is similarly straightforward:
// Initialization
set<int> seq;
seq.insert(1);
set<int>::const_iterator it = seq.begin();
The element to which it refers will always be the next element in the sequence, because whatever multiples of that element we insert into the sequence will inevitably follow it. Therefore, fetching that element and extending the sequence is simplicity itself:
// Extend the sequence
int val = *it;
seq.insert(val * 2);
seq.insert(val * 3);
seq.insert(val * 5);
++it;
This code conceals two subtleties. First, it relies on the fact that inserting elements into a set does not invalidate iterators that refer to existing elements. Second, it does not increment it until after inserting the new elements, in case one of those elements is the one to which it will refer next.
Aside from these subtleties, it is hard to imagine a program that follows more accurately our original statement of the problem.
Conclusion
We have looked at four solutions to a simple problem. The first was straightforward but slow. The second was much faster but fairly tricky. The third, in a functional language, was again straightforward, but requires a totally different way of thinking about programming. Indeed, advocates of this way of thinking use the program's straightforwardness to argue that this way of thinking is superior.
With the fourth solution, we believe that the argument is far from obvious. It is true that this solution is somewhat slower than Dijkstra's solution: Instead of O(n) it is O(n log n) because operations on an n-element set are usually O(log n). However, in practice, O(n log n) is usually acceptable; in particular, O(n log n) is much closer to O(n) than it is to O(n²).
Moreover, the solution is much simpler. The problem says that 1 is part of the sequence, so we insert 1 in the sequence. It says that if val is in the sequence, then so are val * 2, val * 3, and val * 5, so we insert those multiples into the sequence. Everything else happens automatically.
In retrospect, this solution is obvious. What we find most interesting about it is that even though we have known about this problem for many years, we did not consider using an ordered-set data structure to solve it until we realized that we already had one available.
Such realizations are an important, positive effect of having a variety of useful data structures and algorithms available, and of learning how to use them even if there is not an obvious, immediate need for doing so. It is not essential to master every detail of every tool that you might ever need to use. What is essential is knowing that the tools exist, being open to the possibility of using them, and being willing to learn how to use them when they are likely to be useful.