Everyone who reads the Usenet C++ newsgroups sees questions about how to make programs run faster. These questions are often about the relative merits of low-level constructs such as n*2 versus n<<1, or i++ versus ++i. The answers that appear to such questions often remark that computers are so fast these days that there is no reason to worry about performance.
These questions, and the answers they attract, reflect two important misconceptions:
- Although low-level details such as choosing among equivalent expressions can affect a program's execution time, programmers often ignore algorithmic choices that have a much greater impact on performance.
- The fact that computers keep getting faster increases, rather than decreases, the effect that these algorithmic choices have on performance.
The rest of this article will explain the facts behind these misconceptions.
Faster Computers are No Panacea
In 1970, Columbia University had an IBM System/360 Model 91 computer, which was the fastest general-purpose computer commercially available. This machine, which served the entire university's academic computing needs, had a 60-nanosecond cycle time, two megabytes of main memory, and about a gigabyte of disk.
Today, a typical desktop PC has about a 0.5-nanosecond cycle time, a quarter of a gigabyte of memory, and 100 gigabytes of disk. So an individual today has access to a machine that is roughly 100 times the capacity of a 1970-era supercomputer, regardless of whether you measure processor speed, memory, or disk. Why should anyone give a hoot about performance anymore?
One reason to care about performance is that even though today's computers are much faster than their predecessors, we use them to solve larger problems in less time than we once did. As an example, consider how much computation is involved in decompressing and displaying a video image from a DVD in real time.
Even programs without such stringent performance requirements as real-time video can run into scale problems. For example, some operating systems use a data structure for their directories that causes the amount of time it takes to create the (n+1)st file in a directory to be proportional to n. As a result, creating a directory with n files takes time proportional to n².
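To see how such behavior can arise, consider a toy directory along the following lines. This is purely illustrative, not how any real operating system stores directories, and the class and helper names are ours: it keeps its entries in an unsorted sequence, so creating a new file must scan every existing entry to check for a duplicate name, and creating n files therefore does work proportional to n².

// A toy directory (illustrative only): entries live in an unsorted
// vector, so each create() scans every existing entry to check for a
// duplicate name.
#include <iostream>
#include <string>
#include <vector>

class toy_directory {
public:
    bool create(const std::string& name) {
        for (const std::string& e : entries) {
            ++comparisons;              // one step for each existing file
            if (e == name)
                return false;           // name already exists
        }
        entries.push_back(name);
        return true;
    }
    long long comparisons = 0;
private:
    std::vector<std::string> entries;
};

int main()
{
    toy_directory d;
    const int n = 1000;
    for (int i = 0; i != n; ++i)
        d.create("image" + std::to_string(i));
    // roughly n*(n-1)/2 comparisons in total: quadratic in the number of files
    std::cout << d.comparisons << " comparisons to create " << n << " files\n";
    return 0;
}

Doubling the number of files roughly quadruples the total number of comparisons, which is exactly the kind of growth described above.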
Suppose that you are using a computer to store a thousand images from a digital camera, and that you take the straightforward approach of storing all of these images in a single directory. In that case, the data structure that your operating system uses to store directories could potentially make a thousand-fold difference in how quickly you can retrieve an individual image and a million-fold difference in how long it takes to copy your thousand images from one directory to another.
Suppose that your images are a megabyte each. Then a thousand images will occupy a gigabyte. That's the total capacity of our 1970 supercomputer, but only one percent of the capacity of a modern PC. In other words, we would not have dreamed of storing a thousand files in a single directory in 1970, but such large directories are routine today. As computers become bigger and faster, the negative effect of poorly chosen algorithms grows proportionally.
Another way to put it is that as computers become faster, low-level performance details become less important, and high-level strategy has a greater effect on performance. Accordingly, we should be thinking less about low-level, tactical performance issues, and more about high-level strategic and algorithmic performance issues.
A Tale of Three Algorithms
To make this discussion concrete, here is a small problem: Write a function that takes a string and returns a new string that contains the same characters as the original, but with all the spaces removed. So, for example, given "this is a test" as input, the function should return "thisisatest".
We imagine that many C++ programmers would write something along the following lines:
// Version 1
std::string deblank(std::string s)
{
    std::string::iterator it;
    while ((it = std::find(s.begin(), s.end(), ' ')) != s.end())
        s.erase(it);
    return s;
}
On the surface, this function looks like a reasonable solution to the problem. It repeatedly searches the subject string to find the first blank. If there is no blank to be found, the call to find will return s.end(), and the loop will terminate. Otherwise, the function continues by erasing the blank, and then repeating the entire process.
The trouble with this approach is easy to see once it has been pointed out: Finding a space involves searching all of the characters before the space, and erasing a space involves copying all of the characters after it. Accordingly, the function touches every nonspace character in the string each time it encounters a space. Therefore, the execution time of this function has the potential to grow proportionally to the square of the length of the input.
When we tested this function on a (somewhat slow) computer that happened to be available, we found that it handled a string of 100,000 nonspace characters too quickly to measure, but that a string of 100,000 spaces caused it to take 18 seconds!
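A minimal timing sketch along the following lines is enough to reproduce this kind of measurement. The helper name time_deblank is illustrative, and the exact numbers depend on the machine, the library, and which version of deblank it is linked against:

// A minimal timing sketch: build a string of n copies of one character,
// call deblank on it, and report the processor time consumed.
// Link this against any of the deblank versions shown in this article.
#include <ctime>
#include <iostream>
#include <string>

std::string deblank(std::string s);    // defined elsewhere (Versions 1-3)

double time_deblank(std::string::size_type n, char c)
{
    std::string input(n, c);
    std::clock_t start = std::clock();
    deblank(input);
    std::clock_t stop = std::clock();
    return double(stop - start) / CLOCKS_PER_SEC;
}

int main()
{
    const std::string::size_type n = 100000;
    std::cout << "all spaces: " << time_deblank(n, ' ') << " seconds\n";
    std::cout << "no spaces:  " << time_deblank(n, 'x') << " seconds\n";
    return 0;
}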
We can find a much better solution to this problem by ensuring that we touch each character only once:
// Version 2
std::string deblank(std::string s)
{
    std::string r;
    for (int i = 0; i != s.size(); ++i)
        if (s[i] != ' ')
            r += s[i];
    return r;
}
On the surface, this function may seem less efficient than Version 1, for several reasons:
- It creates a new string to hold the result, thus potentially requiring two copies of the string in memory at the same time.
- It uses += to append characters one at a time to the result.
- It compares characters directly in a loop rather than using find.
- It evaluates s.size() each time through the loop.
- It uses indices rather than iterators to fetch characters from s.
Nevertheless, we consider this function to be a great improvement over Version 1 for a very important reason: We can count on each individual operation on each character to take no more than a constant amount of time. Even appending s[i] to r is guaranteed to take no more than time proportional to n for n consecutive appends. Accordingly, the execution time of Version 2 is no worse than proportional to the length of its input.
When we run Version 2, we find that it handles a 100,000-character string too quickly for us to measure, regardless of how many spaces the string contains.
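Part of what keeps the repeated += operations in Version 2 cheap is that typical string implementations grow their storage geometrically, so reallocation, and the copying that goes with it, happens only rarely. A small experiment like the following makes that growth visible; the exact capacities it prints vary from one library to another:

// Append characters one at a time and report each change in capacity.
// On typical implementations the capacity grows geometrically, so only
// a handful of reallocations occur even for 100,000 appends.
#include <iostream>
#include <string>

int main()
{
    std::string r;
    std::string::size_type last = r.capacity();
    std::cout << "initial capacity: " << last << '\n';
    for (int i = 0; i != 100000; ++i) {
        r += 'x';
        if (r.capacity() != last) {     // a reallocation just happened
            last = r.capacity();
            std::cout << "size " << r.size() << " -> capacity " << last << '\n';
        }
    }
    return 0;
}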
We mentioned several objections that one might raise to Version 2 of our program. Let's rewrite it to remove some of those objections:
// Version 3
std::string deblank(std::string s)
{
    std::string::iterator in = s.begin(), out = in, end = s.end();
    while (in != end) {
        char c = *in;
        if (c != ' ') {
            *out = c;
            ++out;
        }
        ++in;
    }
    s.erase(out, end);
    return s;
}
This revised version overwrites the characters of s one at a time, using two iterators, in and out, to keep track of where it is in the string. When it is finished, it erases the characters from the end of the string, thereby truncating the string to the length of the output.
Version 3 is definitely larger and harder to understand than Version 2. Does Version 3 run faster than Version 2? Certainly. Most of the difference appears to come from the multiple += operations in Version 2 that Version 3 avoids; Version 3 also never needs to keep both the input and output strings in memory at the same time, which matters most when the input is so large that keeping both would exhaust memory. As a result, Version 3 is only about twice as fast as Version 2 when the input is mostly spaces; the difference is much greater when there are relatively few spaces in the input. In general, however, the difference between Versions 2 and 3 is much less than the difference between Versions 1 and 2.
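Incidentally, the copy-forward-and-truncate strategy that Version 3 spells out by hand is essentially what the standard erase-remove idiom expresses. A one-statement version, given a distinct name here only to avoid confusion with the versions above, looks like this:

// Essentially Version 3 expressed with the standard library: std::remove
// shifts the nonspace characters toward the front and returns an iterator
// to the new logical end; erase then discards the leftover characters.
#include <algorithm>
#include <string>

std::string deblank_remove(std::string s)
{
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
    return s;
}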
Summary
Computers keep getting bigger and faster. A gigabyte of disk space served an entire university in 1970; today, it's barely enough to store a vacation's worth of snapshots. As a result, we expect computers today to do things that would have been out of their predecessors' range.
When we write programs, we make many decisions that affect those programs' performance. Some of those decisions are local, low-level optimizations that can change a program's speed by no more than a constant factor. For example, if you do a computation twice instead of doing it once, that part of the program will run twice as slowly, a change that will probably have an even smaller effect on the program as a whole.
Other decisions will affect a program's performance by a factor that might be proportional to the size of the input, or might even grow more quickly than the input. Whenever a decision slows a program down by a factor that is itself proportional to the size of the input, the overall execution time grows proportionally to the square of the input size. In other words, if making the input 10 times as large slows the program down by an additional factor of 10, the total execution time will increase by a factor of 100. Accordingly, a single bad algorithm can have such a profound effect on a program's execution time as to render irrelevant any other decisions about that program's performance. As computers become faster, and people use them to solve larger problems, these algorithmic decisions become relatively more important.
Andrew Koenig retired from AT&T's research division in June, 2003. A programmer for more than 35 years, 17 of them in C++, he has published more than 150 articles about C++ and speaks on the topic worldwide. He is the author of C Traps and Pitfalls and coauthor of Ruminations on C++ and Accelerated C++.
Barbara E. Moo is an independent consultant with 20 years' experience in the software field. During her nearly 15 years at AT&T, she worked on one of the first commercial projects ever written in C++, managed the company's first C++ compiler project, and directed the development of AT&T's award-winning WorldNet Internet service business. She is coauthor of Ruminations on C++ and Accelerated C++ and lectures worldwide.