Andrew, a former AT&T researcher, has been a programmer for more than 35 years. He is the author of C Traps and Pitfalls and, with Barbara, coauthor of Ruminations on C++ and Accelerated C++. Barbara is an independent consultant with 20 years of experience in the software field.
Conventional wisdom is that if we are trying to decide what kind of Standard Library container to use and we have no reason to choose one kind over another, we should use vector. The rationale is that even though appending successive elements to a vector may cause it to reallocate its memory, copying all its existing elements, the overhead associated with this copying is less than the overhead implicit in other kinds of containers.
One well-known reason to avoid the vector type is if we are inserting elements at the beginning, or indeed anywhere except close to the end, of the container. Inserting a new element at the beginning of a vector requires copying all of the other elements, whereas inserting an element at the beginning of a list or deque does not copy any other elements. Therefore, the conventional wisdom is to avoid the vector class if you intend to insert elements anywhere but at (or near) the end.
We have been claiming for some time that measuring the performance of actual programs often yields surprising results. Accordingly, we have taken some simple measurements of the vector, list, and deque classes. In this article, we discuss measurements on only a single C++ implementation, and we use only int elements in our containers. Nevertheless, there are enough points of interest in these measurements to make them worthwhile, and to suggest further experiments.
The Experiment
Most programs that deal with large containers build those containers one element at a time. Accordingly, we want to measure how long it takes to create a container of a given size. The following code fragment is a straightforward way of doing so:
for (int i = 0; i != count; ++i) {
    std::container-type<int> c;   // container-type stands for vector, list, or deque
    for (int j = 0; j != size; ++j)
        c.insert(c.end(), j);
}
Each iteration of the outer loop creates a container of a given type; the inner loop then appends size consecutive integers to the container. The outer loop runs count times; we shall adjust count so that the overall execution time is reasonable.
Before going any further, we shall pause to say what results we expect. First, it should be obvious that filling a large container will take longer than filling a small container of the same type. For that reason, we shall discuss the execution time per element, rather than per container. In other words, when we run this program fragment, we shall measure its overall execution time, then divide that number by count * size to get an idea of how fast a particular container is.
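A minimal harness for such a measurement might look like the following sketch, which uses the standard clock function from <ctime>. It is only an illustration, not our actual test program; the constants and the output format are ours:

#include <ctime>
#include <iostream>
#include <vector>

int main()
{
    const int count = 1000;   // number of containers to build
    const int size = 1000;    // elements per container

    std::clock_t start = std::clock();
    for (int i = 0; i != count; ++i) {
        std::vector<int> c;
        for (int j = 0; j != size; ++j)
            c.insert(c.end(), j);
    }
    std::clock_t finish = std::clock();

    // Report the time per element, in microseconds
    double total = double(finish - start) / CLOCKS_PER_SEC;
    std::cout << total * 1e6 / (double(count) * size)
              << " microseconds per element\n";
    return 0;
}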
If we are appending elements to a container, the C++ Standard says that the execution time must be linear in the size of the container (amortized time in the case of vector), so we should expect that the time per element will be about the same for each container type, regardless of the container size. This expectation might be tempered somewhat for small containers, where the overhead of creating the container itself might be greater than the time required to fill it.
If we are inserting elements at the beginning of a vector, of course, we should expect very different results. Inserting at the beginning of a vector requires copying all of the vector's elements, which suggests that the time per element should grow linearly with the size of the container. We would expect this growth to occur only in the case of vector, and not with list or deque.
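Testing this expectation requires only the obvious change to the inner loop of our fragment:

for (int j = 0; j != size; ++j)
    c.insert(c.begin(), j);   // insert at the beginning rather than the end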
Let's see how closely practice agrees with theory. As ever, we note that the particular practice we report applies to one implementation on one computer, so you should not expect the same results on your own implementation. Nevertheless, we think that our implementation is typical, and we would not be surprised to find that other implementations behave similarly, if not identically.
Vectors
Our first trial was to repeatedly create a vector of a given size by appending elements to the end. To estimate how much of the overhead comes from reallocation, we also measured how long it took if we preallocated the vector's memory. In other words, our test programs looked like this:
for (int i = 0; i != count; ++i) {
    std::vector<int> v;
    v.reserve(size);   // this line is omitted in the first version
    for (int j = 0; j != size; ++j)
        v.insert(v.end(), j);
}
This program executes a loop count times. Each time through the loop, it constructs an empty vector named v, uses the inner loop to append size elements to it, and destroys it again. In the first version, the call to reserve is absent, so the vector must grow as needed to accommodate its elements. The second version adds a call to reserve to preallocate enough memory to hold the vector's elements. Table 1 presents our results.
As we expected, time per element is greatest with one-element vectors, and declines by a factor of 100 as the vector's size grows. However, looking at the effect of calling reserve yields our first surprise: reserve makes our program nearly five times as fast for vectors in the 10-100 element range, but doesn't make much difference with either very large or very small vectors. We can understand why there wouldn't be much difference for small vectors (after all, we already predicted that the overhead for creating the vector itself would dominate), but why isn't there a difference for million-element vectors?
We don't know for sure, but we are guessing that the real difference that using reserve makes is not how many elements are copied, but rather how often memory is allocated and deallocated. We are guessing that the cost of copying elements is small compared to the cost of generating values and storing them, and the cost of allocating memory is substantially greater. That is, if the cost of allocating memory is commensurate with that of copying, say, 100 elements, we would expect that cost to affect the cost of using reserve for medium-sized vectors, but less so for very small or very large ones. For the small vectors, the cost of allocating the vector itself would dominate; for the large ones, the cost of generating values for the elements would do so. If copying were a major factor in the overall cost, we would expect a substantial time difference between using and not using reserve for large vectors, but apparently this is not the case. Indeed, for a vector with 100,000 elements, using reserve actually makes the program slower, though this behavior may just be a measurement anomaly.
We do not know if our guesses are correct, but we shall note them as a good subject for a future experiment.
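One way to run that experiment would be to count allocations directly. The sketch below assumes a compiler with C++11 allocator_traits support; the counting_allocator class is our own invention for illustration, not part of any library:

#include <cstddef>
#include <iostream>
#include <vector>

// counting_allocator counts every call to allocate so that we can see
// how often a container asks the system for memory. C++11
// allocator_traits fills in the parts of the interface we leave out.
template <class T>
struct counting_allocator {
    typedef T value_type;
    static std::size_t allocations;

    counting_allocator() { }
    template <class U>
    counting_allocator(const counting_allocator<U>&) { }

    T* allocate(std::size_t n) {
        ++allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <class T> std::size_t counting_allocator<T>::allocations = 0;

template <class T, class U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

int main()
{
    const int size = 100000;

    std::vector<int, counting_allocator<int> > v;
    for (int j = 0; j != size; ++j)
        v.insert(v.end(), j);

    std::cout << counting_allocator<int>::allocations
              << " allocations for " << size << " elements\n";
    return 0;
}

Running this program with and without a call to v.reserve(size) would show directly how many allocations reserve saves.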
Our next experiment was to measure insertion at the beginning of the vector to see whether doing so was as slow as we had heard. Table 2 shows the results. Until the vector grows to 100 elements, there isn't much difference between inserting at the beginning and inserting at the end without using reserve. (We found that whether we used reserve made no measurable difference when we were inserting at the beginning.) For a 100-element vector, inserting at the beginning is only about 30 percent slower. For a 1000-element vector, on the other hand, inserting at the beginning costs about seven times as much. After that, the time per element seems to grow in proportion to the size of the vector, except for the last measurement: There, the time per element increases by more than twice as much as the number of elements does. So we have two surprises: Inserting elements at the beginning of a vector costs less than we would expect for small vectors, and more than we would expect for very large ones.
Again, we do not know the reason for these surprises. If we had to guess, we would say that it must have to do with our computer's hardware cache. Many modern desktop computers have a multilevel cache: A small cache lives on the chip as part of the processor, and a larger one is part of the memory controller. Accordingly, until the vector becomes larger than the small cache, copying elements around is faster than one might expect; once the vector becomes larger than the large cache, copying elements requires actual memory references and can no longer be done in the cache. We shall note this hypothesis as a subject for future experiments.
Lists and Deques
As the last part of our experiment, we would like to see how much it costs to build up a list or a deque one element at a time. Changing our program fragment in the obvious ways, we get the numbers in Table 3. Here, we get yet another surprise: Appending an element to a list costs as much as creating an entire list from scratch. We surmise that the time is dominated by how long it takes to allocate memory for a list element.
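For concreteness, here is the list version of the fragment we measured; the deque version merely substitutes std::deque<int>:

for (int i = 0; i != count; ++i) {
    std::list<int> c;
    for (int j = 0; j != size; ++j)
        c.insert(c.end(), j);
}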
Out of curiosity, we also inserted the elements at the beginning of the list; we got identical timings throughout, with one notable exception: The million-element list took 8.5 time units per element instead of 4.3. Evidently, there is some kind of cache phenomenon at work here, too.
The deque measurements show that creating a deque is nearly four times as slow as creating a list or vector. However, appending elements to a deque is usually intermediate in speed between appending to a vector with and without using reserve.
Discussion
Simple measurements, simple operations, simple data structures; even so, the results surprised us:
- Using reserve makes our program much faster with medium-sized vectors, but not with very small or very large ones.
- Inserting elements at the beginning of a small vector costs less than we expect; inserting at the beginning of a very large vector costs more. In other words, the time to build a vector by inserting at the beginning appears to grow more than quadratically for large vectors.
- Appending an element to a list costs as much as building an entire list from scratch, and nearly 40 times as much as appending an element to a deque.
- Building a very large list by inserting elements at the beginning takes longer than by appending them to the end.
We do not know the reasons for these surprises, but we suspect that they have something to do with our computer's cache. These results suggest future lines of experimentation:
- Can we measure the cache effects more confidently and accurately?
- What if instead of putting int objects in our containers, we use a class that is more expensive to copy? (See the sketch after this list.)
- Why does it take so long to append an element to a list?
- Do other implementations yield similar results?
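As a starting point for the second of these experiments, a class along the following lines would do. The name Blob and its contents are our own invention, purely for illustration:

#include <string>
#include <vector>

// A hypothetical element type that is more expensive to copy than an int:
// copying a Blob copies a 100-character dynamically allocated string.
struct Blob {
    std::string payload;
    Blob(): payload(100, 'x') { }
};

int main()
{
    const int count = 100;
    const int size = 10000;
    for (int i = 0; i != count; ++i) {
        std::vector<Blob> v;
        for (int j = 0; j != size; ++j)
            v.insert(v.end(), Blob());
    }
    return 0;
}

If copying cost is what dominates, the gap between using and not using reserve should widen for such a class; if allocation cost dominates, the gap should stay roughly where it is.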
Meanwhile, we can draw two general conclusions.
First, these experiments are consistent with the common advice to use vectors unless there is a reason to choose a different kind of container. One such reason is a desire to insert elements at the beginning of the container, in which case a deque would make more sense. It appears to be wise to reserve lists for when you really want to insert and remove elements in the middlenot only because lists appear to be slow, but because they do not offer random access.
The second conclusion is, and remains, the importance of measuring performance for yourself if you care about it. There is no reason to believe that your C++ implementation will give the same results as ours, and we would expect your surprises to differ from ours as well.