C++ Made Easier: Which Container Should I Use?
Andrew Koenig and Barbara E. Moo
It seems that deques are often mispronounced and seldom used. Here is wisdom on when to use them, as well as when to use the other two standard sequence containers.
Introduction
Among the C++ Standard librarys great conveniences are container classes that handle collections of data remarkably well in ordinary contexts much better than built-in arrays handle such collections.
For example, a program that works with a bunch of data usually starts out with an empty collection. As the program runs, it either reads or computes pieces of data that it adds to the collection. At some point, it starts referring to the data that it already has, sometimes intermixed with adding additional data to the pile.
Built-in arrays are ill suited to such programs, because they require the programmer to know in advance how many elements the array will contain. Dynamic arrays, allocated by new and deallocated by delete, are little better, because it is hard to change the size of an array once it has been allocated.
On the other hand, the three sequential standard-library containers (vector, list, and deque) all offer an operation named push_back, which efficiently appends an element to the end of a container, making the container grow as necessary to accommodate the new element. Accordingly, programmers do not have to worry in advance about how large the container will become.
This article comes about because the library has three kinds of sequential containers, not just one. Each of these containers has advantages and disadvantages compared to the others. Accordingly, programmers who wish to use the library effectively need to know enough about each kind of container to have an idea about which one to choose.
Vectors
We can think of a vector as being similar to a built-in array, with two important differences:
- A vector is a self-contained object. In addition to its elements, it stores its size. Accordingly, if we pass a vector as an argument to a function, that function can find out how many elements the vector contains without needing an auxiliary argument for the purpose.
- For expansion purposes, a vector will often have extra memory allocated that does not presently contain any elements. Every vector corresponds to a (possibly empty) contiguous region of storage that contains a number of elements. The number of elements that the vector presently contains is called its size; the number of elements that can fit in the allocated storage is called its capacity.
You can set the capacity of a vector explicitly by calling the reserve member. If v is a vector and you call v.reserve(n), then v will have a capacity at least as large as n.
You might think of a vector as looking like Figure 1. From Figure 1, it should be clear how push_back works: if the available storage has any room, push_back allocates enough of that room to hold the additional element and puts it there; otherwise, push_back reallocates the vectors memory. Weve already described how this reallocation works in an earlier article [1], so we wont repeat the description here. It suffices to realize three things:
- Most of the time, push_back is very fast.
- Although push_back is sometimes slow, reallocation happens infrequently enough that on average, it contributes only a small amount of overhead to the entire program. In particular, a program that uses push_back repeatedly to build up a vector one element at a time will never run more than a constant factor more slowly than a similar program that knows in advance how much memory to allocate.
- In addition to the constant time overhead, there is a space overhead associated with push_backs allocating extra memory. Like the time overhead, this space overhead is never worse than a constant multiple of the amount of memory that the vector needs.
In exchange for the moderate overhead associated with push_back, the vector class offers very fast indexing. It is reasonable to expect that v[i] will run as quickly if v is a vector as it would if v were a built-in array.
Like all sequential containers, vector supports the insert and erase operations. However, the requirement that the vectors memory be contiguous means that whenever you use insert or erase, the vector must copy all of the elements that come after the insertion or deletion. Moreover, if the operation is insert and there isnt enough memory, it might be necessary to copy the entire vector. In other words, insert and erase will be slow unless theyre near the end of the vector (and, in the case of insert, unless there is enough room to store another element). Accordingly, vector doesnt support push_front; it supports only push_back. Why support a convenient way of doing something that will almost certainly be inconveniently slow?
In short, a vector supports reasonably fast insertion and deletion at or near the end of the container and very fast random-access indexing. It doesnt support push_front at all. If you know how large a vector is going to become, you can use the reserve member to preallocate an appropriate amount of memory for better performance.
Lists
You can think of a list as being a classical doubly-linked list (see Figure 2). As you would expect from such a data structure, insertions and deletions are quick and easy at any point in the structure. Therefore, list supports both push_back and push_front, and executing insert and erase takes time proportional to the number of elements inserted or erased.
On the other hand, the only way to get at an element of a list is through an iterator. The elements might be scattered throughout memory, so the only way to locate the nth element is to start with the first (or last) element and count. Just as vector doesnt support push_front, list doesnt support []. The rationale is the same: why provide a convenient way of doing something that is known to be slow?
For the same reason, list iterators are bidirectional iterators, in contrast to the random-access iterators that vector provides. Although vector and list are both called sequential containers, list is more rigorously sequential than vector.
A list never needs to allocate the excess memory that a vector uses. Therefore, it avoids that overhead. On the other hand, it does have some space overhead per element in keeping track of the location of the elements neighbors. This overhead is significant if the elements themselves are small, and insignificant if the elements are large.
Deques
A deque, which stands for double-ended queue, is a compromise between list and vector. You can think of it as being a collection of chunks organized into a sequence, with the first and last chunk possibly having room at their outer ends (see Figure 3).
Because the first chunk has room at the beginning and the last chunk has room at the end, deque can support both push_front and push_back. At worst, these operations require the deque to allocate a new chunk a reasonably fast operation as long as the whole chunk doesnt have to be initialized. Insertions and deletions in a deque will be fast if they are at either end. Moreover, the amount of space overhead for a deque will generally not be more than two chunks worth of memory plus a little extra for the auxiliary data structures. Why, then, would one ever want to use anything other than a deque?
One reason is that insertions or deletions in the middle can be slow; in general, inserting or erasing in a deque requires copying all the elements between the insertion or deletion point and one end of the deque. The implementation will choose the nearest end, of course, but if youre in the middle of the deque, there is no way around having to copy half of the elements.
Another reason is that as soon as a deque acquires a single element, it will have to allocate enough memory for its first chunk. Therefore, a program that uses a large number of small containers would be better off with a list or vector than it would be with a deque.
Finally, although deque offers indexing, there is more work involved in indexing a deque than in indexing a vector. Accordingly, programs that do a lot of indexing would be better off using a vector than a deque.
Iterator Invalidation
There is one more important difference between these three kinds of containers, and that is the circumstances under which iterators that refer to container elements become invalid. In general, standard-library iterators that refer to container elements remain valid as long as those elements stay in the same place. Various container operations can cause container elements to move from one part of memory to another; such movement may invalidate any iterators that refer to those elements.
Suppose, for example, that you have a program that uses iterators to keep track of particular elements in a container as the container grows. If your container is a vector, your iterators will become invalid each time the container is reallocated. In other words, if you call push_back and the vectors size is equal to its capacity, you will lose all of the iterators that you had so carefully saved.
In contrast, a list never invalidates iterators as long as the elements continue to exist to which those iterators refer. Of course, erasing a list element, or destroying the entire list, will invalidate iterators but aside from those cases, the iterators will remain valid.
As with a vector, inserting or erasing elements in a deque is permitted to invalidate all iterators that refer to elements of the deque.
Discussion
Table 1 shows the behavior of the sequential containers. This table should make clear several circumstances that might preclude or even force a choice of a particular container:
- If you havent a clue about what kind of container to use, a vector is a good first choice.
- If you want indexing, you cant use a list.
- If you want push_front, you cant use a vector.
- If youre going to create a lot of little containers, you shouldnt use deques.
- If youre going to be doing a lot of indexing, you probably want a vector.
- If your container elements are particularly expensive to copy, you may wish to avoid using a vector.
- If you want to insert or erase elements efficiently in the middle of a container, you must use a list.
- If you expect iterators that refer to container elements to remain valid across insertions and erasures, you must use a list.
What if one of these circumstances doesnt make it obvious which container to use? Then there are two possibilities.
The first possibility is that any of the containers will work reasonably well for your application. In that case, we recommend that you use vector, and that you use iterators instead of indexing unless you find that you need indexing. That way, if you decide to use a different kind of container, it will be easy for you to switch.
The second possibility is that there is a reason that we havent covered that makes one kind of container better than the others. The most likely reason is the low-level performance of the containers on your particular implementation with your particular application. In that case, there is no substitute for doing your own measurements.
Summary
There is no single data structure that is optimum for all applications. Accordingly, the C++ Standard library offers three kinds of sequential containers, each of which does one thing better than the others:
- The vector class offers particularly fast random-access indexing and reasonable performance for inserting and erasing elements at one end.
- The list class offers fast insertion and erasure at any point in the container, but no random access at all.
- The deque class offers reasonably fast random-access indexing and good performance for insertion or erasure at either end.
Each of these classes corresponds in a straightforward way to a classical data structure. Although implementations are not required to use these particular data structures, they are required to meet performance guarantees that those data structures would achieve. Thus these data structures are still a useful guide when you are trying to decide which kind of container to use.
Reference
[1] Andrew Koenig and Barbara E. Moo. C++ Made Easier: How Vectors Grow, C/C++ Users Journal, April 2001.
Andrew Koenig is a member of the Large-Scale Programming Research Department at AT&Ts Shannon Laboratory, and the Project Editor of the C++ standards committee. A programmer for more than 30 years, 15 of them in C++, he has published more than 150 articles about C++ and speaks on the topic worldwide. He is the author of C Traps and Pitfalls and co-author of Ruminations on C++.
Barbara E. Moo is an independent consultant with 20 years experience in the software field. During her nearly 15 years at AT&T, she worked on one of the first commercial projects ever written in C++, managed the companys first C++ compiler project, and directed the development of AT&Ts award-winning WorldNet Internet service business. She is co-author of Ruminations on C++ and lectures worldwide.