This column addresses two topics: an update on what's going on in the C++ standards process, and a technical question. For the former, see the accompanying "Standards Update" sidebar at the end of this article; it has breaking news about the 1998 C++ Standard's first official update, whose ink should be drying as you read this, and explains how it affects you.
The technical question is this: how much memory do the various standard containers use to store the same number of objects of the same type T? To answer this question, we have to consider two major items:
- the internal data structures used by containers like vector, deque, list, set/multiset, and map/multimap; and
- the way dynamic memory allocation works.
Let's begin with a brief recap of dynamic memory allocation, and then work our way back up to what it means for the Standard library.
Memory Managers and Their Strategies: a Brief Survey
To understand the total memory cost of using various containers, it's important to understand the basics of how the underlying dynamic memory allocation works; after all, the container has to get its memory from some memory manager somewhere, and that manager in turn has to figure out how to parcel out available memory by applying some memory management strategy. Here, in brief, are two popular memory management strategies. Further details are beyond the scope of this article; consult your favorite operating systems text for more information:
- General-purpose allocation can provide any size of memory block that a caller might request (the request size, or block size). General-purpose allocation is very flexible, but has several drawbacks, two of which are: a) performance, because it has to do more work; and b) fragmentation, because as blocks are allocated and freed we can end up with lots of little noncontiguous areas of unallocated memory.
- Fixed-size allocation always returns a block of the same fixed size. This is obviously less flexible than general-purpose allocation, but it can be done much faster and doesn't result in the same kind of fragmentation.
In practice, you'll often see combinations of the above. For example, perhaps your memory manager uses a general-purpose allocation scheme for all requests over some size S, and as an optimization provides a fixed-size allocation scheme for all requests up to size S. It's usually unwieldy to have a separate arena for requests of size 1, another for requests of size 2, and so on; what normally happens is that the manager has separate arenas for requests that are multiples of a certain size, say 16 bytes. If you request 16 bytes, great, you use only 16 bytes; if you request 17 bytes, the request is allocated from the 32-byte arena, and 15 bytes are wasted. This is a source of possible overhead, but more about that in a moment.
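To make the arena arithmetic concrete, here is a minimal sketch, ours rather than any real library's code, of how such a mixed strategy might size an allocation; the 16-byte multiple comes from the example above, and the threshold S is an assumed value:

    #include <cstddef>
    #include <cstdio>

    // Sketch: compute the block size actually consumed by a request of n
    // bytes, assuming fixed-size arenas at multiples of 16 bytes up to a
    // threshold S, and exact-fit general-purpose allocation above S.
    const std::size_t kMultiple  = 16;
    const std::size_t kThreshold = 256;   // the "S" in the text; assumed

    std::size_t arena_size(std::size_t n) {
        if (n > kThreshold) return n;                        // general-purpose
        return (n + kMultiple - 1) / kMultiple * kMultiple;  // round up
    }

    int main() {
        std::printf("16 -> %lu\n", (unsigned long)arena_size(16)); // 16: no waste
        std::printf("17 -> %lu\n", (unsigned long)arena_size(17)); // 32: 15 wasted
    }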
The obvious next question is, who selects the memory management strategy? There are several possible layers of memory manager involved, each of which may override the previous (lower-level) one:
- The operating system kernel provides the most basic memory allocation services. This underlying allocation strategy, and its characteristics, can vary from one operating system platform to another, and is most likely to be affected by hardware considerations.
- The compiler's default runtime library builds its allocation services, such as C++'s operator new and C's malloc, upon the native allocation services. The compiler's services might just be a thin wrapper around the native ones and inherit their characteristics, or the compiler's services might override the native strategies by buying larger chunks from the native services and then parceling those out according to their own methods.
- The standard containers and allocators in turn use the compiler's services, and possibly further override them to implement their own strategies and optimizations.
- Finally, user-defined containers and/or user-defined allocators can further reuse any of the lower-level services (for example, they may want to directly access native services if portability doesn't matter) and do pretty much whatever they please.
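To see one layer overriding another in code, here is a minimal sketch, not taken from any particular implementation, of a class that bypasses the compiler's default operator new and takes its memory directly from the C runtime's malloc:

    #include <cstdlib>   // std::malloc, std::free
    #include <new>       // std::bad_alloc

    // Sketch: class-specific operator new/delete that override the
    // compiler's default allocation services for Widget objects only.
    class Widget {
    public:
        static void* operator new(std::size_t n) {
            if (void* p = std::malloc(n)) return p;  // delegate downward
            throw std::bad_alloc();
        }
        static void operator delete(void* p) {
            std::free(p);
        }
    private:
        int data_;
    };

    int main() {
        Widget* w = new Widget;  // calls Widget::operator new, not the global one
        delete w;                // calls Widget::operator delete
    }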
Thus memory allocators come in various flavors, and can or will vary from operating system to operating system, from compiler to compiler on the same operating system, from container to container, and even from object to object; say, in the case of a vector<int> object which uses the strategy implemented by allocator<int>, and a vector<int, MyAllocator> object which could express-mail memory blocks from Taiwan unless it's a weekday night and the Mets are playing, or implement whatever other strategy you like.
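The express-mail strategy is left as an exercise, but here is a minimal sketch of what a hypothetical MyAllocator might look like, using the simplified allocator interface accepted by C++11-and-later libraries, with plain malloc standing in for the exotic strategy:

    #include <cstdlib>
    #include <new>
    #include <vector>

    // Sketch of a hypothetical MyAllocator: the hooks are real, but the
    // "strategy" here is just malloc/free.
    template<typename T>
    struct MyAllocator {
        typedef T value_type;
        MyAllocator() {}
        template<typename U> MyAllocator(const MyAllocator<U>&) {}

        T* allocate(std::size_t n) {
            if (void* p = std::malloc(n * sizeof(T)))
                return static_cast<T*>(p);
            throw std::bad_alloc();
        }
        void deallocate(T* p, std::size_t) { std::free(p); }
    };
    template<typename T, typename U>
    bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) { return true; }
    template<typename T, typename U>
    bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) { return false; }

    int main() {
        std::vector<int, MyAllocator<int> > v;  // this object's own strategy
        v.push_back(42);
    }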
"Ill Take Operator New For 200 Bytes, Alex"
When you ask for n bytes of memory using new or malloc, you actually use up at least n bytes of memory, because typically the memory manager must add some overhead to your request. Two common considerations that affect this overhead are:
1. Housekeeping Overhead
In a general-purpose (i.e., not fixed-size) allocation scheme, the memory manager will have to somehow remember how big each block is, so that it later knows how much memory to release when you call delete or free. Typically the manager remembers the block size by storing that value at the beginning of the actual block it allocates, and then giving you a pointer to "your" memory that is offset past the housekeeping information. (See Figure 1.)

Figure 1: Typical memory allocation using a general-purpose allocator for a request size of n bytes
Of course, this means it has to allocate extra space for that value, which could be a number as big as the largest possible valid allocation and so is typically the same size as a pointer. When freeing the block, the memory manager just takes the pointer you give it, subtracts the number of housekeeping bytes and reads the size, then performs the deallocation.
Of course, fixed-size allocation schemes (i.e., ones that return blocks of a given known size) don't need to store such overhead, because they always know how big the block will be.
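A toy version of the housekeeping technique, again our own illustration rather than any real manager's code, might look like this; note that a production manager would also pad the header so the payload stays suitably aligned:

    #include <cstdlib>
    #include <cstring>

    // Sketch of Figure 1's layout: the block size is stored just before
    // the pointer handed back to the caller.
    void* toy_alloc(std::size_t n) {
        unsigned char* raw = static_cast<unsigned char*>(
            std::malloc(sizeof(std::size_t) + n));  // header + payload
        if (!raw) return 0;
        std::memcpy(raw, &n, sizeof n);             // remember the request size
        return raw + sizeof(std::size_t);           // caller sees only "their" bytes
    }

    void toy_free(void* p) {
        if (!p) return;
        unsigned char* raw =
            static_cast<unsigned char*>(p) - sizeof(std::size_t);
        std::size_t n;
        std::memcpy(&n, raw, sizeof n);  // recover the size (a real manager uses it)
        (void)n;
        std::free(raw);                  // release header and payload together
    }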
2. Chunk Size Overhead
Even when you don't need to store extra information, a memory manager will often reserve more bytes than you asked for, because memory is often allocated in certain-sized chunks. For one thing, some platforms require certain types of data to appear on certain byte boundaries (e.g., some require pointers to be stored on 4-byte boundaries) and either break or perform more slowly if they're not. This is called alignment, and even plain old built-in C-style arrays are affected by this need for alignment; see Figure 2. (Incidentally, this is why it's surprisingly tricky to wordsmith the requirement that "vectors must be contiguous" in the same sense as arrays: in Figure 2 the memory is considered contiguous even though alignment padding leaves unused space between the useful parts of adjacent elements, so what is "contiguous," really? See also the "Standards Update" sidebar at the end of this article.) The C++ Standard guarantees that all memory allocated via operator new or malloc will be suitably aligned for any possible kind of object you might want to store in it, which means that operator new and malloc have to respect the strictest possible alignment requirement of the native platform.
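To watch alignment at work without any memory manager involved, consider this small sketch; the numbers in the comments assume 4-byte ints with 4-byte alignment, which is common but not universal:

    #include <cstdio>

    // Figure 2 in miniature: alignment padding means even a plain C-style
    // array "wastes" space, yet the array is still considered contiguous,
    // because sizeof(X) includes the padding.
    struct X {
        char c;   // 1 byte of useful data...
        int  i;   // ...but i typically must sit on a 4-byte boundary,
    };            // so sizeof(X) is commonly 8, not 5

    int main() {
        X a[3];
        std::printf("sizeof(X) = %lu\n", (unsigned long)sizeof(X));
        std::printf("3-element array occupies %lu bytes\n",
                    (unsigned long)sizeof a);
    }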
Alternatively, as described earlier, a fixed-size allocation scheme might maintain memory arenas for blocks of certain sizes that are multiples of some basic size m, and a request for n bytes will get rounded up to the next multiple of m.
Memory and the Standard Containers: the Basic Story
Now we can address the original question: how much memory do the various standard containers use to store the same number of elements of the same type T? Each standard container uses a different underlying memory structure and therefore imposes different overhead per contained object:
- A vector<T> internally stores a contiguous C-style array of T objects, and so has no extra per-element overhead at all (besides padding for alignment, of course; note that here "contiguous" has the same meaning as it does for C-style arrays, as shown in Figure 2).
- A deque<T> can be thought of as a vector<T> whose internal storage is broken up into chunks. A deque<T> stores chunks, or "pages," of objects; the actual page size isn't specified in the Standard, and depends mainly on how big T objects are and on the size choices made by your Standard library implementer. This paging requires the deque to store one extra pointer of management information per page, which usually works out to a mere fraction of a bit per contained object; for example, on a system with 8-bit bytes and 4-byte ints and pointers, a deque<int> with a 4K page size incurs an overhead per int of 0.03125 bits, just 1/32 of a bit (the short program after this list double-checks that arithmetic). There's no other per-element overhead, because deque<T> doesn't store any extra pointers or other information for individual T objects. There is no requirement that a deque's pages be C-style arrays, but that's the usual implementation.
- A list<T> is a doubly-linked list of nodes that hold T elements. This means that for each T element, list<T> also stores two pointers, which point to the previous and next nodes in the list. Every time we insert a new T element, we also create two more pointers, so a list<T> requires at least two pointers' worth of overhead per element.
- A set<T> (and, for that matter, a multiset<T>, map<Key,T>, or multimap<Key,T>) also stores nodes that hold T (or pair<const Key,T>) elements. The usual implementation of a set is a tree with three extra pointers per node. Often people see this and think, "Why three pointers? Isn't two enough, one for the left child and one for the right child?" The reason three are needed is that we also need an "up" pointer to the parent node; otherwise, determining the next element starting from some arbitrary iterator can't be done efficiently. (Besides trees, other internal implementations of set are possible; for example, an alternating skip list can be used, which still requires at least three pointers per element in the set [1].)
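Here is the deque arithmetic written out as a tiny program; the 4K page size and the 4-byte int and pointer sizes are the assumptions stated in the bullet above, not guarantees:

    #include <cstdio>

    // Back-of-the-envelope check of the deque overhead quoted above.
    int main() {
        const double pageBytes       = 4096;                  // assumed 4K page
        const double intBytes        = 4;                     // assumed 4-byte int
        const double ptrOverheadBits = 4 * 8;                 // one 4-byte pointer per page
        const double intsPerPage     = pageBytes / intBytes;  // 1024 ints per page
        std::printf("overhead per int = %g bits\n",
                    ptrOverheadBits / intsPerPage);           // prints 0.03125, i.e., 1/32
    }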
Table 1 summarizes this basic overhead for each container.
Table 1: Basic overhead per contained object for various containers
Memory and the Standard Containers: the Real World
Now we get to the interesting part: don't be too quick to draw conclusions from Table 1. For example, judging from just the housekeeping data required for list and set, you might conclude that list requires less overhead per contained object than set; after all, list only stores two extra pointers, whereas set stores three. The interesting thing is that this may not be true once you take into consideration the sizes of the memory blocks actually allocated. To dig a little deeper, consider Table 2, which shows the node layouts typically used internally by list, set/multiset, and map/multimap.

Table 2: Dynamic memory blocks used per contained object for various containers