Imagine the installment of "Generic<Programming>" you're about to read starting like so: "This article treats memory buffers in C++."
As you hastily close the browser, if you kept your ear attentive, you would hear the subtle noise of tens of thousands of other mouse clicks doing the same. Because who'd be interested at all in an article that treats such a trivial subject as memory buffers?!
This article indeed treats memory buffers in C++, but with two twists: first, the buffers are generic, which means they can contain typed data (as opposed to raw bytes). Second, the buffers we'll talk about are as efficient as their hosted type and the host operating system allows, in every aspect allocation strategy, as well as data manipulation.
As you'll soon see, writing a generic buffer is a cute exercise on templates. Writing an efficient buffer is not too complicated of a task either. Writing a buffer that is exception-safe is more challenging, but still not too hard, especially if you've read the book on exception safety [1]. But writing a buffer that's both generic and efficient is like a long climb up a steep hill. As it often happens with C++, the scenery you enjoy at the top of the hill is well worth the effort if only you could ignore that thing you just stepped in.
The Twilight Zone
Every now and then, someone posts to the Usenet newsgroup comp.lang.c++.moderated a question like: Why doesn't auto_ptr offer you the chance to use delete[] in the destructor instead of delete? The discussion follows a specific pattern:
"You should stay away from C-style arrays and use std::vector, which is efficient and all."
"Well, std::vector is not efficient enough for what I'm trying to do."
"Why isn't it?" etc.
The thing is, std::vector is indeed a very well designed class for handling contiguous sequences of objects. When I first saw std::vector's elegant design, I wasn't compelled at all to throw it away and define my own (a compulsion felt when I saw, for example, certain MFC artifacts). However, std::vector does have some inefficiencies that can seriously bother certain users:
- Unnecessary data initialization. Often, you need to create an appropriately sized vector of a primitive type (such as char) and pass it to a low-level C function that fills it with data read from a socket or file. The problem is that, although you don't need it, the whole vector will be initialized upon construction or resizing. If you have to interface with C functions, there is no way in which you can avoid this overhead.
- Exponential growth. Standard vectors sport fast (constant time on average) growth. This implies that std::vector grows in multiplicative hops most implementations double the allocated size whenever they need to grow automatically. If you want to append an object to a vector containing one million objects, a vector will allocate for a short time enough space to accommodate three million objects; that might not be what you would prefer.
- "Imperialistic" memory allocation strategy. Standard vectors never shrink; they only grow. If you collected three million data samples and you decide to keep one million samples after some interpolation, you'll have slack space worth two million objects. The recommended idiom for such cases is to copy your vector into a fresh new vector and then swap the two vectors:
std::vector<double> data; ... work with data ... // Shrink data std::vector<double>(data.begin(), data.end()).swap(data);
The idiom is less than optimal because you must copy the data. Worse, even this idiom is not 100% guaranteed to make economic use of memory because std::vector can always allocate more memory than needed. It would be better if you could shrink the vector "in-place," that is, just tell the allocator that it can use the memory at the end of your vector. Many memory allocators allow you to do that.
In many applications, some or all of these problems are not annoying at all. But when you start asking for heavy-duty performance, the attractiveness of std::vector might decrease and you might be looking for a structure that is more optimized and gives you more control. The problem is that you won't find any container in the C++ Standard below std::vector except for the C-style string. It's either Cadillac or Geo.
Proof? Well, the proof's right there in the pudding: STL uses contiguous buffers in at least three places (aside from std::vector itself): std::basic_string, std::deque's chunk management, and std::deque's chunks themselves. However, no STL implementation I've seen uses vector as a back-end for these structures. If std::vector is the one-stop-shop for your contiguous data needs, then it must be said that STL implementers shop at a specialty grocer.
There is an interstice, a twilight zone between std::vector and the C-style string. We target this niche. We will develop together a contiguous buffer that:
- is generic can store sequences of any type
- is familiar supports a subset of std::vector's syntax and semantics
- offers total control offers fine-grained primitives in addition to higher-level functions
- generates pedal-to-the-metal code for primitive types and select user-defined types
- allows user-controlled optimizations
- supports advanced allocation facilities, such as in-place expansion and fast reallocation
- most importantly, can be used as a back-end for higher-level contiguous containers in particular, you can use it to implement vectors, strings, or deques
buffer: First Blush
Let's define an interface for a buffer class template. The buffer holds only two pointers the beginning and the end of the controlled memory chunk. To a buffer, there's no difference between size and capacity. We thus start from std::vector's interface and eliminate functions that refer to capacity. We are left with the following set of member functions:
template <class T, class Allocator = std::allocator<T> > class buffer { public: ... all of std::vector's types and functions, except: // size_type capacity() const; // void reserve(size_type n); private: T* beg_; T* end_; };
Interestingly, although buffer has no notion of capacity, it can implement all of std::vector's primitives except capacity and reserve, with the amendment that buffer cannot satisfy the performance requirements of all those functions. For example, buffer<T>::push_back has O(N) complexity, while std::vector<T>::push_back has O(1) complexity if mediated over a large number of calls. (That is what the Standard calls "amortized constant" complexity.) As you'll see down the road, we'll improve buffer<T>::push_back's performance in certain cases, while still not supporting the notion of capacity.
Implementing buffer's interface is not very complicated; the two things that require attention are exception safety [3] and properly destroying the objects. The standard functions std::uninitialized_copy and std::uninitialized_fill are two very helpful primitives.
To allow the user to allocate a buffer without initializing it, we need a special constructor and a couple of helper functions. Then, we need a primitive grow_noinit that grows the buffer without calling the constructors. Consequently, we need the complementary function shrink_nodestroy that shrinks the vector without calling the destructors, and finally, the slightly redundant clear_nodestroy, which clears the buffer and deallocates the memory without invoking the destructors. Here they are:
template <class T, class Allocator = std::allocator<T> > class buffer { ... as above ... public: enum noinit_t { noinit }; buffer(size_type n, noinit_t, const Allocator& a = Allocator()); void grow_noinit(size_type growBy); void shrink_nodestroy(size_type shrinkBy); void clear_nodestroy(); };
This extended interface gives you total control over the buffer and the data that's contained in it. Beware, however, of naively using the extension functions. Consider, for example, you use grow_noinit as follows:
void Fun() { buffer<Widget> someWidgets; ... // add space for 10 Widgets someWidgets.grow_noinit(10); // Initialize the Widgets ConstructWidgets( someWidgets.end() - 10, someWidgets.end()); ... }
The problem is that if the construction of the 10 Widgets fails in any way, all things are not good. When Fun returns, someWidgets' destructor will diligently try to destroy all of its contents it doesn't track constructed and non-constructed Widgets because, again, unlike std::vector, buffer doesn't have the notion of capacity. If there is still any uninitialized memory in the buffer, the effect of applying Widget's destructor to that memory is, obviously, undefined.
With these caveats in mind, it seems like we have a good start. Let's see now how we can optimize buffer.
Type Traits
One key technique for optimizing generic code is to infer information about the types on which the generic code operates. Then, the generic code can dispatch work at compile time to specialized routines that make specific assumptions about the types on which they operate.
For example, in our case, an important piece of information is whether the copy constructor of buffer's hosted type throws an exception or not. For a type that doesn't throw an exception upon copying, the code is simpler as it doesn't have to take care of exception safety. In addition, some compilers generate better code if try blocks are not used.
Type traits are a known technique for deducing information about types. Boost [4] has a library that implements type traits, as well as Loki [5]. (With buffer, we will use a type traits mechanism that's slightly different from those implemented in both Boost and Loki, suggested to me by Vesa Karvonen via email.)
Let's figure out code for deducing whether the copy constructor of a type might throw an exception or not. Quite frankly, figuring this out for any type is not possible in current C++. However, we can make a few steps and let the user help us when they want optimization.
You don't have to be Sherlock Holmes to deduce that the copy constructor of any primitive type cannot throw. Therefore, a conservative assumption is any type's copy constructor might throw an exception, except for primitive types. Here's the corresponding code:
namespace TypeTraits { // Enumerate all primitive types typedef TYPELIST_14( const bool, const char, const signed char, const unsigned char, const wchar_t, const short int, const unsigned short int, const int, const unsigned int, const long int, const unsigned long int, const float, const double, const long double) PrimitiveTypes; template <typename T> struct IsPrimitive { enum { value = Loki::TL::IndexOf<PrimitiveTypes, const T>::value >= 0 }; }; template <typename T> struct IsPrimitive<T*> { enum { value = true }; }; template <typename T> struct CopyThrows { enum { value = !IsPrimitive<T>::value }; }; }
To be more concise, the code above makes use of two amenities provided by Loki: typelists and IndexOf. Typelists allow you to create and manipulate lists of types, and Loki::TL::IndexOf finds an individual type in a typelist and returns its index. The index is negative if the type is not found in the typelist. Finally, TypeTraits::CopyThrows<T>::value contains the needed information.
The mechanism through which type information is inferred is considerably flexible. Imagine you define the following type in an application:
struct Point { int x; int y; ... utility functions ... };
The type Point above is not primitive, but it also doesn't throw an exception upon copying. You can communicate that to our type traits mechanism. All you have to do is to reopen the TypeTraits namespace and put an explicit instantiation of CopyThrows in there:
// In file "Point.h", after Point's definition namespace TypeTraits { template <> struct CopyThrows<Point> { enum { value = false }; }; }
Better yet, you can specify CopyThrows for an entire category of types via partial template specialization. For example, recall the standard complex type, std::complex<T>. You can instantiate std::complex with primitive arithmetic types, but also with a user-defined numeric type, such as Rational or BigInt. Now, because copying an std::complex<T> object consists of copying two T objects (the real and imaginary part), it follows that std::complex<T> has the same CopyThrows trait as T. Here's how you can express that:
namespace TypeTraits { template <class T> struct CopyThrows< std::complex<T> > { enum { value = CopyThrows<T>::value }; }; }
Let's get back to buffer. How does buffer use the CopyThrows information? Dispatching upon a Boolean value at compile time is easy using the Int2Type class template, described in [5] and [6]. To recap, Int2Type has the deceptively simple definition:
template <int v> struct Int2Type { enum { value = v }; };
Here, for example, is how buffer's constructor uses Int2Type to dispatch to an exception-safe or an exception-numb initialization function:
template <class T, class Allocator = std::allocator<T> > class buffer : private Allocator { private: enum { copyThrows = TypeTraits::CopyThrows<T>::value != 0 }; // Exceptionless initialization void Init(size_type n, const T& value, Loki::Int2Type<false>) { ... } // Initialization in the presence of exceptions void Init(size_type n, const T& value, Loki::Int2Type<true>) { ... } public: explicit buffer(size_type n, const T& value = T(), const Allocator& a = Allocator()) { Init(n, value, Loki::Int2Type<copyThrows>()); } };
Other sensible pieces of information about a type that would be interesting to buffer would be:
- Whether the type is MemCopyable, that is, if copying an object is the same as memcpy'ing that object's bits. Obviously, this is the case for primitive types and for POD (Plain Old Data, C-style) structures.
- Whether the type is MemMoveable, that is, if copying an object from a spot to another in memory and then destroying the source object is the same as memcpy'ing the object's bits and not destroying the source. Again, this is the case for primitive types and PODs. But, as you will see soon, a surprisingly rich subset of user-defined types is MemMoveable.
The next installment of "Generic<Programming>" will define MemCopyable and MemMoveable and exploit them in a manner similar to CopyThrows. Does that put the subject to rest? Not at all. As the reallocate and inplace_reallocate artifacts in the downloadable code suggest, we'll travel through the Caudine Forks of memory allocation. Until then comments galore!
Acknowledgement
Many thanks to Tim Sharrock for a thorough review.
Bibliography and Notes
[1] Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).
[2] Acronym for COW Fan Club. In turn, COW stands for Copy-On-Write. COW is the one strategy that is friendly to the move-implemented-as-copy policy used by std::vector. However, many library writers are moving away from COW-based implementations due to the problems they raise in multithreaded programs.
[3] In general, buffer must have the same level of exception safety as vector. See Bjarne Stroustrup, Appendix E: Standard-Library Exception Safety at <www.research.att.com/~bs/3rd_safe0.html>.
[4] Boost is a collection of cutting-edge C++ libraries. See <www.boost.org>.
[5] Loki is the library featured in Modern C++ Design (Addison-Wesley, 2001) written by yours truly. You can download Loki from <www.moderncppdesign.com>.
[6] Andrei Alexandrescu. "Generic<Programming>: Mappings between Types and Values," C/C++ Users Journal C++ Experts Forum, October 2000, <www.cuj.com/experts/1810/alexandr.htm>.
Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (<www.realnetworks.com>), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).