There are two novel elements about this month's installment of Generic<Programming>. One is the subject we will talk about implementing the standard library component basic_string (better known as string, which is a convenience typedef for basic_string<char>), an important element of the C++ library. But the truly interesting thing is that the code available for download is especially crafted to work with Visual C++ 6.0, a compiler known for two contradictory things its ubiquity and its weak support for generic programming.
The code accompanying this article implements not one, not two, but twelve basic_strings featuring various trade-offs. They are not toys. We're talking about full-fledged, Standard-compliant, industrial-strength stuff here (er, modulo bugs, of course). You think that that's going to take an awful lot of code? Think twice. Believe me, this article is going to be a lot of fun.
One Size Does Not Fit All
First off, why would anyone bother implementing basic_string at all? It's already implemented by your standard library, so coming up with "yet another" basic_string implementation seems to have educational value only.
Yet, many of those who have been using strings in multithreaded applications know about a difficult problem. The Standard tries to allow copy-on-write implementations of basic_string. (Copy-on-write is fondly called COW by its fans, and "the mad cow" by its opponents.) COW-based strings, which use reference counting internally, are either unusable in a multithreaded application or, if the library implementer supports multithreading, unacceptably slow even in the single-threaded parts of your application. Pick one.
Further problems with COW strings might arise in applications that use dynamic loading of libraries when you free a library, there is a risk that your application might still hold shallow copies of strings allocated in the memory space of that library.
Extensive discussions about the trouble with COW strings [1] [2] have convinced many STL implementers to ditch COW and use alternate optimization strategies for their basic_string implementations. Yet, "most" is not "all," so when programming with threads and basic_string, you must use a non-COW implementation and consequently give up portability of your code across STL implementations.
Furthermore, COW does have its advantages and is quite useful in a large category of applications. Wouldn't it be nice, then, if you could just choose what optimizations to use for a certain string in a certain application? "Here I want a non-COW string featuring the small string optimization for strings up to 16 characters" or "here I'd like to take advantage of COW, and I'd like to use my own heap for allocation."
How to Implement basic_string in 200 Lines of Code
In spite of the usefulness of having multiple string implementations available, building even only one such implementation is a daunting task. Writing all the member functions and type definitions on top of your implementation of choice is certainly not an easy task. I know because I did implement the basic_string interface for an application that needed to use the COM string allocator and multiple threads.
While I was carefully writing all the utility functions as prescribed by the Standard, I noticed an interesting fact. Most member functions seem to gravitate around a small kernel of functions and types in other words, you can decompose the basic_string interface in "core" and "utility." The utility part is the same no matter what implementation strategy you use, while the core part varies drastically among implementations. For example, the replace family consists of utility functions implemented in terms of the core function resize.
And here's the rub: the utility part of basic_string is also the bulkiest one (in my implementation it has over 700 lines of code). In contrast, writing a core implementation, even a sophisticated one, is a much easier task my implementations vary between 75 and 250 lines of code. This means that you can create new implementations easily by fitting different core incarnations under the utility interface. And you don't have to implement the bulk but once. (Actually, not at all, because you can download this column's code, which is eager to be of use.) Really, you are 200 lines of code away from your dream basic_string implementation!
A Policy-Based String
Those of you who have read my book [3] (ah, don't you love marketing plugs) know what our nascent design screams for: policies! Of course, when you want to vary a specific aspect of a class' implementation and want to let the user choose what implementation of that aspect to use, you migrate that aspect into a template parameter and define an interface for it. It's not rocket science, but it is remarkably effective.
The standard basic_string declaration looks like this:
namespace std { template <class E, class T = char_traits<E>, class A = allocator<E> > class basic_string; }
E is the character type of the string (most often, either char or wchar_t), T controls how strings are compared and copied, and A is the allocator that we all know, love, and never use. We will add a fourth template argument that controls the exact implementation of the string. Because it deals with exactly how the string is stored, let's call it the Storage policy.
We call our new string flex_string, because, as you will soon see, it is quite flexible:
template <class E, class T = char_traits<E>, class A = allocator<E> class Storage = AllocatorStringStorage<E, A> > class flex_string;
Storage defaults to AllocatorStringStorage<E, A>, which is a straightforward storage implementation that uses eager copy (sort of an antithesis of a COW). In its implementation, flex_string holds a Storage object and uses its types and member functions.
How exactly you choose the interface of Storage might vary a little. In essence, after fiddling with my basic_string implementation, I found a set of functions without which I could not possibly provide an implementation, and the functions weren't redundant, either. Here's a semi-formal specification of the conditions that a Storage policy implementation must satisfy:
template <typename E, <i>other arguments</i>> class StorageImpl { public: typedef <i>some_type</i> size_type; typedef <i>some_type</i> iterator; typedef <i>some_type</i> const_iterator; typedef <i>some_type</i> allocator_type; StorageImpl(const StorageImpl &); StorageImpl(const allocator_type&); StorageImpl(const E* s, size_type len, const allocator_type& a); StorageImpl(size_type len, E, const allocator_type&); iterator begin(); const_iterator begin() const; iterator end(); const_iterator end() const; size_type size() const; size_type max_size() const; size_type capacity() const; void resize(size_type, E); void reserve(size_type); void swap(StorageImpl&); const E* c_str() const; const E* data() const; allocator_type get_allocator() const; };
That's pretty much it. The specification is quite simple (and would have been even simpler without the allocator, which is a pain in the neck). The idea is that you can implement basic_string's entire interface in an efficient manner by ultimately leveraging Storage's small kernel of types and functions.
The flex_string class holds a Storage object by value. I chose private inheritance for the sake of some minor conveniences. Hence, the flex_string in the code available for download looks like this:
template <class E, class T = std::char_traits<E>, class A = std::allocator<E>, class Storage = AllocatorStringStorage<E, A> > class flex_string : private Storage { public: typedef typename Storage::iterator iterator; typedef typename Storage::const_iterator const_iterator; ... // 21.3.1 construct/copy/destroy explicit flex_string(const A& a = A()) : Storage(a) {} ... };
Implementing the Storage policy
Ok, time to get our hands dirty. Let's churn some Storage implementations. An effective string implementation would hold a pointer to a buffer. In turn, the buffer holds the length and the capacity of the string, plus the string itself. To avoid allocating memory twice (once for the bookkeeping data and once for the data), you might want to use a trick known as "the struct hack": the buffer holds a C-style array of characters as its last element and grows dynamically to accommodate as many characters as needed. This is exactly what SimpleStringStorage does:
template <class E, class A = std::allocator<E> > class SimpleStringStorage { struct Data { E* pEnd_; E* pEndOfMem_; E buffer_[1]; }; Data* pData_; public: size_type size() const { return pData_->pEnd_ - pData_->buffer_; } size_type capacity() const { return pData_->pEndOfMem_ - pData_->buffer_; } ... };
pEnd_ points to the end of the string, pEndOfMem_ points to the end of the allocated buffer, and buffer_ extends to as many characters as the string holds in other words, it "continues" beyond the end of Data's memory. To achieve this flexibility, pData_ does not exactly point to a Data object, but to a larger chunk of memory cast to a Data. This "struct hack" is in theory not 100 percent portable, but in practice, well, it just is.
SimpleStringStorage features another nice little optimization all empty strings are shared and point to a static Data instance. An alternate implementation could initialize pData_ with zero for empty strings, but that would have propagated tests through many member functions.
SimpleStringStorage is "simple" because it is aloof to the allocator passed in. SimpleStringStorage simply uses the standard free store (new/delete) for its memory needs. Using the passed-in allocator for allocating Data objects is harder than it might seem, due partly to the allocator's design (no support for objects of arbitrary size) and partly to compiler compatibility issues. You can find such a politically correct Storage policy implementation in the class template AllocatorStringStorage.
Yet another possible implementation of a string storage is to simply use std::vector as a back-end. The implementation is a slam-dunk, and what you'll get is a lean, mean string that reuses a nicely tuned standard library facility. This also helps in minimizing object code size. You can look up that implementation in VectorStringStorage.
All these three implementations use inheritance to use the EBO (Empty Base Optimization) [4] wherever possible. (Did I mention the "industrial-strength" buzzword?) Using EBO is very effective because most allocators are in fact empty classes.
Exhilarated C++
Ok, so here we are, some 1,300 of lines of code later, already with three nifty basic_string implementations under our belt. That's 433 lines of code per implementation. Not too bad, especially when you think that you can add new implementations quite easily.
If you think that was fun, the article has reached its goal so far. But don't forget that the opening paragraph mentions a lot of fun, which hopefully starts now.
Let's drop in the SSO (small string optimization) [5]. The idea behind SSO is to store small strings right in the string object (not in dynamically-allocated storage). When the size becomes too big to fit inside the string, a dynamic allocation strategy is used. The two strategies share the memory inside string for bookkeeping data. The string class can differentiate between the two mechanisms through some sort of a tag:
template <class E, other parameters> class sso_string { struct DynamicData { ... }; static const unsigned int maxSmallStringLen = 12; union { E[maxSmallStringLen] inlineBuffer_; DynamicData data_; }; bool isSmall_; ... };
If isSmall_ is true, the string is stored right in inlineBuffer_. Otherwise, data_ is valid. The problem is what kind of dynamic allocation strategy to use for DynamicData? An std::vector? A SimpleStringStorage? An AllocatorStringStorage? The answer, of course, is "any of the above and more, please."
It's clear that using SSO is orthogonal on whatever alternate storage you use. Therefore, the SmallStringOpt class template has another storage as a template parameter:
template <class E, unsigned int threshold, class Storage, typename Align = E*> class SmallStringOpt { enum { temp = threshold > sizeof(Storage) ? threshold : sizeof(Storage) }; public: enum { maxSmallString = temp > sizeof(Align) ? temp : sizeof(Align) }; private: union { E buf_[maxSmallString + 1]; Align align_; }; ...implement the Storage policy... };
The buf_ member variable stores either a Storage object or the string itself. But what's that Align business? Well, when dealing with such "seated allocation," you must be careful with alignment issues. Because there is no portable way of figuring out what alignment requirements Storage has, SmallStringOpt accepts a type that specifies the alignment and stores it in the dummy align_ variable.
How does SmallStringOpt make the difference between small and large strings? The last element of buf_ (namely buf_[maxSmallString]) stores the difference between maxSmallString and the actual length of the string for small strings, and a magic number for long strings. For a string of size maxSmallString, buf_[maxSmallString] is zero, which very nicely serves as both null terminator and tag.
You can see a number of tricks, casts, and low-level stuff in SmallStringOpt (we're talking about an optimization here, right?), but in the end the result is remarkable: we can combine SmallStringOpt with any other Storage implementation, including of course SimpleStringStorage, VectorStringStorage, and AllocatorStringStorage. So now we have six implementations of basic_string we multiplied our returns with an incremental effort. (By the way, lots of fun yet?) By now the code is 1,440 lines long, so we went down to 240 lines of code per basic_string implementation. If C++ programming were karate, leveraging multiplied returns on your code investment would be like fighting with multiple opponents at once.
Here's an example the instantiation:
typedef flex_string< char, std::char_traits<char>, std::allocator<char>, SmallStringOpt<char, 16, VectorStringStorage<char, std::allocator<char> > > > String;
specifies a string that combines an std::vector-based storage with the small-string optimization for strings less than at least 16 characters.
Back to COW
Like it or not, you can't ignore COW too many people find the gentle animal useful. For their sake, let's implement a CowString class template that, again, is able to add COW to any other Storage. CowString looks like:
template <class E, class Storage> class CowString { struct Data { Storage s_; unsigned int refs_; }; Data* pData_; public: ... };
Data holds whatever Storage you choose and a reference count. CowString itself contains only a pointer to Data. Multiple CowStrings might point to the same Data object. Whenever a potential change is detected, CowString makes a genuine duplicate of its data.
Now let's take a look at this:
typedef flex_string< char, std::char_traits<char>, std::allocator<char>, SmallStringOpt<char, 5, CowString<char, AllocatorStringStorage<char, std::allocator<char> > > > > String;
What we have here is a string optimized to not use dynamic allocation for strings shorter than five characters. For longer strings, a COW strategy is used over an allocator-based implementation.
CowString doubles again the number of potential instances of flex_string, so now we have twelve implementations at our disposal. Total code amounts to 1,860 lines, or 155 lines of code per implementation. There are actually twenty-four of them if you consider the order in which you apply SmallStringOpt and CowString. However, applying COW to small strings is not likely to be an effective design decision, so you'll always apply SmallStringOpt to CowString and not vice versa.
Conclusion
basic_string is a very baroque component. In spite of that, careful policy-based design can increase your productivity into the stratosphere. Using a handful of policy implementations, you can choose between straight, small-string optimized, and reference-counted basic_string implementation as easy as feeding arguments to a template class. Surgeon General's warning: You might allegedly have a lot of fun while doing all that.
References
[1] Herb Sutter. "Optimizations that Aren't (In a Multithreaded World)," C/C++ Users Journal, June 1999.
[2] Kevlin Henney. "From Mechanism to Method: Distinctly Qualified," C/C++ Users Journal C++ Experts Forum, May 2001, http://www.cuj.com/experts/1905/henney.htm.
[3] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).
[4] Andrei Alexandrescu. "Traits on Steroids," C++ Report, June 2000, http://ftp.sj.univali.br/prof/Fernando%20Montenegro/artigos/GenericProgramingCPP02.htm.
[5] Jack Reeves. "String in the Real World Part 2," C++ Report, January 1999, http://www.bleading-edge.com/Publications/C++Report/v9901/Column14.htm.
About the Author
Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (www.realnetworks.com), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (www.gotw.ca/cpp_seminar).