Avoiding code bloat
Nathan has worked on the ISO/ANSI C++ Standard since 1993, designing most of what is in Chapter 22 of the Draft Standard. He can be contacted at http://www.cantrip.org/.
The (Draft) Standard C++ Library is filled with useful templates, including extended versions of those found in the Standard Template Library (STL). These templates offer great flexibility, yet they are optimized for best performance in the common case. Not only can you use these templates directly, but they can stand as examples of effective design, and as a source of inspiration for ways to make your own components efficient and flexible.
Some of the ways that they offer flexibility involve "empty" classes -- classes with no data members. Ideally, these empty classes should consume no space at all. They typically declare typedefs or member functions, and you can replace them with your own classes (which might not be empty) to handle special needs. The default empty class is by far the most commonly used, however, so this case must be optimal.
Due to an unfortunate detail of the language definition, instances of empty classes usually occupy storage. In members of other classes, this overhead can make otherwise small objects large enough to be unusable in some applications. If this overhead couldn't be avoided in the construction of the Standard Library, the cost of the library's flexibility would drive away many of its intended users. But the optimization techniques used in the Standard library limit this overhead, and can be equally useful in your own code.
Empty Member Bloat
In the Standard C++ Library, each STL Container constructor takes as a parameter, and copies, an allocator object. Whenever the container needs storage, it asks its allocator member. In this way, a user with some special memory to allocate (such as a shared memory region) can define an allocator for it, and pass that to a container constructor, so that container elements will be placed there. In the common case, however, the standard default allocator is used, which just delegates to the global operator new. It is an empty object.
The listings resent simplified code for a possible implementation of these components. In Listing One, the standard default allocator allocator<> has only function members. In Listing Two, the generic list<> container template keeps a private alloc_ member, copied from its constructor argument. The list<> constructor is declared "explicit" so that the compiler will not use it as an automatic conversion (this is a recent language feature).
The member list<>::alloc_ usually occupies four bytes in the object, even in the default case where Alloc is the empty class allocator<T>.
A few extra bytes for the list object doesn't seem so bad until you imagine a big vector of these lists, as in a hash table. Any extra junk in each list is multiplied, and reduces the number of list headers that fit in a virtual memory page. Wasted space makes slower programs.
Empty Objects
How can this overhead be avoided? First, you need to know why the overhead is there. The Standard C++ language definition says:
A class with an empty sequence of [data] members and base class objects is an empty class. Complete objects and member subobjects of an empty class type shall have nonzero size.
Why must objects with no member data occupy storage? Consider Listing Three. If sizeof(Bar) were zero, f.b and the elements of f.a[] might all have the same address. If you were keeping track of separate objects by their addresses, f.b and f.a[0] would appear to be the same object. The C++ standards committee chose to finesse these issues by forbidding zero-sized addressable objects.
Still, why does an empty member take up so much space (four bytes, in our example)? On all the compilers I know of, sizeof(Bar) is 1. However, on most architectures, objects must be aligned according to their type. For example, if you declare
struct Baz { Bar b; int* p; };
on most architectures today, sizeof(Baz) is 8. This is because the compiler adds "padding" so that member Baz::p doesn't cross a memory word boundary; see Figure 1(a).
How can you avoid this overhead? The Draft says (in a footnote) that "a base class subobject of an empty class type may have zero size." In other words, if you declared Baz2 like this,
struct Baz2 : Bar { int* p; };
then a compiler is allowed to reserve zero bytes for the empty base class Bar. Hence, sizeof(Baz2) can be just 4 on most architectures; see Figure 1(b).
Compiler implementers are not required to do this optimization, and many don't -- yet. However, you can expect that most standard-conforming compilers will, because the efficiency of so many components of the Standard Library (not only the containers) depends on it.
Eliminating Bloat
This principle eliminates the space overhead. How do you apply it? Let's consider how to fix the implementation of our example template list<>. You could just derive from Alloc, as in Listing Four, and it would work, mostly. Code in the list<> member functions would get storage by calling this->allocate() instead of alloc_.allocate(). However, the Alloc type supplied by the user is allowed to have virtual members, and these could conflict with a list<> member. (Imagine a private member void list<>::init(), and a virtual member bool Alloc::init().)
A better approach is to package the allocator with a list<> data member, such as the pointer to the first list node (as in Listing Five), so that the allocator's interface cannot leak out.
Now, list<> members get storage by calling head_.allocate(), and mention the first list element by calling head_.p. This works perfectly, there's no unnecessary overhead of any kind, and users of list<> can't tell the difference. Like most good optimizations, it makes the implementation a bit messier, but doesn't affect the interface.
A Packaged Solution
There is still room for improvement and, as usual, the improvement involves a template. In Listing Six, I've packaged the technique so that it is clean and easy o use.
The declaration of list<> in Listing Seven is no bigger or messier than the unoptimized version I started with. Any other library component (Standard or otherwise) can use BaseOpt<> just as easily. The member code is only slightly messier; while it's not immediately obvious what's going on, the declaration of BaseOpt<> provides a natural place to document the technique and the reasons for it.
It is tempting to add members to BaseOpt<>, but that would not improve it: They could conflict with members inherited from the Base parameter, just as in Listing Four.
Finally
This technique can be used with any compiler that has sturdy template support. Not all C++ compilers support the empty-base optimization yet (the Sun, HP, IBM, and Microsoft compilers do), but the technique costs nothing extra even on those that don't. When you get a standard-conforming compiler, it probably will do the optimization, and if your code uses this technique, it will automatically become more efficient.
Acknowledgment
Fergus Henderson contributed an essential refinement to this technique.
Listing One
template <class T>class allocator { // an empty class static T* allocate(size_t n) { return (T*) ::operator new(n * sizeof T); } . . . };
Listing Two
template <class T, class Alloc = allocator<T> >class list { Alloc alloc_; struct Node { . . . }; Node* head_; public: explicit list(Alloc const& a = Alloc()) : alloc_(a) { . . . } . . . };
Listing Three
struct Bar { };struct Foo { struct Bar a[2]; struct Bar b; }; Foo f;
Listing Four
template <class T, class Alloc = allocator<T> >class list : private Alloc { struct Node { . . . }; Node* head_; public: explicit list(Alloc const& a = Alloc()) : Alloc(a) { . . . } . . . };
Listing Five
template <class T, class Alloc = allocator<T> >class list { struct Node { . . . }; struct P : public Alloc { P(Alloc const& a) : Alloc(a), p(0) { } Node* p; }; P head_; public: explicit list(Alloc const& a = Alloc()) : head_(a) { . . . } . . . };
Listing Six
template <class Base, class Member>struct BaseOpt : Base { Member m; BaseOpt(Base const& b, Member const& mem) : Base(b), m(mem) { } };
Listing Seven
template <class T, class Alloc = allocator<T> >class list { struct Node { . . . }; BaseOpt<Alloc,Node*> head_; public: explicit list(Alloc const& a = Alloc()) : head_(a,0) { . . . } . . . }; </p>
DDJ
Copyright © 1997, Dr. Dobb's Journal