C/C++

The Empty Member C++ Optimization

By Nathan Myers, August 01, 1997

The STL is flexible, but does it use too much memory? Nathan shows how the STL-and you-can use empty subobjects without bloating your data requirements.

Dr. Dobb's Journal August 1997: The Empty Member C++ Optimization

Avoiding code bloat

Nathan has worked on the ISO/ANSI C++ Standard since 1993, designing most of what is in Chapter 22 of the Draft Standard. He can be contacted at http://www.cantrip.org/.

The (Draft) Standard C++ Library is filled with useful templates, including extended versions of those found in the Standard Template Library (STL). These templates offer great flexibility, yet they are optimized for best performance in the common case. Not only can you use these templates directly, but they can stand as examples of effective design, and as a source of inspiration for ways to make your own components efficient and flexible.

Some of the ways that they offer flexibility involve "empty" classes -- classes with no data members. Ideally, these empty classes should consume no space at all. They typically declare typedefs or member functions, and you can replace them with your own classes (which might not be empty) to handle special needs. The default empty class is by far the most commonly used, however, so this case must be optimal.

Due to an unfortunate detail of the language definition, instances of empty classes usually occupy storage. In members of other classes, this overhead can make otherwise small objects large enough to be unusable in some applications. If this overhead couldn't be avoided in the construction of the Standard Library, the cost of the library's flexibility would drive away many of its intended users. But the optimization techniques used in the Standard library limit this overhead, and can be equally useful in your own code.

Empty Member Bloat

In the Standard C++ Library, each STL Container constructor takes as a parameter, and copies, an allocator object. Whenever the container needs storage, it asks its allocator member. In this way, a user with some special memory to allocate (such as a shared memory region) can define an allocator for it, and pass that to a container constructor, so that container elements will be placed there. In the common case, however, the standard default allocator is used, which just delegates to the global operator new. It is an empty object.

The listings resent simplified code for a possible implementation of these components. In Listing One, the standard default allocator allocator<> has only function members. In Listing Two, the generic list<> container template keeps a private alloc_ member, copied from its constructor argument. The list<> constructor is declared "explicit" so that the compiler will not use it as an automatic conversion (this is a recent language feature).

The member list<>::alloc_ usually occupies four bytes in the object, even in the default case where Alloc is the empty class allocator<T>.

A few extra bytes for the list object doesn't seem so bad until you imagine a big vector of these lists, as in a hash table. Any extra junk in each list is multiplied, and reduces the number of list headers that fit in a virtual memory page. Wasted space makes slower programs.

Empty Objects

How can this overhead be avoided? First, you need to know why the overhead is there. The Standard C++ language definition says:

A class with an empty sequence of [data] members and base class objects is an empty class. Complete objects and member subobjects of an empty class type shall have nonzero size.

Why must objects with no member data occupy storage? Consider Listing Three. If sizeof(Bar) were zero, f.b and the elements of f.a[] might all have the same address. If you were keeping track of separate objects by their addresses, f.b and f.a[0] would appear to be the same object. The C++ standards committee chose to finesse these issues by forbidding zero-sized addressable objects.

Still, why does an empty member take up so much space (four bytes, in our example)? On all the compilers I know of, sizeof(Bar) is 1. However, on most architectures, objects must be aligned according to their type. For example, if you declare

struct Baz {
   Bar b;
   int* p;
};

on most architectures today, sizeof(Baz) is 8. This is because the compiler adds "padding" so that member Baz::p doesn't cross a memory word boundary; see Figure 1(a).

How can you avoid this overhead? The Draft says (in a footnote) that "a base class subobject of an empty class type may have zero size." In other words, if you declared Baz2 like this,

struct Baz2 : Bar {
   int* p;
};

then a compiler is allowed to reserve zero bytes for the empty base class Bar. Hence, sizeof(Baz2) can be just 4 on most architectures; see Figure 1(b).

Compiler implementers are not required to do this optimization, and many don't -- yet. However, you can expect that most standard-conforming compilers will, because the efficiency of so many components of the Standard Library (not only the containers) depends on it.

Eliminating Bloat

This principle eliminates the space overhead. How do you apply it? Let's consider how to fix the implementation of our example template list<>. You could just derive from Alloc, as in Listing Four, and it would work, mostly. Code in the list<> member functions would get storage by calling this->allocate() instead of alloc_.allocate(). However, the Alloc type supplied by the user is allowed to have virtual members, and these could conflict with a list<> member. (Imagine a private member void list<>::init(), and a virtual member bool Alloc::init().)

A better approach is to package the allocator with a list<> data member, such as the pointer to the first list node (as in Listing Five), so that the allocator's interface cannot leak out.

Now, list<> members get storage by calling head_.allocate(), and mention the first list element by calling head_.p. This works perfectly, there's no unnecessary overhead of any kind, and users of list<> can't tell the difference. Like most good optimizations, it makes the implementation a bit messier, but doesn't affect the interface.

A Packaged Solution

There is still room for improvement and, as usual, the improvement involves a template. In Listing Six, I've packaged the technique so that it is clean and easy o use.

The declaration of list<> in Listing Seven is no bigger or messier than the unoptimized version I started with. Any other library component (Standard or otherwise) can use BaseOpt<> just as easily. The member code is only slightly messier; while it's not immediately obvious what's going on, the declaration of BaseOpt<> provides a natural place to document the technique and the reasons for it.

It is tempting to add members to BaseOpt<>, but that would not improve it: They could conflict with members inherited from the Base parameter, just as in Listing Four.

Finally

This technique can be used with any compiler that has sturdy template support. Not all C++ compilers support the empty-base optimization yet (the Sun, HP, IBM, and Microsoft compilers do), but the technique costs nothing extra even on those that don't. When you get a standard-conforming compiler, it probably will do the optimization, and if your code uses this technique, it will automatically become more efficient.

Acknowledgment

Fergus Henderson contributed an essential refinement to this technique.

Listing One

template <class T>class allocator {   // an empty class
  static T* allocate(size_t n)
    { return (T*) ::operator new(n * sizeof T); }
  . . .
};

Back to Article

Listing Two

template <class T, class Alloc = allocator<T> >class list {
  Alloc alloc_; 
  struct Node { . . . };
  Node* head_;      
 public:
  explicit list(Alloc const& a = Alloc())
    : alloc_(a) { . . . }
  . . .
};

Back to Article

Listing Three

struct Bar { };struct Foo {
  struct Bar a[2];
  struct Bar b;
};
Foo f;

Back to Article

Listing Four

template <class T, class Alloc = allocator<T> >class list : private Alloc {
  struct Node { . . . };
  Node* head_;      
 public:
  explicit list(Alloc const& a = Alloc())
    : Alloc(a) { . . . }
  . . .
};

Back to Article

Listing Five

template <class T, class Alloc = allocator<T> >class list {
  struct Node { . . . };
  struct P : public Alloc {
    P(Alloc const& a) : Alloc(a), p(0) { }
    Node* p;
  };
  P head_;
 public:
  explicit list(Alloc const& a = Alloc())
      : head_(a) { . . . }
  . . .
};

Back to Article

Listing Six

template <class Base, class Member>struct BaseOpt : Base {
  Member m;
  BaseOpt(Base const& b, Member const& mem) 
   : Base(b), m(mem) { }
};

Back to Article

Listing Seven

template <class T, class Alloc = allocator<T> >class list {
  struct Node { . . . };
  BaseOpt<Alloc,Node*> head_;
 public:
  explicit list(Alloc const& a = Alloc())
    : head_(a,0) { . . . }
  . . .
};


</p>

Back to Article

DDJ

1 2 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

The Empty Member C++ Optimization

Avoiding code bloat

Empty Member Bloat

Empty Objects

Eliminating Bloat

A Packaged Solution

Finally

Acknowledgment

Listing One

Listing Two

Listing Three

Listing Four

Listing Five

Listing Six

Listing Seven

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

The Empty Member C++ Optimization

Avoiding code bloat

Empty Member Bloat

Empty Objects

Eliminating Bloat

A Packaged Solution

Finally

Acknowledgment

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content