![]() |
We continue working on our Variant
design and implementation. This time we focus on storage - how data is stored and accessed. This article presents techniques for computing alignment of types with reasonable accuracy and portability. Then, once the storage proper is in place, we need polymorphic means to manipulate that storage - for which Variant
uses the fake vtable
June 01, 2002
URL:http://drdobbs.com/cpp/discriminated-unions-ii/184403828
You know the one about the syntactic sugar? It causes cancer of the semicolon [1]? Well, enough jokes for today. We have a lot of ground to cover today, so let's hit the road early.
This installment continues the implementation of discriminated unions in C++. Today we'll finish the discussion on alignment and proceed full-speed to writing some code for the actual implementation of Variant
.
Before that, let's refresh our memory on the progress we made in the previous installment of "Generic<Programming>" [2].
After putting together a wish list for implementing discriminated unions, we discussed what would be the best storage model. After stumbling upon alignment issues, we figured out that a storage model that is at the same time properly aligned, efficient, and exception-friendly is:
union { unsigned char buffer_[neededSize]; Align dummy_; };
where neededSize
is the maximum size of the types in the union, and Align
is a POD [3] type that ensures proper alignment.
Depending on how well chosen Align
is, the discriminated union's storage can vary from optimal to unnecessarily large to defective.
Even with the best Align
, the implementation above is still not 100-percent portable for all types. In theory, someone could implement a compiler that respects the Standard but still does not work properly with discriminated unions. This is because the Standard does not guarantee that all user-defined types ultimately have the alignment of some POD type. Such a compiler, however, would be more of a figment of a wicked language lawyer's imagination, rather than a realistic language implementation.
That said, several effective alignment computation algorithms can be devised. You can see a couple implemented in the Boost libraries [4]. One that works well for computing the alignment of a type T
is:
T
.Align
is a C union
containing each of the types in the collection computed in step 2.
The basic idea underlying this algorithm is that any user-defined type T
will ultimately contain only primitive types. Consequently, T
's alignment requirements will be the same as one of those types. Larger types have larger alignment requirements; hence it's reasonable to infer that an upper bound of T
's alignment is the alignment of its largest member.
This algorithm might "overalign" the result. For example, if T
is char[5]
and sizeof(int)
is four, T
might have an alignment of one, but Align
will have the alignment of an int
(most likely, four).
Erring on the large side of alignment is not that bad in most cases. (Erring on the small side is disastrous.) On the other hand, erring on the large side of size wastes space. Our algorithm's step 2 ensures that only types of size less than or equal to T
's size remain selected.
To implement the algorithm for computing Align
, recall from the previous installment of "Generic<Programming>" that we have two appropriate tools at our disposal: typelists and ConfigurableUnion
. Typelists allow us to manipulate collections of types, allowing us to perform steps 1 and 2. ConfigurableUnion
generates a C-style union
from a typelist, thus taking care of step 3.
It is worth noting that, in addition to all primitive types, the initial collection of types should include some simple struct
s as well. Each of those struct
s contains only one member of a primitive type. We generate these stub structures from a simple template:
template <typename U> struct Structify { U dummy_; };
The reason for introducing these little structures is simple. Some compilers put tougher alignment on a structure containing only one int
than on the int
alone. This way, the compiler writer ensures that all user-defined types have the same alignment requirements. In turn, this architectural decision makes things easy for various parts of the compiler.
We thus start with a typelist containing all primitive types and all "structified" types:
class Unknown; typedef cons< char, short int, int, long int, float, double, long double, char*, short int*, int*, long int*, float*, double*, long double*, void*, Unknown (*)(Unknown), Unknown* Unknown::*, Unknown (Unknown::*)(Unknown), Structify<char>, Structify<short int>, Structify<int>, Structify<long int>, Structify<float>, Structify<double>, Structify<long double>, Structify<char*>, Structify<short int*>, Structify<int*>, Structify<long int*>, Structify<float*>, Structify<double*>, Structify<long double*>, Structify<void*>, Structify<Unknown (*)(Unknown)>, Structify<Unknown* Unknown::*>, Structify<Unknown (Unknown::*)(Unknown)> >::type TypesOfAllAlignments;
Okay, so the primitive types are there; the structified types are there, too very predictable. But some lines of the typelist above look as if the copyeditor's cat did a little Teutonic dance on the keyboard. The incriminating lines are:
Unknown (*)(Unknown) Unknown* Unknown::* Unknown (Unknown::*)(Unknown)
Also, their structified versions appear at the bottom of the list.
Who are these guys, and where did they come from? To make them look more familiar, let's give them names:
Unknown (*T1)(Unknown); Unknown* Unknown::* T2; Unknown (Unknown::* T3)(Unknown);
Aha! If you precede each of the lines above with typedef
, they reveal themselves in all the tired glory of the C declaration syntax. T1
is a pointer to a function taking Unknown
and returning Unknown
; T2
is a pointer to member of an Unknown
, member of type Unknown*
; and T3
is a pointer to member function of Unknown
, taking an Unknown
as an argument and returning an Unknown
as a result.
The trick here is that the cunningly named Unknown
is indeed not defined. Therefore, the compiler will assume that Unknown
is defined elsewhere and that it has the worst alignment requirements possible. (Otherwise, the compiler can optimize Unknown
's layout. Optimization here plays against generality.)
Okay, here comes the fun part. Let's now eliminate from TypesOfAllAlignments
all types that are larger than a given size, by writing the following template:
template <class TList, size_t size> struct ComputeAlignBound;
First, we treat the limited case, as a nice recursive algorithm ought to:
template <size_t size> struct ComputeAlignBound<null_typelist, size> { typedef null_typelist Result; };
Then, we treat the general case:
template <class Head, class Tail, size_t size> struct ComputeAlignBound<typelist<Head, Tail>, size> { typedef typename ComputeAlignBound<Tail, size>::result TailResult; typedef typename select< // details below sizeof(Head) <= size, typelist<Head, TailResult>, TailResult>::result Result; };
First, ComputeAlignBound
yields the result of its recursive application to the tail of the typelist in TailResult
. Now, if the head's size is less than or equal to size
, the result is a typelist formed by Head
and TailResult
. Otherwise, the result is only TailResult
Head
is no longer included.
This type selection is performed by a little template entitled select
. To save column space, please allow me to refer you to [5]. It's a very useful little template, and if you are interested in generic programming using C++, you owe it to yourself to find out about it.
To put it all together, we need to tie up some loose ends. Let's recap what we have and what we need. We have the MaxSize
template, which calculates the maximum size of all types in a typelist. We have ConfigurableUnion
, which builds a union
out of a typelist. Finally, we have ComputeAlignBound
, which calculates a type that has the same or more stringent alignment requirements as any type in a typelist. We need a POD type that provides a properly aligned storage for any type in a typelist. Here's how to exploit what we have to obtain what we need:
template <typename TList> class AlignedPOD { enum { maxSize = MaxSize<TList>::result }; typedef ComputeAlignBound<TypesOfAllAlignments, maxSize>::Result AlignTypes; public: typedef ConfigurableUnion<AlignTypes> Result; };
You can play encapsulation games by putting ComputeAlignBound
, Unknown
, Structify
, and ConfigurableUnion
in some private namespace so that they are properly hidden.
Let's get back to discriminated unions. At this point, half of this article's real estate has been spent on computing alignment. I hope that it was a well-invested effort; alignment is important in a host of low-level applications, as well as in the efficient implementation of higher-level abstractions.
Now that we have storage for the objects held by the union, we need the discriminator the tag that Variant
stores and that tells what type the object inside really is. The discriminator must make it possible to perform certain type-specific operations on the contents of the Variant
, such as type identification and data manipulation in a type-safe manner.
We have many design options. The simplest is to store an integral discriminator:
template <class TList, class Align = AlignedPOD<TList>::Result> class Variant { union { char buffer_[size]; Align dummy_; }; int discriminator_; public: ... };
The disadvantage of this solution is that integers are not very "smart." To do different things depending on the discriminator, the switch
solution rears its head, and switch
means coupling. Perhaps we could use the int
as an index into some table, but then why not directly use pointers into that table (a solution that we'll investigate later).
The second solution is to use a proxy polymorphic object.
template <class TList, class Align = AlignedPOD<TList>::Result> class Variant { union { char buffer_[size]; Align dummy_; }; struct ImplBase { virtual type_info& TypeId() = 0; virtual Variant Clone(const Variant&) = 0; virtual void Destroy(Variant&) = 0; ... }; ImplBase* pImpl_; public: ... };
The idea here is to perform various operations on the same data by using a pointer to a polymorphic object. Then, depending on what actually lies in buffer_
, the pointer points to a different concrete ImplBase
, and the polymorphic object performs specialized operations on it.
This approach is indeed very clean, except for one thing: efficiency. When invoking a function such as Destroy
, the access is:
Variant
object.pImpl_
.pImpl_
's virtual function table, the so-called "vtable
" [6].vtable
.
Indexing with a constant into the vtable
and accessing a field in an object (which boils down to indexing as well) are not problematic. The problem is dereferencing pointers there are two levels of indirection between issuing the call and getting to the function. However, one level of indirection should be enough to dispatch a call to the correct type handler. What to do?
(Allow me a parenthetical remark. If you wonder whether I forgot the two laws of optimizations: (1) don't do it and (2) don't do it yet well, I still remember them. Why, then, is Variant
so keen on optimization? The reason is simple. This is not application code; this is a library artifact, and more so, it is one that emulates and rivals a language feature. The more streamlined Variant
is, the more you can rely on it for heavyweight duties; the more sloppily it is implemented, the more it becomes a toy that you wouldn't care to use in a real application.)
A good approach would be to simulate what the compiler does: fake a vtable
and ensure only one level of indirection. This leads us to the following code:
template <class TList, class Align = AlignedPOD<TList>::Result> class Variant { union { char buffer_[size]; Align dummy_; }; struct VTable { const std::type_info& (*typeId_)(); void (*destroy_)(const Variant&); void (*clone_)(const Variant&, Variant&); }; VTable* vptr_; public: ... };
The VTable
structure contains pointers to various functions, functions that access the Variant
object. (Be wary of the syntax for defining destroy_
and clone_
; they are pointers to functions stored as regular data members inside VTable
. The C declaration syntax strikes again.)
The fake vtable
idiom offers, at the cost of some implementation effort, only one level of indirection and great flexibility in manipulating the Variant
object. Let's now see how we can initialize and use the fake vtable
.
Upon construction, Variant
needs to initialize its vptr_
member. In addition, it should properly initialize each pointer inside the VTable
. To this end, let's define a template class VTableImpl<T>
. That template defines static functions that match the types of the pointers to functions in VTable
.
... inside Variant ... template <class T> struct VTableImpl { static const std::type_info& TypeId() { return typeid(T); } static void Destroy(const Variant& var) { const T& data = *reinterpret_cast<const T*>(&var.buffer_[0]); data.~T(); } static void Clone(const Variant& src, Variant& dest) { new(&dest.buffer_[0]) T( *reinterpret_cast<const T*>(&src.buffer_[0])); dest.vptr_ = src.vptr_; } };
Notice a couple of interesting aspects in VTableImpl
's implementation:
Variant
object to manipulate by reference as their first argument.T
, VTableImpl
obtains it by casting &buffer_[0]
to T*
. In other words, all functions in VTableImpl
assume that there's a T
seated inside buffer_
.
It is easy to put the function pointers and the type stored in buffer_
in sync this is the job of the constructor:
... inside Variant ... template <class T> Variant(const T& val) { new(&buffer_[0]) T(val); static VTable vtbl = { &VTableImpl<T>::TypeId, &VTableImpl<T>::Destroy, &VTableImpl<T>::Clone, }; vptr_ = &vtbl; }
Yes, it's that easy. We create a copy of the incoming val
and comfortably seat it (by using the placement new
operator) inside buffer_
. (We worked hard to ensure that buffer_
is properly aligned to hold a T
.) Then, we create a static VTable
right on the spot and initialize it with pointers to the three corresponding static functions in VTableImpl
. All that is left to do is to initialize vptr_
with the address of the newly created vtable
, and that's all there is to it.
Stop that's not really all. Consider this:
typedef Variant<cons<int, double>::type> Number; string s("Hello, World!"); Number weird(s);
The code above compiles fine because it successfully instantiates Variant
's constructor with string
. However, the intent of Number
is obviously to accept only int
or double
upon construction. So we need to ensure that Variant
's templated constructor cannot be instantiated with any other type than those listed in Variant
's typelist.
Here two helper tools kick in: typelist manipulation, which makes it possible to compute at compile time the index of a type in a typelist; any typelist package contains such a facility. The other tool is compile time assertions a feature that allows you to render code non-compilable if a logical condition is not met. I won't bloat the article by repeating the whole story. Both Boost [7] and Loki [8] have typelist manipulation and compile-time assertions that vary only a little in syntax.
If you want to enforce the restriction starting from the first principles, here's a little tool that will work:
template <class TList, typename T> struct EnsureOccurence { typedef EnsureOccurence<typename TList::tail, T>::Result Result; }; template <class T, class U > struct EnsureOccurence<typelist<T, U>, T> { typedef T Result; // can be T or whatever };
If you instantiate:
typedef EnsureOccurence<cons<int, double>::type, double>::Result Test;
then the first specialization gets instantiated and instantiates recursively the same template with the tail of the list. By this point, the second specialization matches and "extinguishes" the recursion.
If, on the other hand, you try to write:
typedef EnsureOccurence<cons<int, double>::type, long int>::Result Test;
then the second specialization of EnsureOccurence
never matches; the recursive instantiation exhausts the typelist, by which point the compiler tries to instantiate null_typelist::tail
, a type that does not exist.
So here's the revamped templated constructor of Variant
:
... inside Variant ... template <class T> Variant(const T& val) { typedef EnsureOccurence<TList, T>::Result Test; new(&buffer_[0]) T(val); static VTable vtbl = { &VTableImpl<T>::TypeId, &VTableImpl<T>::Destroy, &VTableImpl<T>::Clone, }; vptr_ = &vtbl; }
How about the default constructor? There is a temptation to ban it on grounds of minimalism, but a class without a proper default constructor, is, so to say, too "minimalistic." For one thing, you can't store Variant
s in STL containers.
A decision that makes sense is to initialize the Variant
with a default-constructed object whose type is the first type in the typelist. You can't leave the Variant
uninitialized, and defining the constructor that way lets the user place the "cheapest" type in the front of the list.
... inside Variant ... Variant() { typedef typename TList::Head T; new(&buffer_[0]) T(val); static VTable vtbl = { &VTableImpl<T>::TypeId, &VTableImpl<T>::Destroy, &VTableImpl<T>::Clone, }; vptr_ = &vtbl; }
Eliminating the code duplication between the templated and the default constructor is left as an interesting exercise to the reader.
Let's see how to expose the functionality offered by VTableImpl
to Variant
's user. For example, the function that gets the type identifier of a Variant
object looks like this:
template <class TList, class Align = AlignedPOD<TList>::Result> class Variant { ... public: const std::type_info& TypeId() const { return (vptr_->typeId_)(); } ... };
The code accesses the typeId_
pointer to function from vptr_
, and invokes the function it points to, in a single shot. Now the function that returns a typed pointer to the contents of the Variant
object is only six lines of code away:
... inside Variant ... template <typename T> T* GetPtr() { return TypeId() == typeid(T) ? reinterpret_cast<T*>(&buffer_[0]) : 0; }
(Well, in the actual implementation, you would write the corresponding const
function as well.)
The destructor is even easier:
... inside Variant ... ~Variant() { (vptr_->destroy_)(*this); }
By now we have a nice little core Variant
template, able to create and destroy objects properly, as well as extract the data stored in a type-safe manner. The little class is hungry for extensions, as we'll see in the next installment of "Generic<Programming>".
There are several conclusions to draw from this article. One is that, like software projects, articles on software can run over budget. I initially planned to dedicate one column to discriminated unions. After carefully analyzing the level of detail necessary, I was sure (and advertised) two columns for the task. Now I'm forced to dedicate three columns (or more, no promise this time) to the subject.
However, my belief is that the issues are important enough to be described in detail.
Computing alignment portably is hard, but feasible. It never is 100-percent portable. The task reclaims a language feature. Alignment is important to anyone who wants to implement custom seating for generic types (as Variant
does) or to deal directly with memory in various other ways.
The fake vtable
idiom has you write by hand what a compiler generates automatically a structure containing pointers to functions. It also prescribes the explicit presence of a pointer (vptr_
) to such a vtable
in your objects and has you initialize the pointer properly. Templates make it possible to automate vtable
generation.
In exchange for your effort, the fake vtable
idiom allows you to implement polymorphic behavior for in-situ objects.
Many thanks are due to Emily Winch who provided a thorough and helpful review.
[1] One of Alan Perlis' legendary one-liners.
[2] A. Alexandrescu. "Generic<Programming>: Discriminated Unions (I)," C/C++ Users Journal C++ Experts Forum, <www.cuj.com/experts/2004/alexandr.htm>.
[3] POD stands for Plain Old Data, a type umbrella that covers primitive types and essentially C-like struct
s/union
s, namely (in all excruciating detail): having no user-declared constructors, no private or protected non-static data member, no base classes, no virtual functions, no non-static data members of type pointer to member, non-POD-struct
, non-POD-union
(or an array of such types) or reference, no user-defined copy assignment operator, and no user-defined destructor.
[4] Search for "align" in the Boost downloadable archive or on the developer mailing list. You can get to both from <www.boost.org>.
[6] The use of virtual function tables is not required, but virtually all C++ implementations use a variation of this mechanism.
[8] A. Alexandrescu. Modern C++ Design (Addison-Wesley Longman, 2001), Chapters 2 and 3.
Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar.
Terms of Service | Privacy Statement | Copyright © 2025 UBM Tech, All rights reserved.