C/C++

Discriminated Unions (II)

By Andrei Alexandrescu, June 01, 2002

We continue working on our Variant design and implementation. This time we focus on storage - how data is stored and accessed. This article presents techniques for computing alignment of types with reasonable accuracy and portability. Then, once the storage proper is in place, we need polymorphic means to manipulate that storage - for which Variant uses the fake vtable

You know the one about the syntactic sugar? It causes cancer of the semicolon [1]? Well, enough jokes for today. We have a lot of ground to cover today, so let's hit the road early.

This installment continues the implementation of discriminated unions in C++. Today we'll finish the discussion on alignment and proceed full-speed to writing some code for the actual implementation of Variant.

Before that, let's refresh our memory on the progress we made in the previous installment of "Generic<Programming>" [2].

Highlights from the Previous Episode

After putting together a wish list for implementing discriminated unions, we discussed what would be the best storage model. After stumbling upon alignment issues, we figured out that a storage model that is at the same time properly aligned, efficient, and exception-friendly is:

union 
{
  unsigned char buffer_[neededSize];
  Align dummy_;
};

where neededSize is the maximum size of the types in the union, and Align is a POD [3] type that ensures proper alignment.

Depending on how well chosen Align is, the discriminated union's storage can vary from optimal to unnecessarily large to defective.

Even with the best Align, the implementation above is still not 100-percent portable for all types. In theory, someone could implement a compiler that respects the Standard but still does not work properly with discriminated unions. This is because the Standard does not guarantee that all user-defined types ultimately have the alignment of some POD type. Such a compiler, however, would be more of a figment of a wicked language lawyer's imagination, rather than a realistic language implementation.

Implementing an Alignment Calculator

That said, several effective alignment computation algorithms can be devised. You can see a couple implemented in the Boost libraries [4]. One that works well for computing the alignment of a type T is:

Start with a collection of all primitive types.
Eliminate from that collection the types larger than T.
The resulting Align is a C union containing each of the types in the collection computed in step 2.

The basic idea underlying this algorithm is that any user-defined type T will ultimately contain only primitive types. Consequently, T's alignment requirements will be the same as one of those types. Larger types have larger alignment requirements; hence it's reasonable to infer that an upper bound of T's alignment is the alignment of its largest member.

This algorithm might "overalign" the result. For example, if T is char[5] and sizeof(int) is four, T might have an alignment of one, but Align will have the alignment of an int (most likely, four).

Erring on the large side of alignment is not that bad in most cases. (Erring on the small side is disastrous.) On the other hand, erring on the large side of size wastes space. Our algorithm's step 2 ensures that only types of size less than or equal to T's size remain selected.

To implement the algorithm for computing Align, recall from the previous installment of "Generic<Programming>" that we have two appropriate tools at our disposal: typelists and ConfigurableUnion. Typelists allow us to manipulate collections of types, allowing us to perform steps 1 and 2. ConfigurableUnion generates a C-style union from a typelist, thus taking care of step 3.

It is worth noting that, in addition to all primitive types, the initial collection of types should include some simple structs as well. Each of those structs contains only one member of a primitive type. We generate these stub structures from a simple template:

template
  <typename U> struct Structify
  { U dummy_; };

The reason for introducing these little structures is simple. Some compilers put tougher alignment on a structure containing only one int than on the int alone. This way, the compiler writer ensures that all user-defined types have the same alignment requirements. In turn, this architectural decision makes things easy for various parts of the compiler.

We thus start with a typelist containing all primitive types and all "structified" types:

class Unknown;

typedef cons<
        char,      
        short int,
        int,
        long int,
        float,
        double,
        long double,
        char*,
        short int*,
        int*,
        long int*,
        float*,
        double*,
        long double*,
        void*,
        Unknown (*)(Unknown),
        Unknown* Unknown::*,
        Unknown (Unknown::*)(Unknown),
        Structify<char>,
        Structify<short int>,
        Structify<int>,
        Structify<long int>,
        Structify<float>,
        Structify<double>,
        Structify<long double>,
        Structify<char*>,
        Structify<short int*>,
        Structify<int*>,
        Structify<long int*>,
        Structify<float*>,
        Structify<double*>,
        Structify<long double*>,
        Structify<void*>,
        Structify<Unknown (*)(Unknown)>,
        Structify<Unknown* Unknown::*>,
        Structify<Unknown (Unknown::*)(Unknown)>
        >::type
      TypesOfAllAlignments;

Okay, so the primitive types are there; the structified types are there, too — very predictable. But some lines of the typelist above look as if the copyeditor's cat did a little Teutonic dance on the keyboard. The incriminating lines are:

        Unknown (*)(Unknown)
        Unknown* Unknown::*
        Unknown (Unknown::*)(Unknown)

Also, their structified versions appear at the bottom of the list.

Who are these guys, and where did they come from? To make them look more familiar, let's give them names:

        Unknown (*T1)(Unknown);
        Unknown* Unknown::* T2;
        Unknown (Unknown::* T3)(Unknown);

Aha! If you precede each of the lines above with typedef, they reveal themselves in all the tired glory of the C declaration syntax. T1 is a pointer to a function taking Unknown and returning Unknown; T2 is a pointer to member of an Unknown, member of type Unknown*; and T3 is a pointer to member function of Unknown, taking an Unknown as an argument and returning an Unknown as a result.

The trick here is that the cunningly named Unknown is indeed not defined. Therefore, the compiler will assume that Unknown is defined elsewhere and that it has the worst alignment requirements possible. (Otherwise, the compiler can optimize Unknown's layout. Optimization here plays against generality.)

Okay, here comes the fun part. Let's now eliminate from TypesOfAllAlignments all types that are larger than a given size, by writing the following template:

template <class TList, size_t size> struct ComputeAlignBound;

First, we treat the limited case, as a nice recursive algorithm ought to:

template <size_t size>
struct ComputeAlignBound<null_typelist, size>
{
  typedef null_typelist Result;
};

Then, we treat the general case:

template <class Head, class Tail, size_t size>
struct ComputeAlignBound<typelist<Head, Tail>, size>
{
  typedef typename ComputeAlignBound<Tail, size>::result TailResult;
  typedef typename select<  // details below
    sizeof(Head) <= size,
    typelist<Head, TailResult>,
    TailResult>::result
    Result;
};

First, ComputeAlignBound yields the result of its recursive application to the tail of the typelist in TailResult. Now, if the head's size is less than or equal to size, the result is a typelist formed by Head and TailResult. Otherwise, the result is only TailResult — Head is no longer included.

This type selection is performed by a little template entitled select. To save column space, please allow me to refer you to [5]. It's a very useful little template, and if you are interested in generic programming using C++, you owe it to yourself to find out about it.

To put it all together, we need to tie up some loose ends. Let's recap what we have and what we need. We have the MaxSize template, which calculates the maximum size of all types in a typelist. We have ConfigurableUnion, which builds a union out of a typelist. Finally, we have ComputeAlignBound, which calculates a type that has the same or more stringent alignment requirements as any type in a typelist. We need a POD type that provides a properly aligned storage for any type in a typelist. Here's how to exploit what we have to obtain what we need:

template <typename TList>
class AlignedPOD
{
  enum { maxSize = MaxSize<TList>::result };
  typedef ComputeAlignBound<TypesOfAllAlignments, maxSize>::Result
    AlignTypes;
public:
  typedef ConfigurableUnion<AlignTypes> Result;
  };

You can play encapsulation games by putting ComputeAlignBound, Unknown, Structify, and ConfigurableUnion in some private namespace so that they are properly hidden.

Discriminated Unions Implementation: The Fake vtable Idiom

Let's get back to discriminated unions. At this point, half of this article's real estate has been spent on computing alignment. I hope that it was a well-invested effort; alignment is important in a host of low-level applications, as well as in the efficient implementation of higher-level abstractions.

1 2 3 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

Discriminated Unions (II)

Highlights from the Previous Episode

Implementing an Alignment Calculator

Discriminated Unions Implementation: The Fake vtable Idiom

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

Discriminated Unions (II)

Highlights from the Previous Episode

Implementing an Alignment Calculator

Discriminated Unions Implementation: The Fake vtable Idiom

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content