Dr. Dobb's | Contract Programming and RTTI

Contract Programming and RTTI

RTTI is not just for implementers of software tools. It's also handy for validating interface compliance.

January 01, 2003
URL:http://drdobbs.com/contract-programming-and-rtti/184401608

Introduction

Few of the features in C++ are as poorly understood as RTTI (Run-Time Type Information). Most C++ texts demonstrate in detail how not to use RTTI, without any corresponding examples of appropriate use, while widespread "conventional wisdom" takes it for granted that RTTI is inefficient and bloated. Is RTTI merely a concession to old-fashioned coding practices and poor design? This article argues that, used judiciously, RTTI can be a powerful tool for managing software complexity through Contract Programming [1].

Background

The use of polymorphism to abstract object behavior is well understood by C++ developers. Many of us were introduced to polymorphism through some variation of the Shape class [2], which defines an abstract base for objects in a hypothetical drawing program:

class Shape
{
public:
  virtual void Draw() = 0;
};

Concrete classes (e.g., Triangle, Circle, Box) are then derived from Shape and implement the Shape::Draw() method. The drawing program stores and manipulates these objects through their Shape base class, and polymorphism guarantees that the correct implementation of Draw() will be called for each individual object at run time.

The example usually ends there, which is fine for an illustration of polymorphism, but hardly constitutes a realistic example of a drawing application -- in a real working program, Shape-derived objects will have to support everything from serialization to deletion, selection, grouping, 2-D transformations, and more. More importantly, as the set of Shape behaviors grows, so does the likelihood that some shapes will not support every behavior, requiring additional methods to test for supported behavior at run time:

class Shape
{
public:
  virtual bool IsDrawable() = 0;
  virtual void Draw() = 0;

  virtual bool IsSerializable() =
    0;
  virtual void Save(
    std::ostream& Stream) = 0;

  virtual bool IsSelectable() = 0;
  virtual void Select() = 0;
  virtual void Deselect() = 0;
  virtual bool IsSelected() = 0;

  virtual bool IsMovable() = 0;
  virtual void Move
    (const double X,
     const double Y) = 0;

  virtual bool IsDeletable() = 0;
  /* ... and so on ... */
};

This design is unsatisfying for three reasons:

1. Complexity -- bringing together many otherwise unrelated behaviors into a single interface.

2. Rigidity -- forcing all derivatives to implement every method, whether they are applicable to that concrete object or not.

3. Lack of run-time safety -- although tests are provided for each behavior, there is no way to force a client to actually use those tests at run-time prior to calling the associated methods.

The result is implementations such as the following, for a hypothetical on-screen formatting object that is visible, but can't be selected, modified, or deleted by the application user:

class Ruler : public Shape
{
public:
  bool IsDrawable()
    { return true; }
  void Draw()
    { /* Specifics for drawing
         a ruler here */ }
  
  bool IsSerializable()
    { return false; }
  void Save
    (std::ostream& Stream) {}
  
  bool IsSelectable()
    { return false; }
  void Select() {}
  void Deselect() {}
  bool IsSelected()
    { return false; }
  
  bool IsMovable()
    { return false; }
  void Move(const double X,
    const double Y) {}
  
  bool IsDeletable()
    { return false; }
  /* ... and so on ... */
};

Note how this class consists almost entirely of unused, empty implementations [3]. Fortunately, there's a better alternative -- the Contract Programming model.
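The third problem, lack of run-time safety, is easy to reproduce. The following is a minimal sketch, not code from the original application: it trims the wide Shape interface down to a single behavior to stay short, adds public x and y members so the effect can be inspected, and introduces a hypothetical client function, NudgeWithoutAsking(), that never consults IsMovable().

```cpp
// Trimmed-down version of the wide Shape interface (illustrative only).
class Shape
{
public:
  virtual ~Shape() {}
  virtual bool IsMovable() = 0;
  virtual void Move(const double X, const double Y) = 0;
};

// A Ruler cannot be moved, but it is still forced to implement Move().
class Ruler : public Shape
{
public:
  Ruler() : x(0), y(0) {}
  bool IsMovable() { return false; }
  void Move(const double, const double) {} // empty: unsupported behavior
  double x, y; // position; public only so the example can inspect it
};

// Hypothetical client code. Nothing forces it to call IsMovable() first,
// and Move() has no way to report that it did nothing, so from the
// client's point of view the call always appears to succeed.
bool NudgeWithoutAsking(Shape& S)
{
  S.Move(5, 10); // compiles and runs, even for a Ruler
  return true;
}
```

Calling NudgeWithoutAsking() on a Ruler compiles without complaint and leaves the ruler exactly where it was; the mistake is visible neither at compile time nor at run time.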

Introducing Contract Programming

The problem introduced above is a common one in software engineering: given a broad range of well-defined behaviors and a collection of objects of unknown capability, determine at run time which objects support which behavior. This situation arises anytime objects are instantiated via the Factory Design Pattern and is especially pertinent in applications that support run-time plug-ins. Since objects written by third parties are by definition unknown to the original application authors, some type of negotiation must occur at run time, after the plug-in objects have been instantiated, to determine their capabilities.

The Contract Programming model addresses this problem with a two-pronged approach. First, the range of behaviors supported by the system is defined by a series of tightly defined interfaces of narrow scope. Interfaces are the "contracts" in Contract Programming: they define sets of services that are implemented by an object and accessed by a client. If a client holds a specific interface to an object, it knows that it can call the methods of that interface with predictable, well-defined results, without any knowledge of the object's concrete type. In C++, an interface can be implemented as a class with no data members, no constructors, and nothing but pure virtual methods [4]. Although the Shape class technically fits this definition of an interface, it certainly isn't "narrow in scope," so the first step in applying the Contract Programming model to the drawing application is to split the Shape class into several more manageable interfaces:

class Drawable
{
public:
  virtual void Draw() = 0;
};

class Serializable
{
public:
  virtual void Save(std::ostream& Stream) = 0;
};

class Selectable
{
public:
  virtual void Select() = 0;
  virtual void Deselect() = 0;
  virtual bool IsSelected() = 0;
};

class Movable
{
public:
  virtual void Move(const double X,
    const double Y) = 0;
};

Now, each concrete class can multiply inherit from only those interfaces that it intends to implement [5]:

class Triangle :
  public Drawable,
  public Serializable,
  public Selectable,
  public Movable
{
public:
  /* Implementations of all
     four interfaces here */
};

class Ruler :
  public Drawable
{
public:
  // Implement Drawable here
};

This cleans up our objects considerably, by eliminating both "unused" methods and tests for run-time capabilities. It also brings us to the second half of the Contract Programming model: some means must be provided to query an object at run time for the interfaces it supports. Querying objects for interfaces is the bread-and-butter of Contract Programming, and it is where RTTI (finally) enters the picture.

Putting RTTI to Work

Assume that a client holds a Drawable interface implemented by an object:

Drawable* const drawable =
  new Triangle();

The client would like to place the object (if possible) in a new location within the drawing; to do so, it must find out if the object can be moved, by querying it for the Movable interface. If the object does, in fact, implement Movable, the client will use the interface to carry out the desired changes. But how does it query for an interface? Several solutions are possible; one of the simplest and most elegant is to use RTTI via dynamic_cast. Clients can use dynamic_cast to cast (query) any interface to any other interface -- the result of the dynamic_cast will either be the requested interface, if it's implemented by the concrete object, or NULL:

Movable* const movable =
  dynamic_cast<Movable*>(drawable);
if(movable)
  movable->Move(5, 10);

Note that there's nothing special about the choice of interfaces in this example: I could have applied dynamic_cast to any of the four interfaces I've defined to query for any of the others. Note too that this use of dynamic_cast is semantically quite different from what textbooks warn against (see sidebar) -- instead of testing an object to see if it is a specific type, you are testing it for a specific capability, as defined by an interface class.
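To see the query succeed and fail, here is a minimal, self-contained sketch using two of the four interfaces. The helper MoveIfPossible(), the public x and y members, and the choice to treat Move() as "place at this position" are assumptions made for illustration. Virtual destructors are added to the interfaces as ordinary base-class hygiene.

```cpp
class Drawable
{
public:
  virtual ~Drawable() {}
  virtual void Draw() = 0;
};

class Movable
{
public:
  virtual ~Movable() {}
  virtual void Move(const double X, const double Y) = 0;
};

// Supports both interfaces.
class Triangle : public Drawable, public Movable
{
public:
  Triangle() : x(0), y(0) {}
  void Draw() {}
  void Move(const double X, const double Y) { x = X; y = Y; }
  double x, y; // public only so the example can inspect the result
};

// Supports Drawable only.
class Ruler : public Drawable
{
public:
  void Draw() {}
};

// Hypothetical helper: queries a Drawable for Movable and uses it if
// present. Returns false when the object does not implement Movable.
bool MoveIfPossible(Drawable* const drawable, const double X, const double Y)
{
  Movable* const movable = dynamic_cast<Movable*>(drawable);
  if(!movable)
    return false;

  movable->Move(X, Y);
  return true;
}
```

Passing a Triangle moves it and returns true. Passing a Ruler returns false without touching the object, because the cast from Drawable* to Movable* yields a null pointer.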

Off into the Great Unknown

One difficulty glossed over in the preceding discussion is how to store a collection of shapes now that the Shape class has been replaced with the interface classes Drawable, Serializable, Selectable, and Movable. Presumably, the original drawing application declared a container of shapes as something along the lines of:

typedef std::vector<Shape*> Shapes;

Clearly, this will have to change now that the Shape class has been replaced, but to what? Choosing to use a container of a specific type of interface such as Drawable would mean that all objects within the container must support the Drawable interface. This might be a fairly safe choice for a simple drawing application, but then again, it might not -- and why build constraints into the design if you don't have to?

A second, related issue is that it would be useful to write functions that take objects of unknown capabilities as parameters. For example, the following code fragment can be used to deselect any Selectable object:

Selectable* const selectable =
  dynamic_cast<Selectable*>(
    any_interface_here);
if(selectable)
  selectable->Deselect();

Wrapping this code in a function makes good programming sense, but what type should that function take as an argument? Again, choosing one of the four interface types would be a problem -- suppose the function is declared as:

void DeselectShape(
  Movable* const Shape);

You would then have to provide a Movable interface in order to call DeselectShape():

DeselectShape(
  dynamic_cast<Movable*>(
    any_interface_here));

which, translated into English, says "if this object is Movable, and it's Selectable, deselect it" -- clearly not what I originally had in mind, since DeselectShape() should deselect any object whether it's Movable or not.

What you need to solve both these problems is an "ambiguous" or "unknown" interface. While void* might seem like a logical choice, it won't do the job: you can't dynamic_cast from void* to another type [6]. The solution is to create a special interface, Unknown, that all other interfaces derive from [7]:

class Unknown
{
protected:
  virtual ~Unknown() {}
};

class Drawable :
  public virtual Unknown
{/* same as before */};

class Serializable :
  public virtual Unknown
{/* same as before */};

class Selectable :
  public virtual Unknown
{/* same as before */};

class Movable :
  public virtual Unknown
{/* same as before */};

Now you can create a collection of objects whose capabilities are completely unknown until you test for them:

typedef std::vector<Unknown*> Shapes;

Furthermore, you can define functions that will take any interface type as an argument:

void DeselectShape(
  Unknown* const Shape)
{
  /* same as before */
}
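Putting the pieces together, a container of Unknown pointers and the DeselectShape() function might look like the following sketch. Only two interfaces are shown to keep it short. DeselectAll() and the selected member are hypothetical additions for illustration.

```cpp
#include <cstddef>
#include <vector>

class Unknown
{
protected:
  virtual ~Unknown() {}
};

class Drawable : public virtual Unknown
{
public:
  virtual void Draw() = 0;
};

class Selectable : public virtual Unknown
{
public:
  virtual void Select() = 0;
  virtual void Deselect() = 0;
  virtual bool IsSelected() = 0;
};

class Triangle : public Drawable, public Selectable
{
public:
  Triangle() : selected(false) {}
  void Draw() {}
  void Select() { selected = true; }
  void Deselect() { selected = false; }
  bool IsSelected() { return selected; }
private:
  bool selected;
};

class Ruler : public Drawable
{
public:
  void Draw() {}
};

typedef std::vector<Unknown*> Shapes;

// Accepts any interface; deselects the object only if it is Selectable.
void DeselectShape(Unknown* const Shape)
{
  Selectable* const selectable = dynamic_cast<Selectable*>(Shape);
  if(selectable)
    selectable->Deselect();
}

// Hypothetical helper: deselects everything in the container and returns
// how many of the objects turned out to be Selectable.
std::size_t DeselectAll(const Shapes& shapes)
{
  std::size_t count = 0;
  for(Shapes::const_iterator s = shapes.begin(); s != shapes.end(); ++s)
  {
    if(dynamic_cast<Selectable*>(*s))
      ++count;
    DeselectShape(*s);
  }
  return count;
}
```

Because every interface inherits virtually from Unknown, a Triangle* or Ruler* converts to Unknown* without ambiguity, and the same container can hold both.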

Using Proper Protection

When designing interfaces, don't forget to pay attention to proper access control. I recommend the following:

1. Provide a protected default constructor for all interfaces (including Unknown). You'll need one for "empty" interfaces like Unknown that don't contain pure virtual methods, to prevent clients from instantiating them as if they were objects:

// Interface to nowhere!
Unknown* a = new Unknown();

The non-empty interfaces can't be instantiated (because they contain pure virtuals), but they'll still require a default constructor in order to compile after you apply recommendation 2, below.

Since you are providing a default interface constructor that will be called from derived classes, you must make it protected instead of private and provide an implementation. I prefer inline implementations, to avoid linking and interdependency issues:

class Unknown
{
protected:
  Unknown() {}
  virtual ~Unknown() {}
};

2. Provide a protected copy constructor and assignment operator for all interfaces (including Unknown). This will prevent compilation of code whose behavior would otherwise be undefined:

Unknown* const a = new Triangle();
Unknown* const b = new Circle();

*a = *b; // Assignment of a Circle
         // to a Triangle!

Unknown* CopyShape(Unknown* const Shape)
{
  // Tries to create a copy, but of what?
  return new Unknown(*Shape);
}

The bottom line is that with Contract Programming the concrete type that an interface points to is unknown, so direct copies can't be made [8].

Again, copy constructors and assignment operators should be protected and defined inline so that the implementer of an interface can use copy construction and assignment in his implementations.
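As a sketch of recommendations 1 and 2 taken together, the interfaces below declare protected, inline constructors, copy constructors, and assignment operators. The concrete class Point is a hypothetical implementer added for illustration; it relies on the compiler-generated copy operations, which are allowed to call the protected ones in its bases.

```cpp
class Unknown
{
protected:
  Unknown() {}
  Unknown(const Unknown&) {}
  Unknown& operator=(const Unknown&) { return *this; }
  virtual ~Unknown() {}
};

class Movable : public virtual Unknown
{
public:
  virtual void Move(const double X, const double Y) = 0;
protected:
  Movable() {}
  Movable(const Movable& rhs) : Unknown(rhs) {}
  Movable& operator=(const Movable&) { return *this; }
};

// Hypothetical implementer. Its implicit copy constructor and assignment
// operator compile because a derived class may use the protected copy
// operations of its interfaces.
class Point : public Movable
{
public:
  Point(const double X, const double Y) : x(X), y(Y) {}
  void Move(const double X, const double Y) { x = X; y = Y; }
  double x, y;
};

// Client code that only holds interfaces is rejected by the compiler:
//
//   Unknown* const a = /* some object */;
//   Unknown* const b = /* some other object */;
//   *a = *b;          // error: Unknown::operator= is protected
//   new Unknown(*a);  // error: Unknown's copy constructor is protected
```

Code that knows the concrete type can still copy it, for example `Point b(a);`, while code that holds only an interface cannot.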

3. Ensure all interfaces have a virtual destructor. The need for virtual destructors in base classes is well documented -- they guarantee that deleting an object through a base class (interface) pointer calls the correct destructor for the derived (implementation) class. In this case, the Unknown interface provides a virtual destructor so derived interfaces don't have to.

4. (Optional) Make the destructor protected in all interfaces and create a special interface for deleting objects. While item 3 is sufficient to meet language requirements and write working software, I feel that the Contract Programming model calls for additional prudence: querying for interfaces means holding multiple pointers to a single object, so anything that prevents casual or accidental deletion of the object through one pointer will prevent mishaps with the other(s). For this reason I generally define protected destructors for all interfaces (including Unknown) and provide a special interface with a public destructor for destroying objects:

class Deletable :
  public virtual Unknown
{
public:
  virtual ~Deletable() {}
  /* ... and-so-on ... */
};

With these changes in place, an object can only be deleted after querying for the Deletable interface:

void DestroyShape(Unknown* const Shape)
{
  Deletable* const deletable =
    dynamic_cast<Deletable*>(Shape);

  if(deletable)
    delete deletable;
}

In addition to protecting against accidents, using the Deletable interface gives us the flexibility to create "indestructible" objects that don't implement it, such as Singletons or other objects with explicitly managed lifetimes. An attempt to destroy such an object through the DestroyShape function will be quietly ignored.
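The sketch below exercises both cases: a Triangle that implements Deletable and a Ruler that does not. It makes a few assumptions for illustration. Deletable inherits virtually from Unknown like the other interfaces, so that a class implementing several interfaces has a single Unknown base. DestroyShape() returns a bool so the outcome can be observed. The global counter g_destroyed exists only to show that the destructor ran.

```cpp
class Unknown
{
protected:
  Unknown() {}
  virtual ~Unknown() {}
};

class Drawable : public virtual Unknown
{
public:
  virtual void Draw() = 0;
protected:
  ~Drawable() {} // cannot be deleted through a Drawable*
};

class Deletable : public virtual Unknown
{
public:
  virtual ~Deletable() {} // the only public destructor among the interfaces
};

int g_destroyed = 0; // illustration only: counts Triangle destructions

class Triangle : public Drawable, public Deletable
{
public:
  ~Triangle() { ++g_destroyed; }
  void Draw() {}
};

// Does not implement Deletable, so DestroyShape() will leave it alone.
class Ruler : public Drawable
{
public:
  void Draw() {}
};

// Same logic as DestroyShape() above, but reports whether the object
// was actually destroyed.
bool DestroyShape(Unknown* const Shape)
{
  Deletable* const deletable = dynamic_cast<Deletable*>(Shape);
  if(!deletable)
    return false;

  delete deletable;
  return true;
}
```

With these definitions, `delete` applied to a Drawable* or an Unknown* fails to compile, while DestroyShape() destroys the Triangle and returns false for the Ruler.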

Alternatives

There are several alternatives to the above approach to Contract Programming: the Mozilla project's XPCOM [8] and the proprietary Microsoft COM are just two. Each provides the interface/query model that you've seen, plus language and process independence: a COM component written in C++ can interact with a COM component written in JavaScript, running in a different process or on some other host. This language independence imposes significant constraints on how applications are written within either framework and is beyond the scope of this article.

In brief, the two COMs break functionality down into interfaces, which are (again) classes that contain only pure, virtual methods with no data members [9]. All interfaces derive from a special interface, IUnknown. Unlike the Unknown interface described above, IUnknown contains three methods: QueryInterface(), AddRef(), and Release(). The need for these three methods is dictated by the language independence of their frameworks: dynamic_cast could never work on an object written in another language (or residing in another process), so the QueryInterface() method has to be provided in its place. Because QueryInterface() is a member of IUnknown, and all other interfaces derive from IUnknown, you can call QueryInterface() with any interface to obtain any other supported interface on an object. Similarly, destructors don't exist for objects written in another language, so AddRef() and Release() allow clients to control the lifetimes of objects through reference counting.

Of course, if language or process independence for objects is important to your design, you'll want to go with one of the two COMs. Even if you don't require their features you may want to look into one of the numerous simpler frameworks that have been written, or consider writing one of your own, if you prefer the QueryInterface() route to dynamic_cast. The efficiency issues surrounding such a choice are addressed in the next section; I will only point out here that whatever framework you use/develop will have to provide:

1. A means of uniquely identifying interface types.

2. A method to query an object for the interfaces that it supports.

3. An implementation of that method in every class that supports interfaces.

Items 1 and 3 taken together are the real problems with this approach -- the maintenance of some type of registry for interface identifiers, coupled with the need to write code on a class-by-class basis to identify which interfaces a class supports, is error prone and difficult to maintain at best. Using RTTI capitalizes nicely on the fact that the compiler knows at compile time exactly which interfaces a class implements. Nevertheless, a homemade framework of this type is included in the sample code for your inspection (available for download at <www.cuj.com/code>); you should compare it to the RTTI version and note the added complexity.
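For a feel of what the three requirements involve, here is a deliberately small homemade framework. It is a sketch written for this discussion, not the sample code that accompanies the article, and it does not reproduce the COM or XPCOM API (their QueryInterface() has a different signature). The string identifiers, the ID() functions, and the Query() helper are all assumptions.

```cpp
#include <cstring>

class Unknown
{
public:
  // 2. The query method: returns the requested interface, or 0.
  virtual void* QueryInterface(const char* const ID) = 0;
protected:
  virtual ~Unknown() {}
};

class Drawable : public virtual Unknown
{
public:
  // 1. A hand-maintained identifier; nothing enforces uniqueness.
  static const char* ID() { return "Drawable"; }
  virtual void Draw() = 0;
};

class Movable : public virtual Unknown
{
public:
  static const char* ID() { return "Movable"; }
  virtual void Move(const double X, const double Y) = 0;
};

class Triangle : public Drawable, public Movable
{
public:
  Triangle() : x(0), y(0) {}
  void Draw() {}
  void Move(const double X, const double Y) { x = X; y = Y; }

  // 3. Written by hand for every class; a forgotten entry silently
  //    hides an interface that the class really does implement.
  void* QueryInterface(const char* const ID)
  {
    if(0 == std::strcmp(ID, Drawable::ID()))
      return static_cast<Drawable*>(this);
    if(0 == std::strcmp(ID, Movable::ID()))
      return static_cast<Movable*>(this);
    return 0;
  }

  double x, y;
};

class Ruler : public Drawable
{
public:
  void Draw() {}
  void* QueryInterface(const char* const ID)
  {
    if(0 == std::strcmp(ID, Drawable::ID()))
      return static_cast<Drawable*>(this);
    return 0;
  }
};

// Typed wrapper so that clients never handle the void* themselves.
template<typename Interface>
Interface* Query(Unknown* const Object)
{
  if(!Object)
    return 0;
  return static_cast<Interface*>(Object->QueryInterface(Interface::ID()));
}
```

A client writes `Query<Movable>(shape)` where the RTTI version writes `dynamic_cast<Movable*>(shape)`. Everything else in the listing is bookkeeping that the compiler performs automatically when RTTI is used.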

How Much Does RTTI Cost?

Now that you have a flexible, portable Contract Programming framework, one last item remains: the cost of it all. As you've seen above, there are many alternatives to the RTTI-based framework presented in this article. The key question is: are they more efficient, in time or space?

One of the most pervasively held beliefs that I encountered while learning about RTTI took the form of anecdotal newsgroup lore along the lines of "I turned on RTTI in the compiler, and my executable size immediately increased by five percent." While it's certainly true that there are costs associated with using RTTI, it's important to know exactly what they are so that you can make an informed decision about whether to use it or not. You may find along the way that RTTI is not as expensive as you think. For example, if you've ever wondered how a try-catch block knows how to catch a specific type of exception while allowing others to continue unwinding the call stack, the answer is RTTI. In fact, RTTI was added to the language in part by making public the mechanisms that already existed to support exception handling. If your code (or the code in a library you're using) uses exceptions, you are already using RTTI, in which case the cost of putting it to additional use is probably zero!

Getting down to specifics, compilers typically support RTTI first by creating a type_info structure for every class type in your application. In turn, each class' type_info privately contains a searchable list of references to the type_info structures for its base classes. That's the first cost: sizeof(type_info) + sizeof(type_info*)*(number of base classes) per class. Second, a pointer to a class' type_info is inserted into the virtual function table for that class. This pointer is what typeid and dynamic_cast use to look up the dynamic type of an object at run time. That's the second cost: one additional pointer in the virtual function table of each class. And that's all.

Whether this overhead is objectionable to you will probably hinge on the number of classes in your application: the footprint of an application with many small, simple class types will grow proportionally larger with RTTI enabled than that of an application with fewer, larger class types. Another thing to consider will be how often RTTI will be used as part of your design: a 3-D graphics application I maintain that makes heavy use of the Contract Programming model actually decreased in size when I switched from a homemade QueryInterface()-based framework to an RTTI framework. A program with a more conventional design that only needs to handle a few objects via the Contract Programming model might see different results.

At run time the results are less ambiguous -- there are no run-time overheads introduced by RTTI, aside from the execution times of typeid and dynamic_cast. Finding the concrete type of an object at run time via typeid is strictly a matter of obtaining a reference to the object's type_info structure through its virtual function table and is thus roughly equivalent in cost to any other virtual function lookup. Testing for interfaces with dynamic_cast is a little more complicated: after looking up the type_info structures of the source and target classes, dynamic_cast must then descend recursively through the source class' list of base-class type_info references, looking for a match with the target class. In a shallow class hierarchy, this is roughly equivalent to what an implementation of QueryInterface() has to do, and I have found in practice that the RTTI-based framework can query for interfaces just as quickly as other frameworks.
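The difference between the two operators can be seen in a short sketch. The class RightTriangle and the helpers IsExactlyTriangle() and SupportsMovable() are hypothetical names introduced for illustration. typeid compares against the object's exact dynamic type, whereas dynamic_cast also succeeds for any class that has the target among its bases.

```cpp
#include <typeinfo>

class Drawable
{
public:
  virtual ~Drawable() {}
  virtual void Draw() = 0;
};

class Movable
{
public:
  virtual ~Movable() {}
  virtual void Move(const double X, const double Y) = 0;
};

class Triangle : public Drawable, public Movable
{
public:
  void Draw() {}
  void Move(const double, const double) {}
};

// One level deeper in the hierarchy than Triangle.
class RightTriangle : public Triangle
{
};

class Ruler : public Drawable
{
public:
  void Draw() {}
};

// typeid: a single lookup of the object's dynamic type, then a comparison.
// True only when the object is a Triangle and nothing more derived.
bool IsExactlyTriangle(Drawable& d)
{
  return typeid(d) == typeid(Triangle);
}

// dynamic_cast: searches the dynamic type's base classes for Movable,
// so it succeeds for Triangle and for anything derived from it.
bool SupportsMovable(Drawable& d)
{
  return dynamic_cast<Movable*>(&d) != 0;
}
```

For the capability queries used in Contract Programming, only the dynamic_cast form is needed; the typeid comparison is shown for contrast.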

Summary

Standard C++ RTTI can be used to quickly and portably apply the Contract Programming model to handling complex interactions with objects of unknown capability at run time. Although alternative frameworks exist, they typically impose significant restrictions on how objects are handled, add considerable complexity when building software, and are much less portable. Homemade alternatives to RTTI may not be any more efficient, add significant complexity and maintenance requirements, and are considerably more error prone. The source code accompanying this article (available at <www.cuj.com/code>) presents the sample drawing application in three versions: one without the Contract Programming model, one with homemade RTTI, and one with Standard C++ RTTI.

Notes and References

[1] The working title for this article was "RTTI -- Our Misunderstood Friend."

[2] Bjarne Stroustrup. The C++ Programming Language, 2nd Edition (Addison Wesley, 1992).

[3] The implementation of Ruler::IsSelected() is particularly troubling -- note that Ruler::IsSelectable() returns false, indicating that this is an unsupported feature, while IsSelected() is returning a valid result as if nothing is wrong.

[4] Scott Meyers calls this a "protocol class" in Effective C++, 2nd Edition (Addison Wesley, 1998), but I prefer the term "interface" as it is more widely used and applicable to languages and frameworks other than Standard C++, including XPCOM, Microsoft COM, and Java.

[5] If you thought the Java language didn't support multiple inheritance -- surprise! This form of multiple inheritance (from interfaces) is a staple of Java.

[6] That's because void* is more than just "pointer to unknown" -- it's effectively "pointer to unknowable."

[7] Yes, I know that this is the dreaded Diamond Inheritance Pattern. I would argue, though, that since the Unknown interface has no data members, no constructors, and no methods, this design doesn't suffer from the problems traditionally associated with multiple inheritance diamonds.

[8] The Mozilla Project, <www.mozilla.org>.

[9] Actually, they're virtual function tables that are laid out as if they were C++ classes -- which is why a C++ program can call methods on an XPCOM/Microsoft COM interface implemented by a program written in C or JavaScript.

About the Author

Timothy M. Shead is a former U.S. Marine, now working in Sunnyvale, CA as a senior software engineer with nine years' experience. You can see the principles outlined in this article at work in a large-scale environment at Tim's website, <http://k3d.sourceforge.net>.

How Not to Use RTTI

It's no lie that RTTI is an easily-abused feature of C++ -- most texts point out (correctly) that using RTTI to determine an object's specific type is almost always the result of flawed design. As an example, see what happens when one developer (mis)uses RTTI to discover whether an object can be safely deleted:

void DeleteShape(Shape* S)
{
  if(dynamic_cast<Triangle*>(S))
    delete S;
  else if(dynamic_cast<Circle*>(S))
    delete S;
  else if(dynamic_cast<PageBorder*>(S))
    delete S;
  else if(dynamic_cast<Ruler*>(S))
    return;
  else if(dynamic_cast<LayoutGrid*>(S))
    return;
}

Note that there are three classes (apparently) that can be deleted, and (maybe) two that can't. Regrettably, this code will compile and work in the short term; I say regrettably because problems will inevitably begin to appear as the program evolves. What happens when a new class is introduced, say, Ellipse? The code will continue to compile, and the application will continue to run, yet Ellipse objects won't be deletable unless the developer remembers to go back and modify the implementation of DeleteShape(). Similarly, what happens when the capabilities of an object change? Suppose a later release of the application does allow the user to delete Ruler objects -- again, the developer must remember to alter DeleteShape() to allow it. Conversely, if an object's behavior changes from deletable to non-deletable without a corresponding update to DeleteShape(), a user may delete an object that they shouldn't have, probably leading to a segfault at run time. In all of these scenarios, behavioral changes in unknown numbers of objects have to be matched with changes to an otherwise unrelated function -- a serious violation of encapsulation.

That ought to be enough to warn you away from this sort of code, but if it isn't, consider that this design cannot accommodate plug-in objects (popular in the graphics world), because the author of DeleteShape() can't possibly know in advance the type names of every Shape-derived class that might be written by a third party! Finally, consider that DeleteShape() is just one function -- if you have to resort to such trickery here, you'll undoubtedly have to do it in myriad other places, adding up to a maintenance nightmare.

The moral of the story is that you should never need to use RTTI to find out the actual concrete type of an object at run time -- anytime you're tempted to do so, it's a surefire pointer to problems in the design. You should be testing objects for their capabilities instead of their type, as described in the accompanying article.