RTTI is not just for implementers of software tools. It's also handy for validating interface compliance.
January 01, 2003
URL:http://drdobbs.com/contract-programming-and-rtti/184401608
    class Shape
    {
    public:
      virtual void Draw() = 0;
    };

Concrete classes (e.g., Triangle, Circle, Box) are then derived from Shape and implement the Shape::Draw() method. The drawing program stores and manipulates these objects through their Shape base class, and polymorphism guarantees that the correct implementation of Draw() will be called for each individual object at run time.
The example usually ends there, which is fine for an illustration of polymorphism, but hardly constitutes a realistic example of a drawing application -- in a real working program, Shape-derived objects will have to support everything from serialization to deletion, selection, grouping, 2-D transformations, and more. More importantly, as the set of Shape behaviors grows, so does the likelihood that some shapes will not support every behavior, requiring additional methods to test for supported behavior at run time:
    class Shape
    {
    public:
      virtual bool IsDrawable() = 0;
      virtual void Draw() = 0;
      virtual bool IsSerializable() = 0;
      virtual void Save(std::ostream& Stream) = 0;
      virtual bool IsSelectable() = 0;
      virtual void Select() = 0;
      virtual void Deselect() = 0;
      virtual bool IsSelected() = 0;
      virtual bool IsMovable() = 0;
      virtual void Move(const double X, const double Y) = 0;
      virtual bool IsDeletable() = 0;
      /* ... and so on ... */
    };

This design is unsatisfying for three reasons:
1. Complexity -- bringing together many otherwise unrelated behaviors into a single interface.
2. Rigidity -- forcing all derivatives to implement every method, whether they are applicable to that concrete object or not.
3. Lack of run-time safety -- although tests are provided for each behavior, there is no way to force a client to actually use those tests at run time before calling the associated methods.
The result is implementations such as the following, for a hypothetical on-screen formatting object that is visible, but can't be selected, modified, or deleted by the application user:
    class Ruler : public Shape
    {
    public:
      bool IsDrawable() { return true; }
      void Draw() { /* Specifics for drawing a ruler here */ }
      bool IsSerializable() { return false; }
      void Save(std::ostream& Stream) {}
      bool IsSelectable() { return false; }
      void Select() {}
      void Deselect() {}
      bool IsSelected() { return false; }
      bool IsMovable() { return false; }
      void Move(const double X, const double Y) {}
      bool IsDeletable() { return false; }
      /* ... and so on ... */
    };

Note how this class consists almost entirely of unused, empty implementations [3]. Fortunately, there's a better alternative -- the Contract Programming model.
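To make the third objection (lack of run-time safety) concrete, here is a minimal sketch that cuts the monolithic Shape down to its selection methods only. The Box class, the SelectAll() helper, and the public virtual destructor are assumptions added for illustration; they do not appear in the original listings. The client below forgets to call IsSelectable(), yet the code compiles and runs without any diagnostic:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced version of the monolithic interface: selection methods only.
class Shape {
public:
    virtual ~Shape() {}
    virtual bool IsSelectable() = 0;
    virtual void Select() = 0;
    virtual bool IsSelected() = 0;
};

// A shape that genuinely supports selection (hypothetical).
class Box : public Shape {
public:
    Box() : selected_(false) {}
    bool IsSelectable() { return true; }
    void Select() { selected_ = true; }
    bool IsSelected() { return selected_; }
private:
    bool selected_;
};

// A shape that does not: Select() is an empty stub.
class Ruler : public Shape {
public:
    bool IsSelectable() { return false; }
    void Select() {}
    bool IsSelected() { return false; }
};

// Hypothetical client that never calls IsSelectable() first.
// Returns the number of shapes that actually ended up selected.
std::size_t SelectAll(const std::vector<Shape*>& shapes) {
    std::size_t selected = 0;
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        shapes[i]->Select();  // no capability test, and nothing forces one
        if (shapes[i]->IsSelected()) ++selected;
    }
    return selected;
}
```

Passing a Box and a Ruler to SelectAll() selects only the Box; the call on the Ruler silently does nothing, and neither the compiler nor the run time reports the missing test.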
    class Drawable
    {
    public:
      virtual void Draw() = 0;
    };

    class Serializable
    {
    public:
      virtual void Save(std::ostream& Stream) = 0;
    };

    class Selectable
    {
    public:
      virtual void Select() = 0;
      virtual void Deselect() = 0;
      virtual bool IsSelected() = 0;
    };

    class Movable
    {
    public:
      virtual void Move(const double X, const double Y) = 0;
    };

Now, each concrete class can multiply inherit from only those interfaces that it intends to implement [5]:
    class Triangle :
      public Drawable,
      public Serializable,
      public Selectable,
      public Movable
    {
    public:
      /* Implementations of all four interfaces here */
    };

    class Ruler : public Drawable
    {
    public:
      // Implement Drawable here
    };

This cleans up our objects considerably, by eliminating both "unused" methods and tests for run-time capabilities. It also brings us to the second half of the Contract Programming model: some means must be provided to query an object at run time for the interfaces it supports. Querying objects for interfaces is the bread-and-butter of Contract Programming, and it is where RTTI (finally) enters the picture.
    Drawable* const drawable = new Triangle();

The client would like to place the object (if possible) in a new location within the drawing; to do so, it must find out if the object can be moved, by querying it for the Movable interface. If the object does, in fact, implement Movable, the client will use the interface to carry out the desired changes. But how does it query for an interface? Several solutions are possible; one of the simplest and most elegant is to use RTTI via dynamic_cast. Clients can use dynamic_cast to cast (query) any interface to any other interface -- the result of the dynamic_cast will either be the requested interface, if it's implemented by the concrete object, or NULL:
    Movable* const movable = dynamic_cast<Movable*>(drawable);
    if(movable)
      movable->Move(5, 10);

Note that there's nothing special about the choice of interfaces in this example: I could have applied dynamic_cast to any of the four interfaces I've defined to query for any of the others. Note too that this use of dynamic_cast is semantically quite different from what textbooks warn against (see sidebar) -- instead of testing an object to see if it is a specific type, you are testing it for a specific capability, as defined by an interface class.
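The query can be tried end to end with a minimal sketch that uses only two of the four interfaces. The MoveIfPossible() helper, the public x and y members, and the public virtual destructors are assumptions added so the example is self-contained and its effect is observable; they are not part of the original listings.

```cpp
#include <cassert>

class Drawable {
public:
    virtual ~Drawable() {}  // assumption: added so objects clean up safely
    virtual void Draw() = 0;
};

class Movable {
public:
    virtual ~Movable() {}   // assumption, as above
    virtual void Move(const double X, const double Y) = 0;
};

// Implements both interfaces; records its position so the move is visible.
class Triangle : public Drawable, public Movable {
public:
    Triangle() : x(0), y(0) {}
    void Draw() {}
    void Move(const double X, const double Y) { x = X; y = Y; }
    double x, y;
};

// Implements Drawable only.
class Ruler : public Drawable {
public:
    void Draw() {}
};

// Hypothetical helper: moves the object if it supports Movable.
// Returns true when the move was carried out.
bool MoveIfPossible(Drawable* const drawable, const double X, const double Y) {
    Movable* const movable = dynamic_cast<Movable*>(drawable);
    if (!movable) return false;  // capability not supported
    movable->Move(X, Y);
    return true;
}
```

For a Triangle the cast succeeds and the object is moved; for a Ruler the cast yields a null pointer and the helper returns false. The cast also works in the other direction, from a Movable pointer back to Drawable, because dynamic_cast consults the dynamic type of the object rather than the static type of the pointer.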
    typedef std::vector<Shape*> Shapes;

Clearly, this will have to change now that the Shape class has been replaced, but to what? Choosing to use a container of a specific type of interface such as Drawable would mean that all objects within the container must support the Drawable interface. This might be a fairly safe choice for a simple drawing application, but then again, it might not -- and why build constraints into the design if you don't have to?
A second, related issue is that it would be useful to write functions that take objects of unknown capabilities as parameters. For example, the following code fragment can be used to deselect any Selectable object:
    Selectable* const selectable = dynamic_cast<Selectable*>(any_interface_here);
    if(selectable)
      selectable->Deselect();

Wrapping this code in a function makes good programming sense, but what type should that function take as an argument? Again, choosing one of the four interface types would be a problem -- suppose the function is declared as:
    void DeselectShape(Movable* const Shape);

You would then have to provide a Movable interface in order to call DeselectShape():

    DeselectShape(dynamic_cast<Movable*>(any_interface_here));

which, translated into English, says "if this object is Movable, and it's Selectable, deselect it" -- clearly not what I originally had in mind, since DeselectShape() should deselect any object whether it's Movable or not.
What you need to solve both these problems is an "ambiguous" or "unknown" interface. While void* might seem like a logical choice, it won't do the job: you can't dynamic_cast from void* to another type [6]. The solution is to create a special interface, Unknown, that all other interfaces derive from [7]:
    class Unknown
    {
    protected:
      virtual ~Unknown() {}
    };

    class Drawable : public virtual Unknown {/* same as before */};
    class Serializable : public virtual Unknown {/* same as before */};
    class Selectable : public virtual Unknown {/* same as before */};
    class Movable : public virtual Unknown {/* same as before */};

Now you can create a collection of objects whose capabilities are completely unknown until you test for them:
    typedef std::vector<Unknown*> Shapes;

Furthermore, you can define functions that will take any interface type as an argument:
void DeselectShape( Unknown* const Shape) { /* same as before */ }
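Putting the Unknown interface, the Shapes container, and DeselectShape() together gives the following minimal sketch. Only the Drawable and Selectable interfaces are included to keep it short, and the DeselectAll() helper is a hypothetical addition for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Unknown {
protected:
    virtual ~Unknown() {}
};

class Drawable : public virtual Unknown {
public:
    virtual void Draw() = 0;
};

class Selectable : public virtual Unknown {
public:
    virtual void Select() = 0;
    virtual void Deselect() = 0;
    virtual bool IsSelected() = 0;
};

class Triangle : public Drawable, public Selectable {
public:
    Triangle() : selected_(false) {}
    void Draw() {}
    void Select() { selected_ = true; }
    void Deselect() { selected_ = false; }
    bool IsSelected() { return selected_; }
private:
    bool selected_;
};

class Ruler : public Drawable {
public:
    void Draw() {}
};

typedef std::vector<Unknown*> Shapes;

// Accepts any interface pointer, since every interface converts to Unknown*.
void DeselectShape(Unknown* const Shape) {
    Selectable* const selectable = dynamic_cast<Selectable*>(Shape);
    if (selectable) selectable->Deselect();
}

// Hypothetical helper: deselects everything in the container that can be
// deselected and skips the rest.
void DeselectAll(const Shapes& shapes) {
    for (std::size_t i = 0; i < shapes.size(); ++i)
        DeselectShape(shapes[i]);
}
```

The container can hold a Triangle and a Ruler side by side; DeselectAll() deselects the Triangle and leaves the Ruler alone, with no per-type knowledge in the calling code. Because Unknown is a virtual base, each object contains exactly one Unknown subobject, so the conversion to Unknown* is unambiguous.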
1. Provide a protected default constructor for all interfaces (including Unknown). You'll need one for "empty" interfaces like Unknown that don't contain pure virtual methods, to prevent clients from instantiating them as if they were objects:
    // Interface to nowhere!
    Unknown* a = new Unknown();

The non-empty interfaces can't be instantiated (because they contain pure virtuals), but they'll still require a default constructor in order to compile after you apply recommendation 2, below.
Since you are providing a default interface constructor that will be called from derived classes, you must make it protected instead of private and provide an implementation. I prefer inline implementations, to avoid linking and interdependency issues:
    class Unknown
    {
    protected:
      Unknown() {}
      virtual ~Unknown() {}
    };

2. Provide a protected copy constructor and assignment operator for all interfaces (including Unknown). This will prevent compilation of code whose behavior would otherwise be undefined:
    Unknown* const a = new Triangle();
    Unknown* const b = new Circle();
    *a = *b; // Assignment of a Circle to a Triangle!

    Unknown* CopyShape(Unknown* const Shape)
    {
      // Tries to create a copy, but of what?
      return new Unknown(*Shape);
    }

The bottom line is that with Contract Programming the concrete type that an interface points to is unknown, so direct copies can't be made [8].
Again, copy constructors and assignment operators should be protected and defined inline so that the implementer of an interface can use copy construction and assignment in his implementations.
3. Ensure all interfaces have a virtual destructor. The need for virtual destructors in base classes is well documented -- they guarantee that calling operator delete with a base class (interface) pointer calls the correct destructor for the derived (implementation) class. In this case, the Unknown interface provides a virtual destructor so derived interfaces don't have to.
4. (Optional) Make the destructor protected in all interfaces and create a special interface for deleting objects. While item 3 is sufficient to meet language requirements and write working software, I feel that the Contract Programming model calls for additional prudence: querying for interfaces means holding multiple pointers to a single object, so anything that prevents casual or accidental deletion of the object through one pointer will prevent mishaps with the other(s). For this reason I generally define protected destructors for all interfaces (including Unknown) and provide a special interface with a public destructor for destroying objects:
    class Deletable : public virtual Unknown
    {
    public:
      virtual ~Deletable() {}
      /* ... and-so-on ... */
    };

With these changes in place, an object can only be deleted after querying for the Deletable interface:
    void DestroyShape(Unknown* const Shape)
    {
      Deletable* const deletable = dynamic_cast<Deletable*>(Shape);
      if(deletable)
        delete deletable;
    }

In addition to protecting against accidents, using the Deletable interface gives us the flexibility to create "indestructible" objects that don't implement it, such as Singletons or other objects with explicitly managed lifetimes. An attempt to destroy such an object through the DestroyShape function will be quietly ignored.
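The four recommendations can be combined in one minimal sketch. It assumes a C++11 compiler for the static_assert checks, which postdate the original article. The g_destroyed counter is a hypothetical addition that makes the effect of DestroyShape() observable, and Deletable is derived virtually from Unknown, as the other interfaces are, so that a class implementing several interfaces still contains a single Unknown subobject.

```cpp
#include <cassert>
#include <type_traits>

class Unknown {
protected:
    Unknown() {}                                          // recommendation 1
    Unknown(const Unknown&) {}                            // recommendation 2
    Unknown& operator=(const Unknown&) { return *this; }
    virtual ~Unknown() {}                                 // recommendations 3 and 4
};

class Drawable : public virtual Unknown {
public:
    virtual void Draw() = 0;
protected:
    Drawable() {}
    Drawable(const Drawable& other) : Unknown(other) {}
    Drawable& operator=(const Drawable&) { return *this; }
    ~Drawable() {}  // protected: "delete" through a Drawable* will not compile
};

class Deletable : public virtual Unknown {
public:
    virtual ~Deletable() {}  // the only public destructor among the interfaces
protected:
    Deletable() {}
    Deletable(const Deletable& other) : Unknown(other) {}
    Deletable& operator=(const Deletable&) { return *this; }
};

// Hypothetical counter so the effect of DestroyShape() can be observed.
static int g_destroyed = 0;

class Triangle : public Drawable, public Deletable {
public:
    ~Triangle() { ++g_destroyed; }
    void Draw() {}
};

// No Deletable interface: clients cannot destroy it through DestroyShape().
class Ruler : public Drawable {
public:
    void Draw() {}
};

void DestroyShape(Unknown* const Shape) {
    Deletable* const deletable = dynamic_cast<Deletable*>(Shape);
    if (deletable) delete deletable;
}

// Compile-time checks: clients cannot create, copy, or assign a bare Unknown.
static_assert(!std::is_default_constructible<Unknown>::value, "no bare Unknown");
static_assert(!std::is_copy_constructible<Unknown>::value, "no copy through Unknown");
static_assert(!std::is_copy_assignable<Unknown>::value, "no assignment through Unknown");
```

Calling DestroyShape() on a heap-allocated Triangle runs ~Triangle() through the Deletable interface, while the same call on a Ruler is ignored. The protected members remain accessible to derived classes, so a concrete class can still use copy construction and assignment in its own implementation.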
In brief, the two COMs break functionality down into interfaces, which are (again) classes that contain only pure, virtual methods with no data members [9]. All interfaces derive from a special interface, IUnknown. Unlike the Unknown interface described above, IUnknown contains three methods: QueryInterface(), AddRef(), and Release(). The need for these three methods is dictated by the language independence of their frameworks: dynamic_cast could never work on an object written in another language (or residing in another process), so the QueryInterface() method has to be provided in its place. Because QueryInterface() is a member of IUnknown, and all other interfaces derive from IUnknown, you can call QueryInterface() with any interface to obtain any other supported interface on an object. Similarly, destructors don't exist for objects written in another language, so AddRef() and Release() allow clients to control the lifetimes of objects through reference counting.
Of course, if language or process independence for objects is important to your design, you'll want to go with one of the two COMs. Even if you don't require their features you may want to look into one of the numerous simpler frameworks that have been written, or consider writing one of your own, if you prefer the QueryInterface() route to dynamic_cast. The efficiency issues surrounding such a choice are addressed in the next section; I will only point out here that whatever framework you use/develop will have to provide:
1. A means of uniquely identifying interface types.
2. A method to query an object for the interfaces that it supports.
3. An implementation of that method in every class that supports interfaces.
Items 1 and 3 taken together are the real problems with this approach -- the maintenance of some type of registry for interface identifiers, coupled with the need to write code on a class-by-class basis to identify which interfaces a class supports, is error prone and difficult to maintain at best. Using RTTI capitalizes nicely on the fact that the compiler knows at compile time exactly which interfaces a class implements. Nevertheless, a homemade framework of this type is included in the sample code for your inspection (available for download at <www.cuj.com/code>); you should compare it to the RTTI version and note the added complexity.
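For comparison, the three requirements can be met with a minimal hand-rolled sketch such as the one below. It is not the sample code that accompanies the article: IUnknownLike, the string identifiers, the static Id() functions, and the Query() template are all hypothetical names chosen for illustration, and reference counting is left out entirely.

```cpp
#include <cassert>
#include <cstring>

// 1. Interface identifiers. Plain strings here; keeping them unique across a
//    large code base is exactly the registry problem described above.
typedef const char* InterfaceId;

class IUnknownLike {
public:
    // 2. The query method: returns the requested interface, or 0.
    virtual void* QueryInterface(InterfaceId id) = 0;
protected:
    virtual ~IUnknownLike() {}
};

class IDrawable : public IUnknownLike {
public:
    static InterfaceId Id() { return "IDrawable"; }
    virtual void Draw() = 0;
};

class IMovable : public IUnknownLike {
public:
    static InterfaceId Id() { return "IMovable"; }
    virtual void Move(double X, double Y) = 0;
};

class Triangle : public IDrawable, public IMovable {
public:
    Triangle() : x(0), y(0) {}
    // 3. Written by hand for every class, and easy to let drift out of sync
    //    with the list of base classes.
    void* QueryInterface(InterfaceId id) {
        if (std::strcmp(id, IDrawable::Id()) == 0) return static_cast<IDrawable*>(this);
        if (std::strcmp(id, IMovable::Id()) == 0) return static_cast<IMovable*>(this);
        return 0;
    }
    void Draw() {}
    void Move(double X, double Y) { x = X; y = Y; }
    double x, y;
};

class Ruler : public IDrawable {
public:
    void* QueryInterface(InterfaceId id) {
        if (std::strcmp(id, IDrawable::Id()) == 0) return static_cast<IDrawable*>(this);
        return 0;
    }
    void Draw() {}
};

// Hypothetical typed wrapper around the untyped query.
template <typename Interface>
Interface* Query(IUnknownLike* object) {
    return static_cast<Interface*>(object->QueryInterface(Interface::Id()));
}
```

Each QueryInterface() must cast to the exact interface type before the pointer decays to void*, otherwise the wrapper's static_cast would produce a misadjusted pointer under multiple inheritance. That is one more detail that dynamic_cast handles automatically.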
Getting down to specifics, compilers typically support RTTI first by creating a type_info structure for every class type in your application. In turn, each class' type_info privately contains a searchable list of references to the type_info structures for its base classes. That's the first cost: sizeof(type_info) + sizeof(type_info*)*(number of base classes) per class. Second, a pointer to a class' type_info is inserted into the virtual function table for that class. This pointer is what typeid and dynamic_cast use to look up the dynamic type of an object at run time. That's the second cost: one additional pointer in each class' virtual function table. And that's all. Whether this overhead is objectionable to you will probably hinge on the number of classes in your application: the footprint of an application with many small, simple class types will grow proportionally larger with RTTI enabled than that of an application with fewer, larger class types. Another thing to consider will be how often RTTI will be used as part of your design: a 3-D graphics application I maintain that makes heavy use of the Contract Programming model actually decreased in size when I switched from a homemade QueryInterface()-based framework to an RTTI framework. A program with a more conventional design that only needs to handle a few objects via the Contract Programming model might see different results.
At run time the results are less ambiguous -- there are no run-time overheads introduced by RTTI, aside from the execution times of typeid and dynamic_cast. Finding the concrete type of an object at run time via typeid is strictly a matter of obtaining a reference to the object's type_info structure through its virtual function table and is thus roughly equivalent in cost to any other virtual function lookup. Testing for interfaces with dynamic_cast is a little more complicated: after looking up the type_info structures of the source and target classes, dynamic_cast must then descend recursively through the source class' list of base-class type_info references, looking for a match with the target class. In a shallow class hierarchy, this is roughly equivalent to what an implementation of QueryInterface() has to do, and I have found in practice that the RTTI-based framework can query for interfaces just as quickly as other frameworks.
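The recursive descent described above can be modeled with a toy data structure. This is only an illustration of the search: ToyTypeInfo, Implements(), and MakeType() are hypothetical names, real type_info layouts are implementation-specific, and a real dynamic_cast must also compute pointer adjustments, which the sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a compiler's per-class type record: a name plus a list
// of references to the records of the direct base classes.
struct ToyTypeInfo {
    const char* name;
    std::vector<const ToyTypeInfo*> bases;
};

// Recursive descent through the base-class lists, looking for the target.
// Records are compared by address, one record per class.
bool Implements(const ToyTypeInfo& source, const ToyTypeInfo& target) {
    if (&source == &target) return true;
    for (std::size_t i = 0; i < source.bases.size(); ++i)
        if (Implements(*source.bases[i], target)) return true;
    return false;
}

// Hypothetical helper that builds a record with up to two direct bases.
ToyTypeInfo MakeType(const char* name,
                     const ToyTypeInfo* base0 = 0,
                     const ToyTypeInfo* base1 = 0) {
    ToyTypeInfo info;
    info.name = name;
    if (base0) info.bases.push_back(base0);
    if (base1) info.bases.push_back(base1);
    return info;
}
```

With records for Unknown, Drawable, Movable, Triangle, and Ruler wired up like the class hierarchy in this article, Implements(triangle, movable) finds a match after visiting at most a handful of records, while Implements(ruler, movable) exhausts the short list and fails. In a shallow hierarchy the work is comparable to the chain of comparisons in a hand-written QueryInterface().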
[2] Bjarne Stroustrup. The C++ Programming Language, 2nd Edition (Addison Wesley, 1992).
[3] The implementation of Ruler::IsSelected() is particularly troubling -- note that Ruler::IsSelectable() returns false, indicating that this is an unsupported feature, while IsSelected() is returning a valid result as if nothing is wrong.
[4] Scott Meyers calls this a "protocol class" in Effective C++, 2nd Edition (Addison Wesley, 1998), but I prefer the term "interface" as it is more widely used and applicable to languages and frameworks other than Standard C++, including XPCOM, Microsoft COM, and Java.
[5] If you thought the Java language didn't support multiple inheritance -- surprise! This form of multiple inheritance (from interfaces) is a staple of Java.
[6] That's because void* is more than just "pointer to unknown" -- it's effectively "pointer to unknowable."
[7] Yes, I know that this is the dreaded Diamond Inheritance Pattern. I would argue, though, that since the Unknown interface has no data members, no constructors, and no methods, this design doesn't suffer from the problems traditionally associated with multiple inheritance diamonds.
[8] The Mozilla Project, <www.mozilla.org>.
[9] Actually, they're virtual function tables that are laid out as if they were C++ classes -- which is why a C++ program can call methods on an XPCOM/Microsoft COM interface implemented by a program written in C or JavaScript.
    void DeleteShape(Shape* S)
    {
      if(dynamic_cast<Triangle*>(S))
        delete S;
      else if(dynamic_cast<Circle*>(S))
        delete S;
      else if(dynamic_cast<PageBorder*>(S))
        delete S;
      else if(dynamic_cast<Ruler*>(S))
        return;
      else if(dynamic_cast<LayoutGrid*>(S))
        return;
    }

Note that there are three classes (apparently) that can be deleted, and (maybe) two that can't. Regrettably, this code will compile and work in the short term; I say regrettably because problems will inevitably begin to appear as the program evolves. What happens when a new class is introduced, say, Ellipse? The code will continue to compile, and the application will continue to run, yet Ellipse objects won't be deletable unless the developer remembers to go back and modify the implementation of DeleteShape(). Similarly, what happens when the capabilities of an object change? Suppose a later release of the application does allow the user to delete Ruler objects -- again, the developer must remember to alter DeleteShape() to allow it. Conversely, if an object's behavior changes from deletable to non-deletable without a corresponding update to DeleteShape(), a user may delete an object that they shouldn't have, probably leading to a segfault at run time. In all of these scenarios, behavioral changes in unknown numbers of objects have to be matched with changes to an otherwise unrelated function -- a serious violation of encapsulation.

That ought to be enough to warn you away from this sort of code, but if it isn't, consider that this design cannot accommodate plug-in objects (popular in the graphics world), because the author of DeleteShape() can't possibly know in advance the type names of every Shape-derived class that might be written by a third party! Finally, consider that DeleteShape() is just one function -- if you have to resort to such trickery here, you'll undoubtedly have to do it in myriad other places, adding up to a maintenance nightmare.
The moral of the story is that you should never need to use RTTI to find out the actual concrete type of an object at run time -- anytime you're tempted to do so, it's a surefire pointer to problems in the design. You should be testing objects for their capabilities instead of their type, as described in the accompanying article.