Stepping Up To C++
More Minor Enhancements as of CD Registration
Dan Saks
Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at [email protected].
For the past two months I've been listing ways that the programming language described by the C++ draft standard (as of Fall, 1994) differs from the language described in the ARM [1]. Two months ago, I described the major extensions:
- Templates
- Exception handling
- Run-time type information (including dynamic_cast)
- Namespaces
- New keywords and digraphs for C++ as alternate ISO646-compliant spellings for existing tokens
- Operator overloading on enumerations
- operator new[] and operator delete[]
- Relaxed restrictions on the return type of virtual functions
- wchar_t as a keyword representing a distinct type
- A built-in Boolean type
- Declarations in conditional expressions
New Cast Notation
The extensions to support RTTI (run-time type information) included a new type conversion operator called dynamic_cast. An expression such as
dynamic_cast<T *>(p)
yields the result of casting (converting) pointer p to type T *. If p points to a T object, the result is p. If p points to an object of a type D derived from T, then the result is a pointer to the unique T sub-object of the D object addressed by p. (In fact, these conversions do not even require a cast.) Otherwise, p must point to an object of a polymorphic type (a type with at least one virtual function), in which case the resulting program performs a run-time check.
The run-time check is this: If p points to a base class sub-object of a T object, the result is a pointer to that T object; otherwise, it yields a null pointer. (This description of the run-time check is over-simplified, but it covers the most common and useful cases, namely public inheritance from a single direct base class.) Listing 1 shows an example of this sort of conversion, commonly called a downcast.
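Here is a minimal sketch of such a downcast. The classes Shape, Circle, and Square and the function radius_or_minus_one are hypothetical names invented for this illustration; they are not taken from Listing 1.

```cpp
#include <cassert>

// Hypothetical polymorphic hierarchy, invented for this sketch.
struct Shape {
    virtual ~Shape() {}          // one virtual function makes Shape polymorphic
};

struct Circle : Shape {
    double radius;
    explicit Circle(double r) : radius(r) {}
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
};

// Returns the radius if p really points into a Circle, or -1.0 otherwise.
double radius_or_minus_one(Shape *p) {
    Circle *pc = dynamic_cast<Circle *>(p);   // run-time checked downcast
    return pc != 0 ? pc->radius : -1.0;       // a failed pointer cast yields null
}
```

Calling radius_or_minus_one with the address of a Circle returns that circle's radius. Calling it with the address of a Square returns -1.0, because the run-time check fails and the cast yields a null pointer.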
dynamic_cast can also perform reference conversions. An expression such as
dynamic_cast<T &>(r)
converts reference r to type T &. The rules for dynamic_cast applied to references parallel the rules for pointers, except that a failed reference conversion throws an exception rather than returning null.
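The reference form can be sketched the same way. The classes Base, Derived, and Unrelated and the function value_of are hypothetical. The sketch assumes the exception thrown is std::bad_cast, declared in <typeinfo>, which is what standard C++ specifies for a failed reference dynamic_cast.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

// Hypothetical classes, invented for this sketch.
struct Base { virtual ~Base() {} };
struct Derived : Base {
    int value;
    Derived() : value(42) {}
};
struct Unrelated : Base {};

// Returns the value member if r refers to a Derived object, or -1 otherwise.
int value_of(Base &r) {
    try {
        return dynamic_cast<Derived &>(r).value;   // throws if r is not a Derived
    } catch (const std::bad_cast &) {
        return -1;                                 // there is no "null reference" to test
    }
}
```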
The dynamic_cast notation is obviously, and intentionally, different from the "old-style" cast notation, (T)e, which converts expression e to type T. The proponents of RTTI did not want to confuse this new functionality with existing conversions by using the same syntax.
In the course of distinguishing dynamic_cast from the other casts, the C++ standards committees generally agreed with Bjarne Stroustrup that the old-style cast lumps too many different kinds of conversions under a single blanket notation [2]. Among other things, an old-style cast can do the following:
- narrow or widen an arithmetic value,
- discard a const or volatile attribute from an expression,
- perform implicit address arithmetic while converting B * to D * (where D is derived from B), or
- reinterpret the value of an expression of one type as a value with a completely different type.
The standards committees accepted Stroustrup's proposal to add three new cast operators in the mold of dynamic_cast. These casts do not add any new functionality to C++. They simply classify the functionality of old-style casts into three distinct categories:
- static_cast<T>(e) is for the better-behaved conversions, such as from one arithmetic type to a narrower or wider arithmetic type or to an enumeration, from a pointer-to-base (reference-to-base) to pointer-to-derived (reference-to-derived), or from any type to void. static_cast can add, but not remove, cv-qualifiers (const and volatile).
- reinterpret_cast<T>(e) is for the poorly-behaved conversions (those with implementation-dependent behavior), such as converting from an integral type to a pointer or vice versa, to or from a pointer to an incomplete type, or from one pointer-to-function type to another.
- const_cast<T>(e) is for converting a type with cv-qualifiers to a type with fewer cv-qualifiers.
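The following sketch shows one use of each of the three casts. All of the names (B, D, to_int, as_derived, writable, round_trip) are hypothetical. The reinterpret_cast example assumes the implementation provides std::uintptr_t in <cstdint>, a later library addition that postdates the draft discussed here.

```cpp
#include <cassert>
#include <cstdint>   // std::uintptr_t (assumed available)

struct B { int b; B() : b(1) {} };
struct D : B { int d; D() : d(2) {} };

// static_cast: narrow a double to an int (the fraction is discarded).
int to_int(double x) { return static_cast<int>(x); }

// static_cast: pointer-to-base to pointer-to-derived, with no run-time check.
// The caller must know that pb really addresses the B part of a D.
D *as_derived(B *pb) { return static_cast<D *>(pb); }

// const_cast: remove a const qualifier. Writing through the result is
// valid only if the object addressed was not itself defined const.
int *writable(const int *p) { return const_cast<int *>(p); }

// reinterpret_cast: pointer to integer and back again.
int *round_trip(int *p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int *>(bits);
}
```

Each function needs exactly one kind of cast, which is the point of the classification: a reader can tell from the keyword alone what sort of conversion is taking place.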
For example, the implementation of the Standard C function bsearch typically involves converting a parameter from const void * to const char *, and returning it as void *. Plauger's implementation [3] uses an old style cast in the return statement
return ((void *)q);
to convert q from const char * to void *. The current C++ view is that this cast actually performs two distinct conversions:
1. It converts the type from char * to void *.
2. It removes ("casts-away") a const qualifier. It's not apparent from a casual reading of that return statement that the (void *) cast removes a const qualifier.
Plauger wrote his library in C, so he had no choice about the cast notation, but a C++ programmer can now write the conversion more explicitly using the new-style casts:
return static_cast<void *>(const_cast<char *>(q));
Writing the return as
return static_cast<void *>(q);
is an error, because static_cast cannot cast-away a const qualifier. Writing
return const_cast<void *>(q);
is also an error, because a const_cast can only change the cv-qualification, not the type.
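To see the return statement in context, here is a sketch of a binary search written with new-style casts. The function my_bsearch and the comparison function compare_ints are hypothetical names. This is not Plauger's code, only an illustration with the same interface as bsearch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for bsearch, written for this sketch.
void *my_bsearch(const void *key, const void *base, std::size_t nelem,
                 std::size_t size, int (*cmp)(const void *, const void *)) {
    const char *p = static_cast<const char *>(base);   // const void * to const char *
    std::size_t n = nelem;
    while (n > 0) {
        std::size_t pivot = n / 2;
        const char *q = p + size * pivot;               // address of the middle element
        int val = cmp(key, q);
        if (val < 0) {
            n = pivot;                                  // search the lower half
        } else if (val == 0) {
            // Two explicit steps: cast away const, then convert to void *.
            return static_cast<void *>(const_cast<char *>(q));
        } else {
            p = q + size;                               // search the upper half
            n -= pivot + 1;
        }
    }
    return 0;                                           // not found
}

// Comparison function for arrays of int.
int compare_ints(const void *a, const void *b) {
    int x = *static_cast<const int *>(a);
    int y = *static_cast<const int *>(b);
    return (x > y) - (x < y);
}
```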
For much more about the new-style casts, see [2].
Qualified Names in Elaborated-Type-Specifiers
C places struct, union, and enum tags in a namespace separate from ordinary identifiers (those that designate functions, objects, typedefs, and enumeration constants). For example, the declaration
struct X { ... };
enters X as a struct name in the tag namespace. Because X is a tag, not a type, you cannot write declarations such as
X *p;
You must write the declaration as
struct X *p;
Some C programmers take advantage of this distinction between tags and ordinary identifiers, and write declarations like
struct X { ... } X;
which declares a struct X, and then an object X of type struct X, all in one declaration. Other C programmers, including yours truly, deem this bad practice, preferring to think of tags as type names.
I always equate each tag name to a type name using a typedef. A declaration of the form
typedef struct X { ... } X;
often does the trick. Unfortunately, this doesn't introduce the typedef name until after the body of the struct. I prefer writing such declarations in the form:
typedef struct X X;
struct X { ... };
C++ treats tags differently from C. In C++, a class name is still a tag name, but it is also a type name. In fact, C++ treats all tags (for classes, structs, unions, and enums) as types. Thus, given
struct X { ... };
C++ lets you use X as an ordinary identifier designating a type, as in
X *p;
For compatibility with C, C++ still accepts
struct X *p;
In C++, the combination of one of the keywords class, struct, union or enum, followed by a tag, is called an elaborated-type-specifier.
Again in the name of compatibility with C, C++ tolerates declarations such as
struct X { ... } X;
In this case, the object name X hides the type name X, so that subsequent uses of the unelaborated name X refer to the object. However, you can refer to the type name by using its elaborated form struct X, or by using X in a qualified name such as X::m (where m is a member of X).
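A short sketch makes the hiding concrete. The member m and the functions read_member and set_through_pointer are hypothetical names chosen for this illustration.

```cpp
#include <cassert>

struct X { int m; } X;   // declares type X, then object X; the object hides the type

// Here the plain name X refers to the object.
int read_member() { return X.m; }

// The elaborated form struct X still names the type.
void set_through_pointer(int value) {
    struct X *p = &X;    // writing X *p; here would be an error
    p->m = value;
}
```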
Now, with this background in mind, consider the declaration in Listing 2. Here, B appears as two different members of A: as a type and as a data member. The data member hides the type, so that outside A, A::B refers to the data member, not the nested type. You can refer to B as a type by using it in a qualified name like A::B::n, because C++ looks up a name followed by :: as if that name were a type. But how do you refer to B itself as a type?
For example, given the declaration in Listing 2, a declaration such as
A::B *pb;
is an error, because the data member name B hides the type name B in the scope of A. In fact, the ARM provides no notation for referring to A::B as a member type.
The committees rectified this minor problem with a minor grammatical extension to allow a qualified name after the class-key (the keyword class, struct, or union) in an elaborated-type-specifier. For example, you can now write
struct A::B *pb;
to declare pb as a pointer to an object of type A::B. Notice that the keyword struct elaborates B, not A. It's already clear that A designates a type in this context because A is followed by ::.
This extension also permits a qualified name after the keyword enum in an elaborated-type-specifier. For example,
enum A::E *pe;
declares pe as a pointer to an object of enumeration type A::E.
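The sketch below shows both forms at work. The class A here is a hypothetical stand-in, not a copy of Listing 2: it assumes a nested struct B with a member n and a nested enum E, each hidden by a data member of the same name. The functions demo_struct and demo_enum are likewise invented.

```cpp
#include <cassert>

// Hypothetical stand-in for the kind of class the text describes.
struct A {
    struct B { int n; } B;       // nested type B, and a data member named B
    enum E { red, green } E;     // nested enumeration E, and a data member named E
};

int demo_struct() {
    A a;
    a.B.n = 10;                  // a.B is the data member
    struct A::B *pb = &a.B;      // struct A::B names the hidden nested type
    return pb->n;
}

int demo_enum() {
    A a;
    a.E = A::green;
    enum A::E *pe = &a.E;        // enum A::E names the hidden enumeration type
    return *pe;
}
```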
Expressions of the form a.::B::c and p->::B::c
The committees struggled for several years to correct problems in the ARM's specification of scope and name lookup rules. The ARM's rules are incomplete, and at various times imprecise and contradictory. (See "Stepping Up to C++: Looking Up Names," CUJ, August 1993 and "Stepping Up to C++: Rewriting and Reconsidering," CUJ, September 1993.)
One of the last problems in this area was to pin down the precise meaning of expressions of the form x.B::c (or p->B::c). B might be as simple as an identifier or as complicated as a template expression. How does a C++ compiler evaluate (look up) B in such expressions? The answer to this question covers more complicated expressions such as x.B::C::d, because once the compiler knows what B denotes, it can easily resolve C and d.
After considering different rules for looking up B in x.B::c, the committee narrowed the choice to two possibilities:
1. Look up B in the context where it appears (known as the "golf" rule because you "play it where it lies").
2. Look up B as if it were in the body of a member function of x. That is, look in x's class (and its bases), then look in x's lexically enclosing class (and its bases), and so on out to global scope.
There are arguments in favor of both rules. Suppose you have a class with an inconveniently long name, such as class VeryLongName shown in Listing 3. You can write terser code by defining a shorter alias for the class name, such as:
typedef VeryLongName VLN;
When a C++ compiler evaluates
ap->VLN::f();
in function g of Listing 3, it won't find VLN if it only looks in the scope of A (which is the scope of *ap). This is especially true if you rearrange lines // 1 and // 2, as shown in Listing 4. This aliasing technique only works if C++ applies the golf rule (rule 1).
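The aliasing technique can be sketched as follows. This is not Listing 3 itself. It assumes that A is derived from VeryLongName and that both classes define a member function f, and the return values are invented so the calls can be told apart.

```cpp
#include <cassert>

class VeryLongName {
public:
    int f() { return 1; }
};

class A : public VeryLongName {
public:
    int f() { return 2; }        // hides VeryLongName::f
};

typedef VeryLongName VLN;        // alias declared outside A, at namespace scope

int g(A *ap) {
    // VLN is not a member of A; it is found in the context of the call.
    return ap->VLN::f();
}
```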
On the other hand, rule (2) supports different programming techniques that can also be useful. Sometimes, the author of a derived class may wish to provide users access to hidden names in the base class, without requiring that users know the name of the base class. This gives class designers a little more flexibility to change the class hierarchy without forcing the users to also change their code. The example in Listing 5 illustrates the problem and a solution.
In Listing 5, D::f() hides inherited member B::f(). Normally, the function call in
void g(D *pd) { pd->f(); ... }
applies D::f() to the D object addressed by pd. Function g could call B::f() with a call such as
pd->B::f();
But this requires that g "know" the base class by name. The alternative (shown in Listing 5) is for D to define
typedef B inherited;
as a type member, so that g can call
pd->inherited::f();
Stroustrup [2] discusses this technique in greater detail. It only works if C++ compilers look up inherited in the scope of the class of *pd, namely D, using rule 2. Looking in the context of the call itself (rule 1) defeats this technique.
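Here is a minimal sketch of the inherited technique. It is not Listing 5 itself. The return values and the second function h are invented so that the two calls can be distinguished.

```cpp
#include <cassert>

struct B {
    int f() { return 1; }
};

struct D : B {
    typedef B inherited;         // type member naming the base class
    int f() { return 2; }        // hides B::f
};

// g names the base class only through D's member typedef.
int g(D *pd) { return pd->inherited::f(); }

// An ordinary call finds D::f.
int h(D *pd) { return pd->f(); }
```

If D is later rederived from some other class, only the typedef inside D has to change. Function g stays as it is.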
The committees never found a convincing case for choosing one lookup rule over the other. Thus they decided that C++ should use both lookup rules, and that the combined result of both lookups should yield only one type. If the lookups yield conflicting results, the expression is ambiguous. The following is a more precise statement.
In a postfix-expression of the form x.B::c, the translator looks for B as a type (looking only for types) in two contexts:
1. the class scope of x (that is, look up B as if it were a member of x), and
2. the context in which the entire postfix expression appears (the context in which the translator looked for x itself).
The combined set of types found by (1) and (2) must have exactly one element; that element is the meaning of B. If the translator finds nothing, then it treats B as undeclared. If it finds more than one match, the reference is ambiguous.
There is precedent for this approach to name lookup in other parts of C++. When a C++ translator encounters an expression of the form x @ y, where x is an object of a class type and @ is an overloadable operator, it looks for an operator@ that satisfies either
x.operator@(y)
or
operator@(x, y)
The search must find exactly one match. If it finds more than one, the expression is ambiguous.
Now, at long last, I get to explain the extension itself. As part of clarifying the lookup rules, the committee decided to grant C++ programmers a little more control over name lookup for B in x.B::c. The extension allows postfix expressions of the form
a.::B::c, a.::B::C::d, etc.
p->::B::c, p->::B::C::d, etc.
That is, C++ now allows :: after the . or ->. In such expressions, the translator evaluates B as if it appeared at the global scope. There's not much to the extension itself, once you get the background out of the way.
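The sketch below shows a case where the leading :: matters. The classes B, M, and D and the function call_global_B are hypothetical. Inside D, the member typedef makes the name B refer to M, so a lookup in the class scope of *pd and a lookup at global scope find different types. Writing ::B asks for the global one explicitly.

```cpp
#include <cassert>

struct B { int f() { return 1; } };     // the global B
struct M : B { int f() { return 2; } };

struct D : M {
    typedef M B;                        // within D, the name B now means M
    int f() { return 3; }
};

// The leading :: forces B to be looked up at global scope,
// so this calls the f inherited from the global struct B.
int call_global_B(D *pd) { return pd->::B::f(); }
```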
Conversion from T ** to const T *const *
As a general rule, C++ (like C) allows implicit pointer conversions that increase const-ness, but not conversions that decrease const-ness. For example, given
char *strcpy(char *s1, const char *s2);
char a1[N], a2[N];
the call
strcpy(a1, a2);
successfully converts the expression a2 from type char[N] to char *, and then to const char *. On the other hand, given
const char ca1[] = "asdf";
the call
strcpy(ca1, a2);
is an error because it requires converting ca1 from const char[5] to const char * (OK so far), and then to char *. The last conversion is an error because it tries to strip away a const qualifier. For that you would need a cast.
Saying that C++ allows pointer conversions that increase const-ness is an oversimplification. Actually, C++ (like C) forbids certain pointer conversions that appear to increase const-ness. In particular,
T ** => const T **
is not safe, and C++ does not allow it. To see why this is unsafe, let's work our way through a progression of similar conversions. In the following, => means "implicitly converts to."
It's true that for any type T,
T * => const T * //1a: OK
safely increases const-ness. This much is clear. You cannot use the result of the conversion to corrupt a constant value. The const and T in //1a can appear in either order, so it's also true that
T * => T const * //1b: OK
Now, if you replace T in //1b with U *, you get
U ** => U *const * //2: OK
Since //2 is okay, it appears that
T ** => const T ** //3: nope
should be also, but it isn't. If C++ allowed //3, then you could accidentally change the value of a constant object, as shown by the following example.
C++ cannot accept
const char c = 'x';
char *pc = &c; //4: error
*pc = 'y'; //5: OK
because it would change the value of c, which is supposed to be const. The error is not on //5; pc was declared to point to a non-const char. The error is on //4, because it tries to convert
const char * => char * // no
which decreases const-ness.
Now, let's extend the example to:
const char c = 'x';
char *pc;
const char **ppc = &pc; // 6: ?
*ppc = &c; // 7: OK
*pc = 'y'; // 8: OK
This code also tries to change the value of c, but via a different route. The problem is not on //7, which simply copies one const char * to another. Nor is it on //8, which is the same as //5 in the previous fragment. No, the problem is that //6 opens a hole in the const safety net. If C++ (or C) allowed
T ** => const T ** //3
then it would have to allow //6.
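You can watch the hole open by forcing conversion //6 with a const_cast. The sketch below is only an illustration under stated assumptions: the function demonstrate_hole and the variables target and view are hypothetical, and target is deliberately not const so that the final write is legal. The point is that the write goes through a pointer that was only ever supposed to offer read access.

```cpp
#include <cassert>

// Forces char ** => const char ** and returns what ends up in target.
char demonstrate_hole() {
    char target = 'x';              // deliberately not const, so writing to it is legal
    const char *view = &target;     // a read-only view of target
    char *pc = 0;
    const char **ppc = const_cast<const char **>(&pc);   // like //6, but forced
    *ppc = view;                    // like //7: copies one const char * to another
    *pc = 'y';                      // like //8: writes through the read-only view
    return target;
}
```

No cast appears on the lines corresponding to //7 and //8. Had target been declared const, the same three lines would have modified a constant object, which is exactly what the ban on //3 is meant to prevent.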
There is, however, a restricted conversion that is safe, namely,
T ** => const T *const * // 9: OK
This prevents the conversion on line //6 in the previous example. But is it still useful? In his paper proposing this extension, Andrew Koenig of AT&T showed that the conversion is useful with the following example.
Given an array such as
char *quintet[] = { "flute", "oboe", "horn", "clarinet", "bassoon" };
you can compute the length of the longest string in such an array with a function declared as
size_t maxlen(char **a, size_t n);
Unfortunately, this function will not accept
const char *quartet[] = { "violin", "violin", "viola", "cello" };
because const char ** does not convert to char **. Thus, it appears that you must overload maxlen with another declaration:
size_t maxlen(const char **a, size_t n);
(You can't use this latter declaration in place of the original because passing quintet as the first argument requires conversion //3 above, which we already banished.)
However, with the extension to allow the conversion
T ** => const T *const *
you can get by with just one declaration for maxlen, namely,
size_t maxlen(const char *const *a, size_t n);
In fact, this one definition even accepts
const char * const trio[] = { "washtub", "jaw harp", "kazoo" };
Listing 6 shows an implementation of this maxlen function that you can use for experiments.
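One possible implementation of maxlen looks like this. It is a sketch written for this discussion, not a copy of Listing 6. The helper longest_in_mutable_array is hypothetical. It builds its char * array from named character arrays rather than string literals, because later C++ compilers reject or warn about initializing a char * from a literal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the length of the longest string among the n strings in a.
std::size_t maxlen(const char *const *a, std::size_t n) {
    std::size_t longest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t len = std::strlen(a[i]);
        if (len > longest)
            longest = len;
    }
    return longest;
}

// Passes an array of plain char * to maxlen.
std::size_t longest_in_mutable_array() {
    char flute[] = "flute";
    char clarinet[] = "clarinet";
    char *duo[] = { flute, clarinet };   // decays to char **
    return maxlen(duo, 2);               // relies on T ** => const T *const *
}
```

The same function also accepts an array of const char * and an array of const char *const, so no overloads are needed.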
The committees actually approved a more general form of the conversion rule. Here's what the draft says:
A conversion can add type qualifiers at levels other than the first in multi-level pointers, subject to the following rules:
Two pointer types T1 and T2 are similar if there exists a type T and integer n > 0 such that:
T1 is T cv1,n * ... cv1,1 * cv1,0
and
T2 is T cv2,n * ... cv2,1 * cv2,0
where each cvi,j is const, volatile, const volatile, or nothing.
An expression of type T1 can be converted to type T2 if and only if the following conditions are satisfied:
- the pointer types are similar.
- for every j > 0, if const is in cv1,j then const is in cv2,j, and similarly for volatile.
- if the cv1,j and cv2,j are different, then const is in every cv2,k for 0 < k < j.
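One way to experiment with these conditions is to ask the compiler which conversions it accepts. The sketch below assumes a compiler that supports static_assert, constexpr, and std::is_convertible from <type_traits>, all of which are later additions to C++ and not part of the draft described here. The helper converts is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// True if an expression of type From converts implicitly to type To.
template <typename From, typename To>
constexpr bool converts() {
    return std::is_convertible<From, To>::value;
}

// Adding const at the first level only.
static_assert(converts<int *, const int *>(), "adding const is allowed");
static_assert(!converts<const int *, int *>(), "removing const is not");

// Two levels: const added at level 2 requires const at level 1.
static_assert(!converts<int **, const int **>(), "unsafe conversion is rejected");
static_assert(converts<int **, const int *const *>(), "restricted form is allowed");

// Three levels: every level between the change and level 0 must be const.
static_assert(converts<int ***, const int *const *const *>(), "const at all levels");
static_assert(converts<int ***, int **const *>(), "change at level 1 only");
static_assert(!converts<int ***, const int **const *>(), "level 2 lacks const");

// Adding volatile follows the same pattern.
static_assert(converts<int **, int volatile *const volatile *>(), "volatile added");
```

Every assertion is checked at compile time, so if the file compiles, the compiler agrees with the rule on each of these cases.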
Under these rules, conversions such as the following are allowed:
T *** => const T *const *const *
T **** => const T *const *const *const *
and so on. So are:
T *** => T **const *
T const **** => T const **const *const *
and even:
T ** => T volatile *const volatile *
I'll describe more of these enhancements next month.
References
[1] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).
[2] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).
[3] P. J. Plauger. The Standard C Library (Prentice Hall, 1992).