Dan tells us what's new in C++, and how stray parentheses can do surprising things to a new expression.
Copyright © 1997 by Dan Saks
For the past several months, I've been exploring various aspects of new- and delete-expressions. (See "C++ Theory and Practice: new and delete," CUJ, January 1997; "C++ Theory and Practice: Class-Specific new and delete," CUJ, March 1997; "C++ Theory and Practice: Placement new," CUJ, April 1997; and "C++ Theory and Practice: Placement delete," CUJ, May 1997.) This month, I'll begin another pass through both new- and delete-expressions, focusing on their syntactic structure.
On the surface, new-expressions have a pretty simple structure. However, that structure is overly dependent on parentheses, and therefore more complicated than it first appears to be. An extra or missing set of parentheses in a new-expression can dramatically alter that expression's meaning.
I often kid that, when confronted with a declaration that won't compile, C and C++ programmers often react by thinking "Oh, what the heck. I'll just throw another * or pair of parentheses into the declaration and recompile. Maybe that will work." If you take this approach, you may occasionally find yourself with a declaration that compiles, but doesn't mean what you think it means. In that case, all you've done is transform a compile-time error into a run-time error, which will probably be harder to track down.
New-expressions are a lot like declarations in this respect. In most cases, a new-expression with extra or missing parentheses won't compile. The diagnostic message you get might not hit the mark, but at least you get some indication of trouble. Unfortunately, there are times when extra or missing parentheses, possibly in concert with another nearby mistake, will let your new-expression compile even though your understanding of the expression is not the same as the compiler's. This provides another opportunity to sharpen your debugging skills.
The sad truth is that C++ compilers have trouble pinpointing and diagnosing many erroneous constructs. I suspect they always will. The C++ language is just too complicated. That's why I believe that understanding C++ syntax goes a long way toward helping you cope with the language and current compilers.
If you learn to see the language from the compiler's perspective, diagnostics messages will make more sense and you won't have to guess how to fix errors (at least, not very often). This is the approach I took last year in my series of articles on the syntax of declarations (beginning with "The Column That Needs a Name: Understanding C++ Declarations," CUJ, December 1995). As part of that series, I developed and presented a program called decl that translates C++ declarations into English. (The final form of that program appeared in "C++ Theory and Practice: Declarators, Finale," CUJ, October 1996.)
As with declarations, new-expressions are built around type- specifiers and declarators. The skills you need to master declarations are essentially the same skills you need to master new-expressions. In fact, there is so much overlap between the syntax of new-expressions and the syntax of declarations that it was fairly easy for me to adapt decl into another program called newexpr that translates new-expressions into English.
By comparison with new-expressions, delete-expressions have a much simpler syntax. Nonetheless, delete-expressions offer a few surprises of their own. Although I don't believe writing a program that interprets delete-expressions would be a very helpful exercise, the syntax still deserves a few words of explanation.
Type-Specifiers and Declarators
New-expressions are one of several constructs composed from type-specifiers and declarators. I introduced these terms in "The Column That Needs a Name: Understanding C++ Declarations," CUJ, December 1995. I explored declarators in greater depth in "The Column That Needs a Name: Understanding C++ Declarators," CUJ, January 1996. Here's a quick review.
In C++, as in C, every object and function declaration has two parts: a sequence of decl-specifiers, followed by a comma-separated list of one or more declarators. For example, in the declaration
static char const *table[N];
static, char, and const are the decl-specifiers and *table[N] is the lone declarator.
A decl-specifier can be a type-specifier (such as int, float, or an identifier naming a type), a storage-class-specifier (such as static or extern), or a function-specifier (such as inline or virtual). The order of the decl-specifiers doesn't matter to the compiler, so that
char const static *table[N];
has the same meaning as the previous declaration.
A declarator is a declarator-id (the name being declared) along with any operators (*, &, (), and []) that might surround it. In the previous declaration, *table[N] is the declarator. A declarator-id can be an identifier (such as table) or an operator name (such as operator=). A declarator-id might include class or namespace names as qualifiers (such as string::operator=).
The declarator operators bind with the declarator-id according to operator precedence rules. () and [] have the same precedence as each other. * and & have the same precedence as each other. () and [] have higher precedence than * and &. For instance, in the declarator *table[N], [N] binds to the declarator-id before * does. Thus, *table[N] specifies that table is an "array with N elements of type pointer to ...". As in expressions, you can use parentheses to group operands and operators, so that (*table)[N] specifies table as a "pointer to array with N elements of type ...".
Declarators can appear within larger declarators. For example, the declarator in the function declaration
int putc(int c, FILE *f);
is putc plus the parameter list (including the parentheses). That parameter list contains two declarations, each of which has a declarator. The declarator for the first parameter is just the identifier c. The declarator for the second parameter is *f.
In some contexts, such as parameter declarations, you can omit the declarator-ids from declarators. For example, you can write the declaration for putc as
int putc(int, FILE *);
A declarator without a declarator-id is an abstract-declarator. Both the C and draft C++ Standards use the term declarator when referring to a declarator with a declarator-id. I find this terminology a little imprecise. I use the more explicit term concrete-declarator to refer to a declarator that has a declarator-id, and use plain declarator to mean either an abstract- or a concrete-declarator.
Table 1 shows a grammar for C++ object and function declarations. That grammar is expressed in the EBNF notation summarized in Table 2. (See "The Column That Needs a Name: A Sensible Grammar Notation," CUJ, November, 1995 for an more elaborate explanation of EBNF.) Table 1 describes declarations as recognized by the decl program I presented last year. That grammar omits just a few details from the corresponding parts of the full C++ grammar: it does not allow a declarator-id to be an operator name or qualified name, it does not allow arithmetic operators in constant-expressions, and it does not include storage-class-specifiers and function-specifiers among the decl-specifiers.
Abstract-declarators show up in places other than declarations. For example, in the old-style cast:
(unsigned char *)p
the * is an abstract-declarator (meaning "pointer to..."). Together, the type-specifiers unsigned char and the abstract-declarator * are called a type-id.
A type-id is similar but not identical to a parameter-declaration. Grammatically, a type-id is
type-id = type-specifier-seq abstract-declarator .
whereas a parameter-declaration is
parameter-declaration = decl-specifier-seq declarator .
The specifiers in a type-id can be only type-specifiers, but a parameter-declaration may include storage-class-specifiers. For example,
register char *p
is a valid parameter-declaration, but
(register char *)p
is not a valid cast-expression because register is not a type-specifier. The declarator in a type-id must be an abstract-declarator, but the declarator in a parameter-declaration can be either abstract or concrete.
Parentheses in New-Expressions
Type-ids also appear in new-expressions. For example, in
new T
T is a type-id. Its type-specifier sequence is T, and its abstract-declarator is empty.
As I explained over the past few months, a new-expression can have operands in addition to the allocated type. A new-expression may have a parenthesized initializer, as in:
new T (x, y, z)
which allocates a T object and initializes it with a constructor that accepts (x, y, z) as arguments. A new-expression may have placement arguments, as in:
new (p, q) T
which allocates a T object using an operator new that accepts (sizeof(T), p, q) as arguments. A new-expression can have both placement and initializer arguments, as in:
new (p, q) T (x, y, z)
As in other contexts, the type-id in a new-expression can be much more than just a single type name. For example:
new T[N]
allocates an "array with N elements of type T", and:
new T**
allocates a "pointer to pointer to T". The type-id in a new-expression can get pretty elaborate, as in:
new char const *const [256];
which allocates an "array with 256 elements of type const pointer to const char". In this case, the type-specifier sequence is char const and the abstract-declarator is *const [256].
None of these examples just above contains any surprises, but that's because none of the type-ids in those examples contains parentheses. It's type-ids with parentheses that make for some pretty tricky parsing.
For example, suppose you want to allocate a "pointer to function with no parameters returning T". You might try doing this by analogy with the declaration for a function f that has a parameter pf with that type:
void f(T pf());
By itself,
T pf()
declares pf with type "function with no parameters returning T". However, a parameter with "function..." type immediately decays (transforms) to "pointer to function..." type.
You can omit the parameter name pf from the declaration for f, so that
void f(T ());
also declares f with a parameter of type "pointer to function with no parameters returning T". T () is a type-specifier followed by an abstract declarator, so it looks as if you should be able to use it as a type-id in a new-expression:
new T()
Indeed you can, but it doesn't allocate a pointer to a function as intended. C++ interprets the () in this new-expression as the constructor argument list, not as the parameter list in a function declarator. This new-expression allocates a T object and initializes it with its default constructor.
Maybe it wasn't such a good idea to rely on the function declarator decaying to a pointer type. This decay occurs only in parameter lists. You could declare function f as:
void f(T (*pf)());
which declares more explicitly that pf is a pointer to a function. Again, you can omit the declarator-id from the parameter declarator, and get:
void f(T (*)());
Once again, the parameter declaration T (*)() looks like it could pass for a type-id. Its type is "pointer to function with no parameters returning T". However, if you try using this type-id in a new-expression:
new T (*)()
it won't compile. Your compiler will probably interpret (*) as an ill-formed constructor argument list, not as a parenthesized pointer operator in an abstract-declarator.
Oh, what the heck. Let's just remove some of those pesky parentheses and see if it compiles:
new T *()
This is a valid new-expression, but it doesn't allocate a pointer to a function as hoped. Rather, the compiler takes T * as the type-id, and takes () as the argument list. Therefore, it allocates a "pointer to T" and initializes it to zero (a null pointer value). (I explained the initialization rules in the first article in the series: "C++ Theory and Practice: new and delete," CUJ, January 1997.)
If you add parentheses around the entire type-id, you get:
new (T (*)())
Not only is this also a valid new-expression, but this actually allocates a "pointer to function with no parameters returning T", as intended.
This is just a small taste of what I meant when I wrote earlier that an extra or missing set of parentheses in a new-expression can dramatically alter that expression's meaning.
New-expressions were apparently designed so that the most common uses have simple forms. If you stick to those simple forms, you shouldn't run into any trouble. However, as soon as you try to use the full power of new-expressions, you can't help but notice that new-expressions use parentheses for too many different purposes:
- as delimiters around placement arguments
- as operators in declarators
- as delimiters around constructor arguments
Consequently, you can easily find yourself tangled in a nest of parentheses. The best way to get untangled is to become familiar with the grammar for new-expressions.
A Grammar for New-Expressions
Table 3 shows an EBNF grammar for new-expressions. Many of the productions (grammar rules) in Table 3 refer to productions from Table 1. In fact, the grammar in Table 3 depends on all but the first three productions from Table 1.
The first production:
new-expression = [ "::" ] "new" [ new-placement ] allocation-type-id [ new-initializer ] .
shows that a new-expression has three distinct parts after the keyword new: an optional new-placement followed by a mandatory allocation-type-id, followed by an optional new-initializer.
The new-placement and new-initializer have nearly identical form. The only real difference is that a new-initializer contains a list of zero or more expressions, while a new-placement contains one or more. That is, () is valid as a new-initializer but not as a new-placement.
Allocation-type-id is defined by the production:
allocation-type-id = new-type-id | "(" type-id ")" .
In other words, it is either an ordinary type-id enclosed in parentheses, or a new-type-id. A new-type-id is yet another variety of type-id; it can occur only in new-expressions.
A new-type-id is essentially a type-id than cannot have parentheses in it. It can specify pointer and array types, but not function types. (The grammar allows a new-type-id to specify a reference type, but there's a semantic constraint that prohibits allocating a reference.)
In practice, most new-expressions have a new-type-id rather than a parenthesized type-id. For example, in:
new T()
T is a new-type-id, and () is the new-initializer. In:
new T[N]
T[N] is the new-type-id without a new-initializer. In:
new T*[N]()
T*[N] is the new-type-id, and its type is "array with N elements of type pointer to T". The () is the new-initalizer.
I used to think that a new-expression that allocates an array could not have a new-initializer. If there ever was such a rule in C++, it's not there any more. A new-expression that allocates an array can have a new-initializer, but only if that initializer is empty. A new-expression such as
new T*[N](v)
is ill-formed because the allocation type is an array type and the new-initializer is a non-empty list.
In my earlier example of:
new (T (*)())
(T (*)()) is a parenthesized type-id, not a new-type-id. If you omit the outermost parentheses, you get:
new T (*)()
which your compiler will probably try to parse as if it had a new-type-id. Since a new-type-id cannot have parentheses, the compiler will probably view T as the entire new-type-id, and assume the (*) is an ill-formed new-initializer. I suspect that if you get any diagnostic message other than "syntax error," it will be a complaint about an erroneous initializing expression, even though the real problem is the missing outer parentheses.
Formally, the difference between a type-id and a new-type-id is that a type-id has an abstract-declarator, while a new-type-id has a new-declarator. The essential difference between these declarators is that an abstract-declarator can have parentheses and a new-declarator cannot. However, new-declarators serve another purpose, namely, they permit a new-expression to allocate an array with a variable number of elements.
An ordinary declarator (abstract or concrete) can specify an array type, but every specified dimension must have a constant value. For example, in the declaration
char *table[M][N];
M and N must be constant.
Grammatically, [M] and [N] are array-suffixes. The production for array-suffix in Table 1 suggests that any and every array dimension could be empty. However, semantic rules mandate that only the first dimension may be empty, and then only in a few contexts. For example,
char *table[][N]
is valid, either in an extern declaration or a parameter declaration. Neither
char *table[M][] char *table[][]
are ever valid.
Array declarators in new-expressions are unique in that they can have a first dimension that is non-constant. For example,
new char[strlen(s) + 1]
allocates an array of characters whose number of elements may vary at each execution. This is okay because the production for direct-new-declarator (used by new-declarator) specifies the first array dimension as an expression, not just a constant-expression.
If you rewrote the previous new-expression as
new (char[strlen(s) + 1])
then it would be in error. Placing parentheses around the entire new-type-id leads compilers to parse it as an ordinary type-id, which requires a constant-expression as the first dimension.
If you haven't done so already, take a moment to compare the productions for declarator and direct-declarator in Table 1 with the productions for new-declarator and direct-new-declarator in Table 3.
More to Come
Next month, I'll show you how I adapted my decl program (which translates declaration into English) into newexpr (which translates new-expressions into English), and also take a look at the syntax of delete-expressions. o
Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly 7 years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at [email protected].