Stepping Up To C++
Temporary Inconvenience, Part 2
Dan Saks
Dan Saks is the founder and principal of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or electronically at [email protected].
The emerging C++ standard lets C++ compilers introduce temporary objects into the object program as needed to implement proper run-time behavior. If the object program employs a temporary that has a constructor, it must call a constructor for that temporary. Similarly, if a temporary has a destructor, the object program must call that destructor for the temporary.
The lifetime of a temporary object is the period of time during program execution from the temporary's construction until its corresponding destruction. The Annotated C++ Reference Manual (ARM) (Ellis and Stroustrup [1990]) and, until recently, the Working Paper for the C++ standard left the lifetime of temporary objects created during expression evaluation "implementation dependent." Thus, a program might destroy each temporary almost immediately, or it might save them up and destroy them all at program termination.
Ideally, C++ programmers shouldn't have to concern themselves with exactly when the temporaries come and go. In practice, they must. Under the worst circumstances, a program might inadvertently destroy a temporary before it's done with that temporary. Or a program might tie up resources by keeping temporaries around long after they've ceased being useful.
For a few years now, the C++ standards committee has generally agreed that the Working Paper leaves programmers with too little guidance for writing portable programs. But only recently could we agree on what to do about it.
Last month ("Temporary Inconvenience: Part 1", CUJ, October 1993) I explained why C++ programs create temporary objects. This month I'll explain how variations in the lifetime of temporary objects cause problems, and how the new rules adopted by the standards committee eliminate many of those problems.
The Canonical Example
Section 12.2 of the ARM describes the implementation-dependent nature of temporary objects. It includes extensive commentary that explains problems that can arise from different policies for destroying temporaries. The commentary uses a string class for all of its examples. Much of the standards committee's discussion followed suit and used similar examples (Koenig 1992). I will too. Listing 1 shows the string class I'll use for my examples.
The String class in Listing 1 is very similar to the one I presented in "Function Name Overloading" (CUJ, November 1991) including corrections I noted in "Rewriting and Reconsidering" (CUJ, September 1993). The class stores variable-length strings in character arrays allocated from the free store. Each String has a data member str that stores the pointer to the first character in the array, and another member len that stores the array length (plus one for a null character at the end of the string). Each concatenation operation allocates a new, larger character array from the free store, and deletes the old array.
The String class defines a conversion operator
    operator const char *() { return str; }
that returns the pointer to the first character in the underlying array representation of a string. This conversion operator lets you pass a String to a function that expects a const char * argument, as in
    void foo(const char *s);
    // ...
    String s = "Hello";
    foo(s);
It also lets you write calls like
    printf("%s\n", (const char *)s);
The class defines operator[](size_t) so that s[i] returns a reference to the ith character of String s. This lets you select and modify individual characters in a string, as in
    String s;
    ...
    cout << s[i] << '\n';
or
    s[0] = toupper(s[0]);
The class implements concatenation using
    String &operator+=(const String &s);
so that given
    String s1 = "foo";
    String s2 = "bar";
then
    s1 += s2;
places foobar in s1.
Listing 1 also defines a non-member function
    String operator+(const String &s1, const String &s2)
so that given
    String s1 = "foo";
    String s2 = "bar";
then
    String s3 = s1 + s2;
places foobar in s3.
Notice that this operator+ returns a String (not a String & or a String *). Thus, the program may create a temporary String object to hold the function's return value.
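Since the listing itself does not appear here, the following is only a rough sketch of a class matching the description above, not Listing 1. The member names str and len come from the text. The accessor name length() is an assumption (the data member already uses the name len), and the copy constructor and assignment operator are filled in so the sketch can run on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical reconstruction from the prose above; not the article's Listing 1.
class String {
public:
    String(const char *s = "")
        : len(std::strlen(s) + 1), str(new char[len]) {
        std::memcpy(str, s, len);
    }
    String(const String &s) : len(s.len), str(new char[len]) {
        std::memcpy(str, s.str, len);
    }
    ~String() { delete[] str; }
    String &operator=(const String &s) {
        if (this != &s) {
            char *p = new char[s.len];
            std::memcpy(p, s.str, s.len);
            delete[] str;
            str = p;
            len = s.len;
        }
        return *this;
    }
    // Concatenation allocates a new, larger array and deletes the old one.
    String &operator+=(const String &s) {
        std::size_t n = len + s.len - 1;
        char *p = new char[n];
        std::memcpy(p, str, len - 1);            // old text, minus its null
        std::memcpy(p + len - 1, s.str, s.len);  // appended text, with its null
        delete[] str;
        str = p;
        len = n;
        return *this;
    }
    operator const char *() const { return str; }
    char &operator[](std::size_t i) { return str[i]; }
    std::size_t length() const { return len - 1; }  // assumed name
private:
    std::size_t len;  // array length, including the terminating null
    char *str;        // first character of the array on the free store
};

// Returns by value, so a call may produce a temporary String.
String operator+(const String &s1, const String &s2) {
    String result(s1);
    result += s2;
    return result;
}

void demo() {
    String s1 = "foo";
    String s2 = "bar";
    String s3 = s1 + s2;
    assert(std::strcmp(s3, "foobar") == 0);
    s1 += s2;
    assert(s1.length() == 6);
}
```

The conversion operator hands out a pointer into the object's private array, which is exactly the property that makes the lifetime of a temporary String matter.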
Alternative Lifetimes
The committee considered several different alternative policies for destroying temporaries:
- Immediately: The program destroys each temporary object immediately after using the object's value. (I believe GNU g++ uses this approach.)
- At end of statement (EOS): The program destroys each temporary object at the end of the statement that creates it.
- At end of block (EOB): The program destroys each temporary object at the end of the block that creates it. (I believe AT&T's cfront uses this approach.)
To see the difference, consider a declaration with Strings x, y, and z. With a compiler that destroys temporaries immediately,
    String w = x + y + z;
translates into something like
    String temp1 = x + y;
    String temp2 = temp1 + z;
    destroy temp1;
    w = temp2;
    destroy temp2;
In contrast, a compiler that destroys temporaries at EOS translates the same declaration into something like
    String temp1 = x + y;
    String temp2 = temp1 + z;
    w = temp2;
    destroy temp1;
    destroy temp2;
A compiler that destroys temporaries at EOB produces essentially the same object code as a compiler that destroys at EOS, except the object program delays destroying temp1 and temp2 until it reaches the end of the block enclosing the declaration.
The Dangers of Immediate Destruction
Immediate destruction uses memory very economically. For example, an application that performs arithmetic on large matrices typically can't afford to keep many of those objects in memory. If a single expression generates several temporary matrix objects before destroying any of them, the program might easily run out of memory. Immediate destruction reduces the risk of that happening.
On the other hand, immediate destruction renders some expressions meaningless. The ARM illustrates the problem with an example like:
    void foo(const char *s);
    void bar()
    {
        String s("foo");
        String t("bar");
        foo(s + t);
    }
Using immediate destruction, the generated code looks something like:
    String temp1 = s + t;
    const char *temp2 = temp1.str;
    destroy temp1;
    foo(temp2);
Destroying temp1 deletes the array addressed by temp1.str, leaving temp2 as a dangling pointer. The resulting behavior is undefined.
The ARM gives a good indication of just how soon "immediate" can be when it says (in section 12.2) that:
There are only two things that can be done with a temporary: fetch its value (implicitly copying it) to use in some other expression, or bind a reference to it. If the value of the temporary is fetched, that temporary is dead and can be destroyed immediately. If a reference is bound to a temporary, the temporary must not be destroyed until the reference is. This destruction must take place before exit from the scope in which the temporary is created.
In the example immediately above,
    foo(s + t);
binds a pointer to part of the (internal) representation of the temporary resulting from s + t. It neither fetches the value of nor binds a reference to the temporary. Thus, a compiler may destroy the temporary even before calling foo.
Immediate destruction is hazardous in other similar situations. For Strings s and t,
    printf("%s\n", (const char *)(s + t));
might destroy the result of s + t after it evaluates (const char *)(s + t) as an argument, but before it actually calls printf. Essentially the same problem occurs in
    printf("%c\n", (s + t)[i]);
You might argue that the problem with all these examples is that they each use implementations of operator const char * or operator[] that violate a well-known guideline (Plum and Saks 1991, Meyers 1992, and others): a public member function should not return a pointer or reference to some part of the internal (private or protected) representation of an object. However, violating the guideline for these two operators seems to be common practice. The committee was reluctant to leave such code outside the standard.
Koenig 1992 showed that even examples that bind references only to entire objects can fail when using immediate destruction. He gave an example like the one in Listing 2. Here, the compiler can't tell in the context of foo that the reference returned from passthru is bound to the temporary resulting from s + t. Therefore, it may conclude that it's safe to destroy the temporary before calling len.
In fact, even
    size_t len = String("foo").len();
fails when using immediate destruction. String("foo") creates a temporary by a constructor call and passes the address of that temporary as the this pointer for len. But the member function call neither fetches the value of the temporary nor binds a reference to it, so the compiler may destroy the temporary before calling len.
In light of these and other similar examples, the committee decided that immediate destruction was too hostile to existing code. Most C++ programs expect temporaries to have longer lifetimes.
Later Destruction
Delaying destruction of a temporary until the end of the statement (EOS) that creates it cures the problems shown above. However, it still doesn't cure everything. For example,
    void bar()
    {
        String s("foo");
        String t("bar");
        const char *p = s + t;
        printf("%s\n", p);
    }
If the program destroys temporaries at EOS, then it will destroy the temporary resulting from s + t before it calls printf.
Delaying destruction of a temporary until the end of the block (EOB) that creates it fixes the previous problem, but can't handle the program fragment in Listing 3. If the program destroys temporaries at EOB, then it will destroy the temporary resulting from s + t at the closing brace of the block inside the if statement. The subsequent printf will attempt to dereference a dangling pointer.
The ARM provides still more examples of later destruction. Ultimately, this pursuit leads to the conclusion that the only safe policy is to destroy a temporary only after all pointers or references to any part of the object have been destroyed. But this policy adds run-time overhead that's inconsistent with the "lean and mean" philosophy of C++.
Furthermore, just as immediate destruction might be too soon for some constructs, delaying destruction until EOB (or later) might be too late for others. As I mentioned earlier, late destruction might tie up resources with temporary objects no longer in use. It can also wreak havoc in multi-threaded applications that require resource locking and process synchronization.
For example, a Mutex class can provide mutually exclusive access to a shared resource through member functions lock and unlock. (You might not be able to write a Mutex class entirely in C++, but that shouldn't prevent you from using it.) Your application creates a Mutex object for each shared resource, such as
    Mutex printer;
A particular process (a thread) gains exclusive access to the resource by locking the corresponding Mutex object, and it releases control by unlocking that object. For example, a print spooler might access a shared printer using
    printer.lock();
    // access the printer (exclusively)
    printer.unlock();
The code between the calls to lock and unlock is called a "critical section."
If the critical section creates a temporary object, and destroying that temporary requires access to the shared resource, then the process should destroy the temporary before leaving the critical section. But with destruction at EOB, the process might not destroy the temporary until after it leaves the critical section. If another process is in the critical section when the first process finally calls that destructor, then mutual exclusion fails and the shared resource might be corrupted.
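The ordering concern can be made visible with a small trace. The sketch below is only an illustration under several assumptions: std::mutex stands in for the Mutex class, and Job, events, and print_job are hypothetical names. Compiled with a conforming compiler, the temporary Job is destroyed before the unlock. Under destruction at EOB, the "job destroyed" entry would instead land after the unlock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

std::mutex printer;               // stands in for the article's "Mutex printer;"
std::vector<std::string> events;  // hypothetical trace of what happened when

// Hypothetical temporary whose destructor touches the shared resource.
struct Job {
    Job() { events.push_back("job created"); }
    ~Job() { events.push_back("job destroyed"); }
    int pages() const { return 3; }
};

int print_job() {
    printer.lock();
    events.push_back("locked");
    int n = Job().pages();  // the temporary Job dies at the end of this initializer
    events.push_back("unlocking");
    printer.unlock();
    return n;
}

void demo() {
    events.clear();
    assert(print_job() == 3);
    // The destructor ran inside the critical section, before "unlocking".
    assert(events[2] == "job destroyed" && events[3] == "unlocking");
}
```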
After years of discussion, the C++ standards committee chose to destroy each temporary object at end of the full expression (EOFE) that created it. EOFE is a generalization of EOS. As in C, a full expression is an expression that is not part of another expression. It includes an expression statement, as well as
- an initializer
- the controlling expression of an if or switch statement
- the controlling expression of a while or do statement
- each of the three (optional) expressions of a for statement
- the optional expression in a return statement
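Destruction at EOFE makes the earlier problem cases work. In each of the following, ^ marks the end of the full expression, where the program destroys the temporary resulting from s + t:
    foo(s + t);
              ^
    printf("%s\n", (const char *)(s+t));
                                       ^
    printf("%c\n", (s+t)[i]);
                            ^
    if ((s+t).len() == 0) ... ;
                        ^
However, the example in Listing 3 won't work, nor will the following:
    const char *p = s + t;
                         ^
    printf("%s\n", p);
In both cases, destruction at EOFE destroys the object addressed by p before calling printf.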
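One way to keep such code working under EOFE is to give the result a name, so that it is no longer a temporary. The following is a minimal sketch of that idea, assuming std::string as a stand-in for the String class (with c_str() playing the role of the conversion operator). The function name joined_length is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

typedef std::string String;  // stand-in for the article's String class

std::size_t joined_length(const String &s, const String &t) {
    // Risky: const char *p = (s + t).c_str();
    // The temporary would be gone at the end of that initializer.
    String st = s + t;           // named object, destroyed at the end of the block
    const char *p = st.c_str();  // points into st, so it stays valid below
    return std::strlen(p);
}

void demo() {
    assert(joined_length("foo", "bar") == 6);
}
```

The pointer p is still a pointer into another object's representation, so it remains valid only as long as st exists and is not modified.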
Conditional Temporaries
Evaluating an expression that includes the conditional operator ?: might not evaluate the entire expression. It's possible that one branch of the expression might create a temporary object while the other branch might not. For example,
    printf("%s\n", e ? (const char*)(s + t) : "");
creates a temporary only if e is true. (The temporary results from evaluating s + t.) In effect, requiring compilers to destroy temporaries at EOFE forces them to introduce additional code into conditional expressions to determine if conditionally constructed temporaries must be destroyed.
For example, a compiler must translate the printf above into code like that shown in Listing 4. Listing 4 uses an additional variable, destroy_temp2, as a Boolean that indicates whether temporary String temp2 requires destruction at the end of the statement.
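Listing 4 is not reproduced here, so the following is only a hand-written approximation of the kind of code described, not the listing itself. It assumes std::string as a stand-in for String and replaces the printf call with strlen so the result can be checked. The names temp1, temp2, and destroy_temp2 follow the description above; storage and conditional_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

typedef std::string String;  // stand-in for the article's String class

// Hand-written approximation of code a compiler might generate for
//     n = strlen(e ? (s + t).c_str() : "");
std::size_t conditional_length(bool e, const String &s, const String &t) {
    alignas(String) unsigned char storage[sizeof(String)];  // room for temp2
    String *temp2 = nullptr;
    bool destroy_temp2 = false;  // set only if temp2 gets constructed
    const char *temp1;
    if (e) {
        temp2 = new (storage) String(s + t);
        destroy_temp2 = true;
        temp1 = temp2->c_str();
    } else {
        temp1 = "";
    }
    std::size_t n = std::strlen(temp1);  // stands in for the printf call
    if (destroy_temp2)
        temp2->~String();                // end of the full expression
    return n;
}

void demo() {
    assert(conditional_length(true, "foo", "bar") == 6);
    assert(conditional_length(false, "foo", "bar") == 0);
}
```

The flag costs one store and one test per conditionally created temporary, which is the small run-time overhead at issue.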
The same problem arises in expressions that use the logical operators && and ||. Operators && and || evaluate sequentially from left to right. A program need not evaluate the entire && or || expression if it can determine the result from the left operand. In effect, e1 && e2 is equivalent to e1 ? e2 : 0, and e1 || e2 is equivalent to e1 ? 1 : e2.
The committee considered adding special rules for conditional operators to avoid the overhead of EOFE, such as
- temporaries created in the right operand of a && or || are destroyed before the && or || yields its result
- temporaries created in the second or third operand of a ?: are destroyed before the ?: yields its result.
But these rules reintroduce many of the problems associated with immediate destruction. For example,
    printf("%s\n", e ? (const char *)(s + t) : "");
won't work. So, despite the small run-time overhead, the committee opted for unconditional destruction at EOFE.
In short, the new rules for the lifetime of temporaries are:
1. temporaries are destroyed at the end of the full expression that creates them
2. temporaries are destroyed in the reverse order of their creation.
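Both rules can be observed with a small tracing class. This is only an illustrative sketch: Probe, note, run, and trace are hypothetical names. It uses && to fix the order in which the two temporaries are created, since the order of evaluation of function arguments is otherwise unspecified.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> trace;  // hypothetical event log

struct Probe {
    std::string name;
    explicit Probe(const std::string &n) : name(n) { trace.push_back("create " + name); }
    ~Probe() { trace.push_back("destroy " + name); }
};

bool note(const Probe &p) {
    trace.push_back("use " + p.name);
    return true;
}

void run() {
    trace.clear();
    bool ok = note(Probe("a")) && note(Probe("b"));  // one full expression
    trace.push_back(ok ? "next statement" : "unreachable");
}
```

After run(), the trace reads: create a, use a, create b, use b, destroy b, destroy a, next statement. Temporary a outlives its own call to note because it lives until the end of the whole initializer, and b is destroyed first because it was created last.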
Style Ramifications
In the long run, these new rules should simplify the task of writing reliable and portable C++ programs. But programmers who've become hooked on late destruction may find the transition a bit painful.
I still recommend programming in a style that avoids creating temporary objects. For example, if you have a choice between using an overloaded binary operator like + or its corresponding assignment operator +=, use the assignment operator. The expression
    s1 = s1 + s2;
may create a temporary object, whereas
    s1 += s2;
almost surely will not.
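A counting class gives a rough way to check this on a particular compiler. The sketch below is an assumption-laden illustration: Counted and the two helper functions are hypothetical, and the exact count for operator+ depends on whether the compiler elides the copy of the return value.

```cpp
#include <cassert>

struct Counted {
    static int built;  // number of objects constructed so far
    int value;
    explicit Counted(int v) : value(v) { ++built; }
    Counted(const Counted &c) : value(c.value) { ++built; }
    Counted &operator=(const Counted &c) { value = c.value; return *this; }
    Counted &operator+=(const Counted &c) { value += c.value; return *this; }
};
int Counted::built = 0;

Counted operator+(const Counted &a, const Counted &b) {
    Counted r(a);
    r += b;
    return r;
}

// Number of extra objects built by "a = a + b".
int built_by_plus() {
    Counted a(1), b(2);
    int before = Counted::built;
    a = a + b;
    assert(a.value == 3);
    return Counted::built - before;
}

// Number of extra objects built by "a += b".
int built_by_plus_assign() {
    Counted a(1), b(2);
    int before = Counted::built;
    a += b;
    assert(a.value == 3);
    return Counted::built - before;
}
```

Here built_by_plus() returns at least 1, while built_by_plus_assign() returns 0.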
A class is a struct
On a different note, one of my recent columns prompted the following query:
Mr. Saks,
In talking about one of the examples using nested classes ("Looking Up Names," CUJ, August 1993), you mentioned that in C++, "a struct is a class and a class is a struct." Does this mean that the size of a struct can be just as ambiguous as the size of a class? Specifically, can the size of a struct increase due to compiler-supplied code, such as a copy constructor, address operator, etc?
Does this mean that a struct such as:
    struct P { int x, y; };
is not necessarily just the size of two ints? That if I assign one P struct to another, a copy constructor may be generated? And that if I take the address of this struct, an address-of operator may be generated? Any help and insight you can provide about this would make me sleep better at night.
Thanks,
Peter Jones
SQA Engineer, Symantec
Internet: peterjon@symantec.com
Compuserve: 73370,66
When I said "a struct is a class, and a class is a struct," I meant that the only difference between a struct and a class is their default access specifiers. In the absence of an explicit access specifier, the members of a class are private and the members of a struct are public. For example,
    struct P { int x, y; };
    struct P { public: int x, y; };
    class P { public: int x, y; };
are all equivalent. Also, in the absence of a base class access specifier, the base class is private if the class is declared as a class, and public if the class is declared as a struct. For example, aside from the access to members declared in D,
    class D : public B { ... };
    struct D : public B { ... };
    struct D : B { ... };
are all equivalent.
A C struct should have the same size and alignment when compiled for the same architecture using C++. Compiler-generated functions, such as copy constructors or assignment operators, do not increase the size of objects of a class (or struct) type. The compiler-generated functions may occupy code space, but compilers often eliminate the extra code by inlining these functions. In general, non-virtual member functions do not consume any space in class (or struct) objects.
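A compile-time check can confirm this for a given compiler. The sketch below is illustrative only: Q and same_size are hypothetical names, and the checks describe what I would expect on typical implementations rather than a guarantee stated in this column.

```cpp
#include <cassert>
#include <type_traits>

struct P { int x, y; };

// Same data as P, plus non-virtual member functions.
class Q {
public:
    int x, y;
    int sum() const { return x + y; }
    void reset() { x = y = 0; }
};

static_assert(std::is_standard_layout<P>::value, "P is laid out like a C struct");
static_assert(sizeof(Q) == sizeof(P), "member functions add no per-object storage");

bool same_size() { return sizeof(Q) == sizeof(P); }
```

If either static_assert failed, the program would not compile, so a successful build is itself the result.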
Finally, C++ will generate at most four special member functions for a class that doesn't supply them:
1. copy constructor
2. default constructor
3. destructor
4. assignment operator
It will not generate an address-of operator. You can always take the address of a class object, just as you could always take the address of a struct.
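The standard type traits offer a quick way to confirm which of these operations exist for the struct in question. This is a sketch using facilities that postdate this column. Note also that later revisions of the language (C++11 onward) added a move constructor and a move assignment operator to the set of functions a compiler may generate. The function name address_is_builtin is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct P { int x, y; };

static_assert(std::is_default_constructible<P>::value, "default constructor");
static_assert(std::is_copy_constructible<P>::value, "copy constructor");
static_assert(std::is_copy_assignable<P>::value, "assignment operator");
static_assert(std::is_destructible<P>::value, "destructor");

bool address_is_builtin() {
    P p = {1, 2};
    P q = p;     // uses the generated copy constructor
    P *pp = &p;  // built-in address-of; no operator& is generated
    q = *pp;     // uses the generated assignment operator
    return pp->x == 1 && q.y == 2 &&
           std::is_same<decltype(&p), P *>::value;
}
```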
References
Ellis and Stroustrup [1990]. Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley.
Koenig 1992. Andrew Koenig, "Lifetime of Temporaries." ANSI C++ committee document X3J16/92-0020 and ISO C++ Working Group document N0098. January, 1992.
Meyers 1992. Scott Meyers, Effective C++. Reading, MA: Addison-Wesley.
Plum and Saks 1991. Thomas Plum and Dan Saks, C++ Programming Guidelines. Plum Hall.
Stroustrup 1986. Bjarne Stroustrup, The C++ Programming Language (1st ed.). Reading, MA: Addison-Wesley.