Catching Untested Return Codes
Marc Guillemot
Who watches the watchers, at least to make sure they're watching? This class does.
Function return values are commonly used to indicate whether a function executed without error. However, it is difficult to ensure that callers actually check this information. Some commercial tools may do the job, but you can't always get permission to buy one, particularly on small projects. You are more likely to hear: "I trust you, you don't make such mistakes."
The idea presented here was inspired by a bug we had in our project a few weeks ago, which appeared only in the production environment. It took several days of work to trace it to the failure of an environment-specific initialization routine: the code that called this routine never tested its return code.
Adding a Responsibility Flag
In my (admittedly short) experience, I have often seen function return codes grouped, sometimes by type, into an enum. As shown in Figure 1, the caller can simply ignore such return values. To control what happens to the returned values, I do not return these values directly but instances of a class ErrorCode, which contains two member variables: a value (enValue_) holding the function's error code, and a responsibility flag (PboResp_):
class ErrorCode
{
private:
    ErrorCodeValue enValue_;
    bool * PboResp_;
public:
    // some code
};
The purpose of the responsibility flag is to indicate whether an instance of ErrorCode is responsible for the value it contains. When an instance of class ErrorCode is copied, using either the copy constructor or copy assignment, the value enValue_ is copied and the responsibility flag pointed to by PboResp_ is "transferred": the copy operation moves responsibility from the source instance to the destination instance, and after the copy the source instance is no longer responsible for its content. (Actually, the copy constructor has slightly different semantics than the copy assignment operator, but the general idea is the same. More on this below.)
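The transfer described above can be sketched in C++ as follows. This is a minimal sketch, not the article's Figure 2: the enum values EC_OK, EC_NOT_FOUND and EC_INIT_FAILED, the errorLog stream pointer and the isResponsible() accessor are hypothetical names added for illustration, and allocating one bool per instance with new is an assumption about how the flag is stored.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

// Hypothetical log sink: standard error by default.
std::ostream* errorLog = &std::cerr;

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // each instance owns one heap-allocated flag

public:
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    // Copy constructor: the value is duplicated, and responsibility
    // moves from the source to the new instance.
    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    // Copy assignment: log an untested value that is about to be lost,
    // then hand the source's flag over to this instance.
    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_)
                *errorLog << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }

    // Hypothetical accessor, present only to make the transfer observable.
    bool isResponsible() const { return *PboResp_; }
};
```

With this sketch, after `ErrorCode b(a);` only b reports its value if it is destroyed untested; a stays silent.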
Since it is common practice to declare the argument of the copy constructor and copy assignment operator as a const reference, I have chosen to implement the responsibility flag as a pointer to a Boolean rather than as a Boolean. Through a const reference the pointer itself cannot be changed, but the Boolean it points to can, so the copy functions can still clear the responsibility flag of the source instance passed as an argument. There is another constraint that applies to operator=: if it assigns to an ErrorCode instance whose responsibility flag is true, the loss of the previous value enValue_ must be logged. (The responsibility transfer described here is similar to the transfer of ownership that occurs when an auto_ptr is copied.)
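Why a pointer works where a plain bool would not can be shown in isolation. In the hypothetical FlagHolder below, a const reference makes both data members read-only, but for the pointer member that means only the pointer itself, not the bool it points to:

```cpp
#include <cassert>

// Hypothetical illustration of why PboResp_ is a pointer: through a const
// reference, data members are const, but the objects they point to are not.
struct FlagHolder {
    bool plain;        // read-only when accessed through a const reference
    bool* viaPointer;  // the pointer is read-only; the bool it points to is not
};

// Hypothetical stand-in for what a copy function does to its const source.
void giveUpResponsibility(const FlagHolder& source) {
    // source.plain = false;     // would not compile: source is const
    *source.viaPointer = false;  // compiles: writes through a bool* const
}
```

A `mutable bool` member is the other standard C++ way to allow this kind of modification through a const reference; the pointer is the approach the article takes.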
Using ErrorCode instances also requires that the == and != operators be defined. These operators are needed to compare an ErrorCode instance returned from a function against a temporary instance that has been fabricated to represent a specific error state (such as success). As you might expect, these operators work by comparing the internal enValue_ members of the two ErrorCode objects. However, they also set the responsibility flags of both participating ErrorCode objects to false, so that no "untested error code" is logged. If needed, other test operators (<, >, and so on) can be written for class ErrorCode as well.
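Such comparison operators can be sketched as const member functions that clear both flags before comparing. As a self-contained example, this repeats the class core under the same assumptions: the enum values, the errorLog stream and the isResponsible() accessor are hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

// Hypothetical log sink: standard error by default.
std::ostream* errorLog = &std::cerr;

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_)
                *errorLog << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }

    // Comparing counts as testing: both operands give up responsibility.
    bool operator==(const ErrorCode& other) const {
        *PboResp_ = false;
        *other.PboResp_ = false;
        return enValue_ == other.enValue_;
    }
    bool operator!=(const ErrorCode& other) const { return !(*this == other); }

    // Hypothetical accessor, present only to make the flag observable.
    bool isResponsible() const { return *PboResp_; }
};
```

Because comparing counts as testing, an expression such as `r != ErrorCode(EC_OK)` leaves neither r nor the temporary responsible, so neither is logged when destroyed.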
Finally, the destructor checks whether the instance is still responsible for its error code value and, if so, logs that value.
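The destructor check is what catches the kind of bug described at the start: a caller that discards a return value entirely. In this sketch, initEnvironment() is a hypothetical failing initialization routine, the enum values and errorLog are likewise hypothetical, and copy assignment is deleted only to keep the example short:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

// Hypothetical log sink: standard error by default.
std::ostream* errorLog = &std::cerr;

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    // Copy assignment is left out of this sketch to keep it short.
    ErrorCode& operator=(const ErrorCode&) = delete;

    // Whoever still holds responsibility at destruction reports the value.
    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }
};

// Hypothetical environment-specific initialization routine that fails.
ErrorCode initEnvironment() { return EC_INIT_FAILED; }
```

Calling `initEnvironment();` as a statement destroys the returned temporary at the end of that statement, which writes "Destruction of untested error code: value 2" to the log.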
Integration with Existing Code
In my view, the success of this technique depends on its being very easy to integrate into existing real-world programs. Figure 1 shows a typical situation that is a good starting point for showing how the integration can be done.
First, I rename the existing enum from ErrorCode to ErrorCodeValue and give the new class the enum's old name, ErrorCode. Thus, with just a recompilation of the program, all functions that previously returned the enum ErrorCode now return an instance of class ErrorCode. To make this technique really effective, class ErrorCode also needs a constructor that takes an ErrorCodeValue as its parameter. This constructor is called implicitly in two main situations: first, when an existing function body returns an ErrorCodeValue that must be converted to the function's new return type, class ErrorCode; second, when an ErrorCode instance is tested against an ErrorCodeValue via operator== or operator!=. In the second case, a temporary ErrorCode instance is constructed from the ErrorCodeValue and used as an argument to the comparison operator. As stated previously, the comparison operator also "turns off" the responsibility flag of both instances involved in the comparison.
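The renaming step can be sketched as follows. The original enum's contents, the function loadSettings() and errorLog are hypothetical; the point is that unchanged legacy lines still compile and now go through class ErrorCode:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Before the change, the header declared the return codes as
//     enum ErrorCode { EC_OK, EC_NOT_FOUND, EC_INIT_FAILED };
// After it, the enum is renamed and the class takes over its old name.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

std::ostream* errorLog = &std::cerr;  // hypothetical log sink

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    // Deliberately not explicit: this implicit conversion is what
    // lets legacy code keep compiling.
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_)
                *errorLog << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }

    bool operator==(const ErrorCode& other) const {
        *PboResp_ = false;
        *other.PboResp_ = false;
        return enValue_ == other.enValue_;
    }
    bool operator!=(const ErrorCode& other) const { return !(*this == other); }
};

// Hypothetical legacy function: its body was not edited and still returns
// an ErrorCodeValue, which the constructor above converts to ErrorCode.
ErrorCode loadSettings() { return EC_NOT_FOUND; }
```

Here, `return EC_NOT_FOUND;` and `loadSettings() != EC_OK` are the kind of lines existing code already contains; neither needs editing, yet a tested call logs nothing and an ignored call is reported.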
Implementation
Figure 2 shows the new implementation of error handling, this time using an ErrorCode class. Adding this class requires changes only to the existing file ErrorCodes.h, plus a new file, ErrorCodes.cpp, that implements the member functions of class ErrorCode.
As explained above, the copy constructor and copy assignment operator must transfer responsibility for error codes to their target objects. But that is not enough: a function must always return an ErrorCode instance that is responsible for its contents. So both the copy constructor and the constructor that takes an ErrorCodeValue unconditionally set the responsibility flag to true in the newly constructed instance. (This is how the copy constructor differs from the copy assignment operator: the copy assignment operator copies the flag from the source to the destination instance, whereas the copy constructor sets it unconditionally.)
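The difference between the two copy operations can be made visible with a small sketch. It uses hypothetical enum values, a hypothetical errorLog stream and an isResponsible() accessor added for illustration:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

// Hypothetical log sink: standard error by default.
std::ostream* errorLog = &std::cerr;

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    // Copy constructor: the new instance is responsible unconditionally,
    // even if the source had already been tested.
    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    // Copy assignment: the destination takes whatever flag the source had.
    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_)
                *errorLog << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }

    bool operator==(const ErrorCode& other) const {
        *PboResp_ = false;
        *other.PboResp_ = false;
        return enValue_ == other.enValue_;
    }

    // Hypothetical accessor, present only to make the flag observable.
    bool isResponsible() const { return *PboResp_; }
};
```

With C++17's guaranteed copy elision, `return EC_NOT_FOUND;` constructs the returned object directly through the ErrorCodeValue constructor, so it is that constructor's `true` that makes a fresh return value responsible; the copy constructor covers the cases where a copy is still made.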
Finally, the default constructor differs from all the other constructors because it initializes the responsibility flag to false. A default-constructed ErrorCode does not represent an untested error code; it is entirely up to the programmer who created it to decide what to do with it.
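A sketch of the default constructor's role, with the same hypothetical names (enum values, errorLog, isResponsible()); initializing enValue_ to EC_OK is an assumption, since the article does not say which value a default-constructed instance holds:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // std::ostringstream can replace std::cerr when testing

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

// Hypothetical log sink: standard error by default.
std::ostream* errorLog = &std::cerr;

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    // Default constructor: holds EC_OK (an assumption) and is NOT responsible.
    ErrorCode() : enValue_(EC_OK), PboResp_(new bool(false)) {}

    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_)
                *errorLog << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_)
            *errorLog << "Destruction of untested error code: value "
                      << enValue_ << "\n";
        delete PboResp_;
    }

    // Hypothetical accessor, present only to make the flag observable.
    bool isResponsible() const { return *PboResp_; }
};
```

Assigning a fresh error code to a default-constructed variable therefore logs nothing, because no untested value is erased; the newly assigned value is then tracked as usual.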
A possible enhancement to ErrorCode would be to add an assertion to the destructor and the copy assignment operator to catch "error leaks" during development and testing.
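One way to add such assertions is sketched below, with hypothetical enum values. It logs as before and then uses the standard assert macro, so a debug build stops at the first leak while a build with NDEBUG defined only logs:

```cpp
#include <cassert>
#include <iostream>

// Hypothetical return values standing in for the project's own enum.
enum ErrorCodeValue { EC_OK = 0, EC_NOT_FOUND = 1, EC_INIT_FAILED = 2 };

class ErrorCode {
private:
    ErrorCodeValue enValue_;
    bool* PboResp_;  // heap-allocated responsibility flag

public:
    ErrorCode(ErrorCodeValue v) : enValue_(v), PboResp_(new bool(true)) {}

    ErrorCode(const ErrorCode& src)
        : enValue_(src.enValue_), PboResp_(new bool(true)) {
        *src.PboResp_ = false;
    }

    ErrorCode& operator=(const ErrorCode& src) {
        if (this != &src) {
            if (*PboResp_) {
                std::cerr << "Untested error code (value " << enValue_
                          << ") erased by new value " << src.enValue_ << "\n";
                assert(false && "untested error code overwritten");
            }
            enValue_ = src.enValue_;
            *PboResp_ = *src.PboResp_;
            *src.PboResp_ = false;
        }
        return *this;
    }

    ~ErrorCode() {
        if (*PboResp_) {
            std::cerr << "Destruction of untested error code: value "
                      << enValue_ << "\n";
            assert(false && "untested error code destroyed");
        }
        delete PboResp_;
    }

    bool operator==(const ErrorCode& other) const {
        *PboResp_ = false;
        *other.PboResp_ = false;
        return enValue_ == other.enValue_;
    }
    bool operator!=(const ErrorCode& other) const { return !(*this == other); }
};
```

Because a failed assertion calls std::abort, a debugger attached to a debug build shows the call stack at the point of the leak, not just a line in a log file.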
Running the program in Figure 1 with the new ErrorCode class writes the following messages to standard error:
Destruction of untested error code: value 1
Untested error code (value 0) erased by new value 2
Conclusion
I am convinced this coding trick can be very helpful in detecting a certain class of bugs. Note that this method does not prevent any bug; it only reports untested return codes after the problem has occurred. I believe it is really simple to implement, even in an existing project. At first, you will probably get a log file with a large number of untested return codes; in our project, we had to clean up our code a bit as a result.
Marc Guillemot lives in France and has worked as a software engineer for two years. He has worked in banking and life insurance, and now works for an Internet home grocery service. He has also developed a tool for math typesetting in MS Word that is currently used by many math teachers. He can be reached at [email protected].