The D programming language is an evolutionary successor to C and C++. As such, it needs to provide new features that are credible replacements for clumsy, archaic features, such as macro text processing. One such feature is nested functions.
Nested functions are functions lexically and semantically enclosed within other functions. Notice in Listing 1 that the nested function has access to its lexically enclosing stack frame variables. (This is done by passing a hidden pointer to bar() to the enclosing frame, analogous to the hidden this pointer in C++ member functions.) Of course, you might say that you've been programming for a decade in C and C++ and have never needed nested functions. However, eventually you might recognize the common C and C++ problems here, and how they might be better done with nested functions.
Nested functions are a cleaner, more elegant solution to many common programming problems than the features C and C++ provide. In this article, I walk through three such problems, show how they are traditionally done in C/C++, and then how they can be done using nested functions in D.
Code Factoring
Code factoring describes the process of finding common sequences of code and replacing them with one sequence written in one place. The most obvious case of that is pulling common code into a function, then calling that function. C and C++ support functions very well and it's hard to imagine coding without them.
But functions have their limitations. The first is the performance penalty of setting up the stack frame and the function call/return sequence. That is partially resolved by inlining the function, followed by an optimizer pass that gets rid of the extra assignments and dead variables.
The next limitation is that the inlined function cannot reference variables in the stack frame of the calling function. Enter the standard way to get around missing capabilities in C and C++: Use a macro.
While Listing 2, code from a typical bytecode interpreter, does indeed work, it still leaves a lot to be desired:
- Macro processing is in a separate pass of the compiler, and is really a different language with its own rules independent from normal C/C++ code. It follows no scoping rules, no syntax rules, has an independent symbol table, and has no semantic connection with the rest of the language.
- When designing such function-like macros that can be embedded inside expressions, the macro must expand into an expression. This precludes using any statements. Any if-then constructions degenerate into a sometimes impenetrable morass of &&, ||, and ? expressions.
- Macros more than one line in length must use line-splicing \ schemes.
- Macros can't declare temporary variables (although there are some clumsy ways to do this for certain cases).
- Macro parameters do not pass by value nor reference, but by textual representation. This means that expressions used as arguments cannot have side effects, and expansions of parameters must be protected with parentheses.
- Once the macro replacement text gets beyond the trivial (such as expanding the PUSH macro to handle stack overflows and reallocations), these limitations become increasingly cumbersome.
- Debuggers do not see the macros, only their expansions, making it hard to debug complex macros.
As Listing 3 illustrates, nested functions elegantly eliminate these problems.
The advantages of nested functions is that they:
- Are an integral part of the language and the semantic pass. They follow all the syntactical and scoping rules of D. There is no separate preprocessing language to learn.
- Can have statements and exception handling and all the expressive power of any D function.
- Use the same syntax as other functions.
- Declare local variables just as other functions do.
- Have parameters passed by value or by reference just as parameters to other functions are.
- Can get as complicated as any other function.
- Emit symbolic debug info to the debugger.
Recursive Traversal
The way to traverse a recursive data structure (such as a binary tree) is to write a helper function to do the recursive calls. Listing 4 shows a function to count up and print the number of nodes in a binary tree. The countHelper() function is only called by doCount(), but its name is visible to other functions in the same source file. With nested functions available, countHelper() is more properly written as a nested function to make it clear it is only for use by doCount(); see Listing 5. Sure it looks more elegant, but assume it's a simple symbol table lookup, and the goal is to find a unique entry in the table but issue an error if there are duplicates. To do this, the recursive search function needs extra parameters for context information, collected together here in Paramblock (Listing 6). The nested function version looks like Listing 7, in which the code has shrunk considerably. The clunky Paramblock struct has been eliminated, as well as the pointers to it. The code is simple and to the point. Importantly, it compiles into code that is just as fast and efficient as the C version.
Generic Apply
Writing reusable container classes is an important aspect of modern programming. Part and parcel of that is providing a way to apply arbitrary code to the contents of that container. How the container is implemented is supposed to be hidden from users of that container, and what code users apply is not known to the implementor of that container.
This example is of a trivial Collection class that happens to be implemented using an array. It provides an apply() interface for users of the Collection class to apply arbitrary code to each element. The apply() function takes as arguments a generic function pointer and a generic pointer to user data; see Listing 8. Users of the Collection class implement code to find the maximum value within the Collection. Listing 9 makes use of pointers and arbitrary casting, and so although it works, it is error prone and not type-safe. Listing 10 is a rewrite of the Collection class using delegates (a delegate is a pair consisting of a function pointer combined with a data pointer). The parameter fp is declared as a delegate taking an argument of type int and returning a void. Users of the Collection class would be Listing 11. The function pointer comp_max becomes a delegate with the addition of the pointer to the stack frame of its enclosing function func(). The mechanics of this are handled by the compiler. The result is that exposed pointers are eliminated and type safety is preserved. There are no casts.
Essentially the same code is generated for both the C++ and D versions, so the cleaner syntax does not carry a penalty in speed or size.
Conclusion
Lack of nested functions is a significant deficit in the expressive power of C and C++, necessitating lengthy and error-prone workarounds. Nested functions are applicable to a wide variety of common programming situations, providing a symbolic, pointer-free, and type-safe solution.
They could be added to C and C++, but to my knowledge they are not being considered for inclusion. Nested functions and delegates are available now with D. As I get used to them, I find more and more uses for them in my own code and find it increasingly difficult to do without them in C++.
Walter Bright is the author of the D language and the Digital Mars C/C++ compiler. He can be reached at http://www.walterbright.com/.