C and C++: Siblings
Bjarne Stroustrup
We're at a crossroads for compatibility between C and C++. Can siblings go their separate ways and still remain on speaking terms? In this first of three parts, Bjarne provides context for the discussion.
Classic C [1] has two main descendants: ISO C and ISO C++. Over the years, these languages have evolved at different paces and in different directions. One result of this evolution is that each language provides support for traditional C-style programming in slightly different ways. The resulting incompatibilities can make life miserable for people who use both C and C++, for people who write in one language using libraries implemented in the other, and for implementers of C and C++ tools.
This article is Part 1 in a series that explores the relationship between K&R C's [2] most prominent descendants: ISO C and ISO C++. My focus is on the areas where C and C++ differ slightly (the incompatibilities), rather than on the large area of commonality or the areas where one language provides facilities not offered by the other. A longer technical report that presents more historical context and many more examples is available online [3].
A Family Tree
How can I call C and C++ siblings? C++ is a descendant of K&R C. However, what we call C today (the C89 or C99 Standard) is also a descendant of K&R C, and it is therefore appropriate to think of C and C++ as siblings.
Figure 1 shows the C family tree. ISO C and ISO C++ emerge as the two major descendants of K&R C, and as siblings. Each carries with it the key aspects of Classic C, and neither is 100-percent compatible with Classic C. For example, both siblings consider const a keyword, and both deem this famous Classic C program non-standard compliant:
main() { printf("Hello, world\n"); }
As a C89 program, Kernighan and Ritchie's classic "Hello, World" has one error. As a C++98 program, it has two errors. As a C99 program, it has the same two errors, and if those were fixed, the meaning would be subtly different from the identical C++ program.
As C and C++ drift further from Classic C, incompatibilities become more numerous and more pronounced. The siblings of Classic C share their various traits in a confusing array of combinations. Figure 2 reveals seven compatibility categories, and a programmer must understand which features fall in which category in order to write compatible code (see Table 1).
One of the big questions for the C/C++ community is whether the next phase of standardization (potentially adding two more circles to Figure 2) will pull the languages together or tear them further apart. In 10 years, there will be large and thriving C and C++ communities. However, if the languages are allowed to drift further apart, there will not be a C/C++ community, sharing tools, implementations, techniques, headers, and code. Figure 3 shows my nightmare scenario. Each separate area of the diagram represents a different set of incompatibilities that an implementer must address and that a programmer may have to be aware of.
The differences between C++ and C89 are documented in Appendix C of the ISO C++ Standard [4]. The major differences between C89 and C99 are summarized on two pages of the C99 foreword [5]. The differences between C++ and C99 are not officially documented because the ISO C committee had neither the time nor the expertise to document differences, and the C99 committee's charter [6] did not require documenting C++/C99 incompatibilities. An unofficial but extensive list of incompatibilities can be found on the Web [7].
The Spirit of C
The phrases "the spirit of C" and "the spirit of C++" are often used as weapons to condemn notions supposedly not in the right spirit and therefore somehow illegitimate. More reasonably, these phrases can be used to distinguish languages aimed at supporting low-level systems programming, such as C and C++, from languages without such support. I find these "spirit" arguments poisonous when they are thoughtlessly applied within the C/C++ community. More often than not, these phrases dress up personal likes and dislikes as philosophies supposedly backed by the fathers of C or the fathers of C++. These attacks can be amusing and occasionally embarrassing to Dennis Ritchie and me. We are still alive and do hold opinions, though Dennis, being older and wiser, is better able to keep quiet.
The following rules are often claimed as part of the "spirit of C":
1. Keep the built-in operations close to the machine (and efficient).
2. Keep the built-in data types close to the machine (and efficient).
3. No built-in operations on composite objects.
4. Don't do in the language what can be done in a library.
5. The standard library can be written in the language itself.
6. Trust the programmer.
7. The compiler is simple.
8. The run-time support is very simple.
9. In principle, the language is type-safe, but not automatically checked (use lint for checking).
10. The language isn't perfect because practical concerns are taken seriously.
You can find support for all of these rules in the opening pages of [2].
Naturally, Classic C is a good approximation of the "spirit of C." C99 and C++ are less so, but they still approximate those ideals. This is significant because most languages don't. From the perspective of Ada, Java, or Python, C and C++ appear as twins. Only in discussions within the C/C++ community do the differences appear to overwhelm the commonalities.
In the spirit of rule 10, Classic C breaks rule 3 by adding structure assignment and structure argument passing to K&R C.
C++ starts out by breaking rule 7: a greater emphasis on type and scope distinguishes C++ from C. Consequently, a C++ compiler front end must do much more than a Classic C front end does. The introduction of exceptions complicates C++'s run-time support, violating rule 8. However, that may be defended on the grounds that if you don't need exceptions, you can avoid using them. After 20 years, it is more remarkable that C++ closely follows the remaining eight criteria. In particular, C++ can be seen as the result of following rules 1 to 5 to their logical conclusion by allowing the user to define general and efficient types and libraries.
Compared to early C compilers, modern C implementations cannot be called simple, so C99 also breaks rule 7. Since <tgmath.h> cannot be written in C (though something almost identical can be written in C++), C99 breaks rule 5. Arguably, C99's complex facilities violate rules 1, 2, and 3.
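The point about <tgmath.h> can be illustrated with a small sketch. An ordinary C function cannot select its implementation from the type of its argument, but C++ overloading does exactly that. The name root is hypothetical and chosen only for this illustration; it is not part of any standard header.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical type-generic square root: one name, several argument types.
// Overload resolution selects the version from the argument type.
inline float root(float x) { return std::sqrt(x); }
inline double root(double x) { return std::sqrt(x); }
inline long double root(long double x) { return std::sqrt(x); }
inline std::complex<double> root(std::complex<double> z) { return std::sqrt(z); }

// The result type follows the argument type; checked at compile time.
static_assert(std::is_same<decltype(root(2.0f)), float>::value,
              "float in, float out");
static_assert(std::is_same<decltype(root(2.0)), double>::value,
              "double in, double out");
static_assert(std::is_same<decltype(root(std::complex<double>(-4.0, 0.0))),
                           std::complex<double> >::value,
              "complex in, complex out");

// Usage: the double overload is chosen for a double argument.
inline void root_demo()
{
    assert(root(9.0) == 3.0);
}
```

This sketch does not handle integer arguments, which <tgmath.h> treats as double; calling root(2) would be ambiguous here.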
Contrary to popular myths, there is no more tolerance of time and space overheads in C++ than there is in C. The emphasis on run-time performance varies more between different communities using the languages than between the languages themselves. In other words, overheads are found in some uses of the languages rather than in the language features.
Underlying the flame wars over the "Spirit of C" is a genuine concern for the direction of C's and/or C++'s evolution, that is, a consistent aim to provide a coherent language from a set of changes and extensions.
In their evolution from Classic C, C99 and C++ differ in philosophy. C++ has a clearly stated philosophy of language: the emphasis in the selection of new facilities is on mechanisms for defining and using new types safely and efficiently. Basic facilities for computation were, as much as possible, inherited from Classic C and later from C89. C++ will go a long way to avoid introducing a new fundamental type. The prevailing view is that if you need one type then many programmers will need similar types. Consequently, providing mechanisms for expressing such types in the language will serve many more programmers than providing the one type as a built-in. In other words, the emphasis is on facilities for organizing code and building libraries (often referred to as abstraction mechanisms).
By contrast, the emphasis in the evolution of C89 into C99 has been on the direct support for traditional (Fortran-style) numerical computation. Consequently, the major extensions of C99 compared to C89 are in new built-in numeric types, new mathematical functions and macros, numeric I/O facilities, and extensions to the notion of an array. The contrasting approaches to complex numbers and to vectors/VLAs illustrate the difference in C++'s and C99's design philosophies: C adds built-in facilities where C++ adds to the standard library [3].
Ideally, C's emphasis on built-in facilities and C++'s emphasis on abstraction mechanisms are complementary. However, to maximize compatibility, the emphasis on built-in facilities must be on fundamental computational issues (i.e., on facilities that cannot elegantly and efficiently be provided by composing already existing facilities). Care must be taken not to increase reliance on mechanisms known to cause problems for the abstraction mechanisms, such as macros (see sidebar), uneven support for built-in types, and type violations.
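As a minimal sketch of the library approach to complex numbers, the following C++ uses std::complex<double> from <complex>; the C99 counterpart would use the built-in double complex type from <complex.h>. The helper name rotate_quarter_turn is hypothetical.

```cpp
#include <cassert>
#include <complex>

// In C++, complex is a library template rather than a built-in type;
// the usual arithmetic operators are overloaded for it.
typedef std::complex<double> Cmplx;

// Hypothetical helper: multiply by i, i.e. rotate by 90 degrees.
inline Cmplx rotate_quarter_turn(Cmplx z)
{
    const Cmplx i(0.0, 1.0);
    return z * i;
}

// Usage: (1+2i) * i == -2+i
inline void rotate_demo()
{
    assert(rotate_quarter_turn(Cmplx(1.0, 2.0)) == Cmplx(-2.0, 1.0));
}
```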
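One well-known way in which macros work against abstraction is that they substitute text instead of evaluating arguments. The following sketch contrasts a macro with an inline function template; the names SQUARE_MACRO, square, and bump are hypothetical.

```cpp
#include <cassert>

// Macro: the argument text is pasted twice, so side effects happen twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Inline function template: the argument is evaluated exactly once,
// and the name obeys the language's scope and type rules.
template<class T>
inline T square(T x) { return x * x; }

// Increments a counter and returns the new value, so that each
// evaluation of the argument expression can be observed.
inline int bump(int& counter) { return ++counter; }

inline void square_demo()
{
    int calls = 0;
    int r = square(bump(calls));         // bump() runs once
    assert(r == 1 && calls == 1);

    calls = 0;
    (void)SQUARE_MACRO(bump(calls));     // bump() runs twice
    assert(calls == 2);
}
```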
Understanding C/C++ Feature Differences
Most C and C++ compatibility problems fall into one of the following categories:
- Issues that affect interfaces, such as virtual functions and VLAs.
- Issues that affect only the form of the code that they are part of, such as declarations in conditions and designated initializers.
The following sections give examples of these compatibility issues and explore some of the perils programmers face when they navigate the incongruities of C and C++.
Trivial Interfaces
C++ programmers have always known that to make code accessible to C programs they must provide interfaces that avoid non-C features, such as classes with virtual functions. These C-to-C++ interfaces have typically been trivial. For example:
// C interface:
extern int f(struct X* p, int i);

// C++ implementation of C interface:
extern "C" int f(X* p, int i)
{
    return p->f(i);
}
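A fuller sketch of such a trivial interface follows. The original leaves X unspecified, so the class bodies and the behavior of the member function f here are hypothetical.

```cpp
#include <cassert>

// C++ side: a class with a virtual function, which C cannot express.
struct X {
    virtual ~X() {}
    virtual int f(int i) { return i + 1; }   // hypothetical behavior
};

struct Derived : X {
    int f(int i) { return i * 2; }           // overrides X::f
};

// C-callable wrapper: C linkage, and only C-compatible types in the
// signature. A C header would declare it as:
//     extern int f(struct X* p, int i);
extern "C" int f(X* p, int i)
{
    assert(p != 0);     // a C caller could pass a null pointer
    return p->f(i);     // virtual dispatch happens on the C++ side
}
```

A C caller sees only an opaque struct X* and a plain function; the choice between X::f and Derived::f is made inside the C++ implementation.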
C programmers typically assume any C header can be used from a C++ program. This assumption has largely been true (after someone adds suitable extern "C" directives), though headers that use C++ keywords as identifiers have been a constant irritant to C++ programmers (and sometimes a serious practical problem). For example:
// not C:
class X { /* ... */ };

// not C++:
struct S { int class; /* ... */ };
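The extern "C" directives mentioned above are commonly added under a test of the predefined macro __cplusplus, so that one header serves both languages. A minimal sketch; the function add_ints and the member name klass are hypothetical, the latter chosen to avoid the C++ keyword class.

```cpp
#include <cassert>

/* --- contents of a header meant for both C and C++ --- */
#ifdef __cplusplus
extern "C" {            /* C++ compilers see C linkage */
#endif

struct S {
    int klass;          /* not 'class', which is a C++ keyword */
};

int add_ints(int a, int b);

#ifdef __cplusplus
}                       /* end of extern "C" */
#endif
/* --- end of header --- */

/* Implementation; in practice this might live in a C source file. */
int add_ints(int a, int b) { return a + b; }

/* Usage from the C++ side. */
inline void shared_header_demo()
{
    struct S s;
    s.klass = 7;
    assert(add_ints(s.klass, 3) == 10);
}
```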
C99 introduces several features that, if placed in a header, will prohibit the use of that header in a C++ program (or in a C89 program). Examples include VLAs, restricted pointers, _Bool, _Complex, some inline functions, and macros with a variable number of arguments. For example:
// C99 interface features, not found in C++ or C89:

// equivalent to f(int *const):
void f1(int[const]);
// p is supposed to point to at least 8 chars:
void f2(char p[static 8]);
void f3(double *restrict);
// p is a VLA:
void f4(char p[*]);
// may or may not be C++ also [3]:
inline void f5(int i) { /* ... */ }
void f6(_Bool);
void f7(double _Complex);
// variadic macro: form is the format string
#define PRINT(form, ...) \
    fprintf(stderr, form, __VA_ARGS__)
If a C header uses one of those features, mediation code and a C++ header must be provided for the C code to be used from C++.
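For some of these features the C++ header can be very thin. An array parameter is adjusted to a pointer, and restrict applied to a parameter is a top-level qualifier, so neither changes the function's type. The sketch below assumes the usual situation in which the C and C++ compilers on a platform share calling conventions; the names fill8 and sum are hypothetical, and the definitions are stand-ins for code that would really be compiled as C99.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C99 header:
//     void   fill8(char p[static 8], char c);
//     double sum(const double *restrict a, size_t n);
// C++ view of the same two functions, using plain pointers:
extern "C" {
    void fill8(char* p, char c);
    double sum(const double* a, std::size_t n);
}

// Stand-in definitions so that this sketch is self-contained.
extern "C" void fill8(char* p, char c)
{
    assert(p != 0);     // the 'static 8' promise cannot be checked here
    for (int i = 0; i < 8; ++i) p[i] = c;
}

extern "C" double sum(const double* a, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

The promises expressed by static 8 and restrict are lost on the C++ side, which is one reason such mediation headers need care.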
The ability to share header files is an important aspect of C and C++ culture and a key to performance of programs using both languages. If the header files are kept compatible, C and C++ programs can call libraries implemented in the other language with no data conversion overheads and no (or very minimal) call overhead.
Thin Bindings
Shared declarations are sometimes an insufficient solution to the header compatibility problem. In cases where the languages provide similar functionality in different ways, another approach to header compatibility is to provide compatibility headers that, through liberal use of #ifdefs, provide very different definitions for each language, but allow user code to look very similar. For example:
// my double precision complex
#ifdef __cplusplus
#include <complex>
using namespace std;
typedef complex<double> Cmplx;
inline Cmplx Cmplx_ctor(double r, double i)
{
    return Cmplx(r,i);
}
//...
#else
#include <complex.h>
#include <tgmath.h>   // type-generic sin() for complex arguments
typedef double complex Cmplx;
#define Cmplx_ctor(r,i) \
    ((double)(r)+I*(double)(i))
//...
#endif

void f(Cmplx z)
{
    Cmplx zz = z+Cmplx_ctor(1,2);
    Cmplx z2 = sin(zz);
    // ...
}
This approach requires the programmer to create a new dialect that maps into both languages. In other words, a user (or a library vendor) must invent a private language simply to compensate for compatibility problems. The resulting code is typically neither good C nor good C++. In particular, by using this technique, the C++ programmer is restricted to using what is easily represented in C. For example, unless exceptional effort is expended on the C mapping, arrays must be used rather than containers, overloading beyond what is offered by C99's <tgmath.h> must be avoided, and errors cannot be reported using exceptions. In addition, macros tend to be used much more heavily than C++ programmers would like. Such restrictions can be acceptable when providing interfaces to other code, but these restrictions are typically too constraining for a C++ programmer to use within the implementation. Similarly, a C programmer using this technique is prevented from using C facilities not also supported by C++, such as VLAs and restricted pointers.
Real code/libraries will have much larger "thin bindings" with many more macros, typedefs, inlines, etc., and more conventions for their use. The likelihood that two such thin bindings can be used in combination is slim and the effort to learn a new binding is non-trivial. Thus, the compatibility header approach doesn't scale and fractures the community.
Competing Programming Models
Interfaces (e.g., information in header files) are all that matter to people who see C and C++ as distinct languages that just happen to be able to produce code that can be linked together (like C and Fortran). However, teachers, implementers, and all other programmers who work in both languages must contend with equally intractable compatibility issues related to the facilities used to express computations.
The differing programming models of C and C++ lead to alternative solutions for many common tasks. These alternative approaches are problematic for the following reasons:
- An alternative forces programmers to choose between two sets of facilities and their associated programming techniques.
- An alternative more than doubles the effort for teachers and students.
- Code using separate alternatives can often cooperate only through specially written mediation code.
Consider the problem of manipulating a number of objects where that number is known only at run time. C++ and C99 offer alternative solutions not present in C89. Consider a C89 example:
/* C89: v points to m Ys */
#include <stdlib.h>   /* malloc, free, exit */
#include <string.h>   /* memcpy */

void f89(int n, int m, struct Y* v)
{
    /* not Classic C; not C++ */
    struct X* p = malloc(n*sizeof(struct X));
    struct Y* q = malloc(m*sizeof(struct Y));
    /* memory exhausted */
    if (p==NULL || q==NULL) exit(-1);
    if (3<n && 4<m) p[3] = v[4];
    /* copy */
    memcpy(q,v,m*sizeof(struct Y));
    /* ... */
    free(q);
    free(p);
}
Among the potential problems with this code is that v might not point to an array with at least m elements.
The obvious C99 alternative is:
// C99: v points to m Ys
void f99(int n, int m, struct Y v[m])
{
    // not C89; not C++
    struct X p[n];
    struct Y q[m];
    if (3<n && 4<m) p[3] = v[4];
    // copy
    memcpy(q,v,m*sizeof(struct Y));
    // ...
}
The nicer syntax makes it less likely that v does not point to an array with at least m elements, but that is still possible. Unfortunately, the code does not define what happens if the array definition fails to allocate memory for the n elements required. The use of arrays automates the freeing of memory, though there could still be a memory leak if f99 is exited through a longjmp.
The obvious C++ alternative is:
// C++: v holds v.size() Ys
void fpp(int n, vector<Y>& v)
{
    // not C89; not C99
    vector<X> p(n);
    if (3<p.size() && 4<v.size()) p[3] = v[4];
    // copy
    vector<Y> q = v;
    // ...
}
A vector contains the number of its elements, so the programmer doesn't have to worry about keeping track of array sizes or about freeing the memory used to hold those elements.
The standard library vector is more general than a VLA. For example, vector has a copy operation, you can change the size of a vector, and vector operations are exception safe (see Appendix E of [9]). This could imply a performance overhead compared to VLAs on some implementations, but so far I have not found significant overheads.
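The vector properties listed above can be sketched as follows. The original leaves the element types unspecified, so a hypothetical struct Y with a single int member stands in, and the function names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Y { int value; };   // hypothetical element type

// Returns a copy of v grown by one value-initialized element.
// The copy, its size, and its memory are all managed by vector.
inline std::vector<Y> copy_and_grow(const std::vector<Y>& v)
{
    std::vector<Y> q = v;          // copy operation: element-wise copy
    q.resize(v.size() + 1);        // size can change after construction
    assert(q.size() == v.size() + 1);
    return q;                      // storage is released automatically
}

// Checked access: at() throws std::out_of_range rather than
// reading past the end, which plain indexing of a VLA cannot do.
inline bool index_is_valid(const std::vector<Y>& v, std::size_t i)
{
    try { (void)v.at(i); return true; }
    catch (const std::out_of_range&) { return false; }
}
```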
The key point is that users have to choose, and users of more than one of these languages have to understand the different programming styles and remember where to apply them. The result is that these differences in the programming models of C and C++ make it significantly more difficult to program in both languages than to program in one, even though the two languages share many features and are supposed to be closely related.
As Close as Possible...
The semi-official policy for C++ in regards to C compatibility has always been "As Close as Possible to C, but no Closer" [10]. Naturally, wits have answered with "As Close as Possible to C++, but no Closer," but I have never seen that in any official context nor seen any elaboration of what it means.
How close is "as close as possible" to C? Traditionally, this statement has meant "compatible with C except where the C++ type system would be compromised." Differences such as those for void*, C++'s insistence on function prototypes, the use of built-in types for bool and wchar_t, and even the inline rules, can be explained that way [3].
The "as close as possible..." rules were crafted under the assumption that the other language was immutable. In reality, it has not been so: just look at the number of cross borrowings between C and C++ [3]. I believe that it would be technically feasible for "as close as possible" to be identical in the subset supporting traditional C-style programming, assuming that changes could be made simultaneously to both languages, systematically bringing them closer together.
Whatever is (or isn't) done must be considered in light of the fact that the world changes rapidly and users expect programming languages to evolve to meet new challenges. Thus, compatibility issues must be considered in the wider context of language evolution. The most promising approach is to consider C and C++ close to complete in language support for their respective programming styles. Future extensions can focus on provision of standard or non-standard libraries. If you think of C and C++ as essentially complete, C/C++ compatibility emerges as part of the consolidation and cleanup of basic facilities.
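Two of these type-system differences are easy to show in a minimal sketch. In C, a void* converts implicitly to any object pointer type, and wchar_t is a typedef for an integer type; in C++, the conversion from void* must be written out, and bool and wchar_t are distinct built-in types that can be overloaded on. The names make_ints and kind are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocates n zero-initialized ints with the C allocator.
// Valid C would be:   int* p = calloc(n, sizeof(int));
// C++ rejects that implicit void* -> int* conversion.
inline int* make_ints(std::size_t n)
{
    assert(n > 0);    // a zero-size request is implementation-defined
    void* raw = std::calloc(n, sizeof(int));
    return static_cast<int*>(raw);   // explicit conversion required
}

// bool and wchar_t are types of their own in C++, so each overload
// below is selected for exactly one kind of argument.
inline int kind(bool)    { return 1; }
inline int kind(wchar_t) { return 2; }
inline int kind(int)     { return 3; }
```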
You'll learn more about the case for C and C++ compatibility in next month's CUJ.
References
[1] Classic C is K&R C plus structure assignment, enumerations, and void. I picked the term "Classic C" from a sticker that used to be affixed to Dennis Ritchie's terminal.
[2] Brian Kernighan and Dennis Ritchie. The C Programming Language (Prentice-Hall, 1978).
[3] Bjarne Stroustrup. "Sibling Rivalry: C and C++" (AT&T Labs Research Technical Report TD-54MQZY, January 2002), <www.research.att.com/~bs/siblingrivalry.pdf>.
[4] ISO/IEC 14882, Standard for the C++ Language.
[5] ISO/IEC 9899:1999, Programming Languages - C.
[6] John Benito, the ISO C committee liaison to the ISO C++ committee, in response to a request to document C++/C99 incompatibilities similar to C89/C++ incompatibilities.
[7] David R. Tribble. "Incompatibilities between ISO C and ISO C++," <http://david.tribble.com/text/cdiffs.htm>.
[8] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).
[9] Bjarne Stroustrup. The C++ Programming Language, Special Edition (Addison-Wesley, 2000).
[10] Andrew Koenig and Bjarne Stroustrup. "C++: As Close to C as Possible but No Closer," The C++ Report, July 1989.
[11] Graham Birtwistle, Ole-Johan Dahl, Bjorn Myrhaug, and Kristen Nygaard. SIMULA BEGIN (Studentlitteratur, 1979).
[12] ISO/IEC 9899:1990, Programming Languages - C.
[13] Brian Kernighan and Dennis Ritchie. The C Programming Language, Second Edition (Prentice-Hall, 1988).
[14] Martin Richards and Colin Whitby-Strevens. BCPL, the language and its compiler (Cambridge University Press, 1980).
Bjarne Stroustrup is the designer and original implementer of C++. He has been a member of the C/C++ community since he first used C in 1975. For 17 years, he worked in Bell Labs' Computer Science Research Center alongside people such as Dennis Ritchie and Brian Kernighan. In the early 1980s, he participated in the internal Bell Labs standardization of C. He is the author of The C++ Programming Language and The Design and Evolution of C++. His research interests include distributed systems, operating systems, simulation, design, and programming. He is an AT&T Fellow and heads AT&T Labs' Large-scale Programming Research department. He is actively involved in the ANSI/ISO standardization of C++. He received the 1993 ACM Grace Murray Hopper award and is an ACM fellow.