Samuel is a software architect at Symantec, where he leads the team developing the Brightmail email filtering scanner. He can be contacted at [email protected].
One issue with the common C compilation model is that it can easily lead to unnecessary coupling between files. This coupling can force users of an API to include, directly or indirectly, header files they do not otherwise need. Those extra headers can frustrate users as they try to work out which header files or include paths are needed to use the API. Even if your header file includes its own dependencies, or your build environment keeps all header files in a single location, the extra headers are still a problem: they lengthen compile times and add symbols to the global namespace of the translation unit, which increases the chance of name conflicts.
Consider a hypothetical SMTP library in which one important input is the contents of the message the caller wishes to send. To make the library easy for different developers to use, it accepts the data in a variety of formats. In addition to built-in data sources such as strings (NUL-terminated character arrays), the library may also accept locally developed data sources such as the Location Transparent Data Source (LTDS), a common in-house data structure that keeps its data in memory or on disk, depending on how much data a specific instance holds. A common way to handle this is to have the SMTP header include the LTDS header; see Listing One.
Listing One

```c
/* (a) ltds.h */
typedef struct ltds_t { ... } ltds_t;

/* (b) smtp.h */
#include <ltds.h>

int smtp_send_str(const char* msg, /* other arguments elided */);
int smtp_send_ltds(const ltds_t* msg, /* other arguments elided */);
```
For users of the API who only want to call smtp_send_str(), the presence of smtp_send_ltds() in the same header file is the first problem. If the SMTP header did not include the LTDS header itself, they would have to include the LTDS header before they could include the SMTP header. Even when the SMTP header does include it, as in Listing One, the dependency on the LTDS header still causes problems. If the build environment keeps header files in different locations, every client of the SMTP header must add the LTDS header path to its local build configuration. The additional symbols introduced by the LTDS header can also cause name conflicts, as well as longer build times and larger object files.
Decoupling Header Files
The solution lies in C's distinction between declarations and definitions, and in the fact that C allows the same structure to be declared more than once in a translation unit. In general, duplicate declarations must agree with one another; for a bare structure declaration such as struct ltds_t; that is guaranteed, because structure tags live in their own namespace and such a declaration carries no modifiers that could conflict.
One potentially confusing point is the difference between structure names and typedef names. For this discussion, the important thing to remember is that a typedef is simply an alias: after typedef struct ltds_t { ... } ltds_t;, the names ltds_t and struct ltds_t refer to the same type and can be used interchangeably. Using this equivalence, you can rewrite the SMTP header from Listing One as shown in Listing Two.
Listing Two

```c
/* smtp.h */
#include <ltds.h>

int smtp_send_str(const char* msg, /* other arguments elided */);
int smtp_send_ltds(const struct ltds_t* msg, /* other arguments elided */);
```
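To see that the typedef really is just an alias, the following sketch uses a hypothetical stand-in for ltds.h (the members data and length are invented, since Listing One elides the real definition) and a hypothetical helper, ltds_length(), declared with the structure name as in Listing Two. Code that holds an ltds_t pointer passes it without a cast, and a C11 _Generic check confirms at compile time that both spellings name the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ltds.h; Listing One elides the real members. */
typedef struct ltds_t {
    const char *data;   /* message contents (illustrative) */
    size_t      length; /* number of bytes in data */
} ltds_t;

/* Hypothetical helper declared with the structure name, as in Listing Two. */
size_t ltds_length(const struct ltds_t *src)
{
    assert(src != NULL);
    return src->length;
}

/* Code written against the typedef name calls it without a cast,
   because ltds_t and struct ltds_t are the same type. */
size_t length_via_typedef(const ltds_t *src)
{
    return ltds_length(src);
}

/* C11 _Generic selects the first association only when the controlling
   expression has exactly the type struct ltds_t *. */
int typedef_is_alias(void)
{
    ltds_t *p = NULL;
    return _Generic(p, struct ltds_t *: 1, default: 0);
}
```

The _Generic check is only there to make the equivalence visible; the technique itself does not depend on C11.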
C allows a structure to be declared any number of times, but it may be defined only once in a given scope, even if all the definitions are exactly identical. (C23 relaxes this rule for identical definitions; earlier versions of the standard do not.) So if two separate header files define the same named structure and both are included in a single translation unit, the compiler issues an error; see Listing Four.

Listing Four

```c
/* (a) foo1.h */
struct foo { int x; };

/* (b) foo2.h */
struct foo { int x; };

/* (c) a source file that includes both: error, struct foo is redefined */
#include <foo1.h>
#include <foo2.h>
```

Change one or both of the definitions to declarations, however, and the problem goes away (Listing Five).

Listing Five

```c
/* (a) foo1.h: declaration only */
struct foo;

/* (b) foo2.h: the one definition */
struct foo { int x; };

/* (c) foo3.h: declaration only */
struct foo;

/* (d) a source file that includes all three: compiles cleanly */
#include <foo1.h>
#include <foo2.h>
#include <foo3.h>
```

Although C limits what you can do with a structure that is declared but not defined (an incomplete type), it does let you use pointers to such structures; those pointers simply cannot be dereferenced. Combining that with the ability to declare the same structure multiple times lets you rewrite the SMTP header once more. Once the structure declaration is in the SMTP header file, there is no longer any need to include the LTDS header; see Listing Three. The SMTP header can now be included with or without the LTDS header file.

Listing Three

```c
/* smtp.h: no longer includes ltds.h */
struct ltds_t;

int smtp_send_str(const char* msg, /* other arguments elided */);
int smtp_send_ltds(const struct ltds_t* msg, /* other arguments elided */);
```
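The following single-file sketch puts the whole arrangement into one translation unit, with comments marking which part would live in which file. The file split, the LTDS members, the helper forward_if_present(), and the function bodies are all assumptions for illustration: the elided arguments from the listings are dropped, and the send functions merely return a length instead of sending mail. Note that the forward declaration must come before the prototypes; a structure tag that first appears inside a parameter list has prototype scope only and would not match the real struct ltds_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* smtp.h (decoupled, as in Listing Three).
   The forward declaration comes first: without it, struct ltds_t in the
   prototype below would have prototype scope only. */
struct ltds_t;
int smtp_send_str(const char *msg);            /* elided arguments dropped */
int smtp_send_ltds(const struct ltds_t *msg);  /* elided arguments dropped */

/* client.c: includes only smtp.h, never ltds.h.
   Hypothetical helper: it may store, compare, and pass the pointer,
   but it cannot dereference it or apply sizeof to struct ltds_t. */
int forward_if_present(const struct ltds_t *maybe_msg, const char *fallback)
{
    if (maybe_msg != NULL)
        return smtp_send_ltds(maybe_msg);
    return smtp_send_str(fallback);
}

/* ltds.h (hypothetical members; the real definition is elided).
   Repeating the declaration is harmless, and the definition completes
   the same type for the rest of the translation unit. */
struct ltds_t;
typedef struct ltds_t {
    const char *data;   /* message contents held in memory */
    size_t      length; /* number of bytes in data */
} ltds_t;

/* smtp.c: the only part that needs the full definition.
   Placeholder bodies: they return the message length instead of
   actually sending anything over SMTP. */
int smtp_send_str(const char *msg)
{
    assert(msg != NULL);
    return (int)strlen(msg);
}

int smtp_send_ltds(const struct ltds_t *msg)
{
    assert(msg != NULL);
    return (int)msg->length;
}
```

In a real project the client code would sit in its own file that includes only smtp.h; only smtp.c, which dereferences the pointer, needs the full definition from ltds.h.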
Common Problems and Solutions
When converting existing source code to use this technique, you may run into a few common problems. The first is an anonymous structure that has only a typedef name. Such a structure cannot be forward-declared, because there is no structure name to declare. This is usually simple to solve: give the structure a name; see Listing Six. Since the structure was previously referred to only by its typedef name, adding a structure name should not cause any backward-compatibility problems.
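Listing Six

```c
/* (a) before: anonymous structure with only a typedef name */
typedef struct { ... } somename_t;

/* (b) after: the structure has a name matching the typedef */
typedef struct somename_t { ... } somename_t;
```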
The one issue that can arise when giving a structure a name is that another structure may already be using that name. Ideally, the other structure is a copy of the one you are naming, and the two can be merged. But where two completely unrelated structures exist, and the typedef name of one is practically identical to the structure name of the other, I suggest renaming them to something more distinctive to prevent confusion.

Another issue arises when a structure already has both a structure name and a typedef name, and the two are very different. From the compiler's point of view this is perfectly legal, but it is hard for developers to remember that two distinct names refer to the same type (and it will certainly make developers curse at the screen as they try to learn such an API). The best recommendation here is to avoid the practice in new code.

In general, I strongly recommend using the same or similar names for the structure and its typedef. This is not required, but the easier it is for developers to derive the typedef name from the structure name and vice versa, the easier it is to apply the transformations this technique requires.
Conclusion
That's all there is to it. While the benefits are not huge, the technique is so easy to apply that it quickly becomes second nature.