Relational databases are important components of most information systems. Independent of the actual functional problems that they solve, there is always an issue of interfacing between host languages (that is, languages in which the client application is written) and the database server. Not surprisingly, there are different approaches and libraries available to developers, ranging from low-level interfaces shipped by database server vendors to entire mediating layers. Most of these methods impose the necessity of using lots of function calls that are not conceptually relevant to the problem at hand, having only a technical purposebinding and defining variables, statement preparation, and execution are examples of such calls, for instance.
Another approach is to use embedded SQL, where the SQL code and host language (C or C++, for instance) are mixed, creating a kind of "dialect" where database-related code is as minimal as possible. Pro*C/C++ is one example of this approach. The problem with such approaches, however, is that they use special preprocessors that replace the embedded SQL statements with low-level calls in the host language. This replacement step is artificial and does not mix with the rest of the language, especially with the type system.
In this article, I present a database library in which language features, such as operator overloading and templates are exploited to provide the effect of "embedding" SQL code within C++ applications, staying entirely within the Standard C++ language. (The complete source code for the library is available here.) The advantage of this approach is that the provided syntax plays safe with the rest of the language and does not require a separate preprocessing or compilation step. As a proof-of-concept, I use the Oracle Call Interface (OCI) as the underlying database interface. It should be possible to reuse most of these techniques with other interfaces.
Choosing the Most Natural Syntax
Since a library such as this is intended to provide the most natural and simple syntax for database interfacing, the code it uses should be familiar and not contain distracting elements. In short, you should need to provide only the information that is needed to solve your problem. For example, Example 1(a) means "write the object i
to the standard output." There are many things happening when this line is executed, but those that are not changing with regards to the problem at hand (stream buffering, synchronization, locale setting, and the like) are kept out of sight. It is possible to manipulate all the details, but in most of the code there is no need to do so, and Example 1(a) is indeed minimalit is not possible to make it syntactically simpler without losing the intent.
The question that motivated me to build this library was: Is it possible to achieve the same syntactic minimalism in database interfacing? In other words, what is the minimal syntax for accessing the database? With this in mind, I propose Examples 1(b) and 1(c) which illustrate a syntax that hides details as much as possible, leaving only the information relevant to the problemdatabase connection, SQL query, and source/destination of local data. The advantage of this syntax is that it is easy to understand if you already know SQL and it requires no more than is really necessaryconceptually, nothing can be removed from this syntax without losing the overall sense, just as in the regular code using the IOStreams
library. Is it possible to have a database library that provides this syntax? Thanks to C++ operator overloading, templates, and lifetime rules of temporary objects, the syntax in Example 1 is not only possible, but easy to provide.
(a)
cout << i;
(b)
int count; sql << "select count(*) from person", into(count);
(c)
int id = 7; string name = "John"; sql << "insert into person(id, name) values(:id, :name)", use(id), use(name);
(d)
sql << "query", into(var1), use(var2), ...;
Example 1: Choosing a syntax.
The Machinery
Listing One shows two classesSession
and Statement
that provide the interface that's similar to other database interface wrappers and that encapsulate most of what is happening when low-level OCI calls are used. What's important is that the interfaces in Listing One are complete with regard to database accessit is possible to execute all the basic SQL queries with the help of the relevant methods. Of course, such classes cannot provide the syntax that's proposed in Example 1, but they are enough to form a solid foundation for the complete library. Interestingly, the Session
and Statement
classes encapsulate most of the inner workings of the OCI library, but the low-level concepts of database operation are still exposed.
Listing One
class Session { public: // connect to the database Session(std::string const & serviceName, std::string const & userName, std::string const & password); ~Session(); void commit(); void rollback(); // ... }; class Statement { public: Statement(Session &s); ~Statement(); // accumulate the binding info void exchange(/* ... */); // prepare the SQL statement void prepare(std::string const &query); // use the binding info to actually bind the variables void defineAndBind(); // execute the query bool execute(int num = 0); // fetch the next row of data bool fetch(); // ... };
Typical database access can be expressed on a step-by-step basis; see Figure 1. Each step more or less maps to the methods of the Session
and Statement
classes, and the code that uses a typical database library needs to closely reflect all these steps. Clearly, there is a deep "impedance mismatch" between the syntax proposed in Example 1 and the approach in Figure 1. Of course, even the most simple SQL query requires that all the steps are executed, but the entire process can be automated and controlled.
1. Connect to the database (this is done once for the entire session). |
2. Prepare the SQL statement for execution. |
3 (a). Gather the information needed to bind local data. |
3 (b). Bind local data to SQL placeholders. |
4. Execute the statement. |
5. Fetch more rows, if needed. |
Figure 1: Database access.
Temporary objects are perfect for this job. They can be invisible, but thanks to their lifetime rules, they can at the same time be predictable when it comes to the timing of their actions. Consider again the proposed syntax, like that in Example 1(d). This syntax can be "translated" into the steps in Figure 1 by a specially introduced temporary object that can be created at the beginning of this expression (created and returned by operator<<
) and that lives until the end of the expression, collecting everything on its way, and actually executing the statement in its destructor. Listing Two presents the additional definitions that make this trick possible:
into
anduse
functions that return objects of special types. These objects carry on information about the type and address of the source/destination of data.- The definition of the temporary object's class and a helper starter class, which exists only to create the temporary object.
- A member shift operator in the
Session
class serves two purposes: It enables storing the query and starts the creation of the temporary object that handles the rest of the expression.
Listing Two
// helper functions for preparing binding and define info template <typename T> IntoTypePtr into(T &t) { return IntoTypePtr(new IntoType<T>(t)); } template <typename T> UseTypePtr use(T &t) { return UseTypePtr(new UseType<T>(t)); } // type of the temporary object class TempType { public: TempType(Session &s); // the destructor actually makes the dirty work ~TempType(); // inserter for additional stream-formatted data template <typename T> TempType & operator<<(T const &t); // operators that accept and accumulate bind info TempType & operator,(IntoTypePtr const &); TempType & operator,(UseTypePtr const &); // ... }; // type of object that is used to kick-start the temporary class StarterType { public: StarterType(Session *s); template <typename T> TempType operator<<(T const &t) { TempType o(*session_); o << t; return o; } private: Session *session_; }; class Session { public: // the starter object StarterType once; // the inserter operator (first in the expression) template <typename T> TempType operator<<(T const &t) { return once << t; } // ... };
With these pieces, I can now better explain what happens in Example 1(d) when it is executed:
- The
sql
is an instance of theSession
class and retains all the information that is needed to create and execute new statements. - The subexpression
sql << "query"
uses theonce
member object to start up the temporary object (this also means that you can use the formsql.once << "query"
, which means that the statement is supposed to be executed only once). The"query"
string (or whatever it is) is remembered by the underlying string stream. From now on, the temporary object is designated astmp
for clarity. into(var1)
is a call to the function that returns theIntoTypePtr
object, which carries information about the type and address of thevar1
variable. The returned object is calledintoObj
in this description.- The subexpression
tmp, intoObj
causesoperator,
to store the information carried on byintoObj
in the main temporary object for later use. This operator returns the same temporary object (by returning *this
) so that the following comma operator works in the same way, and so on, until the end of the full expression. - When the
tmp
object is destroyed, the dirty work is performed behind the scenes to prepare the statement, bind all variables, and execute the query.