Rank:3 Protocol Soup
Here are the common protocols covered in this article: |
You probably have never thought much about how applications connect to databases. Why would you? But many software engineers have thought about it for you and produced several different technologies for allowing applications and databases to talk to each other.
The fundamental problem is not only that applications need to be able to access databases, but that they might even need to access several different kinds of databases. If you're a programmer, odds are that you'd prefer to use a single database access protocol that can handle databases from a number of different vendors rather than learning something completely different for each one.
General database access protocols are designed so that the detailed information about a particular database product can be "snapped in" to a common framework without worrying about the specifics. An application uses the general database access protocol to connect to a specific source of data and the "snap in" for that particular type of database handles the translation from the common protocol to the native language of the database, and vice-versa.
This means that if you need to switch your data from a low-end database to a more sophisticated one, the only necessary step to access the data in its more sophisticated database format is to snap in the appropriate component for the new database. The underlying communication between the application and the database is not effected, and the specific database component takes care of the details.
There are several downsides to this sort of common database access framework. The main disadvantage is that someone has to make the components to snap in for each database product. This is a very difficult task, and there are simply no components available for some databases on certain platforms and under certain protocols. Another problem is speedall the translations between the common protocol and the database mean additional processing time. And in some cases, there's no way to translate database-specific commands and procedures into the common protocol. All of these limits, however, are typically countered by the relative ease of building web database applications using the common protocols.
ODBC
The most common general database protocol is Open DataBase Connectivity (ODBC). This technology was developed jointly by IBM, Microsoft, and a number of other manufacturers in the late 1980s to make it easier to interoperate between their various database products. The snap-in components of ODBC are know as ODBC drivers, and they can be found or bought for virtually every common database platform. ODBC is slower than newer technologies such as OLE-DB, but has the widest support of both databases and applications that can use ODBC to access a database.
JDBC
Sun released JDBC, the Java equivalent of ODBC, shortly after Java was released. This is the connection method of choice for Java applications, but drivers exist for fewer databases than with ODBC.
OLE-DB
Microsoft's newest technology for database access is OLE-DB, which is part of their entire enterprise COM architecture. The snap-in components for different databases are known as providers and they are far more sophisticated than the ODBC and JDBC drivers. The support for databases is quite limited, albeit to the most popular database platforms (MS-SQL Server, Oracle, and the like). This is probably the highest performance method for accessing a database outside of the native database drivers, but it's also currently limited to the Windows NT platform.
Native database access
The fastest way to access a database is to directly interface with the file natively. Many web database connectivity tools offer native database access to provide performance boosts.
Next stop, SQL
Now that we've discussed how an application connects to the Web and how it can connect to a database, you should have a basic understanding of how a web application server works. In all cases, the application uses one of the web server connectivity protocols and one of the database connectivity protocols to send messages back and forth.
You probably also have a pretty good idea of why this type of software is often referred to as middleware. The next question is "what kind of messages does it send?" You'd probably guess that the message sent to the web server is some sort of HTML document ... and you'd be right. HTML is the common language of the Web! But what kind of message does the application send the database? The answer to that varies. If you're using relational databases (most people doing web database development do), you're probably going to need to send that message using Structured Query Language, or SQL. And to really understand SQL, you need a quick history lesson.
The history of SQL
In 1974, IBM researchers developed the first prototype relational database, System R. The database included a language for manipulating the data which they called SEQUEL (Structured English-like QUEry Language). This database language eventually evolved into SQL. The language is technically a mathematical formalization of relational algebra based on first-order predicate logic. Whatever!
Basically, SQL is an English-like language that is not procedural like most programming languages. Instead SQL is simply a data manipulation language. You can create, delete, and change data with SQL, but you can't really use it to work on a text file or produce graphics.
As more relational databases were developed, the usefulness of a language like SQL was readily apparent. Unfortunately, each database vendor created its own, which certainly confused matters. The International Standards Organization (ISO) got involved to create a single SQL standard to promote interoperability of databases and database code. In 1986, they formalized SQL into a standard known as the SQL-86 implementation of the language. Since then, SQL-89 and SQL-92 (a.k.a. SQL2) have also been approved. The newer SQL3 is apparently making its way through the standards process as we speak.
Despite this set of standards, each relational database management system (RDBMS) implements SQL in a slightly different way. One RDBMS might support some features of SQL-92, but mainly support the SQL-89 standard. Others may only implement SQL-89. A number of databases have their own extensions to SQL. Oracle in particular has a procedural extension to SQL.
None of these variations in SQL should bother you too much though, especially in the beginning. One good reason to avoid high-end SQL and proprietary extensions is that you want to make your SQL code portable from system to system. I often prototype web database applications using Microsoft Access on my desktop and then move the data to the implementation system, where the SQL typically works without changes.
For my next trick
Hopefully the sea of acronyms looks a little more familiar after this whirlwind tour of web database connectivity land. In the next column, we'll take a look at plain-vanilla SQL for creating, updating, and deleting database records. In the following columns, we'll cover selecting data records, aggregating and sorting data, and even subjecting data to complex functions (spreadsheet users should feel quite at home).
Until the next column, feel free to email me at