Tim is a freelance software developer and consultant. He can be reached at [email protected].
Even though the bulk of my production development is in C, Java, and assembly language, I often turn to Perl to quickly prototype new ideas. That's especially true with database applications. Perl's DBI module for database access provides convenient access to the most popular relational databases, and its strong text-processing abilities make it easy to synthesize complex queries.
Over the last several years, I've refined a simple framework that handles most routine database access automatically and significantly accelerates the development of simple database applications. This framework takes advantage of Perl's advanced object-oriented features to pack a lot of power into a small package.
Accessors
Accessor functions are the easiest way to abstract database access. By convention, accessors in Perl are subroutines that can either set or return a value, depending on how they are called. Listing 1 illustrates a simple accessor function. The accessor always returns the old value; if a new value is specified, the function also sets the value.
The Class-Per-Table Pattern
Accessors are a natural way to manipulate database data. A simple object-oriented design pattern uses a single class for each table in your database. An object of such a class represents a single row, and you use accessor methods to read or change individual fields. Such a system could be used as in Listing 2. The details of the data storage are completely hidden; this design makes it easy to change the details of the low-level storage without any impact on the higher level application code.
Exploiting AUTOLOAD
The class-per-table pattern can involve a lot of highly repetitive code. For each column in the database, you need a method that can read/write that column. Often, programmers build simple code generators to create reams of code for such classes.
Perl's AUTOLOAD capability lets you eliminate almost all of the explicit accessor methods. If there is no "phone" method defined, then an expression such as $e->phone invokes the AUTOLOAD method instead. The special $AUTOLOAD variable contains the full name (including package name) of the method that was requested. Using this, the AUTOLOAD method can load the requested method from a disk or database or simply emulate the call directly.
My database access framework uses AUTOLOAD to provide a default accessor method for any column of a database. I simply use the name of the column as the name of the accessor. For example, if there is no explicit implementation of the phone method, $e->phone('555-5555') invokes AUTOLOAD to store the specified value into the phone column. This gives you convenient access to any database column without having to write any code.
Packaging It Up
Since real-world projects have lots of tables, I've created a single DBTable class that contains the AUTOLOAD method just described and a number of other convenient methods. The per-table classes simply inherit most of their functionality from DBTable. You only need to write one or more constructors that let you access rows in your tables.
To illustrate, consider the schema in Listing 3. Listing 4 is the complete Perl code needed to use this schema. In particular, note that I've defined no accessors; the AUTOLOAD method in DBTable will handle them automatically.
Defining New Tables
The first three lines of Listing 4 are just standard boilerplate for any Perl class. The only interesting part is the byName constructor. The _fetchOrCreate method accepts a hash that identifies this row and returns a hash with the values of one or more columns. If the row doesn't exist, it is created.
The DBTable methods rely on the TABLE and KEY entries in the hash. The TABLE entry specifies the name of the database table. The KEY entry is a hash providing a primary key that identifies the particular row. In Listing 3, the primary key is the value of the employee_id field. The first argument to the _fetchOrCreate method specifies a condition for identifying or creating the necessary row; the second argument is the column whose value is to be returned.
If you now look carefully at Listing 2, you can see how the entire system works. The $e variable in that example contains everything needed to identify a particular row in a certain table. The call $e->phone('555-5555') invokes the AUTOLOAD method in the DBTable class, which can use the table name (from the TABLE key), the primary key (from the KEY key), and the column name (from the method name) to access the database.
Inside DBTable
The full implementation of DBTable is shown in Listings 8 and 9, available in the Source Code Appendix, pages 24-26. There are only four places where SQL statements are actually constructed: _fetch reads one or more columns from a row, _create creates a new row, _set sets the values of columns, and _columns obtains the names of the columns in this table. The string manipulations that convert hashes into SQL statements are admittedly a bit convoluted, but they are hidden inside just a couple of small functions. Also available electronically is an example program that uses this code. You can use this as a starting point for your own database applications.
Advanced Techniques
Sometimes you will want accessors that return objects rather than raw database fields. Listing 5 is a manager method that returns an object. With this addition, you can refer to $e->manager_id to obtain the numerical ID for the manager, or $e->manager to obtain an object. Such methods are especially useful when you have complex schemas with many interrelated tables. In like fashion, you can define methods that simulate nonexistent database fields (by transparently accessing joined tables, for instance).
The byID constructor in Listing 5 could have simply used $self->{'KEY'} = {'employee_id'=>$id} to set the KEY without any database access. Although that would be faster, it doesn't provide any guarantees that the requested row actually exists.
Finally, remember that $e->dbHandle returns the underlying DBI database handle to your application so that you can use SQL statements directly if necessary.
Performance
This framework makes it easy to prototype database applications. You simply define a package for each table in your schema, and those packages need only contain one or two simple constructors each. You can then begin to build your application, accessing database columns through accessor functions.
Of course, such applications are going to be slow. Every time you read or write a field, at least one database access is going to occur. As your application evolves, you can improve performance by overriding specific accessors. For example, if you access the primary key fields frequently, you can override them using code like that in Listing 6.
Listing 7 takes this idea a step further by modifying the constructor to store the entire row in memory, and overrides the _fetch and _set methods to read and update the in-memory versions, dramatically reducing database traffic. Such optimizations are not always appropriate, however, and are generally best delayed until you understand your application better.
Future Directions
Currently, DBTable stores the DBI database handle in a global variable, which limits you to a single database handle at a time. Applications that must access multiple databases will need to find a way to address this.
Right now, the _columns method is rather wasteful. It generates a full database query once for every object. It would be more efficient to do so only once for every table. Worse, it stores a list of valid columns in every object rather than once for each table, which could become a significant memory problem.
Conclusion
This framework has helped me to very rapidly build new database applications. In some cases, performance is not a serious issue, and such code can even be used in production. However, even when performance is an issue, this design is still a good place to start. You should generally be able to build functioning applications using this interface and then upgrade the underlying data access to improve performance without having to extensively modify your application.
TPJ
Listing 1
# Simple accessor sub foo { my $a=$FOO; ($FOO)=@_ if @_; return $a; } # Using accessors foo(4); # foo = 4 print foo(5); # Print 4, set foo=5 print foo(); # Print 5
Listing 2
# Access a row $e = Employee->byName('Doe', 'John'); # Get phone number print $e->phone(); # Set phone number $e->phone('555-5555');
Listing 3
CREATE TABLE employee( employee_id INT PRIMARY KEY, first CHAR(80), last CHAR(80), phone CHAR(80), manager_id INT );
Listing 4
package Employee use vars qw(@ISA);@ ISA = qw(DBTable); sub byName { my($class,$last,$first) = @_; my $self = bless {'TABLE'=>'employee'},$class; # _fetchOrCreate returns a hash # { 'employee_id' => <number> } $self->{'KEY'} = $self->_fetchOrCreate( {'last' => $last,'first' => $first}, 'employee_id' ); return $self; }
Listing 5
# An accessor returning an object sub manager { my($self) = @_; return Employee->byId($self->manager_id); } # Another constructor sub byID { my($class,$id) = @_; my $self = bless {},$class; $self->{'TABLE'} = 'employee'; $self->{'KEY'} = $self->_fetchOrCreate ({'employee_id' => $id}, 'employee_id' ); return $self; }
Listing 6
# An optimized employee_id accessor sub employee_id { my $self = shift; $self->{'KEY'}->{'employee_id'}; }
Listing 7
# An optimized Employee class package Employee use vars qw(@ISA);@ ISA = qw(DBTable); sub byName { my($class,$last,$first) = @_; my $self = bless {},$class; my $cols = $self->_columns; $self->{'TABLE'} = 'employee'; # Read and cache all of the columns $self->{'FIELDS'} = $self->_fetchOrCreate ({'last' => lc $last, 'first' => lc $first}, keys %$cols); $self->{'KEY'} = { 'employee_id' => $self->{'FIELDS'}->{'employee_id'} }; return $self; } # Override _fetch to just pull the value from memory sub _fetch { my ($self, $key, $col) = @_; return { $col => $self->{'FIELDS'}->{$col} }; } # Override _set to modify the in-memory value sub _set { my($self,$key,$set) = @_; $self->SUPER::_set($key,$set); # Set fields in DB foreach $k (keys %$set) { $self->{'FIELDS'}->{$k} = $set->{$k}; # Remember to update the value in 'KEY', if it's there if(exists $self->{'KEY'}->{$k}) { $self->{'KEY'}->{$k} = $set->{$k}; } } } TPJ