Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Transaction Processing


Mar04: Programmer's Toolchest

Improving the integrity of data

Charles is an independent consultant who lives in Wyoming. He can be contacted at http://www.charlescurley.com/.


By preserving the integrity of whole transactions, not just the elements thereof, transaction processing ensures the integrity of data at the conceptual level in the face of catastrophic failure. By the conceptual level, I mean the level at which users—not programmers—operate. Accountants, for example, are interested in debits, credits, invoices, and checks—not database files or SQL rows and columns. Without transaction processing, external events such as program or computer crashes can cause part of a transaction to be retained, while other parts are lost. The result is that the consistency of the data is jeopardized.

There are four requirements of transaction processing:

  • Atomicity. Transactions take place entirely, or not at all; that is, they are atomic.

  • Consistency. The system is in a valid state at the beginning of a transaction and again at the end, regardless of how it ends.

  • Isolation. Each transaction appears to take place in isolation, unaffected by other transactions that may be going on elsewhere in the system.

  • Durability. Once a transaction is complete, its results are durable (or persistent); they are not lost in a later failure.

Journaling File Systems

Most modern operating systems provide filesystem transaction processing through what is known as a "journaling filesystem." A journaling filesystem keeps a record (journal) of each high-level transaction.

For example, suppose you were to append data to a file. A journaling filesystem first records in its journal that it has new data to append to the file. It next writes the data to the disk. Then it adjusts the file information appropriately. Finally, it marks the transaction as completed in its journal.

With journaling filesystems, recovery from crashes consists of examining the journal for incomplete transactions. Any incomplete transactions are either completed or rolled back. Thus, the filesystem is restored to internal consistency without exhaustive and potentially inconclusive checks of the entire filesystem.
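The four steps of the append, and the recovery pass that follows a crash, can be sketched with a toy in-memory model. This is only an illustration of the idea: the names ToyFile, Journal, journaled_append, and recover are hypothetical, the journal holds a single entry, and recovery here always rolls back rather than completing the transaction. A real filesystem is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a "file" with a data area and a recorded length,
   plus a one-entry journal. All names here are hypothetical. */
enum { CAPACITY = 64 };

typedef struct {
    char   data[CAPACITY];
    size_t length;          /* the "file information": committed size */
} ToyFile;

typedef struct {
    int    pending;         /* 1 while a transaction is incomplete */
    size_t old_length;      /* file length before the append began */
} Journal;

/* Append n bytes in four steps. crash_after lets a caller simulate a
   crash after step 1, 2, or 3 (0 means no crash).
   Returns 1 if the append completed, 0 if it "crashed". */
int journaled_append(ToyFile *f, Journal *j, const char *buf, size_t n,
                     int crash_after)
{
    assert(f->length + n <= CAPACITY);
    j->old_length = f->length;              /* step 1: record the intent */
    j->pending = 1;
    if (crash_after == 1) return 0;
    memcpy(f->data + f->length, buf, n);    /* step 2: write the data */
    if (crash_after == 2) return 0;
    f->length += n;                         /* step 3: adjust file info */
    if (crash_after == 3) return 0;
    j->pending = 0;                         /* step 4: mark completed */
    return 1;
}

/* Recovery looks only at the journal and rolls back anything incomplete. */
void recover(ToyFile *f, Journal *j)
{
    if (j->pending) {
        f->length = j->old_length;
        j->pending = 0;
    }
}
```

Note that recover never scans the data area. It consults the journal alone, which is why recovery is quick compared with checking the whole filesystem.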

You may already be using transaction processing without knowing it: Modern journaling filesystems include ReiserFS (http://www.namesys.com/) and ext3 (http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html) on Linux, and NTFS on Windows NT and its descendants.

Of course, transaction processing does impose overhead because, as with journaling filesystems, it requires extra processing and often extra disk space. Therefore, it should be optional at the database level. For example, temporary data may not be worth the overhead. Likewise, some index files may not be worth the overhead either if, for example, they are rebuilt often. The decision of where to use transaction processing should be made by the application designers, not the database designers. Consequently, you should be able to specify which features of transaction processing you want; for instance, you may want atomic transactions, but not automatic recovery, in a temporary file.

Transactions should be large enough to guarantee the integrity of the data, but no larger. Why? Because of transaction processing's overhead, which includes record locking. Records involved in a transaction are locked until the entire transaction is committed, which means that they are locked longer than they would be without transaction atomicity. This can delay other transactions.

Transaction Processing with C-Tree Plus

C-Tree Plus from FairCom (http://www.faircom.com/) is a database management system that provides transaction processing support. C-Tree Plus is composed of several libraries (with interchangeable APIs) that are optimized for different applications, ranging from single-user database programs (say, a personal information manager) to multiuser enterprise databases with network access.

In this article, I examine C-Tree Plus function calls at the ISAM level. Other C-Tree Plus interfaces are more abstract, such as C-TreeSQL, but retain much of the control. By offering abstractions (such as session, database, table, record, and field), they make programming easy and maintainable. The sample database I present (in database.c, available electronically; see "Resource Center," page 5) is straightforward enough to use in a sample program, but sophisticated enough to present a realistic scenario for transaction processing. It is a database of names, addresses, and phone numbers. For each person, you can have zero or more addresses (home, work, vacation home, and so on), and zero or more phone numbers (home, work, mobile, and the like). Each set of data is represented by an eponymous file. These are bound by the index fields in each file: A phone number has the index number of the person to whom it belongs in its index field.
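To make the link between the files concrete, here is a minimal sketch of how a phone record might point back at its owner through an index field. The record layouts and the names NameRec, PhoneRec, and count_phones are hypothetical; the real definitions are in database.c and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layouts; the real ones live in database.c. */
typedef struct { long index; char first[32]; char last[32]; } NameRec;
typedef struct { long owner; char number[16]; char kind[16]; } PhoneRec;

/* Count the phone records whose index field points at the given person. */
size_t count_phones(const PhoneRec *phones, size_t n, long person_index)
{
    size_t i, hits = 0;
    assert(phones != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (phones[i].owner == person_index)
            hits++;
    return hits;
}
```

A person with no phone numbers simply has no matching records, which is how the design allows for "zero or more" of each.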

The sample program was written in C and uses FairCom's C-Tree Server on Linux. It was developed on Linux, but should compile and run on any computer that supports C-Tree client libraries. The main line code is in demo.c. Several functions exist to hide implementation details; these are found in functions.c. The database design is made concrete in database.c. Appropriate header files and a make file for GNU make round out the package. Again, all of these files are available electronically.

I've designed this program to show transaction processing and omitted some features normally associated with databases. For example, the code does not qualify phone numbers or postal codes to ensure that they are valid or properly formatted. In fact, there isn't even a simple scanf-based input function, since you can probably write an equivalent one with the GUI library of your choice. In short, this is not a complete application.

Most data types are FairCom's and are defined in FairCom's ctreep.h header file. This is to ensure portability. For example, the function main (in demo.c) uses FairCom data types in its definition, which the preprocessor automatically resolves to the appropriate local data types.

The first thing the program does in main is build some buffers, make other preparations, and connect to the C-Tree server with the C-Tree function InitISAMXtd. The user name is for the author on his computer; you will probably want to change that. The lack of a password is a security issue, but this is a sample program.

Next, the program opens three data files and their associated index files. The function OpenDataFile (see functions.c) opens a data file and its index files, creating them if they do not already exist. The first thing OpenDataFile does is attempt to open the specified file using the C-Tree function OpenFileWithResourceXtd. Only if that fails does it attempt to create the file (using CreateIFileXtd).
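The same open-first, create-on-failure pattern can be shown with plain standard C I/O. This sketch does not use the C-Tree API at all, and open_or_create is a hypothetical helper; it only illustrates the order of the two attempts.

```c
#include <assert.h>
#include <stdio.h>

/* Open an existing file for update; create it only if opening fails.
   *created is set to 1 when a new file had to be made. Returns NULL
   if neither attempt works. Plain stdio, not the C-Tree API. */
FILE *open_or_create(const char *path, int *created)
{
    FILE *fp;
    assert(path != NULL && created != NULL);
    *created = 0;
    fp = fopen(path, "r+b");        /* try to open first */
    if (fp == NULL) {
        fp = fopen(path, "w+b");    /* fall back to creating */
        if (fp != NULL)
            *created = 1;
    }
    return fp;
}
```

Trying the open first matters: "w+b" truncates an existing file, so creating unconditionally would destroy any data already there.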

The key to creating C-Tree files is the file mode (or filmod in FairCom nomenclature). I define two file modes in database.c—one for data files and another for index files. Since transaction processing is to be used, the file modes include ctTRNLOG. In addition, using the ctLOGIDX flag for index files accelerates recovery. These file modes are used in the IFIL structures that define the data and index files. The IFIL structures are used when the file is created, but not used when the files are opened.

If the file can be opened but has a problem, as indicated by the variable isam_err, the function attempts to rebuild it. If the function cannot recover, the program exits. All exits are preceded by a call to StopUser, which ends the user session with the server. Calling StopUser reduces the time before the server removes the session; without it, the session lingers until the server detects and cleans up orphaned sessions.

The function OpenDataFile returns a file number, which you can think of as a file handle. It is the file number of the data file. Each index file number is the data file number plus an offset, up to the number of index files. This makes it easy to calculate the file numbers of indexes, as we do with the macros NAMEDAT and NAMEKEY in database.h. Once the files are opened, you scan the names file to get the highest index number in use. When that is done, you increment it so that it is ready to use for the next entry.
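The arithmetic is simple enough to sketch. The macros DEMO_NAME_DAT and DEMO_INDEX_OF below are hypothetical stand-ins, not the actual NAMEDAT and NAMEKEY definitions from database.h, and next_index assumes that numbering starts at 1 when the file is empty.

```c
#include <assert.h>

/* Hypothetical numbering: each data file is followed by its indexes. */
#define DEMO_NAME_DAT            0                /* names data file */
#define DEMO_INDEX_OF(datno, k)  ((datno) + (k))  /* k-th index, k >= 1 */

/* Return the next unused person index, given the indexes already stored.
   Assumes index numbers are positive, so an empty file yields 1. */
long next_index(const long *stored, int count)
{
    long highest = 0;
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++)
        if (stored[i] > highest)
            highest = stored[i];
    return highest + 1;
}
```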

If you don't already have existing data—as is the case if you are creating new files—then you stuff some canned data into the files. This is where you exercise transaction processing.

When you're done, you walk the database, printing out each name in sequence, and printing out all of the associated addresses and phone numbers. This is done in three while loops, two of which are nested in the third. The outer loop steps through the names file, using its index file to step in alphabetical order. Because the index key is built from both the first and last name, names come out in sorted order regardless of the order in which they were entered. So, even though "Fred Flintstone" was added last, he is printed out first.

The two inner loops step through the phones and addresses files in sequence, printing out the data in the order in which it was inserted. For example, Fred's mobile phone number prints out after his work number.
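The effect of walking the names through a key can be imitated with the standard library's qsort. This is a sketch under an assumption: the comparison puts the last name before the first name, which is consistent with Fred being printed first but is not confirmed by the source. Person, by_key, and sort_people are hypothetical names, and a real index keeps the ordering separately rather than moving the records.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *first; const char *last; } Person;

/* Compare on last name, then first name: an assumed key order. */
static int by_key(const void *a, const void *b)
{
    const Person *pa = (const Person *)a;
    const Person *pb = (const Person *)b;
    int c = strcmp(pa->last, pb->last);
    return c != 0 ? c : strcmp(pa->first, pb->first);
}

/* Sort people in place so a simple loop visits them in key order. */
void sort_people(Person *people, size_t n)
{
    assert(people != NULL || n == 0);
    if (n > 1)
        qsort(people, n, sizeof people[0], by_key);
}
```

Given Barney Rubble, Betty Rubble, and Fred Flintstone in insertion order, the sorted walk visits Fred first, then Barney, then Betty.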

Next, you close out the program. Now, you have a choice depending on which code you comment in or out: You can delete the data files or simply close them for later use. Obviously, for a production program, you would do the latter.

To exercise transaction processing, I add four entries to the names file, and associated entries to the phones and addresses files.

First, I add Wilma (Listing One). I've commented out the transaction processing Begin and Commit calls, and the code ignores the return values of the three entry-adding functions. Because the C-Tree function I use to add a record notices that I am trying to add one without having started a transaction, it refuses to add the record. Had the C-Tree function let me continue, the database would have been in an unsafe condition: If the program were to crash between the time I added Wilma's name and the time I added her address, the database could be left with an incomplete entry (such as Wilma's name but not her phone number and address). Eventually, the server would time out and abort the transaction. This is not good. (You can experiment with this by compiling with debug information and single stepping through the program, shutting down the server at various places. Uncomment the Begin and Commit code, and you can add Wilma with minimal transaction processing protection.)

As for Barney (Listing Two), I start a transaction with Begin. If all goes well, it ends with Commit, which does the actual write to the database. However, if any of the calls that add data have any problems at all, it prints out an error message, stops the transaction with a call to Abort, ends the session with StopUser, and exits.

The benefit of transaction processing with Barney is data integrity. Either all of Barney's data goes into the database or none of it does. So, if the system crashes, Barney's data remains consistent.
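The all-or-nothing behavior can be modeled in a few lines of plain C. This is a toy, not the C-Tree API: ToyDb and the toy_ functions are hypothetical, rows are kept in memory, and "committed" simply marks how many rows would survive a crash or an abort. Like the listings, toy_add returns zero on success so that results can be ORed together.

```c
#include <assert.h>
#include <string.h>

enum { MAX_ROWS = 16, ROW_LEN = 32 };

typedef struct {
    char rows[MAX_ROWS][ROW_LEN];
    int  committed;   /* rows visible after a crash or abort */
    int  working;     /* rows including the open transaction */
    int  in_txn;      /* nonzero while a transaction is open */
} ToyDb;

void toy_begin(ToyDb *db)  { db->working = db->committed; db->in_txn = 1; }
void toy_commit(ToyDb *db) { db->committed = db->working; db->in_txn = 0; }
void toy_abort(ToyDb *db)  { db->working = db->committed; db->in_txn = 0; }

/* Add a row; returns 0 on success, nonzero on error.
   Refuses to add anything outside a transaction. */
int toy_add(ToyDb *db, const char *text)
{
    assert(db != NULL && text != NULL);
    if (!db->in_txn || db->working >= MAX_ROWS || strlen(text) >= ROW_LEN)
        return 1;
    strcpy(db->rows[db->working++], text);
    return 0;
}
```

If any toy_add in a group fails, calling toy_abort leaves committed where it was, so none of the group's rows are kept. Only toy_commit makes the whole group visible at once.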

But Barney only gets one chance. What if the server is momentarily overloaded or some other transient event prevents us from adding Barney? I'll make several tries with Betty (Listing Three) and, if need be, abort up to six times inside a do..while loop.

At this point, I've shown how to roll back a transaction completely: call Abort and start over. With large transactions, you should observe the rule that a transaction should be as large as it needs to be and no larger. With Fred (Listing Four), I use a savepoint so that the transaction can be rolled back partially. First, add Fred's name and home phone. For simplicity, assume that those writes are successful. That being so, you don't want to roll them back if you can avoid it. So, before writing the address, set a savepoint. If the address or either of the two following phone numbers fails, restore to the savepoint and try again. Once you're successful, call Commit and push the whole transaction out to the database.
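One way to picture a savepoint is as a remembered position in the list of pending writes. The toy below works that way; it is an assumption made for illustration, not a description of how the C-Tree server implements savepoints, and ToyLog and the txlog_ functions are hypothetical.

```c
#include <assert.h>

enum { LOG_MAX = 16 };

/* Toy transaction log: pending writes are just integers here. */
typedef struct {
    int entries[LOG_MAX];
    int count;
} ToyLog;

/* Record a pending write; returns 0 on success, 1 if the log is full. */
int txlog_write(ToyLog *t, int value)
{
    if (t->count >= LOG_MAX)
        return 1;
    t->entries[t->count++] = value;
    return 0;
}

/* A savepoint is simply the current position in the log. */
int txlog_savepoint(const ToyLog *t) { return t->count; }

/* Discard everything written after the savepoint; keep what came before. */
void txlog_restore(ToyLog *t, int savepoint)
{
    assert(savepoint >= 0 && savepoint <= t->count);
    t->count = savepoint;
}
```

Writes made before the savepoint survive the restore, so only the later, failed part of the transaction has to be repeated.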

Fred could be done with a series of nested do..while loops, one for each line of data. I showed only one such loop for simplicity. Adding one or two more would be a good exercise.

Conclusion

Transaction processing is a useful tool for improving the integrity of the data your application collects. With some careful planning early in the design phase, it is easy to use.

DDJ

Listing One

/* Start Wilma */
/*     if (Begin(ctTRNLOG|ctENABLE) == 0) { */
/*       printf("Error beginning transaction: %d\n", uerr_cod); */
/*     } */
    AddName ("Wilma", "Flintstone");
    AddPhone ("213-555-1212", "Home");
    /* "PL" for Paleolithic. */
    AddAddress ("1234 Jurassic Park", "", "Bedrock", "PL", "12345", "Home");
/*     if (Commit(ctFREE)) { */
/*       printf("Error committing transaction: %d.\n", uerr_cod); */
/*     } */
    printf ("LatestIndex is %ld\n", LatestIndex);
    /* End Wilma */


Listing Two

/* Start Barney */
if (Begin(ctTRNLOG|ctENABLE) == 0) {
  printf("Error beginning transaction: %d\n", uerr_cod);
}
oops = AddName ("Barney", "Rubble");
oops |= AddAddress ("Box 432", "1244 Jurassic Park", 
                                          "Bedrock", "PL", "12345", "Home");
oops |= AddPhone ("213-555-5555", "Home");
oops |= AddPhone ("213-555-4444", "Work");

if (oops) {
  printf ("Transaction failed! Error is %d\n", oops);
  Abort ();                 /* Stop the transaction */
  StopUser ();              /* Close out the connection */
  ctrt_exit(4);
}
if (Commit(ctFREE)) {
  printf("Error committing transaction: %d.\n", uerr_cod);
}
printf ("LatestIndex is %ld\n", LatestIndex);
/* End Barney */


Listing Three

/* Start Betty */
cycles = 0;                 /* Count failed attempts from zero */
do {
  if (Begin(ctTRNLOG|ctENABLE) == 0) {
    printf("Error beginning transaction: %d\n", uerr_cod);
  }
  oops = AddName ("Betty", "Rubble");
  oops |= AddPhone ("213-555-5555", "Home");
  oops |= AddAddress ("Box 432", "1244 Jurassic Park", 
                                      "Bedrock", "PL", "12345", "Home");
  if (oops) {
    Abort ();               /* Stop the transaction */
    cycles++;
    if (cycles > 5) {
      printf ("Transaction Failed after 6 tries!\n");
      StopUser ();          /* Close out the connection */
      ctrt_exit(4);
    }
  }
} while (oops);
if (Commit(ctFREE)) {
  printf("Error committing transaction: %d.\n", uerr_cod);
}
printf ("LatestIndex is %ld\n", LatestIndex);
/* End Betty */


Listing Four

/* Start Fred */
if (Begin(ctTRNLOG|ctENABLE) == 0) {
  printf("Error beginning transaction: %d\n", uerr_cod);
}
AddName ("Fred", "Flintstone");
AddPhone ("213-555-1212", "home");
cycles = 0;
/* Save our position in the transaction for now */
SavePoint = SetSavePoint ();
do {
  /* Try it */
  oops = AddAddress ("1234 Jurassic Park", "","Bedrock","PL","12345","Home");
  oops |= AddPhone ("213-555-6666", "work");
  oops |= AddPhone ("213-555-7777", "mobile");
  /* Were we successful? */
  if (oops) {
    /* No, go back on our transaction. */
    cycles++;
    if (cycles > 5) {
      printf ("Transaction Failed after 6 tries!\n");
      Abort ();             /* Stop the transaction */
      StopUser ();          /* Close out the connection */
      ctrt_exit(4);
    } else {
      RestoreSavePoint (SavePoint);
    }
  }
} while (oops);
if (Commit(ctFREE)) {
  printf("Error committing transaction: %d.\n", uerr_cod);
}
printf ("LatestIndex is %ld\n", LatestIndex);
/* End Fred */
