NOV94: Object Databases

Object Databases

Object methods in distributed computing

Jonathan Wilcox

Jonathan is president of Menai Corp., a producer of object-oriented programming tools. He can be contacted at [email protected].

Distributed computing and object-oriented programming are central to the emerging class of computing platforms. Architectures like COM, CORBA, and OpenDoc, as well as formal and de facto standards such as OLE2, DOE, DCE, DSOM, ODMG-93, and COSS, all target the intersection where distributed systems meet objects. Numerous object-database research projects and a dozen or so commercial object databases are evolving. To date, the implementations have been based on a wide variety of approaches, and no single one has emerged as dominant.

As usual on the cutting edge, confusion is the one constant. Consequently, in this article I'll examine a number of issues concerning object methods and object databases. In particular, I discuss where method code is located, how it is loaded, and where it is executed.

The combination of object databases (ODB) and object-oriented applications produces two discernible categories of class methods: those specific to applications, and those relating to database management and data manipulation. For example, a chemical-formula calculation is application specific, while a search for an object having a particular attribute value is database specific. To distinguish between these categories, I'll refer to them as "application methods" and "ODB methods." It is important not to confuse these labels with the site of method execution. Some ODB products actually execute their ODB methods as part of the application, and others execute application methods as part of an ODB server process; see Figure 1.

ODB Implementations

To be dealt with as instances of a class, objects must be recognizable. Practical implementations of object technology depend upon instance tagging for object identification. The tag is placed at a defined position within the structure of the instance data, where hardwired code of an application (or of an ODB stub linked into the application) expects to find it. The most commonly used tags are unique object identifiers (OIDs), class names or numbers, and schema-version identifiers. Obviously, a control system must exist to prevent redundant use of OIDs and enforce correct use of class and version tags.

Given a class name or number, the client application can reference the class methods by finding that name or number with a program-loader mechanism. In IDB, from Persistent Data Systems (Philadelphia, PA), for example, distributed-object database instances include a header with a class identifier that provides the leverage for dynamic dispatch of methods by indirect call through a vector of method addresses.
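The header-tag dispatch described above can be sketched in a few lines. This is a minimal illustration in Python, not IDB's actual record format: the class numbers, slot numbers, record layout, and function names are all assumptions made for the example.

```python
# Hypothetical sketch: dispatch through a per-class vector of method addresses.
# Class ids, slot numbers, and the record layout are assumptions for illustration.
from dataclasses import dataclass

AREA, DESCRIBE = 0, 1  # slot numbers within each method vector


@dataclass
class Instance:
    class_id: int  # header tag stored at a fixed, known position
    data: dict     # instance attribute values


def circle_area(inst):
    return 3.14159 * inst.data["r"] ** 2


def circle_describe(inst):
    return "circle"


def square_area(inst):
    return inst.data["side"] ** 2


def square_describe(inst):
    return "square"


# One method vector per class id; the header tag selects the vector.
METHOD_VECTORS = {
    1: [circle_area, circle_describe],
    2: [square_area, square_describe],
}


def dispatch(inst, slot):
    """Indirect call: header tag -> method vector -> slot -> code."""
    return METHOD_VECTORS[inst.class_id][slot](inst)
```

The caller never names a concrete function; it supplies only an instance and a slot number, and the tag in the instance header decides which code runs.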

Given that the class instance is recognized, where can you find the executable code of a method of that class? The simple answer is that it must be found locally, in the form of a linked library, or in database storage, or in a code library at some network node. Code that is not local can be retrieved through an automatic code-retrieval mechanism or invoked through a remote function server. If the code is installed locally but the instances are supplied by an object database, the database is "passive"; if both the instances and the code are supplied by an object database, it is "active."

However, you can't describe ODB implementations in terms of a simple active/passive dichotomy. Further distinctions must be made between data-management methods and application-specific methods, and between client/server architecture and peer architecture. These distinctions are hard to define, because many software implementations allow application designers to choose either approach.

If the binary code to implement the identified method is not available at the application site (in the same executable or an accessible library), the identification of its location can be supported by an accessible name-server process. Some ODBs employ a universal catalog of methods; others use a domain-name approach.

An application program need not retrieve remote class instances or remote method code. It is sufficient that the application can identify an object or class of objects located elsewhere, because the application can send a message to the remote server requesting that class methods stored at the server be invoked with reference to the class instances also stored there. Indeed, in the world of distributed computing, it may be hard to know just where execution takes place, because instances and methods may be dispatched to a node where the compute load is temporarily low.
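A minimal sketch of this style of remote invocation follows. No real ODB protocol is implied: the JSON message shape, the Server class, and the invoke helper are hypothetical, and the "network" is simply a function call.

```python
# Hypothetical sketch: the client names an object and an operation; the server
# holds both the instances and the method code. The message format is assumed.
import json


class Server:
    def __init__(self):
        # Instances and methods both live at the server.
        self.instances = {"oid-7": {"cls": "Account", "balance": 100}}
        self.methods = {("Account", "deposit"): self._deposit}

    @staticmethod
    def _deposit(state, amount):
        state["balance"] += amount
        return state["balance"]

    def handle(self, wire_message):
        msg = json.loads(wire_message)
        state = self.instances[msg["oid"]]
        method = self.methods[(state["cls"], msg["op"])]
        return json.dumps({"result": method(state, *msg["args"])})


def invoke(server, oid, op, *args):
    """Client side: ship a request, never the instance or the code."""
    request = json.dumps({"oid": oid, "op": op, "args": list(args)})
    return json.loads(server.handle(request))["result"]
```

Only the object identifier, the operation name, and the arguments cross the wire in this sketch; the client never sees the instance data or the method code.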

The Itasca object database from Itasca Systems (Minneapolis, MN) stores both source and binary-method code in servers where method execution takes place. This allows active distribution of the computation load among Itasca server nodes. In a forthcoming version of Itasca that will support heterogeneous servers, the system will provide automatic compilation/linking of source code as needed on the execution platform.

There appear to be four design approaches to management of methods in distributed ODB systems:

1. A local code library is directly linked to the executing application or ODB server program.
2. The ODB stores reference information to identify the method and the linkable library where it may be found. This information is passed to a conventional program loader, usually within the application program, but sometimes within the ODB server. Occasionally, the method is stored and executed on a remote network node.
3. The ODB stores script or intermediate computer code, which is delivered to an interpreter or incremental compiler built into the server or the application.
4. The ODB stores binary code native to the execution platform and relies on an extended program loader in the application to retrieve the code objects.

A "local" code library is generally a DLL or shared library, although it may be statically linked. It may only appear to be local to the process that loads it, because it may be in a remote directory that has been mounted locally by a network facility such as NFS from Sun Microsystems. The "executing process" may be an application process, an ODB server process, or both. The ODB may or may not be implemented by a server process. Finally, the ODB designers may have employed more than one of these approaches and given application developers the same option. Certainly, this isn't a subject that yields easy classifications.

Design Categories

The local-library design is the most conventional. It relies on method-resolution facilities supplied by the computer language and program-loader facilities included with standard computer libraries. In C++, for example, the available methods are known at compile time and accessed through a vector table associated with the class of a given object. If the method is not yet in memory, its library file and method name are accessed from the program's static-memory area and supplied to the program-loader code. The loader maps the referenced binary code into memory and performs "fixups" that change internal executable binary code references into memory addresses. When the library appears to be local as a result of NFS mounting, the same routine occurs.

An extreme example of the local-library design is found in the Persistent Data Systems IDB object database. All application and ODB methods are compiled into the application. There is no server process. The database consists of a set of files to which coordinated access is shared by cooperating applications. The POET object database, from Poet Software (Santa Clara, CA), uses the same approach.

A less extreme example is the O2 object database by O2 Technology (Mountain View, CA). O2 uses local-library methods in association with a page-server process that is blind to the semantics of the objects that it handles. The object semantics are managed by code linked into the application.

Similarly, Versant, from Versant Object Technology (Menlo Park, CA), relies upon ODB and application methods linked to the application. A subset of the ODB methods operates by remote procedure call to the ODB server process, which manages only object instances, not method references.

The second design category is exemplified by the many object databases that employ database-server processes and rely on local libraries for methods. Most of these tools store method references in the database; however, they rely on the application to employ these references to find the code in local libraries. Examples include UniSQL, from UniSQL Inc. (Austin, TX); ONTOS, from ONTOS Inc. (Burlington, MA); Objectivity/DB, by Objectivity Inc. (Mountain View, CA); Matisse, from ADB Inc. (Cambridge, MA); and EasyDB, from Basesoft Open Systems (Kista, Sweden). OpenODB, from Hewlett-Packard (Palo Alto, CA), also fits into this category (more on it shortly).

UniSQL stores the name and location of method code in the ODB and passes the information to an extension of the application-program loader. UniSQL uses NFS to make remote libraries appear local. ONTOS and Objectivity/DB do likewise, with ONTOS storing method references in "procedure objects" and Objectivity/DB maintaining a catalog of schemata and methods. In EasyDB, a run-time view of the data dictionary is used to look up methods for application loading from a local library. Illustra (formerly Montage), by Illustra Information Technologies (Oakland, CA), can be used to manage application methods according to this second category, though it is usually employed to manage ODB method code for ODB server execution.

The third design category employs an interpreter facility linked to a process that retrieves script or compiled intermediate code from the ODB. O2 and ONTOS exemplify this approach. O2 provides an optional, 4GL, incrementally compiled language called O2C, which can be used to develop database-storable ODB methods. Because the O2 server is only a page server, the O2C methods must be loaded to run in an interpreter linked with the application process. ONTOS provides an optional, storable "method object" that records combinations in which application-linked methods should be executed, thus emulating pre- and postcondition triggers and method wrappers.
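The interpreter-based approach can be sketched as follows. Python's own compile and exec stand in for a product-specific interpreter; the stored source text, the dictionary that plays the role of the database, and the load_method helper are all assumptions made for illustration.

```python
# Hypothetical sketch: method source is stored as data and handed to an
# interpreter linked into the application process.
STORED_METHODS = {
    ("Molecule", "mass"): (
        "def method(self):\n"
        "    return sum(self['atom_masses'].values())\n"
    ),
}

_cache = {}  # methods already compiled in this process


def load_method(cls_name, op_name):
    """Fetch script text from the 'database' and compile it on first use."""
    key = (cls_name, op_name)
    if key not in _cache:
        source = STORED_METHODS[key]
        namespace = {}
        code = compile(source, f"<odb:{cls_name}.{op_name}>", "exec")
        exec(code, namespace)
        _cache[key] = namespace["method"]
    return _cache[key]
```

Because the stored form is source or intermediate code rather than a native binary, the same stored method can run on any node that hosts the interpreter.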

The fourth design category employs database storage of compiled binary code. This code may be retrieved for execution by the database-server process and/or by the application. In each circumstance, an extension of the conventional program loader is required to retrieve the code and possibly to "swizzle" internal OID references into memory addresses if the binary code is stored as a class of objects.
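Swizzling itself is easy to show in miniature. The sketch below applies it to plain data records rather than to stored code objects, which keeps the example short; the OidRef marker type, the record layout, and the swizzle function are hypothetical.

```python
# Hypothetical sketch of "swizzling": replace stored OID references with
# direct in-memory references when objects are loaded.
from dataclasses import dataclass


@dataclass(frozen=True)
class OidRef:
    oid: int  # persistent reference as it appears in storage


STORE = {
    1: {"name": "order", "customer": OidRef(2)},
    2: {"name": "customer", "orders": [OidRef(1)]},
}


def swizzle(store):
    """Return in-memory copies whose OID references point at loaded objects."""
    loaded = {oid: dict(record) for oid, record in store.items()}

    def fix(value):
        if isinstance(value, OidRef):
            return loaded[value.oid]
        if isinstance(value, list):
            return [fix(v) for v in value]
        return value

    for record in loaded.values():
        for field_name in list(record):
            record[field_name] = fix(record[field_name])
    return loaded
```

After swizzling, following a reference is an ordinary memory access; the stored records are left untouched so they can be written back with their OIDs intact.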

Because it uses binary-code storage, OpenODB requires that the code be developed in the OPL language, which compiles to binary. These methods execute on the ODB server. ODB storage of binary code is also a developer-specified option with Itasca, which also requires that stored code be executed by the ODB server process. OpenODB and Itasca give you the option of storing application methods or linking them to the application for application-site execution. OpenODB also achieves that objective by storing a reference to the method as a remote binary located at the application node and employing an external function server at the application's node to launch the method.

An extension of the binary-code storage approach is the storage of references to executable binaries stored in associated databases or available for automated loading and execution on network nodes. Invoking such a method is usually location transparent to an application because it is managed by the ODB server, perhaps with the aid of a function server (automatic program loader) at a cooperating execution node. This capability exists in OpenODB.

Analogous to database storage of intermediate method code are ODBs that use the Smalltalk environment. With Gemstone, from Servio (Beaverton, OR), you can define new methods in Smalltalk or C, and these are stored and executed at the ODB server. In ObjectStore for Smalltalk from Object Design (Burlington, MA), the ODB extends the Smalltalk Virtual Machine to obtain demand paging into applications of any Smalltalk object, which can include methods.

Native Code

Finding executable code and mapping it into memory for execution is the role of a program loader. Operating systems have program loaders that spawn processes, and programs have program loaders that map program components into memory. Overlays, DLLs, and shared libraries are also mapped into memory by program loaders. Thus, software mechanisms that find and retrieve executable code from a different network node may be viewed as extensions of program loaders.

In a network of homogeneous computers that actually execute methods, no complexities are involved in retrieving method code from a local or remote node. If the execution takes place only on servers, then only the servers need to be homogeneous; if the execution takes place only on clients, then only the clients need to be homogeneous. In all of these instances the same binary code suffices.

In a network of heterogeneous computers that execute the same methods, however, obvious problems arise. If methods are to be accessed locally (in DLLs, for example), manual installation of appropriate libraries may circumvent the problem. But in the case of a remote server, which may be accessed by more than one kind of computer seeking executable method code, some provision must be made to supply the right kind of binary. The case is essentially the same when a client may invoke method execution on multiple, dissimilar computers that may not use local libraries.

A number of solutions have been tried:

1. Maintain at each code server every version of the binary that could be needed at any execution site, and require each program-load message to identify the binary version needed.
2. Maintain a local intermediate-code interpreter or incremental compiler at each execution site and respond to a code request with the intermediate code.
3. Maintain source code at the code server and provide automated compile/link services appropriate to any node that requests method code and has not yet received an appropriate binary.

The first alternative is exemplified in the IDB distributed-object database, in which the server keeps "operations" files for each supported platform. These files are distinguished from one another by a naming convention (for example, foo.dll for Windows, foo.on for NeXT, and foo.om for Macintosh). The second alternative is exemplified by Smalltalk systems, such as ObjectStore and Gemstone, which use byte code inherent in Smalltalk. Products using other languages offer this approach as well; for example, HP's OpenODB offers the OPL programming language. The third alternative is planned for Itasca, which stores both source and executable code of methods in the database. (At present, Itasca executes all distributed methods on homogeneous servers.)
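The first alternative amounts to a lookup from requesting platform to file name. A minimal sketch follows, using the suffixes from the example above; the platform keys and the operations_file function are assumptions, not part of IDB.

```python
# Hypothetical sketch: pick the platform-specific "operations" file by suffix.
# The suffixes come from the example in the text; everything else is assumed.
SUFFIX_BY_PLATFORM = {
    "windows": ".dll",
    "next": ".on",
    "macintosh": ".om",
}


def operations_file(base_name, platform):
    """Map a program-load request to the binary built for that platform."""
    try:
        return base_name + SUFFIX_BY_PLATFORM[platform]
    except KeyError:
        raise LookupError(
            f"no binary of {base_name!r} for platform {platform!r}"
        ) from None
```

The cost of this scheme is visible in the error path: a platform for which no binary was built and installed simply cannot be served.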

Any code-retrieval mechanism requires a means for an application to unambiguously reference an operation for which a method has been defined. For DLLs and shared libraries, the reference is generally to a library filename and specifically to a function or procedure name distinguished according to class. In C, the distinction can be maintained by a naming convention; in C++, it is automatically supported by class scoping and function-name mangling.

When method code is (or appears to be) situated locally to the executing process, the program loader for the operating system or process will locate the code by filename and offset within that file. The program loader will perform "fixups" that change file-offset references into memory-location references. Shared libraries and DLLs add an intermediate name-lookup step to secure the offset of the code within the library. The difference between this and the loading of non-object-oriented procedure or function code is that the method may have been called through a C++ vtable (object vector table) or a C function-pointer array.

When method code is stored in an object database, more complex techniques must be used. Methods must be referenced in the database via an object instance known to the application. Some characteristics of the object instance must provide the key to accessing an associated method through database functions. This is often accomplished through use of a class tag, schema version tag, and/or unique object identifier, any one of which may provide a reference to the methods of the class. The routine for object-method resolution begins after the application code has selected an object of a particular class and an operation upon that object, so that the remaining activity pertains only to locating the code for the designated method.

The class-instance and method-locating routines can proceed from the same take-off point, rather than first instance, then method, if the searches are based on a unique OID. In such a design, the OID may serve as a key into an index that has two separate values associated with that key: a reference to the instance-storage site and a reference to the method-storage site. Often, however, the first reference is to a class or schema representation.
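An index of this kind can be sketched as a map from OID to a pair of references. The site strings, the IndexEntry type, and the locate function below are hypothetical.

```python
# Hypothetical sketch: one OID key, two associated references.
from typing import NamedTuple


class IndexEntry(NamedTuple):
    instance_site: str  # where the instance data is stored
    method_site: str    # where the class's method code is stored


OID_INDEX = {
    1001: IndexEntry("node-a:/data/page17", "node-b:/lib/account"),
    1002: IndexEntry("node-a:/data/page18", "node-b:/lib/account"),
}


def locate(oid):
    """A single lookup yields both the instance site and the method site."""
    entry = OID_INDEX[oid]
    return entry.instance_site, entry.method_site
```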

A class tag or schema-version tag is unique to a class, rather than to an object instance, so the tag must be associated with the instance. This might be accomplished by physically storing the tag at a position within the instance that is known to the database implementation, or by an index that associates it with a unique OID that is physically stored with the instance. The index may reference the schema itself, or it may reference an intermediate index to secure the schema version of the object if multiple, concurrent schemata are supported. Thereafter, the class or schema provides a further reference to the location of stored method code.
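The chain of references just described (OID to class, class to schema version, schema to method location) can be sketched with three small tables. All names, version numbers, and location strings here are invented for illustration.

```python
# Hypothetical sketch: resolve a method location through an intermediate
# schema-version index, for a database that supports concurrent schemata.
OID_TO_CLASS = {1001: "Account", 1002: "Account"}

# Intermediate index: which schema version each object was stored under.
SCHEMA_VERSION_OF = {1001: 1, 1002: 2}

# Each (class, version) schema refers to the location of its method code.
SCHEMAS = {
    ("Account", 1): {"deposit": "libacct_v1:deposit"},
    ("Account", 2): {"deposit": "libacct_v2:deposit",
                     "audit": "libacct_v2:audit"},
}


def method_location(oid, op_name):
    """OID -> class tag -> schema version -> stored method reference."""
    cls_name = OID_TO_CLASS[oid]
    version = SCHEMA_VERSION_OF[oid]
    return SCHEMAS[(cls_name, version)][op_name]
```

Two instances of the same class can therefore resolve the same operation to different code, depending on the schema version under which each was stored.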

A schema is a collection of object types that describe the physical storage layout of object-data instances of a given class. Methods can be included in a schema directly or by reference to a list of methods. When included in a list, methods are typically described by signatures that include such information as name, argument types, return types, exception types, and a reference to the physical location of the method code in the database or in a linkable library.
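A method list of this kind might be represented as below. The field names follow the items listed above; the class names, the layout dictionary, and the find method are assumptions, and no particular product's catalog format is implied.

```python
# Hypothetical sketch: a schema that carries a list of method signatures.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MethodSignature:
    name: str
    argument_types: tuple
    return_type: str
    exception_types: tuple = ()
    code_location: str = ""  # database reference or linkable-library path


@dataclass
class Schema:
    class_name: str
    layout: dict  # attribute name -> storage type
    methods: list = field(default_factory=list)

    def find(self, name, argument_types):
        """Select a method by name and argument types."""
        wanted = tuple(argument_types)
        for sig in self.methods:
            if sig.name == name and sig.argument_types == wanted:
                return sig
        raise LookupError(f"{self.class_name} has no method {name}{wanted}")
```

A lookup such as schema.find("deposit", ["int32"]) returns the whole signature, so the caller learns where the code lives from the same record that describes how to call it.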

It is not essential that an OID be retained physically together with an instance, as long as an index is maintained that associates the OID value with a physical storage site and a class or schema that describes the instance. Relying on the index alone would be hazardous, however, because a corrupted index would leave the instances unidentifiable. Keeping the OID physically together with the instance allows the index to be rebuilt.
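The recovery argument can be made concrete with a short sketch. It assumes, for illustration only, that each stored record carries its own OID and class tag and that storage is organized as named pages of records.

```python
# Hypothetical sketch: rebuild a lost index by scanning stored instances,
# which is possible only because each record carries its own OID and tag.
PAGES = {
    "page0": [{"oid": 1001, "cls": "Account", "balance": 5}],
    "page1": [
        {"oid": 1002, "cls": "Account", "balance": 9},
        {"oid": 2001, "cls": "Customer", "name": "Ada"},
    ],
}


def rebuild_index(pages):
    """Scan every page and map each OID to (page, slot, class tag)."""
    index = {}
    for page_id, records in pages.items():
        for slot, record in enumerate(records):
            index[record["oid"]] = (page_id, slot, record["cls"])
    return index
```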

It is unrealistic to be categorical about any of the designs. First of all, a reference may be direct or indirect at any stage where a reference is used. Second, the tags and OIDs are often used in parallel for reasons that relate to support of legacy code and databases. Third, the terminology of object-oriented design is not settled, so differing meanings and roles of class, metaclass, schema, schema version, object type, object version, and method version are embodied in a great variety of ODB implementations.

For example, in some databases it is possible to define and store data-manipulation methods as "method objects" within the database itself; but at least one significant maker considers this to be an impure concept and insists that database methods should only be accessed from linkable libraries. It may be gratuitous to observe that the practical difference between the two is that the program loaders differ in their implementation details.

Conclusion

The variety displayed in ODB implementations is at once confusing and encouraging. We may now expect to see a convergence of particular designs with application needs to which they are well suited.

Agents and Object Databases

Object databases typically support well-defined applications that access objects of known classes. The methods of those classes are defined with precise syntax. A very different run-time environment exists for programs known as "agents." Agents are being developed as personal productivity assistants (for example, to cull news items for a particular reader or to perform database research) and for many other uses. Truly useful agents must respond to complex and variable messages and often must delegate tasks to other agents. This circumstance invites the development of generalized message languages that can be interpreted and generated by agents. Agents may be viewed as engines that respond to messages of interest (others being ignored), where the messages state or satisfy a need rather than carry out a procedure. In contrast, the typical application that uses an object database will fail or generate an error condition if a malformed or unexpected message is received by an object.

--J.W.

Figure 1 (a) Library method executed at client site; (b) stored method executed at ODB-server site; (c) stored method executed at client site.
