The Scheme of Things
RDF itself is a handy way to describe resources. Widespread use of such a facility could alleviate many of the current problems with the Web. But RDF by itself only gets us part way toward realizing the Semantic Web, in which agents can infer relationships and act on them.
Classification is extremely important on the Semantic Web. Each community of related interests defines categories for the matters that it discusses. For instance, the snowboarding community defines items such as snowboards, parks, tricks, and manufacturers. The definition of a manufacturer in snowboarding terms is related to the definition of a manufacturer in the general business sense. The snowboarding community can enshrine these definitions by creating a schema for its RDF models. The W3C is standardizing a simple mechanism for RDF schemas (W3C RDFS). If you're interested in seeing what the actual code looks like, see Example 3.
W3C RDFS is itself expressed in RDF format. The first stanza of an RDFS document, like that in Example 3, describes a class of item that we identify with the given URI and label, for instance, Snowboard. The comment is just a useful documentation item. The second stanza might describe a class we label Snowboard Manufacturer. We subclass this label from the more general concept of manufacturer that's defined in the RosettaNet general business dictionary. RosettaNet is an organization for standardizing business-to-business communication using computers. Perhaps from this connection, you can understand how an agent designed for processing information on general business matters could gain at least a foothold of understanding if it were to come across a snowboarding site. This is an essential trick of the Semantic Web.
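To make the subclassing trick concrete, here is a minimal sketch in plain Python rather than RDF syntax. It models statements as (subject, predicate, object) tuples and shows how an agent that only knows a general manufacturer class could still find snowboard manufacturers by following subclass links. Every identifier in it (the ex: and biz: names, and the helper functions) is hypothetical; in particular, biz:Manufacturer merely stands in for a general business term and is not taken from the RosettaNet dictionary.

```python
# Hypothetical identifiers throughout; none come from a real schema.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("ex:SnowboardManufacturer", SUBCLASS_OF, "biz:Manufacturer"),
    ("ex:K2", TYPE, "ex:SnowboardManufacturer"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via subclass links."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(cls, triples):
    """Return resources typed as cls or as any subclass of cls."""
    return {
        s
        for s, p, o in triples
        if p == TYPE and cls in superclasses(o, triples)
    }


# A general business agent asks only about the general class.
general_manufacturers = instances_of("biz:Manufacturer", triples)
```

The agent never needs to know the snowboarding vocabulary; the single subclass statement is what lets it treat ex:K2 as a manufacturer.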
We use the same trick to define a rider as a specialization of person defined in FOAF, a well-known schema for personal and organizational contact information. We also define a couple of properties in the schema, that is, resources that can be used as the predicates of RDF statements. W3C RDFS lets us apply some simple constraints on properties. For instance, rdfs:domain lets a property be used only on a certain class of resources, and rdfs:range declares that the value of the property must be of a certain class of resources. So, for example, we say that only a Rider can have an Endorsement property, and that the value of every such property must be a Snowboard.
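The domain and range idea can be sketched as a small checker in plain Python. This follows the constraint reading used in the text, with the two terms written using the rdfs: prefix that RDF Schema uses for them. The ex: identifiers, the sample data, and the helper functions are all assumptions made for illustration, not part of any published schema.

```python
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Hypothetical schema: only a Rider may endorse, and only a Snowboard
# may be endorsed.
schema = {
    ("ex:endorsement", DOMAIN, "ex:Rider"),
    ("ex:endorsement", RANGE, "ex:Snowboard"),
}

# Hypothetical data to check against the schema.
data = {
    ("ex:chris", TYPE, "ex:Rider"),
    ("ex:fatbob", TYPE, "ex:Snowboard"),
    ("ex:k2", TYPE, "ex:SnowboardManufacturer"),
    ("ex:chris", "ex:endorsement", "ex:fatbob"),  # satisfies both constraints
    ("ex:k2", "ex:endorsement", "ex:fatbob"),     # subject is not a Rider
}


def has_type(resource, cls, triples):
    """True if the data explicitly types resource as cls."""
    return (resource, TYPE, cls) in triples


def violations(schema, data):
    """Return the statements that break a domain or range declaration."""
    bad = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == DOMAIN and not has_type(s, cls, data):
                bad.add((s, p, o))
            elif kind == RANGE and not has_type(o, cls, data):
                bad.add((s, p, o))
    return sorted(bad)


problems = violations(schema, data)
```

Here the check looks only at explicitly stated types; a fuller version would also accept subclasses of the declared domain and range.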
With this schema in place, the snowboarding community would have a formal basis for saying things like "Chris Englesmann endorses the Fatbob snowboard, manufactured by K2." Of course, this isn't to say that all of the content on an RDF-enabled site would be translated to abstract graphs, or long-winded XML representations thereof. Rather, RDF would be used in content headers (in the HTML <head> section, for example) to make formal statements about the content that guide an agent in placing it in context.
Nebulous Knowledge
Schemas take us a step toward the Semantic Web, but not all the way. In the heyday of artificial intelligence, scientists were puzzled by a crucial point. Although computers were beginning to overtake the human brain in terms of sheer processing speed and storage capacity, they still didn't compare to human intelligence. At least one reason for this is that the brain doesn't stubbornly store and categorize every scrap of every detail that we use as the basis of thought. The brain is a miracle because it can make connections between pieces of partially stored information and assemble them into intelligence when necessary. To achieve this level of understanding with RDF and RDFS alone, countless resources on servers all over the world would have to be methodically classified and described. This, of course, is completely unrealistic.
The Semantic Web won't be possible until agents have the means to figure out some things by themselves, given the data they have to work with. Fortunately, artificial intelligence gives us two tools to help make this possible. First, knowledge representation is a field that defines how we might represent, in computers, some of what is stored between our ears. This would give computers a fighting chance at synthesizing unclassified data at a useful speed. Second, inference is a way of using formal logic to approximate further knowledge from that which is already known. All of this forms a system of representing and synthesizing knowledge that is often referred to as an ontology.
The leading ontology system for RDF is the DARPA Agent Markup Language (DAML). DARPA, for those who may have forgotten, is the group that brought us the Internet itself. DAML incorporated useful concepts from the Ontology Inference Layer (OIL), a European project to provide some AI primitives in RDF form. The resulting language is DAML+OIL. (Visit www.daml.org for more information.)
DAML+OIL lets us formally express ontologies. W3C RDFS provides primitive classification and simple rules for this, but DAML+OIL goes much further. For instance, DAML+OIL can express that "any snowboard with plate bindings is a race board," which makes it unnecessary to explicitly flag every race board. You might see in this some of the flavor of business rules, which are known in software development circles as the programmatic expression of mandates for the way data must be processed. In fact, one way to look at DAML+OIL is as the business rules for the Semantic Web, yet it's much more flexible than most business-rules languages in common use.
Most of DAML+OIL's power comes from primitives for expressing classifications, as the race boards example illustrates. DAML+OIL provides a toolbox of class expressions, which bring the power of mathematical logic and set theory to the tricky and important task of mapping ontologies through classifications.
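The race board rule can be approximated with ordinary set operations. The sketch below is plain Python, not DAML+OIL syntax: it treats "race board" as the intersection of two sets (things typed as snowboards, and things that have plate bindings) and adds the inferred type to the data. All ex: identifiers, the sample facts, and the function names are hypothetical.

```python
TYPE = "rdf:type"

# Hypothetical facts; note that no board is explicitly flagged as a race board.
facts = {
    ("ex:board1", TYPE, "ex:Snowboard"),
    ("ex:board1", "ex:hasBinding", "ex:PlateBinding"),
    ("ex:board2", TYPE, "ex:Snowboard"),
    ("ex:board2", "ex:hasBinding", "ex:StrapBinding"),
    ("ex:ski1", TYPE, "ex:Ski"),
    ("ex:ski1", "ex:hasBinding", "ex:PlateBinding"),
}


def members(cls, triples):
    """Resources explicitly typed as cls."""
    return {s for s, p, o in triples if p == TYPE and o == cls}


def having_value(prop, value, triples):
    """Resources that have the given value for the given property."""
    return {s for s, p, o in triples if p == prop and o == value}


def infer_race_boards(triples):
    """Add a RaceBoard type to every snowboard with plate bindings."""
    race_boards = members("ex:Snowboard", triples) & having_value(
        "ex:hasBinding", "ex:PlateBinding", triples
    )
    return triples | {(board, TYPE, "ex:RaceBoard") for board in race_boards}


inferred = infer_race_boards(facts)
```

Only ex:board1 gains the new type: ex:board2 has the wrong bindings, and ex:ski1 has plate bindings but is not a snowboard. This is the sense in which a class expression spares the author from flagging each race board by hand.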
Miles to Go Before We Sleep
The Semantic Web is still a way off, if it's attainable at all. To date, RDF and DAML+OIL are our best efforts at reaching it. They address a good number of the problems with the present state of the Web, and further enhancements are on the way. For example, a system of statements that's managed at a certification authority could help establish the validity of RDF statements to minimize metadata spam and other security problems.
There are already automated tools to help generate RDF for existing Web pages, which should aid migration. In fact, RDF's great strength is that even without the Semantic Web in place, it has proven to be a practical and usable technology in assorted areas of computing. It can be used for data descriptions in highly generic and extensible databases, or for sophisticated modeling in application development. As a result, an impressive and growing selection of RDF tools and vocabularies is available for ready use. The integration with Web publishing tools is improving with developments like the PRISM standard for content syndication metadata. This standard is RDF-based and endorsed by an impressive cross-section of the electronic publishing and publishing tools industries.
Many of the practicalities of the Semantic Web depend on how we define victory. If it's enough to make the Web a better-managed resource for well-defined and somewhat more organized communities, then RDF and the higher-level technologies I've discussed already provide the means to do so now; it's just a matter of obtaining a critical mass of users. RDF success stories like the RDF Site Summary (RSS) for content syndication and the Musicbrainz system (www.musicbrainz.org) for digital music metadata could be the catalysts of significant progress. And soon the Web might have a middle ground between the poor scalability of traditional librarians and the ineffective results of today's search engines.
Uche is a consultant and co-founder of Fourthought, a consulting firm that specializes in XML solutions for enterprise knowledge management applications.