Semantic Processing
Because the value of Information Technology (IT) lies in satisfying business needs, it follows that for businesses to be agile, the underlying technology must be able to make decisions. This is the fundamental motivation for introducing "semantic processing" into IT.
Semantic processing, which expresses relationships among concepts represented by phrases, has been a favorite research topic in academia and industry. The ongoing desire for software to infer information from context has long been one of the goals of artificial intelligence. Semantic analysis has been studied in the context of:
- Natural language processing. Text and speech recognition (language translation, for instance).
- Correlation and data mining. Trend analysis (for data warehousing, threat detection, and so on).
- Thematic searches. Refined searches and queries that leverage an awareness of the business context.
Likewise, Semantic Service-Oriented Architectures (SSOAs) have been proposed. SSOA introduces semantic enhancement to services such that an agent aware of the semantic model can combine services dynamically to satisfy business goals.
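The idea of an agent combining services dynamically can be sketched in a few lines: if each service is annotated with the semantic concept it consumes and the one it produces, the agent can search for a chain of services leading from the data it has to the goal it must satisfy. The service names and concepts below are invented for illustration, and the matching here is deliberately simplistic (exact concept equality rather than ontology-based subsumption).

```python
from collections import deque

def compose(services, start, goal):
    """Breadth-first search for a chain of services whose output
    concepts link the starting concept to the goal concept."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        concept, chain = queue.popleft()
        if concept == goal:
            return chain
        for name, (inp, out) in services.items():
            if inp == concept and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None  # no chain of services satisfies the goal

# Hypothetical semantically annotated services: name -> (input concept, output concept)
services = {
    "lookup_customer": ("customer_id", "customer_record"),
    "score_risk":      ("customer_record", "risk_score"),
    "flag_transaction": ("risk_score", "alert"),
}

print(compose(services, "customer_id", "alert"))
# -> ['lookup_customer', 'score_risk', 'flag_transaction']
```

A real SSOA agent would reason over an ontology (so that, say, a service accepting "instrument" also accepts "equity"), but the search structure is the same.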
For instance, IBM uses semantic processing in its Websphere Business Fabric software (www-306.ibm.com/software/solutions/soa/servicesfabric.html), and Software AG is doing likewise with its Information Integrator (www.softwareag.com/Corporate/products/cv/inf_int). Progress Software has partnered with Microsoft to provide its Progress Apama Event Processing Platform as a component of Microsoft's "Markets in Financial Instruments Directive" suite. Semagix (recently purchased by Fortent) used its Semantic Enhancement Technology to create an SSOA framework to build a money-laundering detection application called "CIRAS" (short for "Customer Identification and Risk Assessment"). And Ontology Works (www.ontologyworks.com) is involved in a project with NASA to create an SSOA for internal use (www.semantic-conference.com).
But before discussing semantic analysis in an SOA environment, it is useful to understand the relevant terms. The definitions assume that a vocabulary (formally, a controlled vocabulary) exists for a domain of interest:
- Taxonomy. A classification scheme that uses parent-child or associative relationships among terms.
- Ontology. Both the vocabulary and a set of formal rules for combining elements to express something meaningful in the domain. Taxonomies can be considered a subset of ontologies.
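To make the distinction concrete, here is a toy sketch (the terms are invented): a taxonomy is little more than parent-child links over a controlled vocabulary, while an ontology adds rules, such as which relations may hold between which classes of terms.

```python
# Taxonomy: each term maps to its parent (None marks the root).
taxonomy = {
    "instrument":   None,
    "equity":       "instrument",
    "bond":         "instrument",
    "common_stock": "equity",
}

# A sliver of ontology on top: relations permitted between classes of terms.
relations = {("equity", "issued_by", "company")}

def ancestors(term):
    """Walk the parent-child links from a term up to the root."""
    chain = []
    while taxonomy.get(term) is not None:
        term = taxonomy[term]
        chain.append(term)
    return chain

print(ancestors("common_stock"))
# -> ['equity', 'instrument']
```

In practice, ontologies are expressed in a standard language such as OWL rather than ad hoc data structures, but the layering is the same: classification first, formal rules on top.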
To semantically enhance services, ontologies must be defined for the domain of interest. There are different approaches to building them:
- Deterministic approaches, which usually imply manual ontology creation. This generally involves tools that help create the ontology in a specific format. The ontology is built by domain experts, a process that can be tedious and time consuming.
- Statistical approaches, which attempt to address the fact that domain knowledge may be incomplete, uncertain, or may change over time (changes in business models, for example). Statistical approaches try to build ontologies from input sampling. Starting with an initial ontology, relevant documents associated with a particular concept are sampled. Using a technique such as Bayesian networks (www.ddj.com/dept/architect/184406064) to extend the First Order Logic of a language such as the Web Ontology Language (OWL), inputs are associated with types or subtypes using conditional probabilities. Inferencing is used to suggest the addition or removal of nodes and leaves.
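The statistical idea, stripped to its core, is to estimate conditional probabilities from sample documents and use them to associate new inputs with concepts. The toy naive-Bayes sketch below (concepts and documents are invented; a real system would use Bayesian networks over an OWL ontology, not a flat classifier) scores each concept by P(concept) times the product of P(word | concept), with add-one smoothing so unseen words do not zero out a score.

```python
import math
from collections import Counter

# Hypothetical sampled documents, grouped by the concept they exemplify.
samples = {
    "payment": ["wire transfer account", "account payment transfer"],
    "trade":   ["buy sell equity", "equity order sell"],
}

def classify(text):
    """Return the concept maximizing log P(c) + sum log P(word | c),
    estimated from the samples with add-one smoothing."""
    words = text.split()
    total_docs = sum(len(docs) for docs in samples.values())
    best, best_score = None, float("-inf")
    for concept, docs in samples.items():
        counts = Counter(w for d in docs for w in d.split())
        n, vocab = sum(counts.values()), len(counts)
        score = math.log(len(docs) / total_docs)        # prior P(concept)
        for w in words:
            score += math.log((counts[w] + 1) / (n + vocab + 1))
        if score > best_score:
            best, best_score = concept, score
    return best

print(classify("transfer to account"))
# -> 'payment'
```

In the ontology-building setting, the same machinery runs in the other direction as well: when sampled inputs cluster under no existing concept with sufficient probability, inferencing suggests adding a node; when a concept attracts no inputs, it becomes a candidate for removal.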