
A Framework for Information Interoperability

Len Seligman and Arnon Rosenthal

To rapidly respond to new opportunities and threats, both government and industry are looking for faster, cheaper ways of sharing information via computer systems. In response, vendors offer a new "solution" every few years, such as data warehouses, Web services, "enterprise information integration" tools, and ontologies. While these are all potentially useful when applied appropriately, none is a silver bullet. Each organization has to assess its own needs and the best approach to meet them.

In all cases, the goal is to make available the information that sources hold and are willing to export. The framework presented in this article can be used to evaluate interoperability approaches. We'll also describe the most common architectures for achieving interoperability and the challenges that still stand in the way.

There are two main types of information interoperability:

  • Exchange, in which a producer (such as the Department of Defense) provides information to a consumer (such as NATO), and the information is transformed to suit the consumer's needs.
  • Integration, in which, in addition to being transformed, information from multiple sources is also correlated and fused. In general, the consumer sees a single, coherent view rather than each system's separate opinion.

Exchange requires addressing the first three problem levels in Figure 1; integration requires that all four levels be addressed. You can use these problem levels to help you analyze proposed interoperability solutions.

Level 1: Overcome geographic distribution and infrastructure heterogeneity.
Data can be widely distributed geographically. In addition, to access the data you must overcome several types of infrastructure heterogeneity, including:

  • Different data-structuring primitives, such as relational database tables versus XML versus objects
  • Different data manipulation languages (such as SQL or XQuery), proprietary data languages, and sources with no query language that require the use of a general-purpose programming language (e.g., Java)
  • Different platforms, operating systems, networks, etc.

Level 1 challenges are not as resource-consuming as the others because off-the-shelf middleware products handle most of these challenges. In certain environments (e.g., tactical military applications), however, significant engineering is still required at this level.
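Middleware typically hides Level 1 heterogeneity behind wrappers that expose every source through one uniform interface. The sketch below is a minimal illustration of that idea, not any product's API: the wrapper classes `SqlTableSource` and `XmlSource` and the `flights` data are hypothetical, and only the standard-library `sqlite3` and `xml.etree.ElementTree` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List, Protocol

Row = Dict[str, str]


class Source(Protocol):
    """Uniform interface the (hypothetical) middleware exposes for every source."""

    def rows(self) -> List[Row]: ...


class SqlTableSource:
    """Wraps a relational table reached through SQL."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # assumed to be a trusted identifier, not user input

    def rows(self) -> List[Row]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [col[0] for col in cur.description]
        # Stringify values so every source returns rows of the same shape.
        return [{c: str(v) for c, v in zip(columns, rec)} for rec in cur.fetchall()]


class XmlSource:
    """Wraps an XML document whose repeated elements are records."""

    def __init__(self, xml_text: str, record_tag: str) -> None:
        self.root = ET.fromstring(xml_text)
        self.record_tag = record_tag

    def rows(self) -> List[Row]:
        # Each child element of a record becomes one field.
        return [{field.tag: field.text or "" for field in rec}
                for rec in self.root.iter(self.record_tag)]


def fetch_all(sources: List[Source]) -> List[Row]:
    """Callers see one list of rows, whatever each source's native format."""
    combined: List[Row] = []
    for source in sources:
        combined.extend(source.rows())
    return combined


# Usage: one relational source and one XML source behind the same interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id TEXT, origin TEXT)")
conn.execute("INSERT INTO flights VALUES ('F1', 'BOS')")
xml_doc = "<flights><flight><id>F2</id><origin>IAD</origin></flight></flights>"
rows = fetch_all([SqlTableSource(conn, "flights"), XmlSource(xml_doc, "flight")])
print(rows)  # [{'id': 'F1', 'origin': 'BOS'}, {'id': 'F2', 'origin': 'IAD'}]
```

The client code never issues SQL or walks XML itself; adding a new kind of source means writing one more wrapper rather than changing every application.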

Figure 1: The four levels of information integration


Level 2: Match semantically compatible attributes.

Some independently developed information systems use the same terms for the same concepts, but many don’t. Sometimes, these differences in meaning are quite subtle. For example, in one system, “number-of-employees” may include full-time and part-time employees but not contractors, whereas in another system, it includes all full-time workers, regardless of whether they are regular employees or contractors. If users combine results across systems without understanding these details, the resulting data is unlikely to satisfy the needs of the application.
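One way to make the employee-count difference visible to software is to record, alongside each attribute, which categories of worker it counts, and to treat two attributes as matching only when those definitions agree. The sketch below assumes a hypothetical `AttributeDef` record and illustrative system names ("HR", "Finance") and category labels; real systems would draw such definitions from data dictionaries or mapping tools.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical semantic description of an attribute in one system."""
    system: str
    name: str
    counts: FrozenSet[str]  # worker categories included in the value


def semantically_compatible(a: AttributeDef, b: AttributeDef) -> bool:
    # Identical names are not enough; the underlying definitions must agree.
    return a.counts == b.counts


def explain_mismatch(a: AttributeDef, b: AttributeDef) -> str:
    """Spell out which categories each definition includes that the other omits."""
    only_a = sorted(a.counts - b.counts)
    only_b = sorted(b.counts - a.counts)
    return (f"{a.system}.{a.name} also counts {only_a}; "
            f"{b.system}.{b.name} also counts {only_b}")


# Full-time and part-time employees, but no contractors.
hr = AttributeDef("HR", "number-of-employees",
                  frozenset({"full-time regular", "part-time regular"}))
# All full-time workers, whether regular employees or contractors.
finance = AttributeDef("Finance", "number-of-employees",
                       frozenset({"full-time regular", "full-time contractor"}))

print(semantically_compatible(hr, finance))  # False
print(explain_mismatch(hr, finance))
```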

Level 3: Mediate between diverse representations.

Integrators often must reconcile different representations of the same concept. For example, one system might measure altitude in meters from the earth's surface, while another measures it in miles from the earth's center. In the future, application developers may define interfaces in terms of abstract attributes with "self-descriptions," for example, Altitude (datatype=integer, units=miles). Mediators can use these descriptions to shield users from the representational details.
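A mediator can use such self-descriptions to convert values into whatever representation a consumer asks for. The sketch below handles only the altitude example and ignores the datatype part of the description; the `AltitudeRep` class is hypothetical, and it assumes a spherical earth with a mean radius of 6,371 km.

```python
from dataclasses import dataclass

METERS_PER_UNIT = {"meters": 1.0, "miles": 1609.344}
EARTH_RADIUS_M = 6_371_000.0  # assumption: mean earth radius, spherical model


@dataclass(frozen=True)
class AltitudeRep:
    """Hypothetical self-description of how a system represents altitude."""
    units: str      # "meters" or "miles"
    reference: str  # "surface" or "center"


def to_canonical(value: float, rep: AltitudeRep) -> float:
    """Convert a described value to meters above the earth's surface."""
    meters = value * METERS_PER_UNIT[rep.units]
    if rep.reference == "center":
        meters -= EARTH_RADIUS_M
    return meters


def from_canonical(meters_above_surface: float, rep: AltitudeRep) -> float:
    """Convert meters above the surface into the described representation."""
    meters = meters_above_surface
    if rep.reference == "center":
        meters += EARTH_RADIUS_M
    return meters / METERS_PER_UNIT[rep.units]


def mediate(value: float, source: AltitudeRep, target: AltitudeRep) -> float:
    """Translate a value between two self-described representations."""
    return from_canonical(to_canonical(value, source), target)


surface_m = AltitudeRep("meters", "surface")
center_mi = AltitudeRep("miles", "center")
print(round(mediate(1.0, AltitudeRep("miles", "surface"), surface_m), 3))  # 1609.344
```

Routing every conversion through one canonical form means each new representation needs only two conversion functions, rather than one per pair of systems.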

Levels 2 and 3 can be addressed by developing mappings across systems.
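Such a mapping can be written declaratively: for each attribute in the consumer's schema, name the corresponding source attribute (Level 2) and a function that converts its representation (Level 3). The sketch below assumes hypothetical attribute names (`callsign`, `alt_miles`, `aircraft_id`, `altitude_m`) and plain dictionaries as records.

```python
from typing import Any, Callable, Dict, Tuple

Record = Dict[str, Any]
# consumer attribute -> (source attribute, conversion function)
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


def identity(x: Any) -> Any:
    return x


# Hypothetical mapping from one source system into the consumer's schema.
SOURCE_TO_CONSUMER: Mapping = {
    # Same concept under a different name: only a rename is needed.
    "aircraft_id": ("callsign", identity),
    # Same concept, different units: miles converted to meters.
    "altitude_m": ("alt_miles", lambda miles: miles * 1609.344),
}


def apply_mapping(record: Record, mapping: Mapping) -> Record:
    """Build a consumer-side record; unmapped source attributes are dropped."""
    return {target: convert(record[source])
            for target, (source, convert) in mapping.items()}


source_record = {"callsign": "N123", "alt_miles": 2.0, "remarks": "n/a"}
print(apply_mapping(source_record, SOURCE_TO_CONSUMER))
# {'aircraft_id': 'N123', 'altitude_m': 3218.688}
```

Keeping the mapping as data rather than burying it in application code makes it easier to review and to change when either schema evolves.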

Level 4: Merge instances from multiple sources.

You can do this through data correlation and data-value reconciliation (sometimes called fusion). Data correlation determines if two objects, usually from different data sources, refer to the same real-world object. For example, if the Criminal Records database has "John Public, armed robber, born 1 Jan. 1970" and the Motor Vehicle Registry database has "John Public Sr., license plate JP-1, born 9 Sept. 1939," might a police query consider these to refer to the same person and return "John Public, armed robber"?
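The John Public example shows why correlation rules need more than name matching. The sketch below contrasts a naive rule that compares normalized names with a stricter rule that also requires matching birth dates. The record fields and the suffix list are illustrative assumptions; production record-linkage tools use far richer, often probabilistic, matching.

```python
from dataclasses import dataclass
from datetime import date

SUFFIXES = {"sr", "jr", "ii", "iii"}  # assumption: generational suffixes to ignore


@dataclass(frozen=True)
class PersonRecord:
    source: str
    name: str
    born: date


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop generational suffixes."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return " ".join(t for t in tokens if t not in SUFFIXES)


def naive_match(a: PersonRecord, b: PersonRecord) -> bool:
    return normalize_name(a.name) == normalize_name(b.name)


def stricter_match(a: PersonRecord, b: PersonRecord) -> bool:
    # Names must agree AND the birth dates must be identical.
    return naive_match(a, b) and a.born == b.born


criminal = PersonRecord("Criminal Records", "John Public", date(1970, 1, 1))
vehicle = PersonRecord("Motor Vehicle Registry", "John Public Sr.", date(1939, 9, 9))

print(naive_match(criminal, vehicle), stricter_match(criminal, vehicle))  # True False
```

The naive rule would merge the two records and attach the armed-robbery record to the car owner; the stricter rule keeps them apart.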

Data correlation can identify different sources that disagree about particular facts. Suppose three sources report John Public's height to be 180, 187, and 0 centimeters, respectively. Data-value reconciliation can be used to determine what values the search should return to the application. This capability requires detailed application knowledge. Vendors and researchers are increasing their efforts in the "data-cleaning" area to help administrators specify the desired policy, semi-automatically identify candidate objects to be merged, and—if cost-justified—resolve individual instances. Reconciliation rules should be flexible, modular, and displayable to domain experts who lack programming skills.
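Reconciliation rules stay modular and displayable when each one is a small, named step. The sketch below assumes, purely for illustration, a policy of discarding physically implausible heights and returning the median of what remains; the height bounds are assumptions, and other policies (such as preferring the most trusted source) would slot into the same structure.

```python
from statistics import median
from typing import Callable, List, Tuple

# A rule is a (plain-language description, filter function) pair, so the
# policy can be shown to domain experts as well as executed.
Rule = Tuple[str, Callable[[List[float]], List[float]]]

# Assumption: illustrative bounds on plausible human height, in centimeters.
MIN_HEIGHT_CM, MAX_HEIGHT_CM = 40.0, 275.0


def drop_implausible(values: List[float]) -> List[float]:
    return [v for v in values if MIN_HEIGHT_CM <= v <= MAX_HEIGHT_CM]


HEIGHT_POLICY: List[Rule] = [
    ("Discard heights outside the plausible human range", drop_implausible),
]


def reconcile(values: List[float], rules: List[Rule]) -> float:
    """Apply each filtering rule in order, then return the median survivor."""
    for _description, rule in rules:
        values = rule(values)
    if not values:
        raise ValueError("no reported value survives the reconciliation rules")
    return median(values)


def describe(rules: List[Rule]) -> List[str]:
    """List the policy in plain language for review by domain experts."""
    return [d for d, _rule in rules] + ["Return the median of the remaining values"]


reports = [180.0, 187.0, 0.0]  # the three reported heights for John Public
print(reconcile(reports, HEIGHT_POLICY))  # 183.5
print(describe(HEIGHT_POLICY))
```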

Typically, you must address these challenges in order, from lowest to highest. For example, unless a solution overcomes geographic distribution and infrastructure heterogeneity, addressing the higher levels will yield little benefit. For information exchange, levels 1–3 are sufficient, while integration efforts also require information merging (level 4).

This issue of The Edge discusses how we have addressed these levels through several approaches.

General architecture approaches for information interoperability include:

  • Integration within the application. An application or Web portal communicates directly with each source using that source's native interface and reconciles the data it receives. While common, this approach has serious drawbacks: it places great demands on the application developer, who must stay knowledgeable about each of several data interfaces. In addition, the logic for combining information becomes part of the application code that must be maintained, making it difficult to leverage commercial database management or middleware products.
  • Data warehouses. Administrators define a global schema (i.e., a template) for the shared data. They provide the derivation logic to reconcile the data and load it into one system, typically with the help of extract-transform-load (ETL) tools. The warehouse is usually read-only, with updates made directly on the source systems. As a variation, data marts give individual communities their own subsets of the global data.
  • Federated databases. These act as virtual data warehouses: a global schema is defined but never populated. Instead, the source systems retain the physical data, and a middleware layer translates each request into queries that run against those sources. Vendors market this approach as "enterprise information integration."
  • Messaging. One application or database uses structured messages to pass data to others. Often, however, the sender and receiver use different terms for the same concepts, so that the data must be transformed to meet the needs of the receiver. Enterprise application integration (EAI) products support message-based interoperability.
  • Parameter passing. One application invokes another and passes data as parameters. Web services are an example of this architecture, in which services are invoked and described using standard Web languages and protocols. EAI products also support this architecture.
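The main difference between a data warehouse and a federated database is when the global schema gets populated. In the toy model below, two hypothetical sources are plain Python lists with different attribute names; the warehouse copies and transforms their data once (an extract-transform-load step), while the federated layer applies the same transforms at query time. All names are illustrative, not any product's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]

# Two hypothetical sources with different local attribute names and units.
source_a: List[Row] = [{"callsign": "N1", "alt_ft": 1000}]
source_b: List[Row] = [{"id": "N2", "altitude_m": 500}]
SOURCES: Dict[str, List[Row]] = {"a": source_a, "b": source_b}

# Per-source transforms into the global schema {"aircraft", "altitude_m"}.
TRANSFORMS: Dict[str, Callable[[Row], Row]] = {
    "a": lambda r: {"aircraft": r["callsign"], "altitude_m": r["alt_ft"] * 0.3048},
    "b": lambda r: {"aircraft": r["id"], "altitude_m": r["altitude_m"]},
}


class Warehouse:
    """Materializes the global schema by copying data in at load time."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def load(self) -> None:  # extract, transform, load
        self.rows = [TRANSFORMS[name](r) for name, rows in SOURCES.items() for r in rows]

    def query(self, pred: Predicate) -> List[Row]:
        return [r for r in self.rows if pred(r)]


class Federation:
    """Leaves data in the sources and translates each query at request time."""

    def query(self, pred: Predicate) -> List[Row]:
        translated = (TRANSFORMS[name](r) for name, rows in SOURCES.items() for r in rows)
        return [r for r in translated if pred(r)]


def above_400m(row: Row) -> bool:
    return row["altitude_m"] > 400


warehouse = Warehouse()
warehouse.load()
federation = Federation()
source_a.append({"callsign": "N3", "alt_ft": 2000})  # a source changes after loading

print([r["aircraft"] for r in warehouse.query(above_400m)])   # ['N2']
print([r["aircraft"] for r in federation.query(above_400m)])  # ['N3', 'N2']
```

The warehouse answers from its copy and misses the new row until the next load, whereas the federation sees current data at the cost of translating every query against every source.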

Challenges

The technical issues of these approaches revolve around heterogeneity, distribution, and multiple versions. In general, the greatest challenges lie in semantics.

You can use this framework to evaluate interoperability approaches. For example, if a vendor describes its product as "the answer" to information interoperability, ask which of the four levels the product addresses. If the answer is "all of them," be suspicious!
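The framework's questions can be asked almost mechanically. The sketch below, with hypothetical names, compares the levels a product claims to address against the levels a goal requires (levels 1–3 for exchange, all four for integration), flags claims that skip a lower level (since the levels generally must be addressed in order), and treats a claim of all four levels as a prompt for evidence.

```python
from typing import Dict, List, Set

LEVELS: Dict[int, str] = {
    1: "geographic distribution and infrastructure heterogeneity",
    2: "semantically compatible attributes",
    3: "diverse representations",
    4: "merging instances",
}
REQUIRED: Dict[str, Set[int]] = {
    "exchange": {1, 2, 3},
    "integration": {1, 2, 3, 4},
}


def review(goal: str, claimed: Set[int]) -> List[str]:
    """Return concerns about a product's claimed coverage for a given goal."""
    concerns = [f"does not address level {lvl} ({LEVELS[lvl]})"
                for lvl in sorted(REQUIRED[goal] - claimed)]
    if claimed:
        # Higher levels yield little benefit if lower ones are left unsolved.
        gaps = [lvl for lvl in range(1, max(claimed)) if lvl not in claimed]
        if gaps:
            concerns.append(f"claims level {max(claimed)} but skips lower level(s) {gaps}")
    if claimed == set(LEVELS):
        concerns.append("claims all four levels: ask for evidence")
    return concerns


for concern in review("exchange", {1, 3}):
    print(concern)
```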

Information Interoperability Issue
Summer 2004, Vol. 8, No. 1

  • Introduction (Arnon Rosenthal and Len Seligman)
  • A Framework for Information Interoperability (Len Seligman and Arnon Rosenthal)
  • How Do We Build Information Systems That Support Network-Centric Warfare? (Scott Renner)
  • Network Representations Support Powerful Data Analysis (Sarah Piekut, Lowell Rosen, and Daniel Venese)
  • The Semantic Web: A Path to Large-Scale Interoperability (Frank Manola, Mary Pulvermacher, and Leo Obrst)
  • Mapping Among Independently Developed Aviation Information Systems Increases Interoperability (Catherine Bolczak, Len Seligman, Nels Broste, Ron Schwarz, and Shawne Lampert)
  • Using Data Warehousing to Integrate Multiple Sources of Data (Victor Pérez-Núñez, Robert Jurgens, Larry Hughes, and Ali Obaidi)
  • Creating Standards for Multiway Data Sharing (Elizabeth Harding, Leo Obrst, and Arnon Rosenthal)
  • Formatted Messaging Modernization Exploits XML Technologies (Robert W. Miller, Mary Ann Malloy, and Ed Masek)


For more information, please contact guest editors Arnon Rosenthal or Len Seligman using the employee directory.


Page last updated: July 30, 2004


Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
