A Framework for Information
Interoperability
Len Seligman and Arnon Rosenthal
To rapidly respond to new opportunities and threats, both government and
industry are looking for faster, cheaper ways of sharing information via
computer systems. In response, vendors offer a new "solution" every few
years, such as data warehouses, Web services, "enterprise information
integration" tools, and ontologies. While these are all potentially useful
when applied appropriately, none is a silver bullet. Each organization
has to assess its own needs and the best approach to meet them.
In all cases, the goal is to make available information that sources
have and are willing to export. The framework presented in this article
can be used to evaluate interoperability approaches. We'll describe the
most common architectures available for achieving interoperability and
the challenges that still stand in the way.
There are two main types of information interoperability:
- Exchange, in which a producer (such as the Department of
Defense) provides information to a consumer (such as NATO), and the
information is transformed to suit the consumer's needs. Exchange
requires addressing the first three problem levels in figure 1.
- Integration, in which, in addition to being transformed, information
from multiple sources is also correlated and fused. In general, the
consumer sees a single, coherent view rather than all the systems' opinions.
Integration requires that all four levels be addressed.
You can use these problem levels to help you analyze proposed interoperability
solutions.
Level 1: Overcome geographic distribution and
infrastructure heterogeneity.
Data can be widely distributed geographically. In addition, to access
the data you must overcome several types of infrastructure heterogeneity
including:
- Different data-structuring primitives, such as relational database
tables versus XML versus objects
- Different data manipulation languages (such as SQL or XQuery), proprietary
data languages, and sources with no query language that require use
of a general-purpose programming language (e.g., Java)
- Different platforms, operating systems, networks, etc.
Level 1 challenges are not as resource-consuming as the others because
off-the-shelf middleware products handle most of these challenges. In
certain environments (e.g., tactical military applications), however,
significant engineering is still required at this level.
Figure 1: The four levels of information integration
Level 2: Match semantically compatible attributes.
Some independently developed information systems use the same terms for
the same concepts, but many don’t. Sometimes, these differences in meaning
are quite subtle. For example, in one system, “number-of-employees” may
include full-time and part-time employees but not contractors, whereas
in another system, it includes all full-time workers, regardless of whether
they are regular employees or contractors. If users combine results across
systems without understanding these details, the resulting data is unlikely
to satisfy the needs of the application.
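One way to catch this kind of mismatch before results are combined is to record, for each source, exactly which categories of worker an attribute counts. The following is a minimal sketch of that idea, not a description of any particular product: the `AttributeDefinition` class, the `semantic_gap` function, and the system names "A" and "B" are all hypothetical, and the category sets simply encode the "number-of-employees" example above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# A worker category is (relationship, schedule), e.g. ("employee", "part-time").
Category = Tuple[str, str]


@dataclass(frozen=True)
class AttributeDefinition:
    """Hypothetical metadata describing what a source's attribute counts."""
    system: str
    name: str
    counts: FrozenSet[Category]


def semantic_gap(a: AttributeDefinition,
                 b: AttributeDefinition) -> Dict[str, Set[Category]]:
    """Return, per system, the categories that only that system counts."""
    return {
        a.system: set(a.counts - b.counts),
        b.system: set(b.counts - a.counts),
    }


# System A: full-time and part-time employees, but no contractors.
system_a = AttributeDefinition(
    "A", "number-of-employees",
    frozenset({("employee", "full-time"), ("employee", "part-time")}),
)
# System B: all full-time workers, whether employees or contractors.
system_b = AttributeDefinition(
    "B", "number-of-employees",
    frozenset({("employee", "full-time"), ("contractor", "full-time")}),
)

gap = semantic_gap(system_a, system_b)
# The attributes share a name but are only compatible if neither side
# counts something the other omits.
compatible = not any(gap.values())
```

Here the two attributes have the same name, yet `compatible` is false because each system counts a category the other leaves out.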
Level 3: Mediate between diverse representations.
Integrators often must reconcile different representations of the same
concept. For example, one system might measure altitude in meters from
the earth's surface while another measures it in miles from the earth's
center. In the future, application developers may define interfaces in
terms of abstract attributes using "self-description," for example, Altitude
(datatype=integer, units=miles). Mediators can use these descriptions
to shield users from the representational details.
Levels 2 and 3 can be addressed by developing mappings across systems.
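The altitude example can be sketched as a small mediator function that reads each side's self-description and converts between them. This is an illustrative sketch only: `AltitudeSpec` and `convert_altitude` are hypothetical names, and the Earth radius used is an assumed mean value, which a real application would replace with whatever reference model its domain requires.

```python
from dataclasses import dataclass

# Assumed constants for illustration only.
EARTH_RADIUS_M = 6_371_000.0   # approximate mean Earth radius, in meters
METERS_PER_MILE = 1609.344     # international statute mile

_TO_METERS = {"meters": 1.0, "miles": METERS_PER_MILE}
_REFERENCES = {"surface", "center"}


@dataclass(frozen=True)
class AltitudeSpec:
    """Hypothetical self-description of an altitude attribute."""
    units: str       # "meters" or "miles"
    reference: str   # "surface" or "center"


def convert_altitude(value: float, src: AltitudeSpec, dst: AltitudeSpec) -> float:
    """Convert an altitude from the source representation to the target one."""
    for spec in (src, dst):
        if spec.units not in _TO_METERS or spec.reference not in _REFERENCES:
            raise ValueError(f"unsupported altitude description: {spec}")
    # Step 1: normalize to meters above the earth's surface.
    meters = value * _TO_METERS[src.units]
    if src.reference == "center":
        meters -= EARTH_RADIUS_M
    # Step 2: re-express in the representation the consumer asked for.
    if dst.reference == "center":
        meters += EARTH_RADIUS_M
    return meters / _TO_METERS[dst.units]


meters_from_surface = AltitudeSpec(units="meters", reference="surface")
miles_from_center = AltitudeSpec(units="miles", reference="center")

# 10,000 m above the surface, expressed as miles from the earth's center.
converted = convert_altitude(10_000, meters_from_surface, miles_from_center)
```

Converting through one normalized form means each new representation needs only a description, not a separate converter for every pair of systems.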
Level 4: Merge instances from multiple sources.
You can do this through data correlation and data-value reconciliation
(sometimes called fusion). Data correlation determines if two objects,
usually from different data sources, refer to the same real-world object.
For example, if the Criminal Records database has "John Public, armed
robber, born 1 Jan. 1970" and the Motor Vehicle Registry database has
"John Public Sr., license plate JP-1, born 9 Sept. 1939," might a police
query consider these to refer to the same person and return "John Public,
armed robber"?
Data correlation can reveal that different sources disagree about
particular facts. Suppose three sources report John Public's height to
be 180, 187, and 0 centimeters, respectively. Data-value reconciliation
can be used to determine what values the search should return to the application.
This capability requires detailed application knowledge. Vendors and researchers
are increasing their efforts in the "data-cleaning" area to help administrators
specify the desired policy, semi-automatically identify candidate objects
to be merged, and—if cost-justified—resolve individual instances.
Reconciliation rules should be flexible, modular, and displayable to domain
experts who lack programming skills.
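As a rough sketch of what such policies might look like, the example below pairs a simple correlation rule with a simple reconciliation rule. Both rules are assumptions chosen for illustration (matching on family name and birth date, discarding implausible heights and taking the median of the rest); the record field names and the plausibility bounds are hypothetical, and real policies depend on the application.

```python
from statistics import median
from typing import Dict, Iterable, Optional


def same_person(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> bool:
    """Hypothetical correlation rule: family name and birth date must match."""
    return (rec_a["family_name"].lower() == rec_b["family_name"].lower()
            and rec_a["birth_date"] == rec_b["birth_date"])


def reconcile_height_cm(values: Iterable[float],
                        low: float = 50.0,
                        high: float = 250.0) -> Optional[float]:
    """Hypothetical reconciliation rule for reported heights.

    Values outside [low, high] are treated as missing or erroneous;
    the median of the remaining values is returned, or None if none remain.
    """
    plausible = [v for v in values if low <= v <= high]
    return median(plausible) if plausible else None


criminal_record = {"family_name": "Public", "birth_date": "1970-01-01"}
vehicle_record = {"family_name": "Public", "birth_date": "1939-09-09"}

# Same family name, different birth dates: this rule does not merge them.
is_match = same_person(criminal_record, vehicle_record)

# The three reported heights from the text; 0 cm is discarded as implausible.
height = reconcile_height_cm([180, 187, 0])
```

Keeping each rule in its own small function, with its thresholds as parameters, is one way to make the policy modular and easy to show to a domain expert.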
Typically, you must address these challenges in order, from lowest to
highest. For example, unless an effort first meets the challenges of
geographic distribution and diverse infrastructures, addressing higher
levels will yield little benefit. For information exchange, levels 1–3
are sufficient, while integration efforts also require information merging.
This issue of The Edge discusses how we have addressed these
levels through several approaches.
General architecture approaches for information interoperability include:
- Integration within the application. An application or Web
portal communicates directly with each source using that source's native
interface and reconciles the data it receives. While common, this approach
has serious drawbacks: it places great demands on the application developer,
who must stay knowledgeable about each of several data interfaces. In
addition, information combination becomes part of the code base that
must be maintained, making it difficult to leverage commercial database
management or middleware products.
- Data warehouses. Administrators define a global schema (i.e.,
a template) for the shared data. They provide the derivation logic to
reconcile data and pump it into one system, typically with the help
of extract-transform-load tools. Typically, the warehouse is read-only,
with updates made directly on the source systems. As a variation, data
marts give individual communities their own subsets of the global data.
- Federated databases. These virtual data warehouses do not
populate the global schema. Instead, the source systems retain the physical
data and a middleware layer translates all requests to run against the
source systems. Commercial companies call this "enterprise information
integration."
- Messaging. One application or database uses structured messages
to pass data to others. Often, however, the sender and receiver use
different terms for the same concepts, so that the data must be transformed
to meet the needs of the receiver. Enterprise application integration
(EAI) products support message-based interoperability.
- Parameter passing. One application invokes another and passes
data as parameters. Web services are an example of this architecture,
in which services are invoked and described using standard Web languages
and protocols. EAI products also support this architecture.
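The difference between the data warehouse and federated database approaches can be illustrated with a toy sketch. Everything here is hypothetical: the `Source` class stands in for a remote system, the attribute mappings play the role of the global schema's derivation logic, and the in-memory lists replace real databases and extract-transform-load tooling.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Source:
    """Stand-in for a source system with its own local attribute names."""

    def __init__(self, name: str, rows: List[Row], mapping: Dict[str, str]):
        self.name = name
        self.rows = rows          # the physical data stays with the source
        self.mapping = mapping    # global attribute -> local attribute

    def fetch(self) -> Iterable[Row]:
        """Yield this source's rows translated into the global schema."""
        for row in self.rows:
            yield {g: row[local] for g, local in self.mapping.items()}


def load_warehouse(sources: List[Source]) -> List[Row]:
    """Warehouse style: copy translated data into one store at load time."""
    return [row for source in sources for row in source.fetch()]


def federated_query(sources: List[Source],
                    predicate: Callable[[Row], bool]) -> List[Row]:
    """Federated style: run each request against the live source systems."""
    return [row for source in sources for row in source.fetch() if predicate(row)]


personnel = Source("personnel",
                   [{"emp_name": "Ann", "site": "Bedford"}],
                   {"name": "emp_name", "location": "site"})
payroll = Source("payroll",
                 [{"full_name": "Raj", "office": "McLean"}],
                 {"name": "full_name", "location": "office"})
sources = [personnel, payroll]


def in_bedford(row: Row) -> bool:
    return row["location"] == "Bedford"


warehouse = load_warehouse(sources)                       # snapshot taken now
personnel.rows.append({"emp_name": "Lee", "site": "Bedford"})  # source changes later

from_warehouse = [row for row in warehouse if in_bedford(row)]
from_federation = federated_query(sources, in_bedford)
```

After the source changes, the warehouse copy still holds only the earlier snapshot until the next load, while the federated query sees the new row because it reads the sources at request time.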
Challenges
The technical issues of these approaches revolve around heterogeneity,
distribution, and multiple versions. In general, the greatest challenges
lie in semantics.
The framework presented in this article can be used to evaluate interoperability
approaches. For example, if a vendor describes their product as being
"the answer" to information interoperability, you can ask them which of
the four levels their product addresses. If they say "all of them," be
suspicious!
Information Interoperability Issue
Summer 2004
Vol. 8, No. 1