About Us Our Work Employment News & Events
MITRE Remote Access for MITRE Staff and Partners Site Map

Home > News & Events > MITRE Publications > The Edge >

Using Data Warehousing to Integrate Multiple Sources of Data

Victor Pérez-Núñez, Robert Jurgens, Larry Hughes, and Ali Obaidi

multibillion dollar defense enterprise needs to eliminate bottlenecks and decrease interdependency in the more than 100 systems and databases it uses to support its missions.

A nationwide aviation repository system seeks to provide its subscribers with up-to-date, user-friendly, and anomaly-free access to a host of disparate and underdeveloped data sources. Both enlisted MITRE's information interoperability expertise in finding a solution.

Each project required a different approach, but they both involved the use of a data warehouse to achieve information interoperability. A data warehouse is an enterprise-wide, very large database that stores normalized data. It is the source of smaller data marts focusing on subject areas (e.g., department-wide or project-based) that are extracted into multidimensional databases often intended for decision-support systems, online analytic process, or data-mining applications. The resulting system facilitates both information access and analysis.

The typical data warehouse integrates multiple sources using:

  • Data profiling: Categorizes features of the data sources, such as data structures and data domain
  • Data cleansing: Removes inconsistencies in the data; measures accuracy, completeness, and timeliness of data
  • Extract, transform, and load (ETL): Accesses, reformats, and integrates multiple types of data
  • Metadata repositories: Stores and manages data definitions and metadata from source to target.

Project #1: Air Force Training

The Air Education and Training Command (AETC)—a $7 billion organization that conducts recruiting and a wide variety of military training, including advanced professional/military education—collects and stores a tremendous amount of information on its students and activities. MITRE is working with the AETC in its business and IT modernization program, work that includes addressing data exchange (interoperability) and business intelligence applications.

Overall, AETC has more than 100 legacy systems/databases; each legacy system has multiple interfaces that are inherently batch oriented (e.g., point-to-point transactional or flat file exchange), all with different operational goals. This preponderance of systems creates interoperability bottlenecks, hampers flexibility, and elevates system interdependency.

To solve these system problems, we are helping AETC engineer an enterprise architecture, information broker, data warehouse, and other capabilities that will address the needs of system integration, interdependency, interoperability, information exchange, and business intelligence.

We started by creating a typical data warehouse architecture, such as the one illustrated in Figure 1, which includes:

  • An extract-transform-load (ETL) process to pull information from multiple operational data stores and/or
    transactional systems
  • The enterprise-level data warehouse containing the integrated data
  • The metadata repository containing the information that enables the understanding of the data origins and transformations
  • A second ETL process, which enables the construction of data marts for decision-support systems, business intelligence, or data-mining purposes.

The data warehouse architecture we are developing for AETC will serve as an enterprise data management system and architecture for the student registration and records system. The architecture includes an operational data store, an information broker (to extract, transform, and load using publish/subscribe and push/pull technologies), an enterprise data warehouse, decision-support tools, a Web-portal user interface, a metadata repository, and an off-line historical archive.

Eventually, the data warehouse will collect all the relevant data while providing AETC applications for resource use/cost analyses and data mining.

Our overall work with the AETC is much broader than the development of the data warehouse, but that is one of the solutions we are working on to resolve intersystem dependencies, effect cultural change, cleanse the data that impacts legacy business rules, and enable data "on demand." Our work will support the AETC as it moves from independently "owned and operated" organizations to a headquarters-run enterprise that relies on end-to-end data-driven business decisions.

Figure 1: Typical architecture for a data warehouse system, as used by the Air Education and Training Command
Figure 1: Typical architecture for a data warehouse system, as used by the Air Education and Training Command

Project #2: Sharing Aviation Data

MITRE's Center for Advanced Aviation System Development (CAASD), the FAA's federally funded research and development center, is spearheading another data warehousing project. The CAASD Repository System (CRS) brings together independent data sources into an easily digested shared format that can be accessed by a multitude of users, inside and outside of MITRE.

CAASD began working on CRS in late 1999 to provide a cost-effective system that consolidates commonly used analytical aviation data, adds value to the data, and removes anomalies, while preserving high performance and ensuring scalability. The goal was to develop a system that would enable us to better serve our aviation customers. We continue to enhance the CRS, recently developing a state-of-the-art data warehouse that allows users to store new data in the repository, process the data, and export it into different formats.

Today, CRS is widely used by MITRE staff, enabling them to work more efficiently. For example, when the oceanic operations analysis group needed to extract data on arrival/departure airports corresponding to enhanced traffic management system ocean messages, it took group members just 10 minutes to produce data that was virtually impossible to extract before the creation of CRS.

The CRS was recently enhanced with an added batch query interface capability, which enabled the New York airspace redesign team to perform a longitudinal query over an entire month of flights, searching for those corresponding to strict noise requirements. These flights were then fed into the total Airspace and Airport Modeler tool for further analysis.

These are just two examples of how we are using CRS to save time and improve data gathering. CRS data have value to a wide user group, including researchers, analysts, and other FAA research laboratories. It addresses some of the major problems these groups face: that aviation data sources are not designed to talk to each other and are inconsistent and sometimes misleading and incomplete.

The CRS data warehouse system currently contains a database of more than 260 tables and 600 gigabytes of data. This spans more than 20 different data sources, including information on weather, airports, traffic data, sectorization, and schedules. On top of CRS, we created a technology that provides user-designed forms from the CRS database and automatically generates optimized Sequential Query Language queries from the forms.

We are increasing the value of the system by including additional data sources, reducing the storage requirements for each data source, increasing storage capacity for historical data, optimizing the retrieval processes, and providing a robust data model standardized within the aviation community.

Still Much to Learn

For a number of years, we have been researching many areas relevant to data warehouses and integration. For example, we have conducted research projects in data quality, the Semantic Web, and data integration. And there are many other data-warehousing challenges that MITRE continues to investigate, such as the management and integration of schemas from multiple data sources for data sharing.

Data warehousing has proven to be a powerful tool in information interoperability with the potential to become more powerful still as our knowledge and experience grows.

Secrets to Data Warehouse Success

Through our data warehouse development efforts, we have determined that successful projects share the following characteristics:

  • They are business-driven: Business analysis focuses the project on high-payback data (e.g., 20/80 rule where it concentrates on the 20 percent of the data sources that supply 80 percent of the business needs) and concrete business results/reports that are traceable to the data through designs, models, requirements, and needs. Periodic return-on-investment evaluations are conducted based on pre-build calculations (traditional cost-to-savings ratios) and post-build calculations (business-based or application-based cost justifications).
  • They are incrementally built: Early, small successes are emphasized to gain user support and acceptance.
  • They are based on proper modeling: The data warehouse logical model is normalized to the maximum practical degree to support data extracts of subsets for data marts used for decision support and analysis.
  • They are data quality-centered: The data model and the "extract, transform, and load" application are employed to boost overall data quality by improving many characteristics of the data such as accuracy, consistency, and minimized redundancy. Addressing data quality from the beginning is very important because ensuring data quality can account for as much as 60 to 80 percent of the total cost of the data warehousing project.

 

Information Interoperability Issue

Summer 2004
Vol. 8, No. 1



Introduction

Arnon Rosenthal and Len Seligman


A Framework for Information Interoperability

Len Seligman and Arnon Rosenthal


How Do We Build Information Systems That Support Network-Centric Warfare?

Scott Renner


Network Representations Support Powerful Data Analysis

Sarah Piekut, Lowell Rosen, and Daniel Venese


The Semantic Web: A Path to Large-Scale Interoperability

Frank Manola, Mary Pulvermacher, and Leo Obrst


Mapping Among Independently Developed Aviation Information Systems Increases Interoperability

Catherine Bolczak, Len Seligman, Nels Broste, Ron Schwarz, and Shawne Lampert


Using Data Warehousing to Integrate Multiple Sources of Data

Victor Pérez-Núñez, Robert Jurgens, Larry Hughes, and Ali Obaidi


Creating Standards for Multiway Data Sharing

Elizabeth Harding, Leo Obrst, and Arnon Rosenthal


Formatted Messaging Modernization Exploits XML Technologies

Robert W. Miller, Mary Ann Malloy, and Ed Masek


pdf icon Download this issue [1.2MB]

 

For more information, please contact Victor Pérez-Núñez, Robert Jurgens, Larry Hughes or Ali Obaidi using the employee directory.


Page last updated: August 5, 2004   |   Top of page

Homeland Security Center Center for Enterprise Modernization Command, Control, Communications and Intelligence Center Center for Advanced Aviation System Development

 
 
 

Solutions That Make a Difference.®
Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.

IDG's Computerworld Names MITRE a "Best Place to Work in IT" for Eighth Straight Year The Boston Globe Ranks MITRE Number 6 Top Place to Work Fast Company Names MITRE One of the "World's 50 Most Innovative Companies"
 

Privacy Policy | Contact Us