Using Data Warehousing to Integrate Multiple Sources of Data

Victor Pérez-Núñez, Robert Jurgens, Larry Hughes, and Ali Obaidi

A multibillion-dollar defense enterprise needs to eliminate bottlenecks and decrease interdependency in the more than 100 systems and databases it uses to support its missions. A nationwide aviation repository system seeks to provide its subscribers with up-to-date, user-friendly, and anomaly-free access to a host of disparate and underdeveloped data sources. Both enlisted MITRE's information interoperability expertise in finding a solution. Each project required a different approach, but both used a data warehouse to achieve information interoperability.

A data warehouse is an enterprise-wide, very large database that stores normalized data. It is the source of smaller data marts focusing on subject areas (e.g., department-wide or project-based) that are extracted into multidimensional databases, often intended for decision-support systems, online analytical processing, or data-mining applications. The resulting system facilitates both information access and analysis. The typical data warehouse integrates multiple sources using:
Project #1: Air Force Training

The Air Education and Training Command (AETC), a $7 billion organization that conducts recruiting and a wide variety of military training, including advanced professional/military education, collects and stores a tremendous amount of information on its students and activities. MITRE is working with the AETC in its business and IT modernization program, work that includes addressing data exchange (interoperability) and business intelligence applications.

Overall, AETC has more than 100 legacy systems/databases; each legacy system has multiple interfaces that are inherently batch oriented (e.g., point-to-point transactional or flat file exchange), all with different operational goals. This preponderance of systems creates interoperability bottlenecks, hampers flexibility, and elevates system interdependency.

To solve these system problems, we are helping AETC engineer an enterprise architecture, information broker, data warehouse, and other capabilities that will address the needs of system integration, interdependency, interoperability, information exchange, and business intelligence. We started by creating a typical data warehouse architecture, such as the one illustrated in Figure 1, which includes:
The data warehouse architecture we are developing for AETC will serve as an enterprise data management system and architecture for the student registration and records system. The architecture includes an operational data store, an information broker (to extract, transform, and load using publish/subscribe and push/pull technologies), an enterprise data warehouse, decision-support tools, a Web-portal user interface, a metadata repository, and an off-line historical archive. Eventually, the data warehouse will collect all the relevant data while providing AETC applications for resource use/cost analyses and data mining. Our overall work with the AETC is much broader than the development of the data warehouse, but that is one of the solutions we are working on to resolve intersystem dependencies, effect cultural change, cleanse the data that impacts legacy business rules, and enable data "on demand." Our work will support the AETC as it moves from independently "owned and operated" organizations to a headquarters-run enterprise that relies on end-to-end data-driven business decisions.
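To make the information broker's role concrete, the following is a minimal sketch of publish/subscribe-style extract, transform, and load. It is an illustration under stated assumptions, not the AETC design: the `InformationBroker` class, the `student_records` topic, the record fields, and the cleansing rules are all hypothetical, and an in-memory list stands in for the enterprise data warehouse.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]
Handler = Callable[[Record], None]


class InformationBroker:
    """Toy publish/subscribe broker (hypothetical): routes records by topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a handler that should receive every record on this topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> None:
        # Push the record to each handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(record)


def transform(record: Record) -> Record:
    """Assumed cleansing rules: trim whitespace, upper-case the course code."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["course"] = cleaned["course"].upper()
    return cleaned


# Stand-in for the enterprise data warehouse (an assumption for illustration).
warehouse: List[Record] = []


def load(record: Record) -> None:
    # Transform on the way in, then append to the warehouse stand-in.
    warehouse.append(transform(record))


broker = InformationBroker()
broker.subscribe("student_records", load)

# Two hypothetical legacy systems publish records in slightly different shapes.
broker.publish("student_records", {"student_id": " 1001 ", "course": "nav-101"})
broker.publish("student_records", {"student_id": "1002", "course": " Log-200 "})

for row in warehouse:
    print(row)
```

The point of routing through a broker is that each legacy system publishes once to a topic instead of maintaining a point-to-point interface to every consumer, which is the interdependency the architecture aims to reduce.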
Project #2: Sharing Aviation Data

MITRE's Center for Advanced Aviation System Development (CAASD), the FAA's federally funded research and development center, is spearheading another data warehousing project. The CAASD Repository System (CRS) brings together independent data sources into an easily digested shared format that can be accessed by a multitude of users, inside and outside of MITRE.

CAASD began working on CRS in late 1999 to provide a cost-effective system that consolidates commonly used analytical aviation data, adds value to the data, and removes anomalies, while preserving high performance and ensuring scalability. The goal was to develop a system that would enable us to better serve our aviation customers. We continue to enhance the CRS, recently developing a state-of-the-art data warehouse that allows users to store new data in the repository, process the data, and export it into different formats.

Today, CRS is widely used by MITRE staff, enabling them to work more efficiently. For example, when the oceanic operations analysis group needed to extract data on arrival/departure airports corresponding to enhanced traffic management system ocean messages, it took group members just 10 minutes to produce data that was virtually impossible to extract before the creation of CRS. The CRS was recently enhanced with an added batch query interface capability, which enabled the New York airspace redesign team to perform a longitudinal query over an entire month of flights, searching for those corresponding to strict noise requirements. These flights were then fed into the Total Airspace and Airport Modeler tool for further analysis.

These are just two examples of how we are using CRS to save time and improve data gathering. CRS data have value to a wide user group, including researchers, analysts, and other FAA research laboratories. The system addresses some of the major problems these groups face: aviation data sources are not designed to talk to each other, and they are inconsistent and sometimes misleading and incomplete. We are increasing the value of the system by including additional data sources, reducing the storage requirements for each data source, increasing storage capacity for historical data, optimizing the retrieval processes, and providing a robust data model standardized within the aviation community.

Still Much to Learn

For a number of years, we have been researching many areas relevant to data warehouses and integration. For example, we have conducted research projects in data quality, the Semantic Web, and data integration. And there are many other data-warehousing challenges that MITRE continues to investigate, such as the management and integration of schemas from multiple data sources for data sharing. Data warehousing has proven to be a powerful tool in information interoperability, with the potential to become more powerful still as our knowledge and experience grow.

For more information, please contact Victor Pérez-Núñez, Robert Jurgens, Larry Hughes, or Ali Obaidi using the employee directory.

Page last updated: August 5, 2004