
Information Management -- Projects


Information Management

Information Management investigates databases, distributed databases, data mining, and legacy databases.


Building the Semantic Web

Frank Manola, Principal Investigator

Bedford and Washington

Problem
As the amount of information on the World Wide Web continues to grow, the value of automated tools capable of finding, filtering, and combining information in response to specific user requirements greatly increases. The largest barrier preventing more automated use of Web resources is that the semantics (meaning) of these resources is generally unavailable to automated agents.

Objectives
The objective of this project is to develop technical foundations for a "Semantic Web," in which programs such as agents, search engines, or service brokers can identify and use World Wide Web resources (including both information and services) based on machine-readable representations of their semantics.

Activities
We are investigating language concepts for representing and processing semantic information that scale to the Web environment, and application areas that include eBusiness and disaster relief. We are participating in the World Wide Web Consortium's Semantic Web Activity, engaging in joint research with MIT's Context Interchange (COIN) project, and cooperating with researchers in DARPA's DAML (DARPA Agent Markup Language) program.
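
As a concrete illustration of machine-readable semantics, the sketch below uses the third-party Python library rdflib to state a few facts about a hypothetical disaster-relief shipment in RDF, the W3C data model underlying the Semantic Web. The vocabulary (ex:ReliefShipment, ex:destination) is invented for this example and is not project code.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/relief#")   # hypothetical vocabulary

    g = Graph()
    g.bind("ex", EX)
    g.add((EX.shipment42, RDF.type, EX.ReliefShipment))            # a typed resource
    g.add((EX.shipment42, EX.contents, Literal("water purification units")))
    g.add((EX.shipment42, EX.destination, EX.portOfEntryA))        # link to another resource

    # An agent or broker can query or merge these statements instead of scraping HTML.
    print(g.serialize(format="turtle"))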

Impacts
This research addresses a key area of current Web technology development, impacting numerous MITRE programs dependent on Web technologies such as XML, as well as wider eBusiness and other communities addressing issues of large-scale interoperability. The research also provides technology transfer opportunities with a wide range of academic and industry R&D activities and standards groups.

Project Summary Chart Presentation [PDF]

CEM Design and Project Support Laboratory

Victor Perez-Nunez, Principal Investigator

Washington only

Problem
The evaluation of proposed approaches and technologies that will be used in the IRS information and data management research project requires access to development tools used in the modernization effort. For example, tools are needed to support research on the impact of XML on data processing, and research on a metadata repository.

Objectives
CEM will develop an internal facility of resources and technologies to support research on the key topics identified in the IRS information and data management research project. This facility will give CEM staff access to the development tools used in modernization efforts, as well as alternative tool sets for evaluating proposed technical approaches.

Activities
The CEM Design and Project Support Laboratory will host a set of tools for research and evaluation of key topics. The initial set of tools includes the Rational Enterprise Suite and System Architect Modeling Tool for the initial evaluation of metadata repository approaches, and IBM DB2 and Extend Business Process Modeling Tool for the impact analysis of XML in data processing.
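
To make the XML research topic concrete, the sketch below uses only Python's standard library to pull structured values out of a small XML filing for downstream processing; the element names are invented for illustration and are not an IRS schema.

    import xml.etree.ElementTree as ET

    # Hypothetical fragment standing in for a structured filing; not a real IRS schema.
    doc = """
    <return taxYear="2001">
      <line id="7"  description="Wages">41250.00</line>
      <line id="8a" description="Taxable interest">310.55</line>
    </return>
    """

    root = ET.fromstring(doc)
    total = sum(float(line.text) for line in root.findall("line"))
    print(root.get("taxYear"), "income lines total:", round(total, 2))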

Impacts
The facility will enable CEM staff to conduct independent research on topics that are directly relevant to the IRS modernization program. Results of this research should provide the basis for MITRE to make proactive recommendations on related technical work by contractors.

Project Summary Chart Presentation [PDF]

Data and Information Management Research at the IRS

Victor Perez-Nunez, Principal Investigator

Washington only

Problem
Currently, the IRS has limited ability to capture information and data from multiple inputs, such as tax returns and information returns, and to analyze and provide timely reports from multiple data source systems. Changing regulations have altered business rules that affect the processing of more than 40 terabytes of taxpayer data.

Objectives
The project will define a research agenda that will guide information innovation grant studies addressing information and data management (IM/DM) topics relevant to the IRS. These studies will assist in determining approaches for dealing with IRS challenges in managing the increasing information demand and volume of data for the current and future systems.

Activities
Current IRS information and data management challenges will be identified and will guide the identification of research topics. Investigators will further define each topic and a proposed approach for addressing it. The initial set of topics will include determining the impact of XML on business processing, defining approaches for metadata repositories, and implementing and managing business rules.
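
As an illustration of the business-rules topic, the hypothetical sketch below keeps rules as data that can be revised when regulations change, rather than burying them in processing code; the rule names, conditions, and threshold are invented.

    # Hypothetical rules kept as data so they can be updated when regulations change.
    RULES = [
        ("missing_tin",  lambda r: not r.get("tin"),            "route to error resolution"),
        ("large_refund", lambda r: r.get("refund", 0) > 10000,  "flag for review"),
    ]

    def apply_rules(record):
        """Return the actions triggered by a single taxpayer record."""
        return [(name, action) for name, test, action in RULES if test(record)]

    print(apply_rules({"tin": "", "refund": 12500}))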

Impacts
As new components of the IRS modernization program are defined and developed, MITRE will be able to directly relate and apply its research results regarding the topics defined in this project. This work should assist the IRS in implementing information systems that support the goal of providing timely information to stakeholders.

Project Summary Chart Presentation [PDF]

Data Integration as an Industrial Process

Len Seligman, Co-Principal Investigator

Arnon Rosenthal, Co-Principal Investigator

Washington only

Problem
Data integration requires too much human time and skill. We need to industrialize it: to create narrow-skill steps, each of which produces reusable knowledge rather than opaque code. To move from (easily evaded) mandates to natural incentives, we will explore "describe and generate" tools that make even the first connection easier. The approach should be incremental, driven by real interoperability needs rather than special initiatives.

Objectives
Our goals are to refine the industrial approach and to move industry, the research community, and sponsors toward that vision. Specifically, we will extend (very scalable) profile-driven integration techniques to be compatible with commercial multidatabase query tools, develop metrics to help project planners compare data integration techniques and judge tools' utility, and evaluate emerging describe-and-generate data integration research prototypes in real projects.
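
The "describe and generate" idea can be sketched in a few lines, using invented field names: the mapping is declared as data (reusable knowledge), and one generic routine generates the translation from it.

    # The mapping is declarative and reusable; only this table changes per source system.
    MAPPING = {                       # target field : source field, or a converter
        "aircraft_id": "tail_number",
        "altitude_m":  lambda src: round(float(src["altitude_ft"]) * 0.3048, 1),
    }

    def translate(source_record):
        return {target: (rule(source_record) if callable(rule) else source_record[rule])
                for target, rule in MAPPING.items()}

    print(translate({"tail_number": "N1234", "altitude_ft": "33000"}))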

Activities
In the first year we performed a modular breakdown of integration steps and compared profile-driven against federated techniques. We also began constructing an experimental framework and metrics for comparing integration techniques. In the second year we will conduct experiments using research prototypes (e.g., IBM Research's Clio) with aviation, brain mapping, and tax administration data; conduct a survey of data integration practitioners to determine where the costs are the greatest; and adapt metrics to improve project planning. In the third year we will refine the metrics and perform further experimentation. Throughout, we will publish results and transition them to MITRE and sponsor projects.

Impacts
We will reframe a critical technology to reflect rarely addressed organizational realities. We will influence emerging industrial tools and researchers' agendas, and provide metrics where none previously existed. MITRE's sponsors will be aided in moving from giant doomed data integration initiatives to incremental progress.

Project Summary Chart Presentation [PDF]

Database Curation and Access for Bioinformatics

Lynette Hirschman, Principal Investigator

Bedford and Washington

Problem
Biological databases store information on proteins, genes, and their functions. The biomedical literature describes the experiments behind the database entries. Many databases lag behind the literature because they require biologists to transfer the information from articles to database entries. Biologists need interactive tools to help in the timely and consistent transfer of information from the literature into the databases (the "curation" process).

Objectives
This project will develop interactive techniques for the curation of biological databases. These techniques will allow curators of databases to maintain currency and consistency of these databases in the face of exponential growth of research in genomics and proteomics. To provide the curation tools, we will develop information mining methods for free text and structured data, specifically geared to the biology domain.
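
A deliberately simple, hypothetical sketch of the kind of text-mining step a curation assistant might perform: flag candidate gene or protein mentions in an abstract so a human curator can confirm them. The pattern and sentence are invented and far cruder than real entity taggers.

    import re

    # Crude pattern for gene-like symbols (capital letter, short stem, trailing digits).
    GENE_PATTERN = re.compile(r"\b[A-Z][A-Za-z0-9]{1,5}-?\d+[a-z]?\b")

    abstract = "Overexpression of Abc1 suppressed the phenotype caused by loss of Xyz2."
    candidates = sorted(set(GENE_PATTERN.findall(abstract)))
    print("candidate mentions for curator review:", candidates)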

Activities
In the first year we will determine requirements for biological database curation and mine existing databases for training and test data. We will also develop an initial curation system prototype and conduct initial technology and user-centered evaluations. In the second year we will refine our interactive curation system and further evaluate it. Finally, we will explore a Question & Answer front-end.

Impacts
This work will have impact on the biology community by defining methods and evaluations for automating database curation. We estimate that hundreds to thousands of biology databases exist, and the number is growing rapidly. Investment in this area leverages MITRE's expertise in text data mining and databases, allowing MITRE to become a significant player in bioinformatics.

Project Summary Chart Presentation [PDF]

Distributed Metadata Service

Dock Allen, Principal Investigator

Bedford only

Problem
DOD and commercial practice are moving towards dynamic composition of Web-based systems using publish/subscribe paradigms and metadata (e.g., XML). The Joint Battlespace Infosphere (JBI), Universal Description, Discovery, and Integration (UDDI), and peer-to-peer systems are examples of this. For these "infospheres" to succeed, they must be scalable, flexible, and evolvable, and support component reuse both in applications and in the infosphere services themselves.

Objectives
We are "pushing the edge" of the architecture envelope for Web-based infospheres with respect to distribution and participant stewardship of metadata (for scalability), uniformity (for simplicity and reuse), and dynamic integration of components (for flexibility). As these architectural principles are validated, we transition them into DOD initiatives, programs, and commercial use.

Activities
We will define a metadata-based architecture to improve scalability, uniformity, and flexibility, and a profiling language for advertising Web service / information "haves" and "needs." We will develop components and a software development kit (SDK) for architecture evaluation, technology transition, and demonstrations. As architectural principles are validated, we will transition them via consultation, delivery of components, and training and provide inputs to industry consortia, conferences, and discussion groups.
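
A hypothetical sketch of "haves"/"needs" matching in a broker: publishers advertise attribute-value profiles of what they have, and a subscriber's stated need is matched against them. Real JBI profiles are far richer; the attributes below are invented.

    # Advertised "haves": each is a small attribute-value profile (invented values).
    HAVES = [
        {"type": "imagery", "region": "north", "source": "uav"},
        {"type": "weather", "region": "south", "source": "metstation"},
    ]

    def matches(need, have):
        """A have satisfies a need if it agrees on every attribute the need specifies."""
        return all(have.get(attr) == value for attr, value in need.items())

    need = {"type": "imagery", "region": "north"}
    print([have for have in HAVES if matches(need, have)])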

Impacts
Our profiling language, which expresses both "haves" and "needs," is used for the JBI. Our brokers, profile editors, and SDK will be transitioned to the 3Q2002 JBI release. We support projects in becoming "JBI enabled." We consulted with a MITRE project developing an information service and XML schema that allows "advertising" of UAV intelligence products, as well as "publishing needs" for UAV intelligence products.

Project Summary Chart Presentation [PDF]

ISR Information Service (ISRIS)

John Kane, Principal Investigator

Bedford and Washington

Problem
Legacy intelligence, surveillance, and reconnaissance (ISR) systems connect through stove-piped interfaces to command and control (C2) systems. This limits the ability to form a common operational picture to support missions such as time-critical targeting. A variety of "battlespace intranets" are emerging to address this problem, but the challenge remains: how will ISR assets connect to these information management initiatives?

Objectives
This project is experimenting with the integration of advanced Internet technologies into the ISR sensor ground station. The objective is to enable access for all C2 users to the real-time services and data of an ISR platform from within a user's standard Web browser, and ultimately show the way ahead to the next generation of DOD ISR and C2 Web services.
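
As a rough, hypothetical sketch of exposing platform data to any standard browser, the snippet below serves a small XML track report over HTTP using only Python's standard library; the element names and values are placeholders, not ISRIS code.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Placeholder report; a real service would publish live platform and sensor data.
    TRACK_XML = b"<track platform='example-uav' lat='0.0' lon='0.0' alt_ft='0'/>"

    class TrackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/xml")
            self.end_headers()
            self.wfile.write(TRACK_XML)   # any standard browser can fetch and render this

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), TrackHandler).serve_forever()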

Activities
The FY01 ISR Information Service (ISRIS) prototype supports the Air Force Global Hawk UAV. We plan to experiment with ISRIS during live flights in FY02 using servers on the MITRE MII and on SIPRNET. We will extend the ISRIS concept to the Predator UAV and the U-2. The subscription and profiling technology of the Air Force Joint Battlespace Infosphere (JBI) will be integrated with ISRIS.

Impacts
ISRIS research is helping the DOD develop the concepts and technology for next generation Web services on battlespace internets. These services will enable an unprecedented level of access to real-time situational awareness information and raw sensor data. Doing this within the user's generic browser will streamline deployment and enable a browser-based common operational picture.

Project Summary Chart Presentation [PDF]

Managing Data Quality with Shared Views

Eric R. Hughes, Principal Investigator

Bedford and Washington

Problem
Data quality, defined as fitness for use, is increasingly seen as a serious problem in government and private sector databases. Sometimes the data is of low quality; in other cases users cannot easily determine whether the data is satisfactory. Consequences range from user mistrust, lack of use, and creation of redundant stores to mission failures.

Objectives
We have developed technology to manage data quality annotations: to store them using a defined model, to capture them with minimal impact on users by modifying existing production tools, and to use them by extending existing applications. We will show how this technology can be applied to two systems by extending database view technology to include quality annotations and to propagate data quality annotations between systems.
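
A minimal sketch, with an invented schema, of what a quality-annotated view can look like: the view joins production data with its quality annotations so consumers see fitness-for-use information alongside each value.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE airfield(id INTEGER PRIMARY KEY, name TEXT, runway_len_ft INTEGER);
    CREATE TABLE quality(row_id INTEGER, attribute TEXT, accuracy TEXT, last_verified TEXT);
    -- The view exposes data together with its quality annotations.
    CREATE VIEW airfield_q AS
      SELECT a.name, a.runway_len_ft, q.accuracy, q.last_verified
      FROM airfield a LEFT JOIN quality q
        ON q.row_id = a.id AND q.attribute = 'runway_len_ft';
    INSERT INTO airfield VALUES (1, 'Example Field', 9000);
    INSERT INTO quality VALUES (1, 'runway_len_ft', 'survey-grade', '2001-10-01');
    """)
    for row in con.execute("SELECT * FROM airfield_q"):
        print(row)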

Activities
We will make recommendations on improving data quality within two key government databases. These recommendations will include use of views and quality annotations for improved propagation of data between the systems. We will also develop a data quality assessment tool and demonstrate it for one of the systems.

Impacts
Data quality annotations help a user employ data appropriately and can inform organizational efforts at data quality improvement. We will enable end-to-end management of data quality via creation of better (and more easily maintained) data quality views for various communities of interest (COIs). Our work will make it easier for COIs to form and interact with government databases.

Project Summary Chart Presentation [PDF]

Neuroinformatics

Jordan Feidler, Principal Investigator

Washington only

Problem
The neuroscience community is accumulating a vast amount of human brain mapping data using techniques that operate over many spatial scales. Currently data use is generally limited to the lab of origin; the data is not readily available to other investigators for subsequent studies. Data may exist that a researcher could use to explore a particular hypothesis, but that investigator is not aware of its existence or does not have ready access to it.

Objectives
The overall goal of this research is to design, prototype, and evaluate an information infrastructure for human brain mapping data which will help realize the full potential of this growing store of mapping data. In this initial undertaking, we focus on a system that enables the analysis, exploration, and dissemination of structural magnetic resonance imaging (MRI) data from multiple labs.

Activities
The project will develop and deploy a digital library for structural MRI data; design a warehouse of structural MRI data; design a content-based system that will allow users to retrieve images with features similar to those in a submitted example; develop techniques that enable users to aggregate warehouse information into a probabilistic brain atlas; and develop a prototype system that will ground architecture and query language development in a real-world setting.
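
A hypothetical sketch of the content-based retrieval idea: each stored volume is reduced to a small feature vector and a query-by-example returns the nearest stored scans. The features and values below are invented placeholders; real systems use far richer image descriptors.

    import math

    # Invented feature vectors standing in for per-volume image descriptors.
    LIBRARY = {
        "scan_001": (0.82, 0.10, 0.55),
        "scan_002": (0.79, 0.12, 0.51),
        "scan_003": (0.60, 0.30, 0.90),
    }

    def query_by_example(query, k=2):
        """Return the k stored scans whose feature vectors are closest to the query."""
        return sorted(LIBRARY, key=lambda name: math.dist(query, LIBRARY[name]))[:k]

    print(query_by_example((0.80, 0.11, 0.53)))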

Impacts
The problems encountered by the human brain mapping community are isomorphic to those encountered by many of MITRE's traditional sponsors. We expect research conducted under the auspices of this project to be readily transitioned to our Treasury Department, IRS, DOD, and USCG sponsors. In addition, this project provides an important public service to the neuroscience research and clinical communities.

Project Summary Chart Presentation [PDF]

Next Generation Joint Battlespace Infosphere (JBI) Core Services

Robert Cherinka, Principal Investigator

Bedford only

Problem
Digital information is rapidly becoming integrated into all aspects of military activities. Operations are becoming increasingly fast-paced and diverse. To provide commanders with the knowledge required to make decisions in this environment, a greatly enhanced command and control (C2) concept for intelligence gathering, dissemination, and visualization is needed, based on revolutionary new information-age concepts and technologies.

Objectives
Our primary objective is to evaluate and integrate the first versions of the best existing research and commercial technologies into the core services of the C2 Enterprise Integration/Common Integrated Infrastructure (C2EI/CII) platform in a manner consistent with the overall JBI vision. This initial integrated set of services is expected to be used by early adopter SPO prototyping efforts.

Activities
The modeling of user information needs and decision goals will be the focus of FY02 research. In addition, we will continue to investigate deployable commercial products consistent with major C2 SPO selections and the JBI vision, and to improve existing JBI functional services. These core services will be evaluated and prototyped for integration into C2EI/CII enterprise services.

Impacts
This project should foster better insight and working relationships among major S&T community members and form a bridge between technology and information management research. The integration of "best of breed" implementations for each of the C2EI/JBI platform components can form the basis for C2 SPOs and other DOD-related programs to adopt an adaptable Web services-based architecture.

Project Summary Chart Presentation [PDF]

Using Domain Knowledge in Data Mining

Zohreh Nazeri, Principal Investigator

Washington only

Problem
Domain knowledge is not being fully exploited by current data mining methods. As a result of this deficiency, data mining tools generate a large number of patterns that are not "interesting," i.e., patterns that are overly complex, already known, or unnecessarily low in predictive power.

Objectives
Directly incorporating domain knowledge into data mining algorithms should improve the quality of the output, reduce the manual filtering and interpretation of discovered rules, and improve the overall efficiency of the process.

Activities
We plan to research the proposed method by accomplishing the following: acquiring domain knowledge and preferences and representing them in a suitable format; modifying Association Rules and Decision Trees algorithms to allow direct use of domain knowledge and preferences within the mining process; evaluating the effect of our techniques; and summarizing our research activities in a conference paper.
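
A hypothetical sketch of one simple way domain knowledge can be applied inside the mining loop rather than afterward: analyst-supplied constraints reject rules that are already known or that involve attributes declared uninteresting. The constraints and items below are invented.

    # Analyst-supplied domain knowledge (invented examples).
    KNOWN_RULES = {(("bread",), ("butter",))}   # relationships the domain already knows
    UNINTERESTING = {"customer_id"}             # attributes with no predictive value

    def interesting(antecedent, consequent):
        """Check a candidate rule against the domain constraints during mining."""
        rule = (tuple(sorted(antecedent)), tuple(sorted(consequent)))
        if rule in KNOWN_RULES:
            return False
        return not (set(antecedent) | set(consequent)) & UNINTERESTING

    print(interesting({"bread"}, {"butter"}))   # False: already known to the domain
    print(interesting({"milk"}, {"cereal"}))    # True: worth presenting to the analyst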

Impacts
Because of the popularity of the selected algorithms and the generality of our approach, this work can have a positive impact on any of the ongoing data mining projects within MITRE and on the broader data mining community and tool vendors who support this work.

Project Summary Chart Presentation [PDF]

